Development Tip

Is it legal for a C++ optimizer to reorder calls to clock()?

yourdevel 2020. 11. 15. 11:52


The C++ Programming Language, 4th edition, page 225 reads: A compiler may reorder code to improve performance as long as the result is identical to that of the simple order of execution. Some compilers, e.g. Visual C++ in release mode, will reorder this code:

#include <time.h>
#include <iostream>
...
auto t0 = clock();
auto r  = veryLongComputation();
auto t1 = clock();

std::cout << r << "  time: " << t1-t0 << std::endl;

into this form:

auto t0 = clock();
auto t1 = clock();
auto r  = veryLongComputation();

std::cout << r << "  time: " << t1-t0 << std::endl;

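which guarantees a different result from the original code (zero vs. greater-than-zero time reported). See my other question for a detailed example. Does this behavior conform to the C++ standard?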


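The compiler cannot exchange the two clock calls. t1 must be set after t0. Both calls are observable side effects. The compiler may reorder anything between those observable effects, and even across an observable side effect, as long as the observations are consistent with possible observations of an abstract machine.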

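Since the C++ abstract machine is not formally restricted to finite speeds, it could execute veryLongComputation() in zero time. Execution time itself is not defined as an observable effect, so a real implementation is allowed to match that.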

Mind you, much of this answer depends on the C++ standard not imposing restrictions on compilers.


Well, there is something like Subclause 5.1.2.3 of the C Standard [ISO/IEC 9899:2011], which states:

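In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).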

Therefore I really suspect that this behaviour (the one you described) is standard-compliant.

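Furthermore, the reordering does affect the measured result, but look at it from the compiler's perspective: it lives in the int main() world, and when taking a time measurement it peeks out, asks the kernel for the current time, and goes back into the main world, where the actual time of the outside world does not really matter. clock() itself does not affect the program or its variables, and the program's behaviour does not affect the clock() function.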

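The clock values are used only to calculate the difference between them. Whether anything happens between the two measurements is irrelevant from the compiler's perspective, because what you asked for was the clock difference, and the code between the measurements does not affect the measuring as a process.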

This, however, doesn't change the fact that the described behaviour is very unpleasant.

Inaccurate measurements are unpleasant, but things can get much worse, and even dangerous.

Consider the following code, taken from this site:

void GetData(char *MFAddr) {
    char pwd[64];
    if (GetPasswordFromUser(pwd, sizeof(pwd))) {
        if (ConnectToMainframe(MFAddr, pwd)) {
              // Interaction with mainframe
        }
    }
    memset(pwd, 0, sizeof(pwd));
}

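When compiled normally, everything is fine, but with optimizations enabled the memset call may be optimized away, which can result in a serious security flaw. Why is it optimized away? It is very simple: the compiler is again thinking in its main() world, and it considers the memset a dead store because the variable pwd is not used afterwards and does not affect the program itself.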
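One common way around the dead-store elimination described above is to write the zeros through a volatile-qualified pointer, so that each store counts as a volatile access. The following is only a minimal sketch: secure_zero and wipe_demo are hypothetical names that do not appear in the original code, and the guarantee relies on how compilers treat volatile accesses in practice. Where available, platform facilities such as memset_s (optional C11 Annex K), explicit_bzero, or SecureZeroMemory serve the same purpose.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper (not from the original post): zero a buffer through a
// volatile-qualified pointer. Each store is a volatile access, so the
// compiler is not expected to discard it as a dead store.
void secure_zero(void* buffer, std::size_t size) {
    volatile unsigned char* p = static_cast<volatile unsigned char*>(buffer);
    for (std::size_t i = 0; i < size; ++i) {
        p[i] = 0;
    }
}

// Stand-in for the cleanup path of GetData: fills a local buffer with a fake
// secret, wipes it, and reports whether every byte is zero afterwards.
bool wipe_demo() {
    char pwd[64];
    std::memset(pwd, 'x', sizeof(pwd));  // pretend this is the password
    secure_zero(pwd, sizeof(pwd));       // used in place of memset(pwd, 0, ...)
    for (std::size_t i = 0; i < sizeof(pwd); ++i) {
        if (pwd[i] != 0) {
            return false;
        }
    }
    return true;
}
```

In GetData, the final memset(pwd, 0, sizeof(pwd)) would become secure_zero(pwd, sizeof(pwd)).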


Yes, it is legal, provided the compiler can see the entirety of the code that runs between the clock() calls.


If veryLongComputation() internally performs any opaque function call, then no, because the compiler cannot guarantee that its side effects are interchangeable with those of clock().

Otherwise, yes, the calls are interchangeable.
This is the price you pay for using a language in which time is not a first-class entity.

Note that memory allocation (such as new) can fall in this category, as the allocation function can be defined in a different translation unit and not compiled until the current translation unit is already compiled. So, if you merely allocate memory, the compiler is forced to treat the allocation and deallocation as worst-case barriers for everything (clock(), memory barriers, and everything else) unless it already has the code for the memory allocator and can prove that this is not necessary. In practice I don't think any compiler actually looks at the allocator code to try to prove this, so these types of function calls serve as barriers in practice.
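To illustrate the "opaque call" argument, here is a hedged sketch that forces the computation to be reached through a volatile function pointer, so the compiler cannot assume at compile time which function is being called. The body of veryLongComputation, the Timed struct, and timeOpaquely are hypothetical stand-ins, not code from the question, and this shows what compilers do in practice rather than something the standard promises.

```cpp
#include <ctime>

// Hypothetical stand-in for the question's veryLongComputation().
long veryLongComputation() {
    long sum = 0;
    for (int i = 0; i < 14; ++i) {
        sum += i % 7;  // 0+1+...+6, twice
    }
    return sum;
}

struct Timed {
    long result;         // value returned by the computation
    std::clock_t ticks;  // clock ticks measured around the call
};

Timed timeOpaquely() {
    // Reading a volatile pointer is a volatile access, and the compiler
    // cannot assume which function the loaded value designates, so the
    // call below has to be treated like a call to unknown code.
    long (*volatile opaque)() = &veryLongComputation;

    std::clock_t t0 = std::clock();
    long r = opaque();  // call with unknown side effects
    std::clock_t t1 = std::clock();

    return Timed{r, t1 - t0};
}
```

A caller would print timeOpaquely().result and timeOpaquely().ticks in place of r and t1-t0.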


At least by my reading, no, this is not allowed. The requirement from the standard is (§1.9/14):

Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated.

The degree to which the compiler is free to reorder beyond that is defined by the "as-if" rule (§1.9/1):

This International Standard places no requirement on the structure of conforming implementations. In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.

That leaves the question of whether the behavior in question (the output written by cout) is officially observable behavior. The short answer is that yes, it is (§1.9/8):

The least requirements on a conforming implementation are:
[...]
— At program termination, all data written into files shall be identical to one of the possible results that execution of the program according to the abstract semantics would have produced.

At least as I read it, that means the calls to clock could be rearranged compared to the execution of your long computation if and only if it still produced identical output to executing the calls in order.

If, however, you wanted to take extra steps to ensure correct behavior, you could take advantage of one other provision (also §1.9/8):

— Access to volatile objects are evaluated strictly according to the rules of the abstract machine.

To take advantage of this, you'd modify your code slightly to become something like:

auto volatile t0 = clock();
auto volatile r  = veryLongComputation();
auto volatile t1 = clock();

Now, instead of having to base the conclusion on three separate sections of the standard, and still having only a fairly certain answer, we can look at exactly one sentence and have an absolutely certain answer: with this code, reordering the uses of clock relative to the long computation is clearly prohibited.
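For completeness, the three volatile declarations can be wrapped into something compilable. This is only a sketch of the approach this answer proposes: the body of veryLongComputation, the Measurement struct, and measureWithVolatile are hypothetical and exist purely to make the fragment self-contained.

```cpp
#include <ctime>

// Hypothetical stand-in for the question's veryLongComputation().
long veryLongComputation() {
    long sum = 0;
    for (long i = 1; i <= 1000; ++i) {
        sum += i;
    }
    return sum;
}

struct Measurement {
    long result;           // value produced by the computation
    std::clock_t elapsed;  // t1 - t0, in clock ticks
};

Measurement measureWithVolatile() {
    // Each initialization is a write to a volatile object, and each later
    // use is a volatile read; the answer relies on these accesses being
    // evaluated strictly according to the abstract machine.
    auto volatile t0 = std::clock();
    auto volatile r  = veryLongComputation();
    auto volatile t1 = std::clock();

    Measurement m;
    m.result  = r;
    m.elapsed = t1 - t0;
    return m;
}
```

Dividing elapsed by CLOCKS_PER_SEC converts the tick count to seconds.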


Let's suppose that the sequence is in a loop, and veryLongComputation() randomly throws an exception. Then how many t0s and t1s will be calculated? Does the compiler pre-calculate the random variables and reorder based on the precalculation, sometimes reordering and sometimes not?
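The loop scenario can be made concrete with a small sketch. Everything here is hypothetical: computationThatMayThrow fails deterministically on every third iteration rather than randomly so that the counts are reproducible, and CallCounts merely tallies how often each clock() call is reached when the statements run in source order.

```cpp
#include <ctime>
#include <stdexcept>

struct CallCounts {
    int t0_calls;  // times the first clock() call was reached
    int t1_calls;  // times the second clock() call was reached
    int failures;  // iterations in which the computation threw
};

// Hypothetical computation that throws on every third iteration.
long computationThatMayThrow(int i) {
    if (i % 3 == 2) {
        throw std::runtime_error("computation failed");
    }
    return i * 10L;
}

CallCounts runLoop(int iterations) {
    CallCounts counts{0, 0, 0};
    for (int i = 0; i < iterations; ++i) {
        try {
            std::clock_t t0 = std::clock();
            ++counts.t0_calls;
            long r = computationThatMayThrow(i);
            // Reached only if the computation did not throw.
            std::clock_t t1 = std::clock();
            ++counts.t1_calls;
            (void)r; (void)t0; (void)t1;  // values unused in this sketch
        } catch (const std::runtime_error&) {
            ++counts.failures;
        }
    }
    return counts;
}
```

With six iterations, source order reaches the first clock() call six times and the second only four times; hoisting the second call above the computation would change that count.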

Is the compiler smart enough to know that what looks like a plain memory read is actually a read from shared memory? Suppose the read is a measure of how far the control rods have moved in a nuclear reactor, and the clock calls are used to control the speed at which they are moved.

Or maybe the timing is controlling the grinding of a Hubble telescope mirror. LOL

Moving clock calls around seems too dangerous to leave to the decisions of compiler writers. So if it is legal, perhaps the standard is flawed.

IMO.


It is certainly not allowed, since it changes, as you have noted, the observable behavior (different output) of the program. (I won't go into the hypothetical case that veryLongComputation() might not consume any measurable time; given the function's name, that is presumably not the case. But even if it were, it wouldn't really matter.) You wouldn't expect it to be allowable to reorder fopen and fwrite, would you?

Both t0 and t1 are used in outputting t1-t0. Therefore, the initializer expressions for both t0 and t1 must be executed, and doing so must follow all standard rules. The result of the function is used, so it is not possible to optimize out the function call. It doesn't directly depend on t1 or vice versa, though, so one might naively be inclined to think that it's legal to move it around. Why not move it after the initialization of t1, which doesn't depend on the calculation?
Indirectly, however, the result of t1 does of course depend on side effects of veryLongComputation() (notably the computation taking time, if nothing else), which is exactly one of the reasons that there exists such a thing as a "sequence point".

There are three "end of expression" sequence points (plus three "end of function" and "end of initializer" SPs), and at every sequence point it is guaranteed that all side effects of previous evaluations will have been performed, and no side effects from subsequent evaluations have yet been performed.
There is no way you can keep this promise if you move the three statements around, since the possible side effects of all the functions called are not known. The compiler is only allowed to optimize if it can guarantee that it will keep the promise. It can't, since the library functions are opaque and their code isn't available (nor is the code within veryLongComputation necessarily known in that translation unit).

Compilers do however sometimes have "special knowledge" about library functions, such as some functions will not return or may return twice (think exit or setjmp).
However, since every non-empty, non-trivial function (and veryLongComputation is quite non-trivial from its name) will consume time, a compiler having "special knowledge" about the otherwise opaque clock library function would in fact have to be explicitly disallowed from reordering calls around this one, knowing that doing so not only may, but will affect the results.

Now the interesting question is why the compiler does this anyway. I can think of two possibilities. Maybe your code triggers a "looks like a benchmark" heuristic and the compiler is trying to cheat, who knows. It wouldn't be the first time (think SPEC2000/179.art, or SunSpider, for two historic examples). The other possibility is that somewhere inside veryLongComputation() you inadvertently invoke undefined behavior. In that case, the compiler's behavior would even be legal.

Reference URL: https://stackoverflow.com/questions/26190364/is-it-legal-for-a-c-optimizer-to-reorder-calls-to-clock
