Why is GCC unable to optimize the logical/bitwise AND pair in "x && (x & 4242)" down to "x & 4242"?
Here are two functions which I claim do exactly the same thing:
bool fast(int x)
{
    return x & 4242;
}

bool slow(int x)
{
    return x && (x & 4242);
}
Logically they do the same thing, and just to be 100% sure I wrote a test that ran all four billion possible inputs through both of them, and they matched. But the assembly code is a different story:
fast:
    andl    $4242, %edi
    setne   %al
    ret

slow:
    xorl    %eax, %eax
    testl   %edi, %edi
    je      .L3
    andl    $4242, %edi
    setne   %al
.L3:
    rep ret
I was surprised that GCC could not make the leap of logic to eliminate the redundant test. I tried g++ 4.4.3 and 4.7.2 with -O2, -O3, and -Os, and all of them generated the same code. The platform is Linux x86_64.
Can someone explain why GCC isn't smart enough to generate the same code in both cases? I'd also like to know whether other compilers do better.
Edit: adding a test harness:
#include <cstdlib>
#include <vector>
using namespace std;

int main(int argc, char* argv[])
{
    // make vector filled with numbers starting from argv[1]
    int seed = atoi(argv[1]);
    vector<int> v(100000);
    for (int j = 0; j < 100000; ++j)
        v[j] = j + seed;

    // count how many times the function returns true
    int result = 0;
    for (int j = 0; j < 100000; ++j)
        for (int i : v)
            result += slow(i); // or fast(i), try both

    return result;
}
I tested the above with clang 5.1 on Mac OS using -O3. With fast() it runs in 2.9 seconds; with slow() it takes noticeably longer. If the vector is instead filled entirely with zeros, there is no big performance difference between the two functions.
You are right that this looks like a deficiency in the optimizer, and perhaps even an outright bug. Consider:
bool slow(int x)
{
    return x && (x & 4242);
}

bool slow2(int x)
{
    return (x & 4242) && x;
}
Assembly emitted by GCC 4.8.1 (-O3):
slow:
    xorl    %eax, %eax
    testl   %edi, %edi
    je      .L2
    andl    $4242, %edi
    setne   %al
.L2:
    rep ret

slow2:
    andl    $4242, %edi
    setne   %al
    ret
In other words, slow2 is misnamed.
I have only contributed the occasional patch to GCC, so whether my viewpoint carries any weight is debatable :-). But it is certainly strange, in my view, that GCC optimizes one of these and not the other; I suggest filing a bug report.
[Update]
Astonishingly, seemingly small changes make a big difference. For example:
bool slow3(int x)
{
    int y = x & 4242;
    return y && x;
}
...generates "slow" code again. I have no hypothesis for this behavior.
You can experiment with all of these on multiple compilers here.
Exactly why should it be able to optimize the code? You're assuming that any transformation that works will be done. That's not at all how optimizers work. They're not Artificial Intelligences. They simply work by parametrically replacing known patterns. E.g. "Common Subexpression Elimination" scans an expression for common subexpressions and moves them forward, if that does not change side effects.
(BTW, CSE shows that optimizers are already quite aware of what code movement is allowed in the possible presence of side effects. They know that you have to be careful with &&. Whether expr && expr can be CSE-optimized or not depends on the side effects of expr.)
So, in summary: which pattern do you think applies here?
This is how your code looks on ARM, which should make slow run faster when the input is 0:
fast(int):
    movw    r3, #4242
    and     r3, r0, r3
    adds    r0, r3, #0
    movne   r0, #1
    bx      lr

slow(int):
    cmp     r0, #0
    bxeq    lr
    movw    r3, #4242
    and     r3, r0, r3
    adds    r0, r3, #0
    movne   r0, #1
    bx      lr
However, GCC optimizes very nicely as soon as you actually start using such trivial functions:
bool foo() {
    return fast(4242) && slow(42);
}
becomes
foo():
    mov     r0, #1
    bx      lr
My point is that such code sometimes requires more context to be optimized further, so why should implementers of optimizers (improvers!) bother?
Another example:
bool bar(int c) {
    if (fast(c))
        return slow(c);
    // note: control falls off the end when fast(c) is false, which is
    // undefined behavior and gives the compiler extra freedom on that path
}
becomes
bar(int):
    movw    r3, #4242
    and     r3, r0, r3
    cmp     r3, #0
    movne   r0, #1
    bxne    lr
    bx      lr
To perform this optimization, one needs to study the expression for two distinct cases: x == 0, simplifying to false, and x != 0, simplifying to x & 4242. And then be smart enough to see that the value of the second expression also yields the correct value even for x == 0.
Let us imagine that the compiler performs a case study and finds simplifications.

If x != 0, the expression simplifies to x & 4242.
If x == 0, the expression simplifies to false.
After simplification, we obtain two completely unrelated expressions. To reconcile them, the compiler should ask unnatural questions:

If x != 0, can false be used instead of x & 4242 anyway? [No]
If x == 0, can x & 4242 be used instead of false anyway? [Yes]
The last compiler I worked on did not do these sorts of optimizations. Writing an optimizer to take advantage of optimizations related to combining binary and logical operators will not speed up the applications. The main reason for this is that people do not use binary operators like that very often. Many people don't feel comfortable with binary operators and those that do will typically not write useless operations that need to be optimized.
If I go to the trouble of writing

return (x & 4242);

and I understand what that means, why would I bother with the extra step? For the same reason I would not write this suboptimal code:

if (x == 0) return false;
if (x == 1) return true;
if (x == 0xFFFEFD6) return false;
if (x == 4242) return true;
return (x & 4242);
There are just better uses of compiler devs' time than optimizing constructs that make no difference. There are so many bigger fish to fry in compiler optimization.
It is mildly interesting to note that this optimisation is not valid on all machines. Specifically if you run on a machine which uses the one's complement representation of negative numbers then:
-0 & 4242 == true
-0 && (-0 & 4242) == false
GCC has never supported such representations, but they are allowed for by the C standard.
C places fewer restrictions on the behavior of signed integral types than unsigned integral types. Negative values in particular can legally do strange things with bit operations. If any possible arguments to the bit operation have legally unconstrained behavior, the compiler can't remove them.
For example, "x/y==1 or true" might crash the program if you divide by zero, so the compiler can't ignore the evaluation of the division. Negative signed values and bit operations never actually do things like that on any common system, but I'm not sure the language definition rules it out.
You should try the code with unsigned ints and see if that helps. If it does you'll know it's an issue with the types and not the expression.