This is an archive of the discontinued LLVM Phabricator instance.

Eliminate instances of `EmitScalarExpr(E->getArg(n))` in EmitX86BuiltinExpr().
ClosedPublic

Authored by thakis on Aug 20 2018, 10:28 AM.

Details

Summary

EmitX86BuiltinExpr() emits all args into Ops at the beginning, so don't do that work again.

This changes behavior: If e.g. ++a was passed as an arg, we incremented a twice previously. This change fixes that bug.

Diff Detail

Event Timeline

thakis created this revision.Aug 20 2018, 10:28 AM
rsmith added a subscriber: rsmith.Aug 20 2018, 11:33 AM

I don't think this is NFC. Testcase:

long long int a, b, c, d;
unsigned char f() { return _InterlockedCompareExchange128(&(++a), ++b, ++c, &(++d)); }

Today, Clang increments a, b, c, and d twice each in f().

I don't think this is NFC. Testcase:

long long int a, b, c, d;
unsigned char f() { return _InterlockedCompareExchange128(&(++a), ++b, ++c, &(++d)); }

Today, Clang increments a, b, c, and d twice each in f().

Thanks for pointing this out, good to hear that this even happens to find a bug :-) I'll add some tests to document the progression.

I don't think this is NFC. Testcase:

long long int a, b, c, d;
unsigned char f() { return _InterlockedCompareExchange128(&(++a), ++b, ++c, &(++d)); }

Today, Clang increments a, b, c, and d twice each in f().

Thanks for pointing this out, good to hear that this even happens to find a bug :-) I'll add some tests to document the progression.

*happens to _fix_ a bug

EmitAArch64BuiltinExpr() also emits args into Ops before the big switch (with some more subtlety around the last arg that I don't understand), but then almost every switch case does EmitScalarExpr(E->getArg(n)).

Took me a while to remember, but I can at least explain this bit now I think. The key is that there are two kinds of NEON intrinsics. Overloaded ones have a constant last argument that describes the real type, and non-overloaded ones use that last argument as a normal parameter.

The first massive switch you see handles the ones in the second case, so it always CodeGens the last parameter, but if you scroll all the way down to line 7369 there's another switch where only the pregenerated Ops are used, and this last arg is visible as Ty and/or VTy.

It may or may not be the best way to handle that situation, of course.

thakis updated this revision to Diff 161824.Aug 21 2018, 2:41 PM
thakis edited the summary of this revision. (Show Details)

Add tests.

hans accepted this revision.Aug 21 2018, 2:56 PM
hans added a subscriber: hans.

lgtm

This revision is now accepted and ready to land.Aug 21 2018, 2:56 PM
thakis closed this revision.Aug 21 2018, 3:20 PM

r340348, thanks!

rnk added inline comments.Nov 24 2020, 1:02 PM
clang/lib/CodeGen/CGBuiltin.cpp
10471

I noticed that EmitMSVCBuiltinExpr evaluates args again, so each one of these intrinsic implementations has this same bug. :(

I'll put together a fix.

It's a bit not straightforward because AArch64 doesn't pre-evaluate all the builtin arguments like x86, so we need an adapter from one style to the other.