This is an archive of the discontinued LLVM Phabricator instance.


Differential D50979

Eliminate instances of `EmitScalarExpr(E->getArg(n))` in EmitX86BuiltinExpr().
Closed, Public

Authored by thakis on Aug 20 2018, 10:28 AM.


Details

Reviewers

rnk
javed.absar
hans

Summary

EmitX86BuiltinExpr() emits all args into Ops at the beginning, so don't do that work again.

This changes behavior: If e.g. ++a was passed as an arg, we incremented a twice previously. This change fixes that bug.
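To make the double evaluation concrete, here is a minimal C++ sketch. It is a toy model, not Clang code: ArgExpr, emitAllArgs, emitBuiltinOld and emitBuiltinNew are hypothetical names, and calling an ArgExpr stands in for EmitScalarExpr() emitting IR for one argument, so running the callable twice models emitting the increment twice.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for an argument expression such as `++a`.
// Calling it models emitting the expression, side effects included.
using ArgExpr = std::function<long long()>;

// Models the loop at the top of the function: every argument is
// emitted exactly once and the results are stored in Ops.
std::vector<long long> emitAllArgs(const std::vector<ArgExpr> &Args) {
  std::vector<long long> Ops;
  Ops.reserve(Args.size());
  for (const ArgExpr &Arg : Args)
    Ops.push_back(Arg());
  return Ops;
}

// Old pattern: Ops is already populated, but the case emits
// argument 0 again, so its side effects happen a second time.
long long emitBuiltinOld(const std::vector<ArgExpr> &Args) {
  std::vector<long long> Ops = emitAllArgs(Args);
  (void)Ops; // computed but ignored, as in the pre-patch code
  assert(!Args.empty());
  return Args[0]();
}

// New pattern: reuse the operand that was emitted up front.
long long emitBuiltinNew(const std::vector<ArgExpr> &Args) {
  std::vector<long long> Ops = emitAllArgs(Args);
  assert(!Ops.empty());
  return Ops[0];
}
```

With a single argument modeled as `[&a] { return ++a; }`, emitBuiltinOld leaves `a` incremented twice, while emitBuiltinNew increments it once.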

Event Timeline

thakis created this revision. Aug 20 2018, 10:28 AM

Herald added a reviewer: javed.absar. Aug 20 2018, 10:28 AM

Herald added a subscriber: kristof.beyls.

I don't think this is NFC. Testcase:

long long int a, b, c, d;
unsigned char f() { return _InterlockedCompareExchange128(&(++a), ++b, ++c, &(++d)); }

Today, Clang increments a, b, c, and d twice each in f().

In D50979#1206211, @rsmith wrote:
I don't think this is NFC. Testcase:
long long int a, b, c, d;
unsigned char f() { return _InterlockedCompareExchange128(&(++a), ++b, ++c, &(++d)); }
Today, Clang increments a, b, c, and d twice each in f().

Thanks for pointing this out, good to hear that this even happens to find a bug :-) I'll add some tests to document the progression.

In D50979#1206282, @thakis wrote:
Thanks for pointing this out, good to hear that this even happens to find a bug :-) I'll add some tests to document the progression.

*happens to _fix_ a bug

EmitAArch64BuiltinExpr() also emits args into Ops before the big switch (with some more subtlety around the last arg that I don't understand), but then almost every switch case does EmitScalarExpr(E->getArg(n)).

Took me a while to remember, but I can at least explain this bit now I think. The key is that there are two kinds of NEON intrinsics. Overloaded ones have a constant last argument that describes the real type, and non-overloaded ones use that last argument as a normal parameter.

The first massive switch you see handles the ones in the second case, so it always CodeGens the last parameter, but if you scroll all the way down to line 7369 there's another switch where only the pregenerated Ops are used, and this last arg is visible as Ty and/or VTy.

It may or may not be the best way to handle that situation, of course.
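The split described above can be sketched with a toy model in C++. Everything here is hypothetical (ArgExpr, Kind, Emitted, emitNeonLike), and the detail that all arguments except the last are emitted up front is an assumption made for the illustration, not something this thread states. Calling an ArgExpr stands in for emitting code for that argument.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical model: invoking an ArgExpr stands for emitting code
// for that argument (side effects included).
using ArgExpr = std::function<int()>;

enum class Kind { NonOverloaded, Overloaded };

struct Emitted {
  std::vector<int> Ops; // operands that were emitted as values
  int TypeFlag;         // stands in for Ty/VTy; -1 when unused
};

// ConstLastArg models the compile-time constant value of the last
// argument; it is only meaningful for overloaded intrinsics.
Emitted emitNeonLike(Kind K, const std::vector<ArgExpr> &Args,
                     int ConstLastArg) {
  assert(!Args.empty());
  Emitted R{{}, -1};

  // Assumption: everything except the last argument is emitted up front.
  for (std::size_t I = 0; I + 1 < Args.size(); ++I)
    R.Ops.push_back(Args[I]());

  if (K == Kind::NonOverloaded) {
    // "First switch": the last argument is an ordinary operand,
    // so it is emitted here, exactly once.
    R.Ops.push_back(Args.back()());
    return R;
  }

  // "Second switch": only the pregenerated Ops are used, and the last
  // argument is read as a constant type descriptor, never emitted.
  R.TypeFlag = ConstLastArg;
  return R;
}
```

In this model neither path emits an argument twice: the non-overloaded path emits the last argument once as a value, and the overloaded path never emits it at all.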

Add tests.

lgtm

This revision is now accepted and ready to land. Aug 21 2018, 2:56 PM

r340348, thanks!

rnk added inline comments. Nov 24 2020, 1:02 PM

clang/lib/CodeGen/CGBuiltin.cpp
Line 10471: I noticed that EmitMSVCBuiltinExpr evaluates args again, so each one of these intrinsic implementations has this same bug. :( I'll put together a fix. It's not entirely straightforward because AArch64 doesn't pre-evaluate all the builtin arguments like x86, so we need an adapter from one style to the other.
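One possible shape for such an adapter, sketched in C++ as a toy model. This is not taken from D92061 and may differ from the fix that landed: OnceArgs, seed, get and sharedHelper are hypothetical names. The idea is to cache each argument on first use, so a caller that pre-evaluates (x86 style) and a caller that evaluates on demand (AArch64 style) can share one helper without any argument being evaluated twice.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical model: invoking an ArgExpr stands for emitting code
// for that argument (side effects included).
using ArgExpr = std::function<long long()>;

// Hypothetical adapter: each argument is evaluated at most once.
class OnceArgs {
public:
  explicit OnceArgs(std::vector<ArgExpr> Args)
      : Exprs(std::move(Args)), Values(Exprs.size(), 0),
        Done(Exprs.size(), false) {}

  // A pre-evaluating caller hands over the value it already computed.
  void seed(std::size_t I, long long V) {
    assert(I < Values.size());
    Values[I] = V;
    Done[I] = true;
  }

  // An on-demand caller asks for the value; the expression is
  // evaluated only if no value has been recorded yet.
  long long get(std::size_t I) {
    assert(I < Exprs.size());
    if (!Done[I]) {
      Values[I] = Exprs[I]();
      Done[I] = true;
    }
    return Values[I];
  }

private:
  std::vector<ArgExpr> Exprs;
  std::vector<long long> Values;
  std::vector<bool> Done;
};

// Stand-in for a shared intrinsic helper that needs two operands.
long long sharedHelper(OnceArgs &Args) { return Args.get(0) + Args.get(1); }
```

A pre-evaluating caller would call seed() for each operand before invoking the helper, while an on-demand caller would pass the adapter straight through.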

Herald added a subscriber: pengfei. Nov 24 2020, 1:02 PM

rnk mentioned this in D92061: [MS] Fix double evaluation of MSVC builtin arguments. Nov 24 2020, 3:53 PM

rnk mentioned this in rG3bd067272671: [MS] Fix double evaluation of MSVC builtin arguments. Nov 25 2020, 11:57 AM

Revision Contents

Path: clang/lib/CodeGen/CGBuiltin.cpp
Size: 21 lines

Diff 161505

clang/lib/CodeGen/CGBuiltin.cpp


[10,462 lines not shown; enclosing context: case X86::BI_InterlockedExchangeSub64:]
     return EmitMSVCBuiltinExpr(MSVCIntrin::_InterlockedExchangeSub, E);
   case X86::BI_InterlockedOr64:
     return EmitMSVCBuiltinExpr(MSVCIntrin::_InterlockedOr, E);
   case X86::BI_InterlockedXor64:
     return EmitMSVCBuiltinExpr(MSVCIntrin::_InterlockedXor, E);
   case X86::BI_InterlockedDecrement64:
     return EmitMSVCBuiltinExpr(MSVCIntrin::_InterlockedDecrement, E);
   case X86::BI_InterlockedIncrement64:
     return EmitMSVCBuiltinExpr(MSVCIntrin::_InterlockedIncrement, E);
   case X86::BI_InterlockedCompareExchange128: {
     // InterlockedCompareExchange128 doesn't directly refer to 128bit ints,
     // instead it takes pointers to 64bit ints for Destination and
     // ComparandResult, and exchange is taken as two 64bit ints (high & low).
     // The previous value is written to ComparandResult, and success is
     // returned.

     llvm::Type *Int128Ty = Builder.getInt128Ty();
     llvm::Type *Int128PtrTy = Int128Ty->getPointerTo();

     Value *Destination =
-        Builder.CreateBitCast(EmitScalarExpr(E->getArg(0)), Int128PtrTy);
-    Value *ExchangeHigh128 =
-        Builder.CreateZExt(EmitScalarExpr(E->getArg(1)), Int128Ty);
-    Value *ExchangeLow128 =
-        Builder.CreateZExt(EmitScalarExpr(E->getArg(2)), Int128Ty);
-    Address ComparandResult(
-        Builder.CreateBitCast(EmitScalarExpr(E->getArg(3)), Int128PtrTy),
-        getContext().toCharUnitsFromBits(128));
+        Builder.CreateBitCast(Ops[0], Int128PtrTy);
+    Value *ExchangeHigh128 = Builder.CreateZExt(Ops[1], Int128Ty);
+    Value *ExchangeLow128 = Builder.CreateZExt(Ops[2], Int128Ty);
+    Address ComparandResult(Builder.CreateBitCast(Ops[3], Int128PtrTy),
+                            getContext().toCharUnitsFromBits(128));

     Value *Exchange = Builder.CreateOr(
         Builder.CreateShl(ExchangeHigh128, 64, "", false, false),
         ExchangeLow128);

     Value *Comparand = Builder.CreateLoad(ComparandResult);

     AtomicCmpXchgInst *CXI =
[34 lines not shown; enclosing context: case X86::BI__int2c: {]
     CS.setAttributes(NoReturnAttr);
     return CS.getInstruction();
   }
   case X86::BI__readfsbyte:
   case X86::BI__readfsword:
   case X86::BI__readfsdword:
   case X86::BI__readfsqword: {
     llvm::Type *IntTy = ConvertType(E->getType());
-    Value *Ptr = Builder.CreateIntToPtr(EmitScalarExpr(E->getArg(0)),
-                                        llvm::PointerType::get(IntTy, 257));
+    Value *Ptr =
+        Builder.CreateIntToPtr(Ops[0], llvm::PointerType::get(IntTy, 257));
     LoadInst *Load = Builder.CreateAlignedLoad(
         IntTy, Ptr, getContext().getTypeAlignInChars(E->getType()));
     Load->setVolatile(true);
     return Load;
   }
   case X86::BI__readgsbyte:
   case X86::BI__readgsword:
   case X86::BI__readgsdword:
   case X86::BI__readgsqword: {
     llvm::Type *IntTy = ConvertType(E->getType());
-    Value *Ptr = Builder.CreateIntToPtr(EmitScalarExpr(E->getArg(0)),
-                                        llvm::PointerType::get(IntTy, 256));
+    Value *Ptr =
+        Builder.CreateIntToPtr(Ops[0], llvm::PointerType::get(IntTy, 256));
     LoadInst *Load = Builder.CreateAlignedLoad(
         IntTy, Ptr, getContext().getTypeAlignInChars(E->getType()));
     Load->setVolatile(true);
     return Load;
   }
   case X86::BI__builtin_ia32_paddusb512:
   case X86::BI__builtin_ia32_paddusw512:
   case X86::BI__builtin_ia32_paddusb256:
[1,794 lines not shown]