Details

Reviewers

lebedev.ri
RKSimon
spatel
xbolva00
eugenis
vitalybuka
kcc

Commits

rG4452cc4086ac: [VectorCombine] Don't vectorize scalar load under asan/hwasan/memtag/tsan

Summary

Similar to the tsan suppression in
Utils/VNCoercion.cpp:getLoadLoadClobberFullWidthSize (rL175034; load widening used by GVN),
the D81766 optimization should be suppressed under tsan due to potential
spurious data race reports:

struct A {
  int i;
  const short s; // the load cannot be vectorized because
  int modify;    // it overlaps with bytes being concurrently modified
  long pad1, pad2;
};
// __tsan_read16 does not know that some bytes are undef and accessing is safe
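To make the overlap concrete, here is a minimal sketch (not part of the patch) that computes the byte ranges involved. The `ByteRange`, `overlaps`, and `loadAt` helpers are hypothetical names, the 16-byte width stands in for the widened vector load, and the field offsets assume a typical ABI with a 4-byte `int` and a 2-byte `short`.

```cpp
#include <cassert>
#include <cstddef>

struct A {
  int i;
  const short s;
  int modify;
  long pad1, pad2;
};

// Half-open byte range [begin, end) within an object of type A.
struct ByteRange {
  std::size_t begin, end;
};

constexpr bool overlaps(ByteRange a, ByteRange b) {
  return a.begin < b.end && b.begin < a.end;
}

// Hypothetical helper: the bytes touched by a load of `size` bytes at `offset`.
constexpr ByteRange loadAt(std::size_t offset, std::size_t size) {
  return {offset, offset + size};
}

// The original scalar load reads only `s`.
constexpr ByteRange scalarLoad() {
  return loadAt(offsetof(A, s), sizeof(short));
}
// Assumed widened form: a 16-byte vector load starting at `s`.
constexpr ByteRange widenedLoad() { return loadAt(offsetof(A, s), 16); }
// Bytes that another thread may be writing concurrently.
constexpr ByteRange modifyField() {
  return loadAt(offsetof(A, modify), sizeof(int));
}

// The scalar load never touches `modify`; the widened load does.
inline void checkOverlapExample() {
  assert(!overlaps(scalarLoad(), modifyField()));
  assert(overlaps(widenedLoad(), modifyField()));
}
```

The source program only ever reads `s`, so the race that tsan would report on the widened access does not exist in the source.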

Similarly, under asan, users can mark memory regions with
__asan_poison_memory_region. A widened load can lead to a spurious
use-after-poison error. hwasan/memtag should be similarly suppressed.
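The asan case can be illustrated with a toy model of poisoning. This is only a sketch: `PoisonMap` is a hypothetical byte-granular stand-in for ASan's shadow memory, and the buffer size and offsets are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model: one flag per byte. Real ASan uses compressed shadow memory.
class PoisonMap {
public:
  explicit PoisonMap(std::size_t size) : poisoned_(size, false) {}

  // Models the effect of __asan_poison_memory_region on [offset, offset+size).
  void poison(std::size_t offset, std::size_t size) {
    for (std::size_t i = offset; i < offset + size && i < poisoned_.size(); ++i)
      poisoned_[i] = true;
  }

  // Returns true if no byte of the access [offset, offset+size) is poisoned.
  bool accessIsClean(std::size_t offset, std::size_t size) const {
    if (offset + size > poisoned_.size())
      return false;
    for (std::size_t i = offset; i < offset + size; ++i)
      if (poisoned_[i])
        return false;
    return true;
  }

private:
  std::vector<bool> poisoned_;
};

// A 2-byte load at offset 4 is clean, but widening it to 16 bytes reaches
// into the region the user poisoned starting at offset 8.
inline void checkPoisonExample() {
  PoisonMap shadow(32);
  shadow.poison(8, 24);
  assert(shadow.accessIsClean(4, 2));
  assert(!shadow.accessIsClean(4, 16));
}
```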

mustSuppressSpeculation suppresses asan/hwasan/tsan but not memtag, so
we need to exclude memtag in vectorizeLoadInsert.
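The resulting decision can be modeled as below. This is a standalone sketch, not the code from the patch: `FnAttrs`, `suppressesSpeculation`, and `mayWidenLoad` are hypothetical names, and the model covers only the function-attribute part of `mustSuppressSpeculation` as described above (it ignores the load's ordering).

```cpp
#include <cassert>

// Hypothetical stand-in for the sanitizer attributes on a function.
struct FnAttrs {
  bool SanitizeAddress = false;
  bool SanitizeHWAddress = false;
  bool SanitizeThread = false;
  bool SanitizeMemTag = false;
};

// Models the attribute check described in the summary:
// asan/hwasan/tsan are covered, memtag is not.
inline bool suppressesSpeculation(const FnAttrs &F) {
  return F.SanitizeAddress || F.SanitizeHWAddress || F.SanitizeThread;
}

// Widening is allowed only if neither the shared check nor the extra
// memtag check fires.
inline bool mayWidenLoad(const FnAttrs &F) {
  return !F.SanitizeMemTag && !suppressesSpeculation(F);
}
```

Keeping the memtag test separate leaves the shared helper unchanged for its other callers.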

Note that memtag suppression can be relaxed if the load is aligned to
its granule (usually 16 bytes), but that is out of scope for this patch.
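A possible shape for that relaxed check is sketched here, assuming a 16-byte granule. `staysWithinGranule` is a hypothetical helper, not an LLVM API, and a real implementation would have to reason about known alignment rather than concrete addresses.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if the access [addr, addr + size) lies within one granule.
// `granule` is assumed to be nonzero (16 for the memtag case discussed here).
inline bool staysWithinGranule(std::uint64_t addr, std::uint64_t size,
                               std::uint64_t granule = 16) {
  // Written without computing addr + size to avoid overflow.
  return size <= granule && (addr % granule) <= granule - size;
}
```

For example, a 16-byte load at a 16-byte-aligned address stays within its granule, while the same load at `0x1004` crosses into the next one.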

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

MaskRay created this revision. Sep 11 2020, 12:00 PM

Herald added a project: Restricted Project. Sep 11 2020, 12:00 PM

Herald added subscribers: llvm-commits, hiraditya.

MaskRay requested review of this revision. Sep 11 2020, 12:00 PM

Harbormaster completed remote builds in B71418: Diff 291303. Sep 11 2020, 12:18 PM

MaskRay edited the summary of this revision. Sep 11 2020, 12:46 PM

Making general combines dependent on whether we're running sanitizers doesn't sound like a great idea - won't the impact on codegen make it trickier for sanitizers to correctly assist with identifying specific issues?

In D87538#2268840, @RKSimon wrote:

Making general combines dependent on whether we're running sanitizers doesn't sound like a great idea - won't the impact on codegen make it trickier for sanitizers to correctly assist with identifying specific issues?

Searching for Attribute::SanitizeThread can reveal a few other places where certain optimizations are disabled.
For example, in Utils/VNCoercion.cpp getLoadLoadClobberFullWidthSize (load widening used by GVN), there is a very similar pattern (rL175034; the test became stale; the large chunk of code in analyzeLoadFromClobberingLoad is untested now...).

suppress asan as well

Regarding msan, I guess it is safe, because traps are only caused on control-flow dependent instructions. The IR passes know that the widened load has undef elements...

Fix CanWiden

MaskRay edited the summary of this revision. Sep 11 2020, 2:52 PM

MaskRay added reviewers: eugenis, vitalybuka. Sep 11 2020, 2:55 PM

Harbormaster completed remote builds in B71434: Diff 291341. Sep 11 2020, 3:25 PM

Harbormaster completed remote builds in B71435: Diff 291343.

MaskRay added a reviewer: kcc. Sep 11 2020, 3:31 PM

MaskRay edited the summary of this revision.

vitalybuka added inline comments. Sep 11 2020, 3:58 PM

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
147	Shouldn't we disable only this case?
152	If we have the mask maybe we can use it for proper check on Asan side?
693	It should apply to SanitizeHWAddress and SanitizeMemTag as well

Suppress hwasan/memtag

MaskRay marked 2 inline comments as done. Sep 11 2020, 4:33 PM

MaskRay added inline comments.

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
147	This branch performs (1) load widening followed by (2) widening/narrowing the result with shufflevector. Due to the first step (load widening), the whole optimization (D86160) is unsafe. Added a specific test for it.
693	Thanks! (I know little about hwasan/memtag. I will learn about them)

MaskRay marked 2 inline comments as done. Sep 11 2020, 4:35 PM

MaskRay added inline comments.

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
152	Yeah, if the information can be conveyed to the asan side then it can probably be retained. (But the optimization may not be worth the additional efforts...)

Harbormaster completed remote builds in B71445: Diff 291356. Sep 11 2020, 5:01 PM

@spatel ☺️

eugenis added inline comments. Sep 14 2020, 11:52 AM

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
152	Right, I wonder if we could come up with some sort of metadata to promise that only the first N bytes (or first N vector elements) of the loaded value will be used. ASan can use it to relax the check, and MSan can validate it.

I have no experience with the sanitizer requirements, but IIUC we can't accurately sanitize any code that uses this transform with this restriction.
If that's what is happening/intended in other passes, I guess there's no other way around it.
But if it is happening in other passes, shouldn't we use some standard function to bail out? For example llvm::mustSuppressSpeculation()?

Use mustSuppressSpeculation

Herald added a subscriber: jfb. Sep 14 2020, 12:23 PM

eugenis added inline comments. Sep 14 2020, 12:28 PM

llvm/lib/Analysis/ValueTracking.cpp
4422 ↗	(On Diff #291647)	MemTag allows extending memory access up to the next 16-byte granule. Memtag is a production thing, so it's important that we do not lose performance there when possible.

Harbormaster completed remote builds in B71598: Diff 291647. Sep 14 2020, 12:57 PM

spatel added inline comments. Sep 14 2020, 1:07 PM

llvm/lib/Analysis/ValueTracking.cpp
4414 ↗	(On Diff #291647)	Definitely get a 2nd reviewer opinion if we are making changes here (I don't know enough about this)... But if I'm seeing it correctly, isSimple() is not a superset of isUnordered(). I.e., a load can have AtomicOrdering::Unordered but still be simple?

MaskRay added inline comments. Sep 14 2020, 1:53 PM

llvm/lib/Analysis/ValueTracking.cpp
4414 ↗	(On Diff #291647)	isSimple is a subset of isUnordered. An AtomicOrdering::Unordered load is `isUnordered` but not `isSimple`.
4422 ↗	(On Diff #291647)	Do you suggest that we remove SanitizeMemTag here and exclude SanitizeMemTag in VectorCombine.cpp instead? (VectorCombine.cpp could check alignment and allow SanitizeMemTag as an optimization, but that would take more code and I don't want to take that risk. I am posting this patch to fix a spurious tsan report.)

vitalybuka accepted this revision. Sep 14 2020, 2:59 PM

vitalybuka added inline comments.

llvm/lib/Analysis/ValueTracking.cpp
4414 ↗	(On Diff #291647)	Should this be a separate patch?

This revision is now accepted and ready to land. Sep 14 2020, 2:59 PM

Actually, we need to decide on SanitizeMemTag. I don't have an answer to your question.

This revision now requires changes to proceed. Sep 14 2020, 3:00 PM

Revert changes from ValueTracking.cpp

OK, I want to play it safe and have removed the ValueTracking.cpp changes. Added the following notes to the description:

mustSuppressSpeculation suppresses asan/hwasan/tsan but not memtag, so we need to exclude memtag in vectorizeLoadInsert.

Note that memtag suppression can be relaxed if the load is aligned to its granule (usually 16 bytes), but that is out of scope for this patch.

Harbormaster completed remote builds in B71632: Diff 291701. Sep 14 2020, 3:40 PM

If I understand @eugenis's comment, for memtag it's better to allow loads inside of 16-byte granules.
But I guess the patch as-is is already an improvement, and the 16-byte fix can be a follow-up optimization.

This revision is now accepted and ready to land. Sep 14 2020, 4:31 PM

In D87538#2272578, @vitalybuka wrote:

If I understand @eugenis's comment, for memtag it's better to allow loads inside of 16-byte granules.
But I guess the patch as-is is already an improvement, and the 16-byte fix can be a follow-up optimization.

Thanks! Yeah, the memtag performance improvement can be done as a follow-up. The isUnordered() condition in mustSuppressSpeculation may need some thought (I left a comment in D66688).

I'd like an approval from @spatel or @RKSimon

spatel added inline comments. Sep 15 2020, 4:35 AM

llvm/test/Transforms/VectorCombine/X86/load.ll
431	Does this test a different code path than the first added test (`gep10_load_i16_insert_v8i16_asan`)?

MaskRay added inline comments. Sep 15 2020, 9:01 AM

llvm/test/Transforms/VectorCombine/X86/load.ll
431	It is for D86160. If you think it is unneeded I can delete it. Do you have more concerns?

spatel accepted this revision. Sep 15 2020, 9:05 AM

spatel added inline comments.

llvm/test/Transforms/VectorCombine/X86/load.ll
431	Nope - let's keep it to verify that we are fully disabling the load transforms for the sanitizer cases.

Closed by commit rG4452cc4086ac: [VectorCombine] Don't vectorize scalar load under asan/hwasan/memtag/tsan (authored by MaskRay). Sep 15 2020, 9:52 AM

This revision was automatically updated to reflect the committed changes.

MaskRay added a commit: rG4452cc4086ac: [VectorCombine] Don't vectorize scalar load under asan/hwasan/memtag/tsan.

MaskRay mentioned this in D66688: [LoopVectorize] Leverage speculation safety to avoid masked.loads. Sep 15 2020, 11:42 AM

Diff 291343

llvm/lib/Transforms/Vectorize/VectorCombine.cpp

  bool VectorCombine::vectorizeLoadInsert(Instruction &I) {
    ...
    // It is safe and potentially profitable to load a vector directly:
    // inselt undef, load Scalar, 0 --> load VecPtr
    IRBuilder<> Builder(Load);
    Value *CastedPtr = Builder.CreateBitCast(PtrOp, MinVecTy->getPointerTo(AS));
    Value *VecLd = Builder.CreateAlignedLoad(MinVecTy, CastedPtr, Alignment);

    // If the insert type does not match the target's minimum vector type,
    // use an identity shuffle to shrink/grow the vector.
    if (Ty != MinVecTy) {
        [inline comment] vitalybuka: Shouldn't we disable only this case?
        [inline comment] MaskRay: This branch performs (1) load widening followed by (2) widening/narrowing the result with shufflevector. Due to the first step (load widening), the whole optimization (D86160) is unsafe. Added a specific test for it.
      unsigned OutputNumElts = Ty->getNumElements();
      SmallVector<int, 16> Mask(OutputNumElts, UndefMaskElem);
      for (unsigned i = 0; i < OutputNumElts && i < MinVecNumElts; ++i)
        Mask[i] = i;
      VecLd = Builder.CreateShuffleVector(VecLd, UndefValue::get(MinVecTy), Mask);
        [inline comment] vitalybuka: If we have the mask maybe we can use it for proper check on Asan side?
        [inline comment] MaskRay: Yeah, if the information can be conveyed to the asan side then it can probably be retained. (But the optimization may not be worth the additional efforts...)
        [inline comment] eugenis: Right, I wonder if we could come up with some sort of metadata to promise that only the first N bytes (or first N vector elements) of the loaded value will be used. ASan can use it to relax the check, and MSan can validate it.
    }
    replaceValue(I, *VecLd);
    ++NumVecLoad;
    return true;
  }

  /// Determine which, if any, of the inputs should be replaced by a shuffle
  /// followed by extract from a different index.
  ...
  bool VectorCombine::run() {
    if (DisableVectorCombine)
      return false;

    // Don't attempt vectorization if the target does not support vectors.
    if (!TTI.getNumberOfRegisters(TTI.getRegisterClassForType(/*Vector*/ true)))
      return false;

+   // Do not vectorize scalar load under asan or tsan. The widened load may
+   // overlap bytes marked as __asan_poison_memory_region or bytes being
+   // concurrently modified.
+   bool CanWiden = !F.hasFnAttribute(Attribute::SanitizeAddress) &&
        [inline comment] vitalybuka: It should apply to SanitizeHWAddress and SanitizeMemTag as well
        [inline comment] MaskRay: Thanks! (I know little about hwasan/memtag. I will learn about them)
+                   !F.hasFnAttribute(Attribute::SanitizeThread);
    bool MadeChange = false;
    for (BasicBlock &BB : F) {
      // Ignore unreachable basic blocks.
      if (!DT.isReachableFromEntry(&BB))
        continue;
      // Do not delete instructions under here and invalidate the iterator.
      // Walk the block forwards to enable simple iterative chains of transforms.
      // TODO: It could be more efficient to remove dead instructions
      //       iteratively in this loop rather than waiting until the end.
      for (Instruction &I : BB) {
        if (isa<DbgInfoIntrinsic>(I))
          continue;
        Builder.SetInsertPoint(&I);
+       if (CanWiden)
          MadeChange |= vectorizeLoadInsert(I);
        MadeChange |= foldExtractExtract(I);
        MadeChange |= foldBitcastShuf(I);
        MadeChange |= scalarizeBinopOrCmp(I);
        MadeChange |= foldExtractedCmps(I);
      }
    }

    // We're done with transforms, so remove dead instructions.
    ...

llvm/test/Transforms/VectorCombine/X86/load.ll

  ...
  ; CHECK-NEXT:    ret <8 x i16> [[R]]
  ;
    %gep = getelementptr inbounds <8 x i16>, <8 x i16>* %p, i64 1, i64 0
    %s = load i16, i16* %gep, align 16
    %r = insertelement <8 x i16> undef, i16 %s, i64 0
    ret <8 x i16> %r
  }

+ ; Negative test - disable under asan because widened load can cause spurious
+ ; use-after-poison issues when __asan_poison_memory_region is used.
+
+ define <8 x i16> @gep10_load_i16_insert_v8i16_asan(<8 x i16>* align 16 dereferenceable(32) %p) sanitize_address {
+ ; CHECK-LABEL: @gep10_load_i16_insert_v8i16_asan(
+ ; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds <8 x i16>, <8 x i16>* [[P:%.*]], i64 1, i64 0
+ ; CHECK-NEXT:    [[S:%.*]] = load i16, i16* [[GEP]], align 16
+ ; CHECK-NEXT:    [[R:%.*]] = insertelement <8 x i16> undef, i16 [[S]], i64 0
+ ; CHECK-NEXT:    ret <8 x i16> [[R]]
+ ;
+   %gep = getelementptr inbounds <8 x i16>, <8 x i16>* %p, i64 1, i64 0
+   %s = load i16, i16* %gep, align 16
+   %r = insertelement <8 x i16> undef, i16 %s, i64 0
+   ret <8 x i16> %r
+ }
+
+ ; Negative test - disable under tsan because widened load may overlap bytes
+ ; being concurrently modified. tsan does not know that some bytes are undef.
+
+ define <8 x i16> @gep10_load_i16_insert_v8i16_tsan(<8 x i16>* align 16 dereferenceable(32) %p) sanitize_thread {
+ ; CHECK-LABEL: @gep10_load_i16_insert_v8i16_tsan(
+ ; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds <8 x i16>, <8 x i16>* [[P:%.*]], i64 1, i64 0
+ ; CHECK-NEXT:    [[S:%.*]] = load i16, i16* [[GEP]], align 16
+ ; CHECK-NEXT:    [[R:%.*]] = insertelement <8 x i16> undef, i16 [[S]], i64 0
+ ; CHECK-NEXT:    ret <8 x i16> [[R]]
+ ;
+   %gep = getelementptr inbounds <8 x i16>, <8 x i16>* %p, i64 1, i64 0
+   %s = load i16, i16* %gep, align 16
+   %r = insertelement <8 x i16> undef, i16 %s, i64 0
+   ret <8 x i16> %r
+ }
+
  ; Negative test - can't safely load the offset vector, but could load+shuffle.

  define <8 x i16> @gep10_load_i16_insert_v8i16_deref(<8 x i16>* align 16 dereferenceable(31) %p) {
  ; CHECK-LABEL: @gep10_load_i16_insert_v8i16_deref(
  ; CHECK-NEXT:    [[GEP:%.*]] = getelementptr inbounds <8 x i16>, <8 x i16>* [[P:%.*]], i64 1, i64 0
  ; CHECK-NEXT:    [[S:%.*]] = load i16, i16* [[GEP]], align 16
  ; CHECK-NEXT:    [[R:%.*]] = insertelement <8 x i16> undef, i16 [[S]], i64 0
  ; CHECK-NEXT:    ret <8 x i16> [[R]]
  ...
  ; CHECK-NEXT:    [[TMP1:%.*]] = bitcast float* [[P:%.*]] to <4 x float>*
  ; CHECK-NEXT:    [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4
  ; CHECK-NEXT:    [[R:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> undef, <2 x i32> <i32 0, i32 1>
  ; CHECK-NEXT:    ret <2 x float> [[R]]
  ;
    %s = load float, float* %p, align 4
    %r = insertelement <2 x float> undef, float %s, i32 0
    ret <2 x float> %r
  }
      [inline comment] spatel: Does this test a different code path than the first added test (`gep10_load_i16_insert_v8i16_asan`)?
      [inline comment] MaskRay: It is for D86160. If you think it is unneeded I can delete it. Do you have more concerns?
      [inline comment] spatel: Nope - let's keep it to verify that we are fully disabling the load transforms for the sanitizer cases.

This is an archive of the discontinued LLVM Phabricator instance.

[VectorCombine] Don't vectorize scalar load under asan/hwasan/memtag/tsan
ClosedPublic
