Details

Reviewers

lebedev.ri
fhahn
spatel
efriedma
hgreving
nlopes

Commits

rG2670c7dd5b25: [VectorCombine] Fix alignment in single element store

Summary

This concern was raised in D98240. The existing alignment handling is a miscompile. Thanks to @lebedev.ri for the comments.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

qiucf created this revision. May 31 2021, 10:40 AM

Herald added a subscriber: hiraditya. May 31 2021, 10:40 AM

qiucf requested review of this revision. May 31 2021, 10:40 AM

Herald added a project: Restricted Project. May 31 2021, 10:40 AM

Herald added a subscriber: llvm-commits.

tschuett added a subscriber: tschuett. May 31 2021, 10:44 AM

tschuett added inline comments.

llvm/lib/Transforms/Vectorize/VectorCombine.cpp
834	This line is too cute for me, but ...

Hm, this first needs more radical fixes - this is currently miscompiling vector indexes: https://alive2.llvm.org/ce/z/aWtH9w
I would suggest first simply requiring the index to be constant.

Right, canScalarizeAccess() already does that, okay.
But then, it would be good to have a positive test for @insert_store_nonconst() :)

lebedev.ri added inline comments. May 31 2021, 11:34 AM

llvm/lib/Transforms/Vectorize/VectorCombine.cpp

835

(alive *seems* to be happy with this)
This is going to be precise and optimal for constant indexes,
but i think we can get at least a lower-bound estimate for variable indexes:
the new address will be offset from the base address by a multiple of DL.getTypeStoreSize(NewElement->getType()),
so i think we can do

else
  NewAlignment = commonAlignment(
          NewAlignment,
          DL.getTypeStoreSize(NewElement->getType()));

Please add positive tests:

; New alignment should be 8
define void @src(<8 x i64>* %q, i64 %s, i32 %idx) {
  %cmp = icmp ult i32 %idx, 2
  call void @llvm.assume(i1 %cmp)

  %i = load <8 x i64>, <8 x i64>* %q, align 8
  %vecins = insertelement <8 x i64> %i, i64 %s, i32 %idx
  store <8 x i64> %vecins, <8 x i64>* %q, align 8
  ret void
}

; New alignment should be 4
define void @src(<8 x i64>* %q, i64 %s, i32 %idx) {
  %cmp = icmp ult i32 %idx, 2
  call void @llvm.assume(i1 %cmp)

  %i = load <8 x i64>, <8 x i64>* %q, align 4
  %vecins = insertelement <8 x i64> %i, i64 %s, i32 %idx
  store <8 x i64> %vecins, <8 x i64>* %q, align 4
  ret void
}
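
The lower bound suggested above can be checked outside LLVM. What follows is a minimal sketch, assuming that `commonAlignment(A, Offset)` yields the largest power of two dividing both `A` and `Offset`. The names `commonAlign`, `variableIndexAlign` and `holdsForAllIndices` are hypothetical stand-ins written for illustration, not LLVM APIs. It covers the two suggested tests (base alignment 8 and 4, with 8-byte `i64` elements).

```cpp
#include <cstdint>

// Stand-in for llvm::commonAlignment (an assumption about its behaviour):
// the largest power of two dividing both A (a power of two) and Offset.
// Offset == 0 leaves A unchanged.
constexpr uint64_t commonAlign(uint64_t A, uint64_t Offset) {
  return (A | Offset) & (~(A | Offset) + 1);
}

// Hypothetical helper: alignment that is safe for a scalar store when the
// index is not a compile-time constant. Every element sits at a multiple of
// ElemSize bytes from the base, so commonAlign(BaseAlign, ElemSize) holds.
constexpr uint64_t variableIndexAlign(uint64_t BaseAlign, uint64_t ElemSize) {
  return commonAlign(BaseAlign, ElemSize);
}

// Brute-force check: for each index below NumElts, the byte offset
// Idx * ElemSize must be a multiple of the claimed alignment.
constexpr bool holdsForAllIndices(uint64_t BaseAlign, uint64_t ElemSize,
                                  uint64_t NumElts) {
  for (uint64_t Idx = 0; Idx < NumElts; ++Idx)
    if ((Idx * ElemSize) % variableIndexAlign(BaseAlign, ElemSize) != 0)
      return false;
  return true;
}

static_assert(variableIndexAlign(8, 8) == 8, "align 8, i64 elements");
static_assert(variableIndexAlign(4, 8) == 4, "align 4, i64 elements");
static_assert(holdsForAllIndices(8, 8, 8), "<8 x i64>, align 8");
static_assert(holdsForAllIndices(4, 8, 8), "<8 x i64>, align 4");
```

The checks run at compile time, so the translation unit compiling cleanly is the whole test. The result never exceeds the base alignment, which is the property the two positive tests are meant to pin down.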

Harbormaster completed remote builds in B106941: Diff 348837. May 31 2021, 12:00 PM

Add another test (first..)

Harbormaster completed remote builds in B107049: Diff 348984. Jun 1 2021, 9:12 AM

Please feel free to just directly commit new tests.
The new tests i asked for should be positive tests, i.e. they should get transformed (the @llvm.assume() call is missing).

lebedev.ri requested changes to this revision. Jun 2 2021, 6:48 AM

This revision now requires changes to proceed. Jun 2 2021, 6:48 AM

qiucf updated this revision to Diff 350528. Jun 8 2021, 1:38 AM

Harbormaster completed remote builds in B108158: Diff 350528. Jun 8 2021, 2:10 AM

Thanks.
This looks fine to me now.
Can anyone spot any issues with the new alignment logic? @fhahn @spatel?

llvm/test/Transforms/VectorCombine/load-insert-store.ll
128	I think we still want those two tests i suggested; they demonstrate that we don't increase the alignment beyond the maximum allowed. Please precommit the tests.

spatel added inline comments. Jun 8 2021, 8:02 AM

llvm/test/Transforms/VectorCombine/load-insert-store.ll
23	How do we justify this increase in alignment? The original code had minimal `align 1`, so it could be anything. We are creating a scalar store at an address 6 bytes over that, so it could still be anything?

lebedev.ri added inline comments. Jun 8 2021, 9:26 AM

llvm/test/Transforms/VectorCombine/load-insert-store.ll
23	This change is correct. Before `store <...>, align 1`, we have already established that the `%q` is more aligned, as per the `load <...>` with an implicit alignment, which isn't `1`. https://alive2.llvm.org/ce/z/C2qnUc
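
The arithmetic behind this answer can be sketched as follows. The sketch assumes the implicit alignment of the `<8 x i16>` load is 16 bytes, in line with the `align 16` mentioned in the follow-up comment; the real value depends on the data layout. `provenAlign` is a hypothetical helper, not part of LLVM. It scans base addresses that satisfy the base alignment and reports the largest power of two that divides every `Base + Offset`.

```cpp
#include <cstdint>

// Hypothetical helper: largest power of two (capped at BaseAlign) that divides
// Base + Offset for every sampled Base that is a multiple of BaseAlign.
constexpr uint64_t provenAlign(uint64_t BaseAlign, uint64_t Offset) {
  uint64_t Result = BaseAlign;
  for (uint64_t Base = 0; Base < 64 * BaseAlign; Base += BaseAlign)
    while ((Base + Offset) % Result != 0)
      Result /= 2; // terminates: every value is a multiple of 1
  return Result;
}

// Element 3 of <8 x i16> lives 3 * 2 = 6 bytes past the vector base.
// A base known to be 16-byte aligned (from the load) proves align 2.
static_assert(provenAlign(16, 6) == 2, "load-established align 16");
// Using only the store's align 1, nothing better than align 1 is provable.
static_assert(provenAlign(1, 6) == 1, "store align 1 alone");
```

The contrast between the two assertions is the point of the reply: the increase from `align 1` to `align 2` comes from the load, not from the store.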

spatel added inline comments. Jun 8 2021, 9:45 AM

llvm/test/Transforms/VectorCombine/load-insert-store.ll
23	Ah, thanks for explaining. IIUC, we add explicit alignment to all load/store in IR now, so we should add the `align 16` to this test to avoid confusion - and a test comment would be nice too :).
128	+1 - additional tests and pre-commit will make this easier to understand.

Add explicit align to affected test.
Add comment for implicit alignment.

Harbormaster completed remote builds in B108321: Diff 350760. Jun 8 2021, 7:45 PM

spatel added inline comments. Jun 9 2021, 9:39 AM

llvm/test/Transforms/VectorCombine/load-insert-store.ll

128

Let me know if I'm not seeing it, but we want 1 test with nonconst index where the original alignment is less than the presumed alignment for the new scalar store:

define void @src(<8 x i64>* %q, i64 %s, i32 %idx) {
  %cmp = icmp ult i32 %idx, 2
  call void @llvm.assume(i1 %cmp)
  %i = load <8 x i64>, <8 x i64>* %q, align 4
  %vecins = insertelement <8 x i64> %i, i64 %s, i32 %idx
  store <8 x i64> %vecins, <8 x i64>* %q, align 2 ; make this different just to exercise the logic a bit more
  ret void
}

(better, but still not quite there, my previous comments still stand unaddressed...)

This revision now requires changes to proceed. Jun 9 2021, 9:40 AM

qiucf updated this revision to Diff 351074. Jun 9 2021, 11:31 PM

Harbormaster completed remote builds in B108547: Diff 351074. Jun 10 2021, 12:11 AM

LGTM unless there are other comments.
Thanks.

This revision is now accepted and ready to land. Jun 10 2021, 2:16 AM

LGTM

Closed by commit rG2670c7dd5b25: [VectorCombine] Fix alignment in single element store (authored by qiucf). Jun 10 2021, 7:32 PM

This revision was automatically updated to reflect the committed changes.

qiucf added a commit: rG2670c7dd5b25: [VectorCombine] Fix alignment in single element store.

Landed. Thanks for the review!

Diff 351336

llvm/lib/Transforms/Vectorize/VectorCombine.cpp

(825 lines not shown; hunk context: if (!Load->isSimple() || Load->getParent() != SI->getParent() ||)
                             MemoryLocation::get(SI), AA))
       return false;

     Value *GEP = GetElementPtrInst::CreateInBounds(
         SI->getPointerOperand(), {ConstantInt::get(Idx->getType(), 0), Idx});
     Builder.Insert(GEP);
     StoreInst *NSI = Builder.CreateStore(NewElement, GEP);
     NSI->copyMetadata(*SI);
-    if (SI->getAlign() < NSI->getAlign())
-      NSI->setAlignment(SI->getAlign());
+    Align NewAlignment = std::max(SI->getAlign(), Load->getAlign());
+    if (auto *C = dyn_cast<ConstantInt>(Idx))
+      NewAlignment = commonAlignment(
+          NewAlignment,
+          C->getZExtValue() * DL.getTypeStoreSize(NewElement->getType()));
+    else
+      NewAlignment = commonAlignment(
+          NewAlignment, DL.getTypeStoreSize(NewElement->getType()));
+    NSI->setAlignment(NewAlignment);
     replaceValue(I, *NSI);
     // Need erasing the store manually.
     I.eraseFromParent();
     return true;
   }

   return false;
 }
(194 lines not shown)

llvm/test/Transforms/VectorCombine/load-insert-store.ll

(14 lines not shown; hunk context: entry:)
   store <16 x i8> %vecins, <16 x i8>* %q, align 16
   ret void
 }

 define void @insert_store_i16_align1(<8 x i16>* %q, i16 zeroext %s) {
 ; CHECK-LABEL: @insert_store_i16_align1(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds <8 x i16>, <8 x i16>* [[Q:%.*]], i32 0, i32 3
-; CHECK-NEXT: store i16 [[S:%.*]], i16* [[TMP0]], align 1
+; CHECK-NEXT: store i16 [[S:%.*]], i16* [[TMP0]], align 2
 ; CHECK-NEXT: ret void
 ;
 entry:
   %0 = load <8 x i16>, <8 x i16>* %q
   %vecins = insertelement <8 x i16> %0, i16 %s, i32 3
   store <8 x i16> %vecins, <8 x i16>* %q, align 1
   ret void
 }
(88 lines not shown)
 ;
 entry:
   %0 = load <16 x i8>, <16 x i8>* %q
   %vecins = insertelement <16 x i8> %0, i8 %s, i32 %idx
   store <16 x i8> %vecins, <16 x i8>* %q
   ret void
 }

+; To verify align here is narrowed to scalar store size
 define void @insert_store_nonconst_large_alignment(<4 x i32>* %q, i32 zeroext %s, i32 %idx) {
 ; CHECK-LABEL: @insert_store_nonconst_large_alignment(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: [[CMP:%.*]] = icmp ult i32 [[IDX:%.*]], 4
 ; CHECK-NEXT: call void @llvm.assume(i1 [[CMP]])
 ; CHECK-NEXT: [[TMP0:%.*]] = getelementptr inbounds <4 x i32>, <4 x i32>* [[Q:%.*]], i32 0, i32 [[IDX]]
 ; CHECK-NEXT: store i32 [[S:%.*]], i32* [[TMP0]], align 4
 ; CHECK-NEXT: ret void
 ;
 entry:
   %cmp = icmp ult i32 %idx, 4
   call void @llvm.assume(i1 %cmp)
   %i = load <4 x i32>, <4 x i32>* %q, align 128
   %vecins = insertelement <4 x i32> %i, i32 %s, i32 %idx
   store <4 x i32> %vecins, <4 x i32>* %q, align 128
   ret void
 }

 define void @insert_store_nonconst_align_maximum_8(<8 x i64>* %q, i64 %s, i32 %idx) {
 ; CHECK-LABEL: @insert_store_nonconst_align_maximum_8(
 ; CHECK-NEXT: [[CMP:%.*]] = icmp ult i32 [[IDX:%.*]], 2
 ; CHECK-NEXT: call void @llvm.assume(i1 [[CMP]])
 ; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds <8 x i64>, <8 x i64>* [[Q:%.*]], i32 0, i32 [[IDX]]
-; CHECK-NEXT: store i64 [[S:%.*]], i64* [[TMP1]], align 4
+; CHECK-NEXT: store i64 [[S:%.*]], i64* [[TMP1]], align 8
 ; CHECK-NEXT: ret void
 ;
   %cmp = icmp ult i32 %idx, 2
   call void @llvm.assume(i1 %cmp)
   %i = load <8 x i64>, <8 x i64>* %q, align 8
   %vecins = insertelement <8 x i64> %i, i64 %s, i32 %idx
   store <8 x i64> %vecins, <8 x i64>* %q, align 8
   ret void
(15 lines not shown; hunk context: ;)
   ret void
 }

 define void @insert_store_nonconst_align_larger(<8 x i64>* %q, i64 %s, i32 %idx) {
 ; CHECK-LABEL: @insert_store_nonconst_align_larger(
 ; CHECK-NEXT: [[CMP:%.*]] = icmp ult i32 [[IDX:%.*]], 2
 ; CHECK-NEXT: call void @llvm.assume(i1 [[CMP]])
 ; CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds <8 x i64>, <8 x i64>* [[Q:%.*]], i32 0, i32 [[IDX]]
-; CHECK-NEXT: store i64 [[S:%.*]], i64* [[TMP1]], align 2
+; CHECK-NEXT: store i64 [[S:%.*]], i64* [[TMP1]], align 4
 ; CHECK-NEXT: ret void
 ;
   %cmp = icmp ult i32 %idx, 2
   call void @llvm.assume(i1 %cmp)
   %i = load <8 x i64>, <8 x i64>* %q, align 4
   %vecins = insertelement <8 x i64> %i, i64 %s, i32 %idx
   store <8 x i64> %vecins, <8 x i64>* %q, align 2
   ret void
(340 lines not shown)
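
The updated CHECK lines in this test file can be cross-checked against the new alignment rule from the source change with a small stand-alone model. This is a sketch only: `commonAlign`, `constIndexAlign` and `varIndexAlign` are hypothetical names, `commonAlign` is assumed to match what `commonAlignment` computes, and the implicit alignment of the `<8 x i16>` load in `@insert_store_i16_align1` is assumed to be 16.

```cpp
#include <cstdint>

constexpr uint64_t maxU64(uint64_t A, uint64_t B) { return A > B ? A : B; }

// Assumed model of commonAlignment: largest power of two dividing A and Offset.
constexpr uint64_t commonAlign(uint64_t A, uint64_t Offset) {
  return (A | Offset) & (~(A | Offset) + 1);
}

// Constant index: the byte offset Idx * ElemSize is known exactly.
constexpr uint64_t constIndexAlign(uint64_t StoreAlign, uint64_t LoadAlign,
                                   uint64_t Idx, uint64_t ElemSize) {
  return commonAlign(maxU64(StoreAlign, LoadAlign), Idx * ElemSize);
}

// Variable index: only "some multiple of ElemSize" is known.
constexpr uint64_t varIndexAlign(uint64_t StoreAlign, uint64_t LoadAlign,
                                 uint64_t ElemSize) {
  return commonAlign(maxU64(StoreAlign, LoadAlign), ElemSize);
}

// Expected values are the new "align N" operands from the CHECK lines.
static_assert(constIndexAlign(1, 16, 3, 2) == 2, "@insert_store_i16_align1");
static_assert(varIndexAlign(128, 128, 4) == 4,
              "@insert_store_nonconst_large_alignment");
static_assert(varIndexAlign(8, 8, 8) == 8,
              "@insert_store_nonconst_align_maximum_8");
static_assert(varIndexAlign(2, 4, 8) == 4,
              "@insert_store_nonconst_align_larger");
```

Taking the maximum of the load and store alignments reflects that both instructions access the same pointer, so the stronger of the two guarantees applies to the base address.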

This is an archive of the discontinued LLVM Phabricator instance.

[VectorCombine] Fix alignment in single element store
ClosedPublic
