This is an archive of the discontinued LLVM Phabricator instance.

LSV: Always try to adjust the alloca alignment
Abandoned · Public

Authored by arsenm on Aug 25 2016, 10:20 PM.


Details

Reviewers

tstellarAMD
jlebar
escha
asbirlea

Summary

Although the target may support unaligned access, it's likely
still better to increase the alignment.
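To make the behavioural change concrete, here is a minimal C++ sketch of the decision for a stack (address space 0) access, before and after this patch. It is a simplified model, not code from the pass: `TargetAnswer`, `Decision`, `beforePatch` and `afterPatch` are hypothetical names, the two flags stand in for what the target reports through TTI, "already aligned" is collapsed into a single flag, and raising the alloca to the stack-adjusted alignment is assumed to be enough to make the access aligned.

```cpp
#include <cassert>

// Hypothetical summary of the target's answer for one misaligned access.
struct TargetAnswer {
  bool Allows; // target permits the misaligned access at all
  bool Fast;   // target reports the misaligned access as "fast"
};

// What the vectorizer ends up doing for a chain based on a stack alloca.
struct Decision {
  bool BumpAllocaAlign; // the alloca's alignment is raised
  bool Vectorize;       // the wide access is emitted
};

// Before the patch: the alloca is only touched when the access would be
// treated as misaligned (not allowed, or not fast).
Decision beforePatch(bool CanAdjust, bool AlreadyAligned, TargetAnswer T) {
  bool Misaligned = !AlreadyAligned && (!T.Allows || !T.Fast);
  if (!Misaligned)
    return {false, true}; // leave the alloca alone and vectorize
  if (!CanAdjust)
    return {false, false}; // cannot fix it, give up
  return {true, true};     // bump the alloca, then vectorize
}

// After the patch: bump first whenever possible, then re-check.
Decision afterPatch(bool CanAdjust, bool AlreadyAligned, TargetAnswer T) {
  bool Bumped = !AlreadyAligned && CanAdjust;
  bool Aligned = AlreadyAligned || Bumped;
  bool Misaligned = !Aligned && (!T.Allows || !T.Fast);
  return {Bumped, !Misaligned};
}
```

In this model the two functions disagree only for an unaligned access that the target both allows and reports as fast: before the patch the alloca is left alone, afterwards its alignment is raised anyway.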

Diff Detail

Event Timeline

arsenm updated this revision to Diff 69319. Aug 25 2016, 10:20 PM

arsenm retitled this revision to LSV: Always try to adjust the alloca alignment.

arsenm updated this object.

arsenm added reviewers: jlebar, escha, asbirlea.

arsenm added a subscriber: llvm-commits.

Herald added a reviewer: tstellarAMD. Aug 25 2016, 10:20 PM

Herald added a subscriber: mzolotukhin.

accessIsMisaligned returns false if the TTI reports that the access is not "fast". But here we seem to be ignoring that and aligning the access anyway.

Is there prior art for ignoring TTI and assuming that unaligned accesses are slower than aligned accesses? Alternatively, could you just make your target return false for "fast"?
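For reference while reading this exchange, the helper can be approximated as below. This is a hedged sketch, not the in-tree code: the real pass queries `TargetTransformInfo::allowsMisalignedMemoryAccesses` (whose signature has changed across LLVM releases, and the helper may carry extra conditions), while here the target's two answers are passed in through a hypothetical `MisalignedAccessInfo` struct. Under this reading the helper reports "not misaligned" for an access that is naturally aligned or that the target both allows and calls fast, and that allowed-and-fast case is the one the patch now aligns anyway.

```cpp
#include <cassert>

// Hypothetical stand-in for the target's answer to "may this misaligned
// access be emitted, and is it fast?".
struct MisalignedAccessInfo {
  bool Allows;
  bool Fast;
};

// Sketch of the misalignment test: an access whose alignment is a multiple
// of its size is never treated as misaligned; any other access counts as
// misaligned unless the target both allows it and reports it as fast.
bool accessIsMisaligned(unsigned SzInBytes, unsigned Alignment,
                        MisalignedAccessInfo Target) {
  assert(SzInBytes != 0 && "access size must be non-zero");
  if (Alignment % SzInBytes == 0)
    return false;
  return !(Target.Allows && Target.Fast);
}
```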

In D23908#528067, @jlebar wrote:

accessIsMisaligned returns false if the TTI reports that the access is not "fast". But here we seem to be ignoring that and aligning the access anyway.

Is there prior art for ignoring TTI and assuming that unaligned accesses are slower than aligned accesses? Alternatively, could you just make your target return false for "fast"?

I think what "fast" means is ambiguous: its meaning shifts with how each caller uses it. For example, DAGCombiner uses it to check whether it's OK to merge stores when the merge creates a less-aligned vector store. That is a different question from whether to change the underlying alignment. The unaligned operation might be faster than some unaligned expansion, but that doesn't mean it isn't still better to have the aligned access.

asbirlea added inline comments. Aug 29 2016, 2:05 PM

lib/Transforms/Vectorize/LoadStoreVectorizer.cpp, line 822:
If the target does not support unaligned access of the type, the condition below may exit without vectorizing, but after having changed the alignment for the alloca. Is this the intended behavior?
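The ordering asbirlea points out can be reproduced with a small standalone model. In this hedged C++ sketch, `FakeAlloca`, `adjustThenCheck` and `checkThenAdjust` are hypothetical names; the access is assumed to take the alloca's new alignment, and a single `TargetAllowsUnaligned` flag stands in for the TTI query. `adjustThenCheck` follows the order in the diff (raise the alignment, then bail out if the access is still misaligned), so a failed chain still leaves the alloca modified; `checkThenAdjust` is one possible alternative, not part of the patch, that only commits the new alignment once vectorization will go ahead.

```cpp
#include <cassert>

// Hypothetical stand-in for an AllocaInst: only its alignment matters here.
struct FakeAlloca {
  unsigned Align;
};

// Alignment never decreases, mirroring the AI->getAlignment() < Alignment guard.
static unsigned raised(unsigned Cur, unsigned StackAlign) {
  return Cur < StackAlign ? StackAlign : Cur;
}

static bool misaligned(unsigned SzInBytes, unsigned Align,
                       bool TargetAllowsUnaligned) {
  return Align % SzInBytes != 0 && !TargetAllowsUnaligned;
}

// Order used in the diff: mutate first, then possibly bail out.
bool adjustThenCheck(FakeAlloca &AI, unsigned StackAlign, unsigned SzInBytes,
                     bool TargetAllowsUnaligned) {
  AI.Align = raised(AI.Align, StackAlign); // side effect happens regardless
  return !misaligned(SzInBytes, AI.Align, TargetAllowsUnaligned);
}

// Alternative order: compute the candidate, check, and only then commit.
bool checkThenAdjust(FakeAlloca &AI, unsigned StackAlign, unsigned SzInBytes,
                     bool TargetAllowsUnaligned) {
  unsigned Candidate = raised(AI.Align, StackAlign);
  if (misaligned(SzInBytes, Candidate, TargetAllowsUnaligned))
    return false; // alloca left untouched
  AI.Align = Candidate;
  return true;
}
```

For an 8-byte access (`<2 x i32>`) on a target without unaligned support and a stack alignment of 4, the first version bails out but leaves the alloca at align 4, while the second bails out and leaves it at align 1.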

The unaligned operation might be faster than some unaligned expansion, but that doesn't mean it isn't still better to have the aligned access

Indeed, but there's a cost to increasing the alignment too, right? In particular, it prevents us from packing our stack as efficiently.
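The stack-packing cost can be illustrated with simple layout arithmetic. The sketch below assumes a naive frame layout that places objects in order and pads each one up to its alignment; real frame lowering can reorder objects, so the numbers are only illustrative. `StackObject`, `alignTo` and `frameSize` are hypothetical helpers written for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct StackObject {
  uint64_t Size;
  uint64_t Align; // must be a power of two
};

// Round Value up to the next multiple of Align.
static uint64_t alignTo(uint64_t Value, uint64_t Align) {
  assert(Align != 0 && (Align & (Align - 1)) == 0 && "power of two");
  return (Value + Align - 1) & ~(Align - 1);
}

// Naive in-order layout: pad to each object's alignment, then place it.
// Returns the number of bytes used, including padding.
uint64_t frameSize(const std::vector<StackObject> &Objects) {
  uint64_t Offset = 0;
  for (const StackObject &O : Objects) {
    Offset = alignTo(Offset, O.Align);
    Offset += O.Size;
  }
  return Offset;
}
```

With a 1-byte object followed by a `[128 x i32]` (512 bytes), this layout needs 513 bytes when the array has align 1, but 528 bytes once the array is raised to align 16, i.e. 15 bytes of padding.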

It's an assumption of this pass that vectorization is always beneficial. So of course we want to increase the alignment when the access would otherwise be illegal. And it also seems reasonable to increase the alignment when the TTI tells us the access would not be "fast". But increasing the alignment on the mere suspicion that it would be faster, without any indication from the target that it would in fact be an improvement...I am less comfortable with that.

An alternative would be to add a new late-IR pass that increases the alignment of all allocas and, on your targets, run that after the LSV. Then we'd be making a target-specific decision to enable the pass.
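A target-gated late step along the lines suggested above might look like this standalone model. Everything here is hypothetical (`FakeAlloca`, `raiseAllocaAlignments`, the `TargetMinAlign` parameter and the opt-in flag); in LLVM it would be a function pass visiting `AllocaInst`s. The only rule carried over from the diff is that alignment is raised but never lowered, matching the `AI->getAlignment() < Alignment` guard.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an AllocaInst: only its alignment matters here.
struct FakeAlloca {
  unsigned Align;
};

// Raise every alloca below TargetMinAlign up to it, but only when the
// target opts in. Returns how many allocas changed; never decreases.
unsigned raiseAllocaAlignments(std::vector<FakeAlloca> &Allocas,
                               unsigned TargetMinAlign, bool TargetOptsIn) {
  if (!TargetOptsIn)
    return 0;
  unsigned Changed = 0;
  for (FakeAlloca &AI : Allocas) {
    if (AI.Align < TargetMinAlign) {
      AI.Align = TargetMinAlign;
      ++Changed;
    }
  }
  return Changed;
}
```

Keeping the decision behind a per-target switch is what distinguishes this from the patch: targets that prefer tight stack packing simply never enable it.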

asbirlea resigned from this revision. Sep 15 2021, 12:26 PM

Herald added subscribers: kerbowa, nhaehnle, wdng, jvesely. Sep 15 2021, 12:26 PM

arsenm abandoned this revision. Jun 9 2023, 6:54 PM

arsenm mentioned this in rGa3938700856f: AMDGPU: Extract test out of old patch. Jun 9 2023, 6:54 PM

Herald added a project: Restricted Project. Jun 9 2023, 6:54 PM

Herald added a subscriber: pcwang-thead.

Revision Contents

lib/Transforms/Vectorize/LoadStoreVectorizer.cpp: 29 lines
test/Transforms/LoadStoreVectorizer/AMDGPU/adjust-alloca-alignment.ll: 50 lines

Diff 69319

lib/Transforms/Vectorize/LoadStoreVectorizer.cpp

@@ … @@ bool Vectorizer::vectorizeStoreChain(
   // We won't try again to vectorize the elements of the chain, regardless of
   // whether we succeed below.
   InstructionsProcessed->insert(Chain.begin(), Chain.end());
 
   // Check alignment restrictions.
   unsigned Alignment = getAlignment(S0);
 
-  // If the store is going to be misaligned, don't vectorize it.
-  if (accessIsMisaligned(SzInBytes, AS, Alignment)) {
-    if (S0->getPointerAddressSpace() != 0)
-      return false;
+  if (S0->getPointerAddressSpace() == 0 &&
+      Alignment < StackAdjustedAlignment) {
+    // Even if the target supports unaligned access of the type, it still may be
+    // better to adjust the alignment.
 
     if (AllocaInst *AI = canAdjustAllocaAlignment(S0->getPointerOperand(),
                                                   EltSzInBytes, Alignment)) {
       Alignment = StackAdjustedAlignment;
       if (AI->getAlignment() < Alignment)
         AI->setAlignment(Alignment);
-    } else
-      return false;
+    }
   }
 
+  // If the store is going to be misaligned, don't vectorize it.
+  if (accessIsMisaligned(SzInBytes, AS, Alignment))
+    return false;
+
   BasicBlock::iterator First, Last;
   std::tie(First, Last) = getBoundaryInstrs(Chain);
   Builder.SetInsertPoint(&*Last);
 
   Value *Vec = UndefValue::get(VecTy);
 
   if (VecStoreTy) {
@@ … @@ bool Vectorizer::vectorizeLoadChain(
   // We won't try again to vectorize the elements of the chain, regardless of
   // whether we succeed below.
   InstructionsProcessed->insert(Chain.begin(), Chain.end());
 
   // Check alignment restrictions.
   unsigned Alignment = getAlignment(L0);
 
-  // If the load is going to be misaligned, don't vectorize it.
-  if (accessIsMisaligned(SzInBytes, AS, Alignment)) {
-    if (L0->getPointerAddressSpace() != 0)
-      return false;
+  if (L0->getPointerAddressSpace() == 0 && Alignment < StackAdjustedAlignment) {
 
     if (AllocaInst *AI = canAdjustAllocaAlignment(L0->getPointerOperand(),
                                                   EltSzInBytes, Alignment)) {
       Alignment = StackAdjustedAlignment;
       if (AI->getAlignment() < Alignment)
         AI->setAlignment(Alignment);
-    } else
-      return false;
+    }
   }
 
+  // If the load is going to be misaligned, don't vectorize it.
+  if (accessIsMisaligned(SzInBytes, AS, Alignment))
+    return false;
+
   DEBUG({
     dbgs() << "LSV: Loads to vectorize:\n";
     for (Instruction *I : Chain)
       I->dump();
   });
 
   // getVectorizablePrefix already computed getBoundaryInstrs. The value of

test/Transforms/LoadStoreVectorizer/AMDGPU/adjust-alloca-alignment.ll

@@ … @@ define void @load_unknown_offset_align1_i8(i8 addrspace(1)* noalias %out, i32 %offset) #0 {
   %val0 = load i8, i8* %ptr0, align 1
   %ptr1 = getelementptr inbounds i8, i8* %ptr0, i32 1
   %val1 = load i8, i8* %ptr1, align 1
   %add = add i8 %val0, %val1
   store i8 %add, i8 addrspace(1)* %out
   ret void
 }
 
+; ALL-LABEL: @load_alloca16_unknown_offset_align1_i8(
+; ALL: alloca [128 x i8], align 16
+; UNALIGNED: load <2 x i8>, <2 x i8>* %{{[0-9]+}}, align 1{{$}}
+
+; ALIGNED: load i8, i8* %ptr0, align 1{{$}}
+; ALIGNED: load i8, i8* %ptr1, align 1{{$}}
+define void @load_alloca16_unknown_offset_align1_i8(i8 addrspace(1)* noalias %out, i32 %offset) #0 {
+  %alloca = alloca [128 x i8], align 16
+  %ptr0 = getelementptr inbounds [128 x i8], [128 x i8]* %alloca, i32 0, i32 %offset
+  %val0 = load i8, i8* %ptr0, align 1
+  %ptr1 = getelementptr inbounds i8, i8* %ptr0, i32 1
+  %val1 = load i8, i8* %ptr1, align 1
+  %add = add i8 %val0, %val1
+  store i8 %add, i8 addrspace(1)* %out
+  ret void
+}
+
 ; ALL-LABEL: @load_unknown_offset_align1_i16(
 ; ALL: alloca [128 x i16], align 1{{$}}
 ; UNALIGNED: load <2 x i16>, <2 x i16>* %{{[0-9]+}}, align 1{{$}}
 
 ; ALIGNED: load i16, i16* %ptr0, align 1{{$}}
 ; ALIGNED: load i16, i16* %ptr1, align 1{{$}}
 define void @load_unknown_offset_align1_i16(i16 addrspace(1)* noalias %out, i32 %offset) #0 {
   %alloca = alloca [128 x i16], align 1
   %ptr0 = getelementptr inbounds [128 x i16], [128 x i16]* %alloca, i32 0, i32 %offset
   %val0 = load i16, i16* %ptr0, align 1
   %ptr1 = getelementptr inbounds i16, i16* %ptr0, i32 1
   %val1 = load i16, i16* %ptr1, align 1
   %add = add i16 %val0, %val1
   store i16 %add, i16 addrspace(1)* %out
   ret void
 }
 
 ; Although the offset is unknown here, we know it is a multiple of the element size.
 ; ALL-LABEL: @load_unknown_offset_align1_i32(
-; UNALIGNED: alloca [128 x i32], align 1
-; UNALIGNED: load <2 x i32>, <2 x i32>* %{{[0-9]+}}, align 1{{$}}
-
-; ALIGNED: alloca [128 x i32], align 4
-; ALIGNED: load <2 x i32>, <2 x i32>* %{{[0-9]+}}, align 4{{$}}
+; ALL: alloca [128 x i32], align 4
+; ALL: load <2 x i32>, <2 x i32>* %{{[0-9]+}}, align 4{{$}}
 define void @load_unknown_offset_align1_i32(i32 addrspace(1)* noalias %out, i32 %offset) #0 {
   %alloca = alloca [128 x i32], align 1
   %ptr0 = getelementptr inbounds [128 x i32], [128 x i32]* %alloca, i32 0, i32 %offset
   %val0 = load i32, i32* %ptr0, align 1
   %ptr1 = getelementptr inbounds i32, i32* %ptr0, i32 1
   %val1 = load i32, i32* %ptr1, align 1
   %add = add i32 %val0, %val1
   store i32 %add, i32 addrspace(1)* %out
   ret void
 }
 
-; FIXME: Should always increase alignment of the load
 ; Make sure alloca alignment isn't decreased
 ; ALL-LABEL: @load_alloca16_unknown_offset_align1_i32(
 ; ALL: alloca [128 x i32], align 16
-
-; UNALIGNED: load <2 x i32>, <2 x i32>* %{{[0-9]+}}, align 1{{$}}
-; ALIGNED: load <2 x i32>, <2 x i32>* %{{[0-9]+}}, align 4{{$}}
+; ALL: load <2 x i32>, <2 x i32>* %{{[0-9]+}}, align 4{{$}}
 define void @load_alloca16_unknown_offset_align1_i32(i32 addrspace(1)* noalias %out, i32 %offset) #0 {
   %alloca = alloca [128 x i32], align 16
   %ptr0 = getelementptr inbounds [128 x i32], [128 x i32]* %alloca, i32 0, i32 %offset
   %val0 = load i32, i32* %ptr0, align 1
   %ptr1 = getelementptr inbounds i32, i32* %ptr0, i32 1
   %val1 = load i32, i32* %ptr1, align 1
   %add = add i32 %val0, %val1
   store i32 %add, i32 addrspace(1)* %out
@@ … @@ define void @store_unknown_offset_align1_i16(i16 addrspace(1)* noalias %out, i32 %offset) #0 {
   store i16 9, i16* %ptr0, align 1
   %ptr1 = getelementptr inbounds i16, i16* %ptr0, i32 1
   store i16 10, i16* %ptr1, align 1
   ret void
 }
 
 ; Although the offset is unknown here, we know it is a multiple of the element size.
 ; ALL-LABEL: @store_unknown_offset_align1_i32(
-; UNALIGNED: alloca [128 x i32], align 1
-; UNALIGNED: store <2 x i32> <i32 9, i32 10>, <2 x i32>* %{{[0-9]+}}, align 1{{$}}
-
-; ALIGNED: alloca [128 x i32], align 4
-; ALIGNED: store <2 x i32> <i32 9, i32 10>, <2 x i32>* %{{[0-9]+}}, align 4{{$}}
+; ALL: alloca [128 x i32], align 4
+; ALL: store <2 x i32> <i32 9, i32 10>, <2 x i32>* %{{[0-9]+}}, align 4{{$}}
 define void @store_unknown_offset_align1_i32(i32 addrspace(1)* noalias %out, i32 %offset) #0 {
   %alloca = alloca [128 x i32], align 1
   %ptr0 = getelementptr inbounds [128 x i32], [128 x i32]* %alloca, i32 0, i32 %offset
   store i32 9, i32* %ptr0, align 1
   %ptr1 = getelementptr inbounds i32, i32* %ptr0, i32 1
   store i32 10, i32* %ptr1, align 1
   ret void
 }
 
+; Make sure the alignment of the alloca isn't decreased
+; ALL-LABEL: @store_alloca16_unknown_offset_align1_i32(
+; ALL: alloca [128 x i32], align 16
+; ALL: store <2 x i32> <i32 9, i32 10>, <2 x i32>* %{{[0-9]+}}, align 4{{$}}
+define void @store_alloca16_unknown_offset_align1_i32(i32 addrspace(1)* noalias %out, i32 %offset) #0 {
+  %alloca = alloca [128 x i32], align 16
+  %ptr0 = getelementptr inbounds [128 x i32], [128 x i32]* %alloca, i32 0, i32 %offset
+  store i32 9, i32* %ptr0, align 1
+  %ptr1 = getelementptr inbounds i32, i32* %ptr0, i32 1
+  store i32 10, i32* %ptr1, align 1
+  ret void
+}
+
 attributes #0 = { nounwind }
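The CHECK lines in this test follow from simple alignment arithmetic: with an unknown index, `base + Index * EltSize` is only guaranteed to be aligned to the largest power of two dividing both the base alignment and the element size. The sketch below computes that with the same bit trick as `llvm::MinAlign` in `llvm/Support/MathExtras.h`; `knownAccessAlign` is a hypothetical helper name introduced for this example.

```cpp
#include <cassert>
#include <cstdint>

// Largest power of two dividing both A and B (lowest set bit of A | B).
constexpr uint64_t minAlign(uint64_t A, uint64_t B) {
  return (A | B) & (1 + ~(A | B));
}

// Alignment guaranteed for base + Index * EltSize when Index is unknown:
// only the power-of-two factor shared by the base alignment and the
// element size survives.
constexpr uint64_t knownAccessAlign(uint64_t BaseAlign, uint64_t EltSize) {
  return minAlign(BaseAlign, EltSize);
}
```

This agrees with the tests: a `[128 x i32]` alloca at align 16 (or one raised from align 1 to 4) gives `align 4` on the `<2 x i32>` access, while an `[128 x i8]` array at align 16 still only guarantees `align 1`, which is why the new i8 test expects `align 1`.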