Page MenuHomePhabricator

[CodeGen] Ensure callers of CreateStackTemporary use sensible alignments

Authored by david-arm on May 21 2020, 3:00 AM.



In two instances of CreateStackTemporary we are sometimes promoting
alignments beyond the stack alignment. I have introduced a new function
called getReducedAlign that will return the alignment for the broken
down parts of illegal vector types. For example, on NEON a <32 x i8>
type is made up of two <16 x i8> types - in this case the sensible
alignment is 16 bytes, not 32.

In the legalization code wherever we create stack temporaries I have
started using the reduced alignments instead for illegal vector types.

I added a test to


that tries to insert an element into an illegal fixed vector type
that involves creating a temporary stack object.

Diff Detail

Event Timeline

david-arm created this revision.May 21 2020, 3:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 21 2020, 3:00 AM
david-arm updated this revision to Diff 265488.May 21 2020, 6:12 AM
arsenm added a subscriber: arsenm.May 21 2020, 6:55 AM

See also D29810 and D28920. For AMDGPU there's only pain by using a stack object alignment > 4, so completely ignoring the ABI alignment for stack temporaries is a good option. The stack alignment is chosen to be higher to accommodate common types needing a higher ABI alignment


I don't understand the new check. This should be either a register with a materialized register, or a constant. Did the total stack usage here increase?

david-arm marked an inline comment as done.May 22 2020, 2:15 AM
david-arm added inline comments.

I think the stack usage has dropped and the offsets were originally 0x200 and 0x400, but are now 0x80 and 0x280. The problem I found was that the instruction used seemed to differ for each RUN line. That's why my GCN-DAG lines look so horrible! I wasn't sure how best to fix the test, since it seems to be using the stack offsets as a convenient way to test instruction selection.

I'm not sure we can really do this, at least not the way it's written.

The behavior of CreateStackTemporary(TypeSize Bytes, Align Alignment) is completely unambiguous; the caller has to decide the size and alignment, and generate other code consistent with that. For the other overloads, the alignment is computed implicitly. The callers then make some assumption about the alignment. If we change the alignment computation, we also have to change the code that uses it. I'm uncomfortable with assuming that a lower alignment is sufficient for all callers.

I recommend changing the six callers in type legalization (LegalizeVectorTypes.cpp, LegalizeTypesGeneric.cpp, LegalizeTypes.cpp, LegalizeIntegerTypes.cpp) to explicitly pass in the right size/alignment. That should be enough to solve the issue you care about without unpredictable side-effects on other code.

david-arm updated this revision to Diff 268110.Jun 3 2020, 2:45 AM
david-arm retitled this revision from [CodeGen] Make CreateStackTemporary choose better alignments to [CodeGen] Ensure callers of CreateStackTemporary use sensible alignments.
david-arm edited the summary of this revision. (Show Details)

Hi @efriedma, I hope I've addressed your previous comment! I've avoided changing CreateStackTemporary for now and tried to fix up the callers as you suggested.

efriedma added inline comments.Jun 4 2020, 9:35 AM

You changed the alignment of the stack temporary... but you still didn't change the alignment of the load or the store to that stack slot, so they're still wrong. (Same applies to other changes.)

david-arm marked an inline comment as done.Jun 4 2020, 11:53 PM
david-arm added inline comments.

OK fair enough, but previously the code also didn't specify the same alignment chosen by CreateStackTemporary either. So I didn't realise I had to explicitly add the alignment to the loads/stores. My new NEON test now passes with an alignment of 16 bytes, whereas previously it was 32. Since it just worked I assumed that the StackPtr itself contained the alignment, but I can certainly take a look. I don't mind adding the alignments explicitly to the loads and stores at all if that's needed.

david-arm updated this revision to Diff 268718.Jun 5 2020, 2:35 AM
david-arm marked an inline comment as done.

I changed the loads/stores to use the alignment explicitly as it didn't change the behaviour in any way and it's clearer that way.

efriedma added inline comments.Jun 5 2020, 2:21 PM

When getStore() etc. is called without an explicit alignment, the alignment is inferred based on VT of the stored value, not the pointer.

If CreateStackTemporary() also computes the alignment based on the VT, it's fine: the resulting alignment is consistent. But if CreateStackTemporary() does something different, then it's inconsistent, and we might miscompile.

We're gradually moving towards deprecating the getStore() variants that don't explicitly specify the alignment because this is confusing.


It would be a good idea to explicitly specify the alignment to the getTruncStore, too. Probably it will be naturally aligned anyway, but better to be safe. Something like commonAlignment(SmallestAlign, EltVT.getSizeInBits() / 8).


See above comment about getTruncStore.

david-arm updated this revision to Diff 269114.Jun 8 2020, 12:47 AM
efriedma accepted this revision.Jun 8 2020, 2:11 PM


This revision is now accepted and ready to land.Jun 8 2020, 2:11 PM
This revision was automatically updated to reflect the committed changes.
c-rhodes added inline comments.

NumRegs is producing an unused variable warning