LDS is allocated with 64-dword alignment. To prove that a memory address is 8-byte aligned, it is therefore sufficient to check that the offset is a multiple of 8.
This allows generating ds_read/write_b64 instead of ds_read/write2_b32.
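The arithmetic behind this claim can be sketched as follows. This is only an illustration, not code from the patch: the helper names and the constant kAssumedBaseAlign are hypothetical, and the equivalence holds only if the base address really is 64-dword (256-byte) aligned.

```cpp
#include <cstdint>

// Hypothetical constant, not taken from the patch: the summary's premise
// that the base address is aligned to 64 dwords (64 * 4 = 256 bytes).
constexpr std::uint64_t kAssumedBaseAlign = 64 * 4;

// Ground truth: is the full address base + offset a multiple of 8?
constexpr bool isEightByteAligned(std::uint64_t base, std::uint64_t offset) {
  return (base + offset) % 8 == 0;
}

// Shortcut from the summary: look at the offset only. This matches
// isEightByteAligned() whenever base is a multiple of kAssumedBaseAlign,
// because 256 is itself a multiple of 8.
constexpr bool offsetImpliesEightByteAligned(std::uint64_t offset) {
  return offset % 8 == 0;
}
```

If the base is only 4-byte aligned, the shortcut can give the wrong answer: with a base of 4 and an offset of 8, the offset passes the check but the address 12 is not 8-byte aligned.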
Differential D44045
[AMDGPU] Adjusted alignment-check for local address space
Abandoned, Public. Authored by FarhanaAleen on Mar 2 2018, 2:25 PM.
Event Timeline
Herald added subscribers: t-tye, tpr, dstuttard and 5 others. Mar 2 2018, 2:25 PM
I think it is simpler than that. If a local symbol must be 64-dword aligned, it should be declared as such, not as 4-byte aligned as we have it now. That said, I am not really sure it is true that a symbol is always 64-dword aligned. Consider: local int x; Do you mean this allocation would take 128 dwords? I highly doubt it. I suppose only the first symbol is 64-dword aligned, and everything after it is just naturally aligned with respect to its element type size. So logic that leverages the actual allocation alignment can be useful only after all LDS is allocated and the allocation is flattened into a single LDS memory array.

I don't understand this. You should only be using the node's alignment here. If there's a way to infer a higher alignment for something, that should be an optimization much earlier than selection.
This revision now requires changes to proceed. Mar 2 2018, 5:48 PM
Thank you guys. My assumption was wrong; I was thinking that each allocation gets 64-dword alignment.
Revision Contents
Diff 136851
lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
lib/Target/AMDGPU/AMDGPUInstructions.td
test/CodeGen/AMDGPU/ds_read2.ll
test/CodeGen/AMDGPU/ds_read2_offset_order.ll
test/CodeGen/AMDGPU/ds_read2_superreg.ll
test/CodeGen/AMDGPU/ds_write2.ll
Also, if you're referring to the allocation granularity for the entire program, that doesn't reflect the individual symbols allocated. We certainly don't allocate the individual globals with at least 8-byte alignment.
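The reviewers' point about per-symbol alignment can be illustrated with a small layout model. This is a sketch under assumptions, not the actual AMDGPU LDS allocator: LdsSymbol and layoutLds are hypothetical names, and the model simply places symbols one after another, rounding each offset up to that symbol's own alignment.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical description of one LDS symbol: size and alignment in bytes.
// The alignment must be nonzero.
struct LdsSymbol {
  std::uint32_t size;
  std::uint32_t align;
};

// Hypothetical model: assign offsets sequentially from 0, rounding each
// offset up to the symbol's own alignment and nothing more.
std::vector<std::uint32_t> layoutLds(const std::vector<LdsSymbol>& symbols) {
  std::vector<std::uint32_t> offsets;
  std::uint32_t cursor = 0;
  for (const LdsSymbol& s : symbols) {
    // Round cursor up to the next multiple of s.align.
    cursor = (cursor + s.align - 1) / s.align * s.align;
    offsets.push_back(cursor);
    cursor += s.size;
  }
  return offsets;
}
```

With two 4-byte, 4-byte-aligned symbols, the second one starts 4 bytes after the first in this model. Even if the whole block starts on a 64-dword boundary, an offset of 8 from that second symbol is not 8-byte aligned, which is why only the alignment recorded on the node can be relied on.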