Page MenuHomePhabricator

[AMDGPU] Adjusted alignment-check for local address space;

Authored by FarhanaAleen on Mar 2 2018, 2:25 PM.



LDS is allocated to 64-dword alignment. In order to prove a memory address being 8 byte aligned it is sufficient to check that the offset if multiple of 8.

This allows generating ds_read/write_b64 instead of ds_read/write2_b32.

Diff Detail

Event Timeline

FarhanaAleen created this revision.Mar 2 2018, 2:25 PM
rampitec added inline comments.Mar 2 2018, 3:22 PM

It is not immediately clear to me if the base pointer is always an LDS allocated symbol. I guess it can also be a GEP, right?

Let's consider the situation:

sym is a 64-dword aligned LDS allocation.
gep1 = &sym[1] -> it is only 4 bytes aligned.
gep2 = &gep1[8] -> it is still 4 bytes aligned.

I suppose for the gep2 your logic will incorrectly report 8 byte alignment.

I think it is simpler than that. If a local symbol must be 64 dword aligned, it should be declared as a such and not 4 byte aligned as we have.
Such logic shall not be needed at the first place, llvm should be able to deduce proper alignment given proper input.

Although I am not really sure this is true it is always 64 dword aligned. Consider:

local int x;
local int y;

Do you mean this allocation would take 128 dwords? I highly doubt.

I suppose only the first symbol is 64 dword aligned, and everything after is just naturally aligned wrt element type size. So a logic to leverage actual allocation alignment can be useful only after all LDS is allocated and allocation is flattened into a single LDS memory array.

arsenm requested changes to this revision.Mar 2 2018, 5:48 PM

I don't understand this. You should only be using the node's alignment here. If there's a way to infer a higher alignment for something, that should be an optimization much earlier than selection.


No reason to use an external function call here

This revision now requires changes to proceed.Mar 2 2018, 5:48 PM
arsenm added inline comments.Mar 2 2018, 5:54 PM

Also if you're referring to the allocation granularity for the entire program, that doesn't reflect the individual symbols allocated. We certainly don't allocate the individual globals with at least 8 byte alignment

Thank you guys. My assumption was wrong, I was thinking that each allocation gets 64-dword alignment.

FarhanaAleen abandoned this revision.Mar 2 2018, 6:35 PM

You can leverage higher alignment after you have actual allocation map.