This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Adjusted alignment-check for local address space;
AbandonedPublic

Authored by FarhanaAleen on Mar 2 2018, 2:25 PM.

Details

Reviewers
rampitec
arsenm
Summary

LDS is allocated to 64-dword alignment. In order to prove a memory address being 8 byte aligned it is sufficient to check that the offset if multiple of 8.

This allows generating ds_read/write_b64 instead of ds_read/write2_b32.

Diff Detail

Event Timeline

FarhanaAleen created this revision.Mar 2 2018, 2:25 PM
rampitec added inline comments.Mar 2 2018, 3:22 PM
lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
667

It is not immediately clear to me if the base pointer is always an LDS allocated symbol. I guess it can also be a GEP, right?

Let's consider the situation:

sym is a 64-dword aligned LDS allocation.
gep1 = &sym[1] -> it is only 4 bytes aligned.
gep2 = &gep1[8] -> it is still 4 bytes aligned.

I suppose for the gep2 your logic will incorrectly report 8 byte alignment.

I think it is simpler than that. If a local symbol must be 64 dword aligned, it should be declared as a such and not 4 byte aligned as we have.
Such logic shall not be needed at the first place, llvm should be able to deduce proper alignment given proper input.

Although I am not really sure this is true it is always 64 dword aligned. Consider:

local int x;
local int y;

Do you mean this allocation would take 128 dwords? I highly doubt.

I suppose only the first symbol is 64 dword aligned, and everything after is just naturally aligned wrt element type size. So a logic to leverage actual allocation alignment can be useful only after all LDS is allocated and allocation is flattened into a single LDS memory array.

arsenm requested changes to this revision.Mar 2 2018, 5:48 PM

I don't understand this. You should only be using the node's alignment here. If there's a way to infer a higher alignment for something, that should be an optimization much earlier than selection.

test/CodeGen/AMDGPU/ds_read2.ll
661

No reason to use an external function call here

This revision now requires changes to proceed.Mar 2 2018, 5:48 PM
arsenm added inline comments.Mar 2 2018, 5:54 PM
lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
661–663

Also if you're referring to the allocation granularity for the entire program, that doesn't reflect the individual symbols allocated. We certainly don't allocate the individual globals with at least 8 byte alignment

Thank you guys. My assumption was wrong, I was thinking that each allocation gets 64-dword alignment.

FarhanaAleen abandoned this revision.Mar 2 2018, 6:35 PM

You can leverage higher alignment after you have actual allocation map.