This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: use ComplexPattern for offsets in llvm.amdgcn.buffer.load/store.format
ClosedPublic

Authored by nhaehnle on Mar 16 2016, 1:04 PM.

Details

Summary

We cannot easily deduce that an offset is in a SGPRs, but the Mesa frontend
cannot easily make use of an explicit soffset parameter. Furthermore, it is
likely that in the future, LLVM will be in a better position that the frontend
to choose an SGPR offset if possible.

Since there aren't any frontend uses of these intrinsics in upstream
repositories yet, I would like to take this opportunity to change the
intrinsic signatures to a single offset parameter, which is then selected
to immediate offsets or voffsets using a ComplexPattern.

Diff Detail

Event Timeline

nhaehnle updated this revision to Diff 50847.Mar 16 2016, 1:04 PM
nhaehnle retitled this revision from to AMDGPU: use ComplexPattern for offsets in llvm.amdgcn.buffer.load/store.format.
nhaehnle updated this object.
nhaehnle added a subscriber: llvm-commits.
lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1128–1134

If an offset is too big for the ImmOffset field we should split the offset and store the part in ImmOffset and part in soffset.

If the part in soffset is less than 64, then you save a move instruction since you can encode inline immediates up to 64 in the soffset field, and for cases when the part in soffset is greater than 64 up to 2^16, you can use an s_movk instruction to copy the immediate into register, saving 4 bytes of code size.

nhaehnle updated this revision to Diff 50941.Mar 17 2016, 8:50 AM

Use isSchedulingBoundary instead of implicit-use of EXEC, which gets rid of
the target-independent modifications.

This is indeed more conservative, as you can tell from the change in
si-scheduler.ll: previously, the later scheduling passes managed to move the
initial s_wqm_b64 after the s_load_dwordx4 & x8, i.e. we lose slightly in
latency hiding.

The impact should be small, and it does make sense to land a more conservative
and robust patch initially.

nhaehnle updated this revision to Diff 50942.Mar 17 2016, 9:15 AM

Refine the constant handling for some boundary cases. Note that the S_MOV_B32
automatically becomes an s_movk_i32 when possible - I've also added a test
case for that.

lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp
1136–1142

When the immediate is greater than 4095 + 64, if you put the overflow in ImmOffset, then when you have sequential loads they can all read the non-overflow part from the same SGPR, which would save instructions.

nhaehnle updated this revision to Diff 50981.Mar 17 2016, 3:26 PM

I like that idea. Here's an updated patch, including a test to verify that
the soffset constant really is re-used.

tstellarAMD accepted this revision.Mar 18 2016, 8:28 AM
tstellarAMD edited edge metadata.

LGTM.

This revision is now accepted and ready to land.Mar 18 2016, 8:28 AM
This revision was automatically updated to reflect the committed changes.