Page MenuHomePhabricator

[AMDGPU] Shrink scalar AND, OR, XOR instructions
AcceptedPublic

Authored by grahamsellers on Thu, Dec 6, 11:36 AM.

Details

Reviewers
arsenm
nhaehnle
Summary

This change attempts to shrink scalar AND, OR and XOR instructions which take an immediate that isn't inlineable.

It performs:
AND s0, s0, ~(1 << n) -> BITSET0 s0, n
OR s0, s0, (1 << n) -> BITSET1 s0, n
AND s0, s1, x -> ANDN2 s0, s1, ~x
OR s0, s1, x -> ORN2 s0, s1, ~x
XOR s0, s1, x -> XNOR s0, s1, ~x

In particular, this catches setting and clearing the sign bit for fabs (and x, 0x7ffffffff -> bitset0 x, 31 and or x, 0x80000000 -> bitset1 x, 31).

Diff Detail

Event Timeline

grahamsellers created this revision.Thu, Dec 6, 11:36 AM
arsenm added inline comments.Thu, Dec 6, 11:49 AM
lib/Target/AMDGPU/SIShrinkInstructions.cpp
515

This loop is starting to get big, can you move this to a function?

529–532

Is there any real reason you need to handle this? Constants are canonicalized to the RHS (we just undo this for VALU instructions because that's the only way to shrink)

arsenm added inline comments.Thu, Dec 6, 12:18 PM
test/CodeGen/AMDGPU/andorbitset.ll
2–7

The check labels are confusing. You should stop using FUNC here at all, and just use GCN. These are also both SI run lines, so I don't see why that's separate

16

Why only SI?

grahamsellers marked an inline comment as done.

Addressing review comments.

grahamsellers marked 4 inline comments as done.Thu, Dec 6, 1:22 PM
grahamsellers added inline comments.
test/CodeGen/AMDGPU/andorbitset.ll
16

I'm not sure what's going on but if you supply only -march=amdgcn and no -mcpu, then it appears the register allocator doesn't end up allocating the source and destination of the s_or_b32 instruction to the same register even though we use setRegAllocationHint. That part of the code was modeled on a similar chunk for S_MULK_I32 earlier in the function. Interestingly, the s_mulk.ll test doesn't have a RUN line which doesn't specify the CPU. I'll do the same.

grahamsellers marked an inline comment as done.Thu, Dec 6, 1:22 PM
arsenm added inline comments.Thu, Dec 6, 6:47 PM
test/CodeGen/AMDGPU/andorbitset.ll
16

You should generally just pick an explicit target. The default cpu is something approximating Tahiti but isn't quite the same

arsenm accepted this revision.Thu, Dec 6, 6:54 PM

LGTM

This revision is now accepted and ready to land.Thu, Dec 6, 6:54 PM