This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] Shrink scalar AND, OR, XOR instructions
ClosedPublic

Authored by grahamsellers on Dec 6 2018, 11:36 AM.

Details

Reviewers
arsenm
nhaehnle
Summary

This change attempts to shrink scalar AND, OR and XOR instructions which take an immediate that isn't inlineable.

It performs:
AND s0, s0, ~(1 << n) -> BITSET0 s0, n
OR s0, s0, (1 << n) -> BITSET1 s0, n
AND s0, s1, x -> ANDN2 s0, s1, ~x
OR s0, s1, x -> ORN2 s0, s1, ~x
XOR s0, s1, x -> XNOR s0, s1, ~x

In particular, this catches setting and clearing the sign bit for fabs (and x, 0x7ffffffff -> bitset0 x, 31 and or x, 0x80000000 -> bitset1 x, 31).

Diff Detail

Event Timeline

grahamsellers created this revision.Dec 6 2018, 11:36 AM
arsenm added inline comments.Dec 6 2018, 11:49 AM
lib/Target/AMDGPU/SIShrinkInstructions.cpp
515

This loop is starting to get big, can you move this to a function?

529–532

Is there any real reason you need to handle this? Constants are canonicalized to the RHS (we just undo this for VALU instructions because that's the only way to shrink)

arsenm added inline comments.Dec 6 2018, 12:18 PM
test/CodeGen/AMDGPU/andorbitset.ll
2–7

The check labels are confusing. You should stop using FUNC here at all, and just use GCN. These are also both SI run lines, so I don't see why that's separate

16

Why only SI?

grahamsellers marked an inline comment as done.

Addressing review comments.

grahamsellers marked 4 inline comments as done.Dec 6 2018, 1:22 PM
grahamsellers added inline comments.
test/CodeGen/AMDGPU/andorbitset.ll
16

I'm not sure what's going on but if you supply only -march=amdgcn and no -mcpu, then it appears the register allocator doesn't end up allocating the source and destination of the s_or_b32 instruction to the same register even though we use setRegAllocationHint. That part of the code was modeled on a similar chunk for S_MULK_I32 earlier in the function. Interestingly, the s_mulk.ll test doesn't have a RUN line which doesn't specify the CPU. I'll do the same.

grahamsellers marked an inline comment as done.Dec 6 2018, 1:22 PM
arsenm added inline comments.Dec 6 2018, 6:47 PM
test/CodeGen/AMDGPU/andorbitset.ll
16

You should generally just pick an explicit target. The default cpu is something approximating Tahiti but isn't quite the same

arsenm accepted this revision.Dec 6 2018, 6:54 PM

LGTM

This revision is now accepted and ready to land.Dec 6 2018, 6:54 PM

Do you need someone to commit this?

foad added a subscriber: foad.Mar 31 2020, 1:45 AM
foad added inline comments.
lib/Target/AMDGPU/SIShrinkInstructions.cpp
215–218

Is it worth trying to do the converse too:
ANDN2 -> AND
ORN2 -> OR
XNOR -> XOR
Or would those cases never appear in the first place?

231–232

Return early if !SrcImm->isImm() || AMDGPU::isInlinableLiteral32(SrcImm->getImm(), ST.hasInv2PiInlineImm()).

261–262

What does this do? I can't see how SrcImm == Src0 would ever be true here.

arsenm closed this revision.Apr 7 2020, 10:21 AM

This looks like it was already committed way back in b297379ef07829ac7f06c0e2058a889366c46a82