This is an archive of the discontinued LLVM Phabricator instance.

Add tests for atomic bittest with register/memory operands
AbandonedPublic

Authored by goldstein.w.n on Dec 23 2022, 5:00 PM.

Details

Summary

Transform AtomicRMW logic operations to BT{R|C|S} if only changing/testing a single bit.

This is essentially expanding on the optimizations added on: D120199
but applies the optimization to cases where the bit being changed /
tested is not am IMM but is a provable power of 2.

The only case currently added for cases like:
atomic_fetch_xor(p, 1 << c, ATOMIC_RELAXED) & (1 << c)

Which instead of using a cmpxchg loop can be done with btcl; setcc; shl.

There are still a variety of missed cases that could/should be
addressed in the future. This commit documents many of those
cases with Todos.

  1. This is a combination of 2 commits.

Diff Detail

Event Timeline

goldstein.w.n created this revision.Dec 23 2022, 5:00 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 23 2022, 5:00 PM
goldstein.w.n requested review of this revision.Dec 23 2022, 5:00 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 23 2022, 5:00 PM

The test is so large! Are we over testing for this feature?
Beside, split the test to another patch and only lieve diff here.

llvm/test/CodeGen/X86/atomic-rm-bit-test.ll
2–6

Does having bmi/bmi2 affect the selection to bts?

13

Use nounwind to suppress the cfi* directives.

30

Why still generating cmpxchg?

The test is so large! Are we over testing for this feature?

Up to you but they are all distinct cases.

It tests:
[uint16_t, uint32_t, uint64_t] X [Constant, non-Constant] X [Return Value, Return SetZ(Value), Return SetNZ(Value), BrZ(Value), BrNZ(Value)] X [4-7 distinct cases changing how the powers of 2 are generated]

So it turns into ALOT of tests. If you want to drop some I would guess SetZ/SetNZ and BrZ/BrNZ are the
most redundant. Also the Constant tests are a bit unrelated although there are gaps in the current
impl that I hope to address in future patches so was trying to make this an all inclusive test
that could be incrementally filled in.

I think all but the shl1_neq_* tests should be able to lower to bt{r|c|s} and all the
other cmpxchg cases are because of various issues with the impl.

Beside, split the test to another patch and only lieve diff here.

Its in two commits. You prefer an entirely seperate patch?

The test is so large! Are we over testing for this feature?
Beside, split the test to another patch and only lieve diff here.

llvm/test/CodeGen/X86/atomic-rm-bit-test.ll
2–6

It may affect the _*val tests which return the original AtomicRMW value as is. I.e return __atomic_fetch_xor(p, 1 << c, __ATOMIC_RELAXED) & (1 << c) b.c that gets a shlx vs shl in codegen. Can drop if you think its unnecessary.

13

Will do for V2 (going to wait for reply on some other comments first as it seems the tests might go to another PR / some of them may be dropped).

30

So there are a lot of missing cases in AtomicRMW ->
bt{c|r|s}. (The code has comments for many of the cases).

The IR for this is Trunc(Shl(1, shift_cnt)). This is the natural
codegen for C-code doing 1 << c b.c its UB for c >= 16 but at the
point where we are checking if we can use atomic bittest I wasn't sure
we could actually be sure that it couldn't be an intentional
side-affect for Trunc(Shl(1, shift_cnt)) to be zero for shift_cnt>= 16 (if say this IR was from some language where shifting by >=
type width was defined as returning zero).

This can be changed fairly easily by modify FindSingleBitChange if
the reasoning above is incorrect. Although maybe in another patch?
There are alot of missing cases for all types (alot get messed up by
InstrCombinePass).

goldstein.w.n marked an inline comment as not done.Dec 23 2022, 11:34 PM
goldstein.w.n added inline comments.
llvm/test/CodeGen/X86/atomic-rm-bit-test.ll
30

So there are a lot of missing cases in AtomicRMW ->
bt{c|r|s}. (The code has comments for many of the cases).

The IR for this is Trunc(Shl(1, shift_cnt)). This is the natural
codegen for C-code doing 1 << c b.c its UB for c >= 16 but at the
point where we are checking if we can use atomic bittest I wasn't sure
we could actually be sure that it couldn't be an intentional
side-affect for Trunc(Shl(1, shift_cnt)) to be zero for shift_cnt>= 16 (if say this IR was from some language where shifting by >=
type width was defined as returning zero).

This can be changed fairly easily by modify FindSingleBitChange if
the reasoning above is incorrect. Although maybe in another patch?
There are alot of missing cases for all types (alot get messed up by
InstrCombinePass).

30

Sorry realized I accidentally submitted this incomplete.

I'm planning some future patches to handle more of the many missing
bt{c|r|s} cases. Handling this cmpxchg case (assuming it can be
done correctly and the Trunc isn't a dealbreaker) will involved
adding more cases to FindSingleBitChange and
emitBitTestAtomicRMWIntrinsic. My thinking submitting this patch
with out the baseline 1 << (c & SHIFT_MASK) is to get some feedback
on the framework before increasing the complexity with a bunch of
additional cases.

goldstein.w.n marked an inline comment as not done.Jan 2 2023, 1:22 AM

ping.

Note, this review is essentially dead and has been superseded by:
https://reviews.llvm.org/D140938
and
https://reviews.llvm.org/D140939

goldstein.w.n abandoned this revision.Jan 12 2023, 11:35 AM