This is an archive of the discontinued LLVM Phabricator instance.

[LoongArch] Add codegen support for cmpxchg on LA64
ClosedPublic

Authored by gonglingqin on Oct 14 2022, 12:53 AM.

Diff Detail

Event Timeline

gonglingqin created this revision.Oct 14 2022, 12:53 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 14 2022, 12:53 AM
gonglingqin requested review of this revision.Oct 14 2022, 12:53 AM
hev added a subscriber: hev.Oct 14 2022, 1:16 AM
hev added inline comments.
llvm/test/CodeGen/LoongArch/ir-instruction/atomic-cmpxchg.ll
21

Not needed.

ll: full-membar + load-exclusive

26–28

I think we should reduce the number of instructions between ll and sc to make ll/sc complete as fast as possible.

For reference: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/arch/loongarch/include/asm/cmpxchg.h?h=next-20221014#n114

xry111 added inline comments.Oct 14 2022, 2:00 AM
llvm/test/CodeGen/LoongArch/ir-instruction/atomic-cmpxchg.ll
21

Jiaxun told me (via linux-mips):

I had checked with Loongson guys and they confirmed that the workaround still needs to be applied to latest 3A4000
processors, including 3A4000 for MIPS and 3A5000 for LoongArch. Though, the reason behind the workaround varies with the evaluation
of their uArch, for GS464V based core, barrier is required as the uArch design allows regular load to be reordered after an atomic linked
load, and that would break assumption of compiler atomic constraints.

In GCC we use dbar 0x700, so in the future HW engineers can fix this issue and make dbar 0x700 no-op.

xry111 added inline comments.Oct 14 2022, 2:08 AM
llvm/test/CodeGen/LoongArch/ir-instruction/atomic-cmpxchg.ll
21

Ouch, I meant the dbar 1792 instruction at LBB0_3. Yes, this one can be removed for 3A5000.

But what should we do if the LLDBAR bit is 0 in CPUCFG?

hev added inline comments.Oct 14 2022, 2:34 AM
llvm/test/CodeGen/LoongArch/ir-instruction/atomic-cmpxchg.ll
21

Nice question.

In what cases do we need a memory barrier before an atomic op? An atomic op with store-release semantics? Why not make sc a membar + store-conditional?

If the memory barrier semantics of ll are still membar + load-exclusive in the future, and the atomic op has load-acquire semantics, I think we can't make dbar 0x700 a no-op.

gonglingqin added inline comments.Oct 18 2022, 5:01 AM
llvm/test/CodeGen/LoongArch/ir-instruction/atomic-cmpxchg.ll
21

@hev, @xry111, thank you for your input; I will remove this dbar. In addition, following @hev's suggestion, a dbar is added before sc to ensure correctness when the LLDBAR bit is 0.

26–28

Thanks, I will modify it.

Address @hev and @xry111's comments.

SixWeining accepted this revision.Oct 23 2022, 7:20 PM

LGTM. This may be suboptimal for covering all current and future LoongArch models, but let's move on to finish the basic functionality and continue to polish the current implementation.

Let's wait a couple of days to see if others have more inputs.

This revision is now accepted and ready to land.Oct 23 2022, 7:20 PM
xen0n accepted this revision.Oct 26 2022, 3:12 AM

In general this looks good. As we're still somewhat experimental, it's probably better to land this so we get wider test coverage; e.g. this is currently blocking the bootstrap of LLVM/Clang.

xen0n added a comment.Oct 26 2022, 3:26 AM

Wait. I didn't see any processing wrt the return value of cmpxchg. Would you mind adding some test cases where the %res is not optimized out, so we can get to verify the return value behavior is in line with expectations?

Thanks for your advice, I will add test cases.

Add test cases.

xen0n added inline comments.Oct 26 2022, 11:51 PM
llvm/test/CodeGen/LoongArch/ir-instruction/atomic-cmpxchg.ll
202

Okay so my suspicion is confirmed, and the implementation is wrong... the [[ https://llvm.org/docs/LangRef.html#cmpxchg-instruction | result of a strong cmpxchg ]] should be 1 if $a3 == $a1, and 0 otherwise. It shouldn't be the original value.

Please check if I'm misunderstanding anything...

gonglingqin added inline comments.Oct 26 2022, 11:59 PM
llvm/test/CodeGen/LoongArch/ir-instruction/atomic-cmpxchg.ll
202

The original value at the location is returned, together with a flag indicating success (true) or failure (false).

The return value of cmpxchg is thus the pair { original value, i1 }; what this test case checks is the original-value component:

%res = extractvalue { i32, i1 } %tmp, 0
ret i32 %res
xen0n added inline comments.Oct 27 2022, 12:40 AM
llvm/test/CodeGen/LoongArch/ir-instruction/atomic-cmpxchg.ll
202

Ah I see. It's compound. Then how about some more tests checking the i1 part too?

gonglingqin added inline comments.Oct 27 2022, 1:20 AM
llvm/test/CodeGen/LoongArch/ir-instruction/atomic-cmpxchg.ll
202

Sure, I'll add test cases.

Address @xen0n's comments.

xen0n added inline comments.Oct 27 2022, 3:59 AM
llvm/test/CodeGen/LoongArch/ir-instruction/atomic-cmpxchg.ll
327

This seems wrong: the following xor insn, responsible for the "same value -> 0" part, operates on the full GRLen, but here a1 (i.e. %cmp) gets zero-extended into a0, while a3 (i.e. the value from memory) is loaded with ll.w, which sign-extends. I expect this will fail for cases where cmp == *ptr && cmp < 0.

Either this should be a sign-extension, or the comparison shouldn't be an xor.

gonglingqin added inline comments.Oct 27 2022, 4:56 AM
llvm/test/CodeGen/LoongArch/ir-instruction/atomic-cmpxchg.ll
327

Thanks for the fact-checking! I will fix it.

Address @xen0n's comments.

xen0n accepted this revision.Oct 27 2022, 5:06 AM

Thanks!

This revision was landed with ongoing or failed builds.Oct 27 2022, 6:22 AM
This revision was automatically updated to reflect the committed changes.