This is an archive of the discontinued LLVM Phabricator instance.

[BPF] support atomic instructions
ClosedPublic

Authored by yonghong-song on Jan 3 2020, 2:32 PM.

Details

Summary

Implement fetch_<op>/fetch_and_<op>/exchange/compare-and-exchange
instructions for BPF. Specifically, the following gcc intrinsics
are implemented:

__sync_fetch_and_add (32, 64)
__sync_fetch_and_sub (32, 64)
__sync_fetch_and_and (32, 64)
__sync_fetch_and_or  (32, 64) 
__sync_fetch_and_xor (32, 64)
__sync_lock_test_and_set (32, 64)
__sync_val_compare_and_swap (32, 64)

For __sync_fetch_and_sub, internally it is implemented as a negation
followed by __sync_fetch_and_add.
For __sync_lock_test_and_set, despite its name, it actually does an
atomic exchange and returns the old content.

https://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Atomic-Builtins.html
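
As a minimal C sketch of the two equivalences above (the helper functions here are illustrative, not part of the patch):

C
/* __sync_fetch_and_sub(p, v) behaves like a negation plus fetch-and-add */
long fetch_sub_as_neg_add(long *p, long v)
{
        return __sync_fetch_and_add(p, -v);
}

/* despite its name, __sync_lock_test_and_set writes v into *p and
 * returns the previous contents, i.e. an atomic exchange */
long exchange(long *p, long v)
{
        return __sync_lock_test_and_set(p, v);
}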

For intrinsics like __sync_{add,sub}_and_fetch and
__sync_bool_compare_and_swap, the compiler is able to generate
code using __sync_fetch_and_{add,sub} and __sync_val_compare_and_swap.
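
As a rough illustration (hypothetical wrapper functions, not part of the patch), those intrinsics reduce to the implemented ones like this:

C
/* __sync_add_and_fetch returns the new value: old value plus addend */
long add_and_fetch(long *p, long v)
{
        return __sync_fetch_and_add(p, v) + v;
}

/* __sync_bool_compare_and_swap succeeds iff the observed value equals
 * the expected old value */
int bool_compare_and_swap(long *p, long expected, long desired)
{
        return __sync_val_compare_and_swap(p, expected, desired) == expected;
}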

Similar to xadd, atomic xand, xor and xxor (atomic_<op>)
instructions are added for atomic operations which do not
have return values. LLVM will check the return value of
__sync_fetch_and_{add,and,or,xor}: if the return value is used,
atomic_fetch_<op> instructions will be selected; otherwise,
atomic_<op> instructions will be used.
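
For example (a hedged sketch; the function names are illustrative), the following two functions would select different instructions under this scheme:

C
/* return value used: selects atomic_fetch_or */
long or_fetch_used(long *p, long v)
{
        return __sync_fetch_and_or(p, v);
}

/* return value discarded: selects the non-fetching atomic or */
void or_fetch_unused(long *p, long v)
{
        (void)__sync_fetch_and_or(p, v);
}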

All new instructions only support 64-bit, and 32-bit with alu32 mode.
The old xadd instruction still supports 32-bit without alu32 mode.

For encoding, please take a look at test atomics_2.ll.

Diff Detail

Event Timeline

yonghong-song created this revision.Jan 3 2020, 2:32 PM
Herald added a project: Restricted Project. Jan 3 2020, 2:32 PM
yonghong-song retitled this revision from [WIP][BPF] support compare-and-exchange instruction to [WIP][BPF] support exchange/compare-and-exchange instruction.
yonghong-song edited the summary of this revision. (Show Details)

implement exchange instruction

yonghong-song edited the summary of this revision. (Show Details)

use the same XADD opcode for new instructions.

yonghong-song edited the summary of this revision. (Show Details)Oct 28 2020, 11:52 PM
yonghong-song edited the summary of this revision. (Show Details)

changed encoding based on internal discussion on encoding
add a test

Herald added a project: Restricted Project. Nov 2 2020, 11:47 PM
Herald added a subscriber: cfe-commits.
jackmanb added inline comments.
llvm/lib/Target/BPF/BPFInstrFormats.td
98

Per Alexei's email comments let's call this BPF_FETCH?

jackmanb added inline comments.Nov 3 2020, 4:11 AM
llvm/lib/Target/BPF/BPFInstrInfo.td
889

I don't know if we want to define these for 16- and 8-bit operands - XADD isn't defined for those.

yonghong-song added inline comments.Nov 3 2020, 7:16 AM
llvm/lib/Target/BPF/BPFInstrFormats.td
98

I can do this. Note that naming in llvm may not be exactly the same as in the kernel, but I agree that we should keep it close.

llvm/lib/Target/BPF/BPFInstrInfo.td
889

One of our original use cases is to do atomic operations on kernel data structures. For example, if the kernel may do xchg on a char or short, it would be good for bpf to support it as well, e.g., for bpf-cc (bpf congestion control), where both bpf and the kernel may do atomic operations on the same data structure.

I may add byte/half support to the others (fetch_and_add, fetch_and_sub) as well if we reach consensus here.

ast added inline comments.Nov 3 2020, 9:39 AM
llvm/lib/Target/BPF/BPFInstrInfo.td
839

I think -mcpu=v4 should include alu32.
Otherwise the test matrix will keep increasing; it's already time consuming to test v1, v2, v3.
If v4 could mean both with and without alu32, that would stay that way for a long time, and any further extensions would double the test matrix again.

954

let Defs = [R0], Uses = [R0]
and BPFISelLowering would need to do getCopyToReg+cmpxchg+getCopyFromReg similar to X86ISelLowering ?

yonghong-song retitled this revision from [WIP][BPF] support exchange/compare-and-exchange instruction to [BPF] support atomic instructions.
yonghong-song edited the summary of this revision. (Show Details)
  • remove RFC tag.
  • addressed comments for encoding, etc.
yonghong-song edited the summary of this revision. (Show Details)Nov 3 2020, 10:39 AM
yonghong-song marked an inline comment as done.Nov 3 2020, 10:45 AM
yonghong-song added inline comments.
llvm/lib/Target/BPF/BPFInstrInfo.td
954

Yes. Just using Uses = [R0] or Uses = [W0] and changing the patterns makes it work.

ast added inline comments.Nov 3 2020, 1:56 PM
llvm/test/CodeGen/BPF/atomics_2.ll
125

Looks like it's generating correct code without special handling in ISelLowering. Nice. I wonder why x86 had to use it. Because of eflags, I guess?

yonghong-song added inline comments.Nov 3 2020, 2:37 PM
llvm/test/CodeGen/BPF/atomics_2.ll
125

Which specific x86 code are you referring to?

Sorry I was disrupted and not able to work on this last week! I've just got started trying to integrate this with my kernel patches.

llvm/lib/Target/BPF/BPFInstrInfo.td
790

FYI - I just spotted some stray \t in here (is it helpful to point this out? If not let me know, I will ignore in future)

808

Sorry I'm a beginner with the LLVM code, could you explain what val does? I didn't notice this when I looked through here before.

To try and get a clue I tried just removing this line and then compiling the following code:

C
// SPDX-License-Identifier: GPL-2.0
#include <stdatomic.h>

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

__u64 test_data_64 = 0;
__u64 test1_result = 0;

SEC("fentry/bpf_fentry_test1")
int BPF_PROG(test1, int a)
{
        /* atomic_fetch_add(&test_data_64, 1); */
        test1_result = __sync_fetch_and_add(&test_data_64, 1);
        return 0;
}

And I was able to load and run the program, with the kernel on my WIP branch: https://github.com/bjackman/linux-bpf/tree/wips/bpf-atomics-v0

The result looks like this:

shell
$ llvm-objdump -d atomics_test.o

atomics_test.o: file format elf64-bpf


Disassembly of section fentry/bpf_fentry_test1:

0000000000000000 <test1>:
       0:       b7 01 00 00 01 00 00 00 r1 = 1
       1:       18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r2 = 0 ll
       3:       db 12 00 00 01 00 00 00 r1 = atomic_fetch_add((u64 *)(r2 + 0), PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.      Program arguments: llvm-objdump -d atomics_test.o 
Segmentation fault

Aside from the fact that llvm-objdump crashed, the encoding db 12 00 00 01 00 00 00 seems correct to me. If I add the let Inst{11-8} = val back in I get db 12 00 00 01 01 00 00 which I don't understand.

jackmanb added inline comments.Nov 16 2020, 9:38 AM
llvm/lib/Target/BPF/BPFInstrInfo.td
808

Ah wait, I guess this is adding a 3rd operand register? In my example it's unclear because it is R1 which is also the dst operand. I was envisaging we just have semantics like src = atomic_fetch_add(dst+off, src) but you are instead proposing dst = atomic_fetch_add(dst+off, val)?

jackmanb added inline comments.Nov 16 2020, 9:39 AM
llvm/lib/Target/BPF/BPFInstrInfo.td
808

Sorry I mean dst = atomic_fetch_add(src+off, val)

yonghong-song added inline comments.Nov 16 2020, 9:49 AM
llvm/lib/Target/BPF/BPFInstrInfo.td
790

\t is not allowed. I will run clang-format next time to catch such violations. Thanks for letting me know.

808

Sorry I'm a beginner with the LLVM code, could you explain what val does? I didn't notice this when I looked through here before.

For the below statement:

test1_result = __sync_fetch_and_add(&test_data_64, 1);

'val' represents the register which holds the value '1'.

Bits 4-7 are also used in the compare-and-exchange insn, since you need a memory location plus in-register old/new values.

To try and get a clue I tried just removing this line and then compiling the following code: [the same test program and llvm-objdump crash output as quoted in the previous comment]

The crash may come from the 'val' field not being encoded properly. I will double check.

Aside from the fact that llvm-objdump crashed, the encoding db 12 00 00 01 00 00 00 seems correct to me. If I add the let Inst{11-8} = val back in I get db 12 00 00 01 01 00 00 which I don't understand.

In this particular case, yes, as the final asm code looks like

r1 = atomic_fetch_add((u64 *)(r2 + 0), r1)

where the value "r1" and the result "r1" share the same register. A little bit of compiler work is needed to enforce that the "val" register and the result register are always the same.

My current design allows:

r3 = atomic_fetch_add((u64 *)(r2 + 0), r1)

and I think this is more flexible as people may later on still use "r1". But let me know whether you prefer

r1 = atomic_fetch_add((u64 *)(r2 + 0), r1)

always.

jackmanb added inline comments.Nov 17 2020, 3:23 AM
llvm/lib/Target/BPF/BPFInstrInfo.td
808

Got it - this design will introduce some impedance mismatch with x86, since XADD has only two operands (r3 = atomic_fetch_add((u64 *)(r2 + 0), r1) cannot be implemented in a single instruction). x86 also hard-codes RAX as the third operand for [cmp]xchg, so my current kernel implementation hard-codes R0 - that's also what Alexei supported in the separate email thread.

Will you be available to discuss this in the BPF office hours this week? There are a few possible ways forward, we just need to decide which tradeoff is best.

ast added inline comments.Nov 17 2020, 9:27 AM
llvm/lib/Target/BPF/BPFInstrInfo.td
808

+1 to Brendan's point.

(set GPR:$dst, (OpNode ADDRri:$addr, GPR:$val))
is a 3-op instruction. All bpf insns are 2-op. I think it's incorrect to make such a departure from 2-op for this set of insns, because it moves them further away from x64.
The $val reg and $dst reg need to be the same to match the 'lock xadd' semantics of the x64 insn.
It's not only about reducing the complexity of the JIT, but about making the compiler perform register allocation with the actual HW in mind. Otherwise codegen with an extra reg will assume that $dst can be different, but in reality it won't be, and the JIT would have to emit extra insns to avoid corrupting $dst. So if there is no constraint that $dst == $val, then regalloc will work against the JIT and against the HW, which will lead to redundant code after JIT in almost all cases.

As far as fetch_and_sub() goes, I think it's ok to keep it as a separate BPF insn even though there is no x64 xsub insn.

long foo(long *a, long b) {
    return __sync_fetch_and_sub(a, b);
}
compiles into

mov     rax, rsi
neg     rax
lock            xadd    qword ptr [rdi], rax
ret

The BPF JIT will emit two x64 insns for a single BPF atomic_fetch_sub insn.

The BPF atomic_fetch_add insn should map to exactly one x64 'lock xadd' insn in all conditions.

use the same register for dst and val for fetch-add, fetch-sub and xchg instructions.
For fetch-sub, if it is later deemed preferable to use different registers for dst/val, I can make the change then.

ast added a comment.Nov 17 2020, 8:34 PM

Looks good. Before landing we need to agree on the full set of instructions that -mcpu=v4 will support.
atomic_fetch_or|xor|and are probably needed as instructions. The kernel JIT will generate x86 cmpxchg for them,
because if llvm generated a bpf cmpxchg insn itself, we'd need to teach the verifier to recognize infinite loops.
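
For reference, this is the kind of retry loop a JIT could emit for atomic_fetch_or, sketched in C (only an illustration, not the actual kernel JIT code):

C
/* emulate fetch-or with a compare-and-exchange retry loop; if llvm emitted
 * this loop as BPF insns, the verifier would have to prove it terminates */
static unsigned long fetch_or_via_cmpxchg(unsigned long *p, unsigned long v)
{
        unsigned long old;

        do {
                old = *(volatile unsigned long *)p;
        } while (__sync_val_compare_and_swap(p, old, old | v) != old);

        return old;
}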

Makes sense. I will add and/or/xor support in the next revision. I will also think about how to support a hardware-level barrier. I totally agree that we should have adequate atomic op support in cpu=v4.

jackmanb added inline comments.Nov 18 2020, 6:44 AM
llvm/lib/Target/BPF/BPFInstrInfo.td
823

There is another mismatch between what I implemented in the kernel and what you've implemented here, which is that I used dst as the pointer base and src as the writeback register but you did the opposite.

IIUC what we have here at the moment is dst = sync_fetch_and_add(*(src + off), dst);. The advantage is that dst makes more intuitive sense as a writeback register.

If we instead had src = sync_fetch_and_add(*(dst + off), src) we have the unintuitive writeback to src, but now a) the 2nd argument (the addend) makes more intuitive sense, and b) dst being the pointer base is consistent with the rest of BPF_STX.

907

If we go down the route of matching operands with x86 as you have done for XFALU64 and XCHG, I think we should also do it for cmpxchg.

IIUC this is dst = atomic_cmpxchg(*(src + off), r0, new);

But to do it in a single x86 instruction we need to have only 2 operands + the hard-coded r0. r0 = atomic_cmpxchg(*(dst + off), r0, src); would seem most natural to me.

yonghong-song added inline comments.Nov 18 2020, 8:01 AM
llvm/lib/Target/BPF/BPFInstrInfo.td
823

Could you double check? My encoding is actually

dst = sync_fetch_and_add(*(src + off), dst);

The encoding itself is very similar to xadd/stx.

907

We can do this:

r0 = atomic_cmpxchg(*(dst + off), r0, src);
yonghong-song edited the summary of this revision. (Show Details)

add fetch_and_{and,or,xor} support
force the cmpxchg insn to return its result in r0, i.e., r0 = cmpxchg(addr, r0, new)

yonghong-song edited the summary of this revision. (Show Details)

add support for a barrier function. The corresponding C code is __sync_synchronize().
If we want variants like read/write/rw barriers, we may need to define bpf-specific barrier functions, since
__sync_synchronize() does not allow taking additional parameters. I tentatively use one particular encoding, but we can certainly change that later.
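
For example (a minimal sketch; the function is hypothetical), __sync_synchronize() can only express a full barrier:

C
void publish(long *data, long *flag)
{
        *data = 42;
        __sync_synchronize();   /* full memory barrier, takes no arguments */
        *flag = 1;
}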

ast added inline comments.Nov 18 2020, 2:54 PM
llvm/lib/Target/BPF/BPFInstrInfo.td
907

I'm confused. Isn't that already what this is?
r0 = atomic_cmpxchg(*(dst + off), r0, src);

In "class CMPXCHG":
let Inst{51-48} = addr{19-16};  // this is 'dst_reg' in bpf_insn
let Inst{55-52} = new;          // this is 'src_reg' in bpf_insn

Same as the old xadd and new fetch_add insns.
What am I missing?

yonghong-song added inline comments.Nov 18 2020, 7:55 PM
llvm/lib/Target/BPF/BPFInstrInfo.td
907

I just fixed it to return R0/W0 in the early afternoon. So you looked at old comments with new code, and that is why there is confusion :-)

Can we please keep barriers out of scope? I think there's a lot of design to be done there and I'd rather just get the core atomics working first.

BTW I got

[ 31%] Building LanaiGenDAGISel.inc...                                                                                                       
Included from /usr/local/google/home/jackmanb/src/llvm-project/llvm/lib/Target/BPF/BPF.td:13:                                                
/usr/local/google/home/jackmanb/src/llvm-project/llvm/lib/Target/BPF/BPFInstrInfo.td:677:1: error: Pattern doesn't match mayStore = 1        
def : Pat<(atomic_fence (timm), (timm)), (SYNC)>;                                                                                            
^                                                                                                                                            
error: pattern conflicts

But removing line 677 fixes it.

Thanks. Somehow my build is successful. I have an idea of what the possible issue is, but I wonder why I did not hit it.

Can we please keep barriers out of scope? I think there's a lot of design to be done there and I'd rather just get the core atomics working first.

The reason I put it here is to make *cpu=v4* roughly complete, because if we add barriers later we may need to add "cpu=v5", and I want to avoid that. But I agree this is a little bit overwhelming for you; let us discuss in the bpf office hours how to proceed.

For the time being, you can alternatively remove the following code
+let hasSideEffects = 1, mayLoad = 1, mayStore = 1 in {
+ def SYNC : TYPE_LD_ST<BPF_ATOMIC.Value, BPF_DW.Value,
+ (outs), (ins), "sync", []> {
+ let Inst{7-4} = BPF_BARRIER.Value;
+ let BPFClass = BPF_STX;
+ }
+}
+
+def : Pat<(atomic_fence (timm), (timm)), (SYNC)>;
to work around the compilation issue.

yonghong-song edited the summary of this revision. (Show Details)
  • remove all char/short atomic operations
  • remove barrier instruction
ast added inline comments.Nov 19 2020, 11:25 AM
llvm/test/CodeGen/BPF/atomics_2.ll
15

test_and_set is not the same as xchg.
xchg doesn't do comparison.

yonghong-song added inline comments.Nov 19 2020, 11:36 AM
llvm/test/CodeGen/BPF/atomics_2.ll
15

I am looking here:

https://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Atomic-Builtins.html

which mentions:
type __sync_lock_test_and_set (type *ptr, type value, ...)

This builtin, as described by Intel, is not a traditional test-and-set operation, but rather an atomic exchange operation. It writes value into *ptr, and returns the previous contents of *ptr.

Many targets have only minimal support for such locks, and do not support a full exchange operation. In this case, a target may support reduced functionality here by which the only valid value to store is the immediate constant 1. The exact value actually stored in *ptr is implementation defined.

This builtin is not a full barrier, but rather an acquire barrier. This means that references after the builtin cannot move to (or be speculated to) before the builtin, but previous memory stores may not be globally visible yet, and previous memory loads may not yet be satisfied.

So it does not do compare.

Or alternatively, for the llvm atomic builtins,

https://llvm.org/docs/Atomics.html

We have:

iN __atomic_load_N(iN *ptr, iN val, int ordering)
void __atomic_store_N(iN *ptr, iN val, int ordering)
iN __atomic_exchange_N(iN *ptr, iN val, int ordering)
bool __atomic_compare_exchange_N(iN *ptr, iN *expected, iN desired, int success_order, int failure_order)

But as I mentioned in the bpf office hours meeting, an "ordering" argument is required and I do not know how to deal with it.

ast added inline comments.Nov 19 2020, 11:51 AM
llvm/test/CodeGen/BPF/atomics_2.ll
15

Thanks for the info. The gcc builtin has a misleading name.
Please mention this quirk in the test comments and in the commit log.

yonghong-song edited the summary of this revision. (Show Details)

add a proper comment in the test and clarify in the commit message that __sync_lock_test_and_set actually means an atomic exchange operation.

ast added a comment.Nov 19 2020, 9:19 PM

Looks like the test didn't change. Only commit log... that's fine. I think the diff is ready to land, but let's wait for the kernel patches to be ready as well.

In D72184#2407264, @ast wrote:

Looks like the test didn't change. Only commit log... that's fine. I think the diff is ready to land, but let's wait for the kernel patches to be ready as well.

I added some comments in the test... Yes, it makes sense to wait until we have at least an RFC for the kernel patches so we know the interface sort of works.

Okay, my fault. I must have uploaded an old version of the patch, so the comment is not in the test. I will upload the new version tonight.

add some comments in test w.r.t. __sync_lock_test_and_set()

jackmanb added a comment.EditedNov 23 2020, 7:21 AM

I thought a little more about something I was saying in the office hours.

I'm pretty sure GCC's __atomic_store(&x, &y, order) should fail to compile for anything other than order=__ATOMIC_RELAXED, since with the current kernel patchset we have BPF_SET (which is called BPF_XCHG in this LLVM patch) implemented with the kernel's atomic_store, which _doesn't imply a barrier_.

One alternative solution might be just to scrap BPF_SET without BPF_FETCH (BPF_SET | BPF_FETCH is implemented with atomic_xchg which _does_ imply a barrier IIUC). As we discussed in the office hours, BPF_CMPSET (called BPF_CMPXCHG in this LLVM patch) won't be supported without BPF_FETCH so there is sort of precedent for this.
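
A small sketch of that distinction at the source level (the function name is illustrative):

C
/* a store-only BPF_SET with no implied barrier could only back the relaxed
 * form; the seq_cst form needs an ordering guarantee on top of the store */
void store_orderings(unsigned long *p)
{
        __atomic_store_n(p, 1, __ATOMIC_RELAXED);
        __atomic_store_n(p, 2, __ATOMIC_SEQ_CST);
}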

BTW to investigate my previous comment I tried compiling this code:

// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

__u64 add64_value = 1;
__u64 add64_result;
__u32 add32_value = 1;
__u32 add32_result;
__u64 add_stack_value_copy;
__u64 add_stack_result;
SEC("fentry/bpf_fentry_test1")
int BPF_PROG(add, int a)
{
        __u64 add_stack_value = 1;

        add64_result = __sync_fetch_and_add(&add64_value, 2);
        add32_result = __sync_fetch_and_add(&add32_value, 2);
        add_stack_result = __sync_fetch_and_add(&add_stack_value, 2);
        add_stack_value_copy = add_stack_value;

        return 0;
}

__u64 cmpxchg64_value = 1;
__u64 cmpxchg64_result_fail;
__u64 cmpxchg64_result_succeed;
__u32 cmpxchg32_value = 1;
__u32 cmpxchg32_result_fail;
__u32 cmpxchg32_result_succeed;
SEC("fentry/bpf_fentry_test1")
int BPF_PROG(cmpxchg, int a)
{
        cmpxchg64_result_fail = __sync_val_compare_and_swap(
                &cmpxchg64_value, 0, 3);
        cmpxchg64_result_succeed = __sync_val_compare_and_swap(
                &cmpxchg64_value, 1, 2);

        cmpxchg32_result_fail = __sync_val_compare_and_swap(
                &cmpxchg32_value, 0, 3);
        cmpxchg32_result_succeed = __sync_val_compare_and_swap(
                &cmpxchg32_value, 1, 2);

        return 0;
}

__u64 xchg64_value = 1;
__u64 xchg64_result;
__u32 xchg32_value = 1;
__u32 xchg32_result;
SEC("fentry/bpf_fentry_test1")
int BPF_PROG(xchg, int a)
{
        __u64 val64 = 2;
        __u32 val32 = 2;

        __atomic_exchange(&xchg64_value, &val64, &xchg64_result, __ATOMIC_RELAXED);
        __atomic_exchange(&xchg32_value, &val32, &xchg32_result, __ATOMIC_RELAXED);
        __atomic_store(&xchg64_result, &val64, __ATOMIC_SEQ_CST);

        return 0;
}

(By modifying this code to add the __atomic_store and then running make -C ~/src/linux/linux/tools/testing/selftests/bpf -j test_progs from the base of the kernel tree)

And got a crash:

$ make -C ~/src/linux/linux/tools/testing/selftests/bpf -j test_progs  
make: Entering directory '/usr/local/google/home/jackmanb/src/linux/linux/tools/testing/selftests/bpf'
  CLNG-LLC [test_maps] atomics_test.o
LLVM ERROR: Cannot select: 0x5638e9444528: ch = AtomicStore<(store seq_cst 8 into @xchg64_result)> 0x5638e94445f8, 0x5638e9444660, Constant:i64<2>, progs/atomics_test.c:59:2 @[ progs/atomics_test.c:52:5 ]
  0x5638e9444660: i64 = BPFISD::Wrapper TargetGlobalAddress:i64<i64* @xchg64_result> 0, progs/atomics_test.c:57:2 @[ progs/atomics_test.c:52:5 ]
    0x5638e9444b40: i64 = TargetGlobalAddress<i64* @xchg64_result> 0, progs/atomics_test.c:57:2 @[ progs/atomics_test.c:52:5 ]
  0x5638e94628c0: i64 = Constant<2>
In function: xchg
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace.
Stack dump:
0.      Program arguments: llc -mattr=dwarfris -march=bpf -mcpu=v4 -mattr=+alu32 -filetype=obj -o /usr/local/google/home/jackmanb/src/linux/linux/tools/testing/selftests/bpf/atomics_test.o
1.      Running pass 'Function Pass Manager' on module '<stdin>'.
2.      Running pass 'BPF DAG->DAG Pattern Instruction Selection' on function '@xchg'
 #0 0x00005638e543ae1d llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/usr/local/bin/llc+0x26f4e1d)
 #1 0x00005638e5438c14 llvm::sys::RunSignalHandlers() (/usr/local/bin/llc+0x26f2c14)
 #2 0x00005638e5438d70 SignalHandler(int) (/usr/local/bin/llc+0x26f2d70)
 #3 0x00007ff91703b140 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x14140)
 #4 0x00007ff916b1edb1 raise ./signal/../sysdeps/unix/sysv/linux/raise.c:51:1
 #5 0x00007ff916b08537 abort ./stdlib/abort.c:81:7
 #6 0x00005638e53b393c llvm::report_fatal_error(llvm::Twine const&, bool) (/usr/local/bin/llc+0x266d93c)
 #7 0x00005638e53b3a6e (/usr/local/bin/llc+0x266da6e)
 #8 0x00005638e527bac0 llvm::SelectionDAGISel::CannotYetSelect(llvm::SDNode*) (/usr/local/bin/llc+0x2535ac0)
 #9 0x00005638e527c8b1 llvm::SelectionDAGISel::SelectCodeCommon(llvm::SDNode*, unsigned char const*, unsigned int) (/usr/local/bin/llc+0x25368b1)
#10 0x00005638e527a5d7 llvm::SelectionDAGISel::DoInstructionSelection() (/usr/local/bin/llc+0x25345d7)
#11 0x00005638e52833f5 llvm::SelectionDAGISel::CodeGenAndEmitDAG() (/usr/local/bin/llc+0x253d3f5)
#12 0x00005638e5287471 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) (/usr/local/bin/llc+0x2541471)
#13 0x00005638e5289dbc llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) (.part.0) (/usr/local/bin/llc+0x2543dbc)
#14 0x00005638e491a5d4 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) (/usr/local/bin/llc+0x1bd45d4)
#15 0x00005638e4d0a050 llvm::FPPassManager::runOnFunction(llvm::Function&) (/usr/local/bin/llc+0x1fc4050)
#16 0x00005638e4d0b5dc llvm::FPPassManager::runOnModule(llvm::Module&) (/usr/local/bin/llc+0x1fc55dc)
#17 0x00005638e4d09208 llvm::legacy::PassManagerImpl::run(llvm::Module&) (/usr/local/bin/llc+0x1fc3208)
#18 0x00005638e33ba5d7 main (/usr/local/bin/llc+0x6745d7)
#19 0x00007ff916b09cca __libc_start_main ./csu/../csu/libc-start.c:308:16
#20 0x00005638e344dd1a _start (/usr/local/bin/llc+0x707d1a)
make: *** [Makefile:396: /usr/local/google/home/jackmanb/src/linux/linux/tools/testing/selftests/bpf/atomics_test.o] Error 134
make: Leaving directory '/usr/local/google/home/jackmanb/src/linux/linux/tools/testing/selftests/bpf'

Yes, the above llvm crash is expected, since the bpf backend does not handle AtomicStore.

For kernel code, I can see:

kvm/x86.c:      vcpu->arch.nmi_pending += atomic_xchg(&vcpu->arch.nmi_queued, 0);

...

kvm/x86.c:              atomic_set(&kvm_guest_has_master_clock, 1);

So for atomic_set we do not return a value, right?

I did not see that the kernel has atomic_store; do you mean atomic_set?

Do you suggest we also implement atomic_set? There is no need for it on a 64-bit architecture like x64, right?

jackmanb added a comment.EditedNov 23 2020, 8:54 AM

I did not see kernel has atomic_store, do you mean atomic_set?

Sorry yep I meant atomic_set

Do you suggest we also implement atomic_set? There is no need for 64-bit architecture like x64, right?

Yeah, actually, now that I think about it, atomic_set is pretty pointless; I think it's only there in the kernel to support strong type-checking of atomic_t. It doesn't imply any barriers.

I think we can do without it; it makes more sense to implement __atomic_store alongside barrier instructions.

yonghong-song edited the summary of this revision. (Show Details)
  • remove atomic_fetch_sub, which can be implemented with neg + atomic_fetch_add
  • add support for xand, xor, xxor (xadd is already supported)
  • for any given __sync_fetch_and_{add, and, or, xor}, llvm will generate either atomic_fetch_<op> or x<op> depending on whether the return value is used or not.
ast added inline comments.Dec 1 2020, 8:54 PM
llvm/lib/Target/BPF/BPFMIChecking.cpp
199

With this logic in place, Andrii has a point: there is no need for a -mcpu=v4 flag.
The builtins will produce the new insns and it will be up to the kernel to accept them or not.

207

The default is error prone. Why not check for XFORD explicitly and default to fatal_error?

yonghong-song added inline comments.Dec 1 2020, 9:28 PM
llvm/lib/Target/BPF/BPFMIChecking.cpp
199

I will make the change (removing -mcpu=v4).

207

Yes, we can do this, although theoretically the default fatal_error should never happen.

yonghong-song added inline comments.Dec 1 2020, 9:39 PM
llvm/lib/Target/BPF/BPFMIChecking.cpp
199

There will be one user-level change:

  • If the user has "ret = __sync_fetch_and_add(p, r);", today's llvm will generate a fatal error saying the XADD is invalid since the return value is used.
  • With -mcpu=v4 removed, the compilation will succeed and the error will happen in the kernel, since atomic_fetch_add is not supported there.

I guess this is an okay change, right?

ast added inline comments.Dec 1 2020, 9:42 PM
llvm/lib/Target/BPF/BPFMIChecking.cpp
199

Right. With the new llvm, compilation will succeed, but the program will fail to load on an older kernel and will work as expected on a newer kernel. There is a change in where the error will be seen, but that's acceptable.

yonghong-song edited the summary of this revision. (Show Details)
  • remove -mcpu=v4.
  • for new instructions (except xadd), in 32-bit mode only alu32 mode is supported. I chose this since we have had -mcpu=v3 for a while, which has alu32 as the default, and we really want to promote alu32 mode. A new kernel that supports the atomic ops definitely supports alu32 well...

use llvm_unreachable() in the default case of the switch statement

ast accepted this revision.Dec 2 2020, 9:34 AM
This revision is now accepted and ready to land.Dec 2 2020, 9:34 AM

fix clang-format warnings.

This revision was landed with ongoing or failed builds.Dec 3 2020, 7:46 AM
This revision was automatically updated to reflect the committed changes.