This patch lifts the restriction on external functions introduced in
e497548. It seems that restriction was intended to produce better
compiler errors when -Wno-strict-prototypes was used, but it produces
false positives when memory intrinsics are lowered to libcalls.
Details
- Reviewers
ast yonghong-song eddyz87
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
llvm/lib/Target/BPF/BPFISelLowering.cpp | ||
---|---|---|
43–50 | If you do this, you also need to update the callers which are still doing work after calling fail, but fail now never returns. Also in LowerCall, the fail call in the ExternalSymbolSDNode case should be removed. That is a legitimate condition when compiling rust code where the builtin is provided by another crate so it shouldn't abort. |
llvm/lib/Target/BPF/BPFISelLowering.cpp | ||
---|---|---|
43–50 | Thank you - I've reworked this patch. |
Please note that a number of unit tests fails with this patch applied:
- LLVM :: CodeGen/BPF/many_args1.ll
- LLVM :: CodeGen/BPF/many_args2.ll
- LLVM :: CodeGen/BPF/struct_ret1.ll
- LLVM :: CodeGen/BPF/vararg1.ll
- LLVM :: CodeGen/BPF/warn-call.ll
These failures could be caught when ninja check-all target is executed.
I don't think that cleanup changes should be intermixed with functionality changes.
As the summary notes there three separate changes in this revision:
- code cleanup (which is, in fact, somewhat controversial):
- LLVM Coding Standards actually require brace-less if statements (link)
- I don't understand the crusade against assert and llvm_unreachable
- change of the behavior for ExternalSymbolSDNode: this was not a part of Diff 1, so I assume that it was not a part of the @tamird's initial intent. Is there a use case for this change? The libbpf commit 0a34245 deals only with external maps, not external functions.
- changes in LowerCallResult and LowerOperation regarding undef: I think both zeros and undef's are a bad choice here. If we can't compile it we should report error and abort compilation.
I think that all three should be tracked as separate revisions.
Thanks. Most of the failures were caused by slight changes to error strings.
I don't think that cleanup changes should be intermixed with functionality changes.
As the summary notes there three separate changes in this revision:
- code cleanup (which is, in fact, somewhat controversial):
- LLVM Coding Standards actually require brace-less if statements (link)
- I don't understand the crusade against assert and llvm_unreachable
Extracted these changes to https://reviews.llvm.org/D156136. Let's discuss there.
When compiling Rust code, we see this warning trigger on memset intrinsics, but this is not a problem because we provide an implementation of memset and the intrinsic is eventually lowered to a call to the function we provide. Is there a finer-grained change that would achieve our goals?
- changes in LowerCallResult and LowerOperation regarding undef: I think both zeros and undef's are a bad choice here. If we can't compile it we should report error and abort compilation.
I agree in principle, but as note in code comments, this produces worse diagnostics when multiple functions contain such errors. Consider also that it was important to avoid exiting, hence D20571, but I do not understand why. Perhaps @ast can fill this in.
I think that all three should be tracked as separate revisions.
Will do once you have some guidance for me on the above.
Could you please point me to some example (on github or something like that)?
I want to insert some prints to see what is the difference in DAG nodes between this and functions declared extern in C code.
- changes in LowerCallResult and LowerOperation regarding undef: I think both zeros and undef's are a bad choice here. If we can't compile it we should report error and abort compilation.
I agree in principle, but as note in code comments, this produces worse diagnostics when multiple functions contain such errors. Consider also that it was important to avoid exiting, hence D20571, but I do not understand why. Perhaps @ast can fill this in.
If we want to report this for multiple functions I think we can set some flag and exit in TargetLoweringBase::finalizeLowering().
This reproduces in the aya integration tests in this PR on this commit: https://github.com/aya-rs/bpf-linker/pull/77/commits/6624b7d673a7b65cc634e94a4a8a63f79aa90f64
It's not exactly a simple repro. I can try to give you just the IR if that'd be helpful?
- changes in LowerCallResult and LowerOperation regarding undef: I think both zeros and undef's are a bad choice here. If we can't compile it we should report error and abort compilation.
I agree in principle, but as note in code comments, this produces worse diagnostics when multiple functions contain such errors. Consider also that it was important to avoid exiting, hence D20571, but I do not understand why. Perhaps @ast can fill this in.
If we want to report this for multiple functions I think we can set some flag and exit in TargetLoweringBase::finalizeLowering().
OK, I can do that, but that still doesn't address the original motivation. @ast can you please explain why exiting was considered as something to avoid?
This reproduces in the aya integration tests in this PR on this commit: https://github.com/aya-rs/bpf-linker/pull/77/commits/6624b7d673a7b65cc634e94a4a8a63f79aa90f64
It's not exactly a simple repro. I can try to give you just the IR if that'd be helpful?
Yes, please, that would be helpful.
I use a reasonably fresh LLVM main (88b6d291bbdf) with the following debug print:
diff --git a/llvm/lib/Target/BPF/BPFISelLowering.cpp b/llvm/lib/Target/BPF/BPFISelLowering.cpp index 83a4bfb2f758..7937f8918fc8 100644 --- a/llvm/lib/Target/BPF/BPFISelLowering.cpp +++ b/llvm/lib/Target/BPF/BPFISelLowering.cpp @@ -455,6 +455,8 @@ SDValue BPFTargetLowering::LowerCall(TargetLowering::CallLoweringInfo &CLI, InGlue = Chain.getValue(1); } + llvm::dbgs() << "Callee: "; + Callee.dump(); // If the callee is a GlobalAddress node (quite common, every direct call is) // turn it into a TargetGlobalAddress node so that legalize doesn't hack it. // Likewise ExternalSymbol -> TargetExternalSymbol.
I try to llc the above gist and see the following:
$ llc -mcpu=v3 extern-builtins-from-tamird.ll -o /dev/null && echo "done" Callee: t10: i64 = Constant<1> Callee: t81: i64 = Constant<25> Callee: t8: i64 = Constant<1> Callee: t81: i64 = Constant<25> Callee: t8: i64 = Constant<1> Callee: t81: i64 = Constant<25> Callee: t8: i64 = Constant<1> Callee: t82: i64 = Constant<25> Callee: t8: i64 = Constant<1> Callee: t81: i64 = Constant<25> Callee: t8: i64 = Constant<1> Callee: t81: i64 = Constant<25> done
It looks like the ExternalSymbolSDNode branch is not triggered by the gist :(
Yeah, I also can't reproduce using llc. But in the actual code this module is fed into LLVMTargetMachineEmitToFile and those errors do show up.
Basing on the previous comments I assume that one of the functions for which such errors are printed is llvm.memset.p0.i64.
When llc processes the module.ll example that you have shared I see that calls to llvm.memset.p0.i64 are replaced by store instructions, e.g. the following IR:
%8 = call noundef ptr inttoptr (i64 1 to ptr)(ptr noundef nonnull @AYA_LOG_BUF, ptr noundef nonnull %2) #8 ... %68 = getelementptr inbounds i8, ptr %8, i64 62 ... call void @llvm.memset.p0.i64(ptr noundef nonnull align 1 dereferenceable(7) %68, i8 0, i64 7, i1 false)
Is converted to the following byte code:
r1 = AYA_LOG_BUF ll call 1 # bpf_map_lookup_elem ... r1 = 0 *(u8 *)(r0 + 69) = w1 ... *(u8 *)(r0 + 62) = w1
This replacement happens when selection DAG is constructed (I assume this is default lowering logic for memset).
So, whatever is calling LLVMTargetMachineEmitToFile on your side must do some adjustments to the BPFISelLowering logic.
What tool does the LLVMTargetMachineEmitToFile call? Is it open source? Or am I confused?
Yeah, it's open source. All the code is in https://github.com/aya-rs/bpf-linker, and in particular the LLVMTargetMachineEmitToFile call is https://github.com/aya-rs/bpf-linker/blob/28bfade2a4c44911a8813b418e3bdfd39b9e3b6a/src/llvm/mod.rs#L279-L299
I've been mucking with this locally. I wrote a simple C program, that loads the IR and calls LLVMTargetMachineEmitToFile, and I can confirm that I *do not* see any warnings. Very strange. Here's the code: https://gist.github.com/tamird/e180a08d00a6572d95ae2035594ddd1e
I also modified bpf-linker to cycle the module to disk and back, and while I still see the error, the number of times it is emitted goes from 6 to 1 (!!).
diff diff --git a/src/llvm/mod.rs b/src/llvm/mod.rs index 4cc4181..f2e66fe 100644 --- a/src/llvm/mod.rs +++ b/src/llvm/mod.rs @@ -11,6 +11,7 @@ use std::{ use libc::c_char as libc_char; use llvm_sys::bit_reader::*; +use llvm_sys::ir_reader::*; use llvm_sys::core::*; use llvm_sys::debuginfo::LLVMStripModuleDebugInfo; use llvm_sys::linker::LLVMLinkModules2; @@ -47,6 +48,7 @@ unsafe fn parse_command_line_options<T: AsRef<str>>(args: &[T], overview: &str) .collect::<Vec<_>>(); let c_ptrs = c_args.iter().map(|s| s.as_ptr()).collect::<Vec<_>>(); let overview = CString::new(overview).unwrap(); + error!("args={:?}, overview={:?}", c_args, overview); LLVMParseCommandLineOptions(c_ptrs.len() as i32, c_ptrs.as_ptr(), overview.as_ptr()); } @@ -284,6 +286,28 @@ pub unsafe fn codegen( ) -> Result<(), String> { let mut message = Message::new(); + let path = "/home/tamird/src/aya/test/integration-ebpf/module.ll"; + // dump IR for the final linked module for debugging purposes + let path = CString::new(path).unwrap(); + + write_ir(module, path.as_c_str()).unwrap(); + + let mut buffer = ptr::null_mut(); + let ret = unsafe { LLVMCreateMemoryBufferWithContentsOfFile(path.as_ptr(), &mut buffer, ptr::null_mut()) }; + assert_eq!(ret, 0); + + let mut module = ptr::null_mut(); + let ret = unsafe { LLVMParseIRInContext(LLVMGetGlobalContext(), buffer, &mut module, ptr::null_mut()) }; + assert_eq!(ret, 0); + + error!("buffer={:p}", buffer); + + let triple = "bpfel-unknown-none"; + let target = target_from_triple(CString::new(triple).unwrap().as_c_str()).unwrap(); + let tm = create_target_machine(target, triple, "", "").unwrap(); + if LLVMTargetMachineEmitToFile(
Before:
23:45:03 [ERROR] llvm: <unknown>:0:0: in function test_log i32 (ptr): A call to built-in function 'memset' is not supported. 23:45:03 [ERROR] llvm: <unknown>:0:0: in function test_log i32 (ptr): A call to built-in function 'memset' is not supported. 23:45:03 [ERROR] llvm: <unknown>:0:0: in function test_log i32 (ptr): A call to built-in function 'memset' is not supported. 23:45:03 [ERROR] llvm: <unknown>:0:0: in function test_log i32 (ptr): A call to built-in function 'memset' is not supported. 23:45:03 [ERROR] llvm: <unknown>:0:0: in function test_log i32 (ptr): A call to built-in function 'memset' is not supported. 23:45:03 [ERROR] llvm: <unknown>:0:0: in function test_log i32 (ptr): A call to built-in function 'memset' is not supported.
After:
23:51:04 [ERROR] buffer=0x5582fe80f5b0 error: <unknown>:0:0: in function test_log i32 (ptr): A call to built-in function 'memset' is not supported.
I do not understand what's going on here.
Actually, it seems that the change from multiple logged lines to a single one had to do with LLVMGetGlobalContext; if I pass the existing context then I get the same number of warning lines whether I cycle through disk or not.
Figured it out - if you pass --bpf-expand-memcpy-in-order to llc, you will reproduce the A call to built-in function 'memset' is not supported. prints.
I can see the error with this flag, thanks.
Also, here is a simpler example which could be triggered from clang:
$ cat test.c extern void *memset(void *s, int c, unsigned long n); extern void efoo(char *p); void bar(char *p, unsigned long l) { memset(p, 0, l); efoo(p); } $ clang -O2 --target=bpf -mcpu=v3 test.c -emit-llvm -S -o test.ll $ cat test.ll ... ; Function Attrs: nounwind define dso_local void @bar(ptr noundef %p, i64 noundef %l) local_unnamed_addr #0 { entry: tail call void @llvm.memset.p0.i64(ptr align 1 %p, i8 0, i64 %l, i1 false) tail call void @efoo(ptr noundef %p) #3 ret void } ; Function Attrs: mustprogress nocallback nofree nounwind willreturn memory(argmem: write) declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg) #1 declare dso_local void @efoo(ptr noundef) local_unnamed_addr #2 attributes #0 = { nounwind "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="v3" } attributes #1 = { mustprogress nocallback nofree nounwind willreturn memory(argmem: write) } attributes #2 = { "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="v3" } attributes #3 = { nounwind } ... $ llc -mcpu=v3 test.ll -o - error: <unknown>:0:0: in function bar void (ptr, i64): A call to built-in function 'memset' is not supported. Callee: t21: i64 = GlobalAddress<ptr @efoo> 0 .globl bar # -- Begin function bar .p2align 3 .type bar,@function bar: # @bar .Lbar$local: .type .Lbar$local,@function # %bb.0: # %entry r3 = r2 r6 = r1 w2 = 0 call memset # <------ Note the leftover call to `memset` r1 = r6 call efoo exit .Lfunc_end0: .size bar, .Lfunc_end0-bar .size .Lbar$local, .Lfunc_end0-bar # -- End function
The warnings would be reported for builtin extern functions (list is here: llvm/include/llvm/IR/RuntimeLibcalls.def).
But will not be reported for regular extern functions, like efoo in the example above.
As far as I understand all BPF code that uses clang uses llvm.memset patterns that could be statically expanded as a bunch of stores and BPF runtime does not really provide memset implementation.
So on one hand it probably would be better to retain this warning for clang, but on the other hand these warnings really belong to linker not compiler.
If libbpf is used as a linker I think it would report an error if code w/o suitable memset implementation is loaded.
Is there a way to pass a list of supported builtins as a parameter here?
@yonghong-song, what do you think?
To better reiterate my position on the warning for ExternalSymbolSDNode: in C analogy such warnings would be generated by the linker, not compiler.
So, I vote to agree with @tamird here and remove the warning (or at-least make it optional).
llvm/lib/Target/BPF/BPFISelLowering.cpp | ||
---|---|---|
466 | How does it help aya? In general I think it's ok to remove this, but we need to define the behavior more accurately here. In particular with memcpy, do we generate asm that calls "llvm.memcpy.p0.p0.i64" name or does it become plain "memcpy" ? |
I applied this patch and did an experiment with bpftool linker functionality. The following are details.
First let us try one example with func mmemset which is not a builtin function.
$ cat t1.c #define __hidden __attribute__((visibility("hidden"))) __hidden extern void *mmemset(void *s, int c, unsigned long n); void do_memset(char *x, unsigned long l) { mmemset(x, 0, l); } $ cat t2.c void *mmemset(void *s, int c, unsigned long n); void *mmemset(void *x, int c, unsigned long n) { // custom memset, I am using step of 2, otherwise the // compiler is smart enough and may transform // the code back to memset! char *ch = x; for (int i = 0; i < n; i+=2) ch[i] = c; return 0; } $ clang --target=bpf -O2 -g -c t1.c $ clang --target=bpf -O2 -g -c t2.c $ bpftool gen object final.o t1.o t2.o $ llvm-readelf -s t1.o Symbol table '.symtab' contains 13 entries: Num: Value Size Type Bind Vis Ndx Name 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND 1: 0000000000000000 0 FILE LOCAL DEFAULT ABS t1.c 2: 0000000000000000 0 SECTION LOCAL DEFAULT 2 .text 3: 0000000000000000 0 SECTION LOCAL DEFAULT 4 .debug_loclists 4: 0000000000000000 0 SECTION LOCAL DEFAULT 5 .debug_abbrev 5: 0000000000000000 0 SECTION LOCAL DEFAULT 8 .debug_str_offsets 6: 0000000000000000 0 SECTION LOCAL DEFAULT 10 .debug_str 7: 0000000000000000 0 SECTION LOCAL DEFAULT 11 .debug_addr 8: 0000000000000000 0 SECTION LOCAL DEFAULT 16 .debug_frame 9: 0000000000000000 0 SECTION LOCAL DEFAULT 18 .debug_line 10: 0000000000000000 0 SECTION LOCAL DEFAULT 20 .debug_line_str 11: 0000000000000000 32 FUNC GLOBAL DEFAULT 2 do_memset 12: 0000000000000000 0 NOTYPE GLOBAL HIDDEN UND mmemset $ llvm-readelf -s t2.o Symbol table '.symtab' contains 14 entries: Num: Value Size Type Bind Vis Ndx Name 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND 1: 0000000000000000 0 FILE LOCAL DEFAULT ABS t2.c 2: 0000000000000000 0 SECTION LOCAL DEFAULT 2 .text 3: 0000000000000038 0 NOTYPE LOCAL DEFAULT 2 LBB0_3 4: 0000000000000010 0 NOTYPE LOCAL DEFAULT 2 LBB0_2 5: 0000000000000000 0 SECTION LOCAL DEFAULT 3 .debug_loclists 6: 0000000000000000 0 SECTION LOCAL DEFAULT 4 .debug_abbrev 7: 0000000000000000 0 SECTION LOCAL DEFAULT 7 .debug_str_offsets 8: 0000000000000000 0 SECTION LOCAL DEFAULT 9 .debug_str 9: 0000000000000000 0 SECTION LOCAL DEFAULT 10 .debug_addr 10: 0000000000000000 0 SECTION LOCAL DEFAULT 15 .debug_frame 11: 0000000000000000 0 SECTION LOCAL DEFAULT 17 .debug_line 12: 0000000000000000 0 SECTION LOCAL DEFAULT 19 .debug_line_str 13: 0000000000000000 72 FUNC GLOBAL DEFAULT 2 mmemset $ llvm-readelf -s final.o Symbol table '.symtab' contains 8 entries: Num: Value Size Type Bind Vis Ndx Name 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND 1: 0000000000000000 0 FILE LOCAL DEFAULT ABS t1.c 2: 0000000000000000 0 SECTION LOCAL DEFAULT 3 .text 3: 0000000000000000 32 FUNC GLOBAL DEFAULT 3 do_memset 4: 0000000000000020 72 FUNC GLOBAL HIDDEN 3 mmemset 5: 0000000000000000 0 FILE LOCAL DEFAULT ABS t2.c 6: 0000000000000058 0 NOTYPE LOCAL DEFAULT 3 LBB0_3 7: 0000000000000030 0 NOTYPE LOCAL DEFAULT 3 LBB0_2
So bpftool linking succeeded with global function mmemset.
Now let rename the mmemset to memset in the above t1.c and t2.c.
$ cat t1.c #define __hidden __attribute__((visibility("hidden"))) __hidden extern void *memset(void *s, int c, unsigned long n); void do_memset(char *x, unsigned long l) { memset(x, 0, l); } $ cat t2.c void *memset(void *s, int c, unsigned long n); void *memset(void *x, int c, unsigned long n) { // custom memset, I am using step of 2, otherwise the // compiler is smart enough and may transform // the code back to memset! char *ch = x; for (int i = 0; i < n; i+=2) ch[i] = c; return 0; } $ clang --target=bpf -O2 -g -c t1.c $ clang --target=bpf -O2 -g -c t2.c $ bpftool gen object final.o t1.o t2.o libbpf: failed to find BTF info for global/extern symbol 'memset' Error: failed to link 't1.o': No such file or directory (2) $ llvm-readelf -s t1.o Symbol table '.symtab' contains 13 entries: Num: Value Size Type Bind Vis Ndx Name 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND 1: 0000000000000000 0 FILE LOCAL DEFAULT ABS t1.c 2: 0000000000000000 0 SECTION LOCAL DEFAULT 2 .text 3: 0000000000000000 0 SECTION LOCAL DEFAULT 4 .debug_loclists 4: 0000000000000000 0 SECTION LOCAL DEFAULT 5 .debug_abbrev 5: 0000000000000000 0 SECTION LOCAL DEFAULT 8 .debug_str_offsets 6: 0000000000000000 0 SECTION LOCAL DEFAULT 10 .debug_str 7: 0000000000000000 0 SECTION LOCAL DEFAULT 11 .debug_addr 8: 0000000000000000 0 SECTION LOCAL DEFAULT 16 .debug_frame 9: 0000000000000000 0 SECTION LOCAL DEFAULT 18 .debug_line 10: 0000000000000000 0 SECTION LOCAL DEFAULT 20 .debug_line_str 11: 0000000000000000 32 FUNC GLOBAL DEFAULT 2 do_memset 12: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND memset $ llvm-readelf -s t2.o Symbol table '.symtab' contains 14 entries: Num: Value Size Type Bind Vis Ndx Name 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND 1: 0000000000000000 0 FILE LOCAL DEFAULT ABS t2.c 2: 0000000000000000 0 SECTION LOCAL DEFAULT 2 .text 3: 0000000000000038 0 NOTYPE LOCAL DEFAULT 2 LBB0_3 4: 0000000000000010 0 NOTYPE LOCAL DEFAULT 2 LBB0_2 5: 0000000000000000 0 SECTION LOCAL DEFAULT 3 .debug_loclists 6: 0000000000000000 0 SECTION LOCAL DEFAULT 4 .debug_abbrev 7: 0000000000000000 0 SECTION LOCAL DEFAULT 7 .debug_str_offsets 8: 0000000000000000 0 SECTION LOCAL DEFAULT 9 .debug_str 9: 0000000000000000 0 SECTION LOCAL DEFAULT 10 .debug_addr 10: 0000000000000000 0 SECTION LOCAL DEFAULT 15 .debug_frame 11: 0000000000000000 0 SECTION LOCAL DEFAULT 17 .debug_line 12: 0000000000000000 0 SECTION LOCAL DEFAULT 19 .debug_line_str 13: 0000000000000000 72 FUNC GLOBAL DEFAULT 2 memset
The major difference is in t1.o,
12: 0000000000000000 0 NOTYPE GLOBAL HIDDEN UND mmemset vs. 12: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND memset
So for builtin function memset, the 'visibility' is not set to HIDDEN although
we have an attribute for it.
I would like to have more study for this to understand why and how to fix it
before we lift the error message.
I checked the code and found that ExternalSymbolSDNode is indeed corresponding to builtin/library functions.
In SelectionDAG/SelectionDAG.cpp, we have
getExternalSymbol(TLI->getLibcallName(RTLIB::MEMCPY), ... getExternalSymbol(TLI->getLibcallName(LibraryCall), ... getExternalSymbol(TLI->getLibcallName(RTLIB::MEMMOVE), ... getExternalSymbol(TLI->getLibcallName(LibraryCall), ... getExternalSymbol(BzeroName, TLI->getPointerTy(DL)), std::move(Args)); ... getExternalSymbol(TLI->getLibcallName(RTLIB::MEMSET), ... getExternalSymbol(TLI->getLibcallName(LibraryCall), ... SDValue Callee = getExternalSymbol(TLI->getLibcallName(LC),
llvm/lib/Target/BPF/BPFISelLowering.cpp | ||
---|---|---|
466 |
In aya, we provide implementations of these built-ins (memset, memcpy, etc.). In bpf-linker, we combine all the IR into a single module before codegen. Currently this error fires, which we must treat as fatal or ignore by string matching. It turns out that codegen generates a call to plain "memset", and the resulting code is completely valid. So this error is spurious.
Looks like yonghong discovered that this symbol is indeed promoted to global.
Yes, it becomes "plain". |
@yonghong-song I have learned some things. If you repeat your experiment (even without this patch) but pass -fno-builtin when compiling t1.c, everything works.
Here's the diff in IR:
diff --git a/t1.ll b/t1-fno-builtin.ll index 64d172bf9ab5..324a98687ade 100644 --- a/t1.ll +++ b/t1-fno-builtin.ll @@ -3,24 +3,24 @@ source_filename = "t1.c" target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128" target triple = "bpf" -; Function Attrs: mustprogress nofree nosync nounwind willreturn memory(argmem: write) -define dso_local void @do_memset(ptr nocapture noundef writeonly %x, i64 noundef %l) local_unnamed_addr #0 !dbg !8 { +; Function Attrs: nounwind +define dso_local void @do_memset(ptr noundef %x, i64 noundef %l) local_unnamed_addr #0 !dbg !8 { entry: call void @llvm.dbg.value(metadata ptr %x, metadata !15, metadata !DIExpression()), !dbg !17 call void @llvm.dbg.value(metadata i64 %l, metadata !16, metadata !DIExpression()), !dbg !17 - tail call void @llvm.memset.p0.i64(ptr align 1 %x, i8 0, i64 %l, i1 false), !dbg !18 + %call = tail call ptr @memset(ptr noundef %x, i32 noundef 0, i64 noundef %l) #3, !dbg !18 ret void, !dbg !19 } -; Function Attrs: mustprogress nocallback nofree nounwind willreturn memory(argmem: write) -declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg) #1 +declare !dbg !20 hidden ptr @memset(ptr noundef, i32 noundef, i64 noundef) local_unnamed_addr #1 ; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none) declare void @llvm.dbg.value(metadata, metadata, metadata) #2 -attributes #0 = { mustprogress nofree nosync nounwind willreturn memory(argmem: write) "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" } -attributes #1 = { mustprogress nocallback nofree nounwind willreturn memory(argmem: write) } +attributes #0 = { nounwind "frame-pointer"="all" "no-builtins" "no-trapping-math"="true" "stack-protector-buffer-size"="8" } +attributes #1 = { "frame-pointer"="all" "no-builtins" "no-trapping-math"="true" "stack-protector-buffer-size"="8" } attributes #2 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) } +attributes #3 = { nobuiltin nounwind "no-builtins" } !llvm.dbg.cu = !{!0} !llvm.module.flags = !{!2, !3, !4, !5, !6} @@ -46,3 +46,12 @@ attributes #2 = { nocallback nofree nosync nounwind speculatable willreturn memo !17 = !DILocation(line: 0, scope: !8) !18 = !DILocation(line: 4, column: 3, scope: !8) !19 = !DILocation(line: 5, column: 1, scope: !8) +!20 = !DISubprogram(name: "memset", scope: !1, file: !1, line: 2, type: !21, flags: DIFlagPrototyped, spFlags: DISPFlagOptimized, retainedNodes: !25) +!21 = !DISubroutineType(types: !22) +!22 = !{!23, !23, !24, !13} +!23 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: null, size: 64) +!24 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed) +!25 = !{!26, !27, !28} +!26 = !DILocalVariable(name: "s", arg: 1, scope: !20, file: !1, line: 2, type: !23) +!27 = !DILocalVariable(name: "c", arg: 2, scope: !20, file: !1, line: 2, type: !24) +!28 = !DILocalVariable(name: "n", arg: 3, scope: !20, file: !1, line: 2, type: !13)
The same result is obtained if I annotate do_memset with __attribute__((no_builtin("memset"))):
diff --git a/t1.c b/t1-no-builtin.c index 2698b334b898..3cded7c57e99 100644 --- a/t1.c +++ b/t1-no-builtin.c @@ -1,5 +1,5 @@ #define __hidden __attribute__((visibility("hidden"))) __hidden extern void *memset(void *s, int c, unsigned long n); -void do_memset(char *x, unsigned long l) { +void do_memset(char *x, unsigned long l) __attribute__((no_builtin("memset"))) { memset(x, 0, l); }
diff --git a/t1.ll b/t1-no-builtin.ll index 64d172bf9ab5..f9a2b088084e 100644 --- a/t1.ll +++ b/t1-no-builtin.ll @@ -1,25 +1,25 @@ -; ModuleID = 't1.c' -source_filename = "t1.c" +; ModuleID = 't1-no-builtin.c' +source_filename = "t1-no-builtin.c" target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128" target triple = "bpf" -; Function Attrs: mustprogress nofree nosync nounwind willreturn memory(argmem: write) -define dso_local void @do_memset(ptr nocapture noundef writeonly %x, i64 noundef %l) local_unnamed_addr #0 !dbg !8 { +; Function Attrs: mustprogress nofree nounwind willreturn memory(argmem: readwrite) +define dso_local void @do_memset(ptr noundef %x, i64 noundef %l) local_unnamed_addr #0 !dbg !8 { entry: call void @llvm.dbg.value(metadata ptr %x, metadata !15, metadata !DIExpression()), !dbg !17 call void @llvm.dbg.value(metadata i64 %l, metadata !16, metadata !DIExpression()), !dbg !17 - tail call void @llvm.memset.p0.i64(ptr align 1 %x, i8 0, i64 %l, i1 false), !dbg !18 + %call = tail call ptr @memset(ptr noundef %x, i32 noundef 0, i64 noundef %l), !dbg !18 ret void, !dbg !19 } -; Function Attrs: mustprogress nocallback nofree nounwind willreturn memory(argmem: write) -declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg) #1 +; Function Attrs: mustprogress nofree nounwind willreturn memory(argmem: readwrite) +declare !dbg !20 hidden ptr @memset(ptr noundef writeonly, i32 noundef, i64 noundef) local_unnamed_addr #1 ; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none) declare void @llvm.dbg.value(metadata, metadata, metadata) #2 -attributes #0 = { mustprogress nofree nosync nounwind willreturn memory(argmem: write) "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" } -attributes #1 = { mustprogress nocallback nofree nounwind willreturn memory(argmem: write) } +attributes #0 = { mustprogress nofree nounwind willreturn memory(argmem: readwrite) "frame-pointer"="all" "no-builtin-memset" "no-trapping-math"="true" "stack-protector-buffer-size"="8" } +attributes #1 = { mustprogress nofree nounwind willreturn memory(argmem: readwrite) "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" } attributes #2 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) } !llvm.dbg.cu = !{!0} @@ -27,7 +27,7 @@ attributes #2 = { nocallback nofree nosync nounwind speculatable willreturn memo !llvm.ident = !{!7} !0 = distinct !DICompileUnit(language: DW_LANG_C11, file: !1, producer: "clang version 17.0.0 (https://github.com/llvm/llvm-project.git 52807ba3836ef30c54622f227c4a23e2eb9375cd)", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, splitDebugInlining: false, nameTableKind: None) -!1 = !DIFile(filename: "t1.c", directory: "/home/tamird/src/llvm-project-latest/foo", checksumkind: CSK_MD5, checksum: "7d959830f84c1a82a1efb0fbb5b9c592") +!1 = !DIFile(filename: "t1-no-builtin.c", directory: "/home/tamird/src/llvm-project-latest/foo", checksumkind: CSK_MD5, checksum: "4a930bd64fe9fcad8934a5f1a66a94d4") !2 = !{i32 7, !"Dwarf Version", i32 5} !3 = !{i32 2, !"Debug Info Version", i32 3} !4 = !{i32 1, !"wchar_size", i32 4} @@ -46,3 +46,12 @@ attributes #2 = { nocallback nofree nosync nounwind speculatable willreturn memo !17 = !DILocation(line: 0, scope: !8) !18 = !DILocation(line: 4, column: 3, scope: !8) !19 = !DILocation(line: 5, column: 1, scope: !8) +!20 = !DISubprogram(name: "memset", scope: !1, file: !1, line: 2, type: !21, flags: DIFlagPrototyped, spFlags: DISPFlagOptimized, retainedNodes: !25) +!21 = !DISubroutineType(types: !22) +!22 = !{!23, !23, !24, !13} +!23 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: null, size: 64) +!24 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed) +!25 = !{!26, !27, !28} +!26 = !DILocalVariable(name: "s", arg: 1, scope: !20, file: !1, line: 2, type: !23) +!27 = !DILocalVariable(name: "c", arg: 2, scope: !20, file: !1, line: 2, type: !24) +!28 = !DILocalVariable(name: "n", arg: 3, scope: !20, file: !1, line: 2, type: !13)
bpftool gen object works with both of these.
I don't entirely understand the mechanism that causes memset to end up not-hidden, but it seems that BPF should always imply -fno-builtin for this reason. I've sent a PR to this effect in aya: https://github.com/aya-rs/aya/pull/698.
I'll abandon this, but it may be worthwhile to implement BPF-implies-fno-builtin *in clang*, if possible.
Doesn't this still run at the end of a function, rather than at the end of an entire file? We can report multiple functions have errored, but we can't emit all the errors in the file.
Indeed it does. My bad, for some reason I thought it does so at the end of module. And I don't see any hook that could be executed at the end of the module. Still, imho, backend should not proceed with compilation in such cases even if only first error is reported.
Yes, we fixed the embedder. The question was is it possible to harden this such that future embedders don't make the same mistake.
llvm/lib/Target/BPF/BPFISelLowering.cpp | ||
---|---|---|
466 | Since backend couldn't unroll memset it should error hard. |
I'll abandon this, but it may be worthwhile to implement BPF-implies-fno-builtin *in clang*, if possible.
Thanks @tamird Indeed -fno-builtin or attribute no-builtin can fix the issue. But I think we should not put no-builtin as default since based on my experiment a lot of users using memset with a known constant size, expecting the compiler will transform it into a load, and we do not want to break such use cases. But we could advise users to use -fno-builtin or no-builtin attribute if their usage of memset involves a unknown size to a very large known size. This way, we can keep the fail message here since survival of builtin memset func means bpftool linking cannot resolve it.
llvm/lib/Target/BPF/BPFISelLowering.cpp | ||
---|---|---|
466 | Right. The no_builtin tricks are too big of hammers for what we want here, which is to still apply constant size optimizations but not apply special treatment to these functions when they've been defined by the user, which is what @yonghong-song was saying in his earlier comments. I may dig into that when I have some time, but for now we're just ignoring these errors in aya's linker. Is there a place (issue tracker, mailing list, other) to document these musings and describe possible future work? |
I may dig into that when I have some time, but for now we're just ignoring these errors in aya's linker.
what do you mean "ignoring in aya" ?
If bpf prog contains a call to memset and you link it with a generic unbounded loop in aya linker the verifier will reject such code.
I thought aya suppose to produce valid programs.
We're ignoring the error because the error is in fact not an error, the target is doing the right thing, and aya (well, rustc) always provides memory builtins so memset gets linked correctly and a valid program is produced.
Could you point to memset implementation that gets linked eventually?
Is it a global function or static ? Asking because they're verified differently by the kernel.
I have a hard time imagining a generic implementation that can pass the verifier.
Normally in rust programs, memory builtins are provided by the compiler-builtins crate, which is the equivalent of compiler-rt in LLVM. It's automatically linked in by the compiler to provide libcalls used by LLVM. This is the code https://github.com/rust-lang/compiler-builtins/blob/efd227f9c2ac220a8699188e937eb8be810df317/src/mem/mod.rs#L46. As you see it returns the dst pointer, which breaks the verifier if it's used on stack pointers (which is something rustc sometimes does when initializing stack values). So we have an override in aya that is pretty much the same but returns void.
Is it a global function or static ? Asking because they're verified differently by the kernel.
I have a hard time imagining a generic implementation that can pass the verifier.
I am aware - it's a static so it works. And also the bounds for the inputs are statically known in all the cases where rustc uses memset internally (eg to zero initialize values with a fixed known type layout). The usual bound checks are required when calling something like https://doc.rust-lang.org/stable/std/primitive.slice.html#method.copy_from_slice.
If you do this, you also need to update the callers which are still doing work after calling fail, but fail now never returns.
Also in LowerCall, the fail call in the ExternalSymbolSDNode case should be removed. That is a legitimate condition when compiling rust code where the builtin is provided by another crate so it shouldn't abort.