This is an archive of the discontinued LLVM Phabricator instance.

[BPF] allow external calls
AbandonedPublic

Authored by tamird on Jul 20 2023, 3:02 PM.

Details

Summary

This patch lifts the restriction on external functions introduced in
e497548. It seems that restriction was intended to produce better
compiler errors when -Wno-strict-prototypes was used, but it produces
false positives when memory intrinsics are lowered to libcalls.

Diff Detail

Event Timeline

tamird created this revision.Jul 20 2023, 3:02 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 20 2023, 3:02 PM
tamird requested review of this revision.Jul 20 2023, 3:02 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 20 2023, 3:02 PM
tamird added a subscriber: alessandrod.
alessandrod added inline comments.Jul 20 2023, 6:21 PM
llvm/lib/Target/BPF/BPFISelLowering.cpp
53

If you do this, you also need to update the callers which are still doing work after calling fail, but fail now never returns.

Also in LowerCall, the fail call in the ExternalSymbolSDNode case should be removed. That is a legitimate condition when compiling rust code where the builtin is provided by another crate so it shouldn't abort.

tamird updated this revision to Diff 542944.Jul 21 2023, 8:37 AM

Completely reworked the patch; no longer generating fatal errors.

tamird retitled this revision from BPF: fail reports a fatal error to [BPF] Do not miscompile, allow external calls.Jul 21 2023, 8:38 AM
tamird edited the summary of this revision. (Show Details)
tamird signed these changes with MFA.Jul 21 2023, 8:44 AM
tamird marked an inline comment as done.
tamird added inline comments.
llvm/lib/Target/BPF/BPFISelLowering.cpp
53

Thank you - I've reworked this patch.

tamird updated this revision to Diff 542997.Jul 21 2023, 11:04 AM
tamird marked an inline comment as done.

Fix broken link in commit message.

tamird edited the summary of this revision. (Show Details)Jul 21 2023, 11:04 AM
ast added a reviewer: eddyz87.Jul 22 2023, 5:16 PM

Please note that a number of unit tests fails with this patch applied:

  • LLVM :: CodeGen/BPF/many_args1.ll
  • LLVM :: CodeGen/BPF/many_args2.ll
  • LLVM :: CodeGen/BPF/struct_ret1.ll
  • LLVM :: CodeGen/BPF/vararg1.ll
  • LLVM :: CodeGen/BPF/warn-call.ll

These failures could be caught when ninja check-all target is executed.

I don't think that cleanup changes should be intermixed with functionality changes.
As the summary notes there three separate changes in this revision:

  • code cleanup (which is, in fact, somewhat controversial):
    • LLVM Coding Standards actually require brace-less if statements (link)
    • I don't understand the crusade against assert and llvm_unreachable
  • change of the behavior for ExternalSymbolSDNode: this was not a part of Diff 1, so I assume that it was not a part of the @tamird's initial intent. Is there a use case for this change? The libbpf commit 0a34245 deals only with external maps, not external functions.
  • changes in LowerCallResult and LowerOperation regarding undef: I think both zeros and undef's are a bad choice here. If we can't compile it we should report error and abort compilation.

I think that all three should be tracked as separate revisions.

Please note that a number of unit tests fails with this patch applied:

  • LLVM :: CodeGen/BPF/many_args1.ll
  • LLVM :: CodeGen/BPF/many_args2.ll
  • LLVM :: CodeGen/BPF/struct_ret1.ll
  • LLVM :: CodeGen/BPF/vararg1.ll
  • LLVM :: CodeGen/BPF/warn-call.ll

These failures could be caught when ninja check-all target is executed.

Thanks. Most of the failures were caused by slight changes to error strings.

I don't think that cleanup changes should be intermixed with functionality changes.
As the summary notes there three separate changes in this revision:

  • code cleanup (which is, in fact, somewhat controversial):
    • LLVM Coding Standards actually require brace-less if statements (link)
    • I don't understand the crusade against assert and llvm_unreachable

Extracted these changes to https://reviews.llvm.org/D156136. Let's discuss there.

  • change of the behavior for ExternalSymbolSDNode: this was not a part of Diff 1, so I assume that it was not a part of the @tamird's initial intent. Is there a use case for this change? The libbpf commit 0a34245 deals only with external maps, not external functions.

When compiling Rust code, we see this warning trigger on memset intrinsics, but this is not a problem because we provide an implementation of memset and the intrinsic is eventually lowered to a call to the function we provide. Is there a finer-grained change that would achieve our goals?

  • changes in LowerCallResult and LowerOperation regarding undef: I think both zeros and undef's are a bad choice here. If we can't compile it we should report error and abort compilation.

I agree in principle, but as note in code comments, this produces worse diagnostics when multiple functions contain such errors. Consider also that it was important to avoid exiting, hence D20571, but I do not understand why. Perhaps @ast can fill this in.

I think that all three should be tracked as separate revisions.

Will do once you have some guidance for me on the above.

ast requested changes to this revision.Jul 24 2023, 10:36 AM
This revision now requires changes to proceed.Jul 24 2023, 10:36 AM
  • change of the behavior for ExternalSymbolSDNode: this was not a part of Diff 1, so I assume that it was not a part of the @tamird's initial intent. Is there a use case for this change? The libbpf commit 0a34245 deals only with external maps, not external functions.

When compiling Rust code, we see this warning trigger on memset intrinsics, but this is not a problem because we provide an implementation of memset and the intrinsic is eventually lowered to a call to the function we provide. Is there a finer-grained change that would achieve our goals?

Could you please point me to some example (on github or something like that)?
I want to insert some prints to see what is the difference in DAG nodes between this and functions declared extern in C code.

  • changes in LowerCallResult and LowerOperation regarding undef: I think both zeros and undef's are a bad choice here. If we can't compile it we should report error and abort compilation.

I agree in principle, but as note in code comments, this produces worse diagnostics when multiple functions contain such errors. Consider also that it was important to avoid exiting, hence D20571, but I do not understand why. Perhaps @ast can fill this in.

If we want to report this for multiple functions I think we can set some flag and exit in TargetLoweringBase::finalizeLowering().

  • change of the behavior for ExternalSymbolSDNode: this was not a part of Diff 1, so I assume that it was not a part of the @tamird's initial intent. Is there a use case for this change? The libbpf commit 0a34245 deals only with external maps, not external functions.

When compiling Rust code, we see this warning trigger on memset intrinsics, but this is not a problem because we provide an implementation of memset and the intrinsic is eventually lowered to a call to the function we provide. Is there a finer-grained change that would achieve our goals?

Could you please point me to some example (on github or something like that)?
I want to insert some prints to see what is the difference in DAG nodes between this and functions declared extern in C code.

This reproduces in the aya integration tests in this PR on this commit: https://github.com/aya-rs/bpf-linker/pull/77/commits/6624b7d673a7b65cc634e94a4a8a63f79aa90f64

It's not exactly a simple repro. I can try to give you just the IR if that'd be helpful?

  • changes in LowerCallResult and LowerOperation regarding undef: I think both zeros and undef's are a bad choice here. If we can't compile it we should report error and abort compilation.

I agree in principle, but as note in code comments, this produces worse diagnostics when multiple functions contain such errors. Consider also that it was important to avoid exiting, hence D20571, but I do not understand why. Perhaps @ast can fill this in.

If we want to report this for multiple functions I think we can set some flag and exit in TargetLoweringBase::finalizeLowering().

OK, I can do that, but that still doesn't address the original motivation. @ast can you please explain why exiting was considered as something to avoid?

This reproduces in the aya integration tests in this PR on this commit: https://github.com/aya-rs/bpf-linker/pull/77/commits/6624b7d673a7b65cc634e94a4a8a63f79aa90f64

It's not exactly a simple repro. I can try to give you just the IR if that'd be helpful?

Yes, please, that would be helpful.

This reproduces in the aya integration tests in this PR on this commit: https://github.com/aya-rs/bpf-linker/pull/77/commits/6624b7d673a7b65cc634e94a4a8a63f79aa90f64

It's not exactly a simple repro. I can try to give you just the IR if that'd be helpful?

Yes, please, that would be helpful.

https://gist.github.com/tamird/f3ff414f183b085c3e37cd7d5b5754de

This reproduces in the aya integration tests in this PR on this commit: https://github.com/aya-rs/bpf-linker/pull/77/commits/6624b7d673a7b65cc634e94a4a8a63f79aa90f64

It's not exactly a simple repro. I can try to give you just the IR if that'd be helpful?

Yes, please, that would be helpful.

https://gist.github.com/tamird/f3ff414f183b085c3e37cd7d5b5754de

I use a reasonably fresh LLVM main (88b6d291bbdf) with the following debug print:

diff --git a/llvm/lib/Target/BPF/BPFISelLowering.cpp b/llvm/lib/Target/BPF/BPFISelLowering.cpp
index 83a4bfb2f758..7937f8918fc8 100644
--- a/llvm/lib/Target/BPF/BPFISelLowering.cpp
+++ b/llvm/lib/Target/BPF/BPFISelLowering.cpp
@@ -455,6 +455,8 @@ SDValue BPFTargetLowering::LowerCall(TargetLowering::CallLoweringInfo &CLI,
     InGlue = Chain.getValue(1);
   }
 
+  llvm::dbgs() << "Callee: ";
+  Callee.dump();
   // If the callee is a GlobalAddress node (quite common, every direct call is)
   // turn it into a TargetGlobalAddress node so that legalize doesn't hack it.
   // Likewise ExternalSymbol -> TargetExternalSymbol.

I try to llc the above gist and see the following:

$ llc -mcpu=v3 extern-builtins-from-tamird.ll -o /dev/null && echo "done"
Callee: t10: i64 = Constant<1>
Callee: t81: i64 = Constant<25>
Callee: t8: i64 = Constant<1>
Callee: t81: i64 = Constant<25>
Callee: t8: i64 = Constant<1>
Callee: t81: i64 = Constant<25>
Callee: t8: i64 = Constant<1>
Callee: t82: i64 = Constant<25>
Callee: t8: i64 = Constant<1>
Callee: t81: i64 = Constant<25>
Callee: t8: i64 = Constant<1>
Callee: t81: i64 = Constant<25>
done

It looks like the ExternalSymbolSDNode branch is not triggered by the gist :(

Yeah, I also can't reproduce using llc. But in the actual code this module is fed into LLVMTargetMachineEmitToFile and those errors do show up.

Yeah, I also can't reproduce using llc. But in the actual code this module is fed into LLVMTargetMachineEmitToFile and those errors do show up.

Basing on the previous comments I assume that one of the functions for which such errors are printed is llvm.memset.p0.i64.
When llc processes the module.ll example that you have shared I see that calls to llvm.memset.p0.i64 are replaced by store instructions, e.g. the following IR:

%8 = call noundef ptr inttoptr (i64 1 to ptr)(ptr noundef nonnull @AYA_LOG_BUF, ptr noundef nonnull %2) #8
...
%68 = getelementptr inbounds i8, ptr %8, i64 62
...
call void @llvm.memset.p0.i64(ptr noundef nonnull align 1 dereferenceable(7) %68, i8 0, i64 7, i1 false)

Is converted to the following byte code:

r1 = AYA_LOG_BUF ll
call 1                 # bpf_map_lookup_elem
...
r1 = 0
*(u8 *)(r0 + 69) = w1
...
*(u8 *)(r0 + 62) = w1

This replacement happens when selection DAG is constructed (I assume this is default lowering logic for memset).
So, whatever is calling LLVMTargetMachineEmitToFile on your side must do some adjustments to the BPFISelLowering logic.
What tool does the LLVMTargetMachineEmitToFile call? Is it open source? Or am I confused?

Yeah, it's open source. All the code is in https://github.com/aya-rs/bpf-linker, and in particular the LLVMTargetMachineEmitToFile call is https://github.com/aya-rs/bpf-linker/blob/28bfade2a4c44911a8813b418e3bdfd39b9e3b6a/src/llvm/mod.rs#L279-L299

I've been mucking with this locally. I wrote a simple C program, that loads the IR and calls LLVMTargetMachineEmitToFile, and I can confirm that I *do not* see any warnings. Very strange. Here's the code: https://gist.github.com/tamird/e180a08d00a6572d95ae2035594ddd1e

I also modified bpf-linker to cycle the module to disk and back, and while I still see the error, the number of times it is emitted goes from 6 to 1 (!!).

diff
diff --git a/src/llvm/mod.rs b/src/llvm/mod.rs
index 4cc4181..f2e66fe 100644
--- a/src/llvm/mod.rs
+++ b/src/llvm/mod.rs
@@ -11,6 +11,7 @@ use std::{
 
 use libc::c_char as libc_char;
 use llvm_sys::bit_reader::*;
+use llvm_sys::ir_reader::*;
 use llvm_sys::core::*;
 use llvm_sys::debuginfo::LLVMStripModuleDebugInfo;
 use llvm_sys::linker::LLVMLinkModules2;
@@ -47,6 +48,7 @@ unsafe fn parse_command_line_options<T: AsRef<str>>(args: &[T], overview: &str)
         .collect::<Vec<_>>();
     let c_ptrs = c_args.iter().map(|s| s.as_ptr()).collect::<Vec<_>>();
     let overview = CString::new(overview).unwrap();
+    error!("args={:?}, overview={:?}", c_args, overview);
     LLVMParseCommandLineOptions(c_ptrs.len() as i32, c_ptrs.as_ptr(), overview.as_ptr());
 }
 
@@ -284,6 +286,28 @@ pub unsafe fn codegen(
 ) -> Result<(), String> {
     let mut message = Message::new();
 
+    let path = "/home/tamird/src/aya/test/integration-ebpf/module.ll";
+            // dump IR for the final linked module for debugging purposes
+            let path = CString::new(path).unwrap();
+
+        write_ir(module, path.as_c_str()).unwrap();
+
+                    let mut buffer = ptr::null_mut();                                                 
+            let ret = unsafe { LLVMCreateMemoryBufferWithContentsOfFile(path.as_ptr(), &mut buffer, ptr::null_mut()) };
+            assert_eq!(ret, 0);                                                                                                                                                              
+                                         
+            let mut module = ptr::null_mut();
+            let ret = unsafe { LLVMParseIRInContext(LLVMGetGlobalContext(), buffer, &mut module, ptr::null_mut()) };
+            assert_eq!(ret, 0);                                                                                                                                                              
+                                                                                                                                                                                             
+            error!("buffer={:p}", buffer);    
+
+            let triple = "bpfel-unknown-none";
+            let target = target_from_triple(CString::new(triple).unwrap().as_c_str()).unwrap();
+            let tm = create_target_machine(target, triple, "", "").unwrap();
+
     if LLVMTargetMachineEmitToFile(

Before:

23:45:03 [ERROR] llvm: <unknown>:0:0: in function test_log i32 (ptr): A call to built-in function 'memset' is not supported.
 
 23:45:03 [ERROR] llvm: <unknown>:0:0: in function test_log i32 (ptr): A call to built-in function 'memset' is not supported.
 
 23:45:03 [ERROR] llvm: <unknown>:0:0: in function test_log i32 (ptr): A call to built-in function 'memset' is not supported.
 
 23:45:03 [ERROR] llvm: <unknown>:0:0: in function test_log i32 (ptr): A call to built-in function 'memset' is not supported.
 
 23:45:03 [ERROR] llvm: <unknown>:0:0: in function test_log i32 (ptr): A call to built-in function 'memset' is not supported.
 
 23:45:03 [ERROR] llvm: <unknown>:0:0: in function test_log i32 (ptr): A call to built-in function 'memset' is not supported.

After:

23:51:04 [ERROR] buffer=0x5582fe80f5b0
error: <unknown>:0:0: in function test_log i32 (ptr): A call to built-in function 'memset' is not supported.

I do not understand what's going on here.

Actually, it seems that the change from multiple logged lines to a single one had to do with LLVMGetGlobalContext; if I pass the existing context then I get the same number of warning lines whether I cycle through disk or not.

Figured it out - if you pass --bpf-expand-memcpy-in-order to llc, you will reproduce the A call to built-in function 'memset' is not supported. prints.

Figured it out - if you pass --bpf-expand-memcpy-in-order to llc, you will reproduce the A call to built-in function 'memset' is not supported. prints.

I can see the error with this flag, thanks.

Also, here is a simpler example which could be triggered from clang:

$ cat test.c
extern void *memset(void *s, int c, unsigned long n);
extern void efoo(char *p);
void bar(char *p, unsigned long l) {
  memset(p, 0, l);
  efoo(p);
}

$ clang -O2 --target=bpf -mcpu=v3 test.c -emit-llvm -S -o test.ll
$ cat test.ll
...
; Function Attrs: nounwind
define dso_local void @bar(ptr noundef %p, i64 noundef %l) local_unnamed_addr #0 {
entry:
  tail call void @llvm.memset.p0.i64(ptr align 1 %p, i8 0, i64 %l, i1 false)
  tail call void @efoo(ptr noundef %p) #3
  ret void
}

; Function Attrs: mustprogress nocallback nofree nounwind willreturn memory(argmem: write)
declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg) #1

declare dso_local void @efoo(ptr noundef) local_unnamed_addr #2

attributes #0 = { nounwind "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="v3" }
attributes #1 = { mustprogress nocallback nofree nounwind willreturn memory(argmem: write) }
attributes #2 = { "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="v3" }
attributes #3 = { nounwind }
...
$ llc -mcpu=v3 test.ll -o - 
error: <unknown>:0:0: in function bar void (ptr, i64): A call to built-in function 'memset' is not supported.

Callee: t21: i64 = GlobalAddress<ptr @efoo> 0
	.globl	bar                             # -- Begin function bar
	.p2align	3
	.type	bar,@function
bar:                                    # @bar
.Lbar$local:
	.type	.Lbar$local,@function
# %bb.0:                                # %entry
	r3 = r2
	r6 = r1
	w2 = 0
	call memset  # <------ Note the leftover call to `memset`
	r1 = r6
	call efoo
	exit
.Lfunc_end0:
	.size	bar, .Lfunc_end0-bar
	.size	.Lbar$local, .Lfunc_end0-bar
                                        # -- End function

The warnings would be reported for builtin extern functions (list is here: llvm/include/llvm/IR/RuntimeLibcalls.def).
But will not be reported for regular extern functions, like efoo in the example above.
As far as I understand all BPF code that uses clang uses llvm.memset patterns that could be statically expanded as a bunch of stores and BPF runtime does not really provide memset implementation.
So on one hand it probably would be better to retain this warning for clang, but on the other hand these warnings really belong to linker not compiler.
If libbpf is used as a linker I think it would report an error if code w/o suitable memset implementation is loaded.
Is there a way to pass a list of supported builtins as a parameter here?
@yonghong-song, what do you think?

To better reiterate my position on the warning for ExternalSymbolSDNode: in C analogy such warnings would be generated by the linker, not compiler.
So, I vote to agree with @tamird here and remove the warning (or at-least make it optional).

tamird updated this revision to Diff 544354.Jul 26 2023, 7:07 AM

Updated to only allow external calls.

tamird retitled this revision from [BPF] Do not miscompile, allow external calls to [BPF] allow external calls.Jul 26 2023, 7:11 AM
tamird edited the summary of this revision. (Show Details)
ast added inline comments.Jul 26 2023, 4:45 PM
llvm/lib/Target/BPF/BPFISelLowering.cpp
466

How does it help aya?

In general I think it's ok to remove this, but we need to define the behavior more accurately here.
Does it mean that we promote builtin to global ? What is the visibility of such symbol? What relocation do we use ? What libbpf and libbpf linker should do with it?

In particular with memcpy, do we generate asm that calls "llvm.memcpy.p0.p0.i64" name or does it become plain "memcpy" ?

I applied this patch and did an experiment with bpftool linker functionality. The following are details.

First let us try one example with func mmemset which is not a builtin function.

$ cat t1.c                                                                                                                                  
#define __hidden __attribute__((visibility("hidden")))                                                                                                                      
__hidden extern void *mmemset(void *s, int c, unsigned long n);                                                                                                             
void do_memset(char *x, unsigned long l) {                                                                                                                                  
  mmemset(x, 0, l);                                                                                                                                                         
}                                                                                                                                                                           
$ cat t2.c                                                                                                                                  
void *mmemset(void *s, int c, unsigned long n);                                                                                                                             
void *mmemset(void *x, int c, unsigned long n) {
  // custom memset, I am using step of 2, otherwise the
  // compiler is smart enough and may transform
  // the code back to memset!
  char *ch = x;
  for (int i = 0; i < n; i+=2) ch[i] = c;
  return 0;
}
$ clang --target=bpf -O2 -g -c t1.c                    
$ clang --target=bpf -O2 -g -c t2.c                    
$ bpftool gen object final.o t1.o t2.o                 
$ llvm-readelf -s t1.o
   
Symbol table '.symtab' contains 13 entries: 
   Num:    Value          Size Type    Bind   Vis       Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT   UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT   ABS t1.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT     2 .text
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT     4 .debug_loclists
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT     5 .debug_abbrev
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT     8 .debug_str_offsets
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT    10 .debug_str
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT    11 .debug_addr               
     8: 0000000000000000     0 SECTION LOCAL  DEFAULT    16 .debug_frame
     9: 0000000000000000     0 SECTION LOCAL  DEFAULT    18 .debug_line               
    10: 0000000000000000     0 SECTION LOCAL  DEFAULT    20 .debug_line_str
    11: 0000000000000000    32 FUNC    GLOBAL DEFAULT     2 do_memset                 
    12: 0000000000000000     0 NOTYPE  GLOBAL HIDDEN    UND mmemset
$ llvm-readelf -s t2.o                                 
    
Symbol table '.symtab' contains 14 entries: 
   Num:    Value          Size Type    Bind   Vis       Ndx Name                      
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT   UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT   ABS t2.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT     2 .text
     3: 0000000000000038     0 NOTYPE  LOCAL  DEFAULT     2 LBB0_3
     4: 0000000000000010     0 NOTYPE  LOCAL  DEFAULT     2 LBB0_2
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT     3 .debug_loclists
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT     4 .debug_abbrev
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT     7 .debug_str_offsets
     8: 0000000000000000     0 SECTION LOCAL  DEFAULT     9 .debug_str
     9: 0000000000000000     0 SECTION LOCAL  DEFAULT    10 .debug_addr
    10: 0000000000000000     0 SECTION LOCAL  DEFAULT    15 .debug_frame
    11: 0000000000000000     0 SECTION LOCAL  DEFAULT    17 .debug_line
    12: 0000000000000000     0 SECTION LOCAL  DEFAULT    19 .debug_line_str
    13: 0000000000000000    72 FUNC    GLOBAL DEFAULT     2 mmemset
$ llvm-readelf -s final.o

Symbol table '.symtab' contains 8 entries:
   Num:    Value          Size Type    Bind   Vis       Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT   UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT   ABS t1.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT     3 .text
     3: 0000000000000000    32 FUNC    GLOBAL DEFAULT     3 do_memset
     4: 0000000000000020    72 FUNC    GLOBAL HIDDEN      3 mmemset
     5: 0000000000000000     0 FILE    LOCAL  DEFAULT   ABS t2.c
     6: 0000000000000058     0 NOTYPE  LOCAL  DEFAULT     3 LBB0_3
     7: 0000000000000030     0 NOTYPE  LOCAL  DEFAULT     3 LBB0_2

So bpftool linking succeeded with global function mmemset.

Now let rename the mmemset to memset in the above t1.c and t2.c.

$ cat t1.c
#define __hidden __attribute__((visibility("hidden")))
__hidden extern void *memset(void *s, int c, unsigned long n);
void do_memset(char *x, unsigned long l) {
  memset(x, 0, l);
}
$ cat t2.c
void *memset(void *s, int c, unsigned long n);
void *memset(void *x, int c, unsigned long n) {
  // custom memset, I am using step of 2, otherwise the
  // compiler is smart enough and may transform
  // the code back to memset!
  char *ch = x;
  for (int i = 0; i < n; i+=2) ch[i] = c;
  return 0;
}
$ clang --target=bpf -O2 -g -c t1.c
$ clang --target=bpf -O2 -g -c t2.c
$ bpftool gen object final.o t1.o t2.o
libbpf: failed to find BTF info for global/extern symbol 'memset'
Error: failed to link 't1.o': No such file or directory (2)
$ llvm-readelf -s t1.o

Symbol table '.symtab' contains 13 entries:
   Num:    Value          Size Type    Bind   Vis       Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT   UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT   ABS t1.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT     2 .text
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT     4 .debug_loclists
     4: 0000000000000000     0 SECTION LOCAL  DEFAULT     5 .debug_abbrev
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT     8 .debug_str_offsets
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT    10 .debug_str
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT    11 .debug_addr
     8: 0000000000000000     0 SECTION LOCAL  DEFAULT    16 .debug_frame
     9: 0000000000000000     0 SECTION LOCAL  DEFAULT    18 .debug_line
    10: 0000000000000000     0 SECTION LOCAL  DEFAULT    20 .debug_line_str
    11: 0000000000000000    32 FUNC    GLOBAL DEFAULT     2 do_memset
    12: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND memset
$ llvm-readelf -s t2.o

Symbol table '.symtab' contains 14 entries:
   Num:    Value          Size Type    Bind   Vis       Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT   UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT   ABS t2.c
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT     2 .text
     3: 0000000000000038     0 NOTYPE  LOCAL  DEFAULT     2 LBB0_3
     4: 0000000000000010     0 NOTYPE  LOCAL  DEFAULT     2 LBB0_2
     5: 0000000000000000     0 SECTION LOCAL  DEFAULT     3 .debug_loclists
     6: 0000000000000000     0 SECTION LOCAL  DEFAULT     4 .debug_abbrev
     7: 0000000000000000     0 SECTION LOCAL  DEFAULT     7 .debug_str_offsets
     8: 0000000000000000     0 SECTION LOCAL  DEFAULT     9 .debug_str
     9: 0000000000000000     0 SECTION LOCAL  DEFAULT    10 .debug_addr
    10: 0000000000000000     0 SECTION LOCAL  DEFAULT    15 .debug_frame
    11: 0000000000000000     0 SECTION LOCAL  DEFAULT    17 .debug_line
    12: 0000000000000000     0 SECTION LOCAL  DEFAULT    19 .debug_line_str
    13: 0000000000000000    72 FUNC    GLOBAL DEFAULT     2 memset

The major difference is in t1.o,

    12: 0000000000000000     0 NOTYPE  GLOBAL HIDDEN    UND mmemset
vs.
    12: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT   UND memset

So for builtin function memset, the 'visibility' is not set to HIDDEN although
we have an attribute for it.

I would like to have more study for this to understand why and how to fix it
before we lift the error message.

I checked the code and found that ExternalSymbolSDNode is indeed corresponding to builtin/library functions.
In SelectionDAG/SelectionDAG.cpp, we have

                    getExternalSymbol(TLI->getLibcallName(RTLIB::MEMCPY),
...
                    getExternalSymbol(TLI->getLibcallName(LibraryCall),
...
                    getExternalSymbol(TLI->getLibcallName(RTLIB::MEMMOVE),
...
                    getExternalSymbol(TLI->getLibcallName(LibraryCall),
...
        getExternalSymbol(BzeroName, TLI->getPointerTy(DL)), std::move(Args));
...
                     getExternalSymbol(TLI->getLibcallName(RTLIB::MEMSET),
...
                    getExternalSymbol(TLI->getLibcallName(LibraryCall),
...
  SDValue Callee = getExternalSymbol(TLI->getLibcallName(LC),
tamird added inline comments.Jul 27 2023, 3:54 AM
llvm/lib/Target/BPF/BPFISelLowering.cpp
466

How does it help aya?

In aya, we provide implementations of these built-ins (memset, memcpy, etc.). In bpf-linker, we combine all the IR into a single module before codegen. Currently this error fires, which we must treat as fatal or ignore by string matching.

It turns out that codegen generates a call to plain "memset", and the resulting code is completely valid. So this error is spurious.

In general I think it's ok to remove this, but we need to define the behavior more accurately here.
Does it mean that we promote builtin to global ? What is the visibility of such symbol? What relocation do we use ? What libbpf and libbpf linker should do with it?

Looks like yonghong discovered that this symbol is indeed promoted to global.

In particular with memcpy, do we generate asm that calls "llvm.memcpy.p0.p0.i64" name or does it become plain "memcpy" ?

Yes, it becomes "plain".

tamird abandoned this revision.Jul 27 2023, 4:02 PM

@yonghong-song I have learned some things. If you repeat your experiment (even without this patch) but pass -fno-builtin when compiling t1.c, everything works.

Here's the diff in IR:

diff --git a/t1.ll b/t1-fno-builtin.ll
index 64d172bf9ab5..324a98687ade 100644
--- a/t1.ll
+++ b/t1-fno-builtin.ll
@@ -3,24 +3,24 @@ source_filename = "t1.c"
 target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128"
 target triple = "bpf"
 
-; Function Attrs: mustprogress nofree nosync nounwind willreturn memory(argmem: write)
-define dso_local void @do_memset(ptr nocapture noundef writeonly %x, i64 noundef %l) local_unnamed_addr #0 !dbg !8 {
+; Function Attrs: nounwind
+define dso_local void @do_memset(ptr noundef %x, i64 noundef %l) local_unnamed_addr #0 !dbg !8 {
 entry:
   call void @llvm.dbg.value(metadata ptr %x, metadata !15, metadata !DIExpression()), !dbg !17
   call void @llvm.dbg.value(metadata i64 %l, metadata !16, metadata !DIExpression()), !dbg !17
-  tail call void @llvm.memset.p0.i64(ptr align 1 %x, i8 0, i64 %l, i1 false), !dbg !18
+  %call = tail call ptr @memset(ptr noundef %x, i32 noundef 0, i64 noundef %l) #3, !dbg !18
   ret void, !dbg !19
 }
 
-; Function Attrs: mustprogress nocallback nofree nounwind willreturn memory(argmem: write)
-declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg) #1
+declare !dbg !20 hidden ptr @memset(ptr noundef, i32 noundef, i64 noundef) local_unnamed_addr #1
 
 ; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
 declare void @llvm.dbg.value(metadata, metadata, metadata) #2
 
-attributes #0 = { mustprogress nofree nosync nounwind willreturn memory(argmem: write) "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
-attributes #1 = { mustprogress nocallback nofree nounwind willreturn memory(argmem: write) }
+attributes #0 = { nounwind "frame-pointer"="all" "no-builtins" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
+attributes #1 = { "frame-pointer"="all" "no-builtins" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
 attributes #2 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
+attributes #3 = { nobuiltin nounwind "no-builtins" }
 
 !llvm.dbg.cu = !{!0}
 !llvm.module.flags = !{!2, !3, !4, !5, !6}
@@ -46,3 +46,12 @@ attributes #2 = { nocallback nofree nosync nounwind speculatable willreturn memo
 !17 = !DILocation(line: 0, scope: !8)
 !18 = !DILocation(line: 4, column: 3, scope: !8)
 !19 = !DILocation(line: 5, column: 1, scope: !8)
+!20 = !DISubprogram(name: "memset", scope: !1, file: !1, line: 2, type: !21, flags: DIFlagPrototyped, spFlags: DISPFlagOptimized, retainedNodes: !25)
+!21 = !DISubroutineType(types: !22)
+!22 = !{!23, !23, !24, !13}
+!23 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: null, size: 64)
+!24 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
+!25 = !{!26, !27, !28}
+!26 = !DILocalVariable(name: "s", arg: 1, scope: !20, file: !1, line: 2, type: !23)
+!27 = !DILocalVariable(name: "c", arg: 2, scope: !20, file: !1, line: 2, type: !24)
+!28 = !DILocalVariable(name: "n", arg: 3, scope: !20, file: !1, line: 2, type: !13)

The same result is obtained if I annotate do_memset with __attribute__((no_builtin("memset"))):

diff --git a/t1.c b/t1-no-builtin.c
index 2698b334b898..3cded7c57e99 100644
--- a/t1.c
+++ b/t1-no-builtin.c
@@ -1,5 +1,5 @@
 #define __hidden __attribute__((visibility("hidden")))
 __hidden extern void *memset(void *s, int c, unsigned long n);
-void do_memset(char *x, unsigned long l) {
+void do_memset(char *x, unsigned long l) __attribute__((no_builtin("memset"))) {
   memset(x, 0, l);
 }
diff --git a/t1.ll b/t1-no-builtin.ll
index 64d172bf9ab5..f9a2b088084e 100644
--- a/t1.ll
+++ b/t1-no-builtin.ll
@@ -1,25 +1,25 @@
-; ModuleID = 't1.c'
-source_filename = "t1.c"
+; ModuleID = 't1-no-builtin.c'
+source_filename = "t1-no-builtin.c"
 target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128"
 target triple = "bpf"
 
-; Function Attrs: mustprogress nofree nosync nounwind willreturn memory(argmem: write)
-define dso_local void @do_memset(ptr nocapture noundef writeonly %x, i64 noundef %l) local_unnamed_addr #0 !dbg !8 {
+; Function Attrs: mustprogress nofree nounwind willreturn memory(argmem: readwrite)
+define dso_local void @do_memset(ptr noundef %x, i64 noundef %l) local_unnamed_addr #0 !dbg !8 {
 entry:
   call void @llvm.dbg.value(metadata ptr %x, metadata !15, metadata !DIExpression()), !dbg !17
   call void @llvm.dbg.value(metadata i64 %l, metadata !16, metadata !DIExpression()), !dbg !17
-  tail call void @llvm.memset.p0.i64(ptr align 1 %x, i8 0, i64 %l, i1 false), !dbg !18
+  %call = tail call ptr @memset(ptr noundef %x, i32 noundef 0, i64 noundef %l), !dbg !18
   ret void, !dbg !19
 }
 
-; Function Attrs: mustprogress nocallback nofree nounwind willreturn memory(argmem: write)
-declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg) #1
+; Function Attrs: mustprogress nofree nounwind willreturn memory(argmem: readwrite)
+declare !dbg !20 hidden ptr @memset(ptr noundef writeonly, i32 noundef, i64 noundef) local_unnamed_addr #1
 
 ; Function Attrs: nocallback nofree nosync nounwind speculatable willreturn memory(none)
 declare void @llvm.dbg.value(metadata, metadata, metadata) #2
 
-attributes #0 = { mustprogress nofree nosync nounwind willreturn memory(argmem: write) "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
-attributes #1 = { mustprogress nocallback nofree nounwind willreturn memory(argmem: write) }
+attributes #0 = { mustprogress nofree nounwind willreturn memory(argmem: readwrite) "frame-pointer"="all" "no-builtin-memset" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
+attributes #1 = { mustprogress nofree nounwind willreturn memory(argmem: readwrite) "frame-pointer"="all" "no-trapping-math"="true" "stack-protector-buffer-size"="8" }
 attributes #2 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
 
 !llvm.dbg.cu = !{!0}
@@ -27,7 +27,7 @@ attributes #2 = { nocallback nofree nosync nounwind speculatable willreturn memo
 !llvm.ident = !{!7}
 
 !0 = distinct !DICompileUnit(language: DW_LANG_C11, file: !1, producer: "clang version 17.0.0 (https://github.com/llvm/llvm-project.git 52807ba3836ef30c54622f227c4a23e2eb9375cd)", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, splitDebugInlining: false, nameTableKind: None)
-!1 = !DIFile(filename: "t1.c", directory: "/home/tamird/src/llvm-project-latest/foo", checksumkind: CSK_MD5, checksum: "7d959830f84c1a82a1efb0fbb5b9c592")
+!1 = !DIFile(filename: "t1-no-builtin.c", directory: "/home/tamird/src/llvm-project-latest/foo", checksumkind: CSK_MD5, checksum: "4a930bd64fe9fcad8934a5f1a66a94d4")
 !2 = !{i32 7, !"Dwarf Version", i32 5}
 !3 = !{i32 2, !"Debug Info Version", i32 3}
 !4 = !{i32 1, !"wchar_size", i32 4}
@@ -46,3 +46,12 @@ attributes #2 = { nocallback nofree nosync nounwind speculatable willreturn memo
 !17 = !DILocation(line: 0, scope: !8)
 !18 = !DILocation(line: 4, column: 3, scope: !8)
 !19 = !DILocation(line: 5, column: 1, scope: !8)
+!20 = !DISubprogram(name: "memset", scope: !1, file: !1, line: 2, type: !21, flags: DIFlagPrototyped, spFlags: DISPFlagOptimized, retainedNodes: !25)
+!21 = !DISubroutineType(types: !22)
+!22 = !{!23, !23, !24, !13}
+!23 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: null, size: 64)
+!24 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
+!25 = !{!26, !27, !28}
+!26 = !DILocalVariable(name: "s", arg: 1, scope: !20, file: !1, line: 2, type: !23)
+!27 = !DILocalVariable(name: "c", arg: 2, scope: !20, file: !1, line: 2, type: !24)
+!28 = !DILocalVariable(name: "n", arg: 3, scope: !20, file: !1, line: 2, type: !13)

bpftool gen object works with both of these.

I don't entirely understand the mechanism that causes memset to end up not-hidden, but it seems that BPF should always imply -fno-builtin for this reason. I've sent a PR to this effect in aya: https://github.com/aya-rs/aya/pull/698.

I'll abandon this, but it may be worthwhile to implement BPF-implies-fno-builtin *in clang*, if possible.

If we want to report this for multiple functions I think we can set some flag and exit in TargetLoweringBase::finalizeLowering().

Doesn't this still run at the end of a function, rather than at the end of an entire file? We can report multiple functions have errored, but we can't emit all the errors in the file.

If we want to report this for multiple functions I think we can set some flag and exit in TargetLoweringBase::finalizeLowering().

Doesn't this still run at the end of a function, rather than at the end of an entire file? We can report multiple functions have errored, but we can't emit all the errors in the file.

Indeed it does. My bad, for some reason I thought it does so at the end of module. And I don't see any hook that could be executed at the end of the module. Still, imho, backend should not proceed with compilation in such cases even if only first error is reported.

If we want to report this for multiple functions I think we can set some flag and exit in TargetLoweringBase::finalizeLowering().

Doesn't this still run at the end of a function, rather than at the end of an entire file? We can report multiple functions have errored, but we can't emit all the errors in the file.

Indeed it does. My bad, for some reason I thought it does so at the end of module. And I don't see any hook that could be executed at the end of the module. Still, imho, backend should not proceed with compilation in such cases even if only first error is reported.

Yes, we fixed the embedder. The question was is it possible to harden this such that future embedders don't make the same mistake.

ast added inline comments.Jul 28 2023, 4:45 PM
llvm/lib/Target/BPF/BPFISelLowering.cpp
466

Since backend couldn't unroll memset it should error hard.
The linking step doesn't exist most of the time.
From the user perspective it's better to fail at compile time instead of program loading time where the verifier errors are obscure.
I think it would be good to always unroll memset into a loop even when it's huge.
And similar with memcpy.
-fno-builtin and attribute((no_builtin)) tricks are unrelated to any of this.

I'll abandon this, but it may be worthwhile to implement BPF-implies-fno-builtin *in clang*, if possible.

Thanks @tamird Indeed -fno-builtin or attribute no-builtin can fix the issue. But I think we should not put no-builtin as default since based on my experiment a lot of users using memset with a known constant size, expecting the compiler will transform it into a load, and we do not want to break such use cases. But we could advise users to use -fno-builtin or no-builtin attribute if their usage of memset involves a unknown size to a very large known size. This way, we can keep the fail message here since survival of builtin memset func means bpftool linking cannot resolve it.

tamird added inline comments.Jul 28 2023, 6:53 PM
llvm/lib/Target/BPF/BPFISelLowering.cpp
466

Right. The no_builtin tricks are too big of hammers for what we want here, which is to still apply constant size optimizations but not apply special treatment to these functions when they've been defined by the user, which is what @yonghong-song was saying in his earlier comments. I may dig into that when I have some time, but for now we're just ignoring these errors in aya's linker.

Is there a place (issue tracker, mailing list, other) to document these musings and describe possible future work?

ast added a comment.Jul 28 2023, 8:09 PM

I may dig into that when I have some time, but for now we're just ignoring these errors in aya's linker.

what do you mean "ignoring in aya" ?
If bpf prog contains a call to memset and you link it with a generic unbounded loop in aya linker the verifier will reject such code.
I thought aya suppose to produce valid programs.

I may dig into that when I have some time, but for now we're just ignoring these errors in aya's linker.

what do you mean "ignoring in aya" ?
If bpf prog contains a call to memset and you link it with a generic unbounded loop in aya linker the verifier will reject such code.
I thought aya suppose to produce valid programs.

We're ignoring the error because the error is in fact not an error, the target is doing the right thing, and aya (well, rustc) always provides memory builtins so memset gets linked correctly and a valid program is produced.

ast added a comment.Jul 30 2023, 9:44 AM

I may dig into that when I have some time, but for now we're just ignoring these errors in aya's linker.

what do you mean "ignoring in aya" ?
If bpf prog contains a call to memset and you link it with a generic unbounded loop in aya linker the verifier will reject such code.
I thought aya suppose to produce valid programs.

We're ignoring the error because the error is in fact not an error, the target is doing the right thing, and aya (well, rustc) always provides memory builtins so memset gets linked correctly and a valid program is produced.

Could you point to memset implementation that gets linked eventually?
Is it a global function or static ? Asking because they're verified differently by the kernel.
I have a hard time imagining a generic implementation that can pass the verifier.

We're ignoring the error because the error is in fact not an error, the target is doing the right thing, and aya (well, rustc) always provides memory builtins so memset gets linked correctly and a valid program is produced.

Could you point to memset implementation that gets linked eventually?

Normally in rust programs, memory builtins are provided by the compiler-builtins crate, which is the equivalent of compiler-rt in LLVM. It's automatically linked in by the compiler to provide libcalls used by LLVM. This is the code https://github.com/rust-lang/compiler-builtins/blob/efd227f9c2ac220a8699188e937eb8be810df317/src/mem/mod.rs#L46. As you see it returns the dst pointer, which breaks the verifier if it's used on stack pointers (which is something rustc sometimes does when initializing stack values). So we have an override in aya that is pretty much the same but returns void.

Is it a global function or static ? Asking because they're verified differently by the kernel.
I have a hard time imagining a generic implementation that can pass the verifier.

I am aware - it's a static so it works. And also the bounds for the inputs are statically known in all the cases where rustc uses memset internally (eg to zero initialize values with a fixed known type layout). The usual bound checks are required when calling something like https://doc.rust-lang.org/stable/std/primitive.slice.html#method.copy_from_slice.