This is an archive of the discontinued LLVM Phabricator instance.

BPF: move AbstractMemberAccess and PreserveDIType passes to EP_EarlyAsPossible
ClosedPublic

Authored by yonghong-song on Sep 4 2020, 9:32 AM.

Details

Summary

Move abstractMemberAccess and PreserveDIType passes as early as
possible, right after clang code generation.

Currently, compiler may transform the above code

p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0);
p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2);
a = llvm.bpf.builtin.preserve_field_info(p2, EXIST);
if (a) {
  p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0);
  p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2);
  bpf_probe_read(buf, buf_size, p2);
}

to

p1 = llvm.bpf.builtin.preserve.struct.access(base, 0, 0);
p2 = llvm.bpf.builtin.preserve.struct.access(p1, 1, 2);
a = llvm.bpf.builtin.preserve_field_info(p2, EXIST);
if (a) {
  bpf_probe_read(buf, buf_size, p2);
}

and eventually assembly code looks like

reloc_exist = 1;
reloc_member_offset = 10; //calculate member offset from base
p2 = base + reloc_member_offset;
if (reloc_exist) {
  bpf_probe_read(bpf, buf_size, p2);
}

if during libbpf relocation resolution, reloc_exist is actually
resolved to 0 (not exist), reloc_member_offset relocation cannot
be resolved and will be patched with illegal instruction.
This will cause verifier failure.

This patch attempts to address this issue by do chaining
analysis and replace chains with special globals right
after clang code gen. This will remove the cse possibility
described in the above. The IR typically looks like

%6 = load @llvm.sk_buff:0:50$0:0:0:2:0
%7 = bitcast %struct.sk_buff* %2 to i8*
%8 = getelementptr i8, i8* %7, %6

for a particular address computation relocation.

But this transformation has another consequence, code sinking
may happen like below:

PHI = <possibly different @preserve_*_access_globals>
%7 = bitcast %struct.sk_buff* %2 to i8*
%8 = getelementptr i8, i8* %7, %6

For such cases, we will not able to generate relocations since
multiple relocations are merged into one.

This patch introduced a passthrough builtin
to prevent such optimization. Looks like inline assembly has more
impact for optimizaiton, e.g., inlining. Using passthrough has
less impact on optimizations.

A new IR pass is introduced at the beginning of target-dependent
IR optimization, which does:

  • report fatal error if any reloc global in PHI nodes
  • remove all bpf passthrough builtin functions

Changes for existing CORE tests:

  • for clang tests, add "-Xclang -disable-llvm-passes" flags to avoid builtin->reloc_global transformation so the test is still able to check correctness for clang generated IR.
  • for llvm CodeGen/BPF tests, add "opt -O2 <ir_file> | llvm-dis" command before "llc" command since "opt" is needed to call newly-placed builtin->reloc_global transformation. Add target triple in the IR file since "opt" requires it.
  • Since target triple is added in IR file, if a test may produce different results for different endianness, two tests will be created, one for bpfeb and another for bpfel, e.g., some tests for relocation of lshift/rshift of bitfields.
  • field-reloc-bitfield-1.ll has different relocations compared to old codes. This is because for the structure in the test, new code returns struct layout alignment 4 while old code is 8. Align 8 is more precise and permits double load. With align 4, the new mechanism uses 4-byte load, so generating different relocations.
  • test intrinsic-transforms.ll is removed. This is used to test cse on intrinsics so we do not lose metadata. Now metadata is attached to global and not instruction, it won't get lost with cse.

Diff Detail

Event Timeline

yonghong-song created this revision.Sep 4 2020, 9:32 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 4 2020, 9:32 AM
yonghong-song requested review of this revision.Sep 4 2020, 9:32 AM
yonghong-song edited the summary of this revision. (Show Details)
  • introduced __builtin_bpf_passthrough() and use it for user barrier instead of inline asm. I found inline asm has some impact on inlining when running selftests (specifically checked for map_ptr_kern.c).
  • introduced an IR pass at the beginning of target dependent optimizations to (1). ensure reloc globals not in PHI node, (2). remove __builtin_bpf_passthrough(). (3). coalescing reloc globals if possible.

This patch can now pass all selftests.

ast added inline comments.Sep 10 2020, 1:16 PM
llvm/lib/Target/BPF/BPFAbstractMemberAccess.cpp
105

Instead of adding this attribute when bpf_passthrough is inserted by the compiler would it be possible to have this attribute attached to the builtin unconditionally? Meaning that the user could use such builtin in their C code and it would contain NoMerge as well ?

yonghong-song added inline comments.Sep 10 2020, 1:45 PM
llvm/lib/Target/BPF/BPFAbstractMemberAccess.cpp
105

All the __builtin_preserve_* builtins will be gone at this phase (the first phase after clang codegen) after they are replaced by globals. So here we are dealing with cases where they may not have calls in the basic block which contains relocation globals.

yonghong-song retitled this revision from [RFC] BPF: move AbstractMemberAccess and PreserveDIType passes to EP_EarlyAsPossible to [RFC PATCH 1] BPF: move AbstractMemberAccess and PreserveDIType passes to EP_EarlyAsPossible.Sep 14 2020, 6:12 PM
yonghong-song edited the summary of this revision. (Show Details)

add seq_num arg to passthrough builtin.

ast added inline comments.Sep 21 2020, 6:56 PM
llvm/lib/Target/BPF/BPFAbstractMemberAccess.cpp
1074

why did you remove the whole comment? Only 'NoMerge' part of it is no longer relevant, right?
But the rest is still very helpful to understand the intent or I'm missing something?

yonghong-song added inline comments.Sep 21 2020, 9:03 PM
llvm/lib/Target/BPF/BPFAbstractMemberAccess.cpp
1074

Okay, NoMerge part is irrelevant now. The sequence number of will prevent any kind of merging/CSE, not just tail merging. I will restore the comments.

yonghong-song retitled this revision from [RFC PATCH 1] BPF: move AbstractMemberAccess and PreserveDIType passes to EP_EarlyAsPossible to BPF: move AbstractMemberAccess and PreserveDIType passes to EP_EarlyAsPossible.
yonghong-song edited the summary of this revision. (Show Details)
    • remove RFC tag. The patch is complete and all tests are adjusted and passed.
    • remove the adjustIR phase which tries to coalescing two reloc-globals. My previous implementation did not compare the *last* type in the first reloc-global and the *first* type in the second reloc-global. This will require to traverse the access string (0:1:1:...). This seems too much work for this little optimization. I will delay it into the future as an enhancement.
  • adjusted existing clang/llvm CORE tests so they can pass.
Herald added a project: Restricted Project. · View Herald TranscriptSep 22 2020, 11:57 PM
ast accepted this revision.Sep 28 2020, 1:29 PM
This revision is now accepted and ready to land.Sep 28 2020, 1:29 PM
This revision was landed with ongoing or failed builds.Sep 28 2020, 4:57 PM
This revision was automatically updated to reflect the committed changes.
llvm/test/CodeGen/BPF/CORE/intrinsic-struct.ll