This is an archive of the discontinued LLVM Phabricator instance.

[BPF] do compile-once run-everywhere relocation for bitfields
Closed, Public

Authored by yonghong-song on Sep 24 2019, 1:42 PM.

Details

Summary

A BPF-specific clang intrinsic is introduced:

u32 __builtin_preserve_field_info(member_access, info_kind)

Depending on info_kind, different information is
returned to the program. A relocation is also
recorded for this builtin so that the bpf loader can
patch the instruction on the target host.
This clang intrinsic is used to obtain certain information
to facilitate struct/union member relocations.

The offset relocation is extended by 4 bytes to
include the relocation kind. The relocation kinds
currently supported for __builtin_preserve_field_info() are:

enum {
  FIELD_BYTE_OFFSET = 0,
  FIELD_BYTE_SIZE,
  FIELD_EXISTENCE,
  FIELD_SIGNEDNESS,
  FIELD_LSHIFT_U64,
  FIELD_RSHIFT_U64,
};

The old access offset relocation is covered by
FIELD_BYTE_OFFSET = 0.

An example:
struct s {
  int a;
  int b1:9;
  int b2:4;
};

enum {
  FIELD_BYTE_OFFSET = 0,
  FIELD_BYTE_SIZE,
  FIELD_EXISTENCE,
  FIELD_SIGNEDNESS,
  FIELD_LSHIFT_U64,
  FIELD_RSHIFT_U64,
};

void bpf_probe_read(void *, unsigned, const void *);
int field_read(struct s *arg) {
  unsigned long long ull = 0;
  unsigned offset = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_OFFSET);
  unsigned size = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_SIZE);
#ifdef USE_PROBE_READ
  bpf_probe_read(&ull, size, (const void *)arg + offset);
  unsigned lshift = __builtin_preserve_field_info(arg->b2, FIELD_LSHIFT_U64);
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
  lshift = lshift + (size << 3) - 64;
#endif
#else
  switch(size) {
  case 1:
    ull = *(unsigned char *)((void *)arg + offset); break;
  case 2:
    ull = *(unsigned short *)((void *)arg + offset); break;
  case 4:
    ull = *(unsigned int *)((void *)arg + offset); break;
  case 8:
    ull = *(unsigned long long *)((void *)arg + offset); break;
  }
  unsigned lshift = __builtin_preserve_field_info(arg->b2, FIELD_LSHIFT_U64);
#endif
  ull <<= lshift;
  if (__builtin_preserve_field_info(arg->b2, FIELD_SIGNEDNESS))
    return (long long)ull >> __builtin_preserve_field_info(arg->b2, FIELD_RSHIFT_U64);
  return ull >> __builtin_preserve_field_info(arg->b2, FIELD_RSHIFT_U64);
}

There is a minor overhead for bpf_probe_read() on big endian.

The code and relocations generated for field_read where bpf_probe_read() is
used to access the argument data, in little-endian mode:

r3 = r1
r1 = 0
r1 = 4  <=== relocation (FIELD_BYTE_OFFSET)
r3 += r1
r1 = r10
r1 += -8
r2 = 4  <=== relocation (FIELD_BYTE_SIZE)
call bpf_probe_read
r2 = 51 <=== relocation (FIELD_LSHIFT_U64)
r1 = *(u64 *)(r10 - 8)
r1 <<= r2
r2 = 60 <=== relocation (FIELD_RSHIFT_U64)
r0 = r1
r0 >>= r2
r3 = 1  <=== relocation (FIELD_SIGNEDNESS)
if r3 == 0 goto LBB0_2
r1 s>>= r2
r0 = r1

LBB0_2:

exit
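
These constants follow from the layout of struct s: on little endian, b2 is a
4-bit field starting at bit 9 of the 4-byte storage unit at byte offset 4, so
FIELD_LSHIFT_U64 = 64 - 9 - 4 = 51 and FIELD_RSHIFT_U64 = 64 - 4 = 60,
matching the relocated constants above.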

Compared to the above code between the FIELD_LSHIFT_U64 and
FIELD_RSHIFT_U64 relocations, the big-endian version has four more
instructions:

r1 = 41   <=== relocation (FIELD_LSHIFT_U64)
r6 += r1
r6 += -64
r6 <<= 32
r6 >>= 32
r1 = *(u64 *)(r10 - 8)
r1 <<= r6
r2 = 60   <=== relocation (FIELD_RSHIFT_U64)
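
Here FIELD_LSHIFT_U64 relocates to 41 and the adjustment computes
41 + (4 << 3) - 64 = 9: bpf_probe_read() fills only the first 4 bytes of the
8-byte stack buffer, and on big endian those bytes land in the upper half of
the u64 load, so 32 fewer bits of left shift are needed than the relocated
value (which assumes the field's bytes sit in the low part of the register)
indicates.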

The code and relocations generated when using direct loads:

r2 = 0
r3 = 4
r4 = 4
if r4 s> 3 goto LBB0_3
if r4 == 1 goto LBB0_5
if r4 == 2 goto LBB0_6
goto LBB0_9

LBB0_6: # %sw.bb1

r1 += r3
r2 = *(u16 *)(r1 + 0)
goto LBB0_9

LBB0_3: # %entry

if r4 == 4 goto LBB0_7
if r4 == 8 goto LBB0_8
goto LBB0_9

LBB0_8: # %sw.bb9

r1 += r3
r2 = *(u64 *)(r1 + 0)
goto LBB0_9

LBB0_5: # %sw.bb

r1 += r3
r2 = *(u8 *)(r1 + 0)
goto LBB0_9

LBB0_7: # %sw.bb5

r1 += r3
r2 = *(u32 *)(r1 + 0)

LBB0_9: # %sw.epilog

r1 = 51
r2 <<= r1
r1 = 60
r0 = r2
r0 >>= r1
r3 = 1
if r3 == 0 goto LBB0_11
r2 s>>= r1
r0 = r2

LBB0_11: # %sw.epilog

exit

Since the verifier is able to do limited constant
propagation following branches, the following is the
code actually traversed:

r2 = 0
r3 = 4   <=== relocation
r4 = 4   <=== relocation
if r4 s> 3 goto LBB0_3

LBB0_3: # %entry

if r4 == 4 goto LBB0_7

LBB0_7: # %sw.bb5

r1 += r3
r2 = *(u32 *)(r1 + 0)

LBB0_9: # %sw.epilog

r1 = 51   <=== relocation
r2 <<= r1
r1 = 60   <=== relocation
r0 = r2
r0 >>= r1
r3 = 1
if r3 == 0 goto LBB0_11
r2 s>>= r1
r0 = r2

LBB0_11: # %sw.epilog

exit

In the native (direct) load case, the load size is chosen to match the
load width LLVM would otherwise use to load the value from which the
bitfield is extracted. For arg->b2 above that width is 4 bytes (u32),
which is why FIELD_BYTE_SIZE relocates to 4 and only the case 4 path
is traversed.
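
As an additional illustration (a minimal sketch, not part of this patch or
its tests), the FIELD_EXISTENCE kind can be used to guard the access so the
same program can run on kernels where the field is absent; this assumes the
bpf loader patches the returned constant to 0 or 1 based on whether the field
exists in the target kernel's BTF:

int field_read_checked(struct s *arg) {
  unsigned long long ull = 0;

  /* Relocated by the bpf loader: non-zero only if s::b2 exists on the
     target kernel (assumption for this sketch). */
  if (!__builtin_preserve_field_info(arg->b2, FIELD_EXISTENCE))
    return 0;

  unsigned offset = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_OFFSET);
  unsigned size = __builtin_preserve_field_info(arg->b2, FIELD_BYTE_SIZE);
  bpf_probe_read(&ull, size, (const void *)arg + offset);
  /* ... extract the bitfield exactly as in field_read() above ... */
  return ull;
}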

Diff Detail

Event Timeline

yonghong-song created this revision. Sep 24 2019, 1:42 PM
Herald added a project: Restricted Project. Sep 24 2019, 1:42 PM
Herald added a subscriber: cfe-commits.
yonghong-song edited the summary of this revision.

builtin returns u32 instead of u64.

yonghong-song retitled this revision from [WIP][CLANG][BPF] implement clang __builtin_bitfield_info() intrinsic to [WIP][CLANG][BPF] do compile-once run-everywhere relocation for bitfields.
yonghong-song edited the summary of this revision.

add BPF backend support

yonghong-song edited the summary of this revision.

use a bpf target builtin __builtin_get_field_info().

yonghong-song edited the summary of this revision.

return byte_size/offset/lshift/rshift based on size alignment

anakryiko added inline comments. Oct 2 2019, 9:17 PM
clang/lib/CodeGen/CGBuiltin.cpp
9325–9326

move this under if (!getDebugInfo) above and ditch IsError completely?

llvm/lib/Target/BPF/BPFAbstractMemberAccess.cpp
235 ↗(On Diff #222910)

getConstant returns u64, but AccessIndex is u32, is that a concern?

254 ↗(On Diff #222910)

just curious, why did you decide to not follow llvm.preserve.xxx pattern here? E.g., llvm.preserve.field.info would read just fine, I guess?

533 ↗(On Diff #222910)

is it possible to allow capturing the same bitfield information for non-bitfield integers?

E.g., if my current vmlinux.h has some field as a plain integer, but I know that some older kernel (or maybe future kernels?) had this field as a bitfield, I won't have to do anything nasty just to handle this generically. The fewer artificial restrictions we have, the better, IMO.

536 ↗(On Diff #222910)

I think we should be generic here and allow up to 128 bits and let libbpf handle its limitations separately (e.g., if we can't support >64 bits initially, libbpf will reject the program; but if we ever extend the support, at least we won't have to go back to Clang and fix it up).

541–548 ↗(On Diff #222910)

I'll need to do some drawing and math to check how this is going to be handled in big-endian and little-endian, I'll get back to you later with that.

600 ↗(On Diff #222910)

nit: I can't figure out off the top of my head what takes precedence, shift or multiplication; maybe add parens to make it obvious?

llvm/lib/Target/BPF/BTFDebug.cpp
1163 ↗(On Diff #222910)

why int64?

llvm/lib/Target/BPF/BTFDebug.h
227 ↗(On Diff #222910)

nit: this is not BTFOffsetReloc anymore, more like BTFFieldReloc

yonghong-song marked 9 inline comments as done. Oct 3 2019, 6:38 AM
yonghong-song added inline comments.
clang/lib/CodeGen/CGBuiltin.cpp
9325–9326

yes. previously I had two cases of errors. Now there is only one, so it makes sense to merge them.

llvm/lib/Target/BPF/BPFAbstractMemberAccess.cpp
235 ↗(On Diff #222910)

it should be okay. the call is generated by clang, so the access index will be in range. Also, the llvm ConstantInt class can only return 64-bit values:
https://llvm.org/doxygen/classllvm_1_1ConstantInt.html
We just do the cast here; it should be okay.

254 ↗(On Diff #222910)

previously I was using preserve.field.info and switched to get.field.info since it does return certain field info to the user. Maybe we should go back to preserve.field.info to indicate that a relocation is generated. Will do.

533 ↗(On Diff #222910)

Yes, we can remove this restriction. Let us first get the bitfield interface right and then extend it to non-bitfields.

536 ↗(On Diff #222910)

We can do this.

541–548 ↗(On Diff #222910)

sounds great.

600 ↗(On Diff #222910)

sure.

llvm/lib/Target/BPF/BTFDebug.cpp
1163 ↗(On Diff #222910)

Will change to u32.

llvm/lib/Target/BPF/BTFDebug.h
227 ↗(On Diff #222910)

Yes, it becomes FieldReloc now.

yonghong-song edited the summary of this revision.

always use a 64-bit buffer, handle non-bitfield members, etc.

yonghong-song edited the summary of this revision. Oct 4 2019, 9:27 AM
ast added inline comments. Oct 5 2019, 6:48 PM
llvm/lib/Target/BPF/BPFCORE.h
17 ↗(On Diff #223233)

Why is ACCESS_OFFSET necessary?
Isn't it the same as BYTE_OFFSET but for non-bitfields?
Maybe a single kind will do?

22 ↗(On Diff #223233)

All these names are not added as a builtin enum before compilation starts, right?
So a C program cannot just use FIELD_BYTE_OFFSET and needs to define its own enum first?
How about FIELD_LSHIFT_U64 instead?

23 ↗(On Diff #223233)

FIELD_RSHIFT_U64 ?

yonghong-song marked 4 inline comments as done and an inline comment as not done. Oct 5 2019, 8:03 PM
yonghong-song added inline comments.
llvm/lib/Target/BPF/BPFCORE.h
17 ↗(On Diff #223233)

This is the relocation type in .BTF.ext. FIELD_ACCESS_OFFSET corresponds to the old offset relocation. The newly added relocations below are for field info.
I am using the relocation enum, except FIELD_ACCESS_OFFSET, as the field_info kinds for __builtin_preserve_field_info().

22 ↗(On Diff #223233)

Yes, C programs need to define their own enum with the corresponding values first.
I'm not familiar with builtin enums. Let me take a look.

FIELD_LSHIFT_U64 is shorter and looks good.

23 ↗(On Diff #223233)

Yes, sounds better.

All these names are not added as a builtin enum before compilation starts, right?

Do you prefer to have a builtin enum? If we do this, we will emit the enum all the time,
regardless of whether people use this builtin or not. Are you worried about API
stability?

I think defining it explicitly in libbpf/btf.h is probably a better choice.
Defining commonly used macros like bpf or BPF
makes more sense; hidden enums seem really rare.

yonghong-song marked an inline comment as done. Oct 5 2019, 8:43 PM
yonghong-song added inline comments.
llvm/lib/Target/BPF/BPFCORE.h
17 ↗(On Diff #223233)

Now I understand what you mean. Currently BYTE_OFFSET is calculated from the start of the last enclosing struct/union, while ACCESS_OFFSET could be the offset from the outer struct/union, as in a.b.c.d.
Let me think about whether we can consolidate the two.
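
For illustration only, a hypothetical nested access where the two offsets being discussed would differ:

struct inner { int pad; int c; };
struct outer { int x; struct inner b; };
/* For a pointer to struct outer and the access ->b.c: the offset from the
   last enclosing struct is offsetof(struct inner, c) == 4, while the offset
   from the outer struct is
   offsetof(struct outer, b) + offsetof(struct inner, c) == 8. */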

yonghong-song retitled this revision from [WIP][CLANG][BPF] do compile-once run-everywhere relocation for bitfields to [CLANG][BPF] do compile-once run-everywhere relocation for bitfields.
yonghong-song edited the summary of this revision.

fixed a few bugs from the previous revision. Removed the FIELD_ACCESS_OFFSET relocation, which would be identical to FIELD_BYTE_OFFSET. All of the CO-RE related test cases are adjusted and pass now. Removed the [WIP] prefix.
TODO: write new tests for the new functionality.

ast accepted this revision. Oct 6 2019, 4:22 PM

lgtm. thanks!

This revision is now accepted and ready to land. Oct 6 2019, 4:22 PM
yonghong-song retitled this revision from [CLANG][BPF] do compile-once run-everywhere relocation for bitfields to [BPF] do compile-once run-everywhere relocation for bitfields.
yonghong-song edited the summary of this revision.

add test cases. Also handle field_info for field array elements properly.

ast added a comment. Oct 7 2019, 12:05 PM

thanks for adding the tests

yonghong-song edited the summary of this revision.

separate the old FIELD_LSHIFT_U64 relocation into FIELD_LSHIFT_U64_BUF and FIELD_LSHIFT_U64_VAL. This allows generating better code for native loads.

yonghong-song edited the summary of this revision.

use only one relocation for the left shift, optimizing for direct loads. The bpf_probe_read() big-endian path could have 4 more arithmetic instructions compared to little-endian mode; the performance impact should be negligible compared to bpf_probe_read() itself.

ast accepted this revision. Oct 8 2019, 10:32 AM

perfect. ship it

This revision was automatically updated to reflect the committed changes.