This is an archive of the discontinued LLVM Phabricator instance.

[BPF] support for BPF_ST instruction in codegen
ClosedPublic

Authored by eddyz87 on Dec 31 2022, 7:46 AM.

Details

Summary

Generate store immediate instruction when CPUv4 is enabled.
For example:

$ cat test.c
struct foo {
  unsigned char  b;
  unsigned short h;
  unsigned int   w;
  unsigned long  d;
};
void bar(volatile struct foo *p) {
  p->b = 1;
  p->h = 2;
  p->w = 3;
  p->d = 4;
}

$ clang -O2 --target=bpf -mcpu=v4 test.c -c -o - | llvm-objdump -d -
...
0000000000000000 <bar>:
       0:	72 01 00 00 01 00 00 00	*(u8 *)(r1 + 0x0) = 0x1
       1:	6a 01 02 00 02 00 00 00	*(u16 *)(r1 + 0x2) = 0x2
       2:	62 01 04 00 03 00 00 00	*(u32 *)(r1 + 0x4) = 0x3
       3:	7a 01 08 00 04 00 00 00	*(u64 *)(r1 + 0x8) = 0x4
       4:	95 00 00 00 00 00 00 00	exit

Take special care to:

  • apply BPFMISimplifyPatchable::checkADDrr rewrite for BPF_ST
  • validate immediate value when BPF_ST write is 64-bit: BPF interprets (BPF_ST | BPF_MEM | BPF_DW) writes as writes with sign extension. Thus it is fine to generate such write when immediate is -1, but it is incorrect to generate such write when immediate is +0xffff_ffff.

Diff Detail

Event Timeline

eddyz87 created this revision.Dec 31 2022, 7:46 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 31 2022, 7:46 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
eddyz87 updated this revision to Diff 545494.Jul 30 2023, 5:48 PM

Generate BPF_ST when cpuv4 is specified.

eddyz87 updated this revision to Diff 547064.Aug 3 2023, 5:35 PM
eddyz87 retitled this revision from BPF: support for BPF_ST instruction in codegen to [BPF] support for BPF_ST instruction in codegen.
eddyz87 edited the summary of this revision. (Show Details)

BPFMISimplifyPatchable::checkADDrr for BPF_ST instructions.

eddyz87 published this revision for review.Aug 4 2023, 9:08 AM
eddyz87 added a reviewer: yonghong-song.

Hi Yonghong,

Could you please take a look at this revision? It enables generation of BPF_ST instruction when CPUv4 is selected.
When enabled the following kernel BPF selftests fail:

  • log_fixup/missing_map
  • spin_lock/lock_id_mapval_preserve
  • spin_lock/lock_id_innermapval_preserve

All failures are caused by the difference in the expected log messages (when BPF_ST is enabled less instructions are generated => instruction numbers in the log a slightly off). I will submit kernel patch to relax log messages after this revision is accepted (but before landing it).

Impact basing on the kernel selftests:

  • in total 653 *.bpf.o files are generated
  • 377 are identical
  • 265 have less instructions
  • 2 have more instructions

Most of the changes are obvious: sequences like r0 = 0; *(u64 *)(r10 - 8) = r0; are replaced by a single instruction. For tests where the number of instructions increased I took a closer look:

  • ip_check_defrag.bpf.o, # of insns increased from 58 to 63: a few more instructions are generated because of a difference in register allocation
  • pyperf_subprogs.bpf.o, # of insns increased from 4421 to 4434: I can't pinpoint a stage when additional instructions are generated, it seems to accumulate due to slight difference in register allocation and spilling decisions.
Herald added a project: Restricted Project. · View Herald TranscriptAug 4 2023, 9:08 AM

Note: CI failures seem to be completely unrelated...

yonghong-song accepted this revision.Aug 4 2023, 7:56 PM

LGTM except a few nits!

llvm/lib/Target/BPF/BPFInstrInfo.td
481

Extra space (' ') char here.

486

Maybe add an extra explanation here such that.
For such cases, a LD_imm64 insn is generated to store +0xffff_ffff, followed by a 64-bit STX store.

llvm/test/CodeGen/BPF/CORE/field-reloc-st-imm.ll
29 ↗(On Diff #547186)

Typically we put 'CHECK' inside the function or after the function. Could you put them after the 'bar' function?

This revision is now accepted and ready to land.Aug 4 2023, 7:56 PM
eddyz87 updated this revision to Diff 547613.Aug 6 2023, 4:54 PM
eddyz87 marked 3 inline comments as done.

Changes according to Yonghong's comments

This revision was automatically updated to reflect the committed changes.
eddyz87 reopened this revision.Aug 10 2023, 4:26 PM

Build bot reports a failure (link):

field-reloc-st-imm.ll:
*** Bad machine code: Explicit definition must be a register ***
- function:    bar
- basic block: %bb.0 entry (0x742f318)
- instruction: CORE_MEM 3, 416, %0:gpr, @"llvm.foo:0:4$0:2", debug-location !36; some-file.c:13:8
- operand 0:   3
*** Bad machine code: Explicit definition must be a register ***
- function:    bar
- basic block: %bb.0 entry (0x742f318)
- instruction: CORE_MEM 4, 410, %0:gpr, @"llvm.foo:0:8$0:3", debug-location !38; some-file.c:14:8
- operand 0:   4

I reverted commit to avoid obstructing buildbot, reopen this revision in order to investigate failure.

This revision is now accepted and ready to land.Aug 10 2023, 4:26 PM
eddyz87 updated this revision to Diff 549736.Aug 13 2023, 11:45 AM

Rebased on top of D157806.

Hi @yonghong-song,

The build bot failure (here) was caused by newly added test case field-reloc-st-imm.ll. When LLVM is built with LLVM_ENABLE_EXPENSIVE_CHECKS=ON flag this test triggered bug described in D157806. I rebased this revision on top of D157806. Could you please double check if you are still ok with these changes?

Still looks good. Thanks!

This revision was automatically updated to reflect the committed changes.