This is an archive of the discontinued LLVM Phabricator instance.

[WIP][BPF] Add sext load instructions
AbandonedPublic

Authored by yonghong-song on Sep 7 2022, 6:11 PM.

Details

Reviewers
ast
anakryiko
Summary

Add sign-extension load instructions.

For kernel bpf selftest pyperf180, the number of
instructions without this patch is 91472. With this
patch, the number of instructions is 80652. Most improvements
come from

r1 = *(u32 *)(r8 + 12) 
r1 <<= 32
r1 s>>= 32

to

r1 = *(s32 *)(r8 + 12)

Such a new instruction not only improves BPF code, but can also improve
jitted code, since most architectures have explicit
sign-extension instructions.

To-Do list:

  • gate the new functionality behind a cpu version, e.g. -mcpu=v4
  • implement the Linux kernel part to support the new instructions

Diff Detail

Event Timeline

yonghong-song created this revision. Sep 7 2022, 6:11 PM
Herald added a project: Restricted Project. Sep 7 2022, 6:11 PM
Herald added a subscriber: hiraditya.
yonghong-song requested review of this revision. Sep 7 2022, 6:11 PM
ast added inline comments. Sep 8 2022, 2:30 PM
llvm/lib/Target/BPF/BPFInstrInfo.td
469

I believe x86 and arm64 have instructions that sign-extend one register into another,
but not insns that sign-extend during the load.
Such insns compose better, e.g. in case an s32 came as a return value from a call.
Have you considered going that route?

anakryiko added inline comments. Sep 8 2022, 3:12 PM
llvm/lib/Target/BPF/BPFInstrInfo.td
469

one of the motivations for this instruction was to allow libbpf to automatically change a load instruction to the proper size of the field (https://github.com/libbpf/libbpf/blob/master/src/relo_core.c#L930-L940). This currently works only for unsigned fields. If we add some instruction that would have to be emitted after the load, that will not help in this case. So maybe let's add both? Register-to-register would help with calls; a load with sign extension will help CO-RE and signed loads generally?

anakryiko added inline comments. Sep 8 2022, 3:15 PM
llvm/lib/Target/BPF/BPFInstrInfo.td
469

for call optimization, maybe having a family of calls that would perform the desired size adjustment (32-bit to 64-bit) and optionally sign extension is another improvement?

E.g., what we have as "call" today would basically be a 64-bit unsigned call, but we could have call{8,16,32,64}{s,u}, where call16s means that the callee function returns a short in r0 and we request that r0 be sign-extended so it can be used as an s64.

Just a thought.

ast added inline comments. Sep 8 2022, 3:18 PM
llvm/lib/Target/BPF/BPFInstrInfo.td
469

calls don't have such attrs. that's a no-go.
I prefer to avoid new loads, since they don't map to insns directly.
libbpf should be able to insert an insn. It can adjust offsets already. Not that hard.

anakryiko added inline comments. Sep 8 2022, 3:23 PM
llvm/lib/Target/BPF/BPFInstrInfo.td
469

Understood about call, np. And while you are technically correct about inserting instructions, that's a major PITA, so I'm not so keen on that.

But then re: x86 not having sign-extending loads. Isn't movsx exactly that (https://www.felixcloutier.com/x86/movsx:movsxd)? Not an assembly expert, but it seems the source operand can be a memory location, so that's an exact 1:1 mapping?

anakryiko added inline comments. Sep 8 2022, 3:28 PM
llvm/lib/Target/BPF/BPFInstrInfo.td
469

arm64 seems to have something like this as well? E.g., LDRSB (https://developer.arm.com/documentation/102374/0100/Loads-and-stores---zero-and-sign-extension)

All major architectures support sign-extending loads (x64, arm64, sparc, webassembly, lanai, etc.).

Yes, x64/arm64 also support register->register sign extension (https://www.felixcloutier.com/x86/movsx:movsxd, https://developer.arm.com/documentation/dui0801/g/A64-General-Instructions/SXTW). Let me take a look.

I hacked the kernel x64 JIT part; with the following JIT for sext loads, it seems to work okay from the JIT perspective.

+/* LDX: dst_reg = *(s8*)(src_reg + off) */
+static void emit_ldx_sext(u8 **pprog, u32 size, u32 dst_reg, u32 src_reg, int off)
+{
+       u8 *prog = *pprog;
+
+       switch (size) {
+       case BPF_B:
+               /* Emit 'movsx rax, byte ptr [rax + off]' */
+               EMIT3(add_2mod(0x48, src_reg, dst_reg), 0x0F, 0xBE);
+               break;
+       case BPF_H:
+               /* Emit 'movsx rax, word ptr [rax + off]' */
+               EMIT3(add_2mod(0x48, src_reg, dst_reg), 0x0F, 0xBF);
+               break;
+       case BPF_W:
+               /* Emit 'movsx rax, dword ptr [rax + off]' */
+               EMIT2(add_2mod(0x48, src_reg, dst_reg), 0x63);
+               break;
+       }
+       emit_insn_suffix(&prog, src_reg, dst_reg, off);
+       *pprog = prog;
+}

My hacking is incomplete, so I only showed the partial x86 JIT above.

yonghong-song abandoned this revision. Aug 3 2023, 7:22 PM