Add sign-extension load instructions.
For the kernel BPF selftest pyperf180, the instruction count drops from
91472 without this patch to 80652 with it. Most of the improvement comes
from replacing the three-instruction sequence

  r1 = *(u32 *)(r8 + 12)
  r1 <<= 32
  r1 s>>= 32

with the single sign-extending load

  r1 = *(s32 *)(r8 + 12)
This new instruction not only improves BPF code but can also improve
JITed code, since most architectures have explicit sign-extension
instructions.

To-Do list:
- put the new functionality behind a CPU version, e.g., -mcpu=v4
- implement the Linux kernel part to support the new insn

I believe x86 and arm64 have instructions that sign-extend one register into another,
but not insns that sign-extend during the load.
Such register-to-register insns compose better, e.g. in case the s32 value came as a return value from a call.
Have you considered going that route?