This is an archive of the discontinued LLVM Phabricator instance.

[BPF] implement isTruncateFree and isZExtFree in BPFTargetLowering
ClosedPublic

Authored by yonghong-song on Feb 5 2020, 5:09 PM.

Details

Summary

Currently, isTruncateFree() and isZExtFree callbacks return false
as they are not implemented in BPF backend. This may cause suboptimal
code generation. For example, if the load in the context of zero extension
has more than one use, the pattern zextload{i8,i16,i32} will
not be generated. Rather, the load will be matched first and
then the result is zero extended.

For example, in the test together with this commit, we have

I1: %0 = load i32, i32* %data_end1, align 4, !tbaa !2
I2: %conv = zext i32 %0 to i64 
... 
I3: %2 = load i32, i32* %data, align 4, !tbaa !7
I4: %conv2 = zext i32 %2 to i64 
... 
I5: %4 = trunc i64 %sub.ptr.lhs.cast to i32 
I6: %conv13 = sub i32 %4, %2
...

The I1 and I2 will match to one zextloadi32 DAG node, where SUBREG_TO_REG is
used to convert a 32bit register to 64bit one. During code generation,
SUBREG_TO_REG is a noop.

The %2 in I3 is used in both I4 and I6. If isTruncateFree() is false,
the current implementation will generate a SLL_ri and SRL_ri
for the zext part during lowering.

This patch implement isTruncateFree() in the BPF backend, so for the
above example, I3 and I4 will generate a zextloadi32 DAG node with
SUBREG_TO_REG is generated during lowering to Machine IR.

isZExtFree() is also implemented as it should help code gen as well.

This patch also enables the change in https://reviews.llvm.org/D73985
since it won't kick in generates MOV_32_64 machine instruction.

Diff Detail

Event Timeline

yonghong-song created this revision.Feb 5 2020, 5:09 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 5 2020, 5:09 PM
ast requested changes to this revision.Feb 10 2020, 1:41 PM

I suspect isZExtFree() is also necessary.
BPF implementation probably should be similar to aarch64.
In isTruncateFree() do
return NumBits1 > NumBits2;
and in isZExtFree() do:
return NumBits1 == 32 && NumBits2 == 64;

This revision now requires changes to proceed.Feb 10 2020, 1:41 PM
yonghong-song retitled this revision from [BPF] implement isTruncateFree in BPFTargetLowering to [BPF] implement isTruncateFree and isZExtFree in BPFTargetLowering.
yonghong-song edited the summary of this revision. (Show Details)

add support isZExtFree as well

Note that I cannot do

In isTruncateFree() do
return NumBits1 > NumBits2;

This will permit a type truncate operation e.g., from 64bit to 8bit. This will cause a llvm internal assertion later since llvm expects
it produces a backend supported legal subregister type , which is 32bit in this case. PowerPC has similar implementation

bool PPCTargetLowering::isTruncateFree(Type *Ty1, Type *Ty2) const {
  if (!Ty1->isIntegerTy() || !Ty2->isIntegerTy())
    return false;
  unsigned NumBits1 = Ty1->getPrimitiveSizeInBits();
  unsigned NumBits2 = Ty2->getPrimitiveSizeInBits();
  return NumBits1 == 64 && NumBits2 == 32;
}
ast added a comment.Feb 10 2020, 9:22 PM

return NumBits1 > NumBits2

arm64 uses this approach, but arm64 doesn't have 8-bit subregisters and it works ?

make isTruncateFree more like AArch64 one. Remove !getHasAlu32() from isTruncateFree where it is not needed. But !getHasAlu32() is still needed in isZExtFree. Otherwise, we will get a llvm assertion like:

  CLNG-LLC [test_maps] test_xdp_noinline.o
llc: ../lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:978: void {anonymous}::SelectionDAGLegalize::LegalizeOp(llvm::SDNode*): Assertion `(TLI.getTypeAction(*DAG.getContext(), Op.getValueType()) == TargetLowering::TypeLegal || Op.getOpcode() == ISD::TargetConstant || Op.getOpcode() == ISD::Register) && "Unexpected illegal type!"' failed.
Stack dump:
0.      Program arguments: llc -march=bpf -mcpu=v2 -filetype=obj -o /data/users/yhs/work/net-next/tools/testing/selftests/bpf/no_alu32/test_xdp_noinline.$

arm64 uses this approach, but arm64 doesn't have 8-bit subregisters and it works ?

My mistake due to rush to push the commit to take care of kids :-(
You are right. isTruncateFree does not need check Alu32 Mode and does not need to check 64/32 bits. It seems working fine. I previously added Alu32 check and
strict 64/32 bit check as ultimately the code similar to PowerPC code. Alu32 mode is added since without alu32, truncation probably won't matter much.
This version uses code similar to AArch64, it won't get worse.

The assertion I mentioned will happen for isZExtFree() without checking Alu32.
In that the case (non-alu32 mode), DAG could be optimized as 32-bit ALU but with type marked as TypePromoteInteger, not TypeLegal, which will trigger an assertion.

ast accepted this revision.Feb 11 2020, 9:10 AM

Looks like more canonical implementation now. Thanks.
Please double check that all selftests/bpf still pass with and without alu32 before landing.

This revision is now accepted and ready to land.Feb 11 2020, 9:10 AM

Please double check that all selftests/bpf still pass with and without alu32 before landing.

checked last night and it all works.

jrfastab accepted this revision.Feb 11 2020, 9:34 AM

Added to my stack as well and passing my tests. LGTM.

This revision was automatically updated to reflect the committed changes.