This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] DForm instructions should be preferred when using zero register
Needs ReviewPublic

Authored by Conanap on Nov 11 2020, 9:32 AM.

Details

Reviewers
nemanjai
saghir
Group Reviewers
Restricted Project
Summary

DForm instructions should be preferred when using zero registers (PPC::ZERO and PPC::ZERO8). Ie, STXV in place of STXVX and LXV in place of LXVX.

Diff Detail

Event Timeline

Conanap created this revision.Nov 11 2020, 9:32 AM
Conanap requested review of this revision.Nov 11 2020, 9:32 AM
Conanap updated this revision to Diff 304555.Nov 11 2020, 9:36 AM

Updated a test case

amyk added a subscriber: amyk.Nov 11 2020, 3:22 PM

Please also address the clang-format comment.

llvm/lib/Target/PowerPC/PPCPreEmitPeephole.cpp
417

Please capitalize and end with a period for the comments.
Is it possible to elaborate a bit more on the comments? In terms of why we were prefer the D-Forms, and why we should not apply the transformation if its a frame index.

llvm/test/CodeGen/PowerPC/VSX-XForm-Scalars.ll
2

I think this is an unrelated change.

Using dform with offset 0 can save one register r0/X0, this is benefit for register allocation? But adding it in PPCPreEmitPeephole pass which is after register allocation will make the benefit gone.
Maybe we need to do it before register allocation? For example at the place where the x-form with zero register is generated.

I checked one example loadConstant in test/CodeGen/PowerPC/f128-passByValue.ll.
We generate LXVX $zero8, in ISEL because we meet the worst case and we don't have d-form choice for the instruction selection. so we have to use x-form and in x-form selection, we have to use zero/zero8 as the base and use load address as the index. See PPCTargetLowering::SelectAddressRegRegOnly.

I guess most cases are with same reason for generating x-form + zero register, we meet the worst case in ISEL, so we have to use x-form + zero register form, with this form, we can always select a powerpc load/store instruction.

For me, a better solution should be change the worst case handling in ISEL, it is before RA and it is also transparent for types like STXVX/LXVX/ and also LDX/STDX, LFDX/STFDX...

NeHuang added inline comments.
llvm/lib/Target/PowerPC/PPCPreEmitPeephole.cpp
417

nit: add a period at the end of comment and do the same for other comments.

424

run clang-format to pass pre-merge checks

llvm/test/CodeGen/PowerPC/VSX-XForm-Scalars.ll
2

+1

Using dform with offset 0 can save one register r0/X0, this is benefit for register allocation? But adding it in PPCPreEmitPeephole pass which is after register allocation will make the benefit gone.
Maybe we need to do it before register allocation? For example at the place where the x-form with zero register is generated.

I checked one example loadConstant in test/CodeGen/PowerPC/f128-passByValue.ll.
We generate LXVX $zero8, in ISEL because we meet the worst case and we don't have d-form choice for the instruction selection. so we have to use x-form and in x-form selection, we have to use zero/zero8 as the base and use load address as the index. See PPCTargetLowering::SelectAddressRegRegOnly.

I guess most cases are with same reason for generating x-form + zero register, we meet the worst case in ISEL, so we have to use x-form + zero register form, with this form, we can always select a powerpc load/store instruction.

For me, a better solution should be change the worst case handling in ISEL, it is before RA and it is also transparent for types like STXVX/LXVX/ and also LDX/STDX, LFDX/STFDX...

I'm just going to jump in to give a little more background. The initial reason we wanted to do this was to enable an optimization that actually happens in the linker after the code is emitted.
To get the idea you can look at this test:

/llvm/test/CodeGen/PowerPC/pcrel-linkeropt.ll

Which contains this section:

; FIXME: we should always convert X-Form instructions that use
; PPC::ZERO[8] to the corresponding D-Form so we can perform this opt.
define dso_local void @ReadWrite128() local_unnamed_addr #0 {
; CHECK-LABEL: ReadWrite128:
; CHECK:       # %bb.0: # %entry
; CHECK-NEXT:    pld r3, input128@got@pcrel(0), 1
; CHECK-NEXT:    lxvx vs0, 0, r3
; CHECK-NEXT:    pld r3, output128@got@pcrel(0), 1
; CHECK-NEXT:    stxvx vs0, 0, r3
; CHECK-NEXT:    blr
entry:
  %0 = load i128, i128* @input128, align 16
  store i128 %0, i128* @output128, align 16
  ret void
}

When we have a GOT access like this it is possible for the compiler to mark the instruction with R_PPC64_PCREL_OPT and then the linker merges the two instructions into one and replaces the second instruction with a nop. The problem is that this opt can only be done if the second instruction is DForm. We noticed that when we implemented this optimization we could not catch all of the cases because in some situations (like the one above) we use the XForm instead of the DForm.

Having said that, we should try to do this before the PreEmitPeephole. The optimization that adds the R_PPC64_PCREL_OPT relocation is also in the PreEmitPeephole and I'm not sure if it will be detected if we do both things at the same time (both as in convert the XForm to a DForm and then have the same opt use that DForm to add the relocation).

I agree that ISel is a better place for this. If we cannot do this in ISel then we should still try to do this before we get to the PreEmitPeephole or at least make sure that both the DForm is present and that the R_PPC64_PCREL_OPT relocation is added as we expected in the same pass.

Using dform with offset 0 can save one register r0/X0, this is benefit for register allocation? But adding it in PPCPreEmitPeephole pass which is after register allocation will make the benefit gone.
Maybe we need to do it before register allocation? For example at the place where the x-form with zero register is generated.

I checked one example loadConstant in test/CodeGen/PowerPC/f128-passByValue.ll.
We generate LXVX $zero8, in ISEL because we meet the worst case and we don't have d-form choice for the instruction selection. so we have to use x-form and in x-form selection, we have to use zero/zero8 as the base and use load address as the index. See PPCTargetLowering::SelectAddressRegRegOnly.

I guess most cases are with same reason for generating x-form + zero register, we meet the worst case in ISEL, so we have to use x-form + zero register form, with this form, we can always select a powerpc load/store instruction.

For me, a better solution should be change the worst case handling in ISEL, it is before RA and it is also transparent for types like STXVX/LXVX/ and also LDX/STDX, LFDX/STFDX...

I'm just going to jump in to give a little more background. The initial reason we wanted to do this was to enable an optimization that actually happens in the linker after the code is emitted.
To get the idea you can look at this test:

/llvm/test/CodeGen/PowerPC/pcrel-linkeropt.ll

Which contains this section:

; FIXME: we should always convert X-Form instructions that use
; PPC::ZERO[8] to the corresponding D-Form so we can perform this opt.
define dso_local void @ReadWrite128() local_unnamed_addr #0 {
; CHECK-LABEL: ReadWrite128:
; CHECK:       # %bb.0: # %entry
; CHECK-NEXT:    pld r3, input128@got@pcrel(0), 1
; CHECK-NEXT:    lxvx vs0, 0, r3
; CHECK-NEXT:    pld r3, output128@got@pcrel(0), 1
; CHECK-NEXT:    stxvx vs0, 0, r3
; CHECK-NEXT:    blr
entry:
  %0 = load i128, i128* @input128, align 16
  store i128 %0, i128* @output128, align 16
  ret void
}

When we have a GOT access like this it is possible for the compiler to mark the instruction with R_PPC64_PCREL_OPT and then the linker merges the two instructions into one and replaces the second instruction with a nop. The problem is that this opt can only be done if the second instruction is DForm. We noticed that when we implemented this optimization we could not catch all of the cases because in some situations (like the one above) we use the XForm instead of the DForm.

Having said that, we should try to do this before the PreEmitPeephole. The optimization that adds the R_PPC64_PCREL_OPT relocation is also in the PreEmitPeephole and I'm not sure if it will be detected if we do both things at the same time (both as in convert the XForm to a DForm and then have the same opt use that DForm to add the relocation).

I agree that ISel is a better place for this. If we cannot do this in ISel then we should still try to do this before we get to the PreEmitPeephole or at least make sure that both the DForm is present and that the R_PPC64_PCREL_OPT relocation is added as we expected in the same pass.

After a discussion with the group I would like to correct what I said in the previous post.
There already is a plan to do this in ISel in a different patch. The reason we also want to do this optimization here is to try to catch situations where this pattern is not known in ISel and only appears after other optimizations later on. Ideally we do not want to have any situations where the XForm exists in the final binary and having this final check in the PreEmitPeephole should ensure that. Basically, we also want to do this check here to find anything that ISel may have missed.

steven.zhang added inline comments.
llvm/lib/Target/PowerPC/PPCPreEmitPeephole.cpp
418

Some comments for current implementation.

  1. Can we do this inside TII->convertToImmediateForm()? Notice that we have statistic data for such convert. Though it now handles x-load + addi -> d-load, it could be extended to handle x-load, 0 -> d-load.
  2. I am not sure why just handle the lxvx. We have quite a lot x-form. ImmInstrInfo contains such information and can we query it to do the transformation instead of enum the opcode?
421

check isReg()

After a discussion with the group I would like to correct what I said in the previous post.
There already is a plan to do this in ISel in a different patch. The reason we also want to do this optimization here is to try to catch situations where this pattern is not known in ISel and only appears after other optimizations later on. Ideally we do not want to have any situations where the XForm exists in the final binary and having this final check in the PreEmitPeephole should ensure that. Basically, we also want to do this check here to find anything that ISel may have missed.

Good to know we have a plan to fix such kind of issue in ISEL. For the patterns generated after ISEL, I also think adding them in convertToImmediateForm is better. That function is called pre and post RA. It handles several patterns there, maybe we just need to add a new function like transformZeroInputXformToImmForm in that function?