DForm instructions should be preferred when using the zero registers (PPC::ZERO and PPC::ZERO8), i.e., STXV in place of STXVX and LXV in place of LXVX.
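As a rough illustration of the rewrite described above, here is a minimal standalone C++ sketch. It does not use LLVM's MachineInstr API: `Instr`, `Operand`, `dFormFor` and `convertZeroBaseXFormToDForm` are hypothetical names invented for this example, and the operand layout (destination, base, index) is an assumption made to keep the model small.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-ins for a few PowerPC opcodes and registers.
enum class Opcode { LXVX, STXVX, LXV, STXV, LDX };
enum class Reg { ZERO, ZERO8, R3, VS0 };

// An operand is either a register or an immediate.
struct Operand {
  bool IsReg;
  Reg R;    // Meaningful only when IsReg is true.
  long Imm; // Meaningful only when IsReg is false.
  static Operand reg(Reg R) { return {true, R, 0}; }
  static Operand imm(long V) { return {false, Reg::ZERO, V}; }
};

// Assumed layout: Ops[0] = value register, Ops[1] = base, Ops[2] = index.
struct Instr {
  Opcode Op;
  std::vector<Operand> Ops;
};

// Map an X-form opcode to its D-form counterpart, if this sketch knows one.
std::optional<Opcode> dFormFor(Opcode Op) {
  switch (Op) {
  case Opcode::LXVX:
    return Opcode::LXV;
  case Opcode::STXVX:
    return Opcode::STXV;
  default:
    return std::nullopt;
  }
}

// Rewrite "op vs, ZERO, rB" into "op vs, 0, rB". Returns true on change.
bool convertZeroBaseXFormToDForm(Instr &I) {
  std::optional<Opcode> DForm = dFormFor(I.Op);
  if (!DForm || I.Ops.size() < 3)
    return false;
  Operand &Base = I.Ops[1];
  // Only a register operand can be the zero register, so check that first.
  if (!Base.IsReg)
    return false;
  if (Base.R != Reg::ZERO && Base.R != Reg::ZERO8)
    return false;
  I.Op = *DForm;
  Base = Operand::imm(0); // The zero base becomes a displacement of 0.
  return true;
}
```

In this model an `LXVX` whose base is `ZERO8` becomes an `LXV` with displacement 0, while instructions with a real base register or a non-register base operand are left untouched.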
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Please also address the clang-format comment.
| File | Line | Comment |
|---|---|---|
| llvm/lib/Target/PowerPC/PPCPreEmitPeephole.cpp | 417 | Please capitalize and end with a period for the comments. |
| llvm/test/CodeGen/PowerPC/VSX-XForm-Scalars.ll | 2 | I think this is an unrelated change. |
Using the D-form with offset 0 can save one register (r0/X0). Is that a benefit for register allocation? If so, doing this in the PPCPreEmitPeephole pass, which runs after register allocation, loses that benefit.
Maybe we need to do it before register allocation, for example at the place where the X-form with the zero register is generated?
I checked one example, loadConstant in test/CodeGen/PowerPC/f128-passByValue.ll.
We generate LXVX with $zero8 in ISel because we hit the worst case: there is no D-form choice available during instruction selection, so we have to use the X-form, and in X-form selection we have to use zero/zero8 as the base and the load address as the index. See PPCTargetLowering::SelectAddressRegRegOnly.
I guess most cases generate the X-form + zero register for the same reason: we hit the worst case in ISel and fall back to the X-form + zero register, because with this form we can always select a PowerPC load/store instruction.
To me, a better solution would be to change the worst-case handling in ISel. It runs before RA, and it is also transparent across instruction types such as STXVX/LXVX as well as LDX/STDX, LFDX/STFDX, and so on.
I'm just going to jump in to give a little more background. The initial reason we wanted to do this was to enable an optimization that actually happens in the linker after the code is emitted.
To get the idea you can look at this test:
/llvm/test/CodeGen/PowerPC/pcrel-linkeropt.ll
Which contains this section:
    ; FIXME: we should always convert X-Form instructions that use
    ; PPC::ZERO[8] to the corresponding D-Form so we can perform this opt.
    define dso_local void @ReadWrite128() local_unnamed_addr #0 {
    ; CHECK-LABEL: ReadWrite128:
    ; CHECK:       # %bb.0: # %entry
    ; CHECK-NEXT:    pld r3, input128@got@pcrel(0), 1
    ; CHECK-NEXT:    lxvx vs0, 0, r3
    ; CHECK-NEXT:    pld r3, output128@got@pcrel(0), 1
    ; CHECK-NEXT:    stxvx vs0, 0, r3
    ; CHECK-NEXT:    blr
    entry:
      %0 = load i128, i128* @input128, align 16
      store i128 %0, i128* @output128, align 16
      ret void
    }
When we have a GOT access like this it is possible for the compiler to mark the instruction with R_PPC64_PCREL_OPT and then the linker merges the two instructions into one and replaces the second instruction with a nop. The problem is that this opt can only be done if the second instruction is DForm. We noticed that when we implemented this optimization we could not catch all of the cases because in some situations (like the one above) we use the XForm instead of the DForm.
Having said that, we should try to do this before the PreEmitPeephole. The optimization that adds the R_PPC64_PCREL_OPT relocation is also in the PreEmitPeephole and I'm not sure if it will be detected if we do both things at the same time (both as in convert the XForm to a DForm and then have the same opt use that DForm to add the relocation).
I agree that ISel is a better place for this. If we cannot do this in ISel then we should still try to do this before we get to the PreEmitPeephole or at least make sure that both the DForm is present and that the R_PPC64_PCREL_OPT relocation is added as we expected in the same pass.
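The ordering concern can be sketched with a toy model. This is standalone C++, not the actual PPCPreEmitPeephole code: `Form`, `Instr`, `convertXFormToDForm` and `markPCRelOpt` are hypothetical names, and the "GOT load immediately followed by the access" matching is a deliberate simplification of whatever the real pass does.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction kinds, reduced to what matters for the ordering.
enum class Form { GotLoad, XFormZeroBase, DForm };

struct Instr {
  Form F;
  bool HasPCRelOpt = false; // Stands in for an R_PPC64_PCREL_OPT marking.
};

// Step 1: rewrite X-form accesses that use a zero base into D-form accesses.
void convertXFormToDForm(std::vector<Instr> &Block) {
  for (Instr &I : Block)
    if (I.F == Form::XFormZeroBase)
      I.F = Form::DForm;
}

// Step 2: mark each GOT load that is directly followed by a D-form access.
// Returns how many pairs were marked. X-form accesses are never marked.
std::size_t markPCRelOpt(std::vector<Instr> &Block) {
  std::size_t Marked = 0;
  for (std::size_t Idx = 0; Idx + 1 < Block.size(); ++Idx) {
    if (Block[Idx].F == Form::GotLoad && Block[Idx + 1].F == Form::DForm) {
      Block[Idx].HasPCRelOpt = true;
      ++Marked;
    }
  }
  return Marked;
}
```

In this model, calling `markPCRelOpt` before `convertXFormToDForm` finds nothing for a `GotLoad` + `XFormZeroBase` pair, while calling it afterwards marks the pair. If both steps live in the same pass, the conversion has to run first.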
After a discussion with the group I would like to correct what I said in the previous post.
There already is a plan to do this in ISel in a different patch. The reason we also want to do this optimization here is to try to catch situations where this pattern is not known in ISel and only appears after other optimizations later on. Ideally we do not want to have any situations where the XForm exists in the final binary and having this final check in the PreEmitPeephole should ensure that. Basically, we also want to do this check here to find anything that ISel may have missed.
| File | Line | Comment |
|---|---|---|
| llvm/lib/Target/PowerPC/PPCPreEmitPeephole.cpp | 418 | Some comments for current implementation. |
| llvm/lib/Target/PowerPC/PPCPreEmitPeephole.cpp | 421 | check isReg() |
Good to know we have a plan to fix this kind of issue in ISel. For the patterns generated after ISel, I also think adding them in convertToImmediateForm is better. That function is called both pre- and post-RA and already handles several patterns, so maybe we just need to add a new function like transformZeroInputXformToImmForm there?
Please capitalize the comments and end them with a period.
Is it possible to elaborate a bit more in the comments, in terms of why we prefer the D-Forms and why we should not apply the transformation if it's a frame index?