This is an archive of the discontinued LLVM Phabricator instance.

[TwoAddressInstructionPass] Don't update SrcRegMap for copies inserted for tied register constraint when the src isn't killed
Changes PlannedPublic

Authored by craig.topper on Sep 14 2018, 11:20 AM.

Details

Summary

If the input to the copy isn't killed then the copy likely is unavaoidable. This means at best the physical register would be the input to the copy and still be used somewhere else. So the output vreg for the copy won't be allocated to the physreg.

With this change we stop confusing our commuting decision with this misleading physical register information allowing it to make better decisions. As you can see it changed our commuting decisions and removed a lot of moves on X86.

Fixes the case from PR38929

Diff Detail

Event Timeline

craig.topper created this revision.Sep 14 2018, 11:20 AM
craig.topper added a subscriber: llvm-commits.
craig.topper edited the summary of this revision. (Show Details)Sep 14 2018, 11:25 AM

Looks awesome!

RKSimon added inline comments.Sep 14 2018, 1:07 PM
test/CodeGen/X86/psubus.ll
1435

This is a pity - why did the zero vector reuse a live register?

craig.topper added inline comments.Sep 14 2018, 1:45 PM
test/CodeGen/X86/avx512-regcall-NoMask.ll
1052

This is an increase in instructions. :(

Would D38128 help with this btw?

Would D38128 help with this btw?

D38128 has now landed - please can you rebase and see what effect it had?

Would D38128 help with this btw?

D38128 has now landed - please can you rebase and see what effect it had?

Just tested - it doesn't help either of the cases with increased instructions

bjope added a subscriber: bjope.Sep 20 2018, 11:46 AM

Did you explore the certain cases where it had regressed?

xbolva00 added a comment.EditedOct 15 2018, 12:29 AM

How D36224 (please rebase this patch after it lands) + this patch interacts? Maybe D36224 solves some issues?

Have you considered making a .mir-test that show the effect of TwoAddressInstruction pass here. Maybe an overkill since there already are plenty of test cases that are affected by the change, but it would make it easier to understand the tranformation difference that we introduce in the TwoAddressInstructionPass.

Yes, maybe do MIR tests for cases where we see regressions?

Any benchmark results w/o this patch?

Forgot to update this. I checked our internal benchmark list. I saw binary changes, but no significant gains or losses.

What do you want to do about the tests (I think just psubus.ll and avx512-regcall-NoMask.ll) that show an increase in moves?

@craig.topper What's the plan for this patch?

craig.topper marked an inline comment as done.Jan 21 2019, 9:08 PM
craig.topper added inline comments.
test/CodeGen/X86/psubus.ll
1435

This is the code just after register coalescing with the new code.

16B	  %2:vr128 = COPY $xmm2
32B	  %1:vr128 = COPY $xmm1
48B	  %11:vr128 = COPY $xmm0
64B	  %3:vr128 = V_SET0
80B	  %17:vr128 = COPY %11:vr128
96B	  %17:vr128 = PUNPCKLWDrr %17:vr128(tied-def 0), %3:vr128
128B	  %11:vr128 = PUNPCKHWDrr %11:vr128(tied-def 0), %3:vr128
144B	  %15:vr128 = MOVAPSrm $rip, 1, $noreg, %const.0, $noreg :: (load 16 from constant-pool)
160B	  %7:vr128 = COPY %2:vr128
176B	  %7:vr128 = PXORrr %7:vr128(tied-def 0), %15:vr128
192B	  %9:vr128 = COPY %11:vr128
208B	  %9:vr128 = PORrr %9:vr128(tied-def 0), %15:vr128
240B	  %9:vr128 = PCMPGTDrr %9:vr128(tied-def 0), %7:vr128
272B	  %11:vr128 = PANDrr %11:vr128(tied-def 0), %9:vr128
304B	  %9:vr128 = PANDNrr %9:vr128(tied-def 0), %2:vr128
336B	  %9:vr128 = PORrr %9:vr128(tied-def 0), %11:vr128
352B	  %13:vr128 = COPY %1:vr128
368B	  %13:vr128 = PXORrr %13:vr128(tied-def 0), %15:vr128
400B	  %15:vr128 = PORrr %15:vr128(tied-def 0), %17:vr128
432B	  %15:vr128 = PCMPGTDrr %15:vr128(tied-def 0), %13:vr128
464B	  %17:vr128 = PANDrr %17:vr128(tied-def 0), %15:vr128
496B	  %15:vr128 = PANDNrr %15:vr128(tied-def 0), %1:vr128
528B	  %15:vr128 = PORrr %15:vr128(tied-def 0), %17:vr128
560B	  %15:vr128 = PSUBDrr %15:vr128(tied-def 0), %1:vr128
592B	  %9:vr128 = PSUBDrr %9:vr128(tied-def 0), %2:vr128
624B	  %9:vr128 = PSLLDri %9:vr128(tied-def 0), 16
656B	  %9:vr128 = PSRADri %9:vr128(tied-def 0), 16
688B	  %15:vr128 = PSLLDri %15:vr128(tied-def 0), 16
720B	  %15:vr128 = PSRADri %15:vr128(tied-def 0), 16
752B	  %15:vr128 = PACKSSDWrr %15:vr128(tied-def 0), %9:vr128
768B	  $xmm0 = COPY %15:vr128

Given this order of instructions, if xmm0 wasn't available for %15 to use, then the COPY on the last line would have become a real move instead. So we'd just be trading the copy at 80B for the copy at 768B.

The original code commuted the PORrr at 400B which made %17 be used by the last copy instead.

The psubus change doesn't look too bad now - its just whether you are happy with the changes in avx512-regcall-NoMask.ll ?

bjope added a comment.Jan 22 2019, 2:03 AM

Have you considered making a .mir-test that show the effect of TwoAddressInstruction pass here. Maybe an overkill since there already are plenty of test cases that are affected by the change, but it would make it easier to understand the tranformation difference that we introduce in the TwoAddressInstructionPass.

I still got a feeling that a .mir-test could be nice. Is that something that you are planning to add or do you disagree?

Based on what happened in psubus.ll I'm not sure if this is the right fix. We lost the information that was making it make the correct decision. Maybe we really need another heuristic in the commute decision making code.

I wonder hot it looks now. Can you rebase it?

Herald added a project: Restricted Project. · View Herald TranscriptMay 3 2019, 5:55 AM
xbolva00 added inline comments.May 3 2019, 3:36 PM
llvm/test/CodeGen/X86/vec_smulo.ll
371 ↗(On Diff #198062)

This is sad :/

craig.topper planned changes to this revision.May 7 2019, 1:30 PM

Moving back to Plan Changes because I still believe this is the wrong fix. I don't know what the right fix is.