This is an archive of the discontinued LLVM Phabricator instance.

Handle COPYs of physregs better (regalloc hints)
Closed · Public

Authored by RKSimon on Sep 21 2017, 4:29 AM.

Details

Reviewers

uweigand
qcolombet
MatzeB
t.p.northover
niravd
arsenm
rampitec
rovka
yanyh
aemerson
craig.topper
jonpa
efriedma
spatel
nemanjai

Commits

rG2d0f20cc0434: [X86] Handle COPYs of physregs better (regalloc hints)
rL342578: [X86] Handle COPYs of physregs better (regalloc hints)

Summary

While enabling the mischeduler for SystemZ, it was discovered that for some reason a test needed one extra, seemingly needless COPY (test/CodeGen/SystemZ/call-03.ll). Handling that resulted in this patch, which improves register coalescing by providing not just one copy hint, but a sorted list of copy hints. On SystemZ, this gives ~12500 fewer register moves on SPEC, as well as marginally less spilling.

Instead of improving just the SystemZ backend, the improvement has been implemented in common code (calculateSpillWeightAndHint()). This gives a lot of test failures, but since this should be a general improvement, I hope that the involved targets will help review the test updates.
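
For readers unfamiliar with the idea, a "sorted list of copy hints" could be sketched roughly as follows. This is only an illustration and not the code from the patch: the CopyHint fields, the register-number tie-breaker, and the sortedHints() helper are assumptions made for the example.

```cpp
#include <set>
#include <vector>

// Hypothetical stand-in for one copy hint; field names are illustrative.
struct CopyHint {
  unsigned Reg;  // register that a COPY connects the virtual register to
  float Weight;  // accumulated weight of the COPYs involving Reg

  // Heavier hints sort first; the register number is an assumed
  // tie-breaker that keeps the order deterministic.
  bool operator<(const CopyHint &RHS) const {
    if (Weight != RHS.Weight)
      return Weight > RHS.Weight;
    return Reg < RHS.Reg;
  }
};

// Returns the hinted registers with the best candidate first.
std::vector<unsigned> sortedHints(const std::vector<CopyHint> &Found) {
  std::set<CopyHint> Sorted(Found.begin(), Found.end());
  std::vector<unsigned> Order;
  for (const CopyHint &H : Sorted)
    Order.push_back(H.Reg);
  return Order;
}
```

With a single hint the allocator has nothing to fall back on when the preferred register is taken; an ordered list lets it try the next-best candidate instead.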

Event Timeline

In D38128#904781, @jonpa wrote:

In D38128#904581, @efriedma wrote:

Not sure if it is clear that coalescing with %R8 is generally better than %R0.

What do you mean, it isn't clear? Is the performance problem not clear? Or do you mean you're not sure how to detect this situation when you're sorting the hints?

I guess given your initial wording ("not great"), I was not sure if this is general and serious enough so that we really want to add an additional heuristic like "COPY to compare" on top of the sorting by weight. I suppose then there should be a flag like HasCompareUser which is then the tie-breaker when the weight is the same, or?

I don't believe there's anything special about compare here, this applies the same to any other use. In general, if you have something like "COPY dest, src" followed by "use dest", it is nearly always better to replace the second instruction by "use src" if this is possible (i.e. if src is still live and unchanged at that point). This is simply because on most processor architectures "COPY dest, src / use src" can execute in parallel, while "COPY dest, src / use dest" must execute sequentially.

In D38128#904861, @uweigand wrote:

In D38128#904781, @jonpa wrote:

In D38128#904581, @efriedma wrote:

Not sure if it is clear that coalescing with %R8 is generally better than %R0.

What do you mean, it isn't clear? Is the performance problem not clear? Or do you mean you're not sure how to detect this situation when you're sorting the hints?

I guess given your initial wording ("not great"), I was not sure if this is general and serious enough so that we really want to add an additional heuristic like "COPY to compare" on top of the sorting by weight. I suppose then there should be a flag like HasCompareUser which is then the tie-breaker when the weight is the same, or?

I don't believe there's anything special about compare here, this applies the same to any other use. In general, if you have something like "COPY dest, src" followed by "use dest", it is nearly always better to replace the second instruction by "use src" if this is possible (i.e. if src is still live and unchanged at that point). This is simply because on most processor architectures "COPY dest, src / use src" can execute in parallel, while "COPY dest, src / use dest" must execute sequentially.

That makes sense.

Would any of these be an improvement worth trying, then?

(1) If the source phys-reg is contained in the regclass of an immediately following instruction using the vreg, then increase the priority of the hint of that phys-reg.
(2) A simpler alternative would be to simply prefer phys-reg sources more than phys-reg dest-regs (if weight is equal). That would catch all the cases of (1).
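
As a rough sketch of what heuristic (2) would mean in practice (not code from the patch; the PhysCopyHint struct, the IsCopySource flag and both helper functions are hypothetical names used only for illustration):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical hint record for a physical register.
struct PhysCopyHint {
  unsigned Reg;
  float Weight;
  bool IsCopySource; // true: "vreg = COPY physreg"; false: "physreg = COPY vreg"
};

// Weight decides first. Only on equal weight is the COPY source preferred
// over the COPY destination; the register number is a final tie-breaker.
bool betterHint(const PhysCopyHint &A, const PhysCopyHint &B) {
  if (A.Weight != B.Weight)
    return A.Weight > B.Weight;
  if (A.IsCopySource != B.IsCopySource)
    return A.IsCopySource;
  return A.Reg < B.Reg;
}

// Picks the preferred register from a non-empty set of hints.
unsigned pickHint(std::vector<PhysCopyHint> Hints) {
  assert(!Hints.empty() && "need at least one hint");
  std::sort(Hints.begin(), Hints.end(), betterHint);
  return Hints.front().Reg;
}
```

Because the source preference only acts as a tie-breaker, a heavier destination hint still wins over a lighter source hint.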

In D38128#904899, @jonpa wrote:

Would any of these be an improvement worth trying, then?

(1) If the source phys-reg is contained in the regclass of an immediately following instruction using the vreg, then increase the priority of the hint of that phys-reg.
(2) A simpler alternative would be to simply prefer phys-reg sources more than phys-reg dest-regs (if weight is equal). That would catch all the cases of (1).

I believe (2) makes sense ... certainly worth a try.

In D38128#905052, @uweigand wrote:

In D38128#904899, @jonpa wrote:

Would any of these be an improvement worth trying, then?

(1) If the source phys-reg is contained in the regclass of an immediately following instruction using the vreg, then increase the priority of the hint of that phys-reg.
(2) A simpler alternative would be to simply prefer phys-reg sources more than phys-reg dest-regs (if weight is equal). That would catch all the cases of (1).

I believe (2) makes sense ... certainly worth a try.

I see that on SPEC/z13, I now get a few more COPYs left. Compared to master:

lgr            :               341784               328956   -12828
lr             :                26208                25803     -405
...
Spill|Reload   :               165135               164761     -374

-> with prefer phys reg source COPYs:

lgr            :               341784               329060   -12724
lr             :                26208                25817     -391
...
Spill|Reload   :               165135               164934     -201

I also see, looking at just the loops, that while 120 inner loops have changed size, 103 of them have gotten bigger, which seems to indicate that at least currently (Wei Mi has an upcoming regsplit patch that may change this), this idea is not quite promising.

It did however handle the test case under discussion (ARM/swifterror.ll)

Reversing this heuristic so that phys-reg dest COPYs are prioritized:

lgr            :               341784               328907   -12877
lr             :                26208                25762     -446
...
Spill|Reload   :               165135               164807     -328

For the innermost loops, there are only 34 different (15 smaller).

I am not sure it is right to tamper with this in this way given the above...

Right. Always coalescing with the copy source of course extends the live range of the source, and thus increases overall register pressure. So it's not a good idea to do this unconditionally.

On the other hand, in the example that triggered this discussion, there was no register pressure problem and source and destination register just happened to be both still live at the point of use, anyway. In this specific case, it would be preferable to use the source instead of the dest register at the point of use. On the other hand, if keeping the source alive would cause extra spilling, we don't want that, but want to use the dest after all.

Not sure how exactly to handle this in LLVM. It is probably not simply a matter of preference in the register coalescer.

Eli, do you have any comment on this?

What specifically do you want me to comment on? I don't really understand the register allocator well enough to comment on your code changes.

In D38128#908337, @efriedma wrote:

What specifically do you want me to comment on? I don't really understand the register allocator well enough to comment on your code changes.

Given that this is something that Quentin has been meaning to fix before I did, and that this is *generally* good, the question is now if we can accept minor issues like this. On SystemZ, I think this is quite clear since we get 12500 fewer COPYs and also less spilling (on SPEC). Could you perhaps try it on your target and see if you likewise would agree this is good enough despite the swifterror test issue?

Also, it could be that other tests with a similar code actually now improve, since this was a random result. So just one regression alone should not have to stop the patch, unless it is a sign of a more general issue.

It's a minor issue; I'm okay with committing this as-is and trying to address the issue later.

It's a minor issue; I'm okay with committing this as-is and trying to address the issue later.

All right - I take this to mean that you approve of the ARM test changes, then. Thanks.

PING!

Patch updated with a few X86 tests updated. Again, someone from each target with the right experience should review the test changes I have made:

TODO:
CodeGen : AArch64 AMDGPU BPF Hexagon Mips PowerPC SPARC Thumb Thumb2 X86 XCore
DebugInfo : COFF X86

DONE:
CodeGen : ARM
DebugInfo :

ping @tstellar: CodeGen/AMDGPU/multilevel-break.ll

In D38128#910425, @jonpa wrote:

TODO:
CodeGen : AArch64 AMDGPU BPF Hexagon Mips PowerPC SPARC Thumb Thumb2 X86 XCore

AArch64/win64_vararg.ll is still ok.
arm64-aapcs.ll and func-argpassing.ll also seem fine.

swifterror.ll seems to get a very minor regression, not sure who to get an authoritative approval from though.

PING!

@Quentin, I implemented this in common code because you had already plans for this... It's been weeks now since I updated all the tests manually, which was not a small effort. Still, there is very little progress on the review of the tests :-/ Do you know who I should add as reviewers? (BTW patch slightly changed since you approved it - still ok, I hope?)

@Tom, I hope you can find the time to look into the AMDGPU tests as I thought you meant to do...

Review of test changes:

TODO:
CodeGen : AMDGPU BPF Hexagon Mips PowerPC SPARC Thumb Thumb2 X86 XCore
DebugInfo : COFF X86

PARTIALLY DONE (still more tests in need of review):
CodeGen :

AArch64: win64_vararg.ll, arm64-aapcs.ll, func-argpassing.ll (Martin Storsjo) swifterror.ll small regression / needs approval (Martin Storsjo)

DONE:
CodeGen :

ARM (Eli Friedman)

Just looking at the number and placement of reg-reg moves for PowerPC seems fine. It doesn't seem like there are regressions but no huge reduction in the number of copies. However, since the diff is without context, it's hard to tell exactly what is going on in the test cases. I'll apply this patch to ToT and have a look at how things change before signing off on the PPC changes.

The AMDGPU tests look OK, but the CHECK lines for test/CodeGen/AMDGPU/sgpr-control-flow.ll should be updated like this:
https://reviews.llvm.org/P8044

@Nemanja:

Just looking at the number and placement of reg-reg moves for PowerPC seems fine. It doesn't seem like there are regressions but no huge reduction in the number of copies. However, since the diff is without context, it's hard to tell exactly what is going on in the test cases. I'll apply this patch to ToT and have a look at how things change before signing off on the PPC changes.

Sorry about forgetting the context - fixed.

@Tom:

The AMDGPU tests look OK, but the CHECK lines for test/CodeGen/AMDGPU/sgpr-control-flow.ll should be updated like this: https://reviews.llvm.org/P8044

Thanks! sgpr-control-flow.ll updated, but there is also still the multilevel-break.ll failure. This is the only test that is failing even with this patch, and the reason for this is that I was not sure how to update it. We discussed this on Oct 11 here on Phabricator. Could you please take a look?

Test updates in need of review:

TODO:
CodeGen : BPF Hexagon Mips PowerPC SPARC Thumb Thumb2 X86 XCore
DebugInfo : COFF X86

PARTIALLY DONE (still more tests in need of review):
CodeGen :

AArch64 : win64_vararg.ll, arm64-aapcs.ll, func-argpassing.ll (Martin Storsjo) swifterror.ll: small regression, needs approval (Martin Storsjo)
AMDGPU: (Tom Stellard) All reviewed, but multilevel-break.ll needs updating (fails with patch)

DONE:
CodeGen :

ARM (Eli Friedman)

I think the changes to the multilevel-break.ll test are OK. The register usage is increased by 2, but this won't have any impact on performance for this test. Here is a patch to make this test pass with your changes: https://reviews.llvm.org/P8046

The PowerPC CodeGen test changes are fine. I've also confirmed that this patch reduces the total number of register move instructions (which implement the copies). So that LGTM.

This revision is now accepted and ready to land. Nov 8 2017, 10:55 AM

@Tom:

I think the changes to the multilevel-break.ll test are OK. The register usage is increased by 2, but this won't have any impact on performance for this test. Here is a patch to make this test pass with your changes: https://reviews.llvm.org/P8046

Applied. Thanks!

@Nemanja:

The PowerPC CodeGen test changes are fine. I've also confirmed that this patch reduces the total number of register move instructions (which implement the copies). So that LGTM.

Thanks, nice to hear :-)

Regression? : test/DebugInfo/X86/live-debug-variables.ll

Test updates in need of review:

TODO:
CodeGen   : BPF Hexagon Mips PowerPC SPARC Thumb Thumb2 X86 XCore
DebugInfo : COFF X86

PARTIALLY DONE :
 CodeGen :
  - AArch64: All done but swifterror.ll (small regression and needs approval according to Martin Storsjo)

DONE:
 CodeGen :
  - ARM (Eli Friedman)
  - AMDGPU (Tom Stellard)
  - PPC (nemanjai)

Ping!

Three backends now done (with confirmed reductions in number of COPYs after regalloc). ninja check passes with patch (all failing tests updated).

Please, could someone from each backend listed below take a look at the test changes (already updated).

TODO:
CodeGen   : BPF Hexagon Mips PowerPC SPARC Thumb Thumb2 X86 XCore
DebugInfo : COFF X86

PARTIALLY DONE :
 CodeGen :
  - AArch64: All done but swifterror.ll (small regression and needs approval according to Martin Storsjo)

APPROVED:
 CodeGen :
  - ARM (Eli Friedman)
  - AMDGPU (Tom Stellard)
  - PPC (Nemanja Ivanovic)

(Regression? : test/DebugInfo/X86/live-debug-variables.ll)

@Quentin: I think the patch is starting to be somewhat convincing, as it is now also confirmed and approved on PowerPC, as well as has gotten approved test changes for AMDGPU and CodeGen/ARM. Given that it's now been two months of review, and that I updated all the tests already a month ago, I wonder what you would say about activating this for now under a target preference hook? I think that SystemZ, PowerPC and AMDGPU could benefit from this now, while other targets could do so after reviewing test changes. Would this work?

Status table corrected:

TODO:
CodeGen   : BPF Hexagon Mips SPARC Thumb Thumb2 X86 XCore
DebugInfo : COFF X86

PARTIALLY DONE :
 CodeGen :
  - AArch64: All done but swifterror.ll (small regression and needs approval according to Martin Storsjo)

APPROVED:
 CodeGen :
  - CodeGen/ARM (Eli Friedman)
  - AMDGPU (Tom Stellard)
  - PowerPC (Nemanja Ivanovic)

(Regression? : test/DebugInfo/X86/live-debug-variables.ll)

Still LGTM :).
It makes sense to have it behind a hook for now, but could you follow-up with the targets owners so that it gets to be the default and we can get rid of the hook.
I believe this is general goodness and that we should use it for all targets going forward.

Patch updated to include a TargetRegisterInfo hook 'enableMultipleCopyHints()' that returns false by default, which makes the patch NFC on trunk. Only SystemZ returns true for now. It is up to all target maintainers to enable this and review test changes, which hopefully will be done soon so that the hook can be removed again.
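
The opt-in pattern described here can be sketched as below. Only the hook name enableMultipleCopyHints() comes from this review; the two class names and the hintMode() helper are hypothetical simplifications and do not reflect the real LLVM class hierarchy.

```cpp
#include <string>

// Hypothetical, heavily simplified stand-in for the common register info class.
struct RegisterInfoSketch {
  virtual ~RegisterInfoSketch() = default;
  // Default: keep the old single-hint behaviour, so trunk is unchanged.
  virtual bool enableMultipleCopyHints() const { return false; }
};

// A target opts in by overriding the hook.
struct SystemZRegisterInfoSketch : RegisterInfoSketch {
  bool enableMultipleCopyHints() const override { return true; }
};

// Illustrates how common code might branch on the hook.
std::string hintMode(const RegisterInfoSketch &TRI) {
  return TRI.enableMultipleCopyHints() ? "sorted-copy-hints"
                                       : "single-best-hint";
}
```

Once every target returns true, both the hook and the fallback branch can be deleted.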

Some minor changes compared to last version in order to get the NFC to trunk when returning false in enableMultipleCopyHints():

 - Reinstate the mri.isAllocatable() check for the hint, which was dropped for some reason (makes a small difference on the resulting weight in some rare cases).
 - If target has created a simple hint, it must be cleared before the best hint from CopyHints is inserted, so a new clearSimpleHint() method in MachineRegisterInfo was needed. Not sure that all those different methods for manipulating hints are really needed in the end once all targets have been converted. In particular, I am curious why AMDGPU sets those hints (of generic type) when they will be recomputed always here...
 - A new temporary variable CopyHint::HintOrder, in order to properly mimic the current behavior on trunk for NFC. This is because on trunk, the best hint is kept in the order of discovery, which is not what the CopyHints set was doing.

These hacks aren't very pretty, so PLEASE EVERYONE: HELP REMOVE THEM BY ENABLING THE MULTIPLE COPY HINTS!

@nemanjai: You approved this for PowerPC, but one more test has changed now (licm-tocReg.ll), so please get the previous test changes from here on Phabricator (which apply fine) and review the new one, and then return true in enableMultipleCopyHints() once this is approved.

@tstellard, @efriedman: Thanks for helping review the tests on your targets. I have not enabled those targets as I have not checked on those tests on latest trunk. I hope you can download those tests from here on Phabricator and make sure everything's green as you enable this.

Please let me know if this is ok to commit.

jonpa requested review of this revision. Nov 28 2017, 6:16 AM

jonpa edited edge metadata.

ping!

The SystemZ changes LGTM. Given that Quentin approved the common part, this should be fine.

This revision is now accepted and ready to land. Dec 5 2017, 2:27 AM

Committed as r319754.

I will not close this review since the next step is to remove the temporary enableMultipleCopyHints() hook, after everyone is returning true.

Tom, Nemanja, Martin, Eli... I hope you will try this soon...

Targets that implement getRegAllocationHints() (ARM), might want to review that implementation in regards to the new common-code...

This has been committed, but is still only enabled for SystemZ.

I thought we might try to make progress on enabling this on all targets, so I decided to now do one target at a time.

First out is PowerPC:

CodeGen/PowerPC/licm-tocReg.ll: Updated. Seems to use one instruction (mr) less.
load-two-flts.ll: Updated: Smaller immediate offsets should be better, I hope.
ppc64-byval-align.ll: Updated: One move less, it seems.
select-i1-vs-i1.ll: Updated: a few functions a bit different. If-conversion?

Nemanja?

Ping!

Could anyone on PowerPC confirm these test changes and perhaps also that number of COPYs are decreasing overall, please.

jonpa requested review of this revision. Jan 18 2018, 9:03 AM

In D38128#980403, @jonpa wrote:

Ping!

Could anyone on PowerPC confirm these test changes and perhaps also that number of COPYs are decreasing overall, please.

Sorry about the delay. The test case changes look neutral or like improvements (select-i1-vs-i1.ll looks like an improvement). I'm running it now to see the overall impact on the number of register copies. I'll report back soon.

This also reduces the overall number of register moves slightly. LGTM

This revision is now accepted and ready to land. Jan 29 2018, 10:36 AM

Thanks Nemanja, this is now enabled (r323858) also for PowerPC :-)

A 'ping' to the other targets that I updated tests for a while back - has anyone made any progress?

Thank you Jonas. As this no longer needs PPC input, I'm resigning from the review.

Let's try to enable this for AArch64, next.

I have reapplied the test changes from earlier. Martin, does this look OK?

Yes, this looks like mostly no-op reorderings. I don't remember exactly what part I thought was a potential performance regression in swifterror.ll before - it moves one "mov" instruction (in two places) to before a branch, so potentially executing one instruction more than before. I would say it's most probably ok (and the gains in some of the other register shuffling tests would make it sound like a net gain in any case).

I'm in no way authoritative for this though, so perhaps @aemerson or @qcolombet would like to give such a stamp of approval?

So with the absolutely minimal performance regression in swifterror.ll and lack of response otherwise, I think I'm ok with giving it the LGTM for aarch64.

This revision is now accepted and ready to land. Feb 9 2018, 12:44 AM

In D38128#1002953, @mstorsjo wrote:

So with the absolutely minimal performance regression in swifterror.ll and lack of response otherwise, I think I'm ok with giving it the LGTM for aarch64.

Thank you, Martin. Committed as r324720.

mstorsjo resigned from this revision. Feb 9 2018, 1:38 AM

This revision now requires review to proceed. Feb 9 2018, 1:38 AM

Sorry I didn't see this, I need to fix my email filters.

This revision is now accepted and ready to land. Feb 9 2018, 2:39 AM

aemerson closed this revision. Feb 9 2018, 2:39 AM

Enabled for ARM. Mostly same test updates as previously (still passing), plus a few more.

Somebody take a look, please. Eli?

jonpa reopened this revision. Feb 9 2018, 2:40 AM

This revision is now accepted and ready to land. Feb 9 2018, 2:40 AM

jonpa requested review of this revision. Feb 9 2018, 2:41 AM

Ping!

ARM test updates are waiting for approval.

ARM test changes LGTM

This revision is now accepted and ready to land. Feb 15 2018, 6:05 PM

In D38128#1009606, @efriedma wrote:

ARM test changes LGTM

Thanks, Eli.

r325327.

Enabled for AMDGPU, with just a few minor test updates.

@tstellar : This looks to be a subset of the test changes you previously approved, so I am hoping they still look ok?

[AMDGPU] Just a few test updates in need of approval.

Changes look non-essential to me.

This revision is now accepted and ready to land. Feb 16 2018, 8:13 AM

In D38128#1010271, @rampitec wrote:

Changes look non-essential to me.

Thank you, Stanislav. I committed this even though a last-minute change slipped in (diff from approved changes):

--- a/test/CodeGen/AMDGPU/ret.ll
+++ b/test/CodeGen/AMDGPU/ret.ll
@@ -178,8 +178,8 @@ bb:
 }

 ; GCN-LABEL: {{^}}sgpr:
-; GCN-DAG: s_add_i32 s0, s3, 2
-; GCN-DAG: s_mov_b32 s2, s3
+; GCN: s_mov_b32 s2, s3
+; GCN: s_add_i32 s0, s2, 2
 ; GCN-NOT: s_endpgm
 define amdgpu_vs { i32, i32, i32 } @sgpr([9 x <16 x i8>] addrspace(4)* byval %arg, i32 inreg %arg1, i32 inreg %arg2, float %arg3) #0 {
 bb:

This looks harmless to me - the add is using s2 instead of s3, directly after the move of s3 to s2, right? Please take a look.

Committed as r325425.

[BPF] multiple copy hints enabled. Tests updated and passing, please review.

[BPF] Please review updated tests.

In D38128#1011287, @jonpa wrote:
In D38128#1010271, @rampitec wrote:

Changes look non-essential to me.

Thank you, Stanislav. I committed this even though a last-minute change slipped in (diff from approved changes):
--- a/test/CodeGen/AMDGPU/ret.ll
+++ b/test/CodeGen/AMDGPU/ret.ll
@@ -178,8 +178,8 @@ bb:
 }

 ; GCN-LABEL: {{^}}sgpr:
-; GCN-DAG: s_add_i32 s0, s3, 2
-; GCN-DAG: s_mov_b32 s2, s3
+; GCN: s_mov_b32 s2, s3
+; GCN: s_add_i32 s0, s2, 2
 ; GCN-NOT: s_endpgm
 define amdgpu_vs { i32, i32, i32 } @sgpr([9 x <16 x i8>] addrspace(4)* byval %arg, i32 inreg %arg1, i32 inreg %arg2, float %arg3) #0 {
 bb:
This looks harmless to me - the add is using s2 instead of s3, directly after the move of s3 to s2, right? Please take a look.

Committed as r325425.

This change in mov/add does not affect anything, thanks.

(Yonghong Song)
I checked BPF, yes. The patch looks good to me and you can add my ACK and push in.

BPF enabled by r325457.

[Hexagon] Enabled with updated tests.

Please take a look at the test updates.

The Hexagon changes look ok.

This revision is now accepted and ready to land. Feb 21 2018, 7:52 AM

In D38128#1014574, @kparzysz wrote:

The Hexagon changes look ok.

Thank you.
r325697

[Mips] Multiple regalloc hints enabled with updated tests.

This revision now requires review to proceed. Feb 22 2018, 1:25 AM

sdardis mentioned this in rL325770: [mips] Regenerate tests for D38128 (NFC). Feb 22 2018, 3:56 AM

Hi Jonas, I've regenerated the tests in rL325770 and the changes seem ok. The main difference I'm seeing is some moves are being eliminated, but in a few cases in test/CodeGen/Mips/analyzebranch.ll the earlier move prevents a delay slot being filled but otherwise looks ok.

https://reviews.llvm.org/differential/diff/135383 is my local copy of this patch against the regenerated tests.

This revision is now accepted and ready to land. Feb 22 2018, 4:57 AM

sdardis patch applied.

In D38128#1015672, @sdardis wrote:

Hi Jonas, I've regenerated the tests in rL325770 and the changes seem ok. The main difference I'm seeing is some moves are being eliminated, but in a few cases in test/CodeGen/Mips/analyzebranch.ll the earlier move prevents a delay slot being filled but otherwise looks ok.

https://reviews.llvm.org/differential/diff/135383 is my local copy of this patch against the regenerated tests.

Thank you, Simon. I applied your patch with regenerated tests and committed it as r325870.

[SPARC] Patch enabled - please review the updated tests.

This revision now requires review to proceed. Feb 23 2018, 1:00 AM

Sparc changes look reasonable.

This revision is now accepted and ready to land. Feb 23 2018, 8:33 AM

In D38128#1017405, @jyknight wrote:

Sparc changes look reasonable.

Thank you, James. r326028.

[XCore] patch enabled - please review updated test and approve patch.

This revision now requires review to proceed. Feb 24 2018, 3:57 AM

please see in-line comments

test/CodeGen/XCore/byVal.ll
45–46 ↗	(On Diff #135790)	Hi jonpa, the output in test/CodeGen/XCore/byVal.ll is incorrect: the value in r1 is not copied into r0 before being overwritten by the value in sp[2] (r0's indirected value). These two lines should be the other way around. Sorry, Robert

This revision now requires changes to proceed. Feb 25 2018, 9:51 AM

Test updated again.

jonpa marked an inline comment as done. Feb 25 2018, 11:58 PM

jonpa added inline comments.

test/CodeGen/XCore/byVal.ll
45–46 ↗	(On Diff #135790)	Ah, I think 'r1' slipped through in the place of 'r11'... Does it look ok now?

LGTM
thank you
robert

This revision is now accepted and ready to land. Feb 26 2018, 12:01 AM

In D38128#1018974, @robertlytton wrote:

LGTM
thank you
robert

Thank you, Robert. r326069

X86 is now the final backend to enable :-)

This patch enables X86 with regenerated tests in all (155) tests that previously had the "NOTE: Assertions have been autogenerated..." comment. I have manually updated the remaining (28) ones.

Please review these test changes.

After this, we can finally get rid of the enableMultipleCopyHints() hook.

This revision now requires review to proceed. Feb 26 2018, 5:19 AM

thegameg added a subscriber: thegameg. Feb 26 2018, 7:29 AM

This looks like a nice improvement modulo a few issues:

 - In a number of places (notably register arguments to shift instructions) we generate movq instead of movl, which is shorter and equivalent. I believe there's no real performance difference, but we should certainly prefer movl when compiling for size.
 - There's some unnecessary shuffling of register names in SSE4 tests.

Someone who's more familiar with debug info should double-check those tests, though they seem fine to me.

test/CodeGen/X86/fast-isel-shift.ll
22 ↗	(On Diff #135887)	Why are we getting two kill comments about cx here?
test/CodeGen/X86/schedule-x86-64-shld.ll
129 ↗	(On Diff #135887)	This should be movl given optsize.
test/CodeGen/X86/sret-implicit.ll
13 ↗	(On Diff #135887)	These shouldn't be DAG matches anymore; the 2nd line should always come first.
27 ↗	(On Diff #135887)	Same as above
test/CodeGen/X86/vector-shift-ashr-128.ll
270 ↗	(On Diff #135887)	All of the SSE4 changes regarding shifts seem to generate unnecessary register shuffling.
test/CodeGen/X86/vectorcall.ll
26 ↗	(On Diff #135887)	another case of movq vs movl
152 ↗	(On Diff #135887)	This is probably fine, but confusing. Can you split this up into the X86 and X64 cases? In fact, this file should probably be autogenerated with utils/update_llc_test_checks.py

RKSimon commandeered this revision. Apr 9 2018, 1:19 AM

RKSimon edited reviewers, added: jonpa; removed: RKSimon.

ping! Please don't forget to do this for X86, we still want to get rid of that temporary hook... :-)

RKSimon mentioned this in rL338262: [X86] Regenerate fast-isel tests. Jul 30 2018, 9:14 AM

RKSimon mentioned this in rL338264: [X86] Regenerate PKU test to merge 32/64-bit rdpkru checks.

RKSimon mentioned this in rL338265: [X86] Regenerate NOBMI/BMI combine-select tests. Jul 30 2018, 9:20 AM

Updated all x86 tests to trunk latest

Has anyone looked at @niravd's comments?

In D38128#1180526, @craig.topper wrote:

Has anyone looked at @niravd's comments?

I haven't - just updated all the tests

RKSimon added inline comments. Jul 30 2018, 11:50 AM

test/CodeGen/X86/vector-shift-ashr-128.ll
270 ↗	(On Diff #135887)	If I had to guess - this is probably due to SSE41's PBLENDV instructions being hardwired to use xmm0

RKSimon mentioned this in D52121: [X86] Fold (movmsk (setne (and X, (1 << C)), 0)) -> (movmsk (X << C)). Sep 14 2018, 3:10 PM

RKSimon mentioned this in D52109: [TwoAddressInstructionPass] Don't update SrcRegMap for copies inserted for tied register constraint when the src isn't killed. Sep 15 2018, 3:25 AM

Is this ready or any blockers?

rebased to trunk

In D38128#1235889, @xbolva00 wrote:

Is this ready or any blockers?

Other than the poor codegen issues (but not regressions) commented on by @niravd I don't think there are any blockers.

OK to commit?

The codesize issues are minor and shouldn't hold this patch up. The only blocker I see is the unnecessary data shuffling for SSE41 codegen which someone else should decide on.

In D38128#1238605, @RKSimon wrote:

OK to commit?

In D38128#1238797, @niravd wrote:

The codesize issues are minor and shouldn't hold this patch up. The only blocker I see is the unnecessary data shuffling for SSE41 codegen which someone else should decide on.

Is this just about the extra 'movdqa' in vector-shift-ashr-128.ll, or are there other diffs to look at?

I've gone through and marked all the places.

Is this just about the extra 'movdqa' in vector-shift-ashr-128.ll, or are there other diffs to look at?

test/CodeGen/X86/vector-shift-lshr-128.ll
232 ↗	(On Diff #165646)	Extra instruction here
test/CodeGen/X86/vselect-minmax.ll
4539 ↗	(On Diff #165646)	Extra instruction here.
4660 ↗	(On Diff #165646)	Extra instruction here.
4781 ↗	(On Diff #165646)	Extra Instruction here.
4901 ↗	(On Diff #165646)	Extra instruction here.
5019 ↗	(On Diff #165646)	Extra instructions here.
5170 ↗	(On Diff #165646)	Extra instructions here.
5319 ↗	(On Diff #165646)	Extra instructions here.
5468 ↗	(On Diff #165646)	Extra instructions here.
6995 ↗	(On Diff #165646)	Extra instructions here.
7116 ↗	(On Diff #165646)	Extra instruction here.
7235 ↗	(On Diff #165646)	Extra instruction here.
7356 ↗	(On Diff #165646)	Extra instructions here.
7505 ↗	(On Diff #165646)	Extra Instructions here.
7655 ↗	(On Diff #165646)	Extra instructions here.

In D38128#1239480, @niravd wrote:

I've gone through and marked all the places.

Is this just about the extra 'movdqa' in vector-shift-ashr-128.ll, or are there other diffs to look at?

These are all due to pblendvb/blendvpd/blendvps being hardwired to use the xmm0 as the mask register (limit goes away for avx)

In D38128#1239512, @RKSimon wrote:

In D38128#1239480, @niravd wrote:

I've gone through and marked all the places.

Is this just about the extra 'movdqa' in vector-shift-ashr-128.ll, or are there other diffs to look at?

These are all due to pblendvb/blendvpd/blendvps being hardwired to use the xmm0 as the mask register (limit goes away for avx)

Yep - thanks for marking all of those. That corner case shouldn't hold up the general improvement, and the number of customers specifically targeting SSE4.1 should go to zero over time, so LGTM.

This revision is now accepted and ready to land. Sep 19 2018, 9:41 AM

Closed by commit rL342578: [X86] Handle COPYs of physregs better (regalloc hints) (authored by RKSimon). Sep 19 2018, 12:01 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                           Size
include/llvm/CodeGen/MachineRegisterInfo.h     44 lines
lib/CodeGen/CalcSpillWeights.cpp               55 lines
lib/CodeGen/TargetRegisterInfo.cpp             46 lines
test/CodeGen/SystemZ/call-05.ll                3 lines
test/CodeGen/SystemZ/call-args-coalesce.mir    45 lines
(path not shown)                               9 lines
(path not shown)                               9 lines
(path not shown)                               1 line
(path not shown)                               30 lines

Diff 116666

include/llvm/CodeGen/MachineRegisterInfo.h

(78 lines not shown)	private:
/// and false otherwise.		/// and false otherwise.
bool IsUpdatedCSRsInitialized;		bool IsUpdatedCSRsInitialized;

/// Contains the updated callee saved register list.		/// Contains the updated callee saved register list.
/// As opposed to the static list defined in register info,		/// As opposed to the static list defined in register info,
/// all registers that were disabled are removed from the list.		/// all registers that were disabled are removed from the list.
SmallVector<MCPhysReg, 16> UpdatedCSRs;		SmallVector<MCPhysReg, 16> UpdatedCSRs;

/// RegAllocHints - This vector records register allocation hints for virtual		/// RegAllocHints - This vector records register allocation hints for
/// registers. For each virtual register, it keeps a register and hint type		/// virtual registers. For each virtual register, it keeps a pair of hint
/// pair making up the allocation hint. Hint type is target specific except		/// type and hints vector making up the allocation hints. Hint type is
/// for the value 0 which means the second value of the pair is the preferred		/// target specific except for the value 0. If the hinted register is
/// register for allocation. For example, if the hint is <0, 1024>, it means		/// virtual, it means the allocator should prefer the physical register
/// the allocator should prefer the physical register allocated to the virtual		/// allocated to it if any.
/// register of the hint.		IndexedMap<std::pair<unsigned, SmallVector<unsigned, 4>>,
IndexedMap<std::pair<unsigned, unsigned>, VirtReg2IndexFunctor> RegAllocHints;		VirtReg2IndexFunctor> RegAllocHints;

/// PhysRegUseDefLists - This is an array of the head of the use/def list for		/// PhysRegUseDefLists - This is an array of the head of the use/def list for
/// physical registers.		/// physical registers.
std::unique_ptr<MachineOperand *[]> PhysRegUseDefLists;		std::unique_ptr<MachineOperand *[]> PhysRegUseDefLists;

/// getRegUseDefListHead - Return the head pointer for the register use/def		/// getRegUseDefListHead - Return the head pointer for the register use/def
/// list for the specified virtual or physical register.		/// list for the specified virtual or physical register.
MachineOperand *&getRegUseDefListHead(unsigned RegNo) {		MachineOperand *&getRegUseDefListHead(unsigned RegNo) {
(596 lines not shown)	#endif

/// getNumVirtRegs - Return the number of virtual registers created.		/// getNumVirtRegs - Return the number of virtual registers created.
unsigned getNumVirtRegs() const { return VRegInfo.size(); }		unsigned getNumVirtRegs() const { return VRegInfo.size(); }

/// clearVirtRegs - Remove all virtual registers (after physreg assignment).		/// clearVirtRegs - Remove all virtual registers (after physreg assignment).
void clearVirtRegs();		void clearVirtRegs();

/// setRegAllocationHint - Specify a register allocation hint for the		/// setRegAllocationHint - Specify a register allocation hint for the
/// specified virtual register.		/// specified virtual register. Any previous hints are cleared.
void setRegAllocationHint(unsigned VReg, unsigned Type, unsigned PrefReg) {		void setRegAllocationHint(unsigned VReg, unsigned Type, unsigned PrefReg) {
assert(TargetRegisterInfo::isVirtualRegister(VReg));		assert(TargetRegisterInfo::isVirtualRegister(VReg));
RegAllocHints[VReg].first = Type;		RegAllocHints[VReg].first = Type;
RegAllocHints[VReg].second = PrefReg;		RegAllocHints[VReg].second.clear();
		RegAllocHints[VReg].second.push_back(PrefReg);
		}

		/// addRegAllocationHint - Add a register allocation hint to the hints
		/// vector for VReg. Currently assumes any VReg getting hints by calling
		/// this will only have target independent hints.
		void addRegAllocationHint(unsigned VReg, unsigned PrefReg) {
		assert(TargetRegisterInfo::isVirtualRegister(VReg));
		assert(RegAllocHints[VReg].first == 0 &&
		"Only adding multiple copy hints for now");
		RegAllocHints[VReg].second.push_back(PrefReg);
}		}

/// Specify the preferred register allocation hint for the specified virtual		/// Specify the preferred register allocation hint for the specified virtual
/// register.		/// register.
void setSimpleHint(unsigned VReg, unsigned PrefReg) {		void setSimpleHint(unsigned VReg, unsigned PrefReg) {
setRegAllocationHint(VReg, /*Type=*/0, PrefReg);		setRegAllocationHint(VReg, /*Type=*/0, PrefReg);
}		}

/// getRegAllocationHint - Return the register allocation hint for the		/// getRegAllocationHint - Return the register allocation hint for the
/// specified virtual register.		/// specified virtual register. If there are many hints, this returns the
		/// one with the greatest weight.
std::pair<unsigned, unsigned>		std::pair<unsigned, unsigned>
getRegAllocationHint(unsigned VReg) const {		getRegAllocationHint(unsigned VReg) const {
assert(TargetRegisterInfo::isVirtualRegister(VReg));		assert(TargetRegisterInfo::isVirtualRegister(VReg));
		unsigned BestHint = (RegAllocHints[VReg].second.size() ?
		RegAllocHints[VReg].second[0] : 0);
		return std::pair<unsigned, unsigned>(RegAllocHints[VReg].first, BestHint);
		}

		/// getRegAllocationHints - Return a reference to the vector of all
		/// register allocation hints for VReg.
		const std::pair<unsigned, SmallVector<unsigned, 4> >
		&getRegAllocationHints(unsigned VReg) const {
		assert(TargetRegisterInfo::isVirtualRegister(VReg));
return RegAllocHints[VReg];		return RegAllocHints[VReg];
}		}

/// getSimpleHint - Return the preferred register allocation hint, or 0 if a		/// getSimpleHint - Return the preferred register allocation hint, or 0 if a
/// standard simple hint (Type == 0) is not set.		/// standard simple hint (Type == 0) is not set.
unsigned getSimpleHint(unsigned VReg) const {		unsigned getSimpleHint(unsigned VReg) const {
assert(TargetRegisterInfo::isVirtualRegister(VReg));		assert(TargetRegisterInfo::isVirtualRegister(VReg));
std::pair<unsigned, unsigned> Hint = getRegAllocationHint(VReg);		std::pair<unsigned, unsigned> Hint = getRegAllocationHint(VReg);
return Hint.first ? 0 : Hint.second;		return Hint.first ? 0 : Hint.second;
}		}

/// markUsesInDebugValueAsUndef - Mark every DBG_VALUE referencing the		/// markUsesInDebugValueAsUndef - Mark every DBG_VALUE referencing the
/// specified register as undefined which causes the DBG_VALUE to be		/// specified register as undefined which causes the DBG_VALUE to be
/// deleted during LiveDebugVariables analysis.		/// deleted during LiveDebugVariables analysis.
		qcolombet: Could we use SmallVectorImpl to not leak the size of the vector?
		jonpa: I'm not sure I tried all alternatives, but there seems to be some constructor missing or something, because I get:
		include/llvm/ADT/IndexedMap.h:42:29: error: no matching function for call to ‘std::pair<unsigned int, llvm::SmallVectorImpl<unsigned int> >::pair()’
		IndexedMap() : nullVal_(T()) {}
		^~~
		I simply did
		@@ -89,11 +89,11 @@ private:
		 /// type and hints vector making up the allocation hints. Only the first
		 /// hint may be target specific, and in that case this is reflected by the
		 /// first member of the pair being non-zero. If the hinted register is
		 /// virtual, it means the allocator should prefer the physical register
		 /// allocated to it if any.
		- IndexedMap<std::pair<unsigned, SmallVector<unsigned, 4>>,
		+ IndexedMap<std::pair<unsigned, SmallVectorImpl<unsigned>>,
		 VirtReg2IndexFunctor> RegAllocHints;
		 /// PhysRegUseDefLists - This is an array of the head of the use/def list for
		 /// physical registers.
		 std::unique_ptr<MachineOperand *[]> PhysRegUseDefLists;
		@@ -750,11 +750,11 @@ public:
		 return Hint.first ? 0 : Hint.second;
		 }
		 /// getRegAllocationHints - Return a reference to the vector of all
		 /// register allocation hints for VReg.
		- const std::pair<unsigned, SmallVector<unsigned, 4>>
		+ const std::pair<unsigned, SmallVectorImpl<unsigned>>
		 &getRegAllocationHints(unsigned VReg) const {
		 assert(TargetRegisterInfo::isVirtualRegister(VReg));
		 return RegAllocHints[VReg];
		 }
		BTW, it would be nice to know how SmallVector would leak if using it here...
		qcolombet: By leak I mean exposing implementation details to users :)
		jonpa: Aah, I see :-)
void markUsesInDebugValueAsUndef(unsigned Reg) const;		void markUsesInDebugValueAsUndef(unsigned Reg) const;

/// Return true if the specified register is modified in this function.		/// Return true if the specified register is modified in this function.
/// This checks that no defining machine operands exist for the register or		/// This checks that no defining machine operands exist for the register or
/// any of its aliases. Definitions found on functions marked noreturn are		/// any of its aliases. Definitions found on functions marked noreturn are
/// ignored, to consider them pass 'true' for optional parameter		/// ignored, to consider them pass 'true' for optional parameter
/// SkipNoReturnDef. The register is also considered modified when it is set		/// SkipNoReturnDef. The register is also considered modified when it is set
/// in the UsedPhysRegMask.		/// in the UsedPhysRegMask.
(372 lines not shown)
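
The storage change in this header can be hard to follow in the side-by-side view, so here is a minimal standalone sketch of the idea: each virtual register keeps a hint type plus an ordered list of hinted registers, with the best hint first. `HintTable`, `setHint`, `addHint`, `getHint` and `getHints` are hypothetical names, and `std::vector` stands in for `IndexedMap` and `SmallVector` so that the sketch compiles without LLVM headers. It is an illustration of the data layout, not the patch itself.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for the per-virtual-register hint storage.
class HintTable {
  // first  = hint type (0 means target independent)
  // second = hinted registers, most preferred first
  std::vector<std::pair<unsigned, std::vector<unsigned>>> Hints;

public:
  explicit HintTable(unsigned NumVRegs) : Hints(NumVRegs) {}

  // Replace every existing hint of VReg with a single one.
  void setHint(unsigned VReg, unsigned Type, unsigned PrefReg) {
    Hints[VReg].first = Type;
    Hints[VReg].second.clear();
    Hints[VReg].second.push_back(PrefReg);
  }

  // Append one more target-independent hint after the existing ones.
  void addHint(unsigned VReg, unsigned PrefReg) {
    assert(Hints[VReg].first == 0 && "only copy hints may be appended");
    Hints[VReg].second.push_back(PrefReg);
  }

  // Single-hint query: the type plus the first hint, or 0 if there is none.
  std::pair<unsigned, unsigned> getHint(unsigned VReg) const {
    const auto &H = Hints[VReg];
    return {H.first, H.second.empty() ? 0u : H.second.front()};
  }

  // Multi-hint query: all hinted registers in preference order.
  const std::vector<unsigned> &getHints(unsigned VReg) const {
    return Hints[VReg].second;
  }
};
```

Callers that only understand a single hint keep working through `getHint`, while an allocator that can use several hints reads the whole list through `getHints`.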

lib/CodeGen/CalcSpillWeights.cpp

(57 lines not shown)	static unsigned copyHint(const MachineInstr *mi, unsigned reg,
}		}

if (!hreg)		if (!hreg)
return 0;		return 0;

if (TargetRegisterInfo::isVirtualRegister(hreg))		if (TargetRegisterInfo::isVirtualRegister(hreg))
return sub == hsub ? hreg : 0;		return sub == hsub ? hreg : 0;

		unsigned CopiedPReg = (hsub ? tri.getSubReg(hreg, hsub) : hreg);
const TargetRegisterClass *rc = mri.getRegClass(reg);		const TargetRegisterClass *rc = mri.getRegClass(reg);
		if (rc->contains(CopiedPReg))
		return CopiedPReg;

// Only allow physreg hints in rc.		// Check if reg:sub matches so that a super register could be hinted.
if (sub == 0)		if (sub)
return rc->contains(hreg) ? hreg : 0;		return tri.getMatchingSuperReg(CopiedPReg, sub, rc);

// reg:sub should match the physreg hreg.		return 0;
return tri.getMatchingSuperReg(hreg, sub, rc);
}		}

// Check if all values in LI are rematerializable		// Check if all values in LI are rematerializable
static bool isRematerializable(const LiveInterval &LI,		static bool isRematerializable(const LiveInterval &LI,
const LiveIntervals &LIS,		const LiveIntervals &LIS,
VirtRegMap *VRM,		VirtRegMap *VRM,
const TargetInstrInfo &TII) {		const TargetInstrInfo &TII) {
unsigned Reg = LI.reg;		unsigned Reg = LI.reg;
(51 lines not shown)	VirtRegAuxInfo::calculateSpillWeightAndHint(LiveInterval &li) {
const TargetRegisterInfo &tri = *MF.getSubtarget().getRegisterInfo();		const TargetRegisterInfo &tri = *MF.getSubtarget().getRegisterInfo();
MachineBasicBlock *mbb = nullptr;		MachineBasicBlock *mbb = nullptr;
MachineLoop *loop = nullptr;		MachineLoop *loop = nullptr;
bool isExiting = false;		bool isExiting = false;
float totalWeight = 0;		float totalWeight = 0;
unsigned numInstr = 0; // Number of instructions using li		unsigned numInstr = 0; // Number of instructions using li
SmallPtrSet<MachineInstr*, 8> visited;		SmallPtrSet<MachineInstr*, 8> visited;

// Find the best physreg hint and the best virtreg hint.
float bestPhys = 0, bestVirt = 0;
unsigned hintPhys = 0, hintVirt = 0;

// Don't recompute a target specific hint.		// Don't recompute a target specific hint.
bool noHint = mri.getRegAllocationHint(li.reg).first != 0;		bool noHint = mri.getRegAllocationHint(li.reg).first != 0;

// Don't recompute spill weight for an unspillable register.		// Don't recompute spill weight for an unspillable register.
bool Spillable = li.isSpillable();		bool Spillable = li.isSpillable();

		// CopyHint is a sortable hint derived from a COPY instruction.
		struct CopyHint {
		unsigned Reg;
		float Weight;
		bool IsPhys;
		CopyHint(unsigned R, float W, bool P) : Reg(R), Weight(W), IsPhys(P) {}
		bool operator<(const CopyHint &rhs) const {
		// Always prefer any physreg hint.
		if (IsPhys != rhs.IsPhys)
		return (IsPhys && !rhs.IsPhys);
		if (Weight > rhs.Weight)
		return true;
		// (just for the purpose of maintaining the set)
		return Reg < rhs.Reg;
		}
		};

		std::set<CopyHint> CopyHints;
for (MachineRegisterInfo::reg_instr_iterator		for (MachineRegisterInfo::reg_instr_iterator
I = mri.reg_instr_begin(li.reg), E = mri.reg_instr_end();		I = mri.reg_instr_begin(li.reg), E = mri.reg_instr_end();
I != E; ) {		I != E; ) {
MachineInstr *mi = &*(I++);		MachineInstr *mi = &*(I++);
numInstr++;		numInstr++;
if (mi->isIdentityCopy() || mi->isImplicitDef() || mi->isDebugValue())		if (mi->isIdentityCopy() || mi->isImplicitDef() || mi->isDebugValue())
continue;		continue;
if (!visited.insert(mi).second)		if (!visited.insert(mi).second)
(26 lines not shown)	for (MachineRegisterInfo::reg_instr_iterator
unsigned hint = copyHint(mi, li.reg, tri, mri);		unsigned hint = copyHint(mi, li.reg, tri, mri);
if (!hint)		if (!hint)
continue;		continue;
// Force hweight onto the stack so that x86 doesn't add hidden precision,		// Force hweight onto the stack so that x86 doesn't add hidden precision,
// making the comparison incorrectly pass (i.e., 1 > 1 == true??).		// making the comparison incorrectly pass (i.e., 1 > 1 == true??).
//		//
// FIXME: we probably shouldn't use floats at all.		// FIXME: we probably shouldn't use floats at all.
volatile float hweight = Hint[hint] += weight;		volatile float hweight = Hint[hint] += weight;
if (TargetRegisterInfo::isPhysicalRegister(hint)) {		CopyHints.insert(CopyHint(hint, hweight, tri.isPhysicalRegister(hint)));
if (hweight > bestPhys && mri.isAllocatable(hint)) {
bestPhys = hweight;
hintPhys = hint;
}
} else {
if (hweight > bestVirt) {
bestVirt = hweight;
hintVirt = hint;
}
}
}		}

Hint.clear();		Hint.clear();

// Always prefer the physreg hint.		// Pass all the sorted copy hints to mri.
if (unsigned hint = hintPhys ? hintPhys : hintVirt) {		for (auto &Hint : CopyHints)
mri.setRegAllocationHint(li.reg, 0, hint);		mri.addRegAllocationHint(li.reg, Hint.Reg);

		if (CopyHints.size())
// Weakly boost the spill weight of hinted registers.		// Weakly boost the spill weight of hinted registers.
totalWeight *= 1.01F;		totalWeight *= 1.01F;
}

// If the live interval was already unspillable, leave it that way.		// If the live interval was already unspillable, leave it that way.
if (!Spillable)		if (!Spillable)
return;		return;

// Mark li as unspillable if all live ranges are tiny and the interval		// Mark li as unspillable if all live ranges are tiny and the interval
// is not live at any reg mask. If the interval is live at a reg mask		// is not live at any reg mask. If the interval is live at a reg mask
// spilling may be required.		// spilling may be required.
(15 lines not shown)
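
A note on the `CopyHint` comparator in this revision of the diff: `std::set` needs a strict weak ordering, but as written the comparison returns true when `Weight > rhs.Weight` and otherwise falls through to `Reg < rhs.Reg`, even when the weight is smaller. Two physreg hints such as (Reg 1, Weight 1) and (Reg 2, Weight 2) then each compare less than the other. The sketch below shows one way to order on physreg-first, then weight, then register number without that problem. It is a standalone illustration with a hypothetical helper `sortedHints`, and is not claimed to be the code that was committed.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Illustrative version of the CopyHint idea from the diff above.
struct CopyHint {
  unsigned Reg;
  float Weight;
  bool IsPhys;

  bool operator<(const CopyHint &RHS) const {
    // Physreg hints sort before virtreg hints.
    if (IsPhys != RHS.IsPhys)
      return IsPhys;
    // Among hints of the same kind, heavier hints sort first.
    if (Weight != RHS.Weight)
      return Weight > RHS.Weight;
    // Tie-break on the register number so distinct hints stay distinct.
    return Reg < RHS.Reg;
  }
};

// Hypothetical helper: return the hinted registers in preference order.
std::vector<unsigned> sortedHints(const std::vector<CopyHint> &In) {
  std::set<CopyHint> Sorted(In.begin(), In.end());
  std::vector<unsigned> Out;
  for (const CopyHint &H : Sorted)
    Out.push_back(H.Reg);
  return Out;
}
```

Comparing the weights for inequality first means the register number is only consulted when the weights are equal, which is what keeps the ordering consistent.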

lib/CodeGen/TargetRegisterInfo.cpp

	(362 lines not shown)
	void			void
	TargetRegisterInfo::getRegAllocationHints(unsigned VirtReg,			TargetRegisterInfo::getRegAllocationHints(unsigned VirtReg,
	ArrayRef<MCPhysReg> Order,			ArrayRef<MCPhysReg> Order,
	SmallVectorImpl<MCPhysReg> &Hints,			SmallVectorImpl<MCPhysReg> &Hints,
	const MachineFunction &MF,			const MachineFunction &MF,
	const VirtRegMap *VRM,			const VirtRegMap *VRM,
	const LiveRegMatrix *Matrix) const {			const LiveRegMatrix *Matrix) const {
	const MachineRegisterInfo &MRI = MF.getRegInfo();			const MachineRegisterInfo &MRI = MF.getRegInfo();
	std::pair<unsigned, unsigned> Hint = MRI.getRegAllocationHint(VirtReg);			const std::pair<unsigned, SmallVector<unsigned, 4>> &Hints_MRI =
				MRI.getRegAllocationHints(VirtReg);

	// Hints with HintType != 0 were set by target-dependent code.			// Hints with HintType != 0 were set by target-dependent code.
	// Such targets must provide their own implementation of			// Such targets must provide their own implementation of
	// TRI::getRegAllocationHints to interpret those hint types.			// TRI::getRegAllocationHints to interpret those hint types.
	assert(Hint.first == 0 && "Target must implement TRI::getRegAllocationHints");			assert(Hints_MRI.first == 0 &&
				"Target must implement TRI::getRegAllocationHints");

				for (auto Reg : Hints_MRI.second) {
	// Target-independent hints are either a physical or a virtual register.			// Target-independent hints are either a physical or a virtual register.
	unsigned Phys = Hint.second;			unsigned Phys = Reg;
	if (VRM && isVirtualRegister(Phys))			if (VRM && isVirtualRegister(Phys))
	Phys = VRM->getPhys(Phys);			Phys = VRM->getPhys(Phys);

	// Check that Phys is a valid hint in VirtReg's register class.			// Check that Phys is a valid hint in VirtReg's register class.
	if (!isPhysicalRegister(Phys))			if (!isPhysicalRegister(Phys))
	return;			return;
	if (MRI.isReserved(Phys))			if (MRI.isReserved(Phys))
	return;			return;
	// Check that Phys is in the allocation order. We shouldn't heed hints			// Check that Phys is in the allocation order. We shouldn't heed hints
	// from VirtReg's register class if they aren't in the allocation order. The			// from VirtReg's register class if they aren't in the allocation order. The
	// target probably has a reason for removing the register.			// target probably has a reason for removing the register.
	if (!is_contained(Order, Phys))			if (!is_contained(Order, Phys))
	return;			return;

	// All clear, tell the register allocator to prefer this register.			// All clear, tell the register allocator to prefer this register.
	Hints.push_back(Phys);			Hints.push_back(Phys);
	}			}
				}

	bool TargetRegisterInfo::canRealignStack(const MachineFunction &MF) const {			bool TargetRegisterInfo::canRealignStack(const MachineFunction &MF) const {
	return !MF.getFunction()->hasFnAttribute("no-realign-stack");			return !MF.getFunction()->hasFnAttribute("no-realign-stack");
	}			}

	bool TargetRegisterInfo::needsStackRealignment(			bool TargetRegisterInfo::needsStackRealignment(
	const MachineFunction &MF) const {			const MachineFunction &MF) const {
	const MachineFrameInfo &MFI = MF.getFrameInfo();			const MachineFrameInfo &MFI = MF.getFrameInfo();
	(29 lines not shown)
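
In the diff at this revision, each failed check inside the new loop executes `return`, so the first unusable hint also discards every lower-priority hint behind it. The sketch below shows the alternative of skipping only the unusable hint. It assumes that skipping is the intended behavior, which the review does not confirm at this point. `filterHints` is a hypothetical helper, and plain vectors stand in for the allocation order and the reserved-register set.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: keep the usable hints and preserve their order.
// Order is the allocation order of the register class; Reserved lists the
// registers the allocator must not use. Register 0 means "no register".
std::vector<unsigned> filterHints(const std::vector<unsigned> &Hints,
                                  const std::vector<unsigned> &Order,
                                  const std::vector<unsigned> &Reserved) {
  auto Contains = [](const std::vector<unsigned> &V, unsigned R) {
    return std::find(V.begin(), V.end(), R) != V.end();
  };
  std::vector<unsigned> Out;
  for (unsigned Phys : Hints) {
    // Skip this hint only; later hints may still be usable.
    if (Phys == 0 || Contains(Reserved, Phys))
      continue;
    // Hints outside the allocation order are ignored as well.
    if (!Contains(Order, Phys))
      continue;
    Out.push_back(Phys);
  }
  return Out;
}
```

With hints {5, 9, 2, 7}, allocation order {1, 2, 5, 7} and register 7 reserved, the result is {5, 2}: register 9 is not in the order, and register 7 is reserved.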

test/CodeGen/SystemZ/call-05.ll

(365 lines not shown)	a:
ret void		ret void

b:		b:
store i32 1, i32 *@var;		store i32 1, i32 *@var;
ret void		ret void
}		}

; Check a conditional sibling call to an argument - will fail due to		; Check a conditional sibling call to an argument - will fail due to
; intervening lgr.		; intervening lgr.
		uweigand: Comment should be updated -- this no longer fails now :-)
		jonpa: Sorry - with the update this fails again, so no update... Should we make a point to try to handle this somehow?
define void @f21(i32 %val1, i32 %val2, void()* %fun) {		define void @f21(i32 %val1, i32 %val2, void()* %fun) {
; CHECK-LABEL: f21:		; CHECK-LABEL: f21:
; CHECK: crjhe %r2, %r3
; CHECK: lgr %r1, %r4		; CHECK: lgr %r1, %r4
; CHECK: br %r1		; CHECK: crbl %r2, %r3, 0(%r1)
; CHECK: br %r14		; CHECK: br %r14
%cond = icmp slt i32 %val1, %val2;		%cond = icmp slt i32 %val1, %val2;
br i1 %cond, label %a, label %b;		br i1 %cond, label %a, label %b;

a:		a:
tail call void %fun()		tail call void %fun()
ret void		ret void

(80 lines not shown)

test/CodeGen/SystemZ/call-args-coalesce.mir

This file was added.

				# RUN: llc -mtriple=s390x-linux-gnu -start-before=greedy %s -o - | FileCheck %s
				# Test that %r2d is copied directly to %r1d

				--- |

				define void @f5(void (i32, i32, i32, i32)* %foo) {
				tail call void %foo(i32 1, i32 2, i32 3, i32 4)
				ret void
				}

				; Function Attrs: nounwind
				declare void @llvm.stackprotector(i8*, i8**) #0

				attributes #0 = { nounwind }

				...

				# CHECK: lgr %r1, %r2
				# CHECK-NOT: lgr

				---
				name: f5
				alignment: 2
				tracksRegLiveness: true
				registers:
				- { id: 0, class: gr64bit }
				- { id: 1, class: gr32bit }
				- { id: 2, class: gr32bit }
				- { id: 3, class: gr32bit }
				- { id: 4, class: gr32bit }
				liveins:
				- { reg: '%r2d', virtual-reg: '%0' }
				body: |
				bb.0 (%ir-block.0):
				liveins: %r2d

				%0 = COPY %r2d
				%r2l = LHI 1
				%r3l = LHI 2
				%r4l = LHI 3
				%r5l = LHI 4
				%r1d = COPY %0
				CallBR csr_systemz, implicit %r1d, implicit %r2l, implicit killed %r3l, implicit killed %r4l, implicit killed %r5l

				...

test/CodeGen/SystemZ/fp-sqrt-01.ll

(153 lines not shown)	; CHECK: br %r14
store volatile float %sqrt16, float *%ptr		store volatile float %sqrt16, float *%ptr

ret void		ret void
}		}

; Check that a call to the normal sqrtf function is lowered.		; Check that a call to the normal sqrtf function is lowered.
define float @f8(float %dummy, float %val) {		define float @f8(float %dummy, float %val) {
; CHECK-LABEL: f8:		; CHECK-LABEL: f8:
; CHECK: sqebr %f0, %f2		; CHECK: sqebr %f1, %f0
; CHECK: cebr %f0, %f0		; CHECK: cebr %f1, %f1
; CHECK: bnor %r14		; CHECK: jgo sqrtf@PLT
; CHECK: {{ler|ldr}} %f0, %f2		; CHECK: {{ler|ldr}} %f0, %f1
; CHECK: jg sqrtf@PLT
%res = tail call float @sqrtf(float %val)		%res = tail call float @sqrtf(float %val)
ret float %res		ret float %res
}		}

test/CodeGen/SystemZ/fp-sqrt-02.ll

(153 lines not shown)	; CHECK: br %r14
store volatile double %sqrt16, double *%ptr		store volatile double %sqrt16, double *%ptr

ret void		ret void
}		}

; Check that a call to the normal sqrt function is lowered.		; Check that a call to the normal sqrt function is lowered.
define double @f8(double %dummy, double %val) {		define double @f8(double %dummy, double %val) {
; CHECK-LABEL: f8:		; CHECK-LABEL: f8:
; CHECK: sqdbr %f0, %f2		; CHECK: sqdbr %f1, %f0
; CHECK: cdbr %f0, %f0		; CHECK: cdbr %f1, %f1
; CHECK: bnor %r14		; CHECK: jgo sqrt@PLT
; CHECK: ldr %f0, %f2		; CHECK: ldr %f0, %f1
; CHECK: jg sqrt@PLT
%res = tail call double @sqrt(double %val)		%res = tail call double @sqrt(double %val)
ret double %res		ret double %res
}		}

test/CodeGen/SystemZ/swift-return.ll

	(33 lines not shown)
	}			}

	declare swiftcc { i16, i8 } @gen(i32)			declare swiftcc { i16, i8 } @gen(i32)

	; If we can't pass every return value in registers, we will pass everything			; If we can't pass every return value in registers, we will pass everything
	; in memroy. The caller provides space for the return value and passes			; in memroy. The caller provides space for the return value and passes
	; the address in %r2. The first input argument will be in %r3.			; the address in %r2. The first input argument will be in %r3.
	; CHECK-LABEL: test2:			; CHECK-LABEL: test2:
	; CHECK: lr %[[REG1:r[0-9]+]], %r2			; CHECK: lr %[[REG1:r[0-9]+]], %r2
				uweigand: This goes directly into %r3 now? Then the test should directly check for this -- the test is about verifying the correct ABI register usage after all.
				jonpa: yes. ok.
	; CHECK-DAG: la %r2, 160(%r15)			; CHECK-DAG: la %r2, 160(%r15)
	; CHECK-DAG: lr %r3, %[[REG1]]
	; CHECK: brasl %r14, gen2			; CHECK: brasl %r14, gen2
	; CHECK: l %r2, 160(%r15)			; CHECK: l %r2, 160(%r15)
	; CHECK: a %r2, 164(%r15)			; CHECK: a %r2, 164(%r15)
	; CHECK: a %r2, 168(%r15)			; CHECK: a %r2, 168(%r15)
	; CHECK: a %r2, 172(%r15)			; CHECK: a %r2, 172(%r15)
	; CHECK: a %r2, 176(%r15)			; CHECK: a %r2, 176(%r15)
	; CHECK-O0-LABEL: test2:			; CHECK-O0-LABEL: test2:
	; CHECK-O0: st %r2, [[SPILL1:[0-9]+]](%r15)			; CHECK-O0: st %r2, [[SPILL1:[0-9]+]](%r15)
	(150 lines not shown)

test/CodeGen/SystemZ/swifterror.ll

	(28 lines not shown)

	; "caller" calls "foo" that takes a swifterror parameter.			; "caller" calls "foo" that takes a swifterror parameter.
	define float @caller(i8* %error_ref) {			define float @caller(i8* %error_ref) {
	; CHECK-LABEL: caller:			; CHECK-LABEL: caller:
	; Make a copy of error_ref because r2 is getting clobbered			; Make a copy of error_ref because r2 is getting clobbered
	; CHECK: lgr %r[[REG1:[0-9]+]], %r2			; CHECK: lgr %r[[REG1:[0-9]+]], %r2
	; CHECK: lghi %r9, 0			; CHECK: lghi %r9, 0
	; CHECK: brasl %r14, foo			; CHECK: brasl %r14, foo
	; CHECK: cgijlh %r9, 0,			; CHECK: ltgr %r2, %r9
				; CHECK: jlh .LBB1_2
	; Access part of the error object and save it to error_ref			; Access part of the error object and save it to error_ref
	; CHECK: lb %r[[REG2:[0-9]+]], 8(%r9)			; CHECK: lb %r[[REG2:[0-9]+]], 8(%r2)
	; CHECK: stc %r[[REG2]], 0(%r[[REG1]])			; CHECK: stc %r[[REG2]], 0(%r[[REG1]])
	; CHECK: lgr %r2, %r9
	; CHECK: brasl %r14, free			; CHECK: brasl %r14, free
	; CHECK-O0-LABEL: caller:			; CHECK-O0-LABEL: caller:
	; CHECK-O0: lghi %r9, 0			; CHECK-O0: lghi %r9, 0
	; CHECK-O0: brasl %r14, foo			; CHECK-O0: brasl %r14, foo
	; CHECK-O0: cghi %r9, 0			; CHECK-O0: cghi %r9, 0
	; CHECK-O0: jlh			; CHECK-O0: jlh
	entry:			entry:
	%error_ptr_ref = alloca swifterror %swift_error*			%error_ptr_ref = alloca swifterror %swift_error*
	(15 lines not shown)

	; "caller2" is the caller of "foo", it calls "foo" inside a loop.			; "caller2" is the caller of "foo", it calls "foo" inside a loop.
	define float @caller2(i8* %error_ref) {			define float @caller2(i8* %error_ref) {
	; CHECK-LABEL: caller2:			; CHECK-LABEL: caller2:
	; Make a copy of error_ref because r2 is getting clobbered			; Make a copy of error_ref because r2 is getting clobbered
	; CHECK: lgr %r[[REG1:[0-9]+]], %r2			; CHECK: lgr %r[[REG1:[0-9]+]], %r2
	; CHECK: lghi %r9, 0			; CHECK: lghi %r9, 0
	; CHECK: brasl %r14, foo			; CHECK: brasl %r14, foo
	; CHECK: cgijlh %r9, 0,			; CHECK: ltgr %r2, %r9
				; CHECK: jlh .LBB2_4
	; CHECK: ceb %f0,			; CHECK: ceb %f0,
	; CHECK: jnh			; CHECK: jnh
	; Access part of the error object and save it to error_ref			; Access part of the error object and save it to error_ref
	; CHECK: lb %r[[REG2:[0-9]+]], 8(%r9)			; CHECK: lb %r[[REG2:[0-9]+]], 8(%r2)
	; CHECK: stc %r[[REG2]], 0(%r[[REG1]])			; CHECK: stc %r[[REG2]], 0(%r[[REG1]])
	; CHECK: lgr %r2, %r9
	; CHECK: brasl %r14, free			; CHECK: brasl %r14, free
	; CHECK-O0-LABEL: caller2:			; CHECK-O0-LABEL: caller2:
	; CHECK-O0: lghi %r9, 0			; CHECK-O0: lghi %r9, 0
	; CHECK-O0: brasl %r14, foo			; CHECK-O0: brasl %r14, foo
	; CHECK-O0: cghi %r9, 0			; CHECK-O0: cghi %r9, 0
	; CHECK-O0: jlh			; CHECK-O0: jlh
	entry:			entry:
	%error_ptr_ref = alloca swifterror %swift_error*			%error_ptr_ref = alloca swifterror %swift_error*
	(153 lines not shown)
	; "caller3" calls "foo_sret" that takes a swifterror parameter.			; "caller3" calls "foo_sret" that takes a swifterror parameter.
	define float @caller3(i8* %error_ref) {			define float @caller3(i8* %error_ref) {
	; CHECK-LABEL: caller3:			; CHECK-LABEL: caller3:
	; Make a copy of error_ref because r2 is getting clobbered			; Make a copy of error_ref because r2 is getting clobbered
	; CHECK: lgr %r[[REG1:[0-9]+]], %r2			; CHECK: lgr %r[[REG1:[0-9]+]], %r2
	; CHECK: lhi %r3, 1			; CHECK: lhi %r3, 1
	; CHECK: lghi %r9, 0			; CHECK: lghi %r9, 0
	; CHECK: brasl %r14, foo_sret			; CHECK: brasl %r14, foo_sret
	; CHECK: cgijlh %r9, 0,			; CHECK: ltgr %r2, %r9
				; CHECK: jlh .LBB6_2
	; Access part of the error object and save it to error_ref			; Access part of the error object and save it to error_ref
	; CHECK: lb %r0, 8(%r9)			; CHECK: lb %r0, 8(%r2)
	; CHECK: stc %r0, 0(%r[[REG1]])			; CHECK: stc %r0, 0(%r[[REG1]])
	; CHECK: lgr %r2, %r9
	; CHECK: brasl %r14, free			; CHECK: brasl %r14, free

	; CHECK-O0-LABEL: caller3:			; CHECK-O0-LABEL: caller3:
	; CHECK-O0: lghi %r9, 0			; CHECK-O0: lghi %r9, 0
	; CHECK-O0: lhi %r3, 1			; CHECK-O0: lhi %r3, 1
	; CHECK-O0: stg %r2, {{.*}}(%r15)			; CHECK-O0: stg %r2, {{.*}}(%r15)
	; CHECK-O0: lgr %r2, {{.*}}			; CHECK-O0: lgr %r2, {{.*}}
	; CHECK-O0: brasl %r14, foo_sret			; CHECK-O0: brasl %r14, foo_sret
	(29 lines not shown)
	; time with a different swifterror value, from "alloca swifterror".			; time with a different swifterror value, from "alloca swifterror".
	define float @caller_with_multiple_swifterror_values(i8* %error_ref, i8* %error_ref2) {			define float @caller_with_multiple_swifterror_values(i8* %error_ref, i8* %error_ref2) {
	; CHECK-LABEL: caller_with_multiple_swifterror_values:			; CHECK-LABEL: caller_with_multiple_swifterror_values:
	; CHECK-DAG: lgr %r[[REG1:[0-9]+]], %r2			; CHECK-DAG: lgr %r[[REG1:[0-9]+]], %r2
	; CHECK-DAG: lgr %r[[REG2:[0-9]+]], %r3			; CHECK-DAG: lgr %r[[REG2:[0-9]+]], %r3
	; The first swifterror value:			; The first swifterror value:
	; CHECK: lghi %r9, 0			; CHECK: lghi %r9, 0
	; CHECK: brasl %r14, foo			; CHECK: brasl %r14, foo
	; CHECK: cgijlh %r9, 0,			; CHECK: ltgr %r2, %r9
				; CHECK: jlh .LBB7_2
	; Access part of the error object and save it to error_ref			; Access part of the error object and save it to error_ref
	; CHECK: lb %r0, 8(%r9)			; CHECK: lb %r0, 8(%r2)
	; CHECK: stc %r0, 0(%r[[REG1]])			; CHECK: stc %r0, 0(%r[[REG1]])
	; CHECK: lgr %r2, %r9
	; CHECK: brasl %r14, free			; CHECK: brasl %r14, free

	; The second swifterror value:			; The second swifterror value:
	; CHECK: lghi %r9, 0			; CHECK: lghi %r9, 0
	; CHECK: brasl %r14, foo			; CHECK: brasl %r14, foo
	; CHECK: cgijlh %r9, 0,			; CHECK: ltgr %r2, %r9
				; CHECK: jlh .LBB7_4
	; Access part of the error object and save it to error_ref			; Access part of the error object and save it to error_ref
	; CHECK: lb %r0, 8(%r9)			; CHECK: lb %r0, 8(%r2)
	; CHECK: stc %r0, 0(%r[[REG2]])			; CHECK: stc %r0, 0(%r[[REG2]])
	; CHECK: lgr %r2, %r9
	; CHECK: brasl %r14, free			; CHECK: brasl %r14, free

	; CHECK-O0-LABEL: caller_with_multiple_swifterror_values:			; CHECK-O0-LABEL: caller_with_multiple_swifterror_values:

	; The first swifterror value:			; The first swifterror value:
	; CHECK-O0: lghi %r9, 0			; CHECK-O0: lghi %r9, 0
	; CHECK-O0: brasl %r14, foo			; CHECK-O0: brasl %r14, foo
	; CHECK-O0: jlh			; CHECK-O0: jlh
	(38 lines not shown)
