Page MenuHomePhabricator

Handle COPYs of physregs better (regalloc hints)
ClosedPublic

Authored by RKSimon on Sep 21 2017, 4:29 AM.

Details

Summary

While enabling the mischeduler for SystemZ, it was discovered that fore some reason a test needed one extra seemingly needless COPY (test/CodeGen/SystemZ/call-03.ll). The handling for that is resulted in this patch, which improves the register coalescing by providing not just one copy hint, but a sorted list of copy hints. On SystemZ, this gives ~12500 less register moves on SPEC, as well as marginally less spilling.

Instead of improving just the SystemZ backend, the improvement has been implemented in common-code (calculateSpillWeightAndHint(). This gives a lot of test failures, but since this should be a general improvement I hope that the involved targets will help and review the test updates.

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
kparzysz accepted this revision.Feb 21 2018, 7:52 AM

The Hexagon changes look ok.

This revision is now accepted and ready to land.Feb 21 2018, 7:52 AM
jonpa added a comment.Feb 21 2018, 8:40 AM

The Hexagon changes look ok.

Thank you.
r325697

jonpa updated this revision to Diff 135373.Feb 22 2018, 1:25 AM
jonpa edited reviewers, added: sdardis, atanasyan; removed: kparzysz.

[Mips] Multiple regalloc hints enabled with updated tests.

This revision now requires review to proceed.Feb 22 2018, 1:25 AM
sdardis accepted this revision.Feb 22 2018, 4:57 AM

Hi Jonas, I've regenerated the tests in rL325770 and the changes seem ok. The main difference I'm seeing is some moves are being eliminated, but in a few cases in test/CodeGen/Mips/analyzebranch.ll the earlier move prevents a delay slot being filled but otherwise looks ok.

https://reviews.llvm.org/differential/diff/135383 is my local copy of this patch against the regenerated tests.

This revision is now accepted and ready to land.Feb 22 2018, 4:57 AM
jonpa updated this revision to Diff 135383.Feb 23 2018, 12:21 AM

sdardis patch applied.

Hi Jonas, I've regenerated the tests in rL325770 and the changes seem ok. The main difference I'm seeing is some moves are being eliminated, but in a few cases in test/CodeGen/Mips/analyzebranch.ll the earlier move prevents a delay slot being filled but otherwise looks ok.

https://reviews.llvm.org/differential/diff/135383 is my local copy of this patch against the regenerated tests.

Thank you, Simon. I applied your patch with regenerated tests and committed it as r325870.

jonpa updated this revision to Diff 135599.Feb 23 2018, 1:00 AM
jonpa edited reviewers, added: JDevlieghere, fedor.sergeev; removed: sdardis, atanasyan.

[SPARC] Patch enabled - please review the updated tests.

This revision now requires review to proceed.Feb 23 2018, 1:00 AM
jyknight accepted this revision.Feb 23 2018, 8:33 AM

Sparc changes look reasonable.

This revision is now accepted and ready to land.Feb 23 2018, 8:33 AM

Sparc changes look reasonable.

Thank you, James. r326028.

jonpa updated this revision to Diff 135790.Feb 24 2018, 3:57 AM
jonpa edited reviewers, added: robertlytton; removed: JDevlieghere, fedor.sergeev, jyknight.

[XCore] patch enabled - please review updated test and approve patch.

This revision now requires review to proceed.Feb 24 2018, 3:57 AM
robertlytton requested changes to this revision.Feb 25 2018, 9:51 AM

please see in-line comments

test/CodeGen/XCore/byVal.ll
45–46 ↗(On Diff #135790)

Hi jonpa,
The output in test/CodeGen/XCore/byVal.ll is incorrect.
viz the value in r1 is not copied into r0 before being overwritten by the value in sp[2] (r0's indirected value).
These two lines should be the other way around.
sorry
robert

This revision now requires changes to proceed.Feb 25 2018, 9:51 AM
jonpa updated this revision to Diff 135853.Feb 25 2018, 11:57 PM

Test updated again.

jonpa marked an inline comment as done.Feb 25 2018, 11:58 PM
jonpa added inline comments.
test/CodeGen/XCore/byVal.ll
45–46 ↗(On Diff #135790)

Ah, I think 'r1' slipped through in the place of 'r11'... Does it look ok now?

robertlytton accepted this revision.Feb 26 2018, 12:01 AM

LGTM
thank you
robert

This revision is now accepted and ready to land.Feb 26 2018, 12:01 AM
jonpa marked an inline comment as done.Feb 26 2018, 12:09 AM

LGTM
thank you
robert

Thank you, Robert. r326069

jonpa updated this revision to Diff 135887.Feb 26 2018, 5:19 AM
jonpa edited reviewers, added: craig.topper; removed: robertlytton.

X86 is now the final backend to enable :-)

This patch enables X86 with regenerated tests in all (155) tests that previously had the "NOTE: : Assertions have been autogenerated..." comment. I have manually updated the remaining (28) ones.

Please review these test changes.

After this, we can finally get rid of the enableMultipleCopyHints() hook.

This revision now requires review to proceed.Feb 26 2018, 5:19 AM

This looks like a nice improvement modulo a few issues:

  • In a number of places (notably register arguments to shift instructions) we generate movq instead of movl that's shorter and equivalent. I believe there's no real performance difference. Certainly we should for when compiling for size.
  • There's some unnecessary shuffling of register names in SSE4 tests.

Someone who's a more familiar with Debug should double check those tests though they seem fine to me.

test/CodeGen/X86/fast-isel-shift.ll
22 ↗(On Diff #135887)

Why are we getting two kill comments about cx here?

test/CodeGen/X86/schedule-x86-64-shld.ll
129 ↗(On Diff #135887)

This should be movl given optsize.

test/CodeGen/X86/sret-implicit.ll
13 ↗(On Diff #135887)

This shouldn't be DAG matches anymore the 2nd line should always come first

27 ↗(On Diff #135887)

Same as above

test/CodeGen/X86/vector-shift-ashr-128.ll
270 ↗(On Diff #135887)

All of the SSE4 changes regarding shifts seem to generate unnecessary register shuffling.

test/CodeGen/X86/vectorcall.ll
26 ↗(On Diff #135887)

another case of movq vs movl

152 ↗(On Diff #135887)

This is probably fine, but confusing.

Can you decompose this up into the X86 and X64? In fact, this file should probably be autogenerated with utils/update_llc_test_checks.py

RKSimon commandeered this revision.Apr 9 2018, 1:19 AM
RKSimon edited reviewers, added: jonpa; removed: RKSimon.
jonpa added a comment.Jul 24 2018, 4:26 AM

ping! Please don't forget to do this for X86, we still want to get rid of that temporary hook... :-)

RKSimon updated this revision to Diff 157993.Jul 30 2018, 10:17 AM
RKSimon added reviewers: efriedma, spatel.
RKSimon set the repository for this revision to rL LLVM.

Updated all x86 tests to trunk latest

Has anyone looked at @niravd's comments?

Has anyone looked at @niravd's comments?

I haven't - just updated all the tests

RKSimon added inline comments.Jul 30 2018, 11:50 AM
test/CodeGen/X86/vector-shift-ashr-128.ll
270 ↗(On Diff #135887)

If I had to guess - this is probably due to SSE41's PBLENDV instructions being hardwired to use xmm0

Is this ready or any blockers?

RKSimon updated this revision to Diff 165646.Sep 15 2018, 10:04 AM

rebased to trunk

Is this ready or any blockers?

Other than the poor codegen issues (but not regressions) commented on by @niravd I don't think there are any blockers.

OK to commit?

The codesize issues are minor and shouldn't hold this patch up. The only blocker I see is the unnecessary data shuffling for SSE41 codegen which someone else should decide on.

OK to commit?

The codesize issues are minor and shouldn't hold this patch up. The only blocker I see is the unnecessary data shuffling for SSE41 codegen which someone else should decide on.

Is this just about the extra 'movdqa' in vector-shift-ashr-128.ll, or are there other diffs to look at?

I've gone through and marked all the places.

Is this just about the extra 'movdqa' in vector-shift-ashr-128.ll, or are there other diffs to look at?

test/CodeGen/X86/vector-shift-lshr-128.ll
232 ↗(On Diff #165646)

Extra instruction here

test/CodeGen/X86/vselect-minmax.ll
4539 ↗(On Diff #165646)

Extra instruction here.

4660 ↗(On Diff #165646)

Extra instruction here.

4781 ↗(On Diff #165646)

Extra Instruction here.

4901 ↗(On Diff #165646)

Extra instruction here.

5019 ↗(On Diff #165646)

Extra instructions here.

5170 ↗(On Diff #165646)

Extra instructions here.

5319 ↗(On Diff #165646)

Extra instructions here.

5468 ↗(On Diff #165646)

Extra instructions here.

6995 ↗(On Diff #165646)

Extra instructions here.

7116 ↗(On Diff #165646)

Extra instruction here.

7235 ↗(On Diff #165646)

Extra instruction here.

7356 ↗(On Diff #165646)

Extra instructions here.

7505 ↗(On Diff #165646)

Extra Instructions here.

7655 ↗(On Diff #165646)

Extra instructions here.

I've gone through and marked all the places.

Is this just about the extra 'movdqa' in vector-shift-ashr-128.ll, or are there other diffs to look at?

These are all due to pblendvb/blendvpd/blendvps being hardwired to use the xmm0 as the mask register (limit goes away for avx)

spatel accepted this revision.Sep 19 2018, 9:41 AM

I've gone through and marked all the places.

Is this just about the extra 'movdqa' in vector-shift-ashr-128.ll, or are there other diffs to look at?

These are all due to pblendvb/blendvpd/blendvps being hardwired to use the xmm0 as the mask register (limit goes away for avx)

Yep - thanks for marking all of those. That corner case shouldn't hold up the general improvement, and the number of customers specifically targeting SSE4.1 should go to zero over time, so LGTM.

This revision is now accepted and ready to land.Sep 19 2018, 9:41 AM
This revision was automatically updated to reflect the committed changes.