This is an archive of the discontinued LLVM Phabricator instance.

SplitKit: Correctly implement partial copies
ClosedPublic

Authored by MatzeB on Feb 27 2017, 6:06 PM.

Details

Summary
  • This fixes a bug where subregisters incompatible with the given vreg were choosen.
  • Implement the case where multiple copies are necessary to cover a given lanemask.

rdar://30388692

Diff Detail

Repository
rL LLVM

Event Timeline

MatzeB created this revision.Feb 27 2017, 6:06 PM
rampitec accepted this revision.Feb 27 2017, 7:15 PM

Thank you!

This revision is now accepted and ready to land.Feb 27 2017, 7:15 PM

defFromParent supposed to return a single def, how does it work if there is more than one? I suppose it does work as you should have tests passed...

kparzysz added inline comments.Feb 28 2017, 6:24 AM
lib/CodeGen/SplitKit.cpp
551 ↗(On Diff #89958)

Is there any guarantee that this loop will actually terminate? In other words, do we know that the set PossibleIndexes covers the lanes completely?

572 ↗(On Diff #89958)

Would it make sense to add || BestCover <= 0?

MatzeB updated this revision to Diff 91661.Mar 13 2017, 7:49 PM
MatzeB edited the summary of this revision. (Show Details)

Redo a lot of this patch:

  • Add more tests so that the code runs in more situations and screwed up liveranges are detected.
  • Use instruction bundles during splitting/regalloc and expand them into a series of instruction at VirtRegMap time. I spend days trying to get this to work by directly creating a sequence of instructions in splitkit but it turned out to be extremely hard to stop liverange splitting between the copies in an infinite loop or getting a spill/reload for every one of the copies in a series; the bundle based approach only needed nearly no changes outside of splitkit itself except for the expansion code and fixing some virtregmap bugs with bundles.

defFromParent supposed to return a single def, how does it work if there is more than one? I suppose it does work as you should have tests passed...

Indeed this turned out to screw up the liveness and I just wasn't able to catch it with the small amount of tests we had before. Hence this new version of the patch.

MatzeB updated this revision to Diff 91662.Mar 13 2017, 7:53 PM

Upload the correct patch this time!

Redo a lot of this patch:

  • Add more tests so that the code runs in more situations and screwed up liveranges are detected.
  • Use instruction bundles during splitting/regalloc and expand them into a series of instruction at VirtRegMap time. (I spend days trying to get this to work by directly creating a sequence of instructions in splitkit but it turned out to be extremely hard to stop liverange splitting between the copies in an infinite loop or getting a spill/reload for every one of the copies in a series; the bundle based approach only needed nearly no changes outside of splitkit itself except for the expansion code and fixing some virtregmap bugs with bundles.)
MatzeB marked 2 inline comments as done.Mar 13 2017, 7:57 PM
MatzeB added inline comments.
lib/CodeGen/SplitKit.cpp
551 ↗(On Diff #89958)

This specific loop only checks each subregister index once and will therefore terminate.

The later loop is repeated as we are able to remove bits from the lanemask. If we fail to find a subregister index that removes any of the bits we will end up in the report_fatal_error() cases.

572 ↗(On Diff #89958)

BestCover <= 0 can happen and is fine (it just means the best we could find is an index that covers more unnecessary lanes that we already covered anyway, than lanes we haven't covered yet).

rampitec added inline comments.Mar 14 2017, 2:01 PM
lib/CodeGen/VirtRegMap.cpp
383

I think it can break something which was bundled not by SplitKit. I wish we could mark these somehow... It will not happen now because we do not use bundles, but potentially a problem. Maybe insert some sort of a marker into the bundle or slot, or create a new MIFlag? Ideally of course it shall not use bundles just to keep LIS correct :(

rampitec requested changes to this revision.Mar 14 2017, 2:15 PM

The other question, what would happen if target implements its own addOptimizedRegAlloc and skips VirtRegRewriter? I guess in this case copies will remain bundled. Looks like NVPTX does not use it.

This revision now requires changes to proceed.Mar 14 2017, 2:15 PM
MatzeB marked 2 inline comments as done.Mar 14 2017, 2:24 PM

The other question, what would happen if target implements its own addOptimizedRegAlloc and skips VirtRegRewriter? I guess in this case copies will remain bundled. Looks like NVPTX does not use it.

If they have their own regalloc they probably cannot use SplitKit either. If they do use SplitKit, well they have to implement their own expansion mechanism (though it seems SplitKit has references to VirtRegMap so I don't think you realistically will use SplitKit without the VirtRegRewriter).

lib/CodeGen/VirtRegMap.cpp
383

I was hoping that targets would always use a target specific bundle header instructions, so this would be the only situation in which we have a headerless bundle. Admittedly this is somewhat weak, but I hoped we can delay this discussion until we actually have targets using bundles pre-regalloc more.

rampitec added inline comments.Mar 14 2017, 2:27 PM
lib/CodeGen/VirtRegMap.cpp
383

Its almost the moment I'm thinking of adding it ;) It will not be all copy though.

MatzeB added inline comments.Mar 14 2017, 2:30 PM
lib/CodeGen/VirtRegMap.cpp
383

But do you plan to use bundles without a "header" (i.e. the first instruction is not some target-specific instruction that describes the type of the bundle?)

rampitec accepted this revision.Mar 14 2017, 2:31 PM
rampitec added inline comments.
lib/CodeGen/VirtRegMap.cpp
383

OK, agree, I do not.

This revision is now accepted and ready to land.Mar 14 2017, 2:31 PM
kparzysz added inline comments.Mar 14 2017, 2:45 PM
lib/CodeGen/SplitKit.cpp
572 ↗(On Diff #89958)

Well, wasn't that the original problem, i.e. that we copied too much (which then ended up clobbering lanes marked as unused)?

MatzeB marked 2 inline comments as done.Mar 14 2017, 2:47 PM
MatzeB added inline comments.
lib/CodeGen/SplitKit.cpp
572 ↗(On Diff #89958)

No, the PossibleIndexes vector only contains indexes that cover bits that we are allowed to cover. The BestCover is solely about minimizing the amount of lanes we copy twice. We will not copy lanes that we are not supposed to copy.

kparzysz added inline comments.Mar 14 2017, 3:00 PM
lib/CodeGen/SplitKit.cpp
597 ↗(On Diff #91662)

This doesn't need to be addressed in this patch, but on a general note---this would be a case where the compiler corners itself into such a situation. It's not a symptom of simply a bug, but rather a design that can end up in a dead end. Ideally, we should avoid getting here early enough to have other options.

572 ↗(On Diff #89958)

Ok, I see it now.

test/CodeGen/AMDGPU/splitkit.mir
35 ↗(On Diff #91662)

This is the only CHECK like in this function. Did it use to crash?

MatzeB marked an inline comment as done.Mar 14 2017, 3:10 PM
MatzeB added inline comments.
lib/CodeGen/SplitKit.cpp
597 ↗(On Diff #91662)

I think this cannot happen. As for liveness to split up into a given lanemask we need an operand in the function that uses a subregister index with such a lanemask. But that would mean we can use the same index to implement the COPY.

I'm not a 100% sure that I am not missing something here so I left this check in to better abort than silently infinite loop if I did :).

MatzeB added inline comments.Mar 14 2017, 3:19 PM
test/CodeGen/AMDGPU/splitkit.mir
35 ↗(On Diff #91662)

This should indeed have more CHECKs. I'll add them.

qcolombet accepted this revision.Mar 16 2017, 3:55 PM

LGTM with more checks in the test.

test/CodeGen/AMDGPU/splitkit.mir
35 ↗(On Diff #91662)

Please add more checks indeed.

This revision was automatically updated to reflect the committed changes.