Page MenuHomePhabricator

[GlobalISel] Add a new artifact combiner for unmerge which looks through general artifact expressions.
ClosedPublic

Authored by aemerson on Jun 15 2021, 11:17 PM.

Details

Summary

The original motivation for this was to implement moreElementsVector of shuffles
on AArch64, which resulted in complex sequences of artifacts like unmerge(unmerge(concat...))
which the combiner couldn't handle. It seemed here that the better option,
instead of writing ever-more-complex combines, was to have a way to find
the original "non-artifact" source registers for a given definition, walking
through arbitrary expressions of unmerge/concat/insert. As long as the bits
aren't extended or truncated, this is a pretty simple algorithm that avoids
the need for lots of combines and instead jumps straight to the final result
we want.

I've only used this new technique in 2 places within tryCombineUnmerge, using it
in more general situations resulted in infinite loops in AMDGPU. So for now
it's used when we would otherwise fail to combine and that seems to work.

In order to support looking through G_INSERTs, I also had to add it as an
artifact in isArtifact(), which caused a whole lot of issues in tests. AMDGPU
started infinite looping since full legalization of G_INSERT doensn't seem to
be there. To work around this, I've temporarily added a CLI option to use the
old behaviour so that the MIR tests will still run and terminate.

Other minor changes include no longer making >128b G_MERGE/UNMERGE legal.
We never had isel support for that anyway and it was a remnant of the legacy
legalizer rules. However being legal prevented the combiner from checking if it
was dead and deleting them.

Diff Detail

Event Timeline

aemerson created this revision.Jun 15 2021, 11:17 PM
aemerson requested review of this revision.Jun 15 2021, 11:17 PM
llvm/include/llvm/CodeGen/GlobalISel/LegalizationArtifactCombiner.h
646–647

I don't think we want to combine G_INSERT but instead not generate them at all. Is this insert from ir or from legalizer?

can't find the value if it spans both.

Did you consider this case? I had also encountered infinite loops when trying to deal with this combine when one of the vector sizes is odd.
I wanted to avoid making G_INSERT and use LCM merge/unmerge pad with undef instead. AMDGPU can deal with merge/unmerge when sizes are even (in the end it would turn everything into vectors of size 2 in worst case).
With odd vector size at some point it has to do "pad with undef". U guess that we could make combine that recognizes value that is spanned over real element and undef but it looks unnecessary complicated to combine.

I thought about adding new artifacts equivalent to G_ANYEXT/G_TRUNC but for vectors (these would be simplified versions of sdag's INSERT_SUBVECTOR and EXTRACT_SUBVECTOR) with intent to be trivial to combine. They would pad vector with undef elements/trunc elements. These do not exist in llvm-ir, and when used in more/fewer_elements_vector would always end up combining with each other if we consistently ask for same number of elements in all more/fewer_elements_vector legalizer rules.
Would this work for you (I would expect that there would be no need to combine insert)?

aemerson added inline comments.Jun 16 2021, 8:08 AM
llvm/include/llvm/CodeGen/GlobalISel/LegalizationArtifactCombiner.h
646–647

It’s from moreElementsVector.

Perhaps true, but I didn’t want to tackle that problem of moving away from G_INSERT in this patch. The algorithm is agnostic to the implementation of moreElements, so my plan was to support inserts first and then later remove the code if/when G_INSERT is removed.

Did you consider this case? I had also encountered infinite loops when trying to deal with this combine when one of the vector sizes is odd.

I didn’t, because it wasn’t clear when this would actually occur. The inserts I’ve seen are mostly just being used to widen a type.

In principle I think it’s a good idea to move away from G_INSERT/EXTRACT but it doesn’t really affect the purpose of this patch, which is to deal with unmerges of unmerges.

aemerson updated this revision to Diff 352468.Jun 16 2021, 10:04 AM

Remove commented out code.

Hi Amara,

I like this approach!
What is the impact on compile time?

I am a little bit surprised by the long sequences that we generate now in certain case. What is the reason for that? (See inline comment for a highlight of what I am talking about.)

Cheers,
-Quentin

llvm/test/CodeGen/AArch64/GlobalISel/legalize-inserts.mir
864

Is this new sequence what we really want?

It seems complicated compared to what we have before that diff.

Hi Amara,

I like this approach!
What is the impact on compile time?

Compile time seems to be noise, since this the new combine only runs in some cases where the unmerge combine fails, which is rare in most code. I would have liked to use it as the first-attempt combine for unmerge, but AMDGPU just wound up getting into a legalization loop (I think that's a problem with AMDGPU rather than the combine itself).

I am a little bit surprised by the long sequences that we generate now in certain case. What is the reason for that? (See inline comment for a highlight of what I am talking about.)

Cheers,
-Quentin

llvm/test/CodeGen/AArch64/GlobalISel/legalize-inserts.mir
864

This looks to be the result of marking G_INSERT an artifact, which changes the order legalization vs artifact combines. It's not related to the ArtifactValueFinder, so it's collateral damage unfortunately.

In fact, all of these test changes except for the new tests I added are changes as a result of G_INSERT being an artifact.

qcolombet added inline comments.Jun 17 2021, 9:46 AM
llvm/test/CodeGen/AArch64/GlobalISel/legalize-inserts.mir
864

What's the plan to fix that?

I would have expected that by delaying the legalization of G_INSERT (i.e., make that an artifact) we would have strictly improved codegen. Clearly, that's not what is happening :).

aemerson added inline comments.Jun 17 2021, 1:48 PM
llvm/test/CodeGen/AArch64/GlobalISel/legalize-inserts.mir
864

I'm not sure it's even worth spending effort to fix, if the plan is to remove G_INSERT/EXTRACT eventually anyway, as @arsenm wants to do.

ping @arsenm what do you think?

This seems like a much more holistic solution. What we have now isn't manageable to guarantee every possible permutation of artifacts can legalize.

I still think legalizing G_INSERT/G_EXTRACT (especially for vectors) is problematic and probably shouldn't be done. I view them now as a thin abstraction over subregister indexes, and not really useful for vector legalization as they are used now. Having the hack for now is fine since it's definitely impossible to make any progress on this sort of thing all at once.

Are you actively working on removing the old artifact combines?

llvm/include/llvm/CodeGen/GlobalISel/LegalizationArtifactCombiner.h
718

Confusing sentence "If the def is from an instruction a single def"

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
4795–4797 ↗(On Diff #352468)

Can this be split to a separate patch?

This seems like a much more holistic solution. What we have now isn't manageable to guarantee every possible permutation of artifacts can legalize.

Yeah, although combining extensions isn't really possible with this, so for now we still need some of the old combiner code around. I would like to use it more places though.

I still think legalizing G_INSERT/G_EXTRACT (especially for vectors) is problematic and probably shouldn't be done. I view them now as a thin abstraction over subregister indexes, and not really useful for vector legalization as they are used now. Having the hack for now is fine since it's definitely impossible to make any progress on this sort of thing all at once.

+1. The sooner we can move away from those to merge/unmerge the better.

Are you actively working on removing the old artifact combines?

Not yet, but maybe in future. The first thing we need to fix is the AMDGPU infinite loops, because those are blocking deploying this in more obvious places.

llvm/lib/CodeGen/GlobalISel/LegalizerHelper.cpp
4795–4797 ↗(On Diff #352468)

It was the motivating example for the combine, but yes it can be separated.

foad added inline comments.Thu, Jul 1, 2:17 AM
llvm/include/llvm/CodeGen/GlobalISel/LegalizationArtifactCombiner.h
653

These conditions don't look right. You are only disallowing:

+-----------+-----+-----------+
| CONTAINER | INS | CONTAINER |
+-----------+-----+-----------+
        |            |
        SB           EB

but there are lots of other cases of partial overlap. I would suggest rewriting it...

694

... like this:

if (EndBit <= InsertOffset || InsertedEndBit <= StartBit)
  SrcRegToUse = ContainerSrcReg;
else if (InsertOffset <= StartBit && EndBit <= InsertedEndBit)
  SrcRegToUse = InsertedReg;
else
  return Register();
aemerson updated this revision to Diff 356103.Thu, Jul 1, 10:57 PM

Address comments.

ping

This is blocking some more legalization work for AArch64.

arsenm accepted this revision.Fri, Jul 9, 4:06 PM

LGTM with nit

llvm/include/llvm/CodeGen/GlobalISel/LegalizationArtifactCombiner.h
690

No return after else

This revision is now accepted and ready to land.Fri, Jul 9, 4:06 PM
This revision was landed with ongoing or failed builds.Fri, Jul 9, 11:01 PM
This revision was automatically updated to reflect the committed changes.