Page MenuHomePhabricator

[GlobalISel Legalizer] Improve artifact combiner
Needs RevisionPublic

Authored by Petar.Avramovic on May 10 2019, 8:13 AM.

Details

Summary

The main idea is to allow legalization of chained artifacts by waiting for resolution of unresolved uses.

Therefore, this patch introduces two types of artifacts:
Artifact and Auxiliary Artifact.

Artifact (G_ZEXT, G_ANYEXT, G_SEXT, G_UNMERGE_VALUES, G_EXTRACT)
is able to perform combine with
Auxiliary Artifact (G_TRUNC, G_MERGE_VALUES, G_CONCAT_VECTORS, G_BUILD_VECTOR).

The idea is not to legalize an Auxiliary Artifact if there still exists a possibility that it can be combined with some other Artifact.
With this new setup, we will not have to make complicated rules in LegalizerInfo that have a purpose to not crash Artifact Combiner,
i.e. the goal here is to avoid situation to ask for Auxiliary Artifact if it is legal when answer must be yes (if we don't want to crash).
We want to avoid the following sequence of events:
legalization of Auxiliary Artifact (e.g. G_MERGE_VALUES) into a legal instruction,
that will disappear by combining with some other Artifact (e.g. G_UNMERGE_VALUES) that appears later.

Additionally, in this moment, Auxiliary Artifact G_TRUNC can jump to uses of its operand 0, and preform combine from them (this uses are Artifacts).
This patch removes such behavior since it disturbs order in which artifacts are processed.

Legalization of chained artifact is done by waiting for resolution of unresolved use.
The way we achieve this is:
When artifact fails to combine check if it has unresolved uses:

  1. Unprocessed Artifact (Artifact in ArtifactList or TryLaterArtifactList)
  2. Use in InstList. This can be either: a) Artifact that was turned into instruction and is waiting to be legalized b) Instruction that was created during some combine e.g. G_AND when combining G_ZEXT and G_TRUNC
  3. Use Artifact in AuxiliaryArtifactList, since combine was not possible there is a chance that appropriate artifact will appear after legalization of Auxiliary Artifact. When this happens, move Auxiliary Artifact to InstList to be Legalized

When there was an unresolved use, move Artifact to TryLaterArtifactList and try again later.

Diff Detail

Event Timeline

Petar.Avramovic marked 10 inline comments as done.May 10 2019, 8:36 AM

@volkan @aemerson @arsenm Here is brief explanation of changes in tests.

test/CodeGen/AArch64/GlobalISel/legalize-unmerge-values.mir
5

Originally:
We crash on attempt to legalize %3:_(s64) = G_ANYEXT %1:_(s4).
When we attempt to combine Artifact
%3:_(s64) = G_ANYEXT %1:_(s4)
no combine is available and this gets moved to InstList even though it had unresolved use
%1:_(s4), %2:_(s4) = G_UNMERGE_VALUES %0:_(s8)
Also,
%1:_(s4), %2:_(s4) = G_UNMERGE_VALUES %0:_(s8)
is the next artifact that had nothing to combine with and is moved to InstList.
In InstList
%1:_(s4), %2:_(s4) = G_UNMERGE_VALUES %0:_(s8)
is legalized with widen scalar to s8 and in the process
%1:_(s4) = G_TRUNC %9:_(s8) is created.
Next,
%3:_(s64) = G_ANYEXT %1:_(s4)
is legalized like an instruction, and we crash.
However, we could have combined this with
%1:_(s4) = G_TRUNC %9:_(s8)
(if we were patient).

Now:
%3:_(s64) = G_ANYEXT %1:_(s4)
has unresolved use and is moved to TryLaterArtifactList.
After
%1:_(s4), %2:_(s4) = G_UNMERGE_VALUES %0:_(s8)
is legalized (it did not have unresolved use)
%1:_(s4) = G_TRUNC %9:_(s8)
is created. Later
%3:_(s64) = G_ANYEXT %1:_(s4)
performs combine. Legalization completes successfully.
I removed global-isel-abort=0 since we don't crash any more.
%9:_(s8), %10:_(s8) = G_UNMERGE_VALUES %8(s16)
will crash in instruction select.
Since this UNMERGE was declared as legal for pair {s8, s16} Artifact Combiner did good job.

test/CodeGen/AArch64/GlobalISel/legalizer-combiner-zext-trunc-crash.mir
5

Difference here comes from different order in which MachineInstructions are processed.

Originally:
InstList has

%20:_(s32) = G_ZEXT %4:_(s8)
...
%31:_(s32) = G_ZEXT %6:_(s8)
...
%6:_(s8) = G_TRUNC %35:_(s16)
%4:_(s8) = G_TRUNC %36:_(s32)
...

%4:_(s8) = G_TRUNC %36:_(s32)
jumps to middle of InstList to
%20:_(s32) = G_ZEXT %4:_(s8)
and combines two of them.
%6:_(s8) = G_TRUNC %35:_(s16)
will jump to
%31:_(s32) = G_ZEXT %6:_(s8)
In summary:
%20:_(s32) = G_ZEXT %4:_(s8) is combined first
%31:_(s32) = G_ZEXT %6:_(s8) is combined second

Now:
InstList will not have G_TRUNCs and will avoid jumping around the list
Content of InstList is:

%20:_(s32) = G_ZEXT %4:_(s8)
...
%31:_(s32) = G_ZEXT %6:_(s8)
...

%31:_(s32) = G_ZEXT %6:_(s8)
will be combined first,
%20:_(s32) = G_ZEXT %4:_(s8)
will be combined second.
That is why script renamed some variables in diff.

test/CodeGen/AMDGPU/GlobalISel/artifact-combiner-sext.mir
56

We start with following MIR:

%0:_(<2 x s32>) = COPY $vgpr0_vgpr1
%1:_(<2 x s8>) = G_TRUNC %0
%2:_(<2 x s16>) = G_SEXT %1
$vgpr0 = COPY %2

%2:_(<2 x s16>) = G_SEXT %1
is combined, as a result we are left with:

%0:_(<2 x s32>) = COPY $vgpr0_vgpr1
%3:_(s16) = G_CONSTANT i16 8
%4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)
%5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)
%6:_(<2 x s16>) = G_SHL %5:_, %4:_(<2 x s16>)
%2:_(<2 x s16>) = G_ASHR %6:_, %4:_(<2 x s16>)
$vgpr0 = COPY %2:_(<2 x s16>)

Originally:
Two Artifacts failed to combine and ended up in InstList:

...
%5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)
%4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)

%4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)
was legal,
We have crash on %5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)

Now:

%5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)
%4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)

are Auxiliary Artifacts and will have to wait for something to combine with them.
Eventually %11:_(s16), %12:_(s16) = G_UNMERGE_VALUES %4:_(<2 x s16>)
from legalization of %2:_(<2 x s16>) = G_ASHR %6:_, %4:_(<2 x s16>)
will combine with %4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)

Combine attempt of %19:_(s16), %20:_(s16) = G_UNMERGE_VALUES %5:_(<2 x s16>)
will move %5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>) to InstList
Once we finish with artifacts, we crash on attempt to legalize Instruction
%5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)

test/CodeGen/AMDGPU/GlobalISel/artifact-combiner-zext.mir
46

In:

%0:_(<2 x s32>) = COPY $vgpr0_vgpr1
%1:_(<2 x s8>) = G_TRUNC %0
%2:_(<2 x s16>) = G_ZEXT %1
$vgpr0 = COPY %2

%2:_(<2 x s16>) = G_ZEXT %1
gets combined first into:

%0:_(<2 x s32>) = COPY $vgpr0_vgpr1
%3:_(s16) = G_CONSTANT i16 255
%4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)
%5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)
%2:_(<2 x s16>) = G_AND %5:_, %4:_
$vgpr0 = COPY %2:_(<2 x s16>)

Originally:
All new instructions (created in process of G_ZEXT combine) fail to combine and get moved to InstList:

%3:_(s16) = G_CONSTANT i16 255
%2:_(<2 x s16>) = G_AND %5:_, %4:_
%5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)
%4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)

%4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)
was legal and we crash on:
%5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)

Now:
%5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)
is left to be processed as the last one, since it cannot perform combine (Auxiliary Artifact)
%4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)
G_BUILD_VECTOR is also Auxiliary Artifact.
InstList has

%3:_(s16) = G_CONSTANT i16 255
%2:_(<2 x s16>) = G_AND %5:_, %4:_

they get legalized and we get:

%0:_(<2 x s32>) = COPY $vgpr0_vgpr1
%6:_(s32) = G_CONSTANT i32 255
%3:_(s16) = G_TRUNC %6:_(s32)
%4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)
%5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)
%2:_(<2 x s16>) = G_AND %5:_, %4:_
$vgpr0 = COPY %2:_(<2 x s16>)

This is an interesting situation, there are no artifacts in ArtifactList or Instructions in InstList, so we are left with Auxiliary Artifacts.
None of real artifacts was able to combine with them and they are all moved to InstList:

%4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)
%5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)
%3:_(s16) = G_TRUNC %6:_(s32)
`%3:_(s16) = G_TRUNC %6:_(s32)`

was legal. Crash on %5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)

Difference in diff comes from legalization of %3:_(s16) = G_CONSTANT i16 255

test/CodeGen/AMDGPU/GlobalISel/legalize-extract.mir
444

In test function extract_s8_v4s8_offset0

%3:_(<4 x s32>) = G_IMPLICIT_DEF
%0:_(<4 x s8>) = G_TRUNC %3:_(<4 x s32>)
%1:_(s8) = G_EXTRACT %0:_(<4 x s8>), 0
%2:_(s32) = G_ANYEXT %1:_(s8)
$vgpr0 = COPY %2:_(s32)

Originally:

%1:_(s8) = G_EXTRACT %0:_(<4 x s8>), 0 was legalized first into: 
%3:_(<4 x s32>) = G_IMPLICIT_DEF
%0:_(<4 x s8>) = G_TRUNC %3:_(<4 x s32>)
%4:_(<4 x s16>) = G_ANYEXT %0:_(<4 x s8>)
%5:_(s16) = G_EXTRACT %4:_(<4 x s16>), 0
%1:_(s8) = G_TRUNC %5:_(s16)
%2:_(s32) = G_ANYEXT %1:_(s8)
$vgpr0 = COPY %2:_(s32)

then
%2:_(s32) = G_ANYEXT %1:_(s8)
is legalized (!), it turned out to be legal
but it should have been combined with
%1:_(s8) = G_TRUNC %5:_(s16)

%0:_(<4 x s8>) = G_TRUNC %3:_(<4 x s32>)
is next to legalize and we crash.

Now:

%1:_(s8) = G_EXTRACT %0:_(<4 x s8>), 0
%2:_(s32) = G_ANYEXT %1:_(s8)

are chained to
%0:_(<4 x s8>) = G_TRUNC %3:_(<4 x s32>)
G_TRUNC gets legalized first (and we crash)

same story for test functions:
extract_s8_v4s8_offset8
extract_s8_v4s8_offset16
extract_s8_v4s8_offset24
extract_s8_v3s8_offset16
extract_s8_v5s1_offset4

911

in test function extract_v2s16_v5s16_offset0

%3:_(<6 x s32>) = G_IMPLICIT_DEF
%2:_(<6 x s16>) = G_TRUNC %3:_(<6 x s32>)
%0:_(<5 x s16>) = G_EXTRACT %2:_(<6 x s16>), 0
%1:_(<2 x s16>) = G_EXTRACT %0:_(<5 x s16>), 0
$vgpr0 = COPY %1:_(<2 x s16>)

Originally:
All Artifacts:

%1:_(<2 x s16>) = G_EXTRACT %0:_(<5 x s16>), 0
%0:_(<5 x s16>) = G_EXTRACT %2:_(<6 x s16>), 0
%2:_(<6 x s16>) = G_TRUNC %3:_(<6 x s32>)

fail to combine and are moved to InstList that now contains (processed bottom up):

%2:_(<6 x s16>) = G_TRUNC %3:_(<6 x s32>)
%0:_(<5 x s16>) = G_EXTRACT %2:_(<6 x s16>), 0
%1:_(<2 x s16>) = G_EXTRACT %0:_(<5 x s16>), 0

%1:_(<2 x s16>) = G_EXTRACT %0:_(<5 x s16>), 0
is legalized with moreElements,
%0:_(<5 x s16>) = G_EXTRACT %2:_(<6 x s16>), 0
is legal, and we crash on
%2:_(<6 x s16>) = G_TRUNC %3:_(<6 x s32>)

Now:
Since we have chained artifacts
%2:_(<6 x s16>) = G_TRUNC %3:_(<6 x s32>)
is legalized first and we crash. Remaining two wait for this legalization to finish.

test/CodeGen/AMDGPU/GlobalISel/legalize-implicit-def.mir
316

In test function test_implicit_def_v3s1:

%3:_(<4 x s32>) = G_IMPLICIT_DEF
%2:_(<4 x s1>) = G_TRUNC %3:_(<4 x s32>)
%0:_(<3 x s1>) = G_EXTRACT %2:_(<4 x s1>), 0
%1:_(<3 x s32>) = G_ANYEXT %0:_(<3 x s1>)
$vgpr0_vgpr1_vgpr2 = COPY %1:_(<3 x s32>)

We used to crash on %0:_(<3 x s1>) = G_EXTRACT %2:_(<4 x s1>), 0
but since this artifact has unresolved artifact use %2:_(<4 x s1>) = G_TRUNC %3:_(<4 x s32>) this is where we crash now.
Also we did not try to legalize %1:_(<3 x s32>) = G_ANYEXT %0:_(<3 x s1>) since it has unresolved artifact use,
difference in diff comes from legalization of %1:_(<3 x s32>) = G_ANYEXT %0:_(<3 x s1>)

It is the same story in test_implicit_def_v3s8, only type is different (s8 instead of s1).

test/CodeGen/AMDGPU/GlobalISel/legalize-unmerge-values-xfail.mir
2

Originally:
We used to crash on:
%1:_(s1), %2:_(s1) = G_UNMERGE_VALUES %0

Now:
We noticed that this artifact has use
%0:_(<2 x s1>) = G_TRUNC %3:_(<2 x s32>), this is where we crash now.

test/CodeGen/AMDGPU/GlobalISel/legalize-unmerge-values.mir
239

In test function test_unmerge_s1_s8:
We have a combine for G_MERGE + G_UNMERGE that did not happen before.
Remaining changes are variable name changes by the script, because of combine,
first merge {MV} and first unmerge{UV} are now missing, and MV1 is now MV, MV2 is MV1 and so on.
There was no crash here so I removed -global-isel-abort=0

test/CodeGen/AMDGPU/GlobalISel/legalize-xor.mir
484

I will explain change in legalize-xor.mir, it is the same thing in:
I will explain change in legalize-and.mir and
I will explain change in legalize-or.mir,

%29:_(<4 x s32>) = G_IMPLICIT_DEF
%0:_(<4 x s8>) = G_TRUNC %29:_(<4 x s32>)
%28:_(<4 x s32>) = G_IMPLICIT_DEF
%1:_(<4 x s8>) = G_TRUNC %28:_(<4 x s32>)
%4:_(s8), %5:_(s8), %6:_(s8), %7:_(s8) = G_UNMERGE_VALUES %0:_(<4 x s8>)
%8:_(s8), %9:_(s8), %10:_(s8), %11:_(s8) = G_UNMERGE_VALUES %1:_(<4 x s8>)
%25:_(s32) = G_ANYEXT %4:_(s8)
%26:_(s32) = G_ANYEXT %8:_(s8)
%27:_(s32) = G_XOR %25:_, %26:_
%12:_(s8) = G_TRUNC %27:_(s32)
%22:_(s32) = G_ANYEXT %5:_(s8)
%23:_(s32) = G_ANYEXT %9:_(s8)
%24:_(s32) = G_XOR %22:_, %23:_
%13:_(s8) = G_TRUNC %24:_(s32)
%19:_(s32) = G_ANYEXT %6:_(s8)
%20:_(s32) = G_ANYEXT %10:_(s8)
%21:_(s32) = G_XOR %19:_, %20:_
%14:_(s8) = G_TRUNC %21:_(s32)
%16:_(s32) = G_ANYEXT %7:_(s8)
%17:_(s32) = G_ANYEXT %11:_(s8)
%18:_(s32) = G_XOR %16:_, %17:_
%15:_(s8) = G_TRUNC %18:_(s32)
%2:_(<4 x s8>) = G_BUILD_VECTOR %12:_(s8), %13:_(s8), %14:_(s8), %15:_(s8)
%3:_(<4 x s32>) = G_ANYEXT %2:_(<4 x s8>)
$vgpr0_vgpr1_vgpr2_vgpr3 = COPY %3:_(<4 x s32>)

Artifact List had in one moment (processed bottom up):

%3:_(<4 x s32>) = G_ANYEXT %2:_(<4 x s8>)
%4:_(s8), %5:_(s8), %6:_(s8), %7:_(s8) = G_UNMERGE_VALUES %0:_(<4 x s8>)
%8:_(s8), %9:_(s8), %10:_(s8), %11:_(s8) = G_UNMERGE_VALUES %1:_(<4 x s8>)
...

Originally:
All artifacts fail to combine, and are moved to InstList:

...
%4:_(s8), %5:_(s8), %6:_(s8), %7:_(s8) = G_UNMERGE_VALUES %0:_(<4 x s8>)
%3:_(<4 x s32>) = G_ANYEXT %2:_(<4 x s8>)

%3:_(<4 x s32>) = G_ANYEXT %2:_(<4 x s8>)
is legalized with FewerElements.
We crash on
%4:_(s8), %5:_(s8), %6:_(s8), %7:_(s8) = G_UNMERGE_VALUES %0:_(<4 x s8>)

Now:
Artifacts failed to combine with available Auxiliary Artifacts:

%1:_(<4 x s8>) = G_TRUNC %28:_(<4 x s32>)
%0:_(<4 x s8>) = G_TRUNC %29:_(<4 x s32>)
%2:_(<4 x s8>) = G_BUILD_VECTOR %12:_(s8), %13:_(s8), %14:_(s8), %15:_(s8)

Because of that Auxiliary Artifacts are now in InstList.

%2:_(<4 x s8>) = G_BUILD_VECTOR %12:_(s8), %13:_(s8), %14:_(s8), %15:_(s8)
is legal
%0:_(<4 x s8>) = G_TRUNC %29:_(<4 x s32>)
is not legal and we crash.
Difference in diff comes from legalization of %3:_(<4 x s32>) = G_ANYEXT %2:_(<4 x s8>) because it failed to combine and was moved to InstList,
new algorithm crashed before attempt to legalize this instruction.

LGTM with a few nits. But I'm not an expert in GlobalISel so get reply from other reviewers.

include/llvm/CodeGen/GlobalISel/GISelWorkList.h
86

That can be written as return WorklistMap.count(I).

lib/CodeGen/GlobalISel/Legalizer.cpp
307

s/Use/UseInst/ in the comment?

336

It looks like if move this block out of the while loop we can drop the if statement and write it as:

SmallVector<MachineInstr *, 8> Flip;
while (!AuxiliaryArtifactList.empty())
  Flip.push_back(AuxiliaryArtifactList.pop_back_val());
while (!Flip.empty())
  InstList.insert(Flip.pop_back_val());
Petar.Avramovic marked 2 inline comments as done.May 15 2019, 6:12 AM
Petar.Avramovic added inline comments.
lib/CodeGen/GlobalISel/Legalizer.cpp
307

Yes, comment does not do best job explaining what do here. We can simplify this if statement and change comment like this:

// UseInst is an unresolved artifact or instruction.
// Unresolved artifacts are in ArtifactList or TryLaterArtifactList.
// InstList now contains only unresolved instructions i.e.
// instructions that used to be artifacts or instructions created by
// artifact combiner.
if (ArtifactList.contains(UseInst) ||
    TryLaterArtifactList.contains(UseInst) ||
    InstList.contains(UseInst)) {
  HasUnresolvedUses = true;
  break;
}
336

We have to move AuxiliaryArtifactList to InstList inside while loop because otherwise whatever is left in AuxiliaryArtifactList will remain unprocessed and will affect regression tests.

Now that I look at this, it might be better to move Auxiliary Artifacts to InstList one by one

if (InstList.empty() && ArtifactList.empty() &&
    TryLaterArtifactList.empty() && !AuxiliaryArtifactList.empty())
  InstList.insert(AuxiliaryArtifactList.pop_back_val());

instead of all together, this could allow us to combine away Auxiliary Artifact with help of an Artifact that appears after some other Auxiliary Artifacts was legalized. I don't have a test for this sequence of events.

Or maybe even to Artifact list if there appears combine where Auxiliary Artifact could be an Artifact ( e.g. G_TRUNC + G_MERGE_VALUES) but then some opcodes (G_TRUNC) will have dual meaning and will be hard to figure out what combine have precedence over another.

Suggestions?

Currently, for MIPS, I would expect for AuxiliaryArtifactList to be empty when all other lists are empty.
This patch is required for MIPS to allow legalization of chained artifacts that appear when we narrow Scalar G_{S|Z}EXT_INREG (once D61289 lands) or narrow scalar G_{Z|S|ANY}EXT.

atanasyan added inline comments.May 15 2019, 6:37 AM
lib/CodeGen/GlobalISel/Legalizer.cpp
336

We have to move AuxiliaryArtifactList to InstList inside while loop because otherwise whatever is left in AuxiliaryArtifactList will remain unprocessed and will affect regression tests.

Agreed. I missed that in the if condition has TryLaterArtifactList.empty() while in the loop cindition has !TryLaterArtifactList.empty().

it might be better to move Auxiliary Artifacts to InstList one by one

If you put this statement in the end of the loop the last element of the AuxiliaryArtifactList will be moved to the InstList. After that we exit from the loop because both InstList.empty() and TryLaterArtifactList.empty() are true and the loop condition !InstList.empty() || !TryLaterArtifactList.empty() is false. If AuxiliaryArtifactList contains more then one element they remains in the list. Did I miss something?

Petar.Avramovic marked an inline comment as done.May 15 2019, 7:29 AM
Petar.Avramovic added inline comments.
lib/CodeGen/GlobalISel/Legalizer.cpp
336

Once we move top element (Art) from AuxiliaryArtifactList to InstList, InstList is no longer empty and we enter loop again with intention to legalize Art.

Remaining (if any) auxiliary artifacts in AuxiliaryArtifactList stay there until we are done with legalizing Art and everything he produced during legalization.
It may be possible that this Art could produce e.g. G_UNMERGE that could combine away some G_MERGE that is still in AuxiliaryArtifactList. Again this is just in theory, I don't have test for this case.
For this to happen, there can be no artifact that attempted to combine with Art. That is Art is only used in legal instructions. This look a little strange considering how we deal with artifacts at the moment, but might be necessary in the future as more artifact combines are added.

Petar.Avramovic edited the summary of this revision. (Show Details)

Addressed review comments.

Petar.Avramovic marked 4 inline comments as done.May 17 2019, 3:20 AM
atanasyan accepted this revision.May 22 2019, 12:25 AM

LGTM, but as I said early please wait another approval.

This revision is now accepted and ready to land.May 22 2019, 12:25 AM
dsanders requested changes to this revision.May 23 2019, 3:56 PM

Hi Petar,

I haven't gone through the code for this but I've gone through the test changes. It looks like a lot of these test changes result in illegal MIR getting out of the legalizer pass. The most common case was neglecting to scalarize G_ANYEXT in the AMDGPU tests when the legalization rules would have called .scalarize(0) but there were a couple others.

At the high level, I'm not very keen on complicating the legalizer further by splitting the artifacts into multiple kinds and specifying a processing order. Personally I'd prefer not to have artifacts at all but they were a necessary concession to resolve two (somewhat conflicting) requirements:

  • Legalization should be able to happen in any order (i.e. processing order is purely optimization and not correctness)
  • The MIR must be valid after each legalization step

That said, I could be persuaded to complicate things if it buys a significant win. At the moment it looks like a correctness regression (going by the tests) but assuming the correctness issues were fixed, what would be the main benefit to making this change? I get the impression it's about avoiding implementing all the legalization rules but while I can see that it postpones the need to implement some, I'm not convinced it reduces the amount rules that need defining.

test/CodeGen/AArch64/GlobalISel/legalize-unmerge-values.mir
5

Just to clarify terminology here. When you say 'crash' you don't appear to mean a crash but rather an unable-to-legalize error. Is that right?

This test is artificial and should probably be moved to the unit tests. The intent is to test the expansion of widenScalar on G_UNMERGE_VALUES and is making use of an impossible input to ensure that the legalizer picks widenScalar for it. There should be no way to produce s4,s4 = G_UNMERGE_VALUES s8 in the current AArch64 target since the IRTranslator doesn't create G_UNMERGE_VALUES and the legalizer would only create them if an s8 operation narrowScalar'd to s4 which isn't the case for AArch64. The closest you can get from valid input is:

%1:_(s4) = G_TRUNC %0
%2 = G_ANYEXT %1:_(s4)

If we pretend that the input was valid, then AArch64 is breaking one of the few legalization rule requirements which is that G_ANYEXT must be legal for all type combinations that can occur (and makes sense there must be a legal G_ANYEXT that can consume the legal output of any other instruction that produces a scalar output and must add a .legalFor({X, s4}) to its G_ANYEXT rules

It looks like we never actually wrote down the minimum legalization rules. I'll look into that

test/CodeGen/AMDGPU/GlobalISel/artifact-combiner-sext.mir
56

AMDGPU hasn't marked any G_TRUNC's legal. I think this test is behaving correctly until it adds the minimum set of legal truncates for their target. The minimum set is the same as for G_ANYEXT but with the type indices swapped.

test/CodeGen/AMDGPU/GlobalISel/artifact-combiner-zext.mir
46

This one comes down to AMDGPU not defining any legal G_TRUNC's too

test/CodeGen/AMDGPU/GlobalISel/legalize-and.mir
514

This change is violating AMDGPU's legalization rules. G_ANYEXT is defined as:

getActionDefinitionsBuilder({G_SEXT, G_ZEXT, G_ANYEXT})
  .legalFor({{S64, S32}, {S32, S16}, {S64, S16},
             {S32, S1}, {S64, S1}, {S16, S1},
             // FIXME: Hack
             {S64, LLT::scalar(33)},
             {S32, S8}, {S128, S32}, {S128, S64}, {S32, LLT::scalar(24)}})
  .scalarize(0);

which is instructing the legalizer to scalarize all vector types.

test/CodeGen/AMDGPU/GlobalISel/legalize-extract.mir
444

This one boils down to AMDGPU not declaring any legalization rules for G_TRUNC

907

This change is violating AMDGPU's legalization rules. The relevant one for the G_EXTRACT is:

.moreElementsIf(isSmallOddVector(BigTyIdx), oneMoreElement(BigTyIdx))

which is supposed to add an extra element if the number of elements is odd and the size of the elements is <32. The legalIf that preceeds it only matches if the bit type is a multiple of 32 in size but 5*16 isn't

911

This one boils down to AMDGPU not declaring any legalization rules for G_TRUNC

test/CodeGen/AMDGPU/GlobalISel/legalize-implicit-def.mir
324

This is another case of violating AMDGPU's G_ANYEXT legalization rules by not scalarizing vector types

test/CodeGen/AMDGPU/GlobalISel/legalize-or.mir
514

This is another case of violating AMDGPU's G_ANYEXT legalization rules by not scalarizing vector types

test/CodeGen/AMDGPU/GlobalISel/legalize-unmerge-values-xfail.mir
2

This one is interesting. AMDGPU's legalization rules say to scalarize this (assuming I'm reading it right, this opcode has very complicated legalization rules), but how do you scalarize G_UNMERGE_VALUES. The normal answer is to emit G_UNMERGE_VALUES but that's not going to work here as we'll just infinitely loop replacing an instruction with the same instruction. I guess G_EXTRACT_VECTOR_ELT is probably the answer here but suppose we want to widenScalar that. We'd probably want to G_UNMERGE_VALUES (bring us back into the loop) and G_BUILD_VECTOR. I'll have to give this one some more thought

test/CodeGen/AMDGPU/GlobalISel/legalize-unmerge-values.mir
239

I believe this is the test where I mentioned to Matt that the lack of G_TRUNC legalization rules caused it to stop processing before the combines could happen. It looks like this patch makes the combines happen a bit earlier and before the G_TRUNC stops further processing

test/CodeGen/AMDGPU/GlobalISel/legalize-xor.mir
514

This one violates AMDGPU's G_ANYEXT rules by not scalarizing

This revision now requires changes to proceed.May 23 2019, 3:56 PM
Petar.Avramovic marked 17 inline comments as done.May 24 2019, 5:24 AM

'crash' = unable-to-legalize error/warning/nothing depending on -global-isel-abort option.

It looks like a lot of these test changes result in illegal MIR getting out of the legalizer pass.

That is already the case. Mentioned tests already used to crash but now they always crash on instruction that is root of problem (here it is most of the times G_TRUNC for vector) .
About mentioned legalization rules violations:
Tests that finish legalization successfully are unchanged (except one where we perform additional G_UNMERGE/G_MERGE combine).
"Violations" happen for Artifacts in tests where mir output was illegal anyway.
That it Artifacts are either waiting for it uses to be resolved first or are about to be legalized. It just happened that there was some "artifact"/"auxiliary artifact"/instruction that was not legal.
@arsenm Do you have any comments?

what would be the main benefit to making this change? I get the impression it's about avoiding implementing all the legalization rules but while I can see that it postpones the need to implement some, I'm not convinced it reduces the amount rules that need defining.

The primary goal of this patch is adding support for combining of chained artifacts.

About side effects/benefits:
Remove some of the rules for artifacts. The ones that say artifact is legal for a type but we know it is not selectable for that type.

Suppose that target cannot select s128 = G_MERGE s32 s32 s32 s32 nor s32 s32 s32 s32 = G_UNMERGE s128. With current artifact combiner we could encounter following situations:

  1. G_UNMERGE combines with G_MERGE, that was easy.
  2. There was no G_UNMERGE, that could combine with G_MERGE. Then let's legalize G_MERGE, and hope for the best. G_MERGE is now legal, removed from all GlobalISel observer lists and waits for legalization to finish so that it can be regbankselected. During legalization of some other instruction G_UNMERGE appears and combines away G_MERGE. This was fortunate sequence of events as we couldn't select this G_MERGE if it "escaped" from Legalizer. But in LegalizerInfo it says that G_MERGE is legal for {s128, s32} despite the fact that we cannot select it. Why? Because that is how current Artifact combiner works.
  3. G_UNMERGE couldn't find G_MERGE to combine with. G_UNMERGE is "Artifact" nothing can combine it away, then let's make it legal. G_UNMERGE is now a legalized instruction that is no more in ArtifactList and waits legalizer to end so that it can crash in Instruction Select. But there is more: other instruction get legalized (e.g. G_SEXT_INREG get narrowscalar-ed with G_UNMERGE/G_MERGE) and the G_MERGE appears. G_UNMERGE could combine with this G_MERGE but it is not in ArtifactList and we end up with uncombined Artifacts.

Artifact combiner at the moment does 1 and 2. With this change Artifact combiner does 1,2 and 3(meaning it can perform this combine) + combines of type 2. are performed without requirement to have Artifacts legal for types that are not Instr Selectable ( in example above: {s128, s32} for G_MERGE and {s32, s128} for G_UNMERGE).
By the way 3. is the reason D61289 can't have mir test for narrow scalar of G_SEXT_INREG, with expected optimal output.
With this change
Target that has only s32 as legal type (everything else is narrow/widen scalar-ed to s32) is free to declare all artifacts as unsupported as a last rule. This is it could have to narrowScalar G_ANYEXT to s32 but there is no need to declare G_ANYEXT as legal for any types.
We could argue that this is not that beneficial by itself, but since it is a consequence of allowing combine of chained artifacts, I would say it makes legalization rules for artifacts much more straightforward.

Legalization should be able to happen in any order (i.e. processing order is purely optimization and not correctness)

Instruction legalization can already be done in any order, but artifact combine has to wait for all instructions to be legalized first, right?
Now we allow for chained artifacts to wait for another artifact/instruction to be handled first by introducing TryLaterArtifactList. To me, this feels like a natural extension of existing algorithm.

test/CodeGen/AMDGPU/GlobalISel/artifact-combiner-sext.mir
56

Forgot to mention, this will turn into G_SEXT_INREG and then it's another story.
Anyway, I assume that <2 x s16> = G_TRUNC <2 x s32> should be legal and that it is a subregister class copy
since $vgpr0_vgpr1 is <2 x s32> and $vgpr0 is <2 x s16>

test/CodeGen/AMDGPU/GlobalISel/legalize-and.mir
514

We crashed before attempting to legalize this G_ANYEXT, once G_TRUNC is resolved G_ANYEXT will be next.

test/CodeGen/AMDGPU/GlobalISel/legalize-extract.mir
907

This is intended, G_EXTRACT waits for it artifact use to be resolved ( [[EXTRACT]] ). Crash happens before we try to legalize G_EXTRACT.
We don't give up on attempt to combine Artifact while there is still chance that corresponding auxiliary artifact would appear once uses of G_EXTRACT are resolved. Notice how we process Machine function, it is InstList first not ArtifactList first. This way we cover a lot of simpler cases but not the cases when artifacts are chained, then we have to wait and try again later.

test/CodeGen/AMDGPU/GlobalISel/legalize-implicit-def.mir
324

We did not try to legalize this yet G_ANYEXT since it has unresolved artifact use, and mir output is state of MF when we crashed on %2:_(<4 x s1>) = G_TRUNC %3:_(<4 x s32>)

test/CodeGen/AMDGPU/GlobalISel/legalize-or.mir
514

We crashed before attempting to legalize this G_ANYEXT, once G_TRUNC is resolved G_ANYEXT will be next. The trunc in question is CHECK: [[TRUNC:%[0-9]+]]:_(<4 x s8>) = G_TRUNC [[DEF]](<4 x s32>)

test/CodeGen/AMDGPU/GlobalISel/legalize-unmerge-values-xfail.mir
2

Originally this was widenScalarUnmergeValues called for vector, and it crashed since vector is not scalar.
From second rule for G_MERGE/G_UNMERGE .clampScalar(LitTyIdx, S16, S256).
Don't know what was idea behind this test.

test/CodeGen/AMDGPU/GlobalISel/legalize-unmerge-values.mir
239

This one finished successfully. G_TRUNC is legal for scalars according to the "legacy rules" that are used when no rules are defined... This is not the case for vectors.
In order to avoid duality of G_TRUNC being both "artifact" and "auxiliary artifact" I suggest to have narrowScalar for G_TRUNC (similar for vectors) so that G_TRUNC could be legalized into G_UNMERGE + COPY and Artifact Combiner would continue with combining. This works with patch like it is.
D62338 complicates things with order in which we attempt to combine.
Maybe instead for moving G_TRUNC to InstList we move it to ArtifactList first?
No change for remaining auxiliary artifacts since they don't have such combine.

test/CodeGen/AMDGPU/GlobalISel/legalize-xor.mir
514

We crashed before attempting to legalize this G_ANYEXT.

Petar.Avramovic marked 6 inline comments as done.Mon, Jun 3, 12:41 AM

Ping.

nhaehnle removed a subscriber: nhaehnle.Mon, Jun 3, 4:09 AM
nhaehnle added a subscriber: nhaehnle.
nhaehnle removed a subscriber: nhaehnle.

Ping.

Sorry Petar. I've been occupied by other things. I'll try to get back to you this week but as it's WWDC this week it might have to be next week