This is an archive of the discontinued LLVM Phabricator instance.

[GlobalISel Legalizer] Improve artifact combiner
AbandonedPublic

Authored by Petar.Avramovic on May 10 2019, 8:13 AM.

Details

Summary

The main idea is to allow legalization of chained artifacts by waiting for resolution of unresolved uses.

Therefore, this patch introduces two types of artifacts:
Artifact and Auxiliary Artifact.

Artifact (G_ZEXT, G_ANYEXT, G_SEXT, G_UNMERGE_VALUES, G_EXTRACT)
is able to perform combine with
Auxiliary Artifact (G_TRUNC, G_MERGE_VALUES, G_CONCAT_VECTORS, G_BUILD_VECTOR).

The idea is not to legalize an Auxiliary Artifact if there still exists a possibility that it can be combined with some other Artifact.
With this new setup, we will not have to make complicated rules in LegalizerInfo that have a purpose to not crash Artifact Combiner,
i.e. the goal here is to avoid situation to ask for Auxiliary Artifact if it is legal when answer must be yes (if we don't want to crash).
We want to avoid the following sequence of events:
legalization of Auxiliary Artifact (e.g. G_MERGE_VALUES) into a legal instruction,
that will disappear by combining with some other Artifact (e.g. G_UNMERGE_VALUES) that appears later.

Additionally, in this moment, Auxiliary Artifact G_TRUNC can jump to uses of its operand 0, and preform combine from them (this uses are Artifacts).
This patch removes such behavior since it disturbs order in which artifacts are processed.

Legalization of chained artifact is done by waiting for resolution of unresolved use.
The way we achieve this is:
When artifact fails to combine check if it has unresolved uses:

  1. Unprocessed Artifact (Artifact in ArtifactList or TryLaterArtifactList)
  2. Use in InstList. This can be either: a) Artifact that was turned into instruction and is waiting to be legalized. b) Instruction that was created during some combine e.g. G_AND when combining G_ZEXT and G_TRUNC.
  3. Use Artifact in AuxiliaryArtifactList, since combine was not possible there is a chance that appropriate artifact will appear after legalization of Auxiliary Artifact. When this happens, move Auxiliary Artifact to InstList to be Legalized

When there was an unresolved use, move Artifact to TryLaterArtifactList and try again later.

Diff Detail

Event Timeline

Petar.Avramovic marked 10 inline comments as done.May 10 2019, 8:36 AM

@volkan @aemerson @arsenm Here is brief explanation of changes in tests.

test/CodeGen/AArch64/GlobalISel/legalize-unmerge-values.mir
5

Originally:
We crash on attempt to legalize %3:_(s64) = G_ANYEXT %1:_(s4).
When we attempt to combine Artifact
%3:_(s64) = G_ANYEXT %1:_(s4)
no combine is available and this gets moved to InstList even though it had unresolved use
%1:_(s4), %2:_(s4) = G_UNMERGE_VALUES %0:_(s8)
Also,
%1:_(s4), %2:_(s4) = G_UNMERGE_VALUES %0:_(s8)
is the next artifact that had nothing to combine with and is moved to InstList.
In InstList
%1:_(s4), %2:_(s4) = G_UNMERGE_VALUES %0:_(s8)
is legalized with widen scalar to s8 and in the process
%1:_(s4) = G_TRUNC %9:_(s8) is created.
Next,
%3:_(s64) = G_ANYEXT %1:_(s4)
is legalized like an instruction, and we crash.
However, we could have combined this with
%1:_(s4) = G_TRUNC %9:_(s8)
(if we were patient).

Now:
%3:_(s64) = G_ANYEXT %1:_(s4)
has unresolved use and is moved to TryLaterArtifactList.
After
%1:_(s4), %2:_(s4) = G_UNMERGE_VALUES %0:_(s8)
is legalized (it did not have unresolved use)
%1:_(s4) = G_TRUNC %9:_(s8)
is created. Later
%3:_(s64) = G_ANYEXT %1:_(s4)
performs combine. Legalization completes successfully.
I removed global-isel-abort=0 since we don't crash any more.
%9:_(s8), %10:_(s8) = G_UNMERGE_VALUES %8(s16)
will crash in instruction select.
Since this UNMERGE was declared as legal for pair {s8, s16} Artifact Combiner did good job.

test/CodeGen/AArch64/GlobalISel/legalizer-combiner-zext-trunc-crash.mir
5

Difference here comes from different order in which MachineInstructions are processed.

Originally:
InstList has

%20:_(s32) = G_ZEXT %4:_(s8)
...
%31:_(s32) = G_ZEXT %6:_(s8)
...
%6:_(s8) = G_TRUNC %35:_(s16)
%4:_(s8) = G_TRUNC %36:_(s32)
...

%4:_(s8) = G_TRUNC %36:_(s32)
jumps to middle of InstList to
%20:_(s32) = G_ZEXT %4:_(s8)
and combines two of them.
%6:_(s8) = G_TRUNC %35:_(s16)
will jump to
%31:_(s32) = G_ZEXT %6:_(s8)
In summary:
%20:_(s32) = G_ZEXT %4:_(s8) is combined first
%31:_(s32) = G_ZEXT %6:_(s8) is combined second

Now:
InstList will not have G_TRUNCs and will avoid jumping around the list
Content of InstList is:

%20:_(s32) = G_ZEXT %4:_(s8)
...
%31:_(s32) = G_ZEXT %6:_(s8)
...

%31:_(s32) = G_ZEXT %6:_(s8)
will be combined first,
%20:_(s32) = G_ZEXT %4:_(s8)
will be combined second.
That is why script renamed some variables in diff.

test/CodeGen/AMDGPU/GlobalISel/artifact-combiner-sext.mir
56

We start with following MIR:

%0:_(<2 x s32>) = COPY $vgpr0_vgpr1
%1:_(<2 x s8>) = G_TRUNC %0
%2:_(<2 x s16>) = G_SEXT %1
$vgpr0 = COPY %2

%2:_(<2 x s16>) = G_SEXT %1
is combined, as a result we are left with:

%0:_(<2 x s32>) = COPY $vgpr0_vgpr1
%3:_(s16) = G_CONSTANT i16 8
%4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)
%5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)
%6:_(<2 x s16>) = G_SHL %5:_, %4:_(<2 x s16>)
%2:_(<2 x s16>) = G_ASHR %6:_, %4:_(<2 x s16>)
$vgpr0 = COPY %2:_(<2 x s16>)

Originally:
Two Artifacts failed to combine and ended up in InstList:

...
%5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)
%4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)

%4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)
was legal,
We have crash on %5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)

Now:

%5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)
%4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)

are Auxiliary Artifacts and will have to wait for something to combine with them.
Eventually %11:_(s16), %12:_(s16) = G_UNMERGE_VALUES %4:_(<2 x s16>)
from legalization of %2:_(<2 x s16>) = G_ASHR %6:_, %4:_(<2 x s16>)
will combine with %4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)

Combine attempt of %19:_(s16), %20:_(s16) = G_UNMERGE_VALUES %5:_(<2 x s16>)
will move %5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>) to InstList
Once we finish with artifacts, we crash on attempt to legalize Instruction
%5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)

test/CodeGen/AMDGPU/GlobalISel/artifact-combiner-zext.mir
46

In:

%0:_(<2 x s32>) = COPY $vgpr0_vgpr1
%1:_(<2 x s8>) = G_TRUNC %0
%2:_(<2 x s16>) = G_ZEXT %1
$vgpr0 = COPY %2

%2:_(<2 x s16>) = G_ZEXT %1
gets combined first into:

%0:_(<2 x s32>) = COPY $vgpr0_vgpr1
%3:_(s16) = G_CONSTANT i16 255
%4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)
%5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)
%2:_(<2 x s16>) = G_AND %5:_, %4:_
$vgpr0 = COPY %2:_(<2 x s16>)

Originally:
All new instructions (created in process of G_ZEXT combine) fail to combine and get moved to InstList:

%3:_(s16) = G_CONSTANT i16 255
%2:_(<2 x s16>) = G_AND %5:_, %4:_
%5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)
%4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)

%4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)
was legal and we crash on:
%5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)

Now:
%5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)
is left to be processed as the last one, since it cannot perform combine (Auxiliary Artifact)
%4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)
G_BUILD_VECTOR is also Auxiliary Artifact.
InstList has

%3:_(s16) = G_CONSTANT i16 255
%2:_(<2 x s16>) = G_AND %5:_, %4:_

they get legalized and we get:

%0:_(<2 x s32>) = COPY $vgpr0_vgpr1
%6:_(s32) = G_CONSTANT i32 255
%3:_(s16) = G_TRUNC %6:_(s32)
%4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)
%5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)
%2:_(<2 x s16>) = G_AND %5:_, %4:_
$vgpr0 = COPY %2:_(<2 x s16>)

This is an interesting situation, there are no artifacts in ArtifactList or Instructions in InstList, so we are left with Auxiliary Artifacts.
None of real artifacts was able to combine with them and they are all moved to InstList:

%4:_(<2 x s16>) = G_BUILD_VECTOR %3:_(s16), %3:_(s16)
%5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)
%3:_(s16) = G_TRUNC %6:_(s32)
`%3:_(s16) = G_TRUNC %6:_(s32)`

was legal. Crash on %5:_(<2 x s16>) = G_TRUNC %0:_(<2 x s32>)

Difference in diff comes from legalization of %3:_(s16) = G_CONSTANT i16 255

test/CodeGen/AMDGPU/GlobalISel/legalize-extract.mir
444

In test function extract_s8_v4s8_offset0

%3:_(<4 x s32>) = G_IMPLICIT_DEF
%0:_(<4 x s8>) = G_TRUNC %3:_(<4 x s32>)
%1:_(s8) = G_EXTRACT %0:_(<4 x s8>), 0
%2:_(s32) = G_ANYEXT %1:_(s8)
$vgpr0 = COPY %2:_(s32)

Originally:

%1:_(s8) = G_EXTRACT %0:_(<4 x s8>), 0 was legalized first into: 
%3:_(<4 x s32>) = G_IMPLICIT_DEF
%0:_(<4 x s8>) = G_TRUNC %3:_(<4 x s32>)
%4:_(<4 x s16>) = G_ANYEXT %0:_(<4 x s8>)
%5:_(s16) = G_EXTRACT %4:_(<4 x s16>), 0
%1:_(s8) = G_TRUNC %5:_(s16)
%2:_(s32) = G_ANYEXT %1:_(s8)
$vgpr0 = COPY %2:_(s32)

then
%2:_(s32) = G_ANYEXT %1:_(s8)
is legalized (!), it turned out to be legal
but it should have been combined with
%1:_(s8) = G_TRUNC %5:_(s16)

%0:_(<4 x s8>) = G_TRUNC %3:_(<4 x s32>)
is next to legalize and we crash.

Now:

%1:_(s8) = G_EXTRACT %0:_(<4 x s8>), 0
%2:_(s32) = G_ANYEXT %1:_(s8)

are chained to
%0:_(<4 x s8>) = G_TRUNC %3:_(<4 x s32>)
G_TRUNC gets legalized first (and we crash)

same story for test functions:
extract_s8_v4s8_offset8
extract_s8_v4s8_offset16
extract_s8_v4s8_offset24
extract_s8_v3s8_offset16
extract_s8_v5s1_offset4

911

in test function extract_v2s16_v5s16_offset0

%3:_(<6 x s32>) = G_IMPLICIT_DEF
%2:_(<6 x s16>) = G_TRUNC %3:_(<6 x s32>)
%0:_(<5 x s16>) = G_EXTRACT %2:_(<6 x s16>), 0
%1:_(<2 x s16>) = G_EXTRACT %0:_(<5 x s16>), 0
$vgpr0 = COPY %1:_(<2 x s16>)

Originally:
All Artifacts:

%1:_(<2 x s16>) = G_EXTRACT %0:_(<5 x s16>), 0
%0:_(<5 x s16>) = G_EXTRACT %2:_(<6 x s16>), 0
%2:_(<6 x s16>) = G_TRUNC %3:_(<6 x s32>)

fail to combine and are moved to InstList that now contains (processed bottom up):

%2:_(<6 x s16>) = G_TRUNC %3:_(<6 x s32>)
%0:_(<5 x s16>) = G_EXTRACT %2:_(<6 x s16>), 0
%1:_(<2 x s16>) = G_EXTRACT %0:_(<5 x s16>), 0

%1:_(<2 x s16>) = G_EXTRACT %0:_(<5 x s16>), 0
is legalized with moreElements,
%0:_(<5 x s16>) = G_EXTRACT %2:_(<6 x s16>), 0
is legal, and we crash on
%2:_(<6 x s16>) = G_TRUNC %3:_(<6 x s32>)

Now:
Since we have chained artifacts
%2:_(<6 x s16>) = G_TRUNC %3:_(<6 x s32>)
is legalized first and we crash. Remaining two wait for this legalization to finish.

test/CodeGen/AMDGPU/GlobalISel/legalize-implicit-def.mir
316

In test function test_implicit_def_v3s1:

%3:_(<4 x s32>) = G_IMPLICIT_DEF
%2:_(<4 x s1>) = G_TRUNC %3:_(<4 x s32>)
%0:_(<3 x s1>) = G_EXTRACT %2:_(<4 x s1>), 0
%1:_(<3 x s32>) = G_ANYEXT %0:_(<3 x s1>)
$vgpr0_vgpr1_vgpr2 = COPY %1:_(<3 x s32>)

We used to crash on %0:_(<3 x s1>) = G_EXTRACT %2:_(<4 x s1>), 0
but since this artifact has unresolved artifact use %2:_(<4 x s1>) = G_TRUNC %3:_(<4 x s32>) this is where we crash now.
Also we did not try to legalize %1:_(<3 x s32>) = G_ANYEXT %0:_(<3 x s1>) since it has unresolved artifact use,
difference in diff comes from legalization of %1:_(<3 x s32>) = G_ANYEXT %0:_(<3 x s1>)

It is the same story in test_implicit_def_v3s8, only type is different (s8 instead of s1).

test/CodeGen/AMDGPU/GlobalISel/legalize-unmerge-values-xfail.mir
2

Originally:
We used to crash on:
%1:_(s1), %2:_(s1) = G_UNMERGE_VALUES %0

Now:
We noticed that this artifact has use
%0:_(<2 x s1>) = G_TRUNC %3:_(<2 x s32>), this is where we crash now.

test/CodeGen/AMDGPU/GlobalISel/legalize-unmerge-values.mir
239

In test function test_unmerge_s1_s8:
We have a combine for G_MERGE + G_UNMERGE that did not happen before.
Remaining changes are variable name changes by the script, because of combine,
first merge {MV} and first unmerge{UV} are now missing, and MV1 is now MV, MV2 is MV1 and so on.
There was no crash here so I removed -global-isel-abort=0

test/CodeGen/AMDGPU/GlobalISel/legalize-xor.mir
484

I will explain change in legalize-xor.mir, it is the same thing in:
I will explain change in legalize-and.mir and
I will explain change in legalize-or.mir,

%29:_(<4 x s32>) = G_IMPLICIT_DEF
%0:_(<4 x s8>) = G_TRUNC %29:_(<4 x s32>)
%28:_(<4 x s32>) = G_IMPLICIT_DEF
%1:_(<4 x s8>) = G_TRUNC %28:_(<4 x s32>)
%4:_(s8), %5:_(s8), %6:_(s8), %7:_(s8) = G_UNMERGE_VALUES %0:_(<4 x s8>)
%8:_(s8), %9:_(s8), %10:_(s8), %11:_(s8) = G_UNMERGE_VALUES %1:_(<4 x s8>)
%25:_(s32) = G_ANYEXT %4:_(s8)
%26:_(s32) = G_ANYEXT %8:_(s8)
%27:_(s32) = G_XOR %25:_, %26:_
%12:_(s8) = G_TRUNC %27:_(s32)
%22:_(s32) = G_ANYEXT %5:_(s8)
%23:_(s32) = G_ANYEXT %9:_(s8)
%24:_(s32) = G_XOR %22:_, %23:_
%13:_(s8) = G_TRUNC %24:_(s32)
%19:_(s32) = G_ANYEXT %6:_(s8)
%20:_(s32) = G_ANYEXT %10:_(s8)
%21:_(s32) = G_XOR %19:_, %20:_
%14:_(s8) = G_TRUNC %21:_(s32)
%16:_(s32) = G_ANYEXT %7:_(s8)
%17:_(s32) = G_ANYEXT %11:_(s8)
%18:_(s32) = G_XOR %16:_, %17:_
%15:_(s8) = G_TRUNC %18:_(s32)
%2:_(<4 x s8>) = G_BUILD_VECTOR %12:_(s8), %13:_(s8), %14:_(s8), %15:_(s8)
%3:_(<4 x s32>) = G_ANYEXT %2:_(<4 x s8>)
$vgpr0_vgpr1_vgpr2_vgpr3 = COPY %3:_(<4 x s32>)

Artifact List had in one moment (processed bottom up):

%3:_(<4 x s32>) = G_ANYEXT %2:_(<4 x s8>)
%4:_(s8), %5:_(s8), %6:_(s8), %7:_(s8) = G_UNMERGE_VALUES %0:_(<4 x s8>)
%8:_(s8), %9:_(s8), %10:_(s8), %11:_(s8) = G_UNMERGE_VALUES %1:_(<4 x s8>)
...

Originally:
All artifacts fail to combine, and are moved to InstList:

...
%4:_(s8), %5:_(s8), %6:_(s8), %7:_(s8) = G_UNMERGE_VALUES %0:_(<4 x s8>)
%3:_(<4 x s32>) = G_ANYEXT %2:_(<4 x s8>)

%3:_(<4 x s32>) = G_ANYEXT %2:_(<4 x s8>)
is legalized with FewerElements.
We crash on
%4:_(s8), %5:_(s8), %6:_(s8), %7:_(s8) = G_UNMERGE_VALUES %0:_(<4 x s8>)

Now:
Artifacts failed to combine with available Auxiliary Artifacts:

%1:_(<4 x s8>) = G_TRUNC %28:_(<4 x s32>)
%0:_(<4 x s8>) = G_TRUNC %29:_(<4 x s32>)
%2:_(<4 x s8>) = G_BUILD_VECTOR %12:_(s8), %13:_(s8), %14:_(s8), %15:_(s8)

Because of that Auxiliary Artifacts are now in InstList.

%2:_(<4 x s8>) = G_BUILD_VECTOR %12:_(s8), %13:_(s8), %14:_(s8), %15:_(s8)
is legal
%0:_(<4 x s8>) = G_TRUNC %29:_(<4 x s32>)
is not legal and we crash.
Difference in diff comes from legalization of %3:_(<4 x s32>) = G_ANYEXT %2:_(<4 x s8>) because it failed to combine and was moved to InstList,
new algorithm crashed before attempt to legalize this instruction.

LGTM with a few nits. But I'm not an expert in GlobalISel so get reply from other reviewers.

include/llvm/CodeGen/GlobalISel/GISelWorkList.h
86

That can be written as return WorklistMap.count(I).

lib/CodeGen/GlobalISel/Legalizer.cpp
307

s/Use/UseInst/ in the comment?

336

It looks like if move this block out of the while loop we can drop the if statement and write it as:

SmallVector<MachineInstr *, 8> Flip;
while (!AuxiliaryArtifactList.empty())
  Flip.push_back(AuxiliaryArtifactList.pop_back_val());
while (!Flip.empty())
  InstList.insert(Flip.pop_back_val());
Petar.Avramovic marked 2 inline comments as done.May 15 2019, 6:12 AM
Petar.Avramovic added inline comments.
lib/CodeGen/GlobalISel/Legalizer.cpp
307

Yes, comment does not do best job explaining what do here. We can simplify this if statement and change comment like this:

// UseInst is an unresolved artifact or instruction.
// Unresolved artifacts are in ArtifactList or TryLaterArtifactList.
// InstList now contains only unresolved instructions i.e.
// instructions that used to be artifacts or instructions created by
// artifact combiner.
if (ArtifactList.contains(UseInst) ||
    TryLaterArtifactList.contains(UseInst) ||
    InstList.contains(UseInst)) {
  HasUnresolvedUses = true;
  break;
}
336

We have to move AuxiliaryArtifactList to InstList inside while loop because otherwise whatever is left in AuxiliaryArtifactList will remain unprocessed and will affect regression tests.

Now that I look at this, it might be better to move Auxiliary Artifacts to InstList one by one

if (InstList.empty() && ArtifactList.empty() &&
    TryLaterArtifactList.empty() && !AuxiliaryArtifactList.empty())
  InstList.insert(AuxiliaryArtifactList.pop_back_val());

instead of all together, this could allow us to combine away Auxiliary Artifact with help of an Artifact that appears after some other Auxiliary Artifacts was legalized. I don't have a test for this sequence of events.

Or maybe even to Artifact list if there appears combine where Auxiliary Artifact could be an Artifact ( e.g. G_TRUNC + G_MERGE_VALUES) but then some opcodes (G_TRUNC) will have dual meaning and will be hard to figure out what combine have precedence over another.

Suggestions?

Currently, for MIPS, I would expect for AuxiliaryArtifactList to be empty when all other lists are empty.
This patch is required for MIPS to allow legalization of chained artifacts that appear when we narrow Scalar G_{S|Z}EXT_INREG (once D61289 lands) or narrow scalar G_{Z|S|ANY}EXT.

atanasyan added inline comments.May 15 2019, 6:37 AM
lib/CodeGen/GlobalISel/Legalizer.cpp
336

We have to move AuxiliaryArtifactList to InstList inside while loop because otherwise whatever is left in AuxiliaryArtifactList will remain unprocessed and will affect regression tests.

Agreed. I missed that in the if condition has TryLaterArtifactList.empty() while in the loop cindition has !TryLaterArtifactList.empty().

it might be better to move Auxiliary Artifacts to InstList one by one

If you put this statement in the end of the loop the last element of the AuxiliaryArtifactList will be moved to the InstList. After that we exit from the loop because both InstList.empty() and TryLaterArtifactList.empty() are true and the loop condition !InstList.empty() || !TryLaterArtifactList.empty() is false. If AuxiliaryArtifactList contains more then one element they remains in the list. Did I miss something?

Petar.Avramovic marked an inline comment as done.May 15 2019, 7:29 AM
Petar.Avramovic added inline comments.
lib/CodeGen/GlobalISel/Legalizer.cpp
336

Once we move top element (Art) from AuxiliaryArtifactList to InstList, InstList is no longer empty and we enter loop again with intention to legalize Art.

Remaining (if any) auxiliary artifacts in AuxiliaryArtifactList stay there until we are done with legalizing Art and everything he produced during legalization.
It may be possible that this Art could produce e.g. G_UNMERGE that could combine away some G_MERGE that is still in AuxiliaryArtifactList. Again this is just in theory, I don't have test for this case.
For this to happen, there can be no artifact that attempted to combine with Art. That is Art is only used in legal instructions. This look a little strange considering how we deal with artifacts at the moment, but might be necessary in the future as more artifact combines are added.

Petar.Avramovic edited the summary of this revision. (Show Details)

Addressed review comments.

Petar.Avramovic marked 4 inline comments as done.May 17 2019, 3:20 AM
atanasyan accepted this revision.May 22 2019, 12:25 AM

LGTM, but as I said early please wait another approval.

This revision is now accepted and ready to land.May 22 2019, 12:25 AM
dsanders requested changes to this revision.May 23 2019, 3:56 PM

Hi Petar,

I haven't gone through the code for this but I've gone through the test changes. It looks like a lot of these test changes result in illegal MIR getting out of the legalizer pass. The most common case was neglecting to scalarize G_ANYEXT in the AMDGPU tests when the legalization rules would have called .scalarize(0) but there were a couple others.

At the high level, I'm not very keen on complicating the legalizer further by splitting the artifacts into multiple kinds and specifying a processing order. Personally I'd prefer not to have artifacts at all but they were a necessary concession to resolve two (somewhat conflicting) requirements:

  • Legalization should be able to happen in any order (i.e. processing order is purely optimization and not correctness)
  • The MIR must be valid after each legalization step

That said, I could be persuaded to complicate things if it buys a significant win. At the moment it looks like a correctness regression (going by the tests) but assuming the correctness issues were fixed, what would be the main benefit to making this change? I get the impression it's about avoiding implementing all the legalization rules but while I can see that it postpones the need to implement some, I'm not convinced it reduces the amount rules that need defining.

test/CodeGen/AArch64/GlobalISel/legalize-unmerge-values.mir
5

Just to clarify terminology here. When you say 'crash' you don't appear to mean a crash but rather an unable-to-legalize error. Is that right?

This test is artificial and should probably be moved to the unit tests. The intent is to test the expansion of widenScalar on G_UNMERGE_VALUES and is making use of an impossible input to ensure that the legalizer picks widenScalar for it. There should be no way to produce s4,s4 = G_UNMERGE_VALUES s8 in the current AArch64 target since the IRTranslator doesn't create G_UNMERGE_VALUES and the legalizer would only create them if an s8 operation narrowScalar'd to s4 which isn't the case for AArch64. The closest you can get from valid input is:

%1:_(s4) = G_TRUNC %0
%2 = G_ANYEXT %1:_(s4)

If we pretend that the input was valid, then AArch64 is breaking one of the few legalization rule requirements which is that G_ANYEXT must be legal for all type combinations that can occur (and makes sense there must be a legal G_ANYEXT that can consume the legal output of any other instruction that produces a scalar output and must add a .legalFor({X, s4}) to its G_ANYEXT rules

It looks like we never actually wrote down the minimum legalization rules. I'll look into that

test/CodeGen/AMDGPU/GlobalISel/artifact-combiner-sext.mir
56

AMDGPU hasn't marked any G_TRUNC's legal. I think this test is behaving correctly until it adds the minimum set of legal truncates for their target. The minimum set is the same as for G_ANYEXT but with the type indices swapped.

test/CodeGen/AMDGPU/GlobalISel/artifact-combiner-zext.mir
46

This one comes down to AMDGPU not defining any legal G_TRUNC's too

test/CodeGen/AMDGPU/GlobalISel/legalize-and.mir
514–515

This change is violating AMDGPU's legalization rules. G_ANYEXT is defined as:

getActionDefinitionsBuilder({G_SEXT, G_ZEXT, G_ANYEXT})
  .legalFor({{S64, S32}, {S32, S16}, {S64, S16},
             {S32, S1}, {S64, S1}, {S16, S1},
             // FIXME: Hack
             {S64, LLT::scalar(33)},
             {S32, S8}, {S128, S32}, {S128, S64}, {S32, LLT::scalar(24)}})
  .scalarize(0);

which is instructing the legalizer to scalarize all vector types.

test/CodeGen/AMDGPU/GlobalISel/legalize-extract.mir
444

This one boils down to AMDGPU not declaring any legalization rules for G_TRUNC

907

This change is violating AMDGPU's legalization rules. The relevant one for the G_EXTRACT is:

.moreElementsIf(isSmallOddVector(BigTyIdx), oneMoreElement(BigTyIdx))

which is supposed to add an extra element if the number of elements is odd and the size of the elements is <32. The legalIf that preceeds it only matches if the bit type is a multiple of 32 in size but 5*16 isn't

911

This one boils down to AMDGPU not declaring any legalization rules for G_TRUNC

test/CodeGen/AMDGPU/GlobalISel/legalize-implicit-def.mir
324–325

This is another case of violating AMDGPU's G_ANYEXT legalization rules by not scalarizing vector types

test/CodeGen/AMDGPU/GlobalISel/legalize-or.mir
514–515

This is another case of violating AMDGPU's G_ANYEXT legalization rules by not scalarizing vector types

test/CodeGen/AMDGPU/GlobalISel/legalize-unmerge-values-xfail.mir
2

This one is interesting. AMDGPU's legalization rules say to scalarize this (assuming I'm reading it right, this opcode has very complicated legalization rules), but how do you scalarize G_UNMERGE_VALUES. The normal answer is to emit G_UNMERGE_VALUES but that's not going to work here as we'll just infinitely loop replacing an instruction with the same instruction. I guess G_EXTRACT_VECTOR_ELT is probably the answer here but suppose we want to widenScalar that. We'd probably want to G_UNMERGE_VALUES (bring us back into the loop) and G_BUILD_VECTOR. I'll have to give this one some more thought

test/CodeGen/AMDGPU/GlobalISel/legalize-unmerge-values.mir
239

I believe this is the test where I mentioned to Matt that the lack of G_TRUNC legalization rules caused it to stop processing before the combines could happen. It looks like this patch makes the combines happen a bit earlier and before the G_TRUNC stops further processing

test/CodeGen/AMDGPU/GlobalISel/legalize-xor.mir
514–515

This one violates AMDGPU's G_ANYEXT rules by not scalarizing

This revision now requires changes to proceed.May 23 2019, 3:56 PM
Petar.Avramovic marked 17 inline comments as done.May 24 2019, 5:24 AM

'crash' = unable-to-legalize error/warning/nothing depending on -global-isel-abort option.

It looks like a lot of these test changes result in illegal MIR getting out of the legalizer pass.

That is already the case. Mentioned tests already used to crash but now they always crash on instruction that is root of problem (here it is most of the times G_TRUNC for vector) .
About mentioned legalization rules violations:
Tests that finish legalization successfully are unchanged (except one where we perform additional G_UNMERGE/G_MERGE combine).
"Violations" happen for Artifacts in tests where mir output was illegal anyway.
That it Artifacts are either waiting for it uses to be resolved first or are about to be legalized. It just happened that there was some "artifact"/"auxiliary artifact"/instruction that was not legal.
@arsenm Do you have any comments?

what would be the main benefit to making this change? I get the impression it's about avoiding implementing all the legalization rules but while I can see that it postpones the need to implement some, I'm not convinced it reduces the amount rules that need defining.

The primary goal of this patch is adding support for combining of chained artifacts.

About side effects/benefits:
Remove some of the rules for artifacts. The ones that say artifact is legal for a type but we know it is not selectable for that type.

Suppose that target cannot select s128 = G_MERGE s32 s32 s32 s32 nor s32 s32 s32 s32 = G_UNMERGE s128. With current artifact combiner we could encounter following situations:

  1. G_UNMERGE combines with G_MERGE, that was easy.
  2. There was no G_UNMERGE, that could combine with G_MERGE. Then let's legalize G_MERGE, and hope for the best. G_MERGE is now legal, removed from all GlobalISel observer lists and waits for legalization to finish so that it can be regbankselected. During legalization of some other instruction G_UNMERGE appears and combines away G_MERGE. This was fortunate sequence of events as we couldn't select this G_MERGE if it "escaped" from Legalizer. But in LegalizerInfo it says that G_MERGE is legal for {s128, s32} despite the fact that we cannot select it. Why? Because that is how current Artifact combiner works.
  3. G_UNMERGE couldn't find G_MERGE to combine with. G_UNMERGE is "Artifact" nothing can combine it away, then let's make it legal. G_UNMERGE is now a legalized instruction that is no more in ArtifactList and waits legalizer to end so that it can crash in Instruction Select. But there is more: other instruction get legalized (e.g. G_SEXT_INREG get narrowscalar-ed with G_UNMERGE/G_MERGE) and the G_MERGE appears. G_UNMERGE could combine with this G_MERGE but it is not in ArtifactList and we end up with uncombined Artifacts.

Artifact combiner at the moment does 1 and 2. With this change Artifact combiner does 1,2 and 3(meaning it can perform this combine) + combines of type 2. are performed without requirement to have Artifacts legal for types that are not Instr Selectable ( in example above: {s128, s32} for G_MERGE and {s32, s128} for G_UNMERGE).
By the way 3. is the reason D61289 can't have mir test for narrow scalar of G_SEXT_INREG, with expected optimal output.
With this change
Target that has only s32 as legal type (everything else is narrow/widen scalar-ed to s32) is free to declare all artifacts as unsupported as a last rule. This is it could have to narrowScalar G_ANYEXT to s32 but there is no need to declare G_ANYEXT as legal for any types.
We could argue that this is not that beneficial by itself, but since it is a consequence of allowing combine of chained artifacts, I would say it makes legalization rules for artifacts much more straightforward.

Legalization should be able to happen in any order (i.e. processing order is purely optimization and not correctness)

Instruction legalization can already be done in any order, but artifact combine has to wait for all instructions to be legalized first, right?
Now we allow for chained artifacts to wait for another artifact/instruction to be handled first by introducing TryLaterArtifactList. To me, this feels like a natural extension of existing algorithm.

test/CodeGen/AMDGPU/GlobalISel/artifact-combiner-sext.mir
56

Forgot to mention, this will turn into G_SEXT_INREG and then it's another story.
Anyway, I assume that <2 x s16> = G_TRUNC <2 x s32> should be legal and that it is a subregister class copy
since $vgpr0_vgpr1 is <2 x s32> and $vgpr0 is <2 x s16>

test/CodeGen/AMDGPU/GlobalISel/legalize-and.mir
514–515

We crashed before attempting to legalize this G_ANYEXT, once G_TRUNC is resolved G_ANYEXT will be next.

test/CodeGen/AMDGPU/GlobalISel/legalize-extract.mir
907

This is intended, G_EXTRACT waits for it artifact use to be resolved ( [[EXTRACT]] ). Crash happens before we try to legalize G_EXTRACT.
We don't give up on attempt to combine Artifact while there is still chance that corresponding auxiliary artifact would appear once uses of G_EXTRACT are resolved. Notice how we process Machine function, it is InstList first not ArtifactList first. This way we cover a lot of simpler cases but not the cases when artifacts are chained, then we have to wait and try again later.

test/CodeGen/AMDGPU/GlobalISel/legalize-implicit-def.mir
324–325

We did not try to legalize this yet G_ANYEXT since it has unresolved artifact use, and mir output is state of MF when we crashed on %2:_(<4 x s1>) = G_TRUNC %3:_(<4 x s32>)

test/CodeGen/AMDGPU/GlobalISel/legalize-or.mir
514–515

We crashed before attempting to legalize this G_ANYEXT, once G_TRUNC is resolved G_ANYEXT will be next. The trunc in question is CHECK: [[TRUNC:%[0-9]+]]:_(<4 x s8>) = G_TRUNC [[DEF]](<4 x s32>)

test/CodeGen/AMDGPU/GlobalISel/legalize-unmerge-values-xfail.mir
2

Originally this was widenScalarUnmergeValues called for vector, and it crashed since vector is not scalar.
From second rule for G_MERGE/G_UNMERGE .clampScalar(LitTyIdx, S16, S256).
Don't know what was idea behind this test.

test/CodeGen/AMDGPU/GlobalISel/legalize-unmerge-values.mir
239

This one finished successfully. G_TRUNC is legal for scalars according to the "legacy rules" that are used when no rules are defined... This is not the case for vectors.
In order to avoid duality of G_TRUNC being both "artifact" and "auxiliary artifact" I suggest to have narrowScalar for G_TRUNC (similar for vectors) so that G_TRUNC could be legalized into G_UNMERGE + COPY and Artifact Combiner would continue with combining. This works with patch like it is.
D62338 complicates things with order in which we attempt to combine.
Maybe instead for moving G_TRUNC to InstList we move it to ArtifactList first?
No change for remaining auxiliary artifacts since they don't have such combine.

test/CodeGen/AMDGPU/GlobalISel/legalize-xor.mir
514–515

We crashed before attempting to legalize this G_ANYEXT.

Petar.Avramovic marked 6 inline comments as done.Jun 3 2019, 12:41 AM

Ping.

nhaehnle removed a subscriber: nhaehnle.Jun 3 2019, 4:09 AM
nhaehnle added a subscriber: nhaehnle.
nhaehnle removed a subscriber: nhaehnle.

Ping.

Sorry Petar. I've been occupied by other things. I'll try to get back to you this week but as it's WWDC this week it might have to be next week

Petar.Avramovic edited the summary of this revision. (Show Details)

Rebased the patch. Ping.

'crash' = unable-to-legalize error/warning/nothing depending on -global-isel-abort option.

It looks like a lot of these test changes result in illegal MIR getting out of the legalizer pass.

That is already the case. Mentioned tests already used to crash but now they always crash on instruction that is root of problem (here it is most of the times G_TRUNC for vector) .
About mentioned legalization rules violations:
Tests that finish legalization successfully are unchanged (except one where we perform additional G_UNMERGE/G_MERGE combine).
"Violations" happen for Artifacts in tests where mir output was illegal anyway.
That it Artifacts are either waiting for it uses to be resolved first or are about to be legalized. It just happened that there was some "artifact"/"auxiliary artifact"/instruction that was not legal.
@arsenm Do you have any comments?

I see your point that those tests to fail to legalize before and after this patch. Because the legalization order is made much more complicated by this patch you end up seeing the problematic instruction (usually G_TRUNC) before legalizing the other instructions which is confusing but isn't functionally different. I don't think anything has really changed in terms of what they fail to legalize on though, it just comes down to which problem you see first.

what would be the main benefit to making this change? I get the impression it's about avoiding implementing all the legalization rules but while I can see that it postpones the need to implement some, I'm not convinced it reduces the amount rules that need defining.

The primary goal of this patch is adding support for combining of chained artifacts.

This answer appears to say that the primary goal of making the change is to make the change :-)

About side effects/benefits:
Remove some of the rules for artifacts. The ones that say artifact is legal for a type but we know it is not selectable for that type.

You shouldn't have any rules in this category. An instruction is legal if and only if it is guaranteed to be selectable. (Note: that doesn't necessarily mean that the instruction selector handles it, just that something after the legalizer is guaranteed to handle it).

Suppose that target cannot select s128 = G_MERGE s32 s32 s32 s32 nor s32 s32 s32 s32 = G_UNMERGE s128. With current artifact combiner we could encounter following situations:

  1. G_UNMERGE combines with G_MERGE, that was easy.

Yep, the majority of G_MERGE/G_UNMERGE's are eliminated this way.

  1. There was no G_UNMERGE, that could combine with G_MERGE. Then let's legalize G_MERGE, and hope for the best. G_MERGE is now legal, removed from all GlobalISel observer lists and waits for legalization to finish so that it can be regbankselected. During legalization of some other instruction G_UNMERGE appears and combines away G_MERGE. This was fortunate sequence of events as we couldn't select this G_MERGE if it "escaped" from Legalizer. But in LegalizerInfo it says that G_MERGE is legal for {s128, s32} despite the fact that we cannot select it. Why? Because that is how current Artifact combiner works.

It's wrong that your LegalizerInfo says G_MERGE_VALUES is legal for {s128, s32} if you cannot select it.

The main question to ask here is: How did you get a G_MERGE_VALUES whose result isn't consumed by either a legal operation or a G_UNMERGE_VALUES (or G_TRUNC) that breaks it up into something a legal operation can consume? This is supposed to be an impossible situation. While typing up an explanation of why it's impossible, I actually ran into the root problem I think you're trying to describe which is that if the LLVM-IR contains types that are bigger than the biggest register in the backend, then there's some gaps in the combine rules that leave behind some dangling G_MERGE/G_UNMERGE's. Essentially, the narrower type used by the return value isn't propagating into the earlier code and causing excess bits to die.

Suppose we have the following AArch64 MIR:

%0:_(s64) = COPY $x0
%1:_(s64) = COPY $x1
%2:_(s1024) = G_SEXT %0(s64)
%3:_(s1024) = G_SEXT %1(s64)
%4:_(s1024) = G_OR %2, %3
%5:_(s64) = G_TRUNC %4(s1024)
$x0 = COPY %5(s64)
RET_ReallyLR implicit $x0

(yes, you really do need s1024 to trigger this on AArch64 thanks to 512-bit operations :-D)
and our legalization rules are such that s64 is the only type we can use in operations. 960-bits of the intermediates are redundant as the function only returns 64-bits. A first pass of legalization over the non-artifacts gives:

%0:_(s64) = COPY $x0
%1:_(s64) = COPY $x1
%2:_(s1024) = G_SEXT %0:_(s64)
%3:_(s1024) = G_SEXT %1:_(s64)
%6:_(s64), %7:_(s64), %8:_(s64), %9:_(s64), %10:_(s64), %11:_(s64), %12:_(s64), %13:_(s64), %14:_(s64), %15:_(s64), %16:_(s64), %17:_(s64), %18:_(s64), %19:_(s64), %20:_(s64), %21:_(s64) = G_UNMERGE_VALUES %2:_(s1024)
%22:_(s64), %23:_(s64), %24:_(s64), %25:_(s64), %26:_(s64), %27:_(s64), %28:_(s64), %29:_(s64), %30:_(s64), %31:_(s64), %32:_(s64), %33:_(s64), %34:_(s64), %35:_(s64), %36:_(s64), %37:_(s64) = G_UNMERGE_VALUES %3:_(s1024)
%38:_(s64) = G_OR %6:_, %22:_
%39:_(s64) = G_OR %7:_, %23:_
%40:_(s64) = G_OR %8:_, %24:_
%41:_(s64) = G_OR %9:_, %25:_
%42:_(s64) = G_OR %10:_, %26:_
%43:_(s64) = G_OR %11:_, %27:_
%44:_(s64) = G_OR %12:_, %28:_
%45:_(s64) = G_OR %13:_, %29:_
%46:_(s64) = G_OR %14:_, %30:_
%47:_(s64) = G_OR %15:_, %31:_
%48:_(s64) = G_OR %16:_, %32:_
%49:_(s64) = G_OR %17:_, %33:_
%50:_(s64) = G_OR %18:_, %34:_
%51:_(s64) = G_OR %19:_, %35:_
%52:_(s64) = G_OR %20:_, %36:_
%53:_(s64) = G_OR %21:_, %37:_
%4:_(s1024) = G_MERGE_VALUES %38:_(s64), %39:_(s64), %40:_(s64), %41:_(s64), %42:_(s64), %43:_(s64), %44:_(s64), %45:_(s64), %46:_(s64), %47:_(s64), %48:_(s64), %49:_(s64), %50:_(s64), %51:_(s64), %52:_(s64), %53:_(s64)
%5:_(s64) = G_TRUNC %4:_(s1024)
$x0 = COPY %5:_(s64)
RET_ReallyLR implicit $x0

then a pass over the artifacts should give this but actually does nothing (it might take a couple iterations to get rid of the dead code, I've deleted it early for brevity):

%0:_(s64) = COPY $x0
%1:_(s64) = COPY $x1
%57:_(s64) = G_ASHR %0:_(s64), i64 63
%2:_(s1024) = G_MERGE_VALUES %0:_(s64), %57:_(s64), %57:_(s64), ...
%56:_(s64) = G_ASHR %1:_(s64), i64 63
%3:_(s1024) = G_MERGE_VALUES %1:_(s64), %56:_(s64), %56:_(s64), ...
%58:_(s512), %59:_(s512) = G_UNMERGE_VALUES %2:_(s1024)
%6:_(s64), %7:_(s64), %8:_(s64), %9:_(s64), %10:_(s64), %11:_(s64), %12:_(s64), %13:_(s64) = G_UNMERGE_VALUES %58:_(s512)
%60:_(s512), %61:_(s512) = G_UNMERGE_VALUES %3:_(s1024)
%22:_(s64), %23:_(s64), %24:_(s64), %25:_(s64), %26:_(s64), %27:_(s64), %28:_(s64), %29:_(s64) = G_UNMERGE_VALUES %60:_(s512)
%38:_(s64) = G_OR %6:_, %22:_
$x0 = COPY %38:_(s64) // %5 -> %38
RET_ReallyLR implicit $x0

then another pass over the changed artifacts should give:

%0:_(s64) = COPY $x0
%1:_(s64) = COPY $x1
%57:_(s64) = G_ASHR %0:_(s64), i64 63
%65:_(s512) = G_MERGE_VALUES %0:_(s64), %57:_(s64), %57:_(s64), ... // fewer %56's than before
%56:_(s64) = G_ASHR %1:_(s64), i64 63
%66:_(s512) = G_MERGE_VALUES %1:_(s64), %56:_(s64), %56:_(s64), ... // fewer %56's than before
%6:_(s64), %7:_(s64), %8:_(s64), %9:_(s64), %10:_(s64), %11:_(s64), %12:_(s64), %13:_(s64) = G_UNMERGE_VALUES %65:_(s512)
%22:_(s64), %23:_(s64), %24:_(s64), %25:_(s64), %26:_(s64), %27:_(s64), %28:_(s64), %29:_(s64) = G_UNMERGE_VALUES %66:_(s512)
%38:_(s64) = G_OR %6:_, %22:_
$x0 = COPY %38:_(s64) // %5 -> %38
RET_ReallyLR implicit $x0

and one more

%0:_(s64) = COPY $x0
%1:_(s64) = COPY $x1
%57:_(s64) = G_ASHR %0:_(s64), i64 63 // This is dead but hasn't been noticed yet
%56:_(s64) = G_ASHR %1:_(s64), i64 63 // This is dead but hasn't been noticed yet
%38:_(s64) = G_OR %0:_, %1:_
$x0 = COPY %38:_(s64) // %5 -> %38
RET_ReallyLR implicit $x0

The main bit that's missing is a combine that changes:

%1 = G_MERGE_VALUES %0, ...
%2 = G_TRUNC %1

to this when the types are ok:

%2 = %0

or with a G_TRUNC or a smaller G_MERGE_VALUES when they differ.

There is also another means of doing the right thing. If G_TRUNC correctly specified the min/max types with clampScalar() and a G_TRUNC narrowScalar were implemented as G_UNMERGE_VALUES/G_MERGE_VALUES then the MERGE/UNMERGE would pair up and use the existing combines. Similarly for G_SEXT. The snag there is the one I think you're trying to solve which is that the MERGE/UNMERGE have already left the artifact list and won't retry the combines again.

I think adding the missing combine(s) is my first preference at the moment but I ought to give it a bit more thought and weigh up the different options. I'm hoping a small number of combines prevent us from getting into the situations that lead us to complicating the processing order. One other thought that I haven't completely gone through is that artifacts should only be artifacts if the legalizer produced them. In other words, on the first pass things like G_TRUNC are treated just like every other instruction and only those that are created by the legalizer would be artifacts. This would ensure that the first artifact combiner pass can see all the merge/unmerge pairs caused by the first pass instead of some of them turning up on the second pass.

  1. G_UNMERGE couldn't find G_MERGE to combine with. G_UNMERGE is "Artifact" nothing can combine it away, then let's make it legal. G_UNMERGE is now a legalized instruction that is no more in ArtifactList and waits legalizer to end so that it can crash in Instruction Select. But there is more: other instruction get legalized (e.g. G_SEXT_INREG get narrowscalar-ed with G_UNMERGE/G_MERGE) and the G_MERGE appears. G_UNMERGE could combine with this G_MERGE but it is not in ArtifactList and we end up with uncombined Artifacts.

I believe this is the same problem as 2. It's wrong for G_UNMERGE_VALUE to be legal for types you can't handle later in the backend and the same gap in the artifact combiners is the root cause of illegal type combinations surviving the combiner.

Artifact combiner at the moment does 1 and 2. With this change Artifact combiner does 1,2 and 3(meaning it can perform this combine) + combines of type 2. are performed without requirement to have Artifacts legal for types that are not Instr Selectable ( in example above: {s128, s32} for G_MERGE and {s32, s128} for G_UNMERGE).
By the way 3. is the reason D61289 can't have mir test for narrow scalar of G_SEXT_INREG, with expected optimal output.

That patch has unit tests that check narrow scalar of G_SEXT_INREG works correctly. The reason you can't have an MIR test for it in that patch is that no target does anything other than lower(). D61290 makes narrowScalar a possibility for AArch64. The reason there isn't an MIR test for it there is that it's already covered by the unit tests.

With this change
Target that has only s32 as legal type (everything else is narrow/widen scalar-ed to s32) is free to declare all artifacts as unsupported as a last rule. This is it could have to narrowScalar G_ANYEXT to s32 but there is no need to declare G_ANYEXT as legal for any types.

At minimum you need G_TRUNC/G_ANYEXT between the types produced by legal operations to types consumed by legal operations. They're the last means of changing the size of a type that legalization can rely on.

We could argue that this is not that beneficial by itself, but since it is a consequence of allowing combine of chained artifacts, I would say it makes legalization rules for artifacts much more straightforward.

Legalization should be able to happen in any order (i.e. processing order is purely optimization and not correctness)

Instruction legalization can already be done in any order, but artifact combine has to wait for all instructions to be legalized first, right?

We do a pass over the non-artifacts, then a pass over the artifacts and repeat that cycle until we're finished. There's no actual requirement to do distinct passes like this, it just made it easier to understand, implement, and be sure we were combining away all the problematic artifacts.

Now we allow for chained artifacts to wait for another artifact/instruction to be handled first by introducing TryLaterArtifactList. To me, this feels like a natural extension of existing algorithm.

About

%1 = G_MERGE_VALUES %0, ...
%2 = G_TRUNC %1

Potential problem about such combine is that G_TRUNC could than both perform combine, and be combined away, and one of this two has to be attempted first, and it is not quite clear when we have to wait for other instruction to be legalized.

There is already similar combine that we could possibly use:

%1 = G_MERGE_VALUES %0, ...
%2 = G_EXTRACT %1, 0

For example instead of narrowing scalar with G_TRUNC we could G_EXTRACT to narrowType instead.

There is also another means of doing the right thing. If G_TRUNC correctly specified the min/max types with clampScalar() and a G_TRUNC narrowScalar were implemented as G_UNMERGE_VALUES/G_MERGE_VALUES then the MERGE/UNMERGE would pair up and use the existing combines. Similarly for G_SEXT. The snag there is the one I think you're trying to solve which is that the MERGE/UNMERGE have already left the artifact list and won't retry the combines again.

Oh yes. Let's go with the simplest case for start. I can't deal with e.g. s64 = G_SEXT s32 or s32 = G_TRUNC s64 without narrowScalar but then I am left with uncombined G_MERGE etc.Maybe this could be covered with additional Artifact combines?

I think adding the missing combine(s) is my first preference at the moment but I ought to give it a bit more thought and weigh up the different options.

I am not so sure that combines alone could fix problems we have:
1 In order to be able to perform combine both parts have to be present (both auxiliary artifact and artifact as I named them in this patch).
Just having the combine available is not of much use when one of the parts is not present. We don't know if we:
have to wait for other part to appear while there is possibility or
legalize when we know nothing can produce the other part.
2 Some of potential new Artifacts combines are conditional, e.g.

%1 = G_MERGE_VALUES %0, ...
%2 = G_TRUNC %1

is only possible when %2 has same LLT as %0.
OK, we could also perform narrowScalar to LLT of %0 when %2 has different LLT first and then combine.
Then it looks like having combine that does: narrowScalar + combine G_MERGE/G_UNMERGE. There is nothing wrong with this I guess, some combines could be even more complicated but are not really required since they can break down into sequence of already present simpler actions.

Combines we have atm are essentially all we need, using them at the right time in conjunction with narrowScalar or similar should be able to deal with everything. Here I tried to address the 'at the right time'.

The reason there isn't an MIR test for it there is that it's already covered by the unit tests.

Unit tests are fine, however the instructions that are made by narrowScalar will not combine away because the artifacts that should perform combine were not in the artifact list. When I tried it with s128 G_SEXT_INREG it had uncombined artifacts that are legal, it fallbacks to sdag later if I remember correctly.

We do a pass over the non-artifacts, then a pass over the artifacts and repeat that cycle until we're finished. There's no actual requirement to do distinct passes like this, it just made it easier to understand, implement, and be sure we were combining away all the problematic artifacts.

I don't agree with "There's no actual requirement"

Current code deals with artifact chained to an instruction because instructions always go first, but does not deal with artifact chained to an artifact.

The "be sure we were combining away all the problematic artifacts" implies a few requirements.
For example G_UNMERGE is most likely chained to some instruction (it could also be chained to an artifact but let's leave such complications for now).
If we want to combine this G_UNMERGE away, instruction has to be legalized first (it will create G_MERGE that G_UNMERGE is waiting for).

If we do artifact combine first most of the artifacts will leave Artifact list as legal instructions and won't attempt to combine again (G_UNMERGE is one of them). That is why we legalize instructions first.
For same reason we combine/legalize instruction/artifacts that USE %xy first before instruction/artifact that DEFs %xy.

Well in fact, some artifact combines will still be performed because e.g. G_TRUNC currently jumps to uses of its operand 0 and tries to perform combine from there but this is somewhat chaotic: legal instruction(uncombined artifact) that is not in artifact List perform combine, not to mention that that we might not be able to deal with it later if it leaves Legalizer despite the fact that it is legal. G_TRUNC jumping was probably added at some point in attempt fix the fact that some chained artifacts could leave the Artifact list (in this patch they have to wait in another list, and we don't ask them if they are legal until we are sure they can't be combined away).

I could make a new RFC thread with a few tests and expected results, to see how to deal with them. We could go with simple ones first, here is one:

%2:_(s32) = G_ADD %0(s32), %1(s32)
%3:_(s64) = G_SEXT %2(s32)
%4:_(s64) = ...
%5:_(s64) = G_ADD %3(s64), %4(s64)
...

for aarch64 just double the size of types. This patch with Narrow scalar G_SEXT would do the trick, but we should explore other possibilities.

Updated regression tests.

Since there is a lot of long comments here, I opened new RFC thread D65199.
There is full -debug output there for two simple test functions so that we can have better overview of the changes this patch brings.

Some thoughts about G_CONSTANT widen scalar.
We don't know how to perform widen scalar, i.e. do we zero or sign extend immediate. What we could do, if this patch gets accepted, is to remove G_CONSTANT widen scalar, and introduce G_SEXT/G_ZEXT/G_ANYEXT + G_CONSTANT combine. G_CONSTANT would be auxiliary artifact (or even have dedicated constList for constants) so that it processed after everything else. It still needs narrowScalar and when is legal but what is currently widen scalar could be handled with artifact combines.

I'm trying to figure out whats' going on with this discussion. One thing I didn't quite see a response to (apologies if you answered it and I missed it) is that why is this the case:

But in LegalizerInfo it says that G_MERGE is legal for {s128, s32} despite the fact that we cannot select it. Why? Because that is how current Artifact combiner works.

I understand that doing this may allow more artifacts to be combined away, but it shouldn't change the legality of anything coming out of the legalizer, if properly specified, right?

I understand that doing this may allow more artifacts to be combined away, but it shouldn't change the legality of anything coming out of the legalizer, if properly specified, right?

Absolutely. When some artifact is legal it can leave legalizer. It also needs few other instructions that are legal for same types so that it can connect them. e.g. for AArch64 G_PHI is legal for s8 and G_ADD is legal for s32 you expect s32 = G_ZEXT s8 to be legal and could leave legalizer since we could have sequence like this:

%10:_(s8) = G_PHI ....
%11:_(s32) = G_ZEXT %10:(s8)
%12:_(s32) = G_ADD %11:(s32), %8:(s32)

This patch does not affect legal artifacts in any way (it might be able to perform few combines more then what we have atm because it checks if uncombined artifact is chained to some unprocessed instr and might try to combine again later).
It won't ask artifact is it legal as long there is a instr in some of the lists that could create other half necessary for combine (here I named other half auxiliary artifact).

On the other hand, at the moment, smallest legal type for MIPS is s32. If s8 appears it should be dealt with widen scalar and G_ZEXT/G_SEXT should never leave legalizer.
This patch is able to perform all combines without declaring G_ZEXT/G_SEXT as legal. Combiner as is now, can perform them (with recursive combine when G_TRUNC jumps to its uses) but needs G_ZEXT/G_SEXT to be legal in order to not crash during legalization. However combiner as is now is not able to combine all G_MERGE/G_UNMERGE.
Patch was originally made for narrow scalar of artifacts. It might be easier to think of large types that cannot fit into register and not having to make merge/unmerge legal for them since we cannot deal with them later (efficiently at least).
I think that we should be dealt with big types with narrow scalar and artifact combiner without having to declare artifacts legal for big types.

Comments got too long and are hard to follow here, we could start again in D65199.

Refresh test changes. Ping.

@volkan Regarding D65894.
Here, artifacts that failed to combine are moved to RetryList and we retry to combine them. Once all of the MachineInstrs that define use operands of our artifact are processed (are not in any of the Observer Lists) we turn artifact into an instruction. This way we have more opportunities for combines. e.g. in /test/AMDGPU/GlobalISel/legalize-unmerge-values.mir , this patch catches G_MERGE/G_UNMERGE combine that D65894 cannot catch because those two are declared legal.
As for the other test changes in D65894, all of them are essentially here.
I also tried test/CodeGen/AArch64/GlobalISel/retry-artifact-combine.mir and got similar output, but with one less copy instr.
there are differences in test/CodeGen/AMDGPU/GlobalISel/legalize-{xor|and|or}.mir
D65894 managed to combine few more G_TRUNC + G_ANYEXT but in the end both patches crash on %0:_(<4 x s8>) = G_TRUNC %29:_(<4 x s32>), because order of combine attempts is different this patch crashed before attempting to combine mentioned G_TRUNC + G_ANYEXT.
Please check if I missed something the this patch does not cover compared to D65894.

See D65199 for more tests with output from -debug option.

volkan added a comment.Aug 8 2019, 9:06 PM

@volkan Regarding D65894.
Here, artifacts that failed to combine are moved to RetryList and we retry to combine them. Once all of the MachineInstrs that define use operands of our artifact are processed (are not in any of the Observer Lists) we turn artifact into an instruction. This way we have more opportunities for combines. e.g. in /test/AMDGPU/GlobalISel/legalize-unmerge-values.mir , this patch catches G_MERGE/G_UNMERGE combine that D65894 cannot catch because those two are declared legal.
As for the other test changes in D65894, all of them are essentially here.
I also tried test/CodeGen/AArch64/GlobalISel/retry-artifact-combine.mir and got similar output, but with one less copy instr.
there are differences in test/CodeGen/AMDGPU/GlobalISel/legalize-{xor|and|or}.mir
D65894 managed to combine few more G_TRUNC + G_ANYEXT but in the end both patches crash on %0:_(<4 x s8>) = G_TRUNC %29:_(<4 x s32>), because order of combine attempts is different this patch crashed before attempting to combine mentioned G_TRUNC + G_ANYEXT.
Please check if I missed something the this patch does not cover compared to D65894.

See D65199 for more tests with output from -debug option.

Hi Petar,

I completely forgot this patch exists, sorry about it. I took a look and found a few issues.

First of all, I don't see a reason to add another concept called AuxilaryArtifacts, it makes the legalization really complicated. I think we can achieve the same result with the current design. Here is an example:

...
%1:_(s8) = G_TRUNC %0(s32)
...
%9:_(s32) = G_ANYEXT %1(s8)

In this case, moving %1 to AuxiliaryArtifactList doesn't make any difference because combining %9 will remove it anyway, even if it was in InstList. Since you already check the source registers before legalizing the artifacts, this shouldn't be a problem. What do you think?

Also, the definition of auxiliary artifact didn't seem right to me. It might be the opposite depending on the function. For example, the G_TRUNC instruction below has to wait for %9 to get combined. %9 should be auxiliary artifact in this case.

...
%9:_(s64) = G_ANYEXT %0(s32)
...
%1:_(s32) = G_TRUNC %9(s64)
...

Regarding the algorithm, I'm not sure if it's always a good idea to wait for the source instructions before legalizing artifacts, that why I didn't want to change that in D65894. One reason is generated instruction might be expensive for targets. For example, in order to combine G_SEXT, we emit G_SHL and G_ASHR, but we don't check if it's ideal for the target. The other reason is that you keep processing the artifact that are legal for no reason. So, I think we need a long term solution if we are going to change the legalization strategy, maybe one with a target hook to decide how to process artifacts?

Lastly, does D65894 fix your issue? If so, do you think it can be used as a short-term solution?

In this case, moving %1 to AuxiliaryArtifactList doesn't make any difference because combining %9 will remove it anyway, even if it was in InstList. Since you already check the source registers before legalizing the artifacts, this shouldn't be a problem. What do you think?

Well in this example it is not a problem, but in more complicated case:
moving %1 to AuxiliaryArtifactList means that %1 will be legalized first if %9 failed to combine it away, %9 might be able to combine away the artifact that was produced during legalization of %1.
In general, this patch only combines Artfact from ArtifactList with some auxiliary artifact from AuxiliaryArtifactList, if something gets legalized to legal it will not dispersal from MF during some combine. It's a bit confusing to see instruction said to be legal during legalization but it is not in the final MF.

Other option is to add much more artifact combines to make sure if two artifact are connected, they can be combined away. I guess this way we should eventually lose need for artifacts being anything other then legal. Also since there is no "order for combine attempts" and other part required for combine might be hidden behind and instruction (e.g. G_SEXT_IN_REG) we have to make all combines to jump to its use and try from there e.g G_MERGE has to try to jump to use of its def Operand since corresponding G_UNMERGE could be already legalized.

Also, the definition of auxiliary artifact didn't seem right to me. It might be the opposite depending on the function. For example, the G_TRUNC instruction below has to wait for %9 to get combined. %9 should be auxiliary artifact in this case.

This would be true if there was an G_ANYEXT + G_TRUNC combine, but we don't have one (we have G_TRUNC + G_ANYEXT);
I am pretty sure that you can't get this sequence of instructions from llvm-ir input with what we have in the tree. There might be some narrowScalars in the future that could create such sequence but then we can narrow scalar both artifacts and continue from there.

I have only considered what we have in tree and regression tests. Artifacts are cases from switch in tryCombineInstruction(...
excluding G_TRUNC since it jumps to uses of its operand 0 and recursively calls tryCombineInstruction

Anyway since auxiliary artifacts just complicate things i tried similar thing again but instead of having concept of auxiliary artifacts, I check for both inputs and outputs(this is new) of Artifacts that failed to combine and if some of them is in any Observer list I push it to RetryList. Problem with this is that at some point we are left with chain of artifacts and only thing left to do is to try to legalize. Now if we legalize in wrong order and don't try to combine first we might miss combine since legal instruction gets out of Observer Lists and won't be combined.

Regarding the algorithm, I'm not sure if it's always a good idea to wait for the source instructions before legalizing artifacts.

Probably not always, I pretty much extended what happens in MF where all instructions are either legal or widenScalar, Legalize instructions (sources) first, combine artifacts and we are done (at the end we just check that artifact combines generated only legal instructions)

One reason is generated instruction might be expensive for targets. For example, in order to combine G_SEXT, we emit G_SHL and G_ASHR, but we don't check if it's ideal for the target.

I don't understand this one.

The other reason is that you keep processing the artifact that are legal for no reason.

Yes, but algorithm doesn't know if that artifact is legal, it just waits for its input to get legalized and to retry combine. Artifacts should be combined regardless if they are legal or not, e.g. s32 G_ANYEXT s8 is legal for AArch64 but almost always gets combined.

So, I think we need a long term solution if we are going to change the legalization strategy, maybe one with a target hook to decide how to process artifacts?

I am not so sure about this one, that hook will have to hold pretty much entire body of runOnMachineFunction because do/while loop is different and observer is different.

Lastly, does D65894 fix your issue? If so, do you think it can be used as a short-term solution?

Hm.. It should fix the not having artifacts to be legal in order to combine them. But it misses to combine artifacts that are legal(RegBankSelect needs G_MERGE and G_UNMERGE legal and is able to deal with them). We could try adding few more combines for start and see how it works.

In this case, moving %1 to AuxiliaryArtifactList doesn't make any difference because combining %9 will remove it anyway, even if it was in InstList. Since you already check the source registers before legalizing the artifacts, this shouldn't be a problem. What do you think?

Well in this example it is not a problem, but in more complicated case:
moving %1 to AuxiliaryArtifactList means that %1 will be legalized first if %9 failed to combine it away, %9 might be able to combine away the artifact that was produced during legalization of %1.

I don't think that matters as long as you process the rest of the instructions before aborting. That's what D65894 does, it keeps processing the instructions in InstList, then tries to combine the illegal artifacts if there is anything new. Do you have a test case that requires more complicated logic?

In general, this patch only combines Artfact from ArtifactList with some auxiliary artifact from AuxiliaryArtifactList, if something gets legalized to legal it will not dispersal from MF during some combine. It's a bit confusing to see instruction said to be legal during legalization but it is not in the final MF.

Other option is to add much more artifact combines to make sure if two artifact are connected, they can be combined away. I guess this way we should eventually lose need for artifacts being anything other then legal. Also since there is no "order for combine attempts" and other part required for combine might be hidden behind and instruction (e.g. G_SEXT_IN_REG) we have to make all combines to jump to its use and try from there e.g G_MERGE has to try to jump to use of its def Operand since corresponding G_UNMERGE could be already legalized.

The problem with adding more artifact combines is that targets have no control over it, that's why I don't think it's a good idea to delay the legalization of the artifacts as much as possible in order to combine more instructions, unless it's not supported by the target.

Also, the definition of auxiliary artifact didn't seem right to me. It might be the opposite depending on the function. For example, the G_TRUNC instruction below has to wait for %9 to get combined. %9 should be auxiliary artifact in this case.

This would be true if there was an G_ANYEXT + G_TRUNC combine, but we don't have one (we have G_TRUNC + G_ANYEXT);
I am pretty sure that you can't get this sequence of instructions from llvm-ir input with what we have in the tree. There might be some narrowScalars in the future that could create such sequence but then we can narrow scalar both artifacts and continue from there.

I have only considered what we have in tree and regression tests. Artifacts are cases from switch in tryCombineInstruction(...
excluding G_TRUNC since it jumps to uses of its operand 0 and recursively calls tryCombineInstruction

Anyway since auxiliary artifacts just complicate things i tried similar thing again but instead of having concept of auxiliary artifacts, I check for both inputs and outputs(this is new) of Artifacts that failed to combine and if some of them is in any Observer list I push it to RetryList. Problem with this is that at some point we are left with chain of artifacts and only thing left to do is to try to legalize. Now if we legalize in wrong order and don't try to combine first we might miss combine since legal instruction gets out of Observer Lists and won't be combined.

Regarding the algorithm, I'm not sure if it's always a good idea to wait for the source instructions before legalizing artifacts.

Probably not always, I pretty much extended what happens in MF where all instructions are either legal or widenScalar, Legalize instructions (sources) first, combine artifacts and we are done (at the end we just check that artifact combines generated only legal instructions)

One reason is generated instruction might be expensive for targets. For example, in order to combine G_SEXT, we emit G_SHL and G_ASHR, but we don't check if it's ideal for the target.

I don't understand this one.

Let's assume {G_SEXT, DstTy, SrcTy} is marked as Custom. In this case, if you skip legalization and try to combine them aggressively, the ArtifactCombiner will generate G_SHL and G_ASHR, but it doesn't check if this is something that the target wants. As I said before, currently, we don't have any control over these combines.

The other reason is that you keep processing the artifact that are legal for no reason.

Yes, but algorithm doesn't know if that artifact is legal, it just waits for its input to get legalized and to retry combine.

Yes, maybe that's what we need to do, the ArtifactCombiner can check if it's legal or not.

Artifacts should be combined regardless if they are legal or not, e.g. s32 G_ANYEXT s8 is legal for AArch64 but almost always gets combined.

I din't think that's always a good idea, see my example above (G_SEXT).

So, I think we need a long term solution if we are going to change the legalization strategy, maybe one with a target hook to decide how to process artifacts?

I am not so sure about this one, that hook will have to hold pretty much entire body of runOnMachineFunction because do/while loop is different and observer is different.

I didn't mean the target hook should do the legalization, it can specify some actions for the artifacts that can guide the Legalizer.

Lastly, does D65894 fix your issue? If so, do you think it can be used as a short-term solution?

Hm.. It should fix the not having artifacts to be legal in order to combine them. But it misses to combine artifacts that are legal(RegBankSelect needs G_MERGE and G_UNMERGE legal and is able to deal with them). We could try adding few more combines for start and see how it works.

Is it necessary if they are already legal? The goal is to clean up the instruction generated by unsupported types, not to optimize generated MF. That's should be the combiner's job.

I don't think that matters as long as you process the rest of the instructions before aborting. That's what D65894 does, it keeps processing the instructions in InstList, then tries to combine the illegal artifacts if there is anything new. Do you have a test case that requires more complicated logic?

Yes, D65894 works perfectly fine for non legal artifacts, but it can't deal with legal artifacts that could be combined.
File withD61787.txt in D65199
line 920 in second file, Inspecting Artifact : %10:_(s32), %11:_(s32) = G_UNMERGE_VALUES %6:_(s64)

D65894 would try to legalize that G_UNMERGE and it is legal, that is it, it is not in any observer list any more, we can't combine it.

If we add tryCombine for G_MERGE_VALUES similar to what G_TRUNC does we might be able to combine them, but again that means to combine instruction that is not in any observer List and is legal.

or D61289, see my comment from May 22 2019

Let's assume {G_SEXT, DstTy, SrcTy} is marked as Custom. In this case, if you skip legalization and try to combine them aggressively, the ArtifactCombiner will generate G_SHL and G_ASHR, but it doesn't check if this is something that the target wants. As I said before, currently, we don't have any control over these combines.

Oh that, I agree that targets should get some control over how to combine some artifacts and just as we speak about it D61289 just landed so target gets control over how G_TRUNC+G_SEXT are combined by legalizing G_SEXT_INREG. However I don't thik that is the main focus here as I tried to figure out when is it justified to wait and try to combine again, and when we have to legalize. Again I only considered combines currently availabe, this patch most probably won't work if we try to add more combines (but it works if we narrow scalar artifacts).

The problem with adding more artifact combines is that targets have no control over it, that's why I don't think it's a good idea to delay the legalization of the artifacts as much as possible in order to combine more instructions, unless it's not supported by the target.
Yes, maybe that's what we need to do, the ArtifactCombiner can check if it's legal or not.

Artifacts should be combined regardless if they are legal or not, e.g. s32 G_ANYEXT s8 is legal for AArch64 but almost always gets combined.

I din't think that's always a good idea, see my example above (G_SEXT).

Let's call it detect more combine opportunities instead of add more combines.
We could add hooks for target custom combine but I am not that sure we need them.
I think that Artifact combiner should not care if something is legal or not, it should manage its oberver lists and be able to combine/notify target that combine is available, when two connected artifact appear during legalization.

Is it necessary if they are already legal? The goal is to clean up the instruction generated by unsupported types, not to optimize generated MF. That's should be the combiner's job.

e.g. on AArch64 G_ZEXT, G_SEXT, G_ANYEXT and G_TRUNC are legal for all combinations of most common types (s8, s16, s32, s64) and you will not see them leave legalizer next to each other uncombined. On the other hand uncombined G_MERGE/G_UNMERGE can leave legalizer. I think its desireable to combine everything available, it's is a little extra effort vs. having one extra pass that does same thing but is smarter and can combine everything.

Hi Petar,

Sorry for the slow reply.

I think it’s okay not to combine legal artifacts because 1) targets don’t control over it, and 2) we don’t know if the result of the combine is ideal. Therefore, I don’t think adding more combines would improve the legalizer. We should probably work on the current legalization strategy first before doing something like that. What do you think?

In the mean time, I would like to commit D65894 if you don’t have any concerns about it. It will allow us to legalize the instructions blocked by illegal artifacts without trying to improve the generated code.

I think it’s okay not to combine legal artifacts

Often they get combined before even asking if they are legal or not, there is no strict agreement on what should be combined or not it's kind of random and depends on the order we attempt combines.

Therefore, I don’t think adding more combines would improve the legalizer. We should probably work on the current legalization strategy first before doing something like that. What do you think?

I see two routes atm, one is this patch and no more combines, other is D65894 as it deals with combining non-legal artifacts, and to catch combines of legal artifacts we could add more combines, like D62338 and for example add combine for G_MERGE, similar to G_TRUNC, by inspecting uses of G_MERGE def operands in hope to find G_UNMERGE and perform combine from that G_UNMERGE. With such combines it is possible to visit def before all it uses have been visited and tracking of dead instructions becomes a little difficult. It all comes down to how we menage Observer lists that hold instructions/artifacts.
If we want to handle everything I think it is best to catch Artifact that failed to combine and retry or legalize under some conditions, the conditions when to retry combine or legalize depend on set of available combines(and maybe it would have to be reduced to set of more basic combines).

In the mean time, I would like to commit D65894 if you don’t have any concerns about it. It will allow us to legalize the instructions blocked by illegal artifacts without trying to improve the generated code.

Sure, let's go with D65894 for start.

Petar.Avramovic abandoned this revision.Sep 23 2019, 1:49 AM