This is an archive of the discontinued LLVM Phabricator instance.

[Legalizer] Making artifact combining order-independent
ClosedPublic

Authored by rtereshin on Dec 12 2019, 5:28 PM.

Details

Summary

Legalization algorithm is complicated by two facts:

  1. While regular instructions should be possible to legalize in an isolated, per-instruction, context-free manner, legalization artifacts can only be eliminated in pairs, which could be deeply, and ultimately arbitrary nested: { [ () ] }, where which paranthesis kind depicts an artifact kind, like extend, unmerge, etc. Such structure can only be fully eliminated by simple local combines if they are attempted in a particular order (inside out), or alternatively by repeated scans each eliminating only one innermost pair, resulting in O(n^2) complexity.
  2. Some artifacts might in fact be regular instructions that could (and sometimes should) be legalized by the target-specific rules. Which means failure to eliminate all artifacts on the first iteration is not a failure, they need to be tried as instructions, which may produce more artifacts, including the ones that are in fact regular instructions, resulting in a non-constant number of iterations required to finish the process.

I trust the recently introduced termination condition (no new artifacts
were created during as-a-regular-instruction-retrial of artifacts not
eliminated on the previous iteration) to be efficient in providing
termination, but only performing the legalization in full if and only if
at each step such chains of artifacts are successfully eliminated in
full as well.

Which is currently not guaranteed, as the artifact combines are applied
only once and in an arbitrary order that has to do with the order of
creation or insertion of artifacts into their worklist, which is a no
particular order.

In this patch I make a small change to the artifact combiner, making it
to re-insert into the worklist immediate (modulo a look-through copies)
artifact users of each vreg that changes its definition due to an
artifact combine.

Here the first scan through the artifacts worklist, while not
being done in any guaranteed order, only needs to find the innermost
pair(s) of artifacts that could be immediately combined out. After that
the process follows def-use chains, making them shorter at each step, thus
combining everything that can be combined in O(n) time.

Diff Detail

Event Timeline

rtereshin created this revision.Dec 12 2019, 5:28 PM

Looks like a reasonable change to me - LGTM . Maybe wait a couple days for others to give feedback.

This revision is now accepted and ready to land.Dec 12 2019, 6:45 PM

missed a test case update (arm64-fallback.ll)

nhaehnle removed a subscriber: nhaehnle.Dec 13 2019, 5:46 AM

Adding // Adding Use to ArtifactList. comment in front of WrapperObserver.changedInstr(Use); as requested, NFC.

paquette accepted this revision.Dec 13 2019, 11:48 AM

LGTM aside from some places where a couple comments can be added.

llvm/include/llvm/CodeGen/GlobalISel/LegalizationArtifactCombiner.h
481

Can we have a comment explaining what the purpose of UpdatedDefs is?

E.g. "Store the refs that were changed by a combine, so that we can..."

507–508

It's not immediately obvious to me why this is only happening for G_TRUNC. A comment would be helpful here. :)

516

A comment explaining what you're doing here at a high level would be nice too.

rtereshin marked 2 inline comments as done.Dec 13 2019, 12:00 PM
rtereshin added inline comments.
llvm/include/llvm/CodeGen/GlobalISel/LegalizationArtifactCombiner.h
481

Sure! I'll come up with something shortly.

516

Not entirely sure how would I shape it. You mean repeating all the same stuff described in the commit message and the test?

rtereshin marked 6 inline comments as done.Dec 13 2019, 3:15 PM
rtereshin added subscribers: craig.topper, arsenm.
rtereshin added inline comments.
llvm/include/llvm/CodeGen/GlobalISel/LegalizationArtifactCombiner.h
481
diff --git a/llvm/include/llvm/CodeGen/GlobalISel/LegalizationArtifactCombiner.h b/llvm/include/llvm/CodeGen/GlobalISel/LegalizationArtifactCombiner.h
index 3946f2ff0af..58d76fc56ff 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/LegalizationArtifactCombiner.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/LegalizationArtifactCombiner.h
@@ -478,6 +478,10 @@ public:
     if (!DeadInsts.empty())
       deleteMarkedDeadInsts(DeadInsts, WrapperObserver);
 
+    // Put here every vreg that was redefined in such a way that it's at least
+    // possible that one (or more) of its users (immediate or COPY-separated)
+    // could become artifact combinable with the new definition (or the
+    // instruction reachable from it through a chain of copies if any).
     SmallVector<Register, 4> UpdatedDefs;
     bool Changed = false;
     switch (MI.getOpcode()) {
507–508

I believe I'm applying to G_TRUNC here the same change I'm applying to every other opcode. It looks different because the logic the change is applied to is different for G_TRUNC.

That difference seems to stem mostly from 2 prior commits:

  1. https://github.com/llvm/llvm-project/commit/e6201c8724dbce94969bc0b7f81e950cdbb6f742 ("[GISel]: Rework legalization algorithm for better elimination of" by @aditya_nandakumar)
  2. https://github.com/llvm/llvm-project/commit/a5376f6322132e3b0664de55348f6bbba1fabd00 ("[GlobalISel][AArch64][AMDGPU][X86] Teach LegalizationArtifactCombiner to combine trunc(g_constant)." by @craig.topper)

(1) is of most interest here. It looks like prior to the current patch being reviewed here this behavior (re-visiting the users of G_TRUNC) was needed for the legalization algorithm to work for similar reasons to what I'm driven by here in this patch.

There is also an interesting statement in (1)'s commit message:

Currently, the target needs to say all combinations of extends and truncs are legal

I think with this patch here we finally completely eliminate that requirement, but targets (most notably AMDGPU @arsenm) seem to still have it in their legalization rules, e.g. claiming that a trunc to s8 is legal, while it's actually not. What that leads to is that if I remove this special logic for G_TRUNC, the legalization for AMDGPU produces quite a bit of truncates to s8 and extends from s8 (and s1 even more so), which is technically correct because all of those are declared by the target as legal. Not a single test fails to legalize or select either, so those truncates and extends are either getting combined out after the legalization, or AMDGPU is missing integration tests running the entire GlobalISel, or I'm mistaken and those are in fact actually legal and can be selected, I just don't know. I definitely intend to let the targets to figure out what they want to do with that.

There is also a possibility that some targets try to make GlobalISel work with some of the artifacts (e.g. G_TRUNC to s1) being legal in a context-dependent way, only if they are used only by a specific subset of opcodes. Perhaps in such cases that may force them (or seemingly force them) to call such artifacts legal, but they still need them to be combined away in every case when it's possible, even for correctness. Not sure.

There is also another thing here. There are some instructions which can be both combined out or legalized. Technically, the legalizer's job is to just produce legal code. Practically speaking though, provided such an alternative, it's probably better to combine them out instead, as that tends to improve the quality of the code generated *and* improve compile time both. We probably shouldn't try to do such combines (improving both the codegen and compile time) during legalization on purpose and try too hard to find more of them, but if they come naturally around, like I think in this case with G_TRUNC, I wouldn't try too hard to remove them either.

diff --git a/llvm/include/llvm/CodeGen/GlobalISel/LegalizationArtifactCombiner.h b/llvm/include/llvm/CodeGen/GlobalISel/LegalizationArtifactCombiner.h
index 58d76fc56ff..7564736f291 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/LegalizationArtifactCombiner.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/LegalizationArtifactCombiner.h
@@ -504,8 +504,13 @@ public:
       break;
     case TargetOpcode::G_TRUNC:
       Changed = tryCombineTrunc(MI, DeadInsts, UpdatedDefs);
-      if (!Changed)
+      if (!Changed) {
+        // Try to combine truncates away even if they are legal. As all artifact
+        // combines at the moment look only "up" the def-use chains, we achieve
+        // that by throwing truncates' users (with look through copies) into the
+        // ArtifactList again.
         UpdatedDefs.push_back(MI.getOperand(0).getReg());
+      }
       break;
     }
     while (!UpdatedDefs.empty()) {
516

Okay, given the commit message, the tests (especially the unit test and the comment in it), I came up with the following comments here in-code, trying to bring some more information to the table instead of just repeating all the same stuff:

diff --git a/llvm/include/llvm/CodeGen/GlobalISel/LegalizationArtifactCombiner.h b/llvm/include/llvm/CodeGen/GlobalISel/LegalizationArtifactCombiner.h
index 7564736f291..4c2ab316930 100644
--- a/llvm/include/llvm/CodeGen/GlobalISel/LegalizationArtifactCombiner.h
+++ b/llvm/include/llvm/CodeGen/GlobalISel/LegalizationArtifactCombiner.h
@@ -513,11 +513,16 @@ public:
       }
       break;
     }
+    // If the main loop through the ArtifactList found at least one combinable
+    // pair of artifacts, not only combine it away (as done above), but also
+    // follow the def-use chain from there to combine everything that can be
+    // combined within this def-use chain of artifacts.
     while (!UpdatedDefs.empty()) {
       Register NewDef = UpdatedDefs.pop_back_val();
       assert(NewDef.isVirtual() && "Unexpected redefinition of a physreg");
       for (MachineInstr &Use : MRI.use_instructions(NewDef)) {
         switch (Use.getOpcode()) {
+        // Keep this list in sync with the list of all artifact combines.
         case TargetOpcode::G_ANYEXT:
         case TargetOpcode::G_ZEXT:
         case TargetOpcode::G_SEXT:
@@ -533,6 +538,11 @@ public:
             UpdatedDefs.push_back(Copy);
           break;
         }
+        default:
+          // If we do not have an artifact combine for the opcode, there is no
+          // point in adding it to the ArtifactList as nothing interesting will
+          // be done to it anyway.
+          break;
         }
       }
     }

Hope that works.

rtereshin updated this revision to Diff 233889.Dec 13 2019, 3:24 PM
rtereshin marked an inline comment as done.

Addressing the feedback, NFC.

This revision was automatically updated to reflect the committed changes.