This is an archive of the discontinued LLVM Phabricator instance.

[CodeGenPrepare] Fix ModifiedDT flag in optimizeSelectInst
ClosedPublic

Authored by xur on Mar 8 2019, 9:48 AM.

Details

Summary

r344412 fixed a huge compile time regression but it needs ModifiedDT flag
to be maintained correctly in optimizations in optimizeBlock() and optimizeInst().

optimizeSelectInst() does not updated the flag ( it updates a different ModifiedDT in CodeGenPrepare class).

This patche propagates the flag in optimizeSelectInst() back to optimizeBlock().

I also change the name of ModifiedDT in CodeGenPrepare class to avoid confusion. It seems ModifiedDT in CodeGenPrepare class is not being used anywhere. We may want to delete it.

Diff Detail

Event Timeline

xur created this revision.Mar 8 2019, 9:48 AM

Thanks for the fix. A couple comments below.

lib/CodeGen/CodeGenPrepare.cpp
293–294

I'd either remove this completely, since it isn't used, or migrate to using this instead of passing around a flag parameter.

7248

I think this should still set ModifiedDT so that the caller handles it appropriately (see callsite in optimizeBlock).

The comment doesn't make sense to me...might be stale?

xur marked 2 inline comments as done.Mar 8 2019, 10:11 AM
xur added inline comments.
lib/CodeGen/CodeGenPrepare.cpp
293–294

I think remove this field is better. The existing uses of this field will be kept in a ref parameter.

7248

We can remove all this.

xur updated this revision to Diff 189877.Mar 8 2019, 10:14 AM

Integrated Teresa's comments.

Also using spatel's test in Bug 41004 which is much cleaner.
My test was using produced by bugpoint (reduced from clang PGO bootstrap) and needs branchweight.

Just got a report of an internal test failure from my commit, let me see if your patch fixes it.

lib/CodeGen/CodeGenPrepare.cpp
2045

Should this set the new ModifiedDT?

Just got a report of an internal test failure from my commit, let me see if your patch fixes it.

This patch fixes the test failure.

xur added inline comments.Mar 8 2019, 10:32 AM
lib/CodeGen/CodeGenPrepare.cpp
2045

Sorry for missing this one. We should.
I will update the patch.

xur updated this revision to Diff 189881.Mar 8 2019, 10:33 AM

add one missing updates in last version of the patch.

This revision is now accepted and ready to land.Mar 8 2019, 10:42 AM
spatel added a comment.EditedMar 8 2019, 10:56 AM

Minor: it would be better to make the test in IR (opt -codegenprepare), so it would live under test/Transforms/CodeGenPrepare/X86/. That's because this isn't really an x86 bug. It could affect any target, and the bug is really one of an IR transform. Along those lines, I would include auto-generated FileCheck output too even if the IR for this example is semi-useless, just so we know exactly what is coming out of CGP.

xur updated this revision to Diff 189894.Mar 8 2019, 11:39 AM

changed test case based on spatel's comments.

spatel added inline comments.Mar 8 2019, 12:38 PM
test/Transforms/CodeGenPrepare/X86/optimizeSelect-DT.ll
23 ↗(On Diff #189894)

typo - CEECK
You could avoid this kind of bug by using the script at utils/update_test_checks.py.

xur updated this revision to Diff 189923.Mar 8 2019, 1:56 PM

using utils/update_test_checks.py to generate the test checks.

thanks the tip from spatel!

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptMar 8 2019, 2:48 PM

This change caused a compile time regression (that I referenced at the end of my summary in D59696). In this case, there are huge numbers of select instructions. After this change, since we now update the ModifiedDT correctly, the loop over the function in CodeGenPrepare::runOnFunction will break after each select is optimized, due to ModifiedDT being set. As mentioned in D59696, even after making the DT build lazy, there is still a large regression because of the number of times we re-walk the function.

I have a couple possible solutions, and am looking for guidance on what is preferable.

  1. Reset the DT instead of existing the function walk. Now that D59696 is in and the DT is built lazily, instead of setting ModifiedDT true and breaking out of the function walk in runOnFunction() (which was previously required due to the DT being built for each function walk), we can simply reset the DT unique_ptr, which will force a rebuild of DT when it is next needed:

diff --git a/lib/CodeGen/CodeGenPrepare.cpp b/lib/CodeGen/CodeGenPrepare.cpp
index 9d642ba245c..bd9cc5e4158 100644

  • a/lib/CodeGen/CodeGenPrepare.cpp

+++ b/lib/CodeGen/CodeGenPrepare.cpp
@@ -5962,7 +5962,7 @@ bool CodeGenPrepare::optimizeSelectInst(SelectInst *SI, bool &ModifiedDT) {

  !isFormingBranchFromSelectProfitable(TTI, TLI, SI))
return false;
  • ModifiedDT = true;

+ DT.reset();

// Transform a sequence like this:
//    start:

In my test case, since apparently there are many more select instructions (which don't use the DT to optimize) than the type of instructions that need the DT, this fixes the regression. It may not always have an effect, say if selects were interspersed with instructions that need the DT to optimize, but it won't make matters worse than the current status quo (which would cause the function iteration to exit and the DT to be reset there). It's possible that other places that set ModifiedDT could get the same treatment, I just tried in optimizeSelectInst since that was the place affected by this patch and that caused my regression.

  1. Optimize select insts before iterative function walk. Similar to how some of the other instructions are optimized before this iterative walk (e.g. splitBranchCondition()), since select instructions don't need the DT, and presumably don't need iterative optimization, we could do the select optimizations earlier:

diff --git a/lib/CodeGen/CodeGenPrepare.cpp b/lib/CodeGen/CodeGenPrepare.cpp
index 9d642ba245c..b2a0c8a9570 100644

  • a/lib/CodeGen/CodeGenPrepare.cpp

+++ b/lib/CodeGen/CodeGenPrepare.cpp
@@ -465,6 +465,15 @@ bool CodeGenPrepare::runOnFunction(Function &F) {

// to help generate sane code for PHIs involving such edges.
EverMadeChange |= SplitIndirectBrCriticalEdges(F);

+ for (Function::iterator I = F.begin(); I != F.end(); ) {
+ BasicBlock *BB = &*I++;
+ CurInstIterator = BB->begin();
+ while (CurInstIterator != BB->end()) {
+ if (SelectInst *SI = dyn_cast<SelectInst>(&*CurInstIterator++))
+ EverMadeChange |= optimizeSelectInst(SI, ModifiedDT);
+ }
+ }
+

bool MadeChange = true;
while (MadeChange) {
  MadeChange = false;

Presumably this would be refactored into a separate function, similar to splitBranchCondition, the above was just a quick hack to test. The advantage of this second approach is that it will improve compile time in any other cases involving select instructions interspersed with instructions that need the DT to optimize.

However - this only eliminates 2/3 of the additional CGP overhead. Since I didn't remove the call to optimizeSelectInst from optimizeInst in this case, we may be finding additional selects to optimize during the iterative function optimization stage.

  1. Similar to 2 but do the select optimization once per function walk:

diff --git a/lib/CodeGen/CodeGenPrepare.cpp b/lib/CodeGen/CodeGenPrepare.cpp
index 9d642ba245c..2241d4af206 100644

  • a/lib/CodeGen/CodeGenPrepare.cpp

+++ b/lib/CodeGen/CodeGenPrepare.cpp
@@ -468,6 +468,14 @@ bool CodeGenPrepare::runOnFunction(Function &F) {

bool MadeChange = true;
while (MadeChange) {
  MadeChange = false;

+ for (Function::iterator I = F.begin(); I != F.end(); ) {
+ BasicBlock *BB = &*I++;
+ CurInstIterator = BB->begin();
+ while (CurInstIterator != BB->end()) {
+ if (SelectInst *SI = dyn_cast<SelectInst>(&*CurInstIterator++))
+ MadeChange |= optimizeSelectInst(SI, ModifiedDT);
+ }
+ }

DT.reset();
for (Function::iterator I = F.begin(); I != F.end(); ) {
  BasicBlock *BB = &*I++;

This removes the remaining overhead.

If #1 is a one-liner that doesn't change any existing tests, that sounds like a good choice.
#2 / #3 also sound like good options. IMO, the fact that we have or need to make that kind of structural change points out that CGP itself has gotten too big. According to its lead comment, this is supposed to be temporary spot for hacks prohibited by SDAG (ie, once we switch to global-isel, CGP should go away)...but CGP has become several independent passes in 1 file. The bigger change to correct these kinds of problems would be to split things into multiple passes.

If #1 is a one-liner that doesn't change any existing tests, that sounds like a good choice.

Basically, yes (I'd also remove the ModifiedDT parameter to that routine).

#2 / #3 also sound like good options. IMO, the fact that we have or need to make that kind of structural change points out that CGP itself has gotten too big. According to its lead comment, this is supposed to be temporary spot for hacks prohibited by SDAG (ie, once we switch to global-isel, CGP should go away)...but CGP has become several independent passes in 1 file. The bigger change to correct these kinds of problems would be to split things into multiple passes.

Yeah it seems that the structure is has gotten really unwieldy and haphazard (i.e. when the function walk needs to be completely restarted and which parts need to be iterative isn't clear to me).

I will go ahead and send a patch for #1 for now since it is very simple and addresses this regression specifically, by undoing the iteration change provoked by this fix.