Page MenuHomePhabricator

[LSR] Do not hoist IV if it is not post increment case. PR43678
AbandonedPublic

Authored by skatkov on Jul 9 2021, 12:29 PM.

Details

Summary

Hosting IV increment in SCEV expander may cause a wrong SSA form formation in case
this increment is planned to be updated by LSR due to other pieces of LSR formula
will be expanded in old place and replacing an operand of hoisted instruction results in
incorrect SSA form.

Diff Detail

Unit TestsFailed

TimeTest
3,170 msx64 debian > libarcher.barrier::barrier.c
Script: -- : 'RUN: at line 14'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/barrier/barrier.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/barrier/Output/barrier.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/barrier/Output/barrier.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/barrier/Output/barrier.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/barrier/barrier.c
3,170 msx64 debian > libarcher.critical::critical.c
Script: -- : 'RUN: at line 15'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/critical/critical.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/critical.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/critical.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/critical.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/critical/critical.c
3,040 msx64 debian > libarcher.critical::lock-nested.c
Script: -- : 'RUN: at line 15'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/critical/lock-nested.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/lock-nested.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/lock-nested.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/critical/Output/lock-nested.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/critical/lock-nested.c
3,490 msx64 debian > libarcher.races::critical-unrelated.c
Script: -- : 'RUN: at line 13'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/critical-unrelated.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/critical-unrelated.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/deflake.bash /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/critical-unrelated.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/critical-unrelated.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/critical-unrelated.c
3,400 msx64 debian > libarcher.races::lock-nested-unrelated.c
Script: -- : 'RUN: at line 13'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -g -O1 -fsanitize=thread -I /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests -I /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/runtime/src -L /var/lib/buildkite-agent/builds/llvm-project/build/lib -Wl,-rpath,/var/lib/buildkite-agent/builds/llvm-project/build/lib /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/lock-nested-unrelated.c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/lock-nested-unrelated.c.tmp -latomic && env TSAN_OPTIONS='ignore_noninstrumented_modules=0:ignore_noninstrumented_modules=1' /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/deflake.bash /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/lock-nested-unrelated.c.tmp 2>&1 | tee /var/lib/buildkite-agent/builds/llvm-project/build/projects/openmp/tools/archer/tests/races/Output/lock-nested-unrelated.c.tmp.log | /var/lib/buildkite-agent/builds/llvm-project/build/./bin/FileCheck /var/lib/buildkite-agent/builds/llvm-project/openmp/tools/archer/tests/races/lock-nested-unrelated.c
View Full Test Results (14 Failed)

Event Timeline

skatkov created this revision.Jul 9 2021, 12:29 PM
skatkov requested review of this revision.Jul 9 2021, 12:29 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 9 2021, 12:29 PM

Some additional information to help to reviewers, in other words why I think this is a good fix.

LSR, in LSRInstance::OptimizeLoopTermCond(), sets the IVIncInsertPos variable to common dominator of all conditions related to post-increment IV.
Say, if we have a loop exit with IV check which is not latch and increment of this IV is located in latch. The optimization will try to hoist IV increment closer to check to increase the chances to utilize the same physical register for increment and check.

Later, for each IV it adjust insert position in LSRInstance::AdjustInsertPositionForExpand and in this place it insertion position is adjusted taking into account whether IV is post-increment or not. So for post increment all pieces of formula will be expanded before IVIncInsertPos.

However if IV is not post incremented insertion position will not be adjusted.

Later in SCEV expander, there is another attempt to hoist IV increment (this part which is in the patch) and it does not take into account whether current IV is post-increment or not.
it just based on IVIncInsertPos variable which is set once for all IVs processed in LSR.

As a result if current IV processed by SEV expander is not post incremented (but there is another IV with post-increment), IV increment expansion will be hoisted while other parts of formula do not. This causes a bug.

Hi,

I am not sure I understand the issue here.
Could you paste the IR with the wrong transformation (before this patch) for one of the test (e.g., the smaller one)?

Some high-level comments:

  • You can put more than one function in a test if you want to avoid to create two different files?
  • Could you change the filenames of the tests with something more human friendly so that we know what is tested just looking at the filename? (You can put the PR number in the comment in the test)

Cheers,
-Quentin

llvm/test/Transforms/LoopStrengthReduce/pr43678-2.ll
4 ↗(On Diff #357594)

Could you be more explicit about what this test is exercising?

E.g., what IV causes the problem and how it was wrongly transformed (or what it looks like to generate something correct.)

5 ↗(On Diff #357594)

Could you add more check lines?

As is, we only check that we don't crash.

You can use update_test_checks.py

llvm/test/Transforms/LoopStrengthReduce/pr43678.ll
5 ↗(On Diff #357594)

Same comments about the check lines and the explanation of what is being tested here.

Hi,

I am not sure I understand the issue here.
Could you paste the IR with the wrong transformation (before this patch) for one of the test (e.g., the smaller one)?

Hi Quentin, first of all, thank you very much for starting looking into that. I'll update tests soon.

Here is the buggy output for the first case for the current state of LSR:

*** IR Dump After Loop Strength Reduction (loop-reduce) ***
; Preheader:
bb:
  %tmp = bitcast i8* null to i32*
  %tmp1 = load i32, i32* %tmp, align 4
  %tmp2 = bitcast i8* null to i32*
  %tmp3 = load i32, i32* %tmp2, align 4
  br label %bb6

; Loop:
bb6:                                              ; preds = %bb12, %bb
  %lsr.iv = phi i64 [ %lsr.iv.next, %bb12 ], [ -1, %bb ]
  %tmp8 = phi i32 [ %tmp16, %bb12 ], [ %tmp3, %bb ]
  %lsr.iv.next = add nsw i64 %lsr.iv, 1
  %tmp14 = add i32 %tmp8, %tmp1
  %tmp16 = add i32 %0, 1
  %tmp10 = icmp ult i64 %lsr.iv.next, 1048576
  br i1 %tmp10, label %bb12, label %bb11

bb12:                                             ; preds = %bb6
  %0 = add i32 %tmp1, %tmp8
  %tmp15 = select i1 false, i32 %tmp14, i32 %tmp8
  %tmp17 = fcmp olt double 0.000000e+00, 2.270000e+02
  br i1 %tmp17, label %bb6, label %bb4

; Exit blocks
bb11:                                             ; preds = %bb6
  unreachable

bb4:                                              ; preds = %bb12
  %tmp5 = sext i32 %tmp16 to i64
  unreachable
Instruction does not dominate all uses!
  %0 = add i32 %tmp1, %tmp8
  %tmp16 = add i32 %0, 1
in function test

The critical pieces:
%tmp7 is post-increment induction variable.
%tmp8 is another IV which increment value %tmp16 LSR wants to update.

Due to the bug %tmp16 is hoisted to header due to there is a post-increment %tmp7 and we try to hoist all IV before condition of that IV.
Generated instruction %0 is still generated in backedge due to insert position for formula know that %tmp8 is not post-increment IV and we do not need to hoist it to header.

skatkov updated this revision to Diff 358516.Jul 13 2021, 10:22 PM
skatkov edited the summary of this revision. (Show Details)

test update.

I spent way too much time looking at this on Tuesday. Let me summarize my findings.

The basic issue we're running into is that while attempting to expand an LSR formula corresponding to an operand of an IV increment, SCEVExpander goes looking for a profitable expansion in terms of existing IVs. While doing that, it tries to optimize the IVs it finds by hoisting expressions involved in those IVs (to the IVIncInsertPt, but which point doesn't really matter). This movement is itself entirely legal, but it creates a very surprising effect. In this case, the IV being hoisted is exactly the one that LSR was in the middle of expanding a formula for an operand of.

This results in the very surprising result that calling LSRInstance::Expand on an operand of some instruction can result in a Value being returned which does not dominate that instruction. It dominates the location that instruction *used to be at*, but the instruction itself may have itself been hoisted. LSR clearly does not expect this.

SCEVExpander has a very delicate interlock with LSR. LSR puts the expander into both a "literal mode", and a special "LSR mode". LSR expects to be able to cache information about SCEVs and instructions, and have only it's own rewrites invalidate them. This seems like a case where SCEVExpander is violating the implicit contract.

Note that nothing in the description above is specific to post-increment IVs. I think the fact we're happening to see this with post-inc is an artifact. Or at least, I don't currently understand why pre-incs couldn't see the same issue.

In this case, simply disabling hoisting entirely in LSRMode appears to work, and doesn't appear to cause any regressions in the test suite. On the surface, that seems like a reasonable fix.

The point I got stuck was trying to reason about how having SCEVExpander return a Value for the operand of some instruction which no longer dominates said instruction doesn't break users other than LSR as well. I'd just started thinking that through when I got distracted and haven't come to a conclusion on that question yet.

In terms of incrementalism, I'd suggest revising this patch to simply disable hoisting in LSRMode. Doing that appears to work, is at least slightly less narrow in fixing a symptom, and is a reasonable step forward. I'd LGTM that. I could also maybe be convinced to LGTM the current patch if you can make a good argument for why this is actually specific to post-inc IVs.

llvm/test/Transforms/LoopStrengthReduce/pr43678.ll
1 ↗(On Diff #357594)

Please combine these two files into one, and auto-gen the result so that the output gets checked.

I spent way too much time looking at this on Tuesday. Let me summarize my findings.

The basic issue we're running into is that while attempting to expand an LSR formula corresponding to an operand of an IV increment, SCEVExpander goes looking for a profitable expansion in terms of existing IVs. While doing that, it tries to optimize the IVs it finds by hoisting expressions involved in those IVs (to the IVIncInsertPt, but which point doesn't really matter). This movement is itself entirely legal, but it creates a very surprising effect. In this case, the IV being hoisted is exactly the one that LSR was in the middle of expanding a formula for an operand of.

This results in the very surprising result that calling LSRInstance::Expand on an operand of some instruction can result in a Value being returned which does not dominate that instruction. It dominates the location that instruction *used to be at*, but the instruction itself may have itself been hoisted. LSR clearly does not expect this.

SCEVExpander has a very delicate interlock with LSR. LSR puts the expander into both a "literal mode", and a special "LSR mode". LSR expects to be able to cache information about SCEVs and instructions, and have only it's own rewrites invalidate them. This seems like a case where SCEVExpander is violating the implicit contract.

Note that nothing in the description above is specific to post-increment IVs. I think the fact we're happening to see this with post-inc is an artifact. Or at least, I don't currently understand why pre-incs couldn't see the same issue.

In this case, simply disabling hoisting entirely in LSRMode appears to work, and doesn't appear to cause any regressions in the test suite. On the surface, that seems like a reasonable fix.

The point I got stuck was trying to reason about how having SCEVExpander return a Value for the operand of some instruction which no longer dominates said instruction doesn't break users other than LSR as well. I'd just started thinking that through when I got distracted and haven't come to a conclusion on that question yet.

In terms of incrementalism, I'd suggest revising this patch to simply disable hoisting in LSRMode. Doing that appears to work, is at least slightly less narrow in fixing a symptom, and is a reasonable step forward. I'd LGTM that. I could also maybe be convinced to LGTM the current patch if you can make a good argument for why this is actually specific to post-inc IVs.

Hi Philip, first of all thank you to looking to this as well and spending time to git this issue.
I completely agree with your root cause analysis. Thank you for writing this in clear manner.

WRT post-increment - digging.... My feeling is that with post-inc we can get such case. I'' try to write a test with post-inc or prove that it is impossible.
What worries me in disabling hoisting in LSR mode in SCEV expander is that it hoist under check LSR mode. So this was done intentionally...

ok, after some additional investigation it looks like with my patch I effectively disabled hoisting at all.
First of all in places which I updated we are trying to hoist loop increment use of IV.

And it seems loop increment use cannot be marked as post-inc use.
There are two places in the code where use is marked as post-inc:

  1. In IVUsers::AddUsersImpl it is marked under condition IVUseShouldUsePostIncValue is true, where the first check
if (L->contains(User))
  return false;

It means that for IV increment (which is always located in a loop) we always returns false.

void IVStrideUse::transformToPostInc(const Loop *L) {
  PostIncLoops.insert(L);
}

The only use of this function is CondUse->transformToPostInc(L); in LSRInstance::OptimizeLoopTermCond, so we mark condition which is not increment of IV.

So increment of IV cannot be marked as post-inc use and my patch just completely disable hoisting...

I will try to look into the history for the reason why this hoisting was added. If we want to complete it, probably we can check if there is a IV increment user which is planned to be updated by LSR and forbid hoisting of such increment. But it will add a complexity and interaction between LSR and expander.

skatkov added a reviewer: atrick.EditedJul 15 2021, 8:55 PM

Add Andrew who seems fixed the similar issues in 2012 (!!!).
Basing on commit c908b43d9fe12c633216458e992a1ee375d9910c.

skatkov updated this revision to Diff 359215.Jul 15 2021, 9:59 PM
skatkov edited the summary of this revision. (Show Details)

An attempt to make a less intrusive fix.

I've posted https://reviews.llvm.org/D106178 as an alternative. Unless someone strongly objects, I plan to simply remove all the buggy and untested code.

I've posted https://reviews.llvm.org/D106178 as an alternative. Unless someone strongly objects, I plan to simply remove all the buggy and untested code.

I'm fine with that but my limited knowledge of LSR makes me not so comfortable to approve the whole elimination of optimization.
But in general I agree with all your notes here and in D106178.

Hi all,

Given Philip (@reames)'s findings, it seems reasonable to me to go with what Philip is suggesting: removing the code that violate the contract between LSR and SCEVExpander.

Cheers,
-Quentin

Hi all,

Given Philip (@reames)'s findings, it seems reasonable to me to go with what Philip is suggesting: removing the code that violate the contract between LSR and SCEVExpander.

Cheers,
-Quentin

Given that, can you approve Philip's patch?

Given that, can you approve Philip's patch?

Done!