This is an archive of the discontinued LLVM Phabricator instance.

Add CanonicalizeFreezeInLoops pass
ClosedPublic

Authored by aqjune on Apr 5 2020, 11:18 PM.

Details

Summary

If an induction variable is frozen and used, SCEV yields imprecise result
because it doesn't say anything about frozen variables.

Due to this reason, performance degradation happened after
https://reviews.llvm.org/D76483 is merged, causing
SCEV yield imprecise result and preventing LSR to optimize a loop.

The suggested solution here is to add a pass which canonicalizes frozen variables
inside a loop. To be specific, it pushes freezes out of the loop by freezing
the initial value and step values instead & dropping nsw/nuw flags from instructions used by freeze.
This solution was also mentioned at https://reviews.llvm.org/D70623 .

Diff Detail

Event Timeline

aqjune created this revision.Apr 5 2020, 11:18 PM
aqjune updated this revision to Diff 255229.Apr 5 2020, 11:47 PM

clang-format

aqjune updated this revision to Diff 255245.Apr 6 2020, 1:00 AM

Minor updates (capitalize pass name)

nikic added a subscriber: nikic.Apr 6 2020, 1:18 AM
efriedma added inline comments.Apr 6 2020, 11:41 AM
llvm/lib/Transforms/Utils/CanonicalizeFreezeInLoops.cpp
117

What guarantees that SteppingInst is an add or sub?

125

If StepBB is inside the loop, it can't be the preheader.

aqjune updated this revision to Diff 255434.Apr 6 2020, 12:13 PM
aqjune marked an inline comment as done.

Remove L->getLoopPreheader() != StepBB check, add a test for this (with minor updates too)

aqjune marked an inline comment as done.Apr 6 2020, 12:13 PM
aqjune added inline comments.
llvm/lib/Transforms/Utils/CanonicalizeFreezeInLoops.cpp
117

It is L->isAuxiliaryInductionVariable.

efriedma added inline comments.Apr 6 2020, 2:07 PM
llvm/lib/Transforms/Utils/CanonicalizeFreezeInLoops.cpp
100

OpIdx is unused?

126

It looks like isAuxiliaryInductionVariable checks that the step is loop-invariant?

I might have missed this but what was the reason we don't do this as part of LICM?

llvm/lib/Transforms/Utils/CanonicalizeFreezeInLoops.cpp
72

Is there a good reason not to make this a struct with named members so the values and use sites have meaning?

75

auto *BB if these are pointers.

100

You shadow PN even though you capture everything. Choose a different name, return the PHI, or use the captured version of PN?

115

Please initialize PN with null for the future, similar SteppingInst.

117

I have a bad feeling about this but if people think it's ok I'm fine with an assert.

130

Shouldn't we teach SCEV about freeze instead?

143

Style: Single DEBUG? If you really want multiple dbgs() make it LLVM_DEBUG({ ... });

172

Do we handle the case where both are GuaranteedNotToBeUndefOrPoison somewhere else? In that case we don't need to drop the poison generating flags, right?

aqjune updated this revision to Diff 255587.Apr 6 2020, 9:58 PM
aqjune marked 9 inline comments as done.

Address comments

llvm/lib/Transforms/Utils/CanonicalizeFreezeInLoops.cpp
72

Tuple was used because its types seemed to represent what they stand for.

126

A semantically loop invariant instruction may reside in a loop. For example, if it is division of two loop-invariant variables, it is also loop invariant but cannot be hoisted.

130

I think we have two kinds of analyses conceptually: must-be analysis (any program execution should satisfy the analysis result) and may-be analysis (there exists an execution that satisfies the result). SCEV analysis seems to satisfy both.
In case of freeze, if SCEV result of freeze(val) is defined as SCEV of val, must-be analysis becomes problematic; freeze(val) can be any value if val was poison. So, teaching SCEV about freeze may not fully recover precision in general.

172

It can be poison if summation overflowed. But it seems SteppingInst can be non-poison in certain case (e.g. it was used by divison; it is UB if poison), so added GuaranteedNotToBeUndefOrPoison check on SteppingInst.

I might have missed this but what was the reason we don't do this as part of LICM?

The motivation was that freezes inserted by DivRemPairs should be moved out of the loop so LoopStrengthReduce (in TargetPassConfig::addIRPasses) optimizes successfully.
Currently LICM isn't there between the two. There is SimplifyCFGPass , but it does not have dependency on loop analyses and it fires regardless of whether it is LTO or not (whereas the regression happened only when LTO was enabled), so seemed it wasn't a good candidate for having this.
In terms of reusability of this pass, I believe this pass can be inserted after e.g. passes like LoopUnswitch inserted freeze.

jdoerfert added inline comments.Apr 6 2020, 10:42 PM
llvm/lib/Transforms/Utils/CanonicalizeFreezeInLoops.cpp
72

Sure, a freeze inst, a phi node, a binary operator and a value. You have to see Value *StepValue = std::get<3>(Candidate); to guess what the value is. If you want to access the phi (as an Instruction or auto) you have to go back to the vector definition to determine the index. These are (IMHO) good reason against a tuple. Are there any benefits?

172

I see.

aqjune updated this revision to Diff 255595.Apr 6 2020, 11:08 PM

Move to a named struct from tuple

aqjune updated this revision to Diff 255597.Apr 6 2020, 11:09 PM

Minor update to a comment

aqjune marked 2 inline comments as done.Apr 6 2020, 11:10 PM
aqjune added inline comments.
llvm/lib/Transforms/Utils/CanonicalizeFreezeInLoops.cpp
72

Had no special preference for it, so moved to a struct with names.

fhahn added inline comments.Apr 7 2020, 5:11 AM
llvm/lib/Transforms/Utils/CanonicalizeFreezeInLoops.cpp
11

'use induction variables' is a bit imprecise IMO; there are no induction variables in IR. You could say 'freeze instructions that use either an induction PHI or the corresponding 'step' instruction (= the incoming value from the loop latch of the induction PHI)'

63

nit: A struct.

86

An alternative approach would be to look at the phis of the loop header and use InductionDescriptor::isInductionPHI to identify induction PHIs and their connected 'step' instruction. Then look at their users to see if they contain a freeze instruction. Candidates would be {PHI, IVDescripter, [list of freeze users to eliminate]}.

I think this potentially could simplify the code a bit. The alternative has a slightly different compile-time cost: with the alternative approach, you always have to pay to cost to find the induction PHIs (shouldn't be too costly, as the information is available in SE already), with the current approach you always have to iterate over all instructions in the loop. But I think that's unlikely to really matter in the grand scheme of things. The current approach potentially leads to duplicated induction checks currently.

The alternative approach would also handle the case where we have multiple freeze instructions using the same induction PHIs/steps without having multiple candidates, meaning each induction PHI/step would be frozen exactly once, not multiple times as with the current approach.

93

There's a test case missing for that scenario. Also, I think removing FI here will invalidate your iterator over the BB, probably causing a crash. You can work around that by using make_early_inc_range(*BB) and iterating over that.

But I am not sure we actually need to do this here, as freeze without users should be cleaned up by the existing DCE, right? Not sure if we need to handle it here.

151

I think it would be good to be a bit more descriptive in the debug message, e.g. it could say something like 'Pushing freeze ' << %FI <<
through ' << *SteppingInst << ' and ' << *PN << ' outside of loop' ,

155

Is this overly conservative? The transform ensures that both operands won't be poison, right? Would it be enough to check if the instruction may result in poison with 2 non-poison operands?

164

I think currently cases where we have multiple candidates for the same PHI are not handled correctly at the moment here, as after handling the first candidate the step operand will not match the original StepValue. Multiple candidates would be added for a loop like the one below I think.

loop:
  %i = phi i32 [%init, %entry], [%i.next, %loop]
  %i.fr = freeze i32 %i
  call void @call(i32 %i.fr)
  %i.next = add nsw nuw i32 %i, %step
  %i.next.fr = freeze i32 %i.next
  %cond = icmp eq i32 %i.next, %n
  call void @call(i32 %i.next.fr)
  br i1 %cond, label %loop, label %exit

Don't wait for me on this, ping me if needed.

aqjune marked an inline comment as done.Apr 8 2020, 9:21 AM

Oh, I was busy today and didn't have enough time to update this patch. Maybe I can update this tomorrow.

aqjune updated this revision to Diff 257644.Apr 15 2020, 2:52 AM

I found that dealing with multiple phis had problems, so had to change the
algorithm.
Rather than having a candidate for each (auxiliary) induction PHI, now it
just maintains three sets:

  • InstsToDropFlags: instructions to drop flags like nsw/nuw
  • FIsToRemove: Freeze insts to remove
  • UsesToInsertFr: Uses (Value, User) which should be frozen so

it becomes (freeze(Value), User). The frozen values are placed at
the loop preheader.

What CanonicalizeFreezeInLoopsImpl::run does is:

(1) Gets reachable instructions from auxiliary induction PHIs (upto MaxDepth step)
(2) For each freeze among the reachable instructions, see whether it can be
pushed out of the loop by seeing instructions among paths from PHI to the freeze
whether their flags can be dropped / relevant users can be frozen. If not, the
freeze cannot be pushed out of the loop.
(3) For all 'pushable' freezes, push them out of the loop.

aqjune marked 3 inline comments as done and 4 inline comments as done.Apr 15 2020, 3:06 AM
aqjune added inline comments.
llvm/lib/Transforms/Utils/CanonicalizeFreezeInLoops.cpp
164

Now this case will be correctly handled. Have a test add_multiuses2.

aqjune marked 19 inline comments as done.Apr 15 2020, 7:21 PM

Marked reflected/old comments as done

llvm/lib/Transforms/Utils/CanonicalizeFreezeInLoops.cpp
11

replaced it with 'induction PHI' in all places.

86

Now the code iterates over phis first, as you suggested

117

The assertion has been removed

aqjune edited the summary of this revision. (Show Details)Apr 15 2020, 7:22 PM
aqjune edited the summary of this revision. (Show Details)

I can report that in our testing on SPEC 2017, this pass fixes the regression to mcf introduced with D76483.

This patch gives an uplift of 1.5% on mcf_r; other benchmarks in SPEC 2017 intrate are between +0.4% and -0.2%.

We also looked into what seemed to be a regression of at most 1.5% on xalancbmk on some different hardware but this seems to be within normal fluctuations of this particular benchmark, with no changes apparent in the relevant (performance critical) areas of the code.

From a performance point of view this is good to go, but I'd like to leave an LGTM to others.

aqjune updated this revision to Diff 260093.Apr 25 2020, 7:05 AM
aqjune marked 2 inline comments as done.

Rebase

jdoerfert added a comment.EditedApr 30 2020, 7:00 AM

I'm asking myself if it wouldn't be better to start with the freeze instructions and work our way up through the operands. At the end of the day we do a lot of work just to figure out there is no freeze or the instruction chains we build do not end in one. Am I missing something?

Also, could this drop flags of instructions not on the path between a phi and a freeze? So do we keep the nsw on the sub and mul in the following example:

%iv = phi [0, ...] [%step, ...]
%step = add nsw %iv, 1
%f = freeze %step
%not_step1 = sub nsw %iv, 1
%not_step2 = mul nsw %step, 2
llvm/lib/Transforms/Utils/CanonicalizeFreezeInLoops.cpp
78

Nit: I would call this canHandleInst or something similar. We can deal with various instructions I guess, some require us to drop flags but others not, e.g., freeze or bitcast.

91

What happens if the step value looks like this:

%iv = phi [0, ...] [%step, ...]
%step = add %iv, %iv

Just checking that we don't have a problem in this case.

123

Style: for (const auto &U : I->users()) ? (w/ or w/o const)

138

Is the find stuff necessary or could you just check the return value of the insert call?

141

I think we often avoid such recursion in favor of worklists but if this doesn't show on the profile it's OK I guess.

151

true is an unsigned for the callee, maybe /* CurDepth */ 1 instead?

166

Do we really need a queue or would LIFO (=smallvector) also work?

282

Nit: I guess we could make all these class members, right? Unclear if that is better though.

Nit: DenseMap instead of std::map?

^ Both should be considered but can be kept this way too.

303

Nit: StepI is checked twice for null.
Style: I'd use the braces around the inner conditional instead

313

const auto &Pair

I think const and * or & should be combined with auto whenever possible.

aqjune added a comment.May 1 2020, 2:07 AM

Hi,

I'm asking myself if it wouldn't be better to start with the freeze instructions and work our way up through the operands. At the end of the day we do a lot of work just to figure out there is no freeze or the instruction chains we build do not end in one. Am I missing something?

It's true, but we should iterate over instructions inside the loop to find freeze instructions.
Indexing freezes that are uses of induction variables will be helpful for exiting early; right now, DivRemPair is the place where it introduces freezes, but it does not do relevant conversion in x86-64, so the code will have almost zero freeze instructions.
Would it make sense if IVUsers maintains the info (existence of freeze)? I'm not familiar with the class, but may have a look.

Also, could this drop flags of instructions not on the path between a phi and a freeze? So do we keep the nsw on the sub and mul in the following example:

%iv = phi [0, ...] [%step, ...]
%step = add nsw %iv, 1
%f = freeze %step
%not_step1 = sub nsw %iv, 1
%not_step2 = mul nsw %step, 2

The nsw flags of sub and mul will be preserved. I'll add this as a test.

Hi,

I'm asking myself if it wouldn't be better to start with the freeze instructions and work our way up through the operands. At the end of the day we do a lot of work just to figure out there is no freeze or the instruction chains we build do not end in one. Am I missing something?

It's true, but we should iterate over instructions inside the loop to find freeze instructions.
Indexing freezes that are uses of induction variables will be helpful for exiting early; right now, DivRemPair is the place where it introduces freezes, but it does not do relevant conversion in x86-64, so the code will have almost zero freeze instructions.
Would it make sense if IVUsers maintains the info (existence of freeze)? I'm not familiar with the class, but may have a look.

It might but that means maintaining state, which is hard and error prone.

I was thinking more along the lines of this:

for (auto *I : L->instruction())
  if (auto *Fr = dyn_cast<Freeze>(I))
    handleFreeze(Fr);

handleFreeze(L, Fr) { 
Worklist = {Fr};
while (!Worklist.empty()) {
  Inst =  dyn_cast<Instruction>(Worklist.pop())
  if (!Inst || !L->contains(Inst) || !Visited.insert(V))
    continue;
  if (isSpecialPHI(V))
    // found special PHI that ends in a freeze, do record it
  for (auto &Op : Inst->operands())
    Worklist.push_back(Op);
}
}

The main difference is that we are performing a single traversal of the loop + work per freeze instruction that is bounded by the def-use chain ending in the freeze and contained in the loop.

Also, could this drop flags of instructions not on the path between a phi and a freeze? So do we keep the nsw on the sub and mul in the following example:

%iv = phi [0, ...] [%step, ...]
%step = add nsw %iv, 1
%f = freeze %step
%not_step1 = sub nsw %iv, 1
%not_step2 = mul nsw %step, 2

The nsw flags of sub and mul will be preserved. I'll add this as a test.

Thx!

fhahn added a comment.May 2 2020, 4:46 AM

Hi,

I'm asking myself if it wouldn't be better to start with the freeze instructions and work our way up through the operands. At the end of the day we do a lot of work just to figure out there is no freeze or the instruction chains we build do not end in one. Am I missing something?

It's true, but we should iterate over instructions inside the loop to find freeze instructions.
Indexing freezes that are uses of induction variables will be helpful for exiting early; right now, DivRemPair is the place where it introduces freezes, but it does not do relevant conversion in x86-64, so the code will have almost zero freeze instructions.
Would it make sense if IVUsers maintains the info (existence of freeze)? I'm not familiar with the class, but may have a look.

It might but that means maintaining state, which is hard and error prone.

I was thinking more along the lines of this:

for (auto *I : L->instruction())
  if (auto *Fr = dyn_cast<Freeze>(I))
    handleFreeze(Fr);

handleFreeze(L, Fr) { 
Worklist = {Fr};
while (!Worklist.empty()) {
  Inst =  dyn_cast<Instruction>(Worklist.pop())
  if (!Inst || !L->contains(Inst) || !Visited.insert(V))
    continue;
  if (isSpecialPHI(V))
    // found special PHI that ends in a freeze, do record it
  for (auto &Op : Inst->operands())
    Worklist.push_back(Op);
}
}

The main difference is that we are performing a single traversal of the loop + work per freeze instruction that is bounded by the def-use chain ending in the freeze and contained in the loop.

I guess it depends on what kinds of freeze instructions we are looking for exactly.

IIRC the earlier version of the patch only considered freeze instructions where the operand is either an induction PHI or the step instruction. If we only look for those cases, checking only the users of the PHI and the step instruction potentially avoids having to iterate over large loop bodies (also, parent loops contain all blocks of their child loops, so I think iterating over all instructions in a loop would mean revisiting all instructions in child loops).

If we are looking for any freeze that is reachable from an induction PHI, then the benefit of starting at a PHI will probably much less.

But I thought the problem that this pass is supposed to solve is quite narrow: if there is a freeze in a cycle involving an induction PHI, push the freeze out of the loop. To detect those cases, it should be enough to (recursively) check the operands of the step instruction. That should only require a few steps at most, as the instructions that can be part of such a cycle are quite limited.

FWIW I think ideally the patch should be kept as narrow as possible to handle the motivating case initially and additional cases as follow-ups as required. That makes the reviews easier, reduces risk, lowers the threshold for enabling it and allows to better analyse the impact of extensions on its own. For example, there might not be much (any?) benefit of pushing certain freezes out of a loop.

aqjune marked 15 inline comments as done.May 2 2020, 7:36 PM

For easier review, I agree with @fhahn .
I'll split this patch so first it only deals with a simple case (consider the case when freeze uses either phi or step instruction only). This will make the patch smaller, and also compilation time concern can be addressed as well.
I'll check the performance impact, and if it needs more general version to fully address the slowdown, it will be the following one.

llvm/lib/Transforms/Utils/CanonicalizeFreezeInLoops.cpp
91

It will be safe (won't be optimized). The test step_inst at onephi.ll checks it.

151

Giving true wasn't intended, I removed the value.

282

Nit: I guess we could make all these class members, right? Unclear if that is better though.

Previously there was one candidate object per phi, so making it as a class with named fields was making sense (instead of having tuple).
Here, I think leaving it as variables is also okay, maybe.

Nit: DenseMap instead of std::map?

Just found that LLVM coding convention also suggests using ADTs, so fixed

313

Oops sorry, fixed

aqjune updated this revision to Diff 261678.May 2 2020, 7:37 PM
aqjune marked 4 inline comments as done.

Address comments

aqjune added a comment.May 2 2020, 8:23 PM

Added a test (func_from_mcf_r.ll) that is a real-world code which should be optimized.

I agree. @fhahn will you finish the review?

llvm/test/Transforms/CanonicalizeFreezeInLoops/func_from_mcf_r.ll
86

I think we can and should remove the metadata and attributes.

aqjune updated this revision to Diff 261712.May 3 2020, 11:36 AM

Leave the basic algorithm only.

I see that by this patch alone the slowdown in mcf_r isn't resolved.
I'll bring a follow-up patch.
While doing this, I'll investigate whether it is possible to apply more specific
pattern rather than just looking through instructions within N steps.

aqjune updated this revision to Diff 261713.May 3 2020, 11:37 AM

Remove an attribute from func_from_mcf_r.ll

aqjune edited the summary of this revision. (Show Details)May 3 2020, 11:39 AM
aqjune added a comment.EditedMay 4 2020, 11:38 AM

I investigated how this patch affects mcf_r , and the result was as follows:

(1) For the previous experimental result, I made a silly mistake :(
Actually, it brought speedup; I ran mcf_r with testsuite with LTO enabled, and applying this patch showed 4.7% speedup (91.86 sec -> 87.55 sec), which is much more than I expected.
To check whether it affects other benchmarks, I'm running SPEC 2017 (without using testsuite).

(2) For mcf_r, I investigated how many steps do we need to see from PHI to push all freezes out of the loop, and the maximum distance was 2.
From this fact, we may not need very general algorithm at this point. Simple syntactic rules can be added if mcf_r shows slowdown again. If more general pattern is needed, we can revisit the general algorithm.

I can invest only a few hours for patches per a day, so updates might be a bit slow. Anyway, I'll attach the SPEC 2017 result after running is done.

aqjune added a comment.May 6 2020, 1:01 PM

I applied this patch on e124e83, and ran SPEC2017rate. Here is the result (unit: sec.):

		e124e83	+D77523	speedup
500.perlbench_r	844.24	830.61	1.64%
502.gcc_r	488.62	487.73	0.18%
505.mcf_r	673.73	678.61	-0.72%
520.omnetpp_r	684.97	684.75	0.03%
523.xalancbmk_r	753.45	750.64	0.37%
525.x264_r	598.98	597.80	0.20%
531.deepsjeng_r	481.23	481.94	-0.15%
541.leela_r	774.59	775.01	-0.05%
557.xz_r	694.84	696.73	-0.27%
508.namd_r	621.90	622.51	-0.10%
511.povray_r	967.31	966.13	0.12%
519.lbm_r	671.68	671.86	-0.03%
538.imagick_r	1122.32	1134.04	-1.03%
544.nab_r	943.58	943.61	0.00%

A few tests that raised an error because it required fortran are excluded.
(1) I chose e124e83 because it was the commit where xalancbmk_r showed the small slowdown. With this patch only, xalancbmk_r is okay.
(2) mcf_r shows slowdown with this patch, but on my machine the speedup of the (full) patch was visible only on SPEC compiled with LLVM testsuite.
Interestingly again, with this patch only, its speedup (of the testsuite's mcf_r) is larger than expected (compared with e124e83, it is 3.4%)
(3) 538.imagick_r has 1% slowdown, which seems to be within error - these numbers are medians after 3 runs, and the baseline of 538.imagick_r fluctuates between around 1122 and 1133 (+D77523: 1133 ~ 1138).
(4) 500.perlbench_r has 1.6% speedup, and the speedup is consistent (it did not fluctuate).

I have minor comments, but otherwise I think this is nicely restricted as a start. Why isn't it added to the pass pipeline?
@fhahn @efriedma Any objections?

llvm/lib/Transforms/Utils/CanonicalizeFreezeInLoops.cpp
46

Not needed anymore?

164

Can you also add:

loop:
  %i = phi i32 [%init, %entry], [%i.next, %loop]
  %i.fr1 = freeze i32 %i
  call void @call(i32 %i.fr1)
  %i.fr2 = freeze i32 %i
  call void @call(i32 %i.fr2)
  %i.next = add nsw nuw i32 %i, %step
  %cond = icmp eq i32 %i.next, %n
  br i1 %cond, label %loop, label %exit

and the double use of the frozen step value as well?

180

Nit: No braces.

188

if (!ProcessedPHIs.insert(Info.PHI))
(or insert(..).second)

190

Please add a message to asserts, also in other places.

I don't have any concerns about the general approach. A couple drive-by comments.

llvm/lib/Transforms/Utils/CanonicalizeFreezeInLoops.cpp
230

getLoopAnalysisUsage?

248

Newline

aqjune updated this revision to Diff 262735.May 7 2020, 1:00 PM
  • Address comments

Addition to the pipeline is addressed at the separate patch (https://reviews.llvm.org/D77524)
to make revert easier when either this patch or registration of the pass causes
a problem.

aqjune marked 6 inline comments as done.May 7 2020, 1:03 PM
aqjune added inline comments.
llvm/lib/Transforms/Utils/CanonicalizeFreezeInLoops.cpp
230

Seems like using getLoopAnalysisUsage affects the code generation of LSR even if the pass is doing nothing, causing tests such as CodeGen/X86/lsr-loop-exit-cond.ll fail.

efriedma added inline comments.May 7 2020, 1:24 PM
llvm/lib/Transforms/Utils/CanonicalizeFreezeInLoops.cpp
230

What, specifically, in getLoopAnalysisUsage is causing that?

If you're intentionally not using getLoopAnalysisUsage, please carefully document what you're doing here and in LSR.

aqjune updated this revision to Diff 262755.May 7 2020, 2:02 PM

Explain why AnalysisUsage is manually updated

aqjune marked an inline comment as not done.May 7 2020, 2:04 PM
aqjune added inline comments.
llvm/lib/Transforms/Utils/CanonicalizeFreezeInLoops.cpp
230

It is because it calls AU.addRequiredID(LCSSAID) , and LSR does not require LCSSA.
I left a comment about this.

Does it look good to other reviewers as well?

For me this makes sense. No objections.

fhahn added inline comments.May 15 2020, 10:30 AM
llvm/lib/Transforms/Utils/CanonicalizeFreezeInLoops.cpp
40

Not used?

47

not used?

65

nit: private not needed here.

71

public not needed here.

85

nit: I think it would be slightly more straight forward to have this function return a new FrozenIndPHIInfo.

85

On second look, this seems very similar to InductionDescriptor::isInductionPHI. Can we use that instead? isInductionPHI should also be able to subsume the call to isAuxiliaryInductionVariable

110

nit: placed *in*?

116

nit: needs message

aqjune updated this revision to Diff 264311.May 15 2020, 12:28 PM

Use InductionDescriptor::isInductionPHI , address comments

aqjune marked 9 inline comments as done.May 15 2020, 12:30 PM
aqjune added inline comments.
llvm/lib/Transforms/Utils/CanonicalizeFreezeInLoops.cpp
85

Yes, InductionDescriptor::isInductionPHI works as well. Thank you for the information.

aqjune edited the summary of this revision. (Show Details)May 15 2020, 12:30 PM
aqjune marked 3 inline comments as done.
fhahn accepted this revision.May 17 2020, 10:36 AM

LGTM with some optional nits and a few comments about the tests. Please wait with committing for a day or so incase there are any additional comments.

llvm/lib/Passes/PassBuilder.cpp
182

nit: unused?

llvm/lib/Transforms/Utils/CanonicalizeFreezeInLoops.cpp
67

nit: It would be good to add comments to the members here, especially how they are related.

74

nit: It would be good to clarify what 'can handle' means here in function comment. Potentially move the comment about dropping nsw/nuw flags into it/

96

nit: you can use cast instead of dyn_cast. cast asserts that the value can be cast and you can UserI from the asset below.

127

nit: consider initializing PHI & StepInst via constructor.

146

nit: This does exactly the same as for the users of Info.StepInst, right (modulo the debug message). Might be good to use something like concat_range or a lambda for the common body.

llvm/test/Transforms/CanonicalizeFreezeInLoops/func_from_mcf_r.ll
4

same about AArch64 triple as for llvm/test/Transforms/CanonicalizeFreezeInLoops/onephi.ll.

Also, it would be good to reduce the test to the important bits. For example, the types <{ i64, %struct.arc*, %struct.g }>, <{ i64, %struct.arc*, %struct.g } and the global aliases are quite verbose and do not really impact the freeze hoisting.

llvm/test/Transforms/CanonicalizeFreezeInLoops/onephi.ll
5

does this require an AArch64 triple? If so, you have to add something like REQUIRES: aarch64-registered-target I think, otherwise it might if LLVM is built without AARch64 backend.

This revision is now accepted and ready to land.May 17 2020, 10:36 AM
aqjune updated this revision to Diff 264531.May 17 2020, 10:02 PM

Remove unnecessary parts from tests, address comments

aqjune marked 9 inline comments as done.EditedMay 17 2020, 10:07 PM

Thank you! I'll wait 2 days and land this on Wednesday.

llvm/lib/Passes/PassBuilder.cpp
182

This seems to be needed by PassRegistry.def .

llvm/test/Transforms/CanonicalizeFreezeInLoops/onephi.ll
5

aarch64 wasn't needed. removed

aqjune updated this revision to Diff 264533.May 17 2020, 10:08 PM
aqjune marked an inline comment as done.

Add REQUIRES to func_from_mcf_r.ll

This revision was automatically updated to reflect the committed changes.