This is an archive of the discontinued LLVM Phabricator instance.

Add an analysis printer for must execute reasoning
ClosedPublic

Authored by reames on Mar 15 2018, 9:21 AM.

Details

Summary

Many of our loop passes make use of so called "must execute" or "guaranteed to execute" facts to prove the legality of code motion. The basic notion is that we know (by assumption) an instruction didn't fault at it's original location, so if the location we move it to is strictly post dominated by the original, then we can't have introduced a new fault.

At the moment, the testing for this logic is somewhat adhoc and done mostly through LICM. Since I'm working on that code, I want to improve the testing. This patch is the first step in that direction. It doesn't actually test the variant used by the loop passes - I need to move that to the Analysis library first - but instead exercises an alternate implementation used by SCEV. (I plan on merging both implementations.)

Diff Detail

Repository
rL LLVM

Event Timeline

reames created this revision.Mar 15 2018, 9:21 AM
fhahn added a subscriber: fhahn.Mar 15 2018, 9:30 AM
anna added a comment.Mar 16 2018, 6:49 AM

Just to note, other printer passes like print-memoryssa, print-lvi, print-predicate-info, we print the annotations as comments just above the IR (using the asm printer interface). It's easier to see which instruction we are referring to and quickly spot the loops they were part of , *if loops are properly annotated in the IR*. It would be useful when debugging large code base IR and spotting the missing cases.
IMO, the missing cases may be hard to spot in this printer. One workaround is to print the number of loops where it's not guaranteed to execute. That can be a separate add on after landing this as well.

From a testing perspective of the utility, this version works well for me.

lib/Analysis/MustExecute.cpp
25 ↗(On Diff #138576)

Any reason for 4? Given we are looking at all instructions within all loops that's guaranteed to execute, it can be a much larger number.

85 ↗(On Diff #138576)

I think we can use printAsOperand so that nameless values are handled properly.

87 ↗(On Diff #138576)

Perhaps rename this to exactlyOneLoop and define that as MustExec[V]->size() == 1

test/Analysis/MustExecute/loop-header.ll
4 ↗(On Diff #138576)

From a usability perspective in large IR where multiple transforms already done, we will have print statements like this:

; %iv = phi i32 [ 0, %entry ], [ %iv.next, %loop ]  (mustexec in loop.split.unroll.split.split.crit_edge.... , another loop header name etc)

It seems difficult to read. If the usecase of the printer is to test existing code, it's fine.
However, printing this from large IR can make it pretty difficult to parse.
How about something like:

; %iv = phi i32 [ 0, %entry ], [ %iv.next, %loop ]  (mustexec in 3 loops: <header names> of all 3 loops)

This number 3 can be retrieved from MustExec[V]->size()

Just to note, other printer passes like print-memoryssa, print-lvi, print-predicate-info, we print the annotations as comments just above the IR (using the asm printer interface). It's easier to see which instruction we are referring to and quickly spot the loops they were part of , *if loops are properly annotated in the IR*. It would be useful when debugging large code base IR and spotting the missing cases.
IMO, the missing cases may be hard to spot in this printer. One workaround is to print the number of loops where it's not guaranteed to execute. That can be a separate add on after landing this as well.

From a testing perspective of the utility, this version works well for me.

This is a good idea. I'd copied the basic structure of this from the mem deref printer, but I like the annotation version better. Mind if I submit this one (so that I can start getting some basic testing around this code), and then return to the annotation as a follow on? I'll change over mem-deref while I'm at it.

lib/Analysis/MustExecute.cpp
25 ↗(On Diff #138576)

4 is simply the initial size and chosen fairly randomly. The SmallVector will resize if needed. Suggestions?

85 ↗(On Diff #138576)

Good idea!

87 ↗(On Diff #138576)

Er, what? I don't understand your comment. The "first" variable is simply being used to join a set of strings with a comma in between.

test/Analysis/MustExecute/loop-header.ll
4 ↗(On Diff #138576)

Good idea. Will implement.

anna added inline comments.Mar 16 2018, 10:14 AM
lib/Analysis/MustExecute.cpp
87 ↗(On Diff #138576)

What I meant is we can do something like this instead of using the first variable:

const bool exactlyOneLoop = MustExec[V]->size()  == 1;
for (const Loop *L : MustExec.lookup(V)) {
      OS << L->getHeader()->getName();
      if (!ExactlyOneLoop)
          OS << ", ";
}
anna added a comment.Mar 16 2018, 10:20 AM

Just to note, other printer passes like print-memoryssa, print-lvi, print-predicate-info, we print the annotations as comments just above the IR (using the asm printer interface). It's easier to see which instruction we are referring to and quickly spot the loops they were part of , *if loops are properly annotated in the IR*. It would be useful when debugging large code base IR and spotting the missing cases.
IMO, the missing cases may be hard to spot in this printer. One workaround is to print the number of loops where it's not guaranteed to execute. That can be a separate add on after landing this as well.

From a testing perspective of the utility, this version works well for me.

This is a good idea. I'd copied the basic structure of this from the mem deref printer, but I like the annotation version better. Mind if I submit this one (so that I can start getting some basic testing around this code), and then return to the annotation as a follow on? I'll change over mem-deref while I'm at it.

SGTM.

lib/Analysis/MustExecute.cpp
25 ↗(On Diff #138576)

Yes, it would resize. It's just that this feels we would definitely resize. I don't have another good suggestion though. We could do something like Ordering.reserve(F.size()), i.e. reserve the number of instructions in the function, but that's just wasted space...
And I don't think there's an easy way to get num instructions in all loops.
So, we can just stick with 4 I guess.

reames added inline comments.Mar 16 2018, 10:59 AM
lib/Analysis/MustExecute.cpp
87 ↗(On Diff #138576)

Ah, I see where you're going. And actually you're pointing out a bug in the code. What I'd actually wanted was: "name, name, name". What I'd wrote produced "name name, name, " Your suggestion is a cleanup of the code as written, but doesn't work once I fix the formatting bug.

anna added inline comments.Mar 16 2018, 11:20 AM
lib/Analysis/MustExecute.cpp
87 ↗(On Diff #138576)

I didn't notice the bug originally, but I think the cleanup should fix the bug :)
We should produce "name, name, name" because the boolean exactlyOneLoop is a constant and set outside the loop. We don't change it (unlike the first variable which was changed in the loop, and was the reason for the bug).

anna added inline comments.Mar 16 2018, 12:14 PM
lib/Analysis/MustExecute.cpp
87 ↗(On Diff #138576)

nvm.. just took a look at it again. It fixes one part of the formatting: one comma after the first name. However, there's one extra comma after the last one.

reames added inline comments.Mar 16 2018, 1:50 PM
lib/Analysis/MustExecute.cpp
87 ↗(On Diff #138576)

Nope, your variant produces: "name, name, ". (With the trailing ", ")

reames updated this revision to Diff 138776.Mar 16 2018, 2:50 PM

Fix Anna's comments.

p.s. I realize the whole multiple loop printing code is effectively dead at the moment. The current analysis isn't smart enough to ever return true for the outer loops.

anna accepted this revision.Mar 19 2018, 5:40 AM

lgtm!

include/llvm/Analysis/Passes.h
101 ↗(On Diff #138776)

Stale comment here.

This revision is now accepted and ready to land.Mar 19 2018, 5:40 AM
This revision was automatically updated to reflect the committed changes.