This is an archive of the discontinued LLVM Phabricator instance.

[IVUsers] Changes to make IVUsers's results robust to instruction and uselist ordering
Needs ReviewPublic

Authored by dneilson on Sep 20 2017, 1:45 PM.

Details

Reviewers
hfinkel
Summary

Full discussion of the motivation, and some background, for this patch can be
found on the LLVM dev mailing list:

http://lists.llvm.org/pipermail/llvm-dev/2017-September/117424.html

This patch makes two changes to the implementation of IVUsers' analysis
to make the result of the analysis more robust to instruction and
use list ordering:
A) Memoize the results of AddUsersImpl() each given instruction. The presence of
Processed.count(User) in the conditions that test for adding an instruction to the
current SCEV (approx lines 235 & 241) basically have the effect of changing the
return value of AddUsersImpl() for “interesting” instructions from true to false,
which results in different behaviour depending on the order in which instructions
are visited.

B) Don’t let the DFT continue into Users that are phi nodes in the loop-header.
We’re going to visit every phi node in the loop-header as the root of a DFT, anyways,
so this prevents the possibility of revisiting the same instruction multiple times
in the same def-use chain.

Event Timeline

dneilson created this revision.Sep 20 2017, 1:45 PM
hfinkel edited edge metadata.Sep 20 2017, 1:50 PM

Any performance changes in the test suite?

Any performance changes in the test suite?

I don't have access to SPEC source, so I cannot test those. I'm unfamiliar with the performance test suite, so I don't know what all else is there. I'm going to set up a run against our internal (Java-based) performance suite to see where that stands.

Functionally, this patch doesn't regress anything in 'ninja check-all'

Any performance changes in the test suite?

I don't have access to SPEC source, so I cannot test those. I'm unfamiliar with the performance test suite, so I don't know what all else is there. I'm going to set up a run against our internal (Java-based) performance suite to see where that stands.

Great.

You can run LLVM's test suite using LNT by following the directions here: http://llvm.org/docs/lnt/quickstart.html

Functionally, this patch doesn't regress anything in 'ninja check-all'

Sounds good (although I have no idea how good our coverage is in this regard).

Any performance changes in the test suite?

You can run LLVM's test suite using LNT by following the directions here: http://llvm.org/docs/lnt/quickstart.html

Thanks for the LNT pointer -- seems like a pretty awesome tool!

I ran the benchmarks in the test-suite (excluding all external source ones, since I don't have them) on a Linux-x86 box, with 5x multisampling. Looks like 2 runtime regressions, and a runtime improvement:
MultiSource/Benchmarks/llubenchmark/llu - 3.56% (3.8200 -> 3.9560, sigma=0.0433). REGRESSION
MultiSource/Benchmarks/TSVC/CrossingThresholds-fit/CrossingThresholds-fit - 1.11% (2.8800 -> 2.9120, sigma 0.0054). REGRESSION
SingleSource/Benchmarks/Misc/flops-4 - -5.26% (1.9760 -> 1.8720, sigma 0.0178). IMPROVEMENT

Clearly we'd be missing one or more IVUsers as input to LSR in these benchmarks. It's interesting that it resulted in a 5% improvement in one benchmark. I don't understand LSR at all, and the code looks like a bit of a beast to wrap my head around, so this is going to require a lot more time & digging around to understand why these benchmarks are acting as they are with this patch.

Functionally, this patch doesn't regress anything in 'ninja check-all'

Sounds good (although I have no idea how good our coverage is in this regard).

Yeah, I don't know either. The IVUsers-specific tests are basically non-existent, so we'd pretty much be relying on the LSR tests to catch things.

dberlin added a subscriber: dberlin.EditedSep 22 2017, 8:58 AM

So, TL;DR, i'm not sure how much you really care, this isn't going to make your ordering completely consistent in the face of use list reordering or instruction ordering. It should work if there is a single cycle, but not if there are nested cycles (IE nested phi cycles)

If you do care, the only complete solution i know of would be "form sccs of ssa graph, sort them if necessary, perform whatever filtering you want".

Forming scc's guarantees you have all instructions that you could ever want to process for a given node.
You can then sort the SCC's by dominance order (DT dfs numbers, then local dfs numbers) if you don't like the ordering it produces, and process.

That will guarantee completely consistent ordering, as tarjan scc's are maximal.

(This will have the same time bound as the current DFS based solution)

So, TL;DR, i'm not sure how much you really care, this isn't going to make your ordering completely consistent in the face of use list reordering or instruction ordering. It should work if there is a single cycle, but not if there are nested cycles.

Is there a reason not to use a complete solution, which would be "form sccs of ssa graph, sort them if necessary, perform whatever filtering you want".

Forming scc's guarantees you have all instructions that you could ever want to process.
You can then sort the SCC's by dominance order (DT dfs numbers, then local dfs numbers) if you don't like the ordering it produces, and process.

That will guarantee completely consistent ordering, as tarjan scc's are maximal.

(This will have the same time bound as the current DFS based solution)

No reason other than "didn't think of it" -- my first stab at this is trying to retain as much of the existing code/behaviour as possible.

Can you point me to another pass that uses the technique that you suggest? I'd like to see a sample of how it's implemented & how it works.