This is an archive of the discontinued LLVM Phabricator instance.

[Polly] Add early exits for SCoPs we did not optimize
AbandonedPublic

Authored by jdoerfert on Jan 29 2015, 7:54 AM.

Details

Summary
All test cases have been ported and use the --polly-no-early-exit flag to keep the coverage up.

Diff Detail

Event Timeline

jdoerfert updated this revision to Diff 18964.Jan 29 2015, 7:54 AM
jdoerfert retitled this revision from to [TODO] Add early exits for SCoPs we did not optimize.
jdoerfert updated this object.
jdoerfert added subscribers: Restricted Project, Unknown Object (MLST).
grosser edited edge metadata.Jan 30 2015, 4:29 AM

Hi Johannes,

as discussed before, I really like the idea. I added some minor comments inline.

Regarding the performance numbers, it is very nice to see that your patch reduces compile time for several of the TSVC benchmarks.I also see two compile and execution time regressions of around 50% with your patch. The compile time regression is possibly noise, but the execution time regression is surprising. Did you had a chance to see what we changed in Misc/flops to get the nice speedup that with you patch applied is lost again?

Cheers,
Tobias

lib/Analysis/ScopInfo.cpp
698–699

Why did you drop the tuple name that identifies the 'scattering'? This seems unrelated to the patch and causes a large number of seemingly unrelated test case changes. Was this intended?

1513–1514

Again, this seems unrelated?

lib/CodeGen/IslAst.cpp
373

Nice. It may even be worth outlining this cost estimation into its own function to clarify that we plan to extend/improve it.

lib/Transform/ScheduleOptimizer.cpp
581

Maybe put this code into its own function?

599

It seems here you check if the new schedule is a subset of the old schedule. This is a very simple heuristic and probably does not fire in a lot of cases (which is fine for me, as we can refine it later).

Instead of looping over all Stmts, you could possibly implement this by calling S.getSchedule() and comparing the result with ScheduleMap.

An alternative to the approach you are taking would be to assume we did not do anything, but just mark the scop as optimized if we clearly did something useful. This could e.g. be that we were able to perform tiling. I kind of like this idea, as it would make Polly be more conservative in terms of not changing existing code only if it is clear we can clearly improve performance.

For me the compile time regressions seem like jitter (all < 0.001 sec) but the improvemeents don't (a lot > 0.01 sec). I don't know about the runtime effects, but except 2, all of the regressions seem like jittter too (again < 0.001 sec). I will look into flops or just run it 15 times in isolation to get more confidence in the results.

lib/Analysis/ScopInfo.cpp
698–699

It would have made my "isIdentitySchedule" check more complicated if we keep it but I'm fine with keeping it in the end (even though I do not see any benefit of the word "scattering" in the early representation of the scattering as we loose it after isl optimization anyway)

1513–1514

This was implied by the decision to remove it (see above)

lib/CodeGen/IslAst.cpp
373

That's true.

lib/Transform/ScheduleOptimizer.cpp
581

We should try to come up with a better way to check this. I believe this doesn't catch all the cases and it should be unnecessary to iterate over the statements (maybe we can traverse the schedule map).

599

I'm open for suggestions here and think we can improve this step by step. (I put the TODO above to make it clear I know this is not really a good way to check this).

grosser added inline comments.Feb 2 2015, 12:03 AM
lib/Analysis/ScopInfo.cpp
698–699

Dropping 'scattering' as a tuple name may indeed be a good idea. If you could commit this part separately, the test case noise would not obscure this patch.

lib/Transform/ScheduleOptimizer.cpp
581

This identify check is indeed a very rough heuristic. Improving it by deriving the set of performed transformations by looking at the resulting schedules seems incredibly hard.

I see two alternative approaches:

  • We compute properties (# of stride-one-accesses, ...)
  • We keep track of optimizations we have successfully performed (tiling)

both require some thoughts and may not be needed for the initial version.

As said earlier, the current code that looks for subset schedules can be simplified by using S.getSchedule() and comparing the result with ScheduleMap.

599

Maybe you can actually state in your todo what we do today and what could be done later.

To understand if the schedule has been optimized we check if the schedule has
changed at all.
TODO: We can improve this by tracking if any necessarily beneficial
transformations have been performed. This can e.g. be tiling, loop interchange,
or ...) We can track this either at the place where the transformation has been
performed or, in case of automatic ILP based optimizations, by comparing
(yet to be defined) performance metrics before/after the scheduling optimizer
(e.g. number of stride-one accesses)

Also, I think it would be good to get a test case for our heursitic here, similar to how the loop vectorizer has test cases for its performance model:

Transforms/LoopVectorize/X86/uint64_to_fp64-cost-model.ll

For us a simple optimized/non-optimized is probably a good start,

jdoerfert updated this revision to Diff 19169.Feb 2 2015, 11:06 AM
jdoerfert edited edge metadata.

Updated test cases, modified according to comments, checked lnt (runtime regression was noise)

I agree that we should improve our heuristic but:

  1. This should be done later
  2. I have a group of students working on a heurisitc for parallel execution, we can use some of their infrastructure and save us the trouble :)
jdoerfert retitled this revision from [TODO] Add early exits for SCoPs we did not optimize to [Polly] Add early exits for SCoPs we did not optimize.Feb 3 2015, 1:02 AM
jdoerfert updated this object.

Hi Johannes,

thanks. This patch looks very nice now.
As mentioned in my previous review I agree that "[improvements to the heuristic] require some thoughts and may not be needed for the initial version.".

One comment from my last review, where I would be still interested on your opinion is the following one:

Also, I think it would be good to get a test case for our heuristic here, similar to how the loop vectorizer has test cases for its performance model:

Transforms/LoopVectorize/X86/uint64_to_fp64-cost-model.ll

Cheers,
Tobias

include/polly/ScopInfo.h
787

optimized

790

if the (missing space)

lib/Transform/ScheduleOptimizer.cpp
568

(e.g., number of stride-one accesses).

578

This is now very nice and concise (It still it may be worth to be outlined into a separate function).

test/Isl/CodeGen/MemAccess/codegen_constant_offset.ll
1

You add the flag twice.

jdoerfert abandoned this revision.Feb 25 2015, 8:17 AM

Commited at some point.