All test cases have been ported and use the --polly-no-early-exit flag to keep the coverage up.
Hi Johannes,
as discussed before, I really like the idea. I added some minor comments inline.
Regarding the performance numbers, it is very nice to see that your patch reduces compile time for several of the TSVC benchmarks. I also see two compile and execution time regressions of around 50% with your patch. The compile time regression is possibly noise, but the execution time regression is surprising. Did you have a chance to see what we changed in Misc/flops to get the nice speedup that is lost again with your patch applied?
Cheers,
Tobias
lib/Analysis/ScopInfo.cpp | ||
---|---|---|
698–699 | Why did you drop the tuple name that identifies the 'scattering'? This seems unrelated to the patch and causes a large number of seemingly unrelated test case changes. Was this intended? (A short isl illustration of this tuple name follows the table.) | |
1513–1514 | Again, this seems unrelated? | |
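For context on the 'scattering' tuple name discussed above: the snippet below is only an illustrative, standalone sketch of how a name can be attached to the output tuple of an isl map and removed again by passing NULL. The toy relation and the program around it are made up for this illustration; none of it is code from ScopInfo.cpp.

```cpp
#include <isl/ctx.h>
#include <isl/map.h>

int main() {
  isl_ctx *Ctx = isl_ctx_alloc();

  // A toy scattering relation for a single statement (invented for this
  // illustration, not taken from Polly).
  isl_map *Scattering =
      isl_map_read_from_str(Ctx, "{ Stmt[i] -> [0, i, 0] : 0 <= i < 100 }");

  // With the tuple name set, the dump reads "Stmt[i] -> scattering[0, i, 0]".
  Scattering = isl_map_set_tuple_name(Scattering, isl_dim_out, "scattering");
  isl_map_dump(Scattering);

  // Passing NULL should drop the name again, leaving an anonymous output
  // tuple: "Stmt[i] -> [0, i, 0]".
  Scattering = isl_map_set_tuple_name(Scattering, isl_dim_out, NULL);
  isl_map_dump(Scattering);

  isl_map_free(Scattering);
  isl_ctx_free(Ctx);
  return 0;
}
```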
lib/CodeGen/IslAst.cpp | ||
373 | Nice. It may even be worth outlining this cost estimation into its own function to clarify that we plan to extend/improve it. | |
lib/Transform/ScheduleOptimizer.cpp | ||
581 | Maybe put this code into its own function? | |
599 | It seems here you check whether the new schedule is a subset of the old schedule. This is a very simple heuristic and probably does not fire in a lot of cases (which is fine for me, as we can refine it later). Instead of looping over all Stmts, you could possibly implement this by calling S.getSchedule() and comparing the result with ScheduleMap (see the sketch after this table). An alternative to the approach you are taking would be to assume we did not do anything and only mark the scop as optimized if we clearly did something useful, e.g. if we were able to perform tiling. I kind of like this idea, as it would make Polly more conservative: it would change existing code only if it is clear we can improve performance. | |
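To make the S.getSchedule() suggestion concrete, here is a minimal sketch, not part of the patch, assuming Scop::getSchedule() hands back the scop's current schedule as a caller-owned isl_union_map and ScheduleMap is the newly computed schedule. The helper name is made up for illustration.

```cpp
#include "polly/ScopInfo.h"
#include <isl/union_map.h>

// Hypothetical helper: returns true if the newly computed schedule differs
// from the schedule the scop already has. Assumes S.getSchedule() returns a
// fresh isl_union_map that the caller is responsible for freeing.
static bool scheduleWasOptimized(polly::Scop &S,
                                 __isl_keep isl_union_map *ScheduleMap) {
  isl_union_map *OldSchedule = S.getSchedule();
  // isl_union_map_is_equal checks whether both union maps describe the same
  // relation (1 = equal, 0 = not equal, negative on error).
  int Equal = isl_union_map_is_equal(OldSchedule, ScheduleMap);
  isl_union_map_free(OldSchedule);
  // Treat an isl error conservatively as "not optimized".
  return Equal == 0;
}
```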
For me the compile time regressions look like jitter (all < 0.001 sec), but the improvements don't (a lot > 0.01 sec). I don't know about the runtime effects, but except for two, all of the regressions look like jitter too (again < 0.001 sec). I will look into flops or just run it 15 times in isolation to get more confidence in the results.
lib/Analysis/ScopInfo.cpp | ||
---|---|---|
698–699 | It would have made my "isIdentitySchedule" check more complicated if we had kept it, but I'm fine with keeping it in the end (even though I do not see any benefit of the word "scattering" in the early representation of the scattering, as we lose it after the isl optimization anyway). | |
1513–1514 | This was implied by the decision to remove it (see above) | |
lib/CodeGen/IslAst.cpp | ||
373 | That's true. | |
lib/Transform/ScheduleOptimizer.cpp | ||
581 | We should try to come up with a better way to check this. I believe this doesn't catch all the cases and it should be unnecessary to iterate over the statements (maybe we can traverse the schedule map). | |
599 | I'm open to suggestions here and think we can improve this step by step. (I put the TODO above to make it clear that I know this is not really a good way to check this.) | |
lib/Analysis/ScopInfo.cpp | ||
---|---|---|
698–699 | Dropping 'scattering' as a tuple name may indeed be a good idea. If you could commit this part separately, the test case noise would not obscure this patch. | |
lib/Transform/ScheduleOptimizer.cpp | ||
581 | This identity check is indeed a very rough heuristic. Improving it by deriving the set of performed transformations from the resulting schedules seems incredibly hard. I see two alternative approaches; both require some thought and may not be needed for the initial version. As said earlier, the current code that looks for subset schedules can be simplified by calling S.getSchedule() and comparing the result with ScheduleMap. | |
599 | Maybe you can actually state in your TODO what we do today and what could be done later ("To understand if the schedule has been optimized, we check if the schedule has ..."). Also, I think it would be good to get a test case for our heuristic here, similar to how the loop vectorizer has test cases for its performance model: Transforms/LoopVectorize/X86/uint64_to_fp64-cost-model.ll. For us, a simple optimized/non-optimized check is probably a good start. | |
Updated test cases, modified according to comments, checked lnt (runtime regression was noise)
I agree that we should improve our heuristic but:
- This should be done later
- I have a group of students working on a heuristic for parallel execution; we can use some of their infrastructure and save ourselves the trouble :)
Hi Johannes,
thanks. This patch looks very nice now.
As mentioned in my previous review, I agree that "[improvements to the heuristic] require some thought and may not be needed for the initial version."
One comment from my last review where I would still be interested in your opinion is the following:
Also, I think it would be good to get a test case for our heuristic here, similar to how the loop vectorizer has test cases for its performance model: Transforms/LoopVectorize/X86/uint64_to_fp64-cost-model.ll
Cheers,
Tobias
include/polly/ScopInfo.h | ||
---|---|---|
787 | optimized | |
790 | if the (missing space) | |
lib/Transform/ScheduleOptimizer.cpp | ||
568 | (e.g., number of stride-one accesses). | |
578 | This is now very nice and concise (it may still be worth outlining it into a separate function). | |
test/Isl/CodeGen/MemAccess/codegen_constant_offset.ll | ||
1 | You add the flag twice. |