Page MenuHomePhabricator

[MachineCombiner] Add check for optimal pattern order.
ClosedPublic

Authored by fhahn on Jan 5 2018, 8:41 AM.

Details

Summary

In D41587, @mssimpso discovered that the order of some patterns for
AArch64 was sub-optimal. I thought a bit about how we could avoid that
case in the future. I do not think there is a need for evaluating all
patterns for now. But this patch adds an extra (expensive) check, that
evaluates the latencies of all patterns, and ensures that the latency
saved decreases for subsequent patterns.

This catches the sub-optimal order fixed in D41587, but I am not
entirely happy with the check, as it only applies to sub-optimal
patterns seen while building with EXPENSIVE_CHECKS on. It did not
discover any other sub-optimal pattern ordering.

Do you think that additional check would be useful?

Diff Detail

Repository
rL LLVM

Event Timeline

fhahn created this revision.Jan 5 2018, 8:41 AM

Hi Florian,

It's hard to say how useful this patch will be, but I wouldn't mind having it under EXPENSIVE_CHECKS. Have you looked at other targets other than Arm/AArch64?

test/CodeGen/AArch64/aarch64-combine-fmul-fsub.mir
29–30 ↗(On Diff #128752)

These tests don't need changing because of this patch, right? But you can go ahead and commit the test changes separately if you need to.

fhahn added a comment.Jan 10 2018, 6:52 AM

TBH I was hoping for a more complete approach to testing this, but I thought I share this relatively straight forward check.

I've run the LLVM test suite with this patch (without the EXPENSIVE_CHECKS guard) on AArch64 with -O3 -ffast-math and various mcpu options. I also run the patch on a single X86 machine with -mcpu=native. Beyond that, I did not run this patch on any other targets (except all LLVM unit tests with this change)

test/CodeGen/AArch64/aarch64-combine-fmul-fsub.mir
29–30 ↗(On Diff #128752)

Unfortunately they do. Some patterns, at least on AArch64 add additional virtual registers when the pattern is instantiated. Unused virtual registers should be dropped by later passes, but they change the numbers of the registers in this test case.

mssimpso accepted this revision.Jan 11 2018, 12:56 PM

TBH I was hoping for a more complete approach to testing this, but I thought I share this relatively straight forward check.

I'm fine with that. We can always improve it later.

I've run the LLVM test suite with this patch (without the EXPENSIVE_CHECKS guard) on AArch64 with -O3 -ffast-math and various mcpu options. I also run the patch on a single X86 machine with -mcpu=native. Beyond that, I did not run this patch on any other targets (except all LLVM unit tests with this change)

OK, sounds good. I'm just trying to avoid the case where some in-tree target is intentionally adding the patterns in a way that doesn't increase expected latency. Though if that's the case, they might want to know.

This seems useful enough to me to commit.

test/CodeGen/AArch64/aarch64-combine-fmul-fsub.mir
29–30 ↗(On Diff #128752)

Ahh, that makes sense. Thanks!

This revision is now accepted and ready to land.Jan 11 2018, 12:56 PM

Thanks Matthew! I'll wait with committing this till next week, in case @Gerolf has some additional thoughts.

I'll commit this in the next couple of days. @Gerolf do you have any additional thoughts/ideas?

I like the spirit of the idea. What made you look into this? Some more questions and suggestions below, but LGTM as is.

Cheers
Gerolf

lib/CodeGen/MachineCombiner.cpp
485 ↗(On Diff #128752)

Could that go into a separate function? You might want to control it by an internal flag rather than a compile-time define.

506 ↗(On Diff #128752)

You might want to add a dumping capability that prints the pattern in a sorted order. Did you encounter scenarios where different orderings of pattern (other than the default) gave improvements in some cases, but not others? If these cases can be tight to some characteristic in the program it might be possible to pass this information on to getMachineCombinerPatterns() and return a different order.

fhahn updated this revision to Diff 131989.Jan 30 2018, 9:40 AM

Thanks Gerolf! The main reason for adding this is that @mssimpso pointed out in D41587 that some of the recently added patterns were not optimally ordered for at least Falkor and I wanted to give us a better chance at detecting sub-optimal orderings automatically.

lib/CodeGen/MachineCombiner.cpp
485 ↗(On Diff #128752)

Done! The option is enabled with expensive checks. I've also enabled the option in a couple of machine-combiner tests.

506 ↗(On Diff #128752)

I could add dumping of the sorted patterns, but I think it would be better to do it in a separate commit.

I did not find any cases where a different order gave improvements in some cases. Although it would be worth to run this verification on larger benchmarks (e.g. SPEC2017), which should be easy with the new option.

This revision was automatically updated to reflect the committed changes.