This is an archive of the discontinued LLVM Phabricator instance.

[MachineCombiner] Add check for optimal pattern order.
ClosedPublic

Authored by fhahn on Jan 5 2018, 8:41 AM.

Details

Summary

In D41587, @mssimpso discovered that the order of some patterns for
AArch64 was sub-optimal. I thought a bit about how we could avoid that
case in the future. I do not think there is a need for evaluating all
patterns for now. But this patch adds an extra (expensive) check, that
evaluates the latencies of all patterns, and ensures that the latency
saved decreases for subsequent patterns.

This catches the sub-optimal order fixed in D41587, but I am not
entirely happy with the check, as it only applies to sub-optimal
patterns seen while building with EXPENSIVE_CHECKS on. It did not
discover any other sub-optimal pattern ordering.

Do you think that additional check would be useful?

Diff Detail

Repository
rL LLVM

Event Timeline

fhahn created this revision.Jan 5 2018, 8:41 AM

Hi Florian,

It's hard to say how useful this patch will be, but I wouldn't mind having it under EXPENSIVE_CHECKS. Have you looked at other targets other than Arm/AArch64?

test/CodeGen/AArch64/aarch64-combine-fmul-fsub.mir
29–30 ↗(On Diff #128752)

These tests don't need changing because of this patch, right? But you can go ahead and commit the test changes separately if you need to.

fhahn added a comment.Jan 10 2018, 6:52 AM

TBH I was hoping for a more complete approach to testing this, but I thought I share this relatively straight forward check.

I've run the LLVM test suite with this patch (without the EXPENSIVE_CHECKS guard) on AArch64 with -O3 -ffast-math and various mcpu options. I also run the patch on a single X86 machine with -mcpu=native. Beyond that, I did not run this patch on any other targets (except all LLVM unit tests with this change)

test/CodeGen/AArch64/aarch64-combine-fmul-fsub.mir
29–30 ↗(On Diff #128752)

Unfortunately they do. Some patterns, at least on AArch64 add additional virtual registers when the pattern is instantiated. Unused virtual registers should be dropped by later passes, but they change the numbers of the registers in this test case.

mssimpso accepted this revision.Jan 11 2018, 12:56 PM

TBH I was hoping for a more complete approach to testing this, but I thought I share this relatively straight forward check.

I'm fine with that. We can always improve it later.

I've run the LLVM test suite with this patch (without the EXPENSIVE_CHECKS guard) on AArch64 with -O3 -ffast-math and various mcpu options. I also run the patch on a single X86 machine with -mcpu=native. Beyond that, I did not run this patch on any other targets (except all LLVM unit tests with this change)

OK, sounds good. I'm just trying to avoid the case where some in-tree target is intentionally adding the patterns in a way that doesn't increase expected latency. Though if that's the case, they might want to know.

This seems useful enough to me to commit.

test/CodeGen/AArch64/aarch64-combine-fmul-fsub.mir
29–30 ↗(On Diff #128752)

Ahh, that makes sense. Thanks!

This revision is now accepted and ready to land.Jan 11 2018, 12:56 PM

Thanks Matthew! I'll wait with committing this till next week, in case @Gerolf has some additional thoughts.

I'll commit this in the next couple of days. @Gerolf do you have any additional thoughts/ideas?

I like the spirit of the idea. What made you look into this? Some more questions and suggestions below, but LGTM as is.

Cheers
Gerolf

lib/CodeGen/MachineCombiner.cpp
485 ↗(On Diff #128752)

Could that go into a separate function? You might want to control it by an internal flag rather than a compile-time define.

506 ↗(On Diff #128752)

You might want to add a dumping capability that prints the pattern in a sorted order. Did you encounter scenarios where different orderings of pattern (other than the default) gave improvements in some cases, but not others? If these cases can be tight to some characteristic in the program it might be possible to pass this information on to getMachineCombinerPatterns() and return a different order.

fhahn updated this revision to Diff 131989.Jan 30 2018, 9:40 AM

Thanks Gerolf! The main reason for adding this is that @mssimpso pointed out in D41587 that some of the recently added patterns were not optimally ordered for at least Falkor and I wanted to give us a better chance at detecting sub-optimal orderings automatically.

lib/CodeGen/MachineCombiner.cpp
485 ↗(On Diff #128752)

Done! The option is enabled with expensive checks. I've also enabled the option in a couple of machine-combiner tests.

506 ↗(On Diff #128752)

I could add dumping of the sorted patterns, but I think it would be better to do it in a separate commit.

I did not find any cases where a different order gave improvements in some cases. Although it would be worth to run this verification on larger benchmarks (e.g. SPEC2017), which should be easy with the new option.

This revision was automatically updated to reflect the committed changes.
madhur13490 added a subscriber: madhur13490.EditedJul 3 2023, 11:56 PM

Hi @fhahn
We are testing AArch64 builds with EXPENSIVE_CHECK=ON and SPEC 2017 and llvm test-suite are failing with the assert in this patch.

assert(CurrentLatencyDiff <= PrevLatencyDiff &&
           "Current pattern is better than previous pattern.");

In addition to the above benchmarks, there are a couple of other internal benchmarks failing too. Here is a small reproducer we got with creduce.

long a, b, c, d;
void f() {
  while (!0) {
    long e = c * b - d * a;
    if (e < 7)
      break;
  }
}
Command-line: clang  -O2 reduced.c

I have a high-level question for this patch. I understand that the verify function aims to verify patterns to be present in certain order but insertion is not guaranteed to be in the same order.

Does it make sense to have this verification in the first place?

FWIW, the following patch would fix the issue but it is not a scalable approach:

host:~/build_release$ git diff ../llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
diff --git a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
index d49b82290ff3..ed1ccb7b7484 100644
--- a/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
+++ b/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
@@ -5143,8 +5143,8 @@ static bool getMaddPatterns(MachineInstr &Root,
     setFound(AArch64::MADDWrrr, 2, AArch64::WZR, MCP::MULSUBW_OP2);
     break;
   case AArch64::SUBXrr:
-    setFound(AArch64::MADDXrrr, 1, AArch64::XZR, MCP::MULSUBX_OP1);
     setFound(AArch64::MADDXrrr, 2, AArch64::XZR, MCP::MULSUBX_OP2);
+    setFound(AArch64::MADDXrrr, 1, AArch64::XZR, MCP::MULSUBX_OP1);
     break;
   case AArch64::ADDWri:
     setFound(AArch64::MADDWrrr, 1, AArch64::WZR, MCP::MULADDWI_OP1);

I can file a Gitlab issue if you prefer to.

Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJul 3 2023, 11:56 PM
fhahn added a comment.Jul 5 2023, 11:22 AM

I have a high-level question for this patch. I understand that the verify function aims to verify patterns to be present in certain order but insertion is not guaranteed to be in the same order.

Does it make sense to have this verification in the first place?

The machine combiner applies the pattern in order, so the goal is to ensure that the most profitable patterns are processed first. Did you encounter cases where it wasn't possible to fix the insertion order?

madhur13490 added a comment.EditedJul 6 2023, 8:02 AM

I have a high-level question for this patch. I understand that the verify function aims to verify patterns to be present in certain order but insertion is not guaranteed to be in the same order.

Does it make sense to have this verification in the first place?

The machine combiner applies the pattern in order, so the goal is to ensure that the most profitable patterns are processed first. Did you encounter cases where it wasn't possible to fix the insertion order?

So far not but the I have one case where we had to find it meticulously to fix the order and it is time-consuming each time. It is not possible to make sure that we always do it. EXPENSIVE_CHECK builds are not very well tested as I couldn't see a build bot for aarch64 platform. (I might be wrong if added recently).

If the goal is to make sure we apply patterns in certain order then we should sort them based on latency. Verifying without making sure the insertion order is a risk and we have more than tests failing. Moreover, at the insertion point we have no clue if it is inserting in the required order.

Please let me know if I am missing something here.