This is an archive of the discontinued LLVM Phabricator instance.

[ExpandMemCmp] Properly constant-fold all compares.
ClosedPublic

Authored by courbet on Mar 3 2020, 6:16 AM.

Details

Summary

This gets rid of duplicated code and diverging behaviour w.r.t.
constants.
Fixes PR45086.

Diff Detail

Event Timeline

courbet created this revision.Mar 3 2020, 6:16 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 3 2020, 6:16 AM
Herald added a subscriber: hiraditya. · View Herald Transcript
courbet updated this revision to Diff 247879.Mar 3 2020, 6:19 AM

regenerate pgso tests

courbet marked an inline comment as done.Mar 3 2020, 6:21 AM

Note that most test changes are NFC (just interleaving computations). I've highlighted the actual changes.

llvm/test/CodeGen/X86/memcmp.ll
103

This is the real change.

courbet updated this revision to Diff 247895.Mar 3 2020, 7:21 AM

Update powerpc tests

courbet marked an inline comment as done.Mar 3 2020, 7:22 AM
courbet added inline comments.
llvm/test/CodeGen/PowerPC/memCmpUsedInZeroEqualityComparison.ll
93 ↗(On Diff #247895)

These are also real changes.

Given memcmp expansion is running late in the pipeline, do you think we're possibly missing other relevant optimizations on the loads? LICM, maybe?

llvm/test/CodeGen/X86/memcmp.ll
425

The scheduling here seems to be worse?

courbet marked an inline comment as done.Mar 3 2020, 11:54 PM

Given memcmp expansion is running late in the pipeline, do you think we're possibly missing other relevant optimizations on the loads? LICM, maybe?

It's definitely missing other optimizations. I tried moving it sooner in the pipeline some time ago for this reason (https://reviews.llvm.org/D60318), but it turned out to be much harder than I anticipated because of interactions with the sanitizers.

llvm/test/CodeGen/X86/memcmp.ll
425

I think we could argue either way: the new scheduling interleaves loads and other computations, evening out the port pressure and increased compute parallelism. On the other hand it's true that there is less data parallelism. It should not matter with recent out-of-order cores anyway.
If you feel strongly about this I can go back to interleaving the data, this is actually orthogonal to the real change.

efriedma added inline comments.Mar 4 2020, 1:31 PM
llvm/test/CodeGen/X86/memcmp.ll
425

I'd prefer to avoid unrelated changes.

the new scheduling interleaves loads and other computations

Scheduling an arithmetic operation that uses a loaded value immediately after the load is never going to work out; loads have latency. Granted, I agree it's unlikely to matter much on a modern x86 core.

Really, I'm more surprised we aren't trying to do any scheduling at all after isel.

courbet updated this revision to Diff 248430.Mar 5 2020, 3:52 AM

Revert scheduling changes.

Given memcmp expansion is running late in the pipeline, do you think we're possibly missing other relevant optimizations on the loads? LICM, maybe?

It's definitely missing other optimizations. I tried moving it sooner in the pipeline some time ago for this reason (https://reviews.llvm.org/D60318), but it turned out to be much harder than I anticipated because of interactions with the sanitizers.

For reference, we still have at least a couple of motivating bugs for moving this up in the pipeline:
https://bugs.llvm.org/show_bug.cgi?id=36421
https://bugs.llvm.org/show_bug.cgi?id=34032#c13 - reduced example: https://godbolt.org/g/QjVXvS

I think this patch makes sense, but we might also consider a small add-on that would catch the cases shown here and possibly more. We can basically run -instsimplify within a pass via:

diff --git a/llvm/lib/CodeGen/ExpandMemCmp.cpp b/llvm/lib/CodeGen/ExpandMemCmp.cpp
index d0dd538f1f5..bb5e6261593 100644
--- a/llvm/lib/CodeGen/ExpandMemCmp.cpp
+++ b/llvm/lib/CodeGen/ExpandMemCmp.cpp
@@ -23,6 +23,7 @@
 #include "llvm/CodeGen/TargetSubtargetInfo.h"
 #include "llvm/IR/IRBuilder.h"
 #include "llvm/InitializePasses.h"
+#include "llvm/Transforms/Utils/Local.h"
 #include "llvm/Transforms/Utils/SizeOpts.h"
 
 using namespace llvm;
@@ -869,6 +870,11 @@ PreservedAnalyses ExpandMemCmpPass::runImpl(
       ++BBIt;
     }
   }
+
+  if (MadeChanges)
+    for (BasicBlock &BB : F)
+      SimplifyInstructionsInBlock(&BB);
+
   return MadeChanges ? PreservedAnalyses::none() : PreservedAnalyses::all();
 }
courbet updated this revision to Diff 248711.Mar 6 2020, 5:36 AM

Simplify instructions after expandmemcmp.

spatel accepted this revision.Mar 6 2020, 9:24 AM

Simplify instructions after expandmemcmp.

LGTM, but as a matter of risk reduction, please make that an independent commit.

I don't think it would be noticeable, but there is some non-zero compile-time cost in doing that analysis/transform (although it might pay for itself if it makes later codegen faster).

I don't have a preference for order of landing the changes, but if I'm seeing it correctly, if we land the simplify commit later, it would only cause benign (different label names) diffs in the asm tests.

This revision is now accepted and ready to land.Mar 6 2020, 9:24 AM

Thanks.

Simplify instructions after expandmemcmp.

LGTM, but as a matter of risk reduction, please make that an independent commit.

Will do.

I don't think it would be noticeable, but there is some non-zero compile-time cost in doing that analysis/transform (although it might pay for itself if it makes later codegen faster).

I don't have a preference for order of landing the changes, but if I'm seeing it correctly, if we land the simplify commit later, it would only cause benign (different label names) diffs in the asm tests.

There is actually a diff in the powerpc test that compares two constants.

This revision was automatically updated to reflect the committed changes.