This is an archive of the discontinued LLVM Phabricator instance.

[BranchFolding] Merge MMOs during tail merge
ClosedPublic

Authored by flyingforyou on Nov 18 2015, 5:20 PM.

Details

Summary

If we remove the MMOs from Load/Store instructions, they are treated as volatile. This makes other optimization passes unhappy.
eg. Load/Store Optimization

So, it looks better to merge, not remove.

Diff Detail

Repository
rL LLVM

Event Timeline

flyingforyou retitled this revision from to [BranchFolding] Add volatile checking when clearing memory references.
flyingforyou updated this object.
flyingforyou added reviewers: apazos, hfinkel, mcrosier.
flyingforyou added a subscriber: llvm-commits.
gberry added a subscriber: gberry.Nov 18 2015, 6:25 PM

Hi Junmo,

Instead of adding a field to every MachineInstr, it might be better to concatenate the two memref lists in the case that they are unequal when being merged.
You could try to do something more complex like sorting the merged list and removing duplicates, but that might be overkill depending on how large these memref lists can get.

Thanks Geoff.

Is that safe for other optimization passes? Actually, I am not sure that merging memref lists is safe.
If you think it is safe, I will do that. please let me know.

mcrosier edited edge metadata.Nov 19 2015, 5:40 AM

Thanks Geoff.

Is that safe for other optimization passes? Actually, I am not sure that merging memref lists is safe.
If you think it is safe, I will do that. please let me know.

Checkout the commit message for r231799.

Merging multiple MMO's onto a single MI should not cause any correctness issues. However, the MI scheduler doesn't know how to handle multiple MMOs, so it makes conservative assumptions in this case (see the FIXME comment in ScheduleDAGInstrs.cpp:MIsNeedChainEdge()).

In r231799, Ana and I added the == operator for MMOs. When merging the MMO into the MI you could check for redundant MMOs and not merge them. I don't know if this is necessary for your use case, however.

HTH, Chad

flyingforyou retitled this revision from [BranchFolding] Add volatile checking when clearing memory references to [BranchFolding] Merge MMOs during tail merge.
flyingforyou updated this object.
flyingforyou edited edge metadata.

I addressed Chad, Geoff comments.

Thanks Chad. Your comment really helpful for me.

Junmo.

gberry added inline comments.Nov 20 2015, 1:10 PM
lib/CodeGen/BranchFolding.cpp
747 ↗(On Diff #40743)

Maybe you should mention the fact that you are removing duplicates.

751 ↗(On Diff #40743)

This check seems unnecessary to me. You will just skip the below for loop if this is true.

754 ↗(On Diff #40743)

You should consider using a SmallSet here to look for duplicates instead of the N^2 search that you are currently doing.

756 ↗(On Diff #40743)

I think you need to set I2 = MI2->memoperands_begin() and E2 = MI2->memoperands_end() here. Otherwise, whenever you find an MMO is equal, when you start looking for a matching MMO on the next iteration you will start where you left off, instead of at the beginning of MI2's MMO list.

763 ↗(On Diff #40743)

This can be hoisted out of the loop.

flyingforyou added a reviewer: gberry.

Thanks Geoff.

I accept several changes according to your opinion.
And I added several comments about your sugesstions.

flyingforyou marked 3 inline comments as done.Nov 20 2015, 8:28 PM
flyingforyou added inline comments.
lib/CodeGen/BranchFolding.cpp
756 ↗(On Diff #40859)

When I wrote the code with N^2 search, there is a reason that most of cases, MI only has a one or zero MMO.
So, This routine is faster than using SmallSet.

In order to support that most of MI has one or zero MMO, I tested some benchmarks app and got a data.

When I investigated 12354 MI which is used by mergeMMOs function,
MI(MMO is 0) == 1595, MI(MMO is 1) == 10514. Sum of two cases is 12109 and it's 98%.

So, I don't think Smallset is useful in this case.
But there is no explain why I use N^2 search, so I added the comment.

I want you agree with this, please.

758 ↗(On Diff #40859)

Thanks.
You're right. I2 = MI2->memoperands_begin() need to be set. But E2 = MI2->memoperands_end() is only needed when MI2's MMO is changed.

gberry edited edge metadata.Nov 23 2015, 10:52 AM

Do you have any information on the impact of this change on final code generation/performance?

lib/CodeGen/BranchFolding.cpp
756 ↗(On Diff #40859)

I believe a SmallSet will have almost exactly the same performance as the code you have written here when N is small, since it just uses a SmallVector and does a linear search. If N ever gets large though it will switch to using a std::set which should have log(N) lookup. I'd like to hear from another reviewer on whether this is warranted in this case.

Hi Geoff, thanks for comment.

This optimization helps LDR/STR optimizations likes making load-pair or store-piar on AArch64.

When I ran Single/Multi Source Benchmarks in LLVM test-suite, Geomean value of positive effect is 0.06%, negative effect is 0.04%.

Especially, below benchmarks shows 0~1% improvement.

MultiSource/Benchmarks/TSVC/GlobalDataFlow-flt/GlobalDataFlow-flt
MultiSource/Benchmarks/Prolangs-C++/life/life
MultiSource/Benchmarks/mafft/pairlocalalign

Hi Chad, Could you review this commit, please?

mcrosier added inline comments.Dec 2 2015, 6:18 AM
lib/CodeGen/BranchFolding.cpp
747 ↗(On Diff #40859)

How about:

// Add MI1's MMOs to MI2's MMOs while excluding any duplicates. The MI scheduler currently doesn't handle multiple MMOs, so duplicates would likely pessimize the scheduler.

756 ↗(On Diff #40859)

I believe the number of MMO's on a MI is target-dependent. For AArch64 it tends to be 1, but for x86 it can be more, IIRC. Therefore, I tend to agree with Geoff; we should consider using a SmallSet. @gberry, do you have a suggestion on how the compare function could be written?

Another suggestion.

lib/CodeGen/BranchFolding.cpp
756 ↗(On Diff #40859)

As an alternative you might consider adding logic to handle the common cases and then use std::unique for the other cases.

Something like:

static void mergeMMOs(MachineInstr *MI1, MachineInstr *MI2) {

// Nothing to merge.
if (MI2->getNumMemOperands() == 0)
  return;

if (MI1->getNumMemOperands() == 0) {
  // Copy MMOs from MI2 to MI1.
}

if (MI1->getNumMemOperands() == 1 && MI1->getNumMemOperands() == 1) {
  if (MMO1 == MMO2)
    return;
  // Copy MMO2 to MI1.
  return;
}

// Copy all MMOs from MI2 to MI1.
// std::unique MI1's MMOs.

}

flyingforyou marked an inline comment as done.Dec 2 2015, 6:54 PM
flyingforyou added inline comments.
lib/CodeGen/BranchFolding.cpp
747 ↗(On Diff #40859)

Thanks.

756 ↗(On Diff #40859)

Thanks Chad.
Actually, I didn't think about other target. But, I think memory operand information is common thing. It is about array reference, pointer information, etc..
For clarity, I tested samething on x86 target. It also shows same percentage (MMO is 0 or 1 == 98%).

MMO NumMI Count
01971
111714
2178
324
45
55
63
73
80
90
>=101
Total MI Count13904

Can we just use this code at this time?

For using smallset or unique, we have to develop compare function. I don't think that it's easy.

flyingforyou edited edge metadata.
flyingforyou marked an inline comment as done.

Addressed Chad's comment.

mcrosier accepted this revision.Dec 2 2015, 7:54 PM
mcrosier edited edge metadata.

Given your experimental results I think the code is fine as is.

This revision is now accepted and ready to land.Dec 2 2015, 7:54 PM

Regarding an MMO compare function, I don't think this would be that much different from the existing MachineMemOperand::operator==.
Alternatively, could you just add a bail out case that clears all memops if M1->getMemOperands().size() * M2->getMemOperands().size() is above a certain threshold? That would prevent any pathological cases from causing compile time blowups.

Alternatively, could you just add a bail out case that clears all memops if M1->getMemOperands().size() * M2->getMemOperands().size() is above a certain threshold? That would prevent any pathological cases from causing compile time blowups.

This sounds very reasonable, IMO.

This revision was automatically updated to reflect the committed changes.

Thanks, Geoff, Chad.

I will make an another patch.
We can discuss about threshold value.