Page MenuHomePhabricator

[CGP] Extends the scope of optimizeMemoryInst optimization

Authored by skatkov on Jul 31 2017, 12:17 AM.



This is an implementation of PR26223.

Currently optimizeMemoryInst optimization tries to fold address computation
if all possible way to get compute the address are of the form

baseGV + base + scale * Index + offset

where scale and offset are constants and baseGV, base and Index are exactly
the same instructions if defined.

The patch extends this optimization to allow different bases. In this case
it tries to find/build a Phi node merging all possible bases and use this Phi node
as a base for sunk address computation.

The main motivation for this scope extension is GCRelocateInst.
If there is a relocation of derived pointer it will be represented as relocation of base + offset.
Also there will be a Phi node merging address computation for relocated derived pointer
and derived pointer itself. If we have a Phi node merging original base and relocated base
and can fold the address computation of derived pointer then we can potentially reduce
the code size and Phi node for derived pointer. The later can have a positive impact to
register allocator.

Diff Detail


Event Timeline

skatkov created this revision.Jul 31 2017, 12:17 AM
dberlin added inline comments.Jul 31 2017, 10:38 AM
4381 ↗(On Diff #108867)

FWIW, i'd just hash them, rather thancheck the possible matches again and again, i think.

Happy to give you some code to crib for that.

skatkov added inline comments.Jul 31 2017, 8:08 PM
4381 ↗(On Diff #108867)

Interesting idea, you mean hash the Phi node and compare hashes first? I'm not sure I'm doing comparison of the same Phi node many times.

I take a Phi node and compare it against each Phi in the same basic block. If I find a match I will recursively check the dependency.
If this Phi node is not matched to any phi node in this basic block, I just bail out if creation of new Phi node is not allowed or move this Phi node (and all other Phi nodes considered together) to the list of known Phi nodes. And I will not consider them again.

So it seems that hashing is redundant here.
Do I miss anything?

skatkov added inline comments.Jul 31 2017, 8:22 PM
4429 ↗(On Diff #108867)

should be static. Will update after first iteration of review.

I would really appreciate if you review the patch :)
May be I really miss something.

I know the code is pretty big in size but I tried to write it as much as simple for understanding as I could.
If I can do anything to simplify the review, please let me know.

Add a couple of more reviewers... If someone knows who can also be added please do.

mkazantsev added inline comments.Aug 7 2017, 12:15 AM
2617 ↗(On Diff #108867)

I think this can be merged separately as NFC. Also you could re-define the operator== avode like return (BaseReg == O.BaseReg) && EqualsIgnoreBase(O) to avoid code duplication.

4630 ↗(On Diff #108867)

Better use && instread of increased nesting.

mkazantsev added inline comments.Aug 7 2017, 12:16 AM
4630 ↗(On Diff #108867)

UPD: Not viable here, sorry, I misread it.

skatkov edited the summary of this revision. (Show Details)Aug 16 2017, 11:36 PM
reames added inline comments.Sep 22 2017, 2:49 PM
2617 ↗(On Diff #108867)

Please take Max's suggestion. Getting this integrated is going to be a slow process and we should carve out any piece we can.

4617 ↗(On Diff #108867)

getParent()->getParent() --> getFunction()

4630 ↗(On Diff #108867)

Why is it not viable?

4645 ↗(On Diff #108867)

This snippet of code confuses me and the comment doesn't help.

4649 ↗(On Diff #108867)

As I pointed out in the other review (, I think both can be done using the same API as seen by this function. I generally like the framing of the other patch (track the addrmodes found, then merge later) a bit better than this and it's clearly more general. I think splitting this patch into two would simply the review greatly and highlight the common work between the reviews. My suggested split would be:

  1. The changes to this function required to track which components differ. With a trivial implementation of the analysis/transform for when one component differs and the necessary phi/select can be trivially found in the same basic block.
  2. A patch which builds on top of that with the rest of the algorithm.

(Note: This is in addition to the select handling factored out from that other review.)

reames requested changes to this revision.Sep 22 2017, 2:50 PM

Marking as changes needs to reflect split requested. Note that I really haven't even looked at the guts of the transform and don't plan to until the common pieces have been extracted.

This revision now requires changes to proceed.Sep 22 2017, 2:50 PM
skatkov added inline comments.Sep 25 2017, 10:12 PM
2617 ↗(On Diff #108867)

We agreed that john.brown will implement it in a separate patch. So I 'm waiting him for this update.

4630 ↗(On Diff #108867)

Because there is an additional code with we should execute if first condition is true while the second one is false.

4645 ↗(On Diff #108867)

I meant that if AddrModes contains only base (no index, no offset), no need to do a complex analysis because we end up by the same Phi node as we start from.

john.brawn added inline comments.
2617 ↗(On Diff #108867)

This now in D38278 (plus its dependent D38242).

mkazantsev added inline comments.Oct 10 2017, 1:16 AM
4366 ↗(On Diff #108867)

Nit: Item

skatkov updated this revision to Diff 118344.Oct 10 2017, 4:24 AM
skatkov edited edge metadata.

Please take a look.
I also plan to add more complex test cases like different mix of select and phi.

skatkov added inline comments.Oct 10 2017, 4:31 AM
203 ↗(On Diff #118344)

I plan to set this to true before submitting the patch and return back to false as a separate commit.

209 ↗(On Diff #118344)

If I set this to true I get one unit test failure. The optimization works but generated code for ARM seems worse...
Need to think about some heuristic when to enable Phi node creation. For example if original Phi will be eliminated (no other users except memory operations...)

2752 ↗(On Diff #118344)

Unfortunately it is not enough for checking the trivial case. It is possible that BaseReg is just a bitcast while there is no other fields but OriginalValue != baseReg and we consider this as not trivial what seems not true.

I plan to come up with separate patch for this.

5044 ↗(On Diff #118344)

Like an idea for compile time improvement: If we know that for this Addr there is already sunk address why we need all checks above?

mkazantsev added inline comments.Oct 10 2017, 4:39 AM
3414 ↗(On Diff #118344)

In such cases it needs to be auto *PHI.

skatkov updated this revision to Diff 118546.Oct 10 2017, 11:16 PM

Handled Maxim's comment.

john.brawn added inline comments.Oct 20 2017, 7:24 AM
3765 ↗(On Diff #118546)

If TrueValue is not an Instruction then this will fail because the value will have been inserted into the Map with nullptr for the block. I think we need handling similar to how PHINode handles it below (and the same is true for FalseValue also).

3779 ↗(On Diff #118546)

The iterator order of predecessors(CurrentBlock) may be different to the iterator order of CurrentPhi->blocks(), which can cause PHI to have incoming values in a different order to CurrentPhi which is a bit of a nuisance when writing tests. Could you see if you can preserve the block order? I tried modifying this to iterate over CurrentPhi->blocks(), but then I hit the "No predecessor Value" assert for reasons I haven't figured out yet.

3816 ↗(On Diff #118546)

Elsewhere Map.find(Current) != Map.end() is used to see if a value is in the map. It would be nice for that to be checked the same way everywhere.

3835 ↗(On Diff #118546)

Map[Current] = PHI (and similar elsewhere in this function).

skatkov updated this revision to Diff 119830.Oct 23 2017, 4:20 AM

Handled comments. Still need to add more tests... Plan to do it in 1-2 days.

skatkov marked 2 inline comments as done.Oct 23 2017, 4:23 AM
skatkov added inline comments.
3765 ↗(On Diff #118546)

Good catch. Fixed.

3779 ↗(On Diff #118546)

We can do this only if CurrentPhi is declared in this basic block otherwise you'll get an assert as you mentioned.

If we really want it it will require the duplication of the loop to support both cases - in this block and not. Do you see any other possible way to do that?

3816 ↗(On Diff #118546)


3835 ↗(On Diff #118546)


skatkov updated this revision to Diff 120770.Oct 29 2017, 10:51 PM
skatkov marked 2 inline comments as done.

added a couple of more tests. Specifically for select of select, select of phi and phi of select.

john.brawn accepted this revision.Nov 3 2017, 6:22 AM


3779 ↗(On Diff #118546)

Looking at this some more it does look like it would be more trouble than it's worth to try and keep the block order, so this I think this is OK as it is.

skatkov updated this revision to Diff 121614.Nov 4 2017, 9:39 PM

Re-based before landing.
Disable optimization by default, I will land separate patch to enable it on Tue my morning to be able to re-act on possible problems.

This revision was automatically updated to reflect the committed changes.

Yeah :(

Reverted as r317667. Will investigated and re-submit later.