This is an archive of the discontinued LLVM Phabricator instance.

[CGP] Relax a bit restriction for optimizeMemoryInst to extend scope
ClosedPublic

Authored by skatkov on Jun 29 2017, 9:40 PM.

Details

Summary

CodeGenPrepare::optimizeMemoryInst contains a check that we do nothing
if all instructions combining the address for memory instruction is in the same
block as memory instruction itself.

However if any of these instruction are placed after memory instruction then
address calculation will not be folded to memory instruction.

The added test case shows an example.

Diff Detail

Event Timeline

skatkov created this revision.Jun 29 2017, 9:40 PM
efriedma added inline comments.Jun 30 2017, 12:24 PM
lib/CodeGen/CodeGenPrepare.cpp
4245

This is quadratic in the number of instruction in the BB.

skatkov updated this revision to Diff 105261.Jul 5 2017, 6:16 AM

This should be better, please take a look.

efriedma added inline comments.Jul 6 2017, 3:23 PM
lib/CodeGen/CodeGenPrepare.cpp
4361

This is still essentially quadratic: optimizeMemoryInst itself is called in a loop which iterates over the basic block.

skatkov added inline comments.Jul 6 2017, 8:53 PM
lib/CodeGen/CodeGenPrepare.cpp
4361

Do you see a better way to detect that all instructions in AddrModeInsts is defined earlier than MemoryInst or you just propose to drop this improvement due to its complexity?

ok, I will think whether it can be improved further...

Any hints are appreciated :)

efriedma edited edge metadata.Jul 7 2017, 11:58 AM

Err, wait a sec... you don't need to check dominance at all.

The point of the check is to make sure we don't end up in an infinite loop: if we already sank the addressing mode, we don't want to sink it again.

If we didn't look through a PHI node to come up with an instruction in AddrModeInsts, it must dominate the memory instruction because the IR wouldn't be valid otherwise, so checking dominance is useless. If we did look through a PHI node to come up with an instruction in AddrModeInsts, it isn't really local anyway; we're using the PHI node, not the instruction itself, so the dominance relationship is irrelevant.

So I think the right solution here is to actually track how we came up with each instruction in AddrModeInsts, and skip the IsNonLocalValue check for the ones that came from PHI nodes.

Good point. Meaning that if we traversed through Phi node it means that even it is in the same block, address computation is after memory instruction, otherwise it would not traverse Phi node. I will take a look into this deeper.

Side note, actually I think that the second reason for this check is to allow selection dag to use address folding. It traverses the instructions from the last one to the first one in basic block. So if addressing computation in the other basic block or after current memory instruction it will not be able to fold address computation to memory instruction...

skatkov updated this revision to Diff 105819.Jul 10 2017, 1:57 AM

Thank you Eli for the idea.

Please take a look.

This revision is now accepted and ready to land.Jul 10 2017, 4:11 PM
This revision was automatically updated to reflect the committed changes.