This is an archive of the discontinued LLVM Phabricator instance.

[memcpyopt] Memcpy-memcpy dependence isn't detected across basic blocks
ClosedPublic

Authored by sunfish on Sep 28 2017, 11:54 AM.

Details

Summary

This picks up the patch from https://reviews.llvm.org/D23470 and updates it to address the review feedback provided there.

This is a very simple patch that just hooks up memcpyopt's existing logic for optimizing memcpys, which already uses memdep, to memdep's existing logic for doing non-local queries. This notably allows it to eliminate many more llvm.memcpy calls in common Rust code, often reducing their number by 20-30%.

Diff Detail

Repository
rL LLVM

Event Timeline

sunfish created this revision.Sep 28 2017, 11:54 AM
dberlin edited edge metadata.Sep 28 2017, 12:00 PM
  1. non-local calls in GVN are the cause of pretty much all of its massive slowdown.

Have you measured the effect this has on larger memory-using testcases?

  2. If you are interested, someone already ported it to MemorySSA:

https://reviews.llvm.org/D26739

(Bryant appears to have disappeared, sadly :( )

  1. non-local calls in GVN are the cause of pretty much all of its massive slowdown.

Have you measured the effect this has on larger memory-using testcases?

I've measured compile times on some C++ and Rust testcases, and it doesn't appear to significantly slow down -O2. memcpyopt remains under 1% of the time, according to -time-passes.

  2. If you are interested, someone already ported it to MemorySSA:

https://reviews.llvm.org/D26739

That's interesting, though I'm still interested in the focused and incremental patch here.

  1. non-local calls in GVN are the cause of pretty much all of its massive slowdown.

Have you measured the effect this has on larger memory-using testcases?

I've measured compile times on some C++ and Rust testcases, and it doesn't appear to significantly slow down -O2. memcpyopt remains under 1% of the time, according to -time-passes.

I'm not opposed if we are willing to revert it.

  2. If you are interested, someone already ported it to MemorySSA:

https://reviews.llvm.org/D26739

That's interesting, though I'm still interested in the focused and incremental patch here.

Okay. You said, "It might be desirable to use MemorySSA, however MemCpyOpt is currently using MemoryDependenceAnalysis for everything, and it seems desirable to avoid using a mix of both within the same pass, and updating the whole pass to use MemorySSA is a bigger project than I'm hoping to take on here."

  1. There's now a way to do that, with pretty much no work.
  2. Chandler specifically requested that passes be mixed, with an option to turn it on/off.

So ....

  1. non-local calls in GVN are the cause of pretty much all of its massive slowdown.

Have you measured the effect this has on larger memory-using testcases?

I've measured compile times on some C++ and Rust testcases, and it doesn't appear to significantly slow down -O2. memcpyopt remains under 1% of the time, according to -time-passes.

I'm not opposed if we are willing to revert it.

Is that approval? :-)

  2. If you are interested, someone already ported it to MemorySSA:

https://reviews.llvm.org/D26739

That's interesting, though I'm still interested in the focused and incremental patch here.

Okay. You said, "It might be desirable to use MemorySSA, however MemCpyOpt is currently using MemoryDependenceAnalysis for everything, and it seems desirable to avoid using a mix of both within the same pass, and updating the whole pass to use MemorySSA is a bigger project than I'm hoping to take on here."

  1. There's now a way to do that, with pretty much no work.
  2. Chandler specifically requested that passes be mixed, with an option to turn it on/off.

So ....

Indeed maybe it wouldn't be as much work as I assumed. That said, I'm still interested in pursuing the current incremental, focused, and simple patch first.

So, if the work is done (I think you would have to change some getDefiningAccess calls to getClobberingAccess), and it works, I'd prefer for us to just do that and declare victory.
If it's *not* done, okay, fine, that sucks but not gonna hold up the world here.

Everyone always wants to just do the small change they want. You know this.
Part of doing the review is to say "well, that's nice, but we should make sure that's really a good thing for llvm as a whole".
Otherwise, tragedy of the commons and all that.
So I'd like to understand if the other patch solves your problem before I approve this one.

Your suggestions above both involve adding functionality under a flag/option. That alone makes them different situations from just landing the current patch, in terms of seeing the work all the way through into production.

I have a simple, small, focused, incremental patch. It makes a significant improvement for some use cases. I have addressed all the concerns raised here about the patch itself, and am happy to address any others that are raised. The patch doesn't make anything about the MemorySSA-based patches harder to work on.

"Your suggestions above both involve adding functionality under a flag/option. That alone makes them different situations from just landing the current patch, in terms of seeing the work all the way through into production."

(Rust could easily enable that option, of course)
Past that, yes, it is.
But like I said, that's sometimes how you prevent a tragedy of the commons.

"The patch doesn't make anything about the MemorySSA-based patches harder to work on"
First, of course it does.
It's yet another thing to keep working or replace: both in time spent keeping this working and fixing the issues it exposes over time vs. time spent improving the infrastructure, and in the cost of building a replacement.

You are not signing up to maintain memcpyopt and improve it. That's literally what you are saying above. You are making an improvement you want and then going to the next thing. You see that as an argument in favor of this patch: You can just commit it and move on, and leave the maintenance/support to others.
I'm not saying that as a value judgement, but as an explanation of why it's not anywhere near as simple as you make it out to be.

What you've said is just another way of saying "I don't want to have to sign up for that". Which again, i get. Really!
But if you want the people who basically *have signed up* to fix bugs and keep this stuff working to be willing to accept it, it also means helping them out from time to time.

So if you want me to approve it, I'd like to know it's the right path, small or large. My line has to be drawn somewhere; otherwise, everything just incrementally gets bigger without any stopping point or design reconsideration.
This is where I am drawing the line for me to approve it: knowing whether a thing we believe is a better path already solves your problem.
I haven't actually asked you to go productionize that (despite your concern that I have). I've asked you simply to test whether it works and whether that path is feasible. If it works, in fact, I'm happy to commit to productionizing and getting it in in the next week or so.

If someone else who maintains this stuff wants to draw a different line and be more pragmatic, that's fine by me too.
I'm only telling you what I would want to see in this case.

sunfish edited the summary of this revision. (Show Details)Sep 29 2017, 11:58 AM
sunfish added reviewers: ab, efriedma, Prazek, joerg, reames.

I've asked you simply to test whether it works and whether that path is feasible. If it works, in fact, i'm happy to commit to productionizing and getting it in in the next week or so.

Ok. I've now tested the patch in https://reviews.llvm.org/D26739 for the use cases that motivated the original patch here, and it does indeed address them.

sanjoy added a subscriber: sanjoy.Oct 10 2017, 2:34 PM

Friendly ping :-). I confirmed that the MemorySSA-based patch does implement the optimization. In fact, it tries not to; however, it doesn't try hard enough, and the optimization happens anyway. It passes all the tests here as-is.

sunfish edited the summary of this revision. (Show Details)Oct 17 2017, 10:31 AM

Friendly ping! Anyone interested in taking a look?

Friendly ping! Anyone interested in taking a look?

sanjoy requested changes to this revision.Nov 28 2017, 11:50 PM
sanjoy added inline comments.
lib/Analysis/MemoryDependenceAnalysis.cpp
925

Would it be better to call this getNonLocalPointerDependencyFrom, in line with getPointerDependencyFrom?

lib/Transforms/Scalar/MemCpyOptimizer.cpp
1240

Why not push this SmallVector into the if block?

1274

IIUC, this may try to create illegal IR (def not dominating use) if it finds the non-local memset via a backedge, since it tries to create a memset before the memcpy using the memset's value operand, which may not dominate the memcpy with your change.

This revision now requires changes to proceed.Nov 28 2017, 11:50 PM
sunfish updated this revision to Diff 124773.Nov 29 2017, 10:05 AM
sunfish edited edge metadata.
  • Sink NonLocalDepResults SmallVectors into their if blocks.
  • Rename getSpecificNonLocalPointerDependency to getNonLocalPointerDependencyFrom
sunfish marked 2 inline comments as done.Nov 29 2017, 10:26 AM

Thanks for taking a look!

lib/Analysis/MemoryDependenceAnalysis.cpp
925

Sure.

lib/Transforms/Scalar/MemCpyOptimizer.cpp
1240

Done.

1274

If the memset doesn't dominate the memcpy, the non-local dependence results reflect this. For example, in

for (...) {
    memcpy(out, buffer, size);
    memset(buffer, 0, size);
}

the memcpy depends on the memset around the backedge, but memdep reports two dependencies: the memset on the backedge, and also whatever is there on the loop-entry edge, so the code in this patch, which requires a single dependency, doesn't accept it.

sanjoy accepted this revision.Nov 29 2017, 6:11 PM
sanjoy added inline comments.
lib/Transforms/Scalar/MemCpyOptimizer.cpp
1274

Ok, can you then add an assert that SrcDepInfo.getInst() dominates M (specifically in the cases where you computed it from a singular non-local dependency)?

This revision is now accepted and ready to land.Nov 29 2017, 6:11 PM
sunfish marked 3 inline comments as done.Nov 30 2017, 2:10 PM
sunfish added inline comments.
lib/Transforms/Scalar/MemCpyOptimizer.cpp
1274

Makes sense. I've now added the asserts.

This revision was automatically updated to reflect the committed changes.
sunfish marked an inline comment as done.Nov 30 2017, 2:16 PM

Thanks for your help!