This is an archive of the discontinued LLVM Phabricator instance.

[Mem2Reg] Also handle memcpy
Needs Revision (Public)

Authored by loladiro on Sep 15 2017, 3:39 PM.

Details

Summary

In Julia, when we know we're moving data between two memory locations,
we always emit a memcpy rather than a load/store pair. However,
this can give worse optimization results in some cases, because some
optimizations that handle load/store pairs cannot handle memcpys.
Mem2reg is one of them. This patch adds rudimentary support to
mem2reg for recognizing memcpys that cover the whole alloca being
promoted. While more sophisticated passes (SROA, GVN) can achieve
similar results, it is preferable to catch these cases early so that
optimization opportunities are exposed before those later passes run.
The approach taken here is to split the memcpy into a load/store pair
early (after legality analysis) and keep the rest of the analysis
operating only on loads/stores. It would of course be possible to
leave the memcpy as is and generate the leftover load or store only
on demand. However, that would entail a significantly larger patch
for unclear benefit.
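
To make the splitting step concrete, here is a minimal toy model in Python. It is not the patch itself and does not use any LLVM API: the `Alloca`, `Load`, `Store`, and `Memcpy` classes, the `split_memcpys` helper, and the `%memcpy.val` naming are all invented for illustration. It only sketches the idea that a memcpy covering the whole alloca is rewritten into a load from the source followed by a store to the destination, so that later analysis sees only loads and stores.

```python
from dataclasses import dataclass
from typing import List, Union

# Toy IR, invented for illustration only; these are not LLVM classes.
@dataclass(frozen=True)
class Alloca:
    name: str
    size: int  # size of the stack slot in bytes

@dataclass(frozen=True)
class Load:
    result: str
    ptr: str

@dataclass(frozen=True)
class Store:
    value: str
    ptr: str

@dataclass(frozen=True)
class Memcpy:
    dst: str
    src: str
    size: int  # number of bytes copied

Inst = Union[Alloca, Load, Store, Memcpy]

def split_memcpys(insts: List[Inst], alloca: Alloca) -> List[Inst]:
    """Rewrite each memcpy that covers all of `alloca` into a load/store pair.

    Memcpys that copy only part of the alloca are left untouched here; in
    this toy model they simply stay as memcpys.
    """
    out: List[Inst] = []
    fresh = 0  # counter for naming the temporary loaded values
    for inst in insts:
        touches = isinstance(inst, Memcpy) and alloca.name in (inst.dst, inst.src)
        if touches and inst.size == alloca.size:
            tmp = f"%memcpy.val{fresh}"
            fresh += 1
            out.append(Load(tmp, inst.src))   # read the whole value from the source
            out.append(Store(tmp, inst.dst))  # write it to the destination
        else:
            out.append(inst)
    return out

# Usage: an 8-byte slot that is fully initialized by a memcpy, then read.
slot = Alloca("%a", 8)
before = [slot, Memcpy("%a", "%src", 8), Load("%v", "%a")]
after = split_memcpys(before, slot)
```

After the rewrite, `%a` is accessed only through a store and a load, which is the shape a load/store-based promotion algorithm already understands.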

Event Timeline

loladiro created this revision. Sep 15 2017, 3:39 PM
loladiro updated this revision to Diff 115591. Sep 17 2017, 7:30 PM

Fix a small bug and also look through a single level of bitcasts.
Since IRBuilder automatically inserts bitcasts to i8*, it seems
prudent to handle that case.
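
As a rough illustration of what "a single level of bitcasts" means, the following toy sketch maps each bitcast result to its operand and looks through at most one such cast. The `strip_one_bitcast` helper and the value names are hypothetical and are not taken from the patch or from LLVM.

```python
from typing import Dict

def strip_one_bitcast(ptr: str, bitcasts: Dict[str, str]) -> str:
    """Return the operand of `ptr` if `ptr` is a bitcast, else `ptr` itself.

    Only one level is looked through: a bitcast of a bitcast resolves to
    the inner bitcast, not to the underlying alloca.
    """
    return bitcasts.get(ptr, ptr)

# Hypothetical values: %a.i8 is a cast of the alloca %a to i8*,
# and %a.i8.2 is a second cast layered on top of %a.i8.
bitcasts = {"%a.i8": "%a", "%a.i8.2": "%a.i8"}

direct = strip_one_bitcast("%a.i8", bitcasts)    # resolves to the alloca
nested = strip_one_bitcast("%a.i8.2", bitcasts)  # stops at the inner cast
plain = strip_one_bitcast("%x", bitcasts)        # not a bitcast, unchanged
```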

clang's standard pass pipeline never uses the mem2reg pass; it uses SROA instead. What pass pipeline are you using where this matters?

Hi @efriedma,

This is the Julia pass pipeline (https://github.com/JuliaLang/julia/blob/master/src/jitlayers.cpp#L148). IIRC the original list of passes came from VMKit,
but the pass list has been adjusted as needed over the years.

chandlerc edited edge metadata. Sep 19 2017, 10:48 AM

> Hi @efriedma,
>
> This is the Julia pass pipeline (https://github.com/JuliaLang/julia/blob/master/src/jitlayers.cpp#L148). IIRC the original list of passes came from VMKit,
> but the pass list has been adjusted as needed over the years.

I would still suggest just switching to SROA. You can (and should) run it quite early in the pipeline, but that seems much more likely to be a good long term solution.

I would look at the current early pass pipeline in LLVM for ideas about an effective sequencing here.

chandlerc requested changes to this revision. Sep 19 2017, 6:27 PM

(marking as needing changes to clear dashboard)

This revision now requires changes to proceed. Sep 19 2017, 6:27 PM