In Julia, when we know we're moving data between two memory locations,
we always emit that as a memcpy rather than a load/store pair. However,
this can give worse optimization results in certain cases because some
optimizations that can handle load/store pairs cannot handle memcpys.
Mem2reg is one of these optimizations. This patch adds rudimentary
support to mem2reg for recognizing memcpys that cover the whole alloca
we're promoting. While several more sophisticated passes (SROA, GVN)
can get similar optimizations, it is preferable to have these kinds
of cases caught early to expose optimization opportunities before
getting to these later passes. The approach taken here is to split
the memcpy into a load/store pair early (after legality analysis)
and retain the rest of the analysis only on loads/stores. It would
of course be possible to leave the memcpy as is and generate the
leftover load or store only on demand. However, that would entail
a significantly larger patch for unclear benefit.
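
As a rough source-level analogy for the splitting described above (this is not the patch itself, and the function names are hypothetical), the following C++ sketch shows a memcpy that covers an entire local slot next to the equivalent load/store pair. The local variable `slot` plays the role of the alloca being promoted; once the copy is written as a load and a store, a promotion pass only has to reason about loads and stores.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical illustration: the copy is expressed as a memcpy that
// covers the whole of `slot`, as a front end might emit it.
std::int64_t copy_with_memcpy(const std::int64_t *src) {
    std::int64_t slot;                      // stands in for the alloca
    std::memcpy(&slot, src, sizeof(slot));  // covers every byte of `slot`
    return slot + 1;
}

// The same copy written as an explicit load followed by a store.
std::int64_t copy_with_load_store(const std::int64_t *src) {
    std::int64_t slot;                      // stands in for the alloca
    std::int64_t tmp = *src;                // load from the source
    slot = tmp;                             // store into the slot
    return slot + 1;
}
```

Both functions compute the same result for any valid `src`. The equivalence relies on the memcpy covering the whole slot, which is the case the summary restricts itself to.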
Event Timeline
Fix a small bug and also look through a single level of bitcasts.
Since IRBuilder automatically inserts bitcasts to i8*, it seems
prudent to handle that case.
clang's standard pass pipeline never uses the mem2reg pass; it uses SROA instead. What pass pipeline are you using where this matters?
Hi @efriedma,
this is the Julia pass pipeline (https://github.com/JuliaLang/julia/blob/master/src/jitlayers.cpp#L148). IIRC the original list of passes came from VMKit,
but the pass list has been adjusted as needed over the years.
I would still suggest just switching to SROA. You can (and should) run it quite early in the pipeline, but that seems much more likely to be a good long term solution.
I would look at the current early pass pipeline in LLVM for ideas about an effective sequencing here.