This pass aims to infer alignment for instructions as a separate pass,
to reduce redundant work done by InstCombine running multiple times. It
runs late in the pipeline, just before the back-end passes where this
information is most useful.
Details
- Reviewers
nikic efriedma - Commits
- rG0f152a55d3e4: [InferAlignment] Implement InferAlignmentPass
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
@efriedma I'd like to have some feedback on whether the direction here makes sense. The premise is that in the middle-end, we don't care all that much about alignment, and the optimizations that do care about it will try to increase alignment in order to succeed (via getOrEnforceKnownAlignment). As such, instead of inferring alignment ~8 times in the optimization pipeline, do it once towards the end of the pipeline. Do you see any inherent problem with that approach?
llvm/lib/Transforms/Scalar/InferAlignment.cpp | ||
---|---|---|
28 | Drop llvm:: prefix. |
Alignment information is most useful in the backend, yes. But it's also relevant for alias analysis and cost modeling, so there could be some impact to delaying the computation. We might need both an early pass and a late pass. We might not need to do it every instcombine run, though.
Do we want the late run to be after unrolling?
A pointer-specific computePointerAlign() might be faster to compute than computeKnownBits(); I mean, the basic logic is the same, but the KnownBits datastructure is a lot more complicated than an "int".
Hm, where does alias analysis use alignment? I don't remember seeing that anywhere.
Do we want the late run to be after unrolling?
In theory, yes. In practice, computeKnownBits() only looks through phi nodes to a very limited degree, so it doesn't cope with the more complex structure of unrolled code well.
A pointer-specific computePointerAlign() might be faster to compute than computeKnownBits(); I mean, the basic logic is the same, but the KnownBits datastructure is a lot more complicated than an "int".
That sounds plausible. I think if we're going with a separate pass, we might want to (later) switch to using dataflow analysis to infer alignment, rather than recursive queries. I wouldn't want to reimplement all of KnownBits in that way, but for alignment a lot less instructions are relevant (basically just root instructions + gep/select/phi). That would fix the phi problem while possibly being faster overall. And people have been complaining about that phi limitation (the GPU folks in particular, I think?)
I think I got confused here. I'm pretty sure the backend alias analysis does use alignment, but maybe IR-level analysis doesn't.
LGTM
llvm/lib/Transforms/Scalar/InferAlignment.cpp | ||
---|---|---|
18 | I think you don't need this include (it's a LegacyPM thing). |
(See also D134282, where we're discussing inferring alignment for memset/memcpy/memmove.)
llvm/lib/Passes/PassBuilderPipelines.cpp | ||
---|---|---|
1148 | I was wondering why the pass was added after LoopeVectorize pass and not before? Wouldn't LV benefit from knowing that some pointers are better aligned? Or it doesn't care about this at the moment? |
llvm/lib/Passes/PassBuilderPipelines.cpp | ||
---|---|---|
1148 | LV uses getOrEnforceKnownAlignment(), so it shouldn't matter. Did you see any issues relating to this? (This is true pretty generally -- the passes that care about alignment will do their own inference.) |
llvm/lib/Passes/PassBuilderPipelines.cpp | ||
---|---|---|
1148 | I haven't seen the issues with pass buidler pipeline, no. I just use custom pipeline and got caught a bit off guard by this change, so I'm now in the process of figuring out where to insert this new pass ;P |
llvm/lib/Passes/PassBuilderPipelines.cpp | ||
---|---|---|
1148 | Okay, in that case the guideline would be to insert it once before and after runtime unrolling, but where exactly probably doesn't matter too much (or at least we haven't found out yet...) |
I was wondering why the pass was added after LoopeVectorize pass and not before? Wouldn't LV benefit from knowing that some pointers are better aligned? Or it doesn't care about this at the moment?