This is an archive of the discontinued LLVM Phabricator instance.

[InferAlignment] Implement InferAlignmentPass
ClosedPublic

Authored by 0xdc03 on Aug 22 2023, 10:44 AM.

Details

Summary

This pass aims to infer alignment for instructions as a separate pass,
to reduce redundant work done by InstCombine running multiple times. It
runs late in the pipeline, just before the back-end passes where this
information is most useful.

Diff Detail

Event Timeline

0xdc03 created this revision.Aug 22 2023, 10:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 22 2023, 10:44 AM
0xdc03 requested review of this revision.Aug 22 2023, 10:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 22 2023, 10:44 AM
nikic added a subscriber: efriedma.

@efriedma I'd like to have some feedback on whether the direction here makes sense. The premise is that in the middle-end, we don't care all that much about alignment, and the optimizations that do care about it will try to increase alignment in order to succeed (via getOrEnforceKnownAlignment). As such, instead of inferring alignment ~8 times in the optimization pipeline, do it once towards the end of the pipeline. Do you see any inherent problem with that approach?

llvm/lib/Transforms/Scalar/InferAlignment.cpp
28

Drop llvm:: prefix.

Alignment information is most useful in the backend, yes. But it's also relevant for alias analysis and cost modeling, so there could be some impact to delaying the computation. We might need both an early pass and a late pass. We might not need to do it every instcombine run, though.

Do we want the late run to be after unrolling?

A pointer-specific computePointerAlign() might be faster to compute than computeKnownBits(); I mean, the basic logic is the same, but the KnownBits datastructure is a lot more complicated than an "int".

nikic added a comment.Aug 22 2023, 1:12 PM

Alignment information is most useful in the backend, yes. But it's also relevant for alias analysis and cost modeling, so there could be some impact to delaying the computation. We might need both an early pass and a late pass. We might not need to do it every instcombine run, though.

Hm, where does alias analysis use alignment? I don't remember seeing that anywhere.

Do we want the late run to be after unrolling?

In theory, yes. In practice, computeKnownBits() only looks through phi nodes to a very limited degree, so it doesn't cope with the more complex structure of unrolled code well.

A pointer-specific computePointerAlign() might be faster to compute than computeKnownBits(); I mean, the basic logic is the same, but the KnownBits datastructure is a lot more complicated than an "int".

That sounds plausible. I think if we're going with a separate pass, we might want to (later) switch to using dataflow analysis to infer alignment, rather than recursive queries. I wouldn't want to reimplement all of KnownBits in that way, but for alignment a lot less instructions are relevant (basically just root instructions + gep/select/phi). That would fix the phi problem while possibly being faster overall. And people have been complaining about that phi limitation (the GPU folks in particular, I think?)

Alignment information is most useful in the backend, yes. But it's also relevant for alias analysis and cost modeling, so there could be some impact to delaying the computation. We might need both an early pass and a late pass. We might not need to do it every instcombine run, though.

Hm, where does alias analysis use alignment? I don't remember seeing that anywhere.

I think I got confused here. I'm pretty sure the backend alias analysis does use alignment, but maybe IR-level analysis doesn't.

lkail added a subscriber: lkail.Aug 22 2023, 3:26 PM
0xdc03 updated this revision to Diff 552669.Aug 23 2023, 5:24 AM
  • Reorder the patches, merge D158531
0xdc03 edited the summary of this revision. (Show Details)Aug 23 2023, 5:35 AM
0xdc03 updated this revision to Diff 552750.Aug 23 2023, 9:13 AM
  • Move pipeline tests to the correct place
0xdc03 updated this revision to Diff 553725.Aug 26 2023, 6:16 AM
  • Make tests better
0xdc03 updated this revision to Diff 554222.Aug 29 2023, 2:33 AM
  • Make tests betterer
  • Rebase on main
0xdc03 updated this revision to Diff 554281.Aug 29 2023, 5:52 AM
  • Update tests
0xdc03 updated this revision to Diff 554364.Aug 29 2023, 8:26 AM
  • Update tests
nikic accepted this revision.Aug 29 2023, 8:39 AM

LGTM

llvm/lib/Transforms/Scalar/InferAlignment.cpp
18

I think you don't need this include (it's a LegacyPM thing).

This revision is now accepted and ready to land.Aug 29 2023, 8:39 AM

(See also D134282, where we're discussing inferring alignment for memset/memcpy/memmove.)

0xdc03 updated this revision to Diff 556957.Sep 18 2023, 9:49 AM
  • Rebase on main
  • Run the pass twice
This revision was automatically updated to reflect the committed changes.
danilaml added inline comments.
llvm/lib/Passes/PassBuilderPipelines.cpp
1148

I was wondering why the pass was added after LoopeVectorize pass and not before? Wouldn't LV benefit from knowing that some pointers are better aligned? Or it doesn't care about this at the moment?

nikic added inline comments.Sep 29 2023, 6:47 AM
llvm/lib/Passes/PassBuilderPipelines.cpp
1148

LV uses getOrEnforceKnownAlignment(), so it shouldn't matter. Did you see any issues relating to this?

(This is true pretty generally -- the passes that care about alignment will do their own inference.)

danilaml added inline comments.Sep 29 2023, 6:51 AM
llvm/lib/Passes/PassBuilderPipelines.cpp
1148

I haven't seen the issues with pass buidler pipeline, no. I just use custom pipeline and got caught a bit off guard by this change, so I'm now in the process of figuring out where to insert this new pass ;P

nikic added inline comments.Sep 29 2023, 6:54 AM
llvm/lib/Passes/PassBuilderPipelines.cpp
1148

Okay, in that case the guideline would be to insert it once before and after runtime unrolling, but where exactly probably doesn't matter too much (or at least we haven't found out yet...)