This is an archive of the discontinued LLVM Phabricator instance.

[VectorCombine] Freeze index unless it is known to be non-poison.
ClosedPublic

Authored by fhahn on May 30 2021, 4:52 AM.

Details

Summary

If the index itself is already poison, the poison propagates through
instructions clamping the index to a valid range. This still causes
introducing a load of poison, as flagged by Alive2 and pointed out
at 575e2aff5574.

This patch updates the code to freeze the index, unless it is proven to
not be poison.

Diff Detail

Event Timeline

fhahn created this revision.May 30 2021, 4:52 AM
fhahn requested review of this revision.May 30 2021, 4:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 30 2021, 4:52 AM
nlopes accepted this revision.May 30 2021, 9:35 AM

LGTM, thanks!

This revision is now accepted and ready to land.May 30 2021, 9:35 AM
This revision was landed with ongoing or failed builds.Jun 1 2021, 2:41 AM
This revision was automatically updated to reflect the committed changes.
efriedma added inline comments.
llvm/test/Transforms/VectorCombine/AArch64/load-extractelement-scalarization.ll
153

This is still not right. See https://alive2.llvm.org/ce/z/dqg8HF

fhahn added inline comments.Jun 14 2021, 8:48 AM
llvm/test/Transforms/VectorCombine/AArch64/load-extractelement-scalarization.ll
153

Thanks Eli, I thought I ran Alive2 on the test after the changes but something went wrong. I limited the transform to non-poison indices only for now in 96ca03493ae5.

Not sure if there's much else we can do to make this save using freeze, if we cannot rely on the constrained range.

efriedma added inline comments.Jun 14 2021, 9:07 AM
llvm/test/Transforms/VectorCombine/AArch64/load-extractelement-scalarization.ll
153

The following is legal, in general:

define i32 @src(<4 x i32>* %x, i4 %idx) {
entry:
  %lv = load <4 x i32>, <4 x i32>* %x
  %r = extractelement <4 x i32> %lv, i4 %idx
  ret i32 %r
}

define i32 @tgt(<4 x i32>* %x, i4 %idx) {
entry:
  %idx.freeze = freeze i4 %idx
  %idx.clamped = and i4 %idx.freeze, 3
  %gep = getelementptr inbounds <4 x i32>, <4 x i32>* %x, i32 0, i4 %idx.clamped
  %r = load i32, i32* %gep
  ret i32 %r
}

I haven't thought about when it's profitable.