This is an archive of the discontinued LLVM Phabricator instance.

[KnownBits] Make shl/lshr/ashr implementations optimal
ClosedPublic

Authored by nikic on May 15 2023, 9:34 AM.

Details

Summary

The implementations for shifts were suboptimal in the case where the max shift amount was >= bitwidth. In that case we should still use the usual code clamped to BitWidth-1 rather than just giving up entirely.

Additionally, there was an implementation bug where the known zero bits for the individual shift amounts were not set in the shl/lshr implementations. I think after these changes, we'll be able to drop some of the code in ValueTracking which *also* evaluates all possible shift amounts and has been papering over this issue.
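To make the intent concrete, here is a minimal sketch of the clamped evaluation — not the actual patch. It assumes the llvm::KnownBits and llvm::APInt APIs (including KnownBits::intersectWith) and folds in both fixes described above: the clamp to BitWidth-1 and the per-shift-amount known-zero low bits.

```cpp
#include "llvm/Support/KnownBits.h"
using namespace llvm;

KnownBits shlSketch(const KnownBits &LHS, const KnownBits &RHS) {
  unsigned BitWidth = LHS.getBitWidth();
  KnownBits Known(BitWidth);
  Known.Zero.setAllBits();
  Known.One.setAllBits();

  // Clamp instead of bailing out: amounts >= BitWidth yield poison, so
  // only amounts in [0, BitWidth-1] contribute to the result.
  unsigned MinShift = RHS.getMinValue().getLimitedValue(BitWidth - 1);
  unsigned MaxShift = RHS.getMaxValue().getLimitedValue(BitWidth - 1);

  bool AnyValidShift = false;
  for (unsigned Shift = MinShift; Shift <= MaxShift; ++Shift) {
    // Skip amounts that contradict the shift amount's known bits.
    APInt ShiftAmt(RHS.getBitWidth(), Shift);
    if (ShiftAmt.intersects(RHS.Zero) || (~ShiftAmt).intersects(RHS.One))
      continue;
    AnyValidShift = true;

    KnownBits ShiftedKnown(BitWidth);
    ShiftedKnown.Zero = LHS.Zero.shl(Shift);
    ShiftedKnown.One = LHS.One.shl(Shift);
    // Bits shifted in from the right are known zero -- the per-amount
    // fact the old shl/lshr implementations failed to set.
    ShiftedKnown.Zero.setLowBits(Shift);
    Known = Known.intersectWith(ShiftedKnown);
  }

  // "All poison": return unknown rather than a conflict.
  if (!AnyValidShift)
    Known.resetAll();
  return Known;
}
```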

For the "all poison" case I've opted to return an unknown value for now. It would be better to return zero, but this has fairly substantial test fallout, so I figured it's best to not mix it into this change. (The "correct" return value would be a conflict, but given that a lot of our APIs assert conflict-freedom, that's probably not the best idea to actually return.)
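For illustration, a "conflict" in KnownBits terms is a bit claimed both zero and one. This tiny snippet, using the real KnownBits::hasConflict accessor, shows why returning one through APIs that assert conflict-freedom would trip those assertions:

```cpp
#include <cassert>
#include "llvm/Support/KnownBits.h"
using namespace llvm;

int main() {
  // If every shift amount is poison, no result value exists, so each bit
  // is vacuously both known-zero and known-one: a conflict.
  KnownBits Conflict(8);
  Conflict.Zero.setAllBits();
  Conflict.One.setAllBits();
  assert(Conflict.hasConflict()); // most KnownBits APIs assert the opposite
}
```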

Diff Detail

Event Timeline

nikic created this revision.May 15 2023, 9:34 AM
nikic requested review of this revision.May 15 2023, 9:34 AM
Herald added a project: Restricted Project.May 15 2023, 9:34 AM
nikic added inline comments.May 15 2023, 9:37 AM
llvm/test/CodeGen/AMDGPU/amdgpu.private-memory.ll
224

I believe this is correct: SCALED_IDX is IDX << 4 and as such at least 16. Therefore 0x10000 >> SCALED_IDX is either zero or one, and the and mask can be narrowed to 1.
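A quick worked check of that arithmetic in plain C++, with hypothetical index values; uint64_t keeps the shifts well-defined:

```cpp
#include <cassert>
#include <cstdint>

int main() {
  for (uint64_t Idx = 1; Idx <= 3; ++Idx) {
    uint64_t ScaledIdx = Idx << 4;                     // 16, 32, 48
    uint64_t Shifted = uint64_t{0x10000} >> ScaledIdx; // 1, 0, 0
    assert((Shifted & ~uint64_t{1}) == 0); // only bit 0 can survive, so
                                           // the and-mask narrows to 1
  }
}
```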

llvm/test/Transforms/InstCombine/not-add.ll
175

A nominal regression, because I did not try to preserve the exact behavior for "always poison" and instead always return unknown. If we switch to returning zero, this whole code folds away (as does code in many other tests).

foad accepted this revision.May 15 2023, 9:47 AM
This revision is now accepted and ready to land.May 15 2023, 9:47 AM
nikic edited the summary of this revision. (Show Details)May 15 2023, 10:02 AM
foad added a comment.May 15 2023, 11:12 AM

The "correct" return value would be a conflict,

Yes!

> but given that a lot of our APIs assert conflict-freedom

That is a shame.

goldstein.w.n added inline comments.May 15 2023, 11:24 AM
llvm/test/CodeGen/AMDGPU/amdgpu.private-memory.ll
224

Was the old value buggy then?

nikic added inline comments.May 15 2023, 12:43 PM
llvm/test/CodeGen/AMDGPU/amdgpu.private-memory.ll
224

No, it's also correct; the constant was just unnecessarily wide. I believe the narrowing is done as part of demanded bits simplification.

goldstein.w.n added inline comments.May 15 2023, 12:53 PM
llvm/lib/Support/KnownBits.cpp
243–248

Not for this patch, but in future installments, maybe a third argument to specify the behavior here? In the X86 backend, for example, we remove masks on the shift amount because it's known the processor will just modulo the shift amount by the bitwidth. Likewise, some targets just return zero. If we could specify the behavior, it may be usable in TargetLowering::computeKnownBitsForTargetNode.
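A hypothetical shape for that extra parameter might look like the sketch below. The enum and function names are invented; KnownBits::shl, intersectWith, and makeConstant are the real APIs being wrapped, and the Modulo case assumes a power-of-two bit width.

```cpp
#include "llvm/ADT/APInt.h"
#include "llvm/Support/ErrorHandling.h"
#include "llvm/Support/KnownBits.h"
using namespace llvm;

// Invented for illustration -- not an actual LLVM enum.
enum class OobShiftBehavior {
  Poison, // IR semantics: amounts >= bitwidth yield poison
  Modulo, // e.g. X86: hardware masks the amount by bitwidth - 1
  Zero    // some targets: out-of-range shifts produce zero
};

KnownBits shlForTarget(const KnownBits &LHS, const KnownBits &RHS,
                       OobShiftBehavior Behavior) {
  unsigned BitWidth = LHS.getBitWidth();
  switch (Behavior) {
  case OobShiftBehavior::Modulo: {
    // Fold the hardware's implicit mask into the amount's known bits.
    APInt AmtMask(RHS.getBitWidth(), BitWidth - 1);
    KnownBits MaskedAmt = RHS;
    MaskedAmt.Zero |= ~AmtMask;
    MaskedAmt.One &= AmtMask;
    return KnownBits::shl(LHS, MaskedAmt);
  }
  case OobShiftBehavior::Zero: {
    KnownBits Res = KnownBits::shl(LHS, RHS);
    if (RHS.getMaxValue().uge(BitWidth))
      // The result may also be zero, so only known-zero bits survive.
      Res = Res.intersectWith(
          KnownBits::makeConstant(APInt::getZero(BitWidth)));
    return Res;
  }
  case OobShiftBehavior::Poison:
    return KnownBits::shl(LHS, RHS);
  }
  llvm_unreachable("covered switch");
}
```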

This revision was landed with ongoing or failed builds.May 16 2023, 12:44 AM
This revision was automatically updated to reflect the committed changes.