This is an archive of the discontinued LLVM Phabricator instance.

[ValueTracking] Fix computeKnownBits() with bitwidth-changing ptrtoint
ClosedPublic

Authored by nikic on May 1 2020, 8:38 AM.

Details

Summary

computeKnownBitsFromAssume() currently asserts if m_V matches a ptrtoint that changes the bitwidth. Because InstCombine canonicalizes ptrtoint instructions to use explicit zext/trunc, we never ran into the issue in practice. I'm adding unit tests, as I don't know if this can be triggered via IR anywhere.

Fix this by calling anyextOrTrunc(BitWidth) on the computed KnownBits. Note that we are going from the KnownBits of the ptrtoint result to the KnownBits of the ptrtoint operand, so we need to truncate if the ptrtoint zexted and anyext if the ptrtoint truncated.

Diff Detail

Event Timeline

nikic created this revision.May 1 2020, 8:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 1 2020, 8:38 AM

Is this fixing some bug? Is there an observable behavior change?
Can a (currently-crashing) test be added for some pass?
Or can this only be triggered via a specially-crafted unit test?

nikic added a subscriber: efriedma.May 6 2020, 12:40 AM

Is this fixing some bug? Is there an observable behavior change?
Can a (currently-crashing) test be added for some pass?
Or can this only be triggered via a specially-crafted unit test?

This fixes a bug. There is no behavior change. It can be observed in IR by moving known bits alignment calculation from InstCombine into AlignmentFromAssumptions. It cannot be observed in InstCombine because InstCombine canonicalizes ptrtoint casts to explicit zext/trunc. Maybe it can be observed in some other pass, but it's not easy (we need both non-trivial context instruction for assumes, and known bits query on a pointer value, which is unusual).

Maybe the better way of "fixing" this issue is to change IR definition of ptrtoint/inttoptr to forbid bitwidth change and always require zext/trunc? I have no idea why the current semantics exist. Maybe @efriedma knows?

Historically, we allowed IR without a datalayout, so we didn't actually know what the pointer width would be when the IR was eventually compiled. Given that, allowing arbitrary casts was the least-bad alternative: otherwise, we wouldn't be able to verify a module.

Now that datalayout is required, we could maybe restrict the legal ptrtoint casts? Not sure what we would do about ptrtoint constant expressions; constant expressions that don't reference globals aren't associated with a particular module.

nikic added a comment.May 15 2020, 1:32 PM

Historically, we allowed IR without a datalayout, so we didn't actually know what the pointer width would be when the IR was eventually compiled. Given that, allowing arbitrary casts was the least-bad alternative: otherwise, we wouldn't be able to verify a module.

Now that datalayout is required, we could maybe restrict the legal ptrtoint casts? Not sure what we would do about ptrtoint constant expressions; constant expressions that don't reference globals aren't associated with a particular module.

Thanks for the context! Would anything speak against making the validity of constant expressions dependent on which module they are used in? That is, the same constant expression could be valid in one module and invalid in another, depending on data layout. As the validator works by recursively checking constants referenced in the module, I think that should work.

I may look into adding this requirement as a longer term change. I'd prefer to land this patch in the meantime though.

Would anything speak against making the validity of constant expressions dependent on which module they are used in? That is, the same constant expression could be valid in one module and invalid in another, depending on data layout.

That's not something we've ever done, but I guess it's possible.

lib/Analysis/ValueTracking.cpp
794 ↗(On Diff #261458)

LangRef says ptrtoint zero-extends.

nikic marked an inline comment as done.May 15 2020, 2:26 PM

Historically, we allowed IR without a datalayout, so we didn't actually know what the pointer width would be when the IR was eventually compiled. Given that, allowing arbitrary casts was the least-bad alternative: otherwise, we wouldn't be able to verify a module.

Now that datalayout is required, we could maybe restrict the legal ptrtoint casts? Not sure what we would do about ptrtoint constant expressions; constant expressions that don't reference globals aren't associated with a particular module.

Thanks for the context! Would anything speak against making the validity of constant expressions dependent on which module they are used in? That is, the same constant expression could be valid in one module and invalid in another, depending on data layout. As the validator works by recursively checking constants referenced in the module, I think that should work.

I may look into adding this requirement as a longer term change. I'd prefer to land this patch in the meantime though.

I poked at this a bit, and ran into the first snag in https://github.com/llvm/llvm-project/blob/a63eedd049bfde97f1d73caa61d27df600501c54/llvm/lib/IR/BasicBlock.cpp#L79. This is creating an inttoptr cast in the BasicBlock destructor, so we have no data layout to ensure the correct integer size is used. That may make it infeasible to enforce this restriction...

lib/Analysis/ValueTracking.cpp
794 ↗(On Diff #261458)

Here we have the knownbits of the ptrtoint result and want to reason backwards to the knownbits of the ptrtoint operand. If the ptrtoint zexted, that means we need to trunc. If the ptrtoint truncated, that means we need to anyext.

Here anyextOrTrunc is used as the inverse operation to zextOrTrunc.

efriedma accepted this revision.May 15 2020, 2:57 PM

LGTM

lib/Analysis/ValueTracking.cpp
794 ↗(On Diff #261458)

Oh, right.

This revision is now accepted and ready to land.May 15 2020, 2:57 PM
This revision was automatically updated to reflect the committed changes.