This is an archive of the discontinued LLVM Phabricator instance.

[ConstantFolding] Avoid assert when folding ptrtoint of vectorized GEP
ClosedPublic

Authored by bjope on Oct 4 2017, 8:54 AM.

Details

Summary

Got asserts in llvm::CastInst::getCastOpcode saying:
`DestBits == SrcBits && "Illegal cast to vector (wrong type or size)"' failed.

Problem seemed to be that llvm::ConstantFoldCastInstruction did
not handle ptrtoint cast of a getelementptr returning a vector
correctly. I assume such situations are quite rare, since the
GEP needs to be considered as a constant value (base pointer
being null).
The solution used here is to simply avoid the constant fold
of ptrtoint when the value is a vector. It is not supported,
and by bailing out we do not fail on assertions later on.

Change-Id: Ic8fb38e2014cdf2cf4027b10b9535c32fe044a33

Diff Detail

Repository
rL LLVM

Event Timeline

bjope created this revision.Oct 4 2017, 8:54 AM
bjope added inline comments.Oct 4 2017, 9:01 AM
lib/IR/ConstantFold.cpp
623 ↗(On Diff #117678)

Should this perhaps go below the V->isNullValue() check?
(I could not see that isNullValue() will be true for a vector of null pointers, and I am not sure that the ConstantInt::get(DestTy, 0) would be correct if DestTy is a vector. So I did put the check here before the isNullValue() check.)

bjope added a reviewer: davide.Oct 9 2017, 6:27 AM
filcab requested changes to this revision.Oct 16 2017, 11:44 AM
filcab added a subscriber: filcab.

Hi @bjope,

Thanks for the patch, but it seems to be trying to paper over another problem instead of fixing it.
I've commented on the specific line where I have concerns.

Thank you,

Filipe

test/Analysis/ConstantFolding/cast-vector.ll
10 ↗(On Diff #117678)

This is invalid IR (per langref). the ptrtoint constant expression only takes a pointer type argument:

ptrtoint (CST to TYPE)
Convert a pointer typed constant to the corresponding integer constant. TYPE must be an integer type. CST must be of pointer type. The CST value is zero extended, truncated, or unchanged to make it fit in TYPE.

It seems to me that one of the following is the best fix:

  • Fix whatever is creating a ptrtoint constant expression with a vector argument (make sure to also add an assert to make sure it doesn't happen again)
  • Make ptrtoint of vectors a valid constant expression, which involves changing the langref and making sure these kinds of folds are valid.
This revision now requires changes to proceed.Oct 16 2017, 11:44 AM
bjope added inline comments.Oct 16 2017, 1:59 PM
test/Analysis/ConstantFolding/cast-vector.ll
10 ↗(On Diff #117678)

@filcab: Are you sure?

AFAIK this is valid according to verifiers. And it seems to work.
The examples for both ptrtoint and inttoptr in the "Instruction Reference" part of the langref includes examples with vectors. For ptrtoint the version of the langref that I've been looking at says:

The ‘ptrtoint‘ instruction takes a value to cast, which must be a value of type pointer or a vector of pointers, and a type to cast it to ty2, which must be an integer or a vector of integers type.

I assume that you quoted the description from the "Constant Expressions" section of the langref.
Are you saying that there is a limitaiton for constant expressions, and that the Instruction Reference part of the langref only is valid for non-constant expressions?

Besides, you say that I should "fix whatever is creating a ptrtoint constant expression with a vector argument". My patch is about bailing out from ConstantFoldCastInstruction without doing any folding. Basically leaving the ptrtoint as it was in the input (which follows the example from the Instruction Reference part of the langref.... as the arguments isn't constant folded I guess it can't be seen as a constant expression...).

filcab added inline comments.Oct 17 2017, 10:53 AM
test/Analysis/ConstantFolding/cast-vector.ll
10 ↗(On Diff #117678)

The ptrtoint instruction in the input for your test is ok per langref.
The ptrtoint constant expression on the output you expect is not.

If you look at your CHECK line, you're expecting an output like:
ret <2 x i16> ptrtoint (<2 x i32*> getelementptr ([10 x i32], [10 x i32]* null, <2 x i64> zeroinitializer, <2 x i64> <i64 5, i64 7>) to <2 x i16>)

This is a ret of a ptrtoint constant expression which takes a getelementptr constant expression. On the input, you have instructions, which are different. ConstantFoldCastInstruction is currently being called on malformed IR which includes that ptrtoint constant expression. The IR is already bad (according to langref), so it's not useful to have ConstantFoldCastInstruction instead of stopping whatever emitted that IR. (That's just one of the possible fixes)

bjope added inline comments.Oct 17 2017, 2:53 PM
test/Analysis/ConstantFolding/cast-vector.ll
10 ↗(On Diff #117678)

So you are saying that constant expressions has special limitations, and that the Instruction Reference part of the langref only is valid for non-constant expressions?
(note that vector types are first class types, and the Constant Expressions section of the langref do say "Constant expressions may be of any first class type")

Is the following also malformed IR then?

%bar = add <2 x i16> %foo, trunc (<2 x i32> <i32 78, i32 99> to <2 x i16>)

I think it is weird that using vector types in constant expressions with cast operations works just fine if it isn't allowed. Is it perhaps just the langref that is a little bit vague regarding this?

Nevertheless, I do not think that the IR has been "malformed" yet when I hit the assert. My fix is just about bailing out when analysing a ptrtoint expression, very much in the same way as it already is done for trunc inside ConstantFoldCastInstruction.

Maybe it is a pity that the constant GEP expression isn't constant evaluated, reducing everything into a ret with a <2 x i16> with two literals. But that is kind of a different problem, and I'm not sure how common it is to have GEPs that are based on null pointers (only way to get a GEP that is a constant expression?), so maybe adding logic for that isn't worth the trouble.

As a comparison, if I change the test case to this (equivalent code but without the vector gep):

%gep1 = getelementptr inbounds [10 x i32], [10 x i32]* null, i16 0, i16 5
%gep2 = getelementptr inbounds [10 x i32], [10 x i32]* null, i16 0, i16 7
%gepvec1 = insertelement <2 x i32*> undef, i32* %gep1, i32 0
%gepvec2 = insertelement <2 x i32*> %gepvec1, i32* %gep2, i32 1
%vec = ptrtoint <2 x i32*> %gepvec2 to <2 x i16>
ret <2 x i16> %vec

then instsimplify will reduce it to

ret <2 x i16> <i16 ptrtoint (i32* inttoptr (i64 20 to i32*) to i16), i16 ptrtoint (i32* inttoptr (i64 28 to i32*) to i16)>

while instcombine will reduce it to

ret <2 x i16> <i16 20, i16 28>

So someone has bothered about evaluating constant GEPs that aren't vectors. But it also show that instsimplify doesn't reduce the ptrtoint and inttoptr fully, not even for the scalar case.

The IR produced in my test case:

ret <2 x i16> ptrtoint (<2 x i32*> getelementptr ([10 x i32], [10 x i32]* null, <2 x i64> zeroinitializer, <2 x i64> <i64 5, i64 7>) to <2 x i16>)

is accepted both by the verify pass, and llc. So instead of an assert, my original test case (C code) compiles into valid assembler. And I get the same assembler result after llc for all these variants of ret.

The constant expression ptrtoint is supposed to have the same rules as the instruction ptrtoint. Looks like we just forgot to update LangRef when pointer vectors were added.

bjope added a comment.Oct 18 2017, 8:40 AM

The constant expression ptrtoint is supposed to have the same rules as the instruction ptrtoint. Looks like we just forgot to update LangRef when pointer vectors were added.

Then the test case is valid.
And this patch avoids the assert by not executing the piece of code that does not support a getelementptr that returns a pointer of vectors. So it is a simple bugfix for an existing problem.
So can I land this now?

bjope requested review of this revision.Oct 18 2017, 11:43 PM
bjope edited edge metadata.

Needs a FIXME or something explaining why exactly you're returning early; in theory, we should be able to fold vector ptrtoint operations the same way we fold integer ptrtoint operations, given the right logic.

Also, while we're discussing LangRef, would you mind updating it to clarify the rules here?

bjope updated this revision to Diff 119784.Oct 22 2017, 3:20 AM

Add a FIXME as requested by Eli.

I've now added a FIXME describing the reason for now handling vectors at the
moment. And I moved the check closer to the code that needs to be guarded.

Also updated the test case with a second function, in order to show that the
problem exists for both the "offset-of" and "sizeof" kind of expressions.

bjope added a comment.Oct 22 2017, 3:56 AM

Needs a FIXME or something explaining why exactly you're returning early; in theory, we should be able to fold vector ptrtoint operations the same way we fold integer ptrtoint operations, given the right logic.

Ok! I've updated the patch.

Also, while we're discussing LangRef, would you mind updating it to clarify the rules here?

I've proposed a LangRef update in a separate patch: https://reviews.llvm.org/D39165

efriedma accepted this revision.Oct 23 2017, 3:44 PM

LGTM with fixed whitespace.

lib/IR/ConstantFold.cpp
640 ↗(On Diff #119784)

Looks like the whitespace got messed up here somehow?

bjope added inline comments.Oct 24 2017, 12:57 AM
lib/IR/ConstantFold.cpp
640 ↗(On Diff #119784)

Yes, thanks! I just found out that my emacs config ended up inserting tabs instead of spaces depending on how my local git repo was named - I did not know that - quite unexpected. I'll correct the indents before I push this.

This revision was automatically updated to reflect the committed changes.