This is an archive of the discontinued LLVM Phabricator instance.

Avoid pointer truncation by InstCombine with IntToPtr combining
AbandonedPublic

Authored by rampitec on Sep 17 2015, 3:53 PM.

Details

Summary

InstCombiner::visitIntToPtr can insert a zero extend or truncate to match pointer size. The pointer size is determined by a corresponding DataLayout record for a given address space.
The problem occurs when that is a pointer to a function. There are no DataLayout fields to represent code, so size of a pointer to code is unknown.
Problem manifest itself as a pointer truncation in case if target is 64 bit, but a pointer to address space zero is 32 bit (for example "e-p:32:64-p1:64:64"). In this situation InstCombine would produce from:

%0 = ptrtoint i8 addrspace(1)* %ptr to i64
%1 = inttoptr i64 %0 to i32 ()*
%call = tail call spir_func i32 %1()

a malformed call to a function using truncated pointer:

%0 = ptrtoint i8 addrspace(1)* %ptr to i64
%1 = trunc i64 %0 to i32
%2 = inttoptr i32 %1 to i32 ()*
%call = tail call spir_func i32 %2()

In a longer term that is needed to create a mechanism to provide code pointer size info. Currently patch disables this optimization on a function pointer.

Diff Detail

Event Timeline

rampitec updated this revision to Diff 35041.Sep 17 2015, 3:53 PM
rampitec retitled this revision from to Avoid pointer truncation by InstCombine with IntToPtr combining.
rampitec updated this object.
rampitec added reviewers: arsenm, mehdi_amini.
rampitec set the repository for this revision to rL LLVM.
rampitec added a subscriber: llvm-commits.
mehdi_amini edited edge metadata.Sep 17 2015, 4:05 PM

This is what I expected from the previous patch, it seems backed everywhere in LLVM that functions are in address space 0. I wonder if the solution would be not to add a "code address space" to the DataLayout but rather being able to specify an address space on the function type itself.

lib/Transforms/InstCombine/InstCombineCasts.cpp
1452

Put the comment before the added condition maybe?

test/Transforms/InstCombine/combine-func-ptr.ll
6

Shouldn't it be at CHECK-NOT?

It should be possible, but spir producers will still produce address space 0, and then we can hardly fix all potential frontends. Then again, code address space is still different and in most cases does not alias with data (except for jits and generally self modifying code). An implementation might create an address space for its code, though target knowledge will need to start with frontends, limiting acceptable inputs.

Actually I do not see an ideal solution.

Stas

BTW, even a single code address space is not a good solution. It might serve immediate need, but not good enough as a general abstraction.
Back in 286 times there were far and near functions and code pointers. So several code address spaces might be needed for some implementations.
Maybe instead of making new code address spaces, it would make sense to be able map code to a particular address space as a separate field of data layout, and have a way of providing several such mappings.
That will change the whole way of querying pointer size across the compiler though. Generally it will be impossible to ask for a pointer size given an address space, only given a type.

Stas

rampitec updated this revision to Diff 35053.Sep 17 2015, 6:36 PM
rampitec edited edge metadata.
rampitec removed rL LLVM as the repository for this revision.

Changed per Mehdi's comments.

pete added a subscriber: pete.Sep 17 2015, 6:43 PM

I assume block addresses would have a similar problem? Not sure if there are any other constructs which take the address of code.

Pete

It should be possible, but spir producers will still produce address space 0, and then we can hardly fix all potential frontends.

I'm not sure why?
If LLVM 3.x ships with code address spaces to fix this bug, then either the front-end supports this or it continues generates buggy code (just as now).

Then again, code address space is still different and in most cases does not alias with data (except for jits and generally self modifying code).

I don't see how it contradicts what I said, on the opposite I think it justify completely having the ability to specify function address space.

An implementation might create an address space for its code, though target knowledge will need to start with frontends, limiting acceptable inputs.

I'm not sure I understand what you mean here?
Address space are already an opaque contract with the target.

BTW, even a single code address space is not a good solution. It might serve immediate need, but not good enough as a general abstraction.
Back in 286 times there were far and near functions and code pointers. So several code address spaces might be needed for some implementations.

Which is exactly the point of my previous comment: "being able to specify an address space on the function type". You could have one function in address space 0 and another in address space 1.

rampitec marked 2 inline comments as done.Sep 17 2015, 6:48 PM

I think you are right. All we need to do is to convince frontend and
Kronos standard writers ;)

Stas

BTW, even a single code address space is not a good solution. It might serve immediate need, but not good enough as a general abstraction.

Back in 286 times there were far and near functions and code pointers. So several code address spaces might be needed for some implementations.

Which is exactly the point of my previous comment: "being able to specify an address space on the function type". You could have one function in address space 0 and another in address space 1.

rampitec updated this revision to Diff 35054.Sep 17 2015, 7:01 PM

Added isLabelTy() to the check to complete code pointers check.

This patch seems highly suspect. Pointee types are largely irrelevant given LLVM's value semantics.

lib/Transforms/InstCombine/InstCombineCasts.cpp
1453

To quote http://llvm.org/docs/LangRef.html#pointer-type

Note that LLVM does not permit pointers to void (void*) nor does it permit pointers to labels (label*). Use i8* instead.

test/Transforms/InstCombine/combine-func-ptr.ll
2

This is not ok. Please run only instcombine.

6

Please use more comprehensive check lines.

arsenm edited edge metadata.Sep 18 2015, 12:57 AM

I think for our purposes we care more about the address space of alloca being distinct from the address space of code.

I think adding support for non-0 addrspace alloca would mostly solve this problem and a few other related ones around address space 0 assumptions. The "default" address space code pointer would then just happen to work.

This might be a good solution, but what would zero then mean? That would likely break compatibility with existing spir codes.

Stas

This might be a good solution, but what would zero then mean? That would likely break compatibility with existing spir codes.

Stas

The point would be to avoid ever using address space 0. A SPIR loader would have to do a lot of rewriting to eliminate them

rampitec abandoned this revision.Jan 18 2017, 10:00 AM