This is an archive of the discontinued LLVM Phabricator instance.

SROA: Allow eliminating addrspacecasted allocas
ClosedPublic

Authored by arsenm on Apr 10 2017, 8:52 PM.

Details

Summary

This is a resurrection of D10482 and D4501

There is a circular dependency between SROA and InferAddressSpaces
today that requires running both multiple times in order to be able to
eliminate all simple allocas and addrspacecasts. InferAddressSpaces
can't remove addrspacecasts when written to memory, and SROA helps
move pointers out of memory.

This should avoid inserting new commuting addrspacecasts with GEPs,
since there are unresolved questions about pointer wrapping between
different address spaces.

For now, don't replace volatile operations that don't match the alloca
addrspace, as it would change the address space of the access. It may
be still OK to insert an addrspacecast from the new alloca, but be
more conservative for now.

Diff Detail

Event Timeline

arsenm created this revision.Apr 10 2017, 8:52 PM
efriedma added inline comments.
docs/LangRef.rst
8452 ↗(On Diff #94776)

Do we need to restrict this rule to inbounds indexing?

arsenm added inline comments.Apr 11 2017, 12:28 PM
docs/LangRef.rst
8452 ↗(On Diff #94776)

I suppose this should specify for a defined result. The idea is only really to disallow implementations that can't round-trip likes in @sanjoy's example of implementing it with abs.

arsenm updated this revision to Diff 95311.Apr 14 2017, 9:56 AM

Specify defined results, and that the pointer must be able to round trip

efriedma added inline comments.Apr 14 2017, 10:45 AM
docs/LangRef.rst
8451 ↗(On Diff #95311)

What does "and then indexed" mean for a gep that isn't inbounds? We clearly can't make all indexing equivalent: if you use a gep to increment a pointer by 2^33, that's clearly going to have a different result if you try to round-trip that value through a 32-bit pointer.

I mean, I understand what you're getting at with the new text: the rule is essentially that the original and casted pointers point at the same memory allocation. LangRef really needs to be clear, though.

arsenm updated this revision to Diff 95441.Apr 17 2017, 9:06 AM

Attempt simpler phrasing

efriedma added inline comments.Apr 18 2017, 12:30 PM
docs/LangRef.rst
8460 ↗(On Diff #95441)

"The pointer conversion cannot be an arbitrarily complex value modification." is a bit vague... it'd be better if we could specifically say what transforms are allowed.

lib/Transforms/Scalar/SROA.cpp
1661

Why do we want to generate an address-space cast here, as opposed to performing the memory operation using the alloca's natural address-space?

arsenm updated this revision to Diff 96005.Apr 20 2017, 12:52 PM

Remove unnecessary change. The pointer already has the right address space

arsenm added inline comments.Apr 20 2017, 1:18 PM
docs/LangRef.rst
8460 ↗(On Diff #95441)

Would just dropping that sentence work? The intent is more clear in the second sentence where it needs to be reversible

I'd like to see some testcases which involve mixing GEPs and addrspacecasts. What width of APInt do we use to accumulate the offset? How does overflow work when we mix GEPs in different address-spaces?

docs/LangRef.rst
8460 ↗(On Diff #95441)

Yes, I think that's fine.

sanjoy added inline comments.Apr 20 2017, 1:41 PM
lib/Transforms/Scalar/SROA.cpp
1631

Are you ruling that GEP(CAST(X), 1) is the same as CAST(GEP X, 1)? If so, I'm not sure this is correct given your constraint on address space casts.

For instance, if casting from address space N to M, with both spaces having the same pointer width, involves flipping the high and low halves of the pointer then GEP(CAST(X), 1) is not the same as (CAST(GEP X, 1)).

Of course, this means that GEPs over pointers of address space M are different operations from GEPs over pointers of address space N, but that's allowed, AFAIK.

arsenm added inline comments.Apr 20 2017, 1:45 PM
lib/Transforms/Scalar/SROA.cpp
1631

Yes, those should be the same

sanjoy added inline comments.Apr 20 2017, 1:46 PM
lib/Transforms/Scalar/SROA.cpp
1631

Then you need to change the langref to disallow address space casts as the above (flipping the high and low halfs of the pointer).

arsenm updated this revision to Diff 96082.Apr 20 2017, 7:46 PM

Try to re-word langref again.

Re-add addrspacecast insertion. It can be necessary when the alloca isn't entirely eliminated.

Fix asserting when casting between different sized pointers

chandlerc edited edge metadata.Apr 20 2017, 10:22 PM

Restricting addrspace cast in this way seems... really hard to get right. I still have the question Eli asked: what does this mean in the absence of inbounds? What if the address spaces have different wrapping behavior even though they have the same number of bits?

Consider an address space where there are tag bits in the high bits and one where there aren't. These may appear to be the same type, but the GEP-ing rule you propose doesn't seem to generally hold.

And that's just one example. I'm not sure even restricting this to inbounds will really fix the issue.

What about approaching this more from the inference perspective? Could we embed the inference into the iteration of SROA without shifting the restrictions so much?

theraven added inline comments.Apr 21 2017, 1:11 AM
docs/LangRef.rst
8460 ↗(On Diff #95441)

It would be nice to clarify what 'legal' means in this context. For us, the relationship between address spaces 0 and 200 rely on some run-time properties. Address space 200 is always a superset of address space 0, so it is always safe to cast from AS 0 to AS 200, but the converse might not be possible and will give either a valid value (without bounds information) or a null pointer. A cast from AS200 -> AS0 -> AS200 may result in a null pointer if the address is outside the range covered by AS0. The same would apply on microcontrollers with a 32-bit global address space and a 16-bit address space mapped within that: you could always cast from the 16-bit range to the 32-bit range and back, but casting from an arbitrary 32-bit range to the 16-bit range and back may not work.

lib/Analysis/PtrUseVisitor.cpp
37–43

No changes suggested here, but in our version we have queries on the data layout that differentiate between the type and the range of a pointer (ours are 128- or 256-bit sized, but with a 64-bit range).

Restricting addrspace cast in this way seems... really hard to get right. I still have the question Eli asked: what does this mean in the absence of inbounds? What if the address spaces have different wrapping behavior even though they have the same number of bits?

Consider an address space where there are tag bits in the high bits and one where there aren't. These may appear to be the same type, but the GEP-ing rule you propose doesn't seem to generally hold.

And that's just one example. I'm not sure even restricting this to inbounds will really fix the issue.

What about approaching this more from the inference perspective? Could we embed the inference into the iteration of SROA without shifting the restrictions so much?

I'm not sure exactly what you mean by this. Do you mean somehow merging InferAddressSpaces and SROA?

lib/Analysis/PtrUseVisitor.cpp
37–43

We will probably need this at some point for using pointers for resource descriptors

What about approaching this more from the inference perspective? Could we embed the inference into the iteration of SROA without shifting the restrictions so much?

I'm not sure exactly what you mean by this. Do you mean somehow merging InferAddressSpaces and SROA?

In a limited form...

Essentially, expose utilities to infer address spaces which can be shared with the InferAddressSpaces pass but can also be used to infer address spaces for allocas as SROA promotes their uses into SSA registers.

Was there a decision reached here on what the correct semantics are? There are other places in LLVM (I found one in instcombine - there may be others) which do make the assumption that this change is proposing to introduce to the langref. Personally, I don't think this transformation should be allowed. I know there are architectures where different address spaces have different GEP behavior (though I'm not sure if this is the case for any in-tree backend). Nevertheless, if people do feel like this should be allowed (e.g. because such architectures should use something other than addrspacecast to convert between such address spaces), that's fine with me as well, but I think there should a clear statement in the langref on way or the other. As is, different people read the langref differently.

Was there a decision reached here on what the correct semantics are? There are other places in LLVM (I found one in instcombine - there may be others) which do make the assumption that this change is proposing to introduce to the langref. Personally, I don't think this transformation should be allowed. I know there are architectures where different address spaces have different GEP behavior (though I'm not sure if this is the case for any in-tree backend). Nevertheless, if people do feel like this should be allowed (e.g. because such architectures should use something other than addrspacecast to convert between such address spaces), that's fine with me as well, but I think there should a clear statement in the langref on way or the other. As is, different people read the langref differently.

Do you mean non-integral pointers? I don't think this changes the rules I was thinking. You still should have a reversible result. AMDGPU will use some non-integral pointers eventually, although it doesn't need them for this particular case.

I think the way the non-integral pointer is worded now is to allow GC etc. to completely replace the pointer value, in which case eliminating it like this is probably not OK. Would it work to restrict this for only integral pointer address spaces?

There was some discussion about non-integral address spaces at EuroLLVM. The current restriction is too great, as not allowing ptrtoint and inttoptr makes it impossible to support C-like languages. We discussed refining the definition to be that optimisers should not introduce inttoptr or ptrtoint, but that they are allowed to be inserted by the front end in places where they are valid in the context of the source language.

This change still has the problem with requiring bijection between address spaces, which is not the case for us or for any platform where there is a subset relationship between address spaces. For example, casting from a 32-bit address to a 16-bit address and then back is not guaranteed to give the same address. On an ARM M-profile chip, casting between addresses in different MPU sections may encounter similar problems.

I'm very nervous of any optimisations that introduce address space casts, because they rely on far more knowledge of the relationship between address spaces than we currently provide with the data layout. Perhaps providing that information in the data layout should be a prerequisite for this.

Do you mean non-integral pointers?

No, I don't mean non-integral pointers (though there's problems there too). I apologize if I'm being vague here, but I read a lot of architecture specs and I can never remember what is and is not public. In any case, I think the easiest example here is virtual memory. Consider an architecture with primitives for both physical and virtual memory and instructions for converting between the two quickly. It seems perfectly plausible to want to express the conversion between the two kinds of pointers as an address space cast. However, certainly geps and address space casts don't commute here. Now, as I said, you might argue that should a crazy address space cast deserves a target specific intrinsic, and I think that's a fine stance to take. I suppose the other alternative would be to add some information to the datalayout or the addressspace casts itself to indicate whether the optimizer is allowed to introduce addrspace casts that weren't in the original program.

sanjoy edited edge metadata.May 15 2017, 9:45 AM

There was some discussion about non-integral address spaces at EuroLLVM. The current restriction is too great, as not allowing ptrtoint and inttoptr makes it impossible to support C-like languages. We discussed refining the definition to be that optimisers should not introduce inttoptr or ptrtoint, but that they are allowed to be inserted by the front end in places where they are valid in the context of the source language.

You're making me regret not making it to EuroLLVM even more. :)

We disallowed ptrtoint and inttoptr because these instructions (today) are arbitrarily speculatable; and changing that to be dependent on their types would introduce complexity.

Instead my plan is to add intrinsics to convert between ni pointers and integers with exactly the property you mentioned -- these intrinsics may have side effects so they can't be inserted by the optimizer or speculated, but the frontend may insert them when legal.

For us, speculation isn't a problem. ptrtoint is not guaranteed to give stable results in all run-time environments (i.e. if we enable a copying GC), but it doesn't break the memory safety guarantees. inttoptr only works in some execution environments (and will result in a null where it wouldn't work), and it's up to the C programmer to ensure that they don't use it when it wouldn't be sensible and other front ends won't emit it at all. Code works as expected, as long as optimisers don't try to add them.

Having pondered this some more, I wonder if what we're missing is an annotation on the addrspace cast itself that indicates whether or not GEPs may be commuted past it (could call it inbounds or something else). It seems like in many cases (including allocas). The frontend (or whoever else is doing language/target specific work) can often know whether the entire object is available in the target address space (e.g. because the entire stack always is).

arsenm updated this revision to Diff 202756.Jun 3 2019, 11:07 AM
arsenm marked 3 inline comments as done.

Rebase and fix using the pointer type instead of the new indexing type. Don't introduce a new addrspacecast, since it's easily avoidable.

Since the addrspacecast is no longer inserted, I think that avoids some of the questions about pointer wrapping? The addrspacecasts are only eliminated, and ignored for computing the offset.

lib/Transforms/Scalar/SROA.cpp
1631

Does this only matter because of the newly introduced addrspacecast? This may change the pointer value, but we only care about number of bytes indexed off of the original object. Changing the representation in the middle shouldn't change the total number of bytes addressed from the original object?

1661

You're right, this is unnecessary

arsenm updated this revision to Diff 203889.Jun 10 2019, 1:40 PM
arsenm edited the summary of this revision. (Show Details)

Re-apply change to insert a final addrspacecast in getAdjustedPtr, which is necessary in some cases.

Don't allow getAdjustedPtr to search through addrspacecast. This should prevent the questionable addrspacecast-GEP commuting behavior. The practical case that matters is an alloca immediately casted, and all addressing is done in the result address space, so getting the same GEP folds in all cases as a bitcast isn't critically important, though would be nice to have. Some new addrspacecasts may be introduced, but not followed by a GEP.

Fix changing the address space of volatile operations, although inserting a new addrspacecast should be OK in these cases. For now just leave these cases alone.

Also fix some missing test coverage.

arsenm marked an inline comment as done.Jun 10 2019, 1:45 PM
arsenm added inline comments.
lib/Transforms/Scalar/SROA.cpp
1661

This is actually necessary in some cases. (e.g. in select_addrspacecast_const_op, without this, the two operands of the select end up as different types)

theraven accepted this revision.Jun 11 2019, 1:31 AM
theraven added a subscriber: arichardson.

I think this looks like it will improve codegen for us and not violate any of our C-level guarantees. Hopefully @arichardson can also take a look.

This revision is now accepted and ready to land.Jun 11 2019, 1:31 AM

I think this looks like it will improve codegen for us and not violate any of our C-level guarantees. Hopefully @arichardson can also take a look.

I just tried this on our fork and it looks good.

test/Transforms/SROA/basictest.ll
108

Use FileCheck captures for the variables in case the naming changes in the future?

arsenm closed this revision.Jun 14 2019, 2:35 PM

r363462