This is an archive of the discontinued LLVM Phabricator instance.

Avoid relying on address space zero default parameter in llvm/IR
Needs RevisionPublic

Authored by arichardson on Apr 20 2020, 7:01 AM.

Details

Summary

APIs such as PointerType::getUnqual/Type::getPointerTo() can result in
pointers being created in address space zero even though this is not
correct for all targets. We have been bitten by this issue many times our
CHERI fork: we use address space 200 for both globals and functions.
Creating a pointer in address space zero will generally result instruction
selection failures or in some cases code being generated that triggers
traps at run time. To avoid these problems, I've remove the many instances
of unsigned AS = 0 function parameters to ensure that pointers get created
in the right address space.

Keeping these changes out-of-tree results in compilation failures almost
every time I do an upstream merge. It would be very beneficial for us to
have these changes upstreamed, and should help targets such as AVR than
use a non-zero program address space.
I've seen some recent commits for AVR that fix the same AS=0 issues that
we have out-of-tree (e.g. 215dc2e203341f7d1edc4c4a191b048af4ace43d).
I think making such bugs a compilation failure should be beneficial for
everyone even if it means typing a few more characters to get a pointer type.

This change allows for gradual migration: we can add a single CMake line
to each directory that should no longer compile with implicit address
space zero: add_definitions(-DLLVM_NO_IMPLICIT_ADDRESS_SPACE=1). In this
patch I've added the definition to lib/IR and I will follow up with further
cleanups.

Depends on D70947.

Diff Detail

Event Timeline

arichardson created this revision.Apr 20 2020, 7:01 AM
Herald added a project: Restricted Project. · View Herald Transcript

I can't give an LGTM but at least from my perspective I very much welcome this change. I am still hitting problems with non-zero address spaces on AVR. One nit in an inline comment.

Do you plan on eventually removing LLVM_NO_IMPLICIT_ADDRESS_SPACE when the transition is finished? If so, I think it's worth documenting this somewhere - perhaps at llvm/lib/IR/CMakeLists.txt.

llvm/include/llvm/IR/DataLayout.h
360

There is no default address space in this declaration?

arichardson marked an inline comment as done.Apr 20 2020, 4:24 PM

I can't give an LGTM but at least from my perspective I very much welcome this change. I am still hitting problems with non-zero address spaces on AVR. One nit in an inline comment.

In case it helps, our changes can be seen here: https://github.com/CTSRD-CHERI/llvm-project. However, we made many of those changes before there was a program address space. For us all pointers are in AS200, so we initially had a getDefaultAS() helper and use that in cases where it should be program/globals address space. That is one of the things I intend to fix while upstreaming.

Do you plan on eventually removing LLVM_NO_IMPLICIT_ADDRESS_SPACE when the transition is finished? If so, I think it's worth documenting this somewhere - perhaps at llvm/lib/IR/CMakeLists.txt.

Yes, ultimately the goal is to remove the default parameters. It might also make sense to switch the define from an opt-out LLVM_NO_IMPLICIT_ADDRESS_SPACE to and opt-in LLVM_NEED_IMPLICIT_ADDRESS_SPACE_ZERO for individual directories/projects that have not been ported to the explicit APIs yet.

llvm/include/llvm/IR/DataLayout.h
360

Good catch, I did not mean to change this line. Copy-paste error.

This should really be reviewed on llvm-commits.

should help targets such as AVR than use a non-zero program address space.

It definitely will - we have the exact same problems in terms of every call to getUnqual on a function pointer becomes a ISel error.

I think making such bugs a compilation failure should be beneficial for everyone even if it means typing a few more characters to get a pointer type.

I agree, it's the only reasonable way to support AVR at a level on-par with other official backends, all of which do not place functions in nonzero address spaces.

bjope added a subscriber: ebevhan.Apr 21 2020, 8:14 AM

I really like the idea behind this patch, but I suspect there is a better way to go about it.

The only drawbacks to upstreaming this patch as-is is the idea that because it is implemented as a compiler define directive, it effectively splinters the the core LLVM C++ API into two different versions. Of course, this is kind of the point, but my only worry is that the LLVM_DEFAULT_AS_PARAM macro adds an additional layer of complexity to all users of the API that will be encountered and must be considered.

I think, if we have already identified all the call sites that break under the LLVM_DEFAULT_AS_PARAM patch, then we should instead make two final patches that fix the root rather than add the LLVM_DEFAULT_AS_PARAM for retroactive discovery. 1) replace all calls to getUnqual with the appropriate logic, which it looks like you've already done, and 2) remove the default value of 0 from all address space parameters so the callers *must* explicitly pass the address space in. I think that 2) would be hard to swing in the past whilst AVR was an experimental target, but now that AVR is an officially supported LLVM target, it is invariably true that all calls to getOrCreateInternalVariable and friends without explicit address spaces will break an officially supported in-tree target. It no longer makes sense to default to zero, so let's fix the root cause.

This would the problem at the root, stopping all new LLVM code from implicitly defaulting to address space 0. This still leaves open the problem of developers explicitly calling getUnqual when constructing pointers, but this problem is left open under this patch currently as regardless of the default parameter or not, calls to getUnqual are still allowed.

tl;dr I suggest removing the LLVM_DEFAULT_AS_PARAM directive from this patch, instead permanently removing the default value of = 0. This will achieve the same end result, but it will fix the broken API rather than just making it easier for downstream users like CHERI to discover.

What are your thoughts?

Herald added a project: Restricted Project. · View Herald TranscriptMay 16 2020, 8:19 PM
dylanmckay requested changes to this revision.May 16 2020, 8:19 PM
This revision now requires changes to proceed.May 16 2020, 8:19 PM

I really like the idea behind this patch, but I suspect there is a better way to go about it.

The only drawbacks to upstreaming this patch as-is is the idea that because it is implemented as a compiler define directive, it effectively splinters the the core LLVM C++ API into two different versions. Of course, this is kind of the point, but my only worry is that the LLVM_DEFAULT_AS_PARAM macro adds an additional layer of complexity to all users of the API that will be encountered and must be considered.

I think, if we have already identified all the call sites that break under the LLVM_DEFAULT_AS_PARAM patch, then we should instead make two final patches that fix the root rather than add the LLVM_DEFAULT_AS_PARAM for retroactive discovery. 1) replace all calls to getUnqual with the appropriate logic, which it looks like you've already done, and 2) remove the default value of 0 from all address space parameters so the callers *must* explicitly pass the address space in. I think that 2) would be hard to swing in the past whilst AVR was an experimental target, but now that AVR is an officially supported LLVM target, it is invariably true that all calls to getOrCreateInternalVariable and friends without explicit address spaces will break an officially supported in-tree target. It no longer makes sense to default to zero, so let's fix the root cause.

This would the problem at the root, stopping all new LLVM code from implicitly defaulting to address space 0. This still leaves open the problem of developers explicitly calling getUnqual when constructing pointers, but this problem is left open under this patch currently as regardless of the default parameter or not, calls to getUnqual are still allowed.

tl;dr I suggest removing the LLVM_DEFAULT_AS_PARAM directive from this patch, instead permanently removing the default value of = 0. This will achieve the same end result, but it will fix the broken API rather than just making it easier for downstream users like CHERI to discover.

What are your thoughts?

I would love to do that, but it would require many hundreds of changes. This is why I decided to go for a per-directory opt-in approach.
The next step would be to change it to an opt-out and finally remove it.

I can try inverting the logic to explicitly request implicit address space zero and see how many CMakeLists.txt need changing.
The benefit of that approach would be that new directories will automatically require explicit address spaces.

One reason the default AS macro might be useful is for out-of-tree consumers of the LLVM libraries that always use address space zero and changing it would require large update costs to move to LLVM 11+.
I'm not sure that's a reason though since the APIs are not considered stable.

Note to the readers: A recent-ish llvm-dev discussion around this patch can be found in here whrein a consensus is found on the way forward on this patch.

dylanmckay requested changes to this revision.Aug 27 2020, 5:36 AM
This revision now requires changes to proceed.Aug 27 2020, 5:36 AM

Can I just add a +1 on something like this. My swift compiler build just broke at runtime because of this.

Would it be possible to change the default so it reads the address space from the target data layout so it's all automatic in most cases? I know there's a bit of complexity because you have to work out whether to read the data or function address space, is that visible?