Add default address space for functions to the data layout (1/3)
Needs ReviewPublic

Authored by dylanmckay on Aug 23 2017, 1:41 AM.

Details

Summary

This adds initial support for letting targets specify which address
spaces their functions should reside in by default.

If a function is created by a frontend, it will get the default address space specified in the DataLayout, unless the frontend explicitly uses a more general llvm::Function constructor. Function address spaces will become a part of the bitcode and textual IR forms, as we do not have access to a data layout whilst parsing LL.

It will be possible to write IR that explicitly has addrspace(n) on a function. In this case, the function will reside in the specified space, ignoring the default in the DL.

This is the first step towards placing functions into the correct
address space for Harvard architectures.

Full patchset

  • Add program address space to data layout D37052
  • Require address space to be specified when creating functions D37054
  • [clang] Require address space to be specified when creating functions D37057
dylanmckay created this revision.Aug 23 2017, 1:41 AM

Fix a bug with parsing the address space

I seem to have lost this piece of code at some point, it used to exist.

Uploaded the wrong diff, here's the right one

asb added a comment.Aug 23 2017, 7:07 AM

From the perspective of "does the code do what it intends to and meet LLVM coding standards?" this looks good to me.

kparzysz added inline comments.Aug 23 2017, 10:49 AM
docs/LangRef.rst
1909

Aren't switch tables considered "data" in a Harvard architecture?

dylanmckay added inline comments.Aug 24 2017, 5:42 AM
docs/LangRef.rst
1909

I think in the strictest sense, you are correct.

However, I think in the real world, it makes more sense for this kind of data to live in program memory.

For example, in the AVR, all global variables (regardless of address space) live inside the executable, and this live inside program memory. The routines that run on device startup will find all variables that live in RAM, and then must copy everything over.

This is necessary because RAM is always initialised to zero on startup, if you want to have specific data in RAM, it _needs_ to be copied over.

In the cast of switch tables, we should not need to copy them over to RAM. They will never change, along with the fact that RAM accesses take longer than program memory. I imagine this is true of other Harvard architectures as well.

On top of this, if switch tables lived in RAM it would mean that any switches converted into tables by SimplifyCFG will branch on uninitialised memory if run without startup code. This is not a particularly strong argument as lots of bad things happen when run without startup code, but I would hope that we could at least evaluate a switch without it.

One other side note is that not all AVRs have RAM, and so switch tables would need to live in program memory in these chips. Again, not a super strong argument though because if you have no stack, you probably shouldn't be using LLVM.

tl;dr I think this is more a matter of practicality rather than what we consider data versus code in the academic sense.

kparzysz added inline comments.Aug 24 2017, 9:18 AM
docs/LangRef.rst
1909

My concern here is that a subsequent patch always puts switch tables in the program memory. I don't know if that's valid for all architectures.

theraven added inline comments.Aug 24 2017, 9:21 AM
docs/LangRef.rst
1909

I agree. Program memory is correct for us as well, but on a number of microcontrollers you don't have load instructions at all for the program memory and so they need to be in the data-ROM address space.

Is the intent to allow different functions to have different address spaces and this is the default, or all functions have the one address space in the datalayout? I think the description should clarify.

lib/Target/AVR/AVRTargetMachine.cpp
28 ↗(On Diff #112341)

I think changing the AVR layout is a separate patch

asb added inline comments.Aug 24 2017, 10:34 AM
docs/LangRef.rst
1909

Putting jump tables in a data section is something the ARM backend supported as of rL289784 in order to support execute-only code (i.e. no loads allowed from the code section). Would this sort of thing be better off moving towards using separating address spaces for code and data?

Is the intent to allow different functions to have different address spaces and this is the default

Yes. I will update the description to be clearer.

docs/LangRef.rst
1909

That makes sense - what are thoughts on creating some sort of target-specific hook (probably TargetTransformInfo) specific to switch tables that can be overridden and use that instead?

dylanmckay edited the summary of this revision. (Show Details)Aug 25 2017, 9:54 PM
asb added inline comments.
docs/LangRef.rst
1909

That sounds interesting to me, but it could be a bit of a rabbit hole. @prakhar, @rengolin - do you have any thoughts on this? Did you consider such an approach for the ARM execute-only support?

Has there been an RFC on this?

Despite being a simple change, I can't foresee all the consequences. Superficially, it looks harmless, but I'd rather more people on the list could have a look at it too.

lib/Target/AVR/AVRTargetMachine.cpp
28 ↗(On Diff #112341)

Indeed. It would be easier to just revert this one change on its own if bots start break, without reverting the whole set of changes.

prakhar added inline comments.Aug 29 2017, 1:28 AM
docs/LangRef.rst
1909

For execute-only on Arm, the only concern at the compiler level is ensuring that all literal loads are performed from a separate data section and not from a code section. The actual enforcement of the execute-only policy is implementation dependent, so this was not considered. Additionally, I believe it is possible for these sections to be loaded into the same address space, but with the code section at a specific execute-only base address, so this feature may not be directly applicable. Again, this is all dependent on the implementation.

I have no further comment wrt this patch.

asb added a subscriber: simoncook.Oct 19 2017, 11:58 AM
dylanmckay marked 2 inline comments as done.
dylanmckay edited the summary of this revision. (Show Details)
  • Remove the switch table stuff for a later patch
  • Rebased on top of trunk
dylanmckay edited the summary of this revision. (Show Details)Wed, Dec 6, 10:12 PM
dylanmckay retitled this revision from Add default address space for functions to the data layout (1/4) to Add default address space for functions to the data layout (1/3).

Remove switch table from the docs

dylanmckay added inline comments.Wed, Dec 6, 10:25 PM
docs/LangRef.rst
1909

Very helpful comments, thanks everyone.

It is clear that a more general solution must be found for placing lookup tables into the right address space.

This might be tricky because if we require a target-specific hook, it cannot live in TargetTransformInfo as I've recently read that logic in this class must not be required for correctness as there may not be a TTI present. In that case, I am unsure at the moment where such a hook could live whilst still being accessible to the SimplifyCfg pass. Anyway, I'll look into that further in a later patch.

For now, I've removed the switch table related code from this patchset.

prakhar removed a subscriber: prakhar.Thu, Dec 7, 1:41 AM
bjope added inline comments.Thu, Dec 7, 3:45 AM
include/llvm/IR/GlobalValue.h
185 ↗(On Diff #125888)

Is this a preparation for some future patch? I can't see that it is used in this patch (and it does not seem to be related to adding support for "P" in DataLayout).

bjope added inline comments.Thu, Dec 7, 9:17 AM
lib/IR/DataLayout.cpp
417

I'm not exactly sure what the criteria is for this method (or how it is used).
But maybe the ProgramAddrSpace member should be compared as well?

PS. I'm trying to compare your solution with the solution we have in our out-of-tree target, and we have something similar to the ProgramAddrSpace member (we call it FunctionPointerAddressSpace and use 'F' instead of 'P'). As it happens we do not compare the FunctionPointerAddressSpace in this operator==, but I guess that is just something we forgot to add.

dylanmckay updated this revision to Diff 126267.Sat, Dec 9, 1:10 AM
dylanmckay marked 2 inline comments as done.
  • Move GlobalValue::getAddressSpace() to next patch
  • Add ProgramAddressSpace to DataLayout::operator==
dylanmckay added inline comments.Sat, Dec 9, 1:13 AM
include/llvm/IR/GlobalValue.h
185 ↗(On Diff #125888)

You are right - this is for a future patch. There is no reason why it should be on this patch however.

Moved to D37054,

lib/IR/DataLayout.cpp
417

Good catch, I have added this check