This is an archive of the discontinued LLVM Phabricator instance.

Add a default address space for globals to DataLayout
ClosedPublic

Authored by arichardson on Dec 3 2019, 2:07 AM.

Details

Summary

This is similar to the existing alloca and program address spaces (D37052)
and should be used when creating/accessing global variables.
We need this in our CHERI fork of LLVM to place all globals in address space 200.
This ensures that values are accessed using CHERI load/store instructions
instead of the normal MIPS/RISC-V ones.

The problem this is trying to fix is that most of the time the type of
globals is created using a simple PointerType::getUnqual() (or ::get() with
the default address-space value of 0). This does not work for us and we get
assertion/compilation/instruction selection failures whenever a new call
is added that uses the default value of zero.

In our fork we have removed the default parameter value of zero for most
address space arguments and use DL.getProgramAddressSpace() or
DL.getGlobalsAddressSpace() whenever possible. If this change is accepted,
I will upstream follow-up patches to use DL.getGlobalsAddressSpace() instead
of relying on the default value of 0 for PointerType::get(), etc.

This patch and the follow-up changes will not have any functional changes
for existing backends with the default globals address space of zero.
A follow-up commit will change the default globals address space for
AMDGPU to 1.

Diff Detail

Unit TestsFailed

Event Timeline

arichardson created this revision.Dec 3 2019, 2:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 3 2019, 2:07 AM

Build result: fail - 60406 tests passed, 1 failed and 726 were skipped.

failed: LLVM.CodeGen/SPARC/fp128.ll

Log files: console-log.txt, CMakeCache.txt

arsenm added a comment.Dec 3 2019, 8:55 AM

Needs some bitcode compatibility tests, and actual uses. AMDGPU's data layout should also add -G1

llvm/docs/LangRef.rst
2059

The phrasing is a bit wrong. This implies for all global variables. This should maybe be the default address space for globals?

arichardson marked an inline comment as done.Dec 3 2019, 9:25 AM

Needs some bitcode compatibility tests, and actual uses. AMDGPU's data layout should also add -G1

That's great to know and should make it easier to write tests.

I was going to add uses in follow-up commits but if you prefer I can also include some in this change.

llvm/docs/LangRef.rst
2059

Possibly something like this? Specifies the address space to be used by default when creating global variables. If omitted, the globals address space defaults to the default address space 0. Note: variable declarations without an address space are always created in address space 0, this property only affects the default value to be used when creating globals programmatically.

I could probably also adjust the parser to use this value as a default but I'm not sure that is desirable.

Recommend adding a couple new tests to unittests/IR/DataLayoutTest.cpp.

arsenm added inline comments.Jan 31 2020, 6:49 AM
llvm/docs/LangRef.rst
2059

This is better. I think "when creating globals programmatically" is still a bit misleading. This is really for the case where generic code synthesizes globals out of nothing, and it's not wrong to use a different address space without more context. How about something like "when creating globals without additional contextual information" or something?

Does this actually get used in cases that produce non-constant globals?

llvm/include/llvm/IR/DataLayout.h
126

Maybe DefaultGlobalsAddrSpace?

lenary removed a subscriber: lenary.Feb 3 2020, 2:52 AM
arichardson marked an inline comment as done.
  • Use the default globals AS when creating GlobalVariable objects with a DataLayout
  • Add a test checking this behaviour
  • Update documentation

If this is accepted I'll submit some follow-up changes to allow instrumentations such as SanitizerCoverage, etc to work with a non-zero globals AS.
For reference here are the uses of DL.getGlobalsAddressSpace() in CHERI LLVM:

clang/lib/CodeGen/CGBuiltin.cpp:1479:  unsigned GlobalAS = CGF.CGM.getDataLayout().getGlobalsAddressSpace();
clang/lib/CodeGen/CGBuiltin.cpp:2084:    unsigned GlobalAS = CGM.getDataLayout().getGlobalsAddressSpace();
clang/lib/CodeGen/CGExprConstant.cpp:1915:    unsigned GlobalAS = CGM.getDataLayout().getGlobalsAddressSpace();
llvm/include/llvm/IR/DataLayout.h:293:  unsigned getGlobalsAddressSpace() const { return GlobalsAddrSpace; }
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp:20938:                           DAG.getDataLayout().getGlobalsAddressSpace()),
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:332:                              DAG.getDataLayout().getGlobalsAddressSpace()));
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:354:                       DAG.getDataLayout().getGlobalsAddressSpace()));
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:2021:                             DAG.getDataLayout().getGlobalsAddressSpace()));
llvm/lib/CodeGen/SelectionDAG/LegalizeDAG.cpp:2561:                       DAG.getDataLayout().getGlobalsAddressSpace()));
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:2444:  EVT PTy = TLI.getPointerTy(TD, TD.getGlobalsAddressSpace());
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:5662:      DAG.getTargetLoweringInfo().getPointerTy(DL, DL.getGlobalsAddressSpace()));
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:7923:                         DAG.getDataLayout().getGlobalsAddressSpace()));
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:8415:                       DAG.getDataLayout().getGlobalsAddressSpace()) &&
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:469:        DAG.getDataLayout(), DAG.getDataLayout().getGlobalsAddressSpace()));
llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp:7210:                               DAG.getDataLayout().getGlobalsAddressSpace());
llvm/lib/IR/Globals.cpp:367:                       ? M.getDataLayout().getGlobalsAddressSpace()
llvm/lib/Target/RISCV/RISCVISelLowering.cpp:705:      *DAG.getContext(), DAG.getDataLayout().getGlobalsAddressSpace());
llvm/lib/Target/RISCV/RISCVISelLowering.cpp:977:                             DAG.getDataLayout().getGlobalsAddressSpace());
llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp:1398:  GlobalAlias *GA = GlobalAlias::create(Int8Ty, M.getDataLayout().getGlobalsAddressSpace(), GlobalValue::ExternalLinkage,
llvm/lib/Transforms/IPO/WholeProgramDevirt.cpp:1682:      B.GV->getInitializer()->getType(), M.getDataLayout().getGlobalsAddressSpace(), B.GV->getLinkage(), "",
llvm/lib/Transforms/Instrumentation/SanitizerCoverage.cpp:377:  Type *PcAddrPtrTy = PcAddrTy->getPointerTo(DL->getGlobalsAddressSpace());
llvm/lib/Transforms/Instrumentation/SanitizerCoverage.cpp:381:      PointerType::get(IRB.getInt64Ty(), DL->getGlobalsAddressSpace());
llvm/lib/Transforms/Instrumentation/SanitizerCoverage.cpp:383:      PointerType::get(IRB.getInt32Ty(), DL->getGlobalsAddressSpace());
llvm/lib/Transforms/Instrumentation/SanitizerCoverage.cpp:385:      PointerType::get(IRB.getInt8Ty(), DL->getGlobalsAddressSpace());
llvm/lib/Transforms/Utils/ModuleUtils.cpp:34:      IRB.getInt8PtrTy(M.getDataLayout().getGlobalsAddressSpace());
  • Use Optional instead of UINT_MAX
bjope added a subscriber: ebevhan.Apr 21 2020, 8:15 AM

Reposting a comment from Matt from earlier

Needs some bitcode compatibility tests, and actual uses. AMDGPU's data layout should also add -G1

I think that once these two things are done, then there is no reason to not merge this patch. It is well written, tested, and, if switched on for the AMDGPU target, useful to an in-tree backend.

llvm/lib/IR/Globals.cpp
369 ↗(On Diff #258695)

Can we add a test that global variables do indeed get initialized with the address space specified in the DataLayout?

For example, compile under X86 but with G7 in the data layout, then assert addrspace(7) in the IR output.

I think every other code path is covered.

dylanmckay requested changes to this revision.May 31 2020, 4:33 AM
This revision now requires changes to proceed.May 31 2020, 4:33 AM

Needs some bitcode compatibility tests, and actual uses. AMDGPU's data layout should also add -G1

I've split the backwards compat tests and AMDGPU changes out to D84345. Is the DataLayout upgrade what you meant with bitcode compatibility tests or should I be adding additional tests?

dylanmckay added a comment.EditedAug 26 2020, 9:26 AM

I've split the backwards compat tests and AMDGPU changes out to D84345. Is the DataLayout upgrade what you meant with bitcode compatibility tests or should I be adding additional tests?

Good point - comparing this to D84345, there seems to be no obvious reason to add bitcode compatibility tests in this patch because the bitcode format itself is not changed (the data layout is just an arbitrary-length string embedded within the bitcode file). This particular patch does not need bitcode compatibility tests.

As far as I'm concerned, the code in this patch is good to go and all of the comments have been resolved, although I will defer actually approving the patch to be merged until the comments on D84345 are sorted, which I think is important so that the new logic is guaranteed to have at least one in-tree use. Once D84345 has been approved, I will formally hit approve on this patch and they can both be merged at the same time in a set.

llvm/lib/IR/Globals.cpp
369 ↗(On Diff #258695)

Can we add a test that global variables do indeed get initialized with the address space specified in the DataLayout?

For example, compile under X86 but with G7 in the data layout, then assert addrspace(7) in the IR output.

I think every other code path is covered.

dylanmckay added inline comments.Aug 26 2020, 9:28 AM
llvm/lib/IR/Globals.cpp
369 ↗(On Diff #258695)

I can't mark these two existing inline comments as done but note that they have been resolved.

arichardson edited the summary of this revision. (Show Details)Aug 27 2020, 2:30 AM
arichardson marked 5 inline comments as done.Aug 27 2020, 2:30 AM
arichardson edited the summary of this revision. (Show Details)

Fix clang-tidy warnings

@dylanmckay D84345 is now accepted, is this good to land now?

dylanmckay accepted this revision.Nov 17 2020, 2:23 AM

@arichardson yup, this is good to go. Thanks for the persistence on this one.

This revision is now accepted and ready to land.Nov 17 2020, 2:23 AM
This revision was landed with ongoing or failed builds.Nov 20 2020, 7:47 AM
This revision was automatically updated to reflect the committed changes.

Why is this useful rather than having targets set the address space used in the backend explicitly? i.e. how is what amdgpu was doing up to this point not ok? It's not really clear here.

Why is this useful rather than having targets set the address space used in the backend explicitly? i.e. how is what amdgpu was doing up to this point not ok? It's not really clear here.

That's fine for globals created in target-specific passes, but other passes that create new global variables need to know which address space to place them in. In our CHERI fork (using a default address space of 200), we got lots of assertion failures/invalid IR due to passes creating globals in AS0 (or assuming that all global pointers are AS0) until I added a default globals address space to the datalayout so those passes can choose the address space that our target needs. Does that make sense?