Page MenuHomePhabricator

[RFC][PATCH] Minor opt to access pointers to globals via pcrel GOT entries
ClosedPublic

Authored by bruno on Jan 12 2015, 3:51 AM.

Details

Summary

Global variables marked with "unnamed_addr" have the following property: "...unnamed_addr indicates that the address is not significant, only the content".

One potential use of "unnamed_addr" by front-ends is to emit "unnamed_addr constant" globals holding pointer initialisers to other global variables. For instance, this can be used if the front-end wishes to compute and store information against a location with a symbol pointer but doesn't care about its address. A more concrete example follows:

@foo = global i32 42
@proxy = unnamed_addr constant i32* @foo

@delta = global i32 trunc (i64 sub (i64 ptrtoint (i32** @proxy to i64),

         i64 ptrtoint (i32* @delta to i64))
to i32)

Here, @delta holds a data "PC"-relative offset to a pointer of @foo. The darwin/x86-64 assembly output for this follows:

.globl _foo
_foo:
.long 42

.globl _proxy
_proxy:
.quad _foo

.globl _delta
_delta:
.long _proxy-_delta

I propose a minor optimisation for such cases: given target specific support, replace pc-relative accesses to such unnamed_addr global constants by a PC relative access to the GOT entry of the final symbol instead. Therefore, "delta" would contain a pc relative relocation to foo's GOT entry and we avoid the emission of "proxy", yielding the assembly code below:

.globl _foo
_foo:
.long 42

.globl _delta
_delta:
.long _foo@GOTPCREL+4

There are a couple of advantages of doing this:

  • In situations where all that’s required is that there be some way to get the address of an external symbol we save space by not emitting "proxy" globals. Front-ends that need to emit a great deal of data containing pointers could benefit from this.
  • As a side effect, such IR constructs combined with the proposed opt opens a way to represent GOT pcrel relocations by solely using the LLVM IR, which is something we currently have no way to express.

Also, this is not MachO specific and can also benefit ELF targets. I've attached a patch implementing this simple feature for x86-64/darwin but mostly changes are target independent.

Thanks,

-Bruno

Diff Detail

Event Timeline

bruno updated this revision to Diff 18011.Jan 12 2015, 3:51 AM
bruno retitled this revision from to [RFC][PATCH] Minor opt to access pointers to globals via pcrel GOT entries.
bruno updated this object.
bruno edited the test plan for this revision. (Show Details)
bruno added reviewers: rafael, grosbach.
bruno set the repository for this revision to rL LLVM.
bruno added subscribers: Unknown Object (MLST), rjmccall, bob.wilson.
rafael edited edge metadata.Jan 23 2015, 7:38 AM

Looking at this now. Sorry for the delay.

This seems to be implemented a bit lower than I was expecting.

The idea is that is is an codegen optimization, right? When given a .s, we should just produce a .o that matches it.

If that is the case, why is this ever looking at symbols and MCExpr? It should be happening at a higher lever (GlobalValues and ConstantExpr).

include/llvm/CodeGen/AsmPrinter.h
105

Proxy seems like a new term.

I would probably go with *GOT*. GOTEquivalent maybe?

259

Start functions with a lowercase.

include/llvm/Target/TargetLoweringObjectFile.h
160

Instead of having a method that always return a fixed bool, make this an actual boolean and set it in the constructor.

lib/CodeGen/AsmPrinter/AsmPrinter.cpp
341

Why do you need to call support* in here? The could will be 0 if it is not supported, no?

907

Why do you need to special case not having any users?

915

You don't need a cast from ConstatExpr to Constant.

919

C can be null in here, no?

926

Don't repeat name in comments.

933

s/Get/compute/ maybe? Get sounds like the function returns them.

943

Why not fold these checks into isGlobalProxyCandidate?

test/MC/X86/cstexpr-gotpcrel.ll
2

Please don't use "llc -filetype=obj". To test the object emission, use llvm-mc.

bruno updated this revision to Diff 18697.Jan 23 2015, 4:12 PM
bruno edited edge metadata.
bruno removed rL LLVM as the repository for this revision.

Hi Rafael, updated the patch after your comments.

As talked on IRC, by dealing directly with MCExpr we gain all folding benefits that lowerConstant computes while getting rid of several layers and indirections of cstexprs contaning lots of getelementptrs, pointers and casts. Also, this patch generates equivalent .s or .o outputs, there's no need to implement this coming from a .s (one can always change the assembly source to use GOTPCREL instead).

Thanks,

rafael added inline comments.Jan 27 2015, 6:42 AM
include/llvm/CodeGen/AsmPrinter.h
259

the very minor bit is not particularity relevant.

include/llvm/Target/TargetLoweringObjectFile.h
160

not virtual.

lib/CodeGen/AsmPrinter/AsmPrinter.cpp
908

getNumOperands is redundant with hasInitializer.

This is missing a call to isConstant.

911

Why does FinalGV has to be a GlobalVariable? We can have a got-equivalent of a function.

915

This cast cannot fail. GlobalValue *is* a Constant.

2070

You should just use EvaluateAsRelocatable. It will put the expression in the form Sym_A - Sym_B + C for you.

2121

Why do you need the count?

You only really need to remember that there was at least one user that could not be replaced with the got and therefore the GOT-like GlobalVariable still has to be printed.

bruno added inline comments.Jan 28 2015, 9:25 AM
lib/CodeGen/AsmPrinter/AsmPrinter.cpp
2070

Cool, this makes it a *lot* easier, thanks!

2121

If I do like you suggest, we'd have to still check all MCExpr and its operands for any presence of a global equivalent symbol, even when we know really early that the MCExpr isn't a fit. The case where EvaluateAsRelocatable returns true should be easier to check, but otherwise we'll still need to parse MCExpr once again searching for a GV reference. I believe the current solution is a better approach.

bruno updated this revision to Diff 18900.Jan 28 2015, 9:27 AM

Update patch after Rafael's comments!

rafael added inline comments.Feb 3 2015, 2:48 PM
include/llvm/Target/TargetLoweringObjectFile.h
166

Indentation looks funny. Please clang-format before commit.

lib/CodeGen/AsmPrinter/AsmPrinter.cpp
340

Say that emitGlobalGOTEquivs will emit them if needed.

919

All global Variables have pointer type.

920

s/dyn_cast/isa/

933

not sure there is a point is saying it is "(very) minor" :-)

938

High level question: Do we need to do the early pass over all variables?

Couldn't emitGlobalConstantImpl do the check and add "got equivalent candidates"?

2096

I am not sure I understand why the MV.getConstant() != BaseSymOff restriction is necessary.

If we have

.globl _proxy
_proxy:
.quad _foo

.globl _delta
_delta:

.long _proxy-_delta + 42

It can be converted to

.globl _delta
_delta:

.long _foo@GOTPCREL + 42

no? It works on ELF at least.

test/MC/X86/cstexpr-gotpcrel.ll
2

It would be nice to add code coverage on the cases where the optimization fails (use in instructions for example).

Hi bruno,

This is the complete example for what I mentioned about different
constants in phab.

ah, and the patch needs a rebase on current trunk.

trying to send the previous comments.

bruno added inline comments.Feb 5 2015, 11:03 AM
lib/CodeGen/AsmPrinter/AsmPrinter.cpp
340

OK

919

OK

920

OK

933

Ops, forgot to fix this one :-)

938

The problem of not doing an early pass is because we can't rely on the global declaration order - if we hit the 'got equivalent user' before keeping track of the 'global equivalent', we have to find the global variable for 'global equivalent' after its MCSymbol, because at that point we already lost the GlobalVariable reference. One could argue that we can find this GlobalVariable again by scanning the original cstexpr where we got the MCExpr from, but this also would require complex logic to handle all different types of cstexpr that could lead to a relocatable expression. Gonna add a comment about that.

2096

Note that the reason why BaseSymOff exists is to ensure that "." is properly matched, i.e., we need to know that the symbol we are subtracting from is indeed the "PC" location. Take your example:

delta:
  .long (.Lproxy-delta)+42
  .size delta, 4

Here, "." == "delta" and BaseSymOff == 0, since we're emitting the first and only element of delta. Now, if we're emitting the GOTPCREL into, let's say, the second element of an array where each element has 4 bytes, we need "." == "delta+4" and BaseSymOff == 4, since we want to be sure we're dealing with a pc relative location:

delta:
  .long 33
  .long (.Lproxy-(delta+4))+42 ==> .long (.Lproxy-delta)+38
  .size delta, 8

That said, the restriction isn't necessary as long as we have BaseSymOff contained into the final offset value. I didn't think of that use before although it's valid and looks like it could be (?) used by front-end writers to do all sort of black magic.

It's a small change, gonna include that in the patch and write a test for it using a cstexpr that also adds one more constant.

test/MC/X86/cstexpr-gotpcrel.ll
2

Ok

The problem of not doing an early pass is because we can't rely on the global declaration order - if we hit the 'got equivalent user' before keeping track of the 'global equivalent', we have to find the global variable for 'global equivalent' after its MCSymbol, because at that point we already lost the GlobalVariable reference. One could argue that we can find this GlobalVariable again by scanning the original cstexpr where we got the MCExpr from, but this also would require complex logic to handle all different types of cstexpr that could lead to a relocatable expression. Gonna add a comment about that.

I see. Given just the symbol it is not trivial to find about
unnamed_addr for example.

================
Comment at: lib/CodeGen/AsmPrinter/AsmPrinter.cpp:2081
@@ +2080,3 @@
+ int64_t BaseSymOff = -Offset;
+ if (BaseSym != &MV.getSymB()->getSymbol() || MV.getConstant() != BaseSymOff)
+ return;


rafael wrote:

I am not sure I understand why the MV.getConstant() != BaseSymOff restriction is necessary.

If we have


.globl _proxy
_proxy:

.quad   _foo

.globl  _delta

_delta:

.long   _proxy-_delta + 42

It can be converted to


.globl _delta
_delta:

.long   _foo@GOTPCREL + 42

no? It works on ELF at least.

Note that the reason why BaseSymOff exists is to ensure that "." is properly matched, i.e., we need to know that the symbol we are subtracting from is indeed the "PC" location. Take your example:

delta:
  .long (.Lproxy-delta)+42
  .size delta, 4

Here, "." == "delta" and BaseSymOff == 0, since we're emitting the first and only element of delta. Now, if we're emitting the GOTPCREL into, let's say, the second element of an array where each element has 4 bytes, we need "." == "delta+4" and BaseSymOff == 4, since we want to be sure we're dealing with a pc relative location:

delta:
  .long 33
  .long (.Lproxy-(delta+4))+42 ==> .long (.Lproxy-delta)+38
  .size delta, 8

That said, the restriction isn't necessary as long as we have BaseSymOff contained into the final offset value. I didn't think of that use before although it's valid and looks like it could be (?) used by front-end writers to do all sort of black magic.

It's a small change, gonna include that in the patch and write a test for it using a cstexpr that also adds one more constant.

Awesome. It is a case where doing the more general solution is
probably simpler (or at least easier to understand).

Thanks,
Rafael

bruno updated this revision to Diff 19775.Feb 11 2015, 12:47 PM

Updated the patch to reflect the previous discussion and other comments from Rafael. Somes notes:

  • A 'Base Constant' was added together with the 'Offset' in emitGlobalConstantImpl in order to keep track of the base symbol for the current emission, example: the Global Variable user on the top of multiple nested constant expressions is the 'Base Constant' for any nested sub-element that is being currently emitted. In the previous patches I used to walk users to find the 'Base Constant', this is wrong since it doesn't consider fact that constant expressions are uniqued, meaning that looking into cstexpr users may lead to different GlobalVariables, yielding the wrong result.
  • Add a more robust test.
  • Handle general constants to be used with the GOTPCREL relocation while opening the possibility for more complex IR cstexpr to be folded into a GOTPCREL+cst. See the test for concrete examples.
rafael accepted this revision.Feb 20 2015, 12:55 PM
rafael edited edge metadata.

Just test nits.

test/MC/X86/cstexpr-gotpcrel.ll
10

Please move these comments close to each test.

18

How about redirecting llc to %t and doing tow FileCheck runs? That way you can check in one place that they are never output.

This revision is now accepted and ready to land.Feb 20 2015, 12:55 PM
bruno closed this revision.Feb 23 2015, 1:32 PM

Updated with your last comments and committed in r230264.

Thanks Rafael,