This is an archive of the discontinued LLVM Phabricator instance.

Introduce llvm.load.relative intrinsic.
ClosedPublic

Authored by pcc on Mar 22 2016, 11:56 AM.

Details

Summary

This intrinsic takes two arguments, `%ptr and %offset`. It loads
a 32-bit value from the address `%ptr + %offset, adds %ptr` to that
value and returns it. The constant folder specifically recognizes the form of
this intrinsic and the constant initializers it may load from; if a loaded
constant initializer is known to have the form `i32 trunc(x - %ptr)`,
the intrinsic call is folded to `x`.

LLVM provides that the calculation of such a constant initializer will
not overflow at link time under the medium code model if `x` is an
`unnamed_addr` function, however it does not provide the same for a constant
initializer folded into a function body; this intrinsic can be used to avoid
the possibility of overflows when loading from such a constant.

Depends on http://reviews.llvm.org/D17938

Diff Detail

Repository
rL LLVM

Event Timeline

pcc updated this revision to Diff 51315.Mar 22 2016, 11:56 AM
pcc retitled this revision from to Introduce llvm.load.relative intrinsic..
pcc updated this object.
pcc added reviewers: rafael, rjmccall.
pcc added a subscriber: llvm-commits.
rjmccall edited edge metadata.Mar 22 2016, 1:02 PM

I would appreciate it if this intrinsic supported both i32 and i64 relative offsets. I understand that overflowing the relative adjustment is generally not a problem on 64-bit platforms, but it would be nice if frontends using a "far" relative address didn't need to use different code generation patterns.

pcc added a comment.Mar 22 2016, 3:48 PM

I would appreciate it if this intrinsic supported both i32 and i64 relative offsets.

That seems like a reasonable extension. We can certainly add that later if it is needed by a frontend.

Actually, I don't understand why the overloading is done the way it is. Are you planning to store v-table entries relative to the v-table's address point rather than the address of the slot? Even if so, can't that sort of further adjustment be trivially done by the caller instead of baking it into the intrinsic?

I think it would be simpler to have a llvm.load.relative.iN that assumes an iN relative offset and doesn't take an explicit second operand.

pcc added a comment.Mar 22 2016, 5:25 PM

Are you planning to store v-table entries relative to the v-table's address point rather than the address of the slot?

Yes, exactly. Sorry if that wasn't clear before. On most architectures the vtable load can be expressed as a reg+imm load; we can achieve a shorter call sequence by using that register as an addend rather than having an additional immediate.

Even if so, can't that sort of further adjustment be trivially done by the caller instead of baking it into the intrinsic?

We could certainly do it that way, but the problem with that is that I think it would make other parts of the compiler more complicated. For example, the whole-program virtual call optimizer would need to do more IR walking in order to discover the slot offset (which would now need to be present in two places in the IR and verified to be equal) and the call site. That was one of the things I wanted to address in the current design of the vcall optimizer (see also http://lists.llvm.org/pipermail/llvm-dev/2016-February/096146.html).

In D18367#381050, @pcc wrote:

Are you planning to store v-table entries relative to the v-table's address point rather than the address of the slot?

Yes, exactly. Sorry if that wasn't clear before. On most architectures the vtable load can be expressed as a reg+imm load; we can achieve a shorter call sequence by using that register as an addend rather than having an additional immediate.

On ARM and AArch64, adjusting the v-table address to the slot address can be done with a pre-indexed load, so it doesn't really matter there. I agree that it probably helps x86, though. Somewhat amusing to think about x86 requiring more instructions to do something, though. :)

Even if so, can't that sort of further adjustment be trivially done by the caller instead of baking it into the intrinsic?

We could certainly do it that way, but the problem with that is that I think it would make other parts of the compiler more complicated. For example, the whole-program virtual call optimizer would need to do more IR walking in order to discover the slot offset (which would now need to be present in two places in the IR and verified to be equal) and the call site.

Hmm. Fair enough, I think I see your point.

pcc updated this revision to Diff 51387.Mar 22 2016, 9:36 PM
pcc edited edge metadata.
  • Set the inline cost for the intrinsic to the cost of the lowered IR instructions

Adding some more reviewers.

rnk added inline comments.Apr 19 2016, 10:47 AM
docs/LangRef.rst
12194–12198 ↗(On Diff #51387)

Can the wording be improved here? This is a long sentence.

lib/Analysis/InstructionSimplify.cpp
3806 ↗(On Diff #51387)

Now that the verifier uses an llvm_anyint_ty, you should probably check the bitwidth here to avoid crashing on crazy inputs.

lib/CodeGen/PreISelIntrinsicLowering.cpp
1 ↗(On Diff #51387)

Should this lowering be done as part of CodeGenPrepare instead? We currently lower llvm.objectsize there.

pcc updated this revision to Diff 54236.Apr 19 2016, 11:57 AM
pcc marked an inline comment as done.
  • Improve wording in documentation
  • Avoid crashing on bit widths >64
docs/LangRef.rst
12194–12198 ↗(On Diff #51387)

Okay, what about this?

lib/CodeGen/PreISelIntrinsicLowering.cpp
1 ↗(On Diff #51387)

I looked for an existing place to add this lowering, but I couldn't find anywhere suitable. CodeGenPrepare is apparently an optional optimization pass [1], but the backend is required to handle this intrinsic (objectsize can be implemented trivially by returning 0 or -1, which is what the backend currently does [2,3], but there's no trivial implementation of this intrinsic).

[1] http://llvm-cs.pcc.me.uk/lib/CodeGen/Passes.cpp#470
[2] http://llvm-cs.pcc.me.uk/lib/CodeGen/SelectionDAG/FastISel.cpp#1218
[3] http://llvm-cs.pcc.me.uk/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp#5348

rnk accepted this revision.Apr 21 2016, 4:18 PM
rnk edited edge metadata.

lgtm

This revision is now accepted and ready to land.Apr 21 2016, 4:18 PM
This revision was automatically updated to reflect the committed changes.