This is an archive of the discontinued LLVM Phabricator instance.

[IR] Introduce load assume operand bundle
Needs RevisionPublic

Authored by aeubanks on Oct 18 2021, 10:06 AM.

Details

Summary

This is a new assume operand bundle which allows us to assume that a
load from a pointer at a specific location in the IR will always be some
value.

The name of the bundle is "load". The first operand is the pointer and
the second operand is the value that a load at the assume location would
produce.

This is useful for exposing the vtable value of a C++ object after its
constructor without having to insert a load into the instruction stream.

Diff Detail

Event Timeline

aeubanks created this revision.Oct 18 2021, 10:06 AM
aeubanks published this revision for review.Oct 18 2021, 10:09 AM
aeubanks added reviewers: Prazek, jdoerfert, fhahn, Tyker, rnk.

happy to change the name if there's a better alternative

Herald added a project: Restricted Project. · View Herald TranscriptOct 18 2021, 10:09 AM

This is useful for exposing the vtable value of a C++ object after its constructor without having to insert a load into the instruction stream.

Why is an assume better than a load?

This is useful for exposing the vtable value of a C++ object after its constructor without having to insert a load into the instruction stream.

Why is an assume better than a load?

Adding an extra load will affect cost modelling, e.g. inliner thresholds, and other various ad-hoc cost heuristics. assumes are free.

nikic added a subscriber: nikic.Oct 18 2021, 11:39 AM

As long as the load is speculatable, it should count as an ephemeral value and be considered as free by the inliner at least.

This is useful for exposing the vtable value of a C++ object after its constructor without having to insert a load into the instruction stream.

Why is an assume better than a load?

Adding an extra load will affect cost modelling, e.g. inliner thresholds, and other various ad-hoc cost heuristics. assumes are free.

I really want to get to the outlined assumes function model but I guess this is an OK first step.

  • outlined model ---
call @llvm.assume(i1 true) ["assume_fn"(void (i32*, i32) @__assume_equal, i32* %P, i32 %V]

define void @__assume_equal(i32* %P, i32 %V) {
  %L = load i32* %P
  %cmp = icmp eq i32 %L, %V
  call @llvm.assume(i1 %cmp)
}

because it is way more flexible. We can use it to encode
__builtin_assume(foobar() == barfoo());
without risking side-effects to leak or cause regressions,
as another example.

This is useful for exposing the vtable value of a C++ object after its constructor without having to insert a load into the instruction stream.

Why is an assume better than a load?

Adding an extra load will affect cost modelling, e.g. inliner thresholds, and other various ad-hoc cost heuristics. assumes are free.

I really want to get to the outlined assumes function model but I guess this is an OK first step.

  • outlined model ---
call @llvm.assume(i1 true) ["assume_fn"(void (i32*, i32) @__assume_equal, i32* %P, i32 %V]

define void @__assume_equal(i32* %P, i32 %V) {
  %L = load i32* %P
  %cmp = icmp eq i32 %L, %V
  call @llvm.assume(i1 %cmp)
}

because it is way more flexible. We can use it to encode
__builtin_assume(foobar() == barfoo());
without risking side-effects to leak or cause regressions,
as another example.

Makes sense, and I'm happy to modify this to use that if we ever get there.

As long as the load is speculatable, it should count as an ephemeral value and be considered as free by the inliner at least.

I'm not seeing that:

$ cat /tmp/a.ll
declare i8 @f(i8* %p)
declare void @llvm.assume(i1)

declare void @constructor(i8*)
declare void @foo(i8*)

define i8 @g1(i8* %p) {
        call void @constructor(i8* %p)
        %i = load i8, i8* %p
        %c = icmp eq i8 %i, 2
        call void @llvm.assume(i1 %c)
        %r = call i8 @f(i8* %p)
        ret i8 %r
}

define i8 @wrapper1(i8* %p) {
        %r = call i8 @g1(i8* %p)
        ret i8 %r
}

define i8 @g2(i8* %p) {
        call void @constructor(i8* %p)
        call void @llvm.assume(i1 true) ["load"(i8* %p, i8 2)]
        %r = call i8 @f(i8* %p)
        ret i8 %r
}

define i8 @wrapper2(i8* %p) {
        %r = call i8 @g2(i8* %p)
        ret i8 %r
}

$ ./build/rel/bin/opt -passes='print<inline-cost>' -disable-output /tmp/a.ll
      Analyzing call of g1... (caller:wrapper1)
define i8 @g1(i8* %p) {
; cost before = -35, cost after = 0, threshold before = 674, threshold after = 674, cost delta = 35
  call void @constructor(i8* %p)
; cost before = 0, cost after = 5, threshold before = 674, threshold after = 674, cost delta = 5
  %i = load i8, i8* %p, align 1
; No analysis for the instruction
  %c = icmp eq i8 %i, 2
; No analysis for the instruction
  call void @llvm.assume(i1 %c)
; cost before = 5, cost after = 40, threshold before = 674, threshold after = 674, cost delta = 35
  %r = call i8 @f(i8* %p)
; cost before = 40, cost after = 40, threshold before = 674, threshold after = 674, cost delta = 0
  ret i8 %r
}
      NumConstantArgs: 0
      NumConstantOffsetPtrArgs: 1
      NumAllocaArgs: 0
      NumConstantPtrCmps: 0
      NumConstantPtrDiffs: 0
      NumInstructionsSimplified: 1
      NumInstructions: 4
      SROACostSavings: 0
      SROACostSavingsLost: 0
      LoadEliminationCost: 0
      ContainsNoDuplicateCall: 0
      Cost: 40
      Threshold: 337
      Analyzing call of g2... (caller:wrapper2)
define i8 @g2(i8* %p) {
; cost before = -35, cost after = 0, threshold before = 674, threshold after = 674, cost delta = 35
  call void @constructor(i8* %p)
; No analysis for the instruction
  call void @llvm.assume(i1 true) [ "load"(i8* %p, i8 2) ]
; cost before = 0, cost after = 35, threshold before = 674, threshold after = 674, cost delta = 35
  %r = call i8 @f(i8* %p)
; cost before = 35, cost after = 35, threshold before = 674, threshold after = 674, cost delta = 0
  ret i8 %r
}
      NumConstantArgs: 0
      NumConstantOffsetPtrArgs: 1
      NumAllocaArgs: 0
      NumConstantPtrCmps: 0
      NumConstantPtrDiffs: 0
      NumInstructionsSimplified: 1
      NumInstructions: 3
      SROACostSavings: 0
      SROACostSavingsLost: 0
      LoadEliminationCost: 0
      ContainsNoDuplicateCall: 0
      Cost: 35
      Threshold: 337
rnk added a comment.Oct 18 2021, 1:51 PM

happy to change the name if there's a better alternative

To bikeshed a bit, how about "load_eq"? Meaning, "a load of op1 is equal to op2".

As long as the load is speculatable, it should count as an ephemeral value and be considered as free by the inliner at least.

I'm not seeing that: [...]

Thus the "as long as the load is speculatable" caveat. It works if you add a dereferenceable(1) attribute. Though now that I think about this, I have no idea why speculatability is even a requirement for ephemeral values -- shouldn't side-effect freedom be sufficient? In that case your example would work without the dereferenceable(1).

As long as the load is speculatable, it should count as an ephemeral value and be considered as free by the inliner at least.

I'm not seeing that: [...]

Thus the "as long as the load is speculatable" caveat. It works if you add a dereferenceable(1) attribute. Though now that I think about this, I have no idea why speculatability is even a requirement for ephemeral values -- shouldn't side-effect freedom be sufficient? In that case your example would work without the dereferenceable(1).

I think side-effect free is sufficient, though, haven't thought long about it. (When you say deref(1) you mean deref(sizeof(access)), right?)

fhahn added a comment.Oct 19 2021, 1:56 PM

As long as the load is speculatable, it should count as an ephemeral value and be considered as free by the inliner at least.

I'm not seeing that: [...]

Thus the "as long as the load is speculatable" caveat. It works if you add a dereferenceable(1) attribute. Though now that I think about this, I have no idea why speculatability is even a requirement for ephemeral values -- shouldn't side-effect freedom be sufficient? In that case your example would work without the dereferenceable(1).

I think side-effect free is sufficient, though, haven't thought long about it. (When you say deref(1) you mean deref(sizeof(access)), right?)

I agree that side-effect-free should be sufficient to ignore them here as well. looks like there are a couple of places that define their own checks for ephemeral values, so it might be a good start to consolidate them

nikic added a comment.Oct 19 2021, 2:12 PM

Thus the "as long as the load is speculatable" caveat. It works if you add a dereferenceable(1) attribute. Though now that I think about this, I have no idea why speculatability is even a requirement for ephemeral values -- shouldn't side-effect freedom be sufficient? In that case your example would work without the dereferenceable(1).

I think side-effect free is sufficient, though, haven't thought long about it. (When you say deref(1) you mean deref(sizeof(access)), right?)

Right, this was referring to @aeubanks' particular example, which happened to use access size 1.

I gave this a try (https://gist.github.com/nikic/531267d972ce71edf3896e25bc50456a) and it seems to work fine (i.e. no test failures). The precise condition would be wouldInstructionBeTriviallyDead(), which we can approximate by !mayHaveSideEffect() && !isTerminator().

Thus the "as long as the load is speculatable" caveat. It works if you add a dereferenceable(1) attribute. Though now that I think about this, I have no idea why speculatability is even a requirement for ephemeral values -- shouldn't side-effect freedom be sufficient? In that case your example would work without the dereferenceable(1).

I think side-effect free is sufficient, though, haven't thought long about it. (When you say deref(1) you mean deref(sizeof(access)), right?)

Right, this was referring to @aeubanks' particular example, which happened to use access size 1.

I gave this a try (https://gist.github.com/nikic/531267d972ce71edf3896e25bc50456a) and it seems to work fine (i.e. no test failures). The precise condition would be wouldInstructionBeTriviallyDead(), which we can approximate by !mayHaveSideEffect() && !isTerminator().

It seems we want to do that for sure, which means we can (probably should) hold of with this patch for now, right?

Thus the "as long as the load is speculatable" caveat. It works if you add a dereferenceable(1) attribute. Though now that I think about this, I have no idea why speculatability is even a requirement for ephemeral values -- shouldn't side-effect freedom be sufficient? In that case your example would work without the dereferenceable(1).

I think side-effect free is sufficient, though, haven't thought long about it. (When you say deref(1) you mean deref(sizeof(access)), right?)

Right, this was referring to @aeubanks' particular example, which happened to use access size 1.

I gave this a try (https://gist.github.com/nikic/531267d972ce71edf3896e25bc50456a) and it seems to work fine (i.e. no test failures). The precise condition would be wouldInstructionBeTriviallyDead(), which we can approximate by !mayHaveSideEffect() && !isTerminator().

It seems we want to do that for sure, which means we can (probably should) hold of with this patch for now, right?

there are still many places that don't use ephemerality to count instructions, e.g. the ThinLTO instruction import limit, so I think this still has value
unless you think it'd make more sense to update all the other places, like the ThinLTO instruction import limit, to also use this sort of cost modelling

rnk added a comment.Oct 21 2021, 10:08 AM

Aside from the cost modeling, I think there is value in having a more compact representation for this concept. It is nice to be able to turn four instructions (cast, load, icmp, assume) into one (assume). OK, maybe opaque pointers will remove the cast, but the point stands. This feature also has applications besides C++ devirtualization, so I think it's worth the added maintenance cost.

happy to change the name if there's a better alternative

To bikeshed a bit, how about "load_eq"? Meaning, "a load of op1 is equal to op2".

Or "loads" (like "loads" value from a pointer), but I think "load_eq" also is a bit more descriptive.
I actually thought adding this feature will require more engineering, well done.

aeubanks updated this revision to Diff 382040.Oct 25 2021, 10:23 AM

"load" -> "load_eq"

Works for me, others should chime in.

nikic requested changes to this revision.Oct 25 2021, 11:53 AM

This will require changes to AA and access modelling. Normally assumes are inaccessiblememonly, but in this case the assume also reads the location of the pointer. In particular, you can not hoist or sink a store of the location across the assume, as it may invalidate the assumption. Thankfully we do have the tools to model this, because operand bundles can change AA behavior. You'd have to adjust the handling of assume in hasReadingOperandBundles() to count as an accessible memory read for load_eq operand bundles. BasicAA could then restrict it to just the passed location. However, there are also other places that explicitly skip over assume intrinsics on the assumption that they do not access accessible memory. An obvious example is MemorySSA, which does not create memory accesses for assumes (but would need to create them for load_eq), but I remember similar code existing in a few other places as well.

Generally I'd like to see the followup patches that are needed to make this actually work. A simple load+icmp assume will "just work", but this new operand bundle will need explicit support (beyond the AA modelling, hopefully only in GVN and nowhere else?)

This revision now requires changes to proceed.Oct 25 2021, 11:53 AM