Page MenuHomePhabricator

[MemorySSA] Be less aggressive with @llvm.lifetime.start
Needs ReviewPublic

Authored by george.burgess.iv on Feb 13 2018, 4:32 PM.



Primarily uploading to get your opinion on whether this is a MSSA bug, or insanity that shouldn't exist in IR.

After inlining, EarlyCSE may be given code like:

; snip
%.fca.0.gep.i = getelementptr inbounds [2 x i64], [2 x i64]* %retval.i, i64 0, i64 0
%.fca.1.gep.i = getelementptr inbounds [2 x i64], [2 x i64]* %retval.i, i64 0, i64 1
%.fca.1.load.i = load i64, i64* %.fca.1.gep.i, align 8
call void @llvm.lifetime.end.p0i8(i64 16, i8* %2)
%23 = bitcast %class.SkRecorder* %this to %class.SkCanvas*
call void @llvm.lifetime.start.p0i8(i64 16, i8* %2) #8
store i64 0, i64* %.fca.0.gep.i, align 8
store i64 %.fca.1.load.i, i64* %.fca.1.gep.i, align 8
call void @whatever(%class.SkCanvas* %23, %struct.SkIRect* dereferenceable(16) %tmpcast.i)
; snip

...where %2 and %tmpcast.i are bitcasted %retval.i pointers. This is a pretty roundabout way of saying "store 0 into %.fca.0.gep.i", but doesn't venture into illegal IR AFAICT.

The issue is that "%fca.1.load.i MustAlias %2" is false, so MSSA doesn't treat it as a clobber for the store into %.fca.1.gep.i. Hence, EarlyCSE considers the last store to be redundant, and eliminates it. Later passes then eliminate the now-dead %.fca.1.load.i, then the store that paired with %.fca.1.load.i dies, presumably due to being unused before lifetime.end, etc.

I imagine we can get into a similar situation if we had code like:

%foo = select %whatever, %ptr1, %ptr2 ; %ptr1 NoAlias %ptr2 
%f = load i8, i8* %foo
call void @llvm.lifetime.end.p0i8(i64 8, i8* %ptr1)
call void @llvm.lifetime.start.p0i8(i64 8, i8* %ptr1)
store i8, i8* %foo
call void @use(i8* %foo)

...We'd eliminate the store, an optimization after us turns %whatever into a constant truthy value, ...

Diff Detail

Event Timeline

pirama added a subscriber: srhines.Feb 13 2018, 4:36 PM

Oh god i've been away from LLVM too long.
So, i don't know what we *want* the semantic to be.
Can you print the memoryssa dumps for this for me?
Add hal, etc, get his thoughts

One thing to realize is that the langref doesn't just say that lifetime.start means "anything before this is undefined".
It actually says anything before this *is known never to be used*
Which, imho, is stronger.

"llvm.lifetime.start’ Intrinsic
This intrinsic indicates that before this point in the code, the value of the memory pointed to by ptr is dead. This means that it is known to never be used and has an undefined value."

Essentially, lifetime.start/end, even for the same pointer, create things that are very two distinctly different memory objects, regardless of if they are at the same place in memory, and the compiler is free to treat them that way, at least, as these intrinsics go.

george.burgess.iv added a subscriber: gberry.

+hal and philip, as requested. +gberry in case they have any earlycse thoughts

So, i don't know what we *want* the semantic to be.

Without much knowledge of the implications elsewhere, something like: "lifetime.start intrinsics represent a non-volatile, non-atomic clobber of the given range of memory for the purposes of analysis. Anything loaded from this range that has not been stored to since the lifetime.start is undef" doesn't sound insane to me.

(At least, I don't see why we need to give clobber mechanics to both lifetime.start and lifetime.end, given that AIUI they're meant to be balanced)

Can you print the memoryssa dumps for this for me?

define void @bar(i8* %foo) {
  %z = bitcast i8* %foo to i32*
  %p = getelementptr inbounds i32, i32* %z, i64 1
; MemoryUse(liveOnEntry)
  %a = load i32, i32* %p
; 1 = MemoryDef(liveOnEntry)
  call void @llvm.lifetime.end.p0i8(i64 8, i8* %foo)
; 2 = MemoryDef(1)
  call void @llvm.lifetime.start.p0i8(i64 8, i8* %foo)
; 3 = MemoryDef(2)
  store i32 0, i32* %z
; 4 = MemoryDef(3)
  store i32 %a, i32* %p
  ret void

It's all pretty straightforward IMO; the problematic interaction with EarlyCSE is that we don't report that 4 is clobbered by 2, so AIUI it sees 4 as a redundant store.

Essentially, lifetime.start/end, even for the same pointer, create things that are very two distinctly different memory objects, regardless of if they are at the same place in memory, and the compiler is free to treat them that way, at least, as these intrinsics go.

Agreed. So, at it's core, the DSE by EarlyCSE is effectively saying "I'm eliminating this store to object $X because it's a load of a value from object $Y that happens to reside at the same memory location."

(I realize I'm saying EarlyCSE a lot in a MSSA patch. The reason I've fixed it here is that I believe that, at the very least, MSSA users should have to opt into a 'be aggressive with lifetime intrinsics' footgun. Until there's complaining, ...)

nlopes added a subscriber: nlopes.Feb 18 2018, 8:05 AM

Given that the original motivation for these lifetime intrinsincs was for inlining, I don't see a reason to support multiple start/ends on the same pointer.
Or is there another new use case that I'm unaware of? Also, we don't have any transformation splitting (or even shrinking I think) these lifetimes.

So, we have to support them at the moment because, at the very least, the inliner will merge allocas with non-overlapping lifetimes (see mergeInlinedArrayAllocas in lib/Transforms/IPO/Inliner.cpp). I was surprised when I saw this. :)

Aside: it looks like the discussion on the email thread didn't get mirrored to this review ( + the 2 following comments).

tl;dr: we're wondering whether MSSA should care about these objects as they exist in the lifetime system, or if it should just care about memory locations. In the former case, it seems we'll either have to end up inventing clobbers that don't exist (which is a path full of sadness, apparently), or we'll have to tack on a whole different subsystem that should probably be split out. In the latter case, we move this burden to higher-level passes that know they need to care. I prefer the latter, and I think Danny does, too, but we'd like the feedback of hal and/or philip before committing to that.

To be clear, i think the lifetime system has enough issues that trying to cover it with MemorySSA as it currently exists wouldn't end up well.
Things that want to care about and try to optimize around lifetimes can do that.
Intermixing object lifetime and aliasing results in the same system will, IMHO, give you a bad time.

(though we could provide a separate def-use chain in memoryssa to try to make sense of the lifetimes if we wanted to, but it'd be hard to update given lifetimes are not that well specified)

As far as i'm aware, it's legal to do whatever you want and then drop the intrinsics
IE reuse of a load after lifetime.end is fine as long as you kill the lifetime.end

If not, IMHO, we have bigger problems.

@nlopes It's worse than you might think. We allow, and try to make sense of, stuff like this:

/   \
B   C
\   /

lifetime.start(16, %foo)
lifetime.end(16, %foo)
lifetime.start(16, %foo)
use %foo
lifetime.end(16, %foo)

It's your job to realize the lifetime.end in %E ends the lifetime of two completely different memory objects..
Or does it?
Who knows!

There's a significant set of hacks in stack coloring/etc to handle the degenerate cases.
Splitting can also occur by accident. There is no lifetime verifier or anything, so what has been allowed has degraded over time.

Hopefully some day someone will clean this up with a well defined lifetime system, using explicit tokens and a verifier.

Thanks @dberlin and @george.burgess.iv. I wasn't aware of this issue.

Sometime ago we were thinking along these lines:

  • change alloca to do lifetime.start by default -- just like now
  • add a bit to alloca to specify that lifetime didn't start yet. (the current semantics is weird; we need to see the future to decide whether alloca is valid or not)
  • make lifetime.start return the input pointer, but now valid.

%a = alloca i32, invalid
%p = call @lifetime.start %a
load %p ; OK
load %a ; not OK; invalid pointer

call @lifetime.end %p
load %p ; invalid

I think in this model it's possible to support multiple starts because they return a pointer that is valid until end. Didn't think much about that because I wasn't aware of the need.
While this scheme makes start more memory SSA-like, it still leaves lifetime.end having implicit side-effects. But it's a start, at least.

Gentle ping for confirmation from Hal or Philip that we're OK with ignoring lifetime intrinsics (as they exist today) in MSSA. :)

What happened to this patch? It looks like it's a fix for a miscompile?

Currently, MSSA treats lifetime intrinsics as clobbers, even though they're technically not. This patch would further solidify that behavior, which might not be a good thing (e.g. there's a good argument that we shouldn't be treating lifetime intrinsics as clobbers in the first place). I'm happy to rip out what few lifetime-related pieces MSSA has, but before doing so:

  • I'd like an ack from a owner (hal/reames) that this is a direction we're comfortable with supporting, and
  • we'd need to vet that users of MSSA aren't relying on this behavior (EarlyCSE apparently is, probably implicitly). If they are, we'd need to fix that. :)

To your point, I'm fine with submitting this as is + picking it over to 7.0, as long as I can put a big, loud "FIXME: Lifetime intrinsics probably won't always be here," in the code.

If we do think that pretending lifetime.starts are clobbers (or similar) is overall the best way forward, then sure, this patch works and is the correct fix for the miscompile.

(FWIW, I plan to commit a workaround like this tomorrow with said big scary comment, assuming no complaints appear before then. Like said, I'm more than happy to discuss the ideal path to take here even after that lands. But this is a known miscompile with a trivial workaround that should presumably also find its way into the release branch, so...)

Workaround committed as r339411