Page MenuHomePhabricator

[CodeExtractor] Replace uses of extracted bitcasts in out-of-region lifetime markers
ClosedPublic

Authored by ggeorgakoudis on Nov 3 2020, 8:37 AM.

Details

Summary

CodeExtractor handles bitcasts in the extracted region that have
lifetime markers users in the outer region as outputs. That
creates unnecessary alloca/reload instructions and extra lifetime
markers. The patch identifies those cases, and replaces uses in
out-of-region lifetime markers with new bitcasts in the outer region.

Example

define void @foo() {
entry:
  %0 = alloca i32
  br label %extract

extract:
  %1 = bitcast i32* %0 to i8*
  call void @llvm.lifetime.start.p0i8(i64 4, i8* %1)
  call void @use(i32* %0)
  br label %exit

exit:
  call void @use(i32* %0)
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %1)
  ret void
}

Current extraction

define void @foo() {
entry:
  %.loc = alloca i8*, align 8
  %0 = alloca i32, align 4
  br label %codeRepl

codeRepl:                                         ; preds = %entry
  %lt.cast = bitcast i8** %.loc to i8*
  call void @llvm.lifetime.start.p0i8(i64 -1, i8* %lt.cast)
  %lt.cast1 = bitcast i32* %0 to i8*
  call void @llvm.lifetime.start.p0i8(i64 -1, i8* %lt.cast1)
  call void @foo.extract(i32* %0, i8** %.loc)
  %.reload = load i8*, i8** %.loc, align 8
  call void @llvm.lifetime.end.p0i8(i64 -1, i8* %lt.cast)
  br label %exit

exit:                                             ; preds = %codeRepl
  call void @use(i32* %0)
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %.reload)
  ret void
}

define internal void @foo.extract(i32* %0, i8** %.out) {
newFuncRoot:
  br label %extract

exit.exitStub:                                    ; preds = %extract
  ret void

extract:                                          ; preds = %newFuncRoot
  %1 = bitcast i32* %0 to i8*
  store i8* %1, i8** %.out, align 8
  call void @use(i32* %0)
  br label %exit.exitStub
}

Extraction with patch

define void @foo() {
entry:
  %0 = alloca i32, align 4
  br label %codeRepl

codeRepl:                                         ; preds = %entry
  %lt.cast1 = bitcast i32* %0 to i8*
  call void @llvm.lifetime.start.p0i8(i64 -1, i8* %lt.cast1)
  call void @foo.extract(i32* %0)
  br label %exit

exit:                                             ; preds = %codeRepl
  call void @use(i32* %0)
  %lt.cast = bitcast i32* %0 to i8*
  call void @llvm.lifetime.end.p0i8(i64 4, i8* %lt.cast)
  ret void
}

define internal void @foo.extract(i32* %0) {
newFuncRoot:
  br label %extract

exit.exitStub:                                    ; preds = %extract
  ret void

extract:                                          ; preds = %newFuncRoot
  %1 = bitcast i32* %0 to i8*
  call void @use(i32* %0)
  br label %exit.exitStub
}

Diff Detail

Event Timeline

ggeorgakoudis created this revision.Nov 3 2020, 8:37 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 3 2020, 8:37 AM
ggeorgakoudis requested review of this revision.Nov 3 2020, 8:37 AM

Format better the patch message

ggeorgakoudis edited the summary of this revision. (Show Details)Nov 3 2020, 8:56 AM
Harbormaster completed remote builds in B77416: Diff 302594.
vsk added a comment.Nov 3 2020, 9:46 AM

In the example, what pass/component is responsible for creating %1 = bitcast i32* %0 to i8*?

If it's CodeExtractor itself, it would be nice to solve the problem earlier by getting rid of bitcasts which only exist to wire up lifetime markers. I tried doing that in 26ee8aff2, but had to revert (099bffe7f) because llvm doesn't support specializing its lifetime intrinsics on opaque pointer types. That's a bug, I think: if we fix that, perhaps this whole problem goes away.

OTOH, if this %1 = bitcast i32* %0 to i8* value comes from a frontend or another pass, I think the approach taken in this patch is reasonable. Please add a test (perhaps in the CodeExtractor unittest).

In D90689#2371522, @vsk wrote:

In the example, what pass/component is responsible for creating %1 = bitcast i32* %0 to i8*?

If it's CodeExtractor itself, it would be nice to solve the problem earlier by getting rid of bitcasts which only exist to wire up lifetime markers. I tried doing that in 26ee8aff2, but had to revert (099bffe7f) because llvm doesn't support specializing its lifetime intrinsics on opaque pointer types. That's a bug, I think: if we fix that, perhaps this whole problem goes away.

OTOH, if this %1 = bitcast i32* %0 to i8* value comes from a frontend or another pass, I think the approach taken in this patch is reasonable. Please add a test (perhaps in the CodeExtractor unittest).

Many thanks for the quick reply @vsk! The bitcast is generated by the frontend, not by CodeExtractor. Our use case is outlining a merged parallel region that includes sequential code which include bitcasts like that. I am going to add a unit test.

I have two questions about the CodeExtractor.

  1. I have second thoughts on changing the IR within the findAllocas. Should I create a helper data structure, e.g., ReplaceBitcastUses as in for SinkCands, and do the transformation inside the extractCodeRegion caller?
  2. Is the assumption that findAllocas is always called before findInputsOutputs? If findAllocas is not called, my concern is that findInputsOutputs will report an output on the use of the bitcast although this output would be removed if findAllocas has been called. This problem does not appear for extractCodeRegion because it always calls findAllocas but there may be other modules using the CodeExtractor interface. I could change findInputsOutputs not to include bitcast uses to lifetime markers as outputs regardless of whether findAllocas has been called. Is that reasonable or does that create other problems?

Thanks again!

In D90689#2371522, @vsk wrote:

In the example, what pass/component is responsible for creating %1 = bitcast i32* %0 to i8*?

If it's CodeExtractor itself, it would be nice to solve the problem earlier by getting rid of bitcasts which only exist to wire up lifetime markers. I tried doing that in 26ee8aff2, but had to revert (099bffe7f) because llvm doesn't support specializing its lifetime intrinsics on opaque pointer types. That's a bug, I think: if we fix that, perhaps this whole problem goes away.

OTOH, if this %1 = bitcast i32* %0 to i8* value comes from a frontend or another pass, I think the approach taken in this patch is reasonable. Please add a test (perhaps in the CodeExtractor unittest).

Many thanks for the quick reply @vsk! The bitcast is generated by the frontend, not by CodeExtractor. Our use case is outlining a merged parallel region that includes sequential code which include bitcasts like that. I am going to add a unit test.

I have two questions about the CodeExtractor.

  1. I have second thoughts on changing the IR within the findAllocas. Should I create a helper data structure, e.g., ReplaceBitcastUses as in for SinkCands, and do the transformation inside the extractCodeRegion caller?
  2. Is the assumption that findAllocas is always called before findInputsOutputs? If findAllocas is not called, my concern is that findInputsOutputs will report an output on the use of the bitcast although this output would be removed if findAllocas has been called. This problem does not appear for extractCodeRegion because it always calls findAllocas but there may be other modules using the CodeExtractor interface. I could change findInputsOutputs not to include bitcast uses to lifetime markers as outputs regardless of whether findAllocas has been called. Is that reasonable or does that create other problems?

The status quo is that findAllocas destructively updates the IR in ways that affect the results of findInputsOutputs: you're correct that this dependency can create issues. Imho CodeExtractor intermixes analysis and transformation too much. This makes it hard to improve its transformations (evidenced by your question (1)); it also makes it hard for client code to analyze extraction profitability, since you need to do the extraction to figure out what the real input/output set will be. (Before extractCodeRegions, you can query findInputsOutputs, but it's just guessing.)

Taking a short-term view, having findAllocas perform a new destructive update would get the job done and be in keeping with the current design. In the long term, I think it'd be nice to have a redesigned CodeExtractor that keeps analysis and transformation neatly separated. (that said, I'm not sure how best to achieve that (.. and I'm no longer actively working on it).)

Add unit test

I see your point @vsk. I have added a unit test. Anything else missing for acceptance?

vsk accepted this revision.Nov 5 2020, 2:01 PM

Thanks, lgtm.

llvm/unittests/Transforms/Utils/CodeExtractorTest.cpp
327

nit, it's a bit more idiomatic to write EXPECT_EQ(outputs.size(), 0U).

This revision is now accepted and ready to land.Nov 5 2020, 2:01 PM
ggeorgakoudis marked an inline comment as done.Nov 5 2020, 4:22 PM
This revision was landed with ongoing or failed builds.Nov 5 2020, 5:01 PM
This revision was automatically updated to reflect the committed changes.