This is an archive of the discontinued LLVM Phabricator instance.

llvm-reduce: Don't replace intrinsic calls with undef
ClosedPublic

Authored by arsenm on Oct 1 2020, 10:56 AM.

Details

Summary

These don't really have function bodies to try to eliminate. This also
has a good chance of just producing invalid IR since intrinsics can
have special operand constraints (e.g. metadata arguments aren't valid
for an arbitrary call). This was wasting quite a bit of time producing
and failing on invalid IR when replacing dbg.values with undefs.

Diff Detail

Event Timeline

arsenm created this revision.Oct 1 2020, 10:56 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 1 2020, 10:56 AM
arsenm requested review of this revision.Oct 1 2020, 10:56 AM

what sort of problems manifest from this issue? Does it lead to slower reductions because they spend more time creating a lot of broken reductions?

I worry a bit about using heuristics "some intrinsics have these properties, so let's ignore all intrinsics" - other intrinsics represent normal instruction operations, right? (for target intrinsic instructions, for instance) so some cases might benefit from such reductions.

what sort of problems manifest from this issue? Does it lead to slower reductions because they spend more time creating a lot of broken reductions?

I think the general strategy of hack on the IR and if it breaks, skip it is a bit insane to begin with, so from that front this reduces the number of times that triggers

I worry a bit about using heuristics "some intrinsics have these properties, so let's ignore all intrinsics" - other intrinsics represent normal instruction operations, right? (for target intrinsic instructions, for instance) so some cases might benefit from such reductions.

I think removing intrinsic calls is an unproductive reduction strategy in general. I think this will harm reducing any codegen issues since it changes the generated code so radically. It also assumes a target supports arbitrary call targets, which may not be the case. e.g. before AMDGPU supported calls, every testcase with an intrinsic would reduce to a call to undef

fhahn added a comment.Oct 7 2020, 12:32 PM

what sort of problems manifest from this issue? Does it lead to slower reductions because they spend more time creating a lot of broken reductions?

I think the general strategy of hack on the IR and if it breaks, skip it is a bit insane to begin with, so from that front this reduces the number of times that triggers

+1.

There are plenty of other similar cases, like llvm.used metadata which we should probably handle in a special way.

llvm/tools/llvm-reduce/deltas/ReduceFunctions.cpp
33–39

I think a comment here would be good explaining why we exclude intrinsics. Also, should we also remove intrinsics from chunks to start with, to keep the numbering/chunk management consistent?

fhahn added a comment.Oct 7 2020, 12:36 PM

I worry a bit about using heuristics "some intrinsics have these properties, so let's ignore all intrinsics" - other intrinsics represent normal instruction operations, right? (for target intrinsic instructions, for instance) so some cases might benefit from such reductions.

I think removing intrinsic calls is an unproductive reduction strategy in general. I think this will harm reducing any codegen issues since it changes the generated code so radically. It also assumes a target supports arbitrary call targets, which may not be the case. e.g. before AMDGPU supported calls, every testcase with an intrinsic would reduce to a call to undef

IIUC the main benefit from this optimization is removing unrelated function definitions all together so we don't need to continue processing them and the fact that this also applies to declaration is a side effect of the implementation? The call sites themselves should get removed by the remove instruction part, if they are 'uninteresting', so we probably do not loose too much in terms of reductions?

I worry a bit about using heuristics "some intrinsics have these properties, so let's ignore all intrinsics" - other intrinsics represent normal instruction operations, right? (for target intrinsic instructions, for instance) so some cases might benefit from such reductions.

I think removing intrinsic calls is an unproductive reduction strategy in general. I think this will harm reducing any codegen issues since it changes the generated code so radically. It also assumes a target supports arbitrary call targets, which may not be the case. e.g. before AMDGPU supported calls, every testcase with an intrinsic would reduce to a call to undef

IIUC the main benefit from this optimization is removing unrelated function definitions all together so we don't need to continue processing them and the fact that this also applies to declaration is a side effect of the implementation? The call sites themselves should get removed by the remove instruction part, if they are 'uninteresting', so we probably do not loose too much in terms of reductions?

fair enough - thanks for the explanations!

ychen added a subscriber: ychen.Oct 7 2020, 3:17 PM

The test tieing to AMDGPU seems less ideal for general functionality. I've seen llvm.dbg.value caused the same problem, maybe we could use that?

arsenm added a comment.Oct 7 2020, 6:00 PM

The test tieing to AMDGPU seems less ideal for general functionality. I've seen llvm.dbg.value caused the same problem, maybe we could use that?

The debug intrinsics require a *lot* of metadata overhead simply to satsify the operations. The other metadata using intrinsics are experimental, so this seemed like the least fragile way to test it

arsenm updated this revision to Diff 296840.Oct 7 2020, 6:02 PM

Add comment

The test tieing to AMDGPU seems less ideal for general functionality. I've seen llvm.dbg.value caused the same problem, maybe we could use that?

The debug intrinsics require a *lot* of metadata overhead simply to satsify the operations. The other metadata using intrinsics are experimental, so this seemed like the least fragile way to test it

Given the functionality isn't implemented in terms of metadata-using-intrinsics in particular, a test using some non-metadata-using intrinsic would be fine.

ychen added a comment.Oct 7 2020, 6:48 PM

The test tieing to AMDGPU seems less ideal for general functionality. I've seen llvm.dbg.value caused the same problem, maybe we could use that?

The debug intrinsics require a *lot* of metadata overhead simply to satsify the operations. The other metadata using intrinsics are experimental, so this seemed like the least fragile way to test it

How about this one? Adapted and reduced from llvm/test/Verifier/llvm.dbg.intrinsic-dbg-attachment.ll.

define void @foo() {
entry:
  call void @llvm.dbg.value(
      metadata i8* undef,
      metadata !DILocalVariable(scope: !3),
      metadata !DIExpression()),
    !dbg !DILocation(scope: !3)
  ret void
}

declare void @llvm.dbg.value(metadata, metadata, metadata)

!llvm.module.flags = !{!0}
!0 = !{i32 2, !"Debug Info Version", i32 3}
!1 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus, file: !2)
!2 = !DIFile(filename: "f2", directory: "")
!3 = distinct !DISubprogram(name: "foo", unit: !1)
ychen added a comment.Oct 7 2020, 6:55 PM

The test tieing to AMDGPU seems less ideal for general functionality. I've seen llvm.dbg.value caused the same problem, maybe we could use that?

The debug intrinsics require a *lot* of metadata overhead simply to satsify the operations. The other metadata using intrinsics are experimental, so this seemed like the least fragile way to test it

Given the functionality isn't implemented in terms of metadata-using-intrinsics in particular, a test using some non-metadata-using intrinsic would be fine.

I think only the callee matters here. If anyone makes changes to llvm-reduce and not build AMDGPU, there is a chance (should be small I guess) to regress this.

The test tieing to AMDGPU seems less ideal for general functionality. I've seen llvm.dbg.value caused the same problem, maybe we could use that?

The debug intrinsics require a *lot* of metadata overhead simply to satsify the operations. The other metadata using intrinsics are experimental, so this seemed like the least fragile way to test it

Given the functionality isn't implemented in terms of metadata-using-intrinsics in particular, a test using some non-metadata-using intrinsic would be fine.

I'm specifically checking for the invalid intrinsic error produced. I think another intrinsic would be less clear for what is actually happening

The test tieing to AMDGPU seems less ideal for general functionality. I've seen llvm.dbg.value caused the same problem, maybe we could use that?

The debug intrinsics require a *lot* of metadata overhead simply to satsify the operations. The other metadata using intrinsics are experimental, so this seemed like the least fragile way to test it

Given the functionality isn't implemented in terms of metadata-using-intrinsics in particular, a test using some non-metadata-using intrinsic would be fine.

I think only the callee matters here. If anyone makes changes to llvm-reduce and not build AMDGPU, there is a chance (should be small I guess) to regress this.

Developers should just always build all targets. We also have buildbots for this

The test tieing to AMDGPU seems less ideal for general functionality. I've seen llvm.dbg.value caused the same problem, maybe we could use that?

The debug intrinsics require a *lot* of metadata overhead simply to satsify the operations. The other metadata using intrinsics are experimental, so this seemed like the least fragile way to test it

Given the functionality isn't implemented in terms of metadata-using-intrinsics in particular, a test using some non-metadata-using intrinsic would be fine.

I think only the callee matters here. If anyone makes changes to llvm-reduce and not build AMDGPU, there is a chance (should be small I guess) to regress this.

Developers should just always build all targets. We also have buildbots for this

Also, building the target isn't actually necessary for the test. The intrinsic declaration works as is anyway

ychen accepted this revision.Oct 13 2020, 10:13 AM

The test tieing to AMDGPU seems less ideal for general functionality. I've seen llvm.dbg.value caused the same problem, maybe we could use that?

The debug intrinsics require a *lot* of metadata overhead simply to satsify the operations. The other metadata using intrinsics are experimental, so this seemed like the least fragile way to test it

Given the functionality isn't implemented in terms of metadata-using-intrinsics in particular, a test using some non-metadata-using intrinsic would be fine.

I think only the callee matters here. If anyone makes changes to llvm-reduce and not build AMDGPU, there is a chance (should be small I guess) to regress this.

Developers should just always build all targets. We also have buildbots for this

Also, building the target isn't actually necessary for the test. The intrinsic declaration works as is anyway

That's true. "REQUIRES: amdgpu-registered-target" is not needed then?

This revision is now accepted and ready to land.Oct 13 2020, 10:13 AM
lebedev.ri requested changes to this revision.Oct 13 2020, 10:29 AM
lebedev.ri added inline comments.
llvm/tools/llvm-reduce/deltas/ReduceFunctions.cpp
33–39

This comment still needs addressing.

This revision now requires changes to proceed.Oct 13 2020, 10:29 AM
arsenm added inline comments.Oct 13 2020, 10:30 AM
llvm/tools/llvm-reduce/deltas/ReduceFunctions.cpp
33–39

I added the comment in the last revision

lebedev.ri added inline comments.Oct 13 2020, 10:32 AM
llvm/tools/llvm-reduce/deltas/ReduceFunctions.cpp
33–39

The second part of it - Also, should we also remove intrinsics from chunks to start with, to keep the numbering/chunk management consistent?

The test tieing to AMDGPU seems less ideal for general functionality. I've seen llvm.dbg.value caused the same problem, maybe we could use that?

The debug intrinsics require a *lot* of metadata overhead simply to satsify the operations. The other metadata using intrinsics are experimental, so this seemed like the least fragile way to test it

Given the functionality isn't implemented in terms of metadata-using-intrinsics in particular, a test using some non-metadata-using intrinsic would be fine.

I think only the callee matters here. If anyone makes changes to llvm-reduce and not build AMDGPU, there is a chance (should be small I guess) to regress this.

Developers should just always build all targets. We also have buildbots for this

Not necessarily - https://llvm.org/docs/GettingStarted.html suggests narrowing the target list in a few places, to speed up iteration time. So making tests more portable across targets is valuable - also means buildbots will get to your tests sooner (as the test will be run on more configurations, increasing the chances it'll be run on a buildbot that's running faster/picking up your changes sooner).

The test tieing to AMDGPU seems less ideal for general functionality. I've seen llvm.dbg.value caused the same problem, maybe we could use that?

The debug intrinsics require a *lot* of metadata overhead simply to satsify the operations. The other metadata using intrinsics are experimental, so this seemed like the least fragile way to test it

Given the functionality isn't implemented in terms of metadata-using-intrinsics in particular, a test using some non-metadata-using intrinsic would be fine.

I'm specifically checking for the invalid intrinsic error produced. I think another intrinsic would be less clear for what is actually happening

Ah, please tweak the test to test the general behavior (not modifying intrinsic calls), not the invalidity/second order that lead to the root cause/change in behavior (that some of those modified intrinsic calls then become invalid).

The test tieing to AMDGPU seems less ideal for general functionality. I've seen llvm.dbg.value caused the same problem, maybe we could use that?

The debug intrinsics require a *lot* of metadata overhead simply to satsify the operations. The other metadata using intrinsics are experimental, so this seemed like the least fragile way to test it

Given the functionality isn't implemented in terms of metadata-using-intrinsics in particular, a test using some non-metadata-using intrinsic would be fine.

I think only the callee matters here. If anyone makes changes to llvm-reduce and not build AMDGPU, there is a chance (should be small I guess) to regress this.

Developers should just always build all targets. We also have buildbots for this

Not necessarily - https://llvm.org/docs/GettingStarted.html suggests narrowing the target list in a few places, to speed up iteration time. So making tests more portable across targets is valuable - also means buildbots will get to your tests sooner (as the test will be run on more configurations, increasing the chances it'll be run on a buildbot that's running faster/picking up your changes sooner).

I disagree with this (maybe it's OK for frontend work), and you should run all target tests before committing backend changes. For this test it doesn't actually require the target so I've just dropped the REQUIRES

The test tieing to AMDGPU seems less ideal for general functionality. I've seen llvm.dbg.value caused the same problem, maybe we could use that?

The debug intrinsics require a *lot* of metadata overhead simply to satsify the operations. The other metadata using intrinsics are experimental, so this seemed like the least fragile way to test it

Given the functionality isn't implemented in terms of metadata-using-intrinsics in particular, a test using some non-metadata-using intrinsic would be fine.

I'm specifically checking for the invalid intrinsic error produced. I think another intrinsic would be less clear for what is actually happening

Ah, please tweak the test to test the general behavior (not modifying intrinsic calls), not the invalidity/second order that lead to the root cause/change in behavior (that some of those modified intrinsic calls then become invalid).

I don't see how else I would tell from the output that the intrinsic call itself is why it was skipped. The messages printed don't directly indicate why anything was skipped. The interestingness script needs to keep the call around

arsenm updated this revision to Diff 298230.Oct 14 2020, 1:27 PM

Remove requires, and change FunctionCount

lebedev.ri added inline comments.Oct 14 2020, 1:40 PM
llvm/tools/llvm-reduce/deltas/ReduceFunctions.cpp
38

Much like in ReduceAttributes, i don't think we should rely on the @llvm. function name prefix,
but should actually check that they are intrinsics, i.e.
F.getIntrinsicID() != Intrinsic::not_intrinsic

arsenm added inline comments.Oct 14 2020, 1:42 PM
llvm/tools/llvm-reduce/deltas/ReduceFunctions.cpp
38

The special operand properties are based on the llvm. prefix in the verifier, so I think this is the correct check here

lebedev.ri accepted this revision.Oct 14 2020, 1:51 PM

Since we should catch these cases later on in reduceInstructionsDeltaPass() (please add a test), this should be fine i guess.

llvm/tools/llvm-reduce/deltas/ReduceFunctions.cpp
36–37

Please drop and changing to an arbitrary call may radically change codegen.,
that's interestingness's test's problem.

This revision is now accepted and ready to land.Oct 14 2020, 1:51 PM

Since we should catch these cases later on in reduceInstructionsDeltaPass() (please add a test), this should be fine i guess.

I'm not sure what test you're asking for here

Since we should catch these cases later on in reduceInstructionsDeltaPass() (please add a test), this should be fine i guess.

I'm not sure what test you're asking for here

That if you don't include ; CHECK-INTERESTINGNESS: call, there won't be a call i32 @llvm.amdgcn.reloc.constant(metadata !"arst") in the end.

The test tieing to AMDGPU seems less ideal for general functionality. I've seen llvm.dbg.value caused the same problem, maybe we could use that?

The debug intrinsics require a *lot* of metadata overhead simply to satsify the operations. The other metadata using intrinsics are experimental, so this seemed like the least fragile way to test it

Given the functionality isn't implemented in terms of metadata-using-intrinsics in particular, a test using some non-metadata-using intrinsic would be fine.

I'm specifically checking for the invalid intrinsic error produced. I think another intrinsic would be less clear for what is actually happening

Ah, please tweak the test to test the general behavior (not modifying intrinsic calls), not the invalidity/second order that lead to the root cause/change in behavior (that some of those modified intrinsic calls then become invalid).

I don't see how else I would tell from the output that the intrinsic call itself is why it was skipped. The messages printed don't directly indicate why anything was skipped. The interestingness script needs to keep the call around

Sorry, not sure I follow - You have two calls, the difference is one is an intrinsic and one is not and the reduction continues until there's only one call & then the final check ensures the remaining call is the intrinsic you're looking for.

For instance, this modification of the test seems to fail without the patch and pass with it, but doesn't rely on a target-specific intrinsic or metadata parameters:

; RUN: llvm-reduce --test FileCheck --test-arg --check-prefixes=ALL,CHECK-INTERESTINGNESS --test-arg %s --test-arg --input-file %s -o %t 2> %t.log
; RUN: cat %t | FileCheck -implicit-check-not=uninteresting --check-prefixes=ALL,CHECK-FINAL %s

; Check that the call is removed by instruction reduction passes
; RUN: llvm-reduce --test FileCheck --test-arg --check-prefixes=ALL,CHECK-FUNC --test-arg %s --test-arg --input-file %s -o %t
; RUN: cat %t | FileCheck -implicit-check-not=uninteresting --check-prefixes=ALL,CHECK-NOCALL %s


declare i8* @llvm.sponentry.p0i8()
declare void @uninteresting()

; ALL-LABEL: define i8* @interesting(
define i8* @interesting() {
entry:
  ; CHECK-INTERESTINGNESS: call
  ; CHECK-NOCALL-NOT: call

  ; CHECK-FINAL: %call = call i8* @llvm.sponentry.p0i8()
  ; CHECK-FINAL-NEXT: ret i8* %call
  %call = call i8* @llvm.sponentry.p0i8()
  call void @uninteresting()
  ret i8* %call
}

Might be able to be simplified further - test seems to work even with a void return from @interesting and the "CHECK-FUNC" on the second reduce run doesn't appear to be used (there are no "CHECK-FUNC" check lines).

Could you check in something like this, a simplified/reduced version of the test?

The test tieing to AMDGPU seems less ideal for general functionality. I've seen llvm.dbg.value caused the same problem, maybe we could use that?

The debug intrinsics require a *lot* of metadata overhead simply to satsify the operations. The other metadata using intrinsics are experimental, so this seemed like the least fragile way to test it

Given the functionality isn't implemented in terms of metadata-using-intrinsics in particular, a test using some non-metadata-using intrinsic would be fine.

I'm specifically checking for the invalid intrinsic error produced. I think another intrinsic would be less clear for what is actually happening

Ah, please tweak the test to test the general behavior (not modifying intrinsic calls), not the invalidity/second order that lead to the root cause/change in behavior (that some of those modified intrinsic calls then become invalid).

I don't see how else I would tell from the output that the intrinsic call itself is why it was skipped. The messages printed don't directly indicate why anything was skipped. The interestingness script needs to keep the call around

Sorry, not sure I follow - You have two calls, the difference is one is an intrinsic and one is not and the reduction continues until there's only one call & then the final check ensures the remaining call is the intrinsic you're looking for.

For instance, this modification of the test seems to fail without the patch and pass with it, but doesn't rely on a target-specific intrinsic or metadata parameters:

; RUN: llvm-reduce --test FileCheck --test-arg --check-prefixes=ALL,CHECK-INTERESTINGNESS --test-arg %s --test-arg --input-file %s -o %t 2> %t.log
; RUN: cat %t | FileCheck -implicit-check-not=uninteresting --check-prefixes=ALL,CHECK-FINAL %s

; Check that the call is removed by instruction reduction passes
; RUN: llvm-reduce --test FileCheck --test-arg --check-prefixes=ALL,CHECK-FUNC --test-arg %s --test-arg --input-file %s -o %t
; RUN: cat %t | FileCheck -implicit-check-not=uninteresting --check-prefixes=ALL,CHECK-NOCALL %s


declare i8* @llvm.sponentry.p0i8()
declare void @uninteresting()

; ALL-LABEL: define i8* @interesting(
define i8* @interesting() {
entry:
  ; CHECK-INTERESTINGNESS: call
  ; CHECK-NOCALL-NOT: call

  ; CHECK-FINAL: %call = call i8* @llvm.sponentry.p0i8()
  ; CHECK-FINAL-NEXT: ret i8* %call
  %call = call i8* @llvm.sponentry.p0i8()
  call void @uninteresting()
  ret i8* %call
}

Might be able to be simplified further - test seems to work even with a void return from @interesting and the "CHECK-FUNC" on the second reduce run doesn't appear to be used (there are no "CHECK-FUNC" check lines).

Could you check in something like this, a simplified/reduced version of the test?

This test works, you can just replace the existing test with this one. Do you want to push it, or should I?

The test tieing to AMDGPU seems less ideal for general functionality. I've seen llvm.dbg.value caused the same problem, maybe we could use that?

The debug intrinsics require a *lot* of metadata overhead simply to satsify the operations. The other metadata using intrinsics are experimental, so this seemed like the least fragile way to test it

Given the functionality isn't implemented in terms of metadata-using-intrinsics in particular, a test using some non-metadata-using intrinsic would be fine.

I'm specifically checking for the invalid intrinsic error produced. I think another intrinsic would be less clear for what is actually happening

Ah, please tweak the test to test the general behavior (not modifying intrinsic calls), not the invalidity/second order that lead to the root cause/change in behavior (that some of those modified intrinsic calls then become invalid).

I don't see how else I would tell from the output that the intrinsic call itself is why it was skipped. The messages printed don't directly indicate why anything was skipped. The interestingness script needs to keep the call around

Sorry, not sure I follow - You have two calls, the difference is one is an intrinsic and one is not and the reduction continues until there's only one call & then the final check ensures the remaining call is the intrinsic you're looking for.

For instance, this modification of the test seems to fail without the patch and pass with it, but doesn't rely on a target-specific intrinsic or metadata parameters:

; RUN: llvm-reduce --test FileCheck --test-arg --check-prefixes=ALL,CHECK-INTERESTINGNESS --test-arg %s --test-arg --input-file %s -o %t 2> %t.log
; RUN: cat %t | FileCheck -implicit-check-not=uninteresting --check-prefixes=ALL,CHECK-FINAL %s

; Check that the call is removed by instruction reduction passes
; RUN: llvm-reduce --test FileCheck --test-arg --check-prefixes=ALL,CHECK-FUNC --test-arg %s --test-arg --input-file %s -o %t
; RUN: cat %t | FileCheck -implicit-check-not=uninteresting --check-prefixes=ALL,CHECK-NOCALL %s


declare i8* @llvm.sponentry.p0i8()
declare void @uninteresting()

; ALL-LABEL: define i8* @interesting(
define i8* @interesting() {
entry:
  ; CHECK-INTERESTINGNESS: call
  ; CHECK-NOCALL-NOT: call

  ; CHECK-FINAL: %call = call i8* @llvm.sponentry.p0i8()
  ; CHECK-FINAL-NEXT: ret i8* %call
  %call = call i8* @llvm.sponentry.p0i8()
  call void @uninteresting()
  ret i8* %call
}

Might be able to be simplified further - test seems to work even with a void return from @interesting and the "CHECK-FUNC" on the second reduce run doesn't appear to be used (there are no "CHECK-FUNC" check lines).

Could you check in something like this, a simplified/reduced version of the test?

This test works, you can just replace the existing test with this one. Do you want to push it, or should I?

Sure, sure: 56a5b4b1bfafa27dcc0fe48a3095ebc19ac26256