This is an archive of the discontinued LLVM Phabricator instance.

[TransformUtils] Don't generate invalid llvm.dbg.cu
Needs ReviewPublic

Authored by ldrumm on Mar 25 2021, 5:55 AM.

Details

Summary

CloneFunctionInto generated an empty (zero operand; invalid) compile
unit in the destination module even in cases where neither the source
nor dest function's parents had them. This caused verifier failures.

This fix ensures that in the case there is no compile-unit to copy from,
an empty one is not created

Diff Detail

Event Timeline

ldrumm created this revision.Mar 25 2021, 5:55 AM
ldrumm requested review of this revision.Mar 25 2021, 5:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 25 2021, 5:55 AM
ldrumm updated this revision to Diff 333274.Mar 25 2021, 6:01 AM
ldrumm updated this revision to Diff 333275.
ldrumm updated this revision to Diff 333280.Mar 25 2021, 6:07 AM

what was invalid about the metadata? The llvm.dbg.cu list was empty? I wonder if we should just say that's valid. It seems pretty benign/reasonable - makes it easier to write code like this, or code that, say, stripped a CU out wouldn't have to worry about removing a potentially zero-length cu list. @aprantl @JDevlieghere ?

what was invalid about the metadata? The llvm.dbg.cu list was empty? I wonder if we should just say that's valid. It seems pretty benign/reasonable - makes it easier to write code like this, or code that, say, stripped a CU out wouldn't have to worry about removing a potentially zero-length cu list. @aprantl @JDevlieghere ?

I don't think that's a good idea. It complexifies every piece of user code around handling "llvm.dbg.cu" if the user can't assume it actually contains what it's supposed to. Doing so is a tradeoff of one if statement in CloneFunctionInto becoming an extra if statement around every use of "llvm.dbg.cu". That's not a good tradeoff in terms of reliability or ergonomics.

what was invalid about the metadata? The llvm.dbg.cu list was empty? I wonder if we should just say that's valid. It seems pretty benign/reasonable - makes it easier to write code like this, or code that, say, stripped a CU out wouldn't have to worry about removing a potentially zero-length cu list. @aprantl @JDevlieghere ?

I don't think that's a good idea. It complexifies every piece of user code around handling "llvm.dbg.cu" if the user can't assume it actually contains what it's supposed to. Doing so is a tradeoff of one if statement in CloneFunctionInto becoming an extra if statement around every use of "llvm.dbg.cu". That's not a good tradeoff in terms of reliability or ergonomics.

Why would code handling llvm.dbg.cu need to add an if statement? I'd expect it to walk through whatever's in llvm.dbg.cu and if there's nothing it'd be fine with that?

Why would code handling llvm.dbg.cu need to add an if statement? I'd expect it to walk through whatever's in llvm.dbg.cu and if there's nothing it'd be fine with that?

Because 3 lines later we have auto *NMD = NewModule->getOrInsertNamedMetadata("llvm.dbg.cu");
In the case that the source module has no llvm.dbu.cu the target module ends up with on without operands. This is both illegal and a pain to deal with in user code.

Why would code handling llvm.dbg.cu need to add an if statement? I'd expect it to walk through whatever's in llvm.dbg.cu and if there's nothing it'd be fine with that?

Because 3 lines later we have auto *NMD = NewModule->getOrInsertNamedMetadata("llvm.dbg.cu");
In the case that the source module has no llvm.dbu.cu the target module ends up with on without operands. This is both illegal and a pain to deal with in user code.

I don't understand why it would be either illegal or a pain to deal with. (& I'm not sure it's illegal either - I modified an example IR file to have an empty llvm.dbg.cu list and it seemed to compile file (compared to other violations of IR debug info verification, which did result in an error/warning like this:

inlinable function call in a function with debug info must have a !dbg location
  call void @_Z2f1v()
warning: ignoring invalid debug info in test.ll
```)

Why would code handling llvm.dbg.cu need to add an if statement? I'd expect it to walk through whatever's in llvm.dbg.cu and if there's nothing it'd be fine with that?

Because 3 lines later we have auto *NMD = NewModule->getOrInsertNamedMetadata("llvm.dbg.cu");
In the case that the source module has no llvm.dbu.cu the target module ends up with on without operands. This is both illegal and a pain to deal with in user code.

I don't understand why it would be either illegal or a pain to deal with. (& I'm not sure it's illegal either - I modified an example IR file to have an empty llvm.dbg.cu list and it seemed to compile file (compared to other violations of IR debug info verification, which did result in an error/warning like this:

inlinable function call in a function with debug info must have a !dbg location
  call void @_Z2f1v()
warning: ignoring invalid debug info in test.ll
```)
  1. The verifier says it's illegal. When I wrote this patch the regression test included fails without this change
  2. getting an llvm.dbg.cu, and then having to check it's got the right number of operands so you don't crash every time is a pain. It's partly about ergonomics - and partly about making it easy for users to do the right thing.

If you're still unhappy with this if statement can you suggest how I solve this bug in a more suitable way?

Why would code handling llvm.dbg.cu need to add an if statement? I'd expect it to walk through whatever's in llvm.dbg.cu and if there's nothing it'd be fine with that?

Because 3 lines later we have auto *NMD = NewModule->getOrInsertNamedMetadata("llvm.dbg.cu");
In the case that the source module has no llvm.dbu.cu the target module ends up with on without operands. This is both illegal and a pain to deal with in user code.

I don't understand why it would be either illegal or a pain to deal with. (& I'm not sure it's illegal either - I modified an example IR file to have an empty llvm.dbg.cu list and it seemed to compile file (compared to other violations of IR debug info verification, which did result in an error/warning like this:

inlinable function call in a function with debug info must have a !dbg location
  call void @_Z2f1v()
warning: ignoring invalid debug info in test.ll
```)
  1. The verifier says it's illegal.

Where does it say that? it doesn't look to me like this would fail with a zero-length llvm.dbg.cu: https://github.com/microsoft/llvm/blob/eddbfad05ce3a85732d8bff9f85ef63afc97ad7e/lib/IR/Verifier.cpp#L758 & as mentioned in my last message, my testing seemed to show that a zero-length llvm.dbg.cu didn't trigger a failure in the IR verifier, unlike some IR I knew to be invalid (missing dbg location on a call in a dbg-having function)

When I wrote this patch the regression test included fails without this change

The unit test? It specifically tests for llvm.dbg.cu being null at the end, and it seems that's what fails without the code change - but I'm suggesting that test is incorrect, and that it's OK for llvm.dbg.cu to be non-null (but empty) here.

  1. getting an llvm.dbg.cu, and then having to check it's got the right number of operands so you don't crash every time is a pain.

I don't understand what code you're suggesting here - it looks to me like all the LLVM code that visits llvm.dbg.cu iterates over it not caring about the total number of elements - if there are zero, the loop iterates zero times (see the verifier code linked above), if it's 10, the loop iterates 10 times - either is totally fine.

If you're still unhappy with this if statement can you suggest how I solve this bug in a more suitable way?

I'm suggesting this patch is not necessary, and the behavior of the existing code doesn't seem to me to be buggy.