This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo@O2][LoopVectorize] pr39024: Vectorized code linenos step through loop even after completion
ClosedPublic

Authored by Orlando on Apr 17 2019, 10:13 AM.

Details

Summary

Bug: https://bugs.llvm.org/show_bug.cgi?id=39024

The bug reports that a vectorized loop is stepped through 4 times and each step through the loop seemed to show a different path. I found two problems here:

A) An incorrect line number on a preheader block (for.body.preheader) instruction causes a step into the loop before it begins.
B) Instructions in the middle block have different line numbers which give the impression of another iteration.

In this patch I give all of the middle block instructions the line number of the scalar loop latch terminator branch. This seems to provide the smoothest debugging experience because the vectorized loops will always end on this line before dropping into the scalar loop. To solve problem A I have altered llvm::SplitBlockPredecessors to accommodate loop header blocks.

I have set up a separate review D61933 for a fix which is required for this patch.

Diff Detail

Repository
rL LLVM

Event Timeline

Orlando created this revision.Apr 17 2019, 10:13 AM
Orlando edited the summary of this revision. (Show Details)Apr 17 2019, 10:19 AM

How will this appear to profiling tools using PC sampling and using debug info to map the PC samples back to line numbers in the code?

How will this appear to profiling tools using PC sampling and using debug info to map the PC samples back to line numbers in the code?

I brought this up in some offline discussions, which all concluded that the impact on profilers would be small and that the trade-off for better debugging is worth it. The impact on profilers seems like it would be small because the middle block is only visited once after running through the vectorized loop.

I am glad you asked, because this concern gives rise to an argument for giving the middle block instructions line 0 instead, and I am interested in hearing others' opinions.

How will this appear to profiling tools using PC sampling and using debug info to map the PC samples back to line numbers in the code?

I brought this up in some offline discussions, which all concluded that the impact on profilers would be small and that the trade-off for better debugging is worth it. The impact on profilers seems like it would be small because the middle block is only visited once after running through the vectorized loop.

I am glad you asked, because this concern gives rise to an argument for giving the middle block instructions line 0 instead, and I am interested in hearing others' opinions.

Interesting. The middle block just has the check for whether or not we need to run the remainder loop, right? I can definitely see this as kind of latch-like.

@jmellorcrummey , do you have an opinion on this?

Orlando updated this revision to Diff 195698.Apr 18 2019, 2:40 AM

Sorry, I missed a couple of failing tests. I've fixed them and updated the diff.

How will this appear to profiling tools using PC sampling and using debug info to map the PC samples back to line numbers in the code?

I brought this up in some offline discussions, which all concluded that the impact on profilers would be small and that the trade-off for better debugging is worth it. The impact on profilers seems like it would be small because the middle block is only visited once after running through the vectorized loop.

I am glad you asked, because this concern gives rise to an argument for giving the middle block instructions line 0 instead, and I am interested in hearing others' opinions.

Interesting. The middle block just has the check for whether or not we need to run the remainder loop, right? I can definitely see this as kind of latch-like.

@jmellorcrummey , do you have an opinion on this?

The middle block does the check for whether or not we need to run the remainder loop but also, for the bug [0], executes some operations to convert the result of the vectorized loop into a scalar value.

middle.block:                                     ; preds = %vector.body.epil, %middle.block.unr-lcssa
  %.lcssa20 = phi <4 x i32> [ %.lcssa20.ph, %middle.block.unr-lcssa ], [ %33, %vector.body.epil ]
  %.lcssa = phi <4 x i32> [ %.lcssa.ph, %middle.block.unr-lcssa ], [ %34, %vector.body.epil ]
  %bin.rdx = add <4 x i32> %.lcssa, %.lcssa20
  %rdx.shuf = shufflevector <4 x i32> %bin.rdx, <4 x i32> undef, <4 x i32> <i32 2, i32 3, i32 undef, i32 undef>
  %bin.rdx16 = add <4 x i32> %bin.rdx, %rdx.shuf
  %rdx.shuf17 = shufflevector <4 x i32> %bin.rdx16, <4 x i32> undef, <4 x i32> <i32 1, i32 undef, i32 undef, i32 undef>
  %bin.rdx18 = add <4 x i32> %bin.rdx16, %rdx.shuf17
  %35 = extractelement <4 x i32> %bin.rdx18, i32 0
  %cmp.n = icmp eq i64 %n.vec, %wide.trip.count
  br i1 %cmp.n, label %return, label %for.body.preheader19

[0] https://bugs.llvm.org/show_bug.cgi?id=39024

aprantl added inline comments.
llvm/test/Transforms/LoopVectorize/unsafe-dep-remark.ll
14 ↗(On Diff #195698)

@anemet Does that change look reasonable?

How will this appear to profiling tools using PC sampling and using debug info to map the PC samples back to line numbers in the code?

I brought this up in some offline discussions, which all concluded that the impact on profilers would be small and that the trade-off for better debugging is worth it. The impact on profilers seems like it would be small because the middle block is only visited once after running through the vectorized loop.

I am glad you asked, because this concern gives rise to an argument for giving the middle block instructions line 0 instead, and I am interested in hearing others' opinions.

Interesting. The middle block just has the check for whether or not we need to run the remainder loop, right? I can definitely see this as kind of latch-like.

@jmellorcrummey , do you have an opinion on this?

Unsurprisingly, I have an opinion :-).

As the lead of a project building profiling tools, I am strongly against having any instructions map to line 0. Having address ranges without mappings completely distorts our view of the code and we have to add patches to cope with this case. All machine instructions should have a natural mapping if you put a bit of thought into it.

For instance, I sent comments back to IBM complaining about the lack of line mappings on machine instructions related to code generated for OpenMP offloading onto GPUs. Rather than having code intervals map to line 0, I advocated that they map boilerplate instructions back to the directive for which the code is being generated.

I strongly believe that instructions in what was described as the "middle block" above should map back to a plausible line. In the case of transformations such as common subexpression elimination that involve a one-to-many mapping, the mapping should be one of the lines where an operator appeared.

As the lead of a project building profiling tools, I am strongly against having any instructions map to line 0.

This is probably not what you meant, but for completeness I feel like I should point out that there are many legitimate situations where LLVM generates a line 0 location. The most prominent example is instruction merging: since both LLVM IR and DWARF currently require each PC address to map to exactly one source location, LLVM will insert a line 0 location when it merges two instructions with distinct source locations. I can't speak for profiling, but at least on the debugger side, the consensus is that potentially misleading information is worse than no information, because if there is no way to distinguish "always correct" from "maybe correct" information, the user can't trust any information.

As the lead of a project building profiling tools, I am strongly against having any instructions map to line 0.

This is probably not what you meant, but for completeness I feel like I should point out that there are many legitimate situations where LLVM generates a line 0 location. The most prominent example is instruction merging: since both LLVM IR and DWARF currently require each PC address to map to exactly one source location, LLVM will insert a line 0 location when it merges two instructions with distinct source locations. I can't speak for profiling, but at least on the debugger side, the consensus is that potentially misleading information is worse than no information, because if there is no way to distinguish "always correct" from "maybe correct" information, the user can't trust any information.

FWIW, the motivating case for the introduction of getMergedLocation, which generally handles the insertion of line 0 locations in this case, was to improve the performance of profile-guided optimized code when using a sampling profiler tool (see http://llvm.org/devmtg/2017-03//assets/slides/delivering_sample_based_pgo_for_playstation_r_4.pdf).

As the lead of a project building profiling tools, I am strongly against having any instructions map to line 0.

This is probably not what you meant, but for completeness I feel like I should point out that there are many legitimate situations where LLVM generates a line 0 location. The most prominent example is instruction merging: since both LLVM IR and DWARF currently require each PC address to map to exactly one source location, LLVM will insert a line 0 location when it merges two instructions with distinct source locations. I can't speak for profiling, but at least on the debugger side, the consensus is that potentially misleading information is worse than no information, because if there is no way to distinguish "always correct" from "maybe correct" information, the user can't trust any information.

When merging happens, I don't see why mapping to 0 is better than attributing it to one of a set of locations that a fused operation came from. I see that as representative of reality. I would prefer a rule stating that if two operations are merged, always pick the lexicographically least (file, line number) pair. In the case of mappings for merged instructions, I see either mapping as "correct".

FWIW, the motivating case for the introduction of getMergedLocation, which generally handles the insertion of line 0 locations in this case, was to improve the performance of profile-guided optimized code when using a sampling profiler tool (see http://llvm.org/devmtg/2017-03//assets/slides/delivering_sample_based_pgo_for_playstation_r_4.pdf).

I looked at the slides. The PGO is nice work. The slide about getMergedLocation describes the rationale for mapping to line 0 rather than another line mapping if the line mappings don't agree. I don't see it wrong to simply choose one.

vsk added a comment.Apr 18 2019, 4:46 PM

As the lead of a project building profiling tools, I am strongly against having any instructions map to line 0.

This is probably not what you meant, but for completeness I feel like I should point out that there are many legitimate situations where LLVM generates a line 0 location. The most prominent example is instruction merging: since both LLVM IR and DWARF currently require each PC address to map to exactly one source location, LLVM will insert a line 0 location when it merges two instructions with distinct source locations. I can't speak for profiling, but at least on the debugger side, the consensus is that potentially misleading information is worse than no information, because if there is no way to distinguish "always correct" from "maybe correct" information, the user can't trust any information.

When merging happens, I don't see why mapping to 0 is better than attributing it to one of a set of locations that a fused operation came from. I see that as representative of reality.

I would prefer a rule stating that if two operations are merged, always pick the lexicographically least (file, line number) pair. In the case of mappings for merged instructions, I see either mapping as "correct".

I don't see things this way. Arbitrarily picking a location can result in a false execution history being presented to the user. Line 0 works a lot better imo, because it a) doesn't actively mislead and b) preserves inline scope.

FWIW, the motivating case for the introduction of getMergedLocation, which generally handles the insertion of line 0 locations in this case, was to improve the performance of profile-guided optimized code when using a sampling profiler tool (see http://llvm.org/devmtg/2017-03//assets/slides/delivering_sample_based_pgo_for_playstation_r_4.pdf).

I looked at the slides. The PGO is nice work. The slide about getMergedLocation describes the rationale for mapping to line 0 rather than another line mapping if the line mappings don't agree. I don't see it wrong to simply choose one.

In D60831#1472518, @vsk wrote:

As the lead of a project building profiling tools, I am strongly against having any instructions map to line 0.

This is probably not what you meant, but for completeness I feel like I should point out that there are many legitimate situations where LLVM generates a line 0 location. The most prominent example is instruction merging: since both LLVM IR and DWARF currently require each PC address to map to exactly one source location, LLVM will insert a line 0 location when it merges two instructions with distinct source locations. I can't speak for profiling, but at least on the debugger side, the consensus is that potentially misleading information is worse than no information, because if there is no way to distinguish "always correct" from "maybe correct" information, the user can't trust any information.

When merging happens, I don't see why mapping to 0 is better than attributing it to one of a set of locations that a fused operation came from. I see that as representative of reality.

I would prefer a rule stating that if two operations are merged, always pick the lexicographically least (file, line number) pair. In the case of mappings for merged instructions, I see either mapping as "correct".

I don't see things this way. Arbitrarily picking a location can result in a false execution history being presented to the user. Line 0 works a lot better imo, because it a) doesn't actively mislead and b) preserves inline scope.

FWIW, the motivating case for the introduction of getMergedLocation, which generally handles the insertion of line 0 locations in this case, was to improve the performance of profile-guided optimized code when using a sampling profiler tool (see http://llvm.org/devmtg/2017-03//assets/slides/delivering_sample_based_pgo_for_playstation_r_4.pdf).

I looked at the slides. The PGO is nice work. The slide about getMergedLocation describes the rationale for mapping to line 0 rather than another line mapping if the line mappings don't agree. I don't see it wrong to simply choose one.

A good example of why arbitrarily picking one location during merging is problematic is when the two locations come from different inlined instances of different functions (or perhaps even worse, two inlined instances of the same function). I would assume that even in profiling, a wrong backtrace would invalidate or render untrustworthy large parts of any analysis being done on this data.

anemet added inline comments.Apr 18 2019, 6:14 PM
llvm/test/Transforms/LoopVectorize/unsafe-dep-remark.ll
14 ↗(On Diff #195698)

Yes, even though it looks like the version on the left was able to pinpoint the offending memory operation, that is actually not the case. We are using the debug location of the loop in the remark, so that was only a coincidence.

A good example of why arbitrarily picking one location during merging is problematic is when the two locations come from different inlined instances of different functions (or perhaps even worse, two inlined instances of the same function). I would assume that even in profiling, a wrong backtrace would invalidate or render untrustworthy large parts of any analysis being done on this data.

I think we fundamentally disagree on what is good for profiling. If for instance two "load" instructions from different inlined contexts merge, I would prefer that they be charged to one of the locations where the instruction came from and the other gets the benefit of that operation for free. (That's what common subexpression elimination is for!) Saying that one got merged into the other or vice versa is an acceptable view. I don't see this as misleading, untrustworthy, or invalidating anything.

If the instructions come from two different files, I guess that you won't even associate it with one of the files. So, I have instructions in the binary that won't be covered by line map entries at all. Having LITERALLY NO INFORMATION where they came from without tracing instruction generation through the compiler is something that I fundamentally oppose.

A good example of why arbitrarily picking one location during merging is problematic is when the two locations come from different inlined instances of different functions (or perhaps even worse, two inlined instances of the same function). I would assume that even in profiling, a wrong backtrace would invalidate or render untrustworthy large parts of any analysis being done on this data.

I think we fundamentally disagree on what is good for profiling.

That is possible.

If for instance two "load" instructions from different inlined contexts merge, I would prefer that they be charged to one of the locations where the instruction came from and the other gets the benefit of that operation for free. (That's what common subexpression elimination is for!) Saying that one got merged into the other or vice versa is an acceptable view. I don't see this as misleading, untrustworthy, or invalidating anything.

If the instructions come from two different files, I guess that you won't even associate it with one of the files. So, I have instructions in the binary that won't be covered by line map entries at all. Having LITERALLY NO INFORMATION where they came from without tracing instruction generation through the compiler is something that I fundamentally oppose.

Precisely to that point I was hoping to provide a few compelling counterexamples to demonstrate why potentially wrong information is actually worse than no information.

But I guess what this really boils down to is that all debug information in LLVM IR is (at the moment) "must" information that is supposed to be either 100% reliable or omitted. It sounds like for the kinds of analysis that you are doing, you would also benefit from a second category of "may" information that may or may not be valid. That's a legitimate ask, but if we wanted to include this in LLVM IR, we would need to qualify it as not reliable, so it doesn't, for example, leak into debug info that software developers rely on.

What I would find more interesting would be extending LLVM IR to support a one-to-many mapping from PC address to source locations. This way we would also be up front about the fact that the source location is one out of a set, but we then could use additional contextual information (such as the current backtrace and DWARF call site information) to potentially disambiguate them before use.

Precisely to that point I was hoping to provide a few compelling counterexamples to demonstrate why potentially wrong information is actually worse than no information.

But I guess what this really boils down to is that all debug information in LLVM IR is (at the moment) "must" information that is supposed to be either 100% reliable or omitted. It sounds like for the kinds of analysis that you are doing, you would also benefit from a second category of "may" information that may or may not be valid. That's a legitimate ask, but if we wanted to include this in LLVM IR, we would need to qualify it as not reliable, so it doesn't, for example, leak into debug info that software developers rely on.

This will be my last comment on the topic and then everyone can get back to work :-). I agree with the LLVM IR requirement that one not lie to application developers; lies are worse than nothing. However, I don't feel that you have adequately explained why attributing a merged instruction to any of the available (file, line) mappings is incorrect and not 100% reliable. Just because we can't include all contributing mappings for a merged instruction doesn't make any one of them unreliable; I see attributing to any one of a set of mappings as 100% correct, even though it is only partial information.

This will be my last comment on the topic and then everyone can get back to work :-). I agree with the LLVM IR requirement that one not lie to application developers; lies are worse than nothing. However, I don't feel that you have adequately explained why attributing a merged instruction to any of the available (file, line) mappings is incorrect and not 100% reliable. Just because we can't include all contributing mappings for a merged instruction doesn't make any one of them unreliable; I see attributing to any one of a set of mappings as 100% correct, even though it is only partial information.

Sorry for jumping in after you said you would stop, but I have been away.
Personally I find this example compelling: when there's an if/then/else construct and some instruction is hoisted above the if, you could assign it to (for example) the source location of the 'then' block. Now in your training run, the 'then' block can be given 100% of executions (because that's what the hoisted instruction says) even if the 'else' block was chosen 100% of the time. I find the resulting profile "incorrect and not 100% reliable", and when used for PGO it will produce sub-optimal code. It is hard to imagine any other valid conclusion for this use-case.

As Adrian mentioned, being able to attribute multiple locations would be preferable to attributing line 0, and I hope we can make that happen in the future.

bjope added a subscriber: bjope.Apr 24 2019, 7:37 AM

This will be my last comment on the topic and then everyone can get back to work :-). I agree with the LLVM IR requirement that one not lie to application developers; lies are worse than nothing. However, I don't feel that you have adequately explained why attributing a merged instruction to any of the available (file, line) mappings is incorrect and not 100% reliable. Just because we can't include all contributing mappings for a merged instruction doesn't make any one of them unreliable; I see attributing to any one of a set of mappings as 100% correct, even though it is only partial information.

Sorry for jumping in after you said you would stop, but I have been away.
Personally I find this example compelling: when there's an if/then/else construct and some instruction is hoisted above the if, you could assign it to (for example) the source location of the 'then' block. Now in your training run, the 'then' block can be given 100% of executions (because that's what the hoisted instruction says) even if the 'else' block was chosen 100% of the time. I find the resulting profile "incorrect and not 100% reliable", and when used for PGO it will produce sub-optimal code. It is hard to imagine any other valid conclusion for this use-case.

As Adrian mentioned, being able to attribute multiple locations would be preferable to attributing line 0, and I hope we can make that happen in the future.

Even with multiple locations it might be weird when using those locations for profiling (or code coverage) and not just as debug information to see which source line the code originates from. If the hoisted instruction has locations from the 'if', 'elseif', and 'else' blocks, it might appear as if we have executed all paths, even if the 'else' block was chosen 100% of the time. So I guess that samples with multiple locations should be discarded when doing profiling (perhaps depending on the goal of the profiling).

Ping.

A summary of the discussion so far:

The middle block is quite "latch like" in that it dictates control flow between the vectorized and scalar loops, but it can contain additional
instructions. For example, it may also handle converting the vectorized results into a scalar value.

The patch maps all middle block instructions to the line of the loop latch branch.

An initial question of "How much will this affect profiling tools that rely on this debug info?" was not fully addressed, because discussion of an alternative solution I mentioned eclipsed the original question.

Ping.

A summary of the discussion so far:

The middle block is quite "latch like" in that it dictates control flow between the vectorized and scalar loops, but it can contain additional
instructions. For example, it may also handle converting the vectorized results into a scalar value.

The patch maps all middle block instructions to the line of the loop latch branch.

An initial question of "How much will this affect profiling tools that rely on this debug info?" was not fully addressed, because discussion of an alternative solution I mentioned eclipsed the original question.

My take-away from the discussion was this: It is desirable to map the instructions to something in the loop (e.g., not line 0), unless doing so will provide confusing information to the mapping that PGO uses to optimize the relevant branches. Am I correct in saying that this latter issue is of minimal concern in this case?

My take-away from the discussion was this: It is desirable to map the instructions to something in the loop (e.g., not line 0), unless doing so will provide confusing information to the mapping that PGO uses to optimize the relevant branches. Am I correct in saying that this latter issue is of minimal concern in this case?

As far as I see it you are correct. There should be negligible to no impact on PGO.

hfinkel accepted this revision.Apr 30 2019, 9:54 AM

My take-away from the discussion was this: It is desirable to map the instructions to something in the loop (e.g., not line 0), unless doing so will provide confusing information to the mapping that PGO uses to optimize the relevant branches. Am I correct in saying that this latter issue is of minimal concern in this case?

As far as I see it you are correct. There should be negligible to no impact on PGO.

Sounds good. LGTM.

This revision is now accepted and ready to land.Apr 30 2019, 9:54 AM
This revision was automatically updated to reflect the committed changes.
kcc added a subscriber: kcc.May 7 2019, 1:48 PM

Hi,

This change broke most of the buildbots:
http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/32015/steps/check-llvm%20asan/logs/stdio
lib/CodeGen/LexicalScopes.cpp:176: llvm::LexicalScope *llvm::LexicalScopes::getOrCreateRegularScope(const llvm::DILocalScope *): Assertion `cast<DISubprogram>(Scope)->describes(&MF->getFunction())' failed.

I am going to revert it.

Thanks!

kcc added a comment.May 7 2019, 1:55 PM

reverted in r360190.

Orlando updated this revision to Diff 199443.May 14 2019, 8:17 AM

This update addresses the problems with the original patch that caused the buildbot failures
(sorry about that!).

The original patch fixed SplitBlockPredecessors(..) so that when splitting a loop header
the new preheader was assigned the DebugLoc of the start of the loop. The DebugLoc
was determined by calling Loop::getStartLoc(), which relies on loop metadata (MD_Loop).

Currently, when inlining a loop, the loop metadata is cloned without modification. This means
that the loop metadata "start" and "end" locations refer to the locations in the cloned function;
i.e., the fact that the loop has been inlined is ignored in the loop metadata.

This update copies and then modifies the loop metadata during inlining such that the "start" and
"end" DILocations include "inlinedAt" metadata.

There have been a few very minor changes to the original patch, so the files of interest are:
"InlineFunction.cpp", "inlined-loop-metadata.ll", "inlined-argument.ll", "bcmp-debugify-remarks.ll"

A new regression test "inlined-loop-metadata.ll" covers the loop metadata problem.
"bcmp-debugify-remarks.ll" and "inlined-argument.ll" needed to be updated
because they rely on the old incorrect metadata.

Thank you,
Orlando

Orlando reopened this revision.May 14 2019, 8:23 AM
This revision is now accepted and ready to land.May 14 2019, 8:23 AM
Orlando requested review of this revision.May 14 2019, 8:27 AM

It sounds like inlining a function with a loop has a bug with respect to how the loop metadata is handled, and your patch merely tripped over that. Would it be reasonable to fix the inlining-loop-metadata bug separately first? And then the original patch is likely to Just Work?
Fixing one bug at a time is more in line with project practices.

Orlando edited the summary of this revision. (Show Details)May 15 2019, 1:43 AM
Orlando updated this revision to Diff 200296.May 20 2019, 8:34 AM

The inlined loop metadata part has been separated into D61933 (rL361149) which this patch is now based on.

This patch "just works" with D61933 and a few test changes: "bcmp-debugify-remarks.ll" and "inlined-argument.ll" needed to be updated because they rely on the old incorrect metadata.

Ping.

The major change to this previously accepted patch is the modification of "bcmp-debugify-remarks.ll" and "inlined-argument.ll".
Minor changes include spelling corrections in comments and removing some superfluous debug data from the new tests.

jmorse added a subscriber: jmorse.Jun 3 2019, 7:58 AM

Looking at the delta between the approved and ~final version, it looks good to me, seeing how the bug this tripped over was fixed in D61933 and nothing in the code has changed.

(I'll leave it a day before clicking the green button, in case people have further opinions).

jmorse accepted this revision.Jun 4 2019, 2:44 AM

LGTM -- this is the already-accepted patch with trivial change, the real difference is that a bug got fixed elsewhere.

This revision is now accepted and ready to land.Jun 4 2019, 2:44 AM
This revision was automatically updated to reflect the committed changes.

This patch broke our internal CI. You can reproduce the issue by downloading https://gist.github.com/adrian-prantl/ba88912878db855ec96534e6510246e6 (this is AArch64 bitcode for a file in the Swift stdlib) and running

clang "-cc1" "-triple" "arm64-apple-ios7.0.0" "-emit-obj" "-disable-llvm-passes" "-target-abi" "darwinpcs" "-O2" "-x" "ir"  "-o" "-" test.ll

!dbg attachment points at wrong subprogram for function
!631 = distinct !DISubprogram(name: "__hidden#25349_", scope: !9, file: !9, line: 318, type: !10, scopeLine: 319, flags: DIFlagPrototyped, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !5)
%"__ir_hidden#3555_"* (%"__ir_hidden#3418_"*, %"__ir_hidden#3549_"*)* @"\01__hidden#24886_"
  br label %19, !dbg !653
!653 = !DILocation(line: 258, column: 5, scope: !643)
!643 = distinct !DISubprogram(name: "__hidden#25389_", scope: !9, file: !9, line: 247, type: !10, scopeLine: 247, flags: DIFlagPrototyped, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition | DISPFlagOptimized, unit: !5)
!643 = distinct !DISubprogram(name: "__hidden#25389_", scope: !9, file: !9, line: 247, type: !10, scopeLine: 247, flags: DIFlagPrototyped, spFlags: DISPFlagLocalToUnit | DISPFlagDefinition | DISPFlagOptimized, unit: !5)
fatal error: error in backend: Broken module found, compilation aborted!

Would you mind reverting the patch? Please let me know if you need any help investigating this. My assumption is that the patch somewhere doesn't take the inlinedAt: debug info into account.

No problem at all - reverted with rGa94715639619. I'll take a look at this next week, thank you for the info.

Thanks for the quick response! Please let me know if I can help debugging this.

Orlando added a comment.EditedJun 17 2019, 2:36 AM

Hi Adrian.

This seems to be a case of having old, incorrect metadata baked into the test.
The 'startLoc' (!680) and 'endLoc' (!681) in the loop metadata for loop !679 are both
missing an 'inlinedAt' node.

!680 = !DILocation(line: 258, column: 5, scope: !659)
!681 = !DILocation(line: 259, column: 30, scope: !659)

Regenerating this test with patch rL361149 should fix this.

Note: clang reports no errors when the following DILocations are modified by
hand. !660 is a DILocation with scope !647, which refers to a location within
the caller.

!680 = !DILocation(line: 258, column: 5, scope: !659, inlinedAt: !660)
!681 = !DILocation(line: 259, column: 30, scope: !659, inlinedAt: !660)

This seems to be a case of having old, incorrect metadata baked into the test.
The 'startLoc' (!680) and 'endLoc' (!681) in the loop metadata for loop !679 are both
missing an 'inlinedAt' node.

Thank you very much for the analysis!

There is one thing I don't understand:
It looks like !680 is only used in the !llvm.loop !679 metadata. How can that trigger a !dbg attachment points at wrong subprogram for function verifier failure?

https://github.com/llvm/llvm-project/blob/5d00c3060e11b1b8725c0af110f011c4d110d39a/llvm/lib/IR/Verifier.cpp#L2337

IIUC, that check is only performed for !dbg attachments and should skip over !llvm.loop attachments. Do loop attachments leak into the !dbg attachments? If yes, we should probably extend the verifier to also check loop attachments in that loop, so this situation can be detected and rejected at import time.

There is one thing I don't understand:
It looks like !680 is only used in the !llvm.loop !679 metadata. How can that trigger a !dbg attachment points at wrong subprogram for function verifier failure?

Empty loop preheaders are sometimes removed and later regenerated. This patch
uses loop metadata to identify the start of the loop when regenerating the
preheader. The DILocation of the start node (node index 1) of the loop
metadata is given to the new preheader's branch.

I put up a Verifier patch at https://reviews.llvm.org/D63499. Once that patch has landed, this patch should be safe to re-apply.

Thanks, I've now resubmitted this patch (1251cac62af5).

Thanks for your help!