This is an archive of the discontinued LLVM Phabricator instance.

New option that adds the DiagID enum name and index to Diagnostic output.
AbandonedPublic

Authored by hintonda on Jul 8 2017, 10:24 PM.

Details

Reviewers
srhines
rjmccall
Summary

This option helps locate the origin of a diagnostic message
by outputing the enum name and index associated with a specific
DiagID, allowing users to grep the code for the enum name directly
without having to find it in the td files first.

Additional ideas:

  1. add another option to pass in the index (or enum) to force an assert or backtrace when a specific DiagID is seen.
  2. capture FILE and LINE when a diagnostic is created and output it. This would make it easier to find the specific instance, and verify all instances are actually tested. Currently, it's almost impossible to determine if all instances are actually tested.
  3. keep track of the permutations and make sure each one is tested.

Event Timeline

hintonda created this revision.Jul 8 2017, 10:24 PM

Thanks, that's pretty cool!

How bigger did the clang binary get after you've added all these strings?
I feel like this is more of a CC1 option as well.

I have some feedback for your additional ideas:

add another option to pass in the index (or enum) to force an assert or backtrace when a specific DiagID is seen.

That sounds quite useful, but could it be something that's more suited for an external debugging script? I have a personal script that computes the enum value for a particular diagnostic, launches clang in lldb, sets a breakpoint for that particular diagnostic enum and runs clang. I could work on upstreaming it into clang/utils if people are interested.

capture FILE and LINE when a diagnostic is created and output it. This would make it easier to find the specific instance, and verify all instances are actually tested. Currently, it's almost impossible to determine if all instances are actually tested.

I reckon the first part (find the specific instance) could be useful, but I think that if you can force a backtrace on a specific DiagID then it becomes less useful. I disagree with the second part, can't you use our coverage bots and see if the all places where the diagnostic is emitted are covered to see if they are tested? It might be tedious to find these places, but maybe we can add a search for our coverage viewer so you quickly find the lines that have the name of diagnostic?

Thanks, that's pretty cool!

How bigger did the clang binary get after you've added all these strings?

Currently looks like around 200k (4534 @ 33 byte avg length + ptr). If this is too much, we could make it conditional based on NDEBUG or some other macro at compile time.

I feel like this is more of a CC1 option as well.

Sure, I can do that.

I have some feedback for your additional ideas:

add another option to pass in the index (or enum) to force an assert or backtrace when a specific DiagID is seen.

That sounds quite useful, but could it be something that's more suited for an external debugging script? I have a personal script that computes the enum value for a particular diagnostic, launches clang in lldb, sets a breakpoint for that particular diagnostic enum and runs clang. I could work on upstreaming it into clang/utils if people are interested.

This type of behavior (either an assert/bt or coupled with debugger) could be useful as a quick and easy solution. However, capturing __FILE__ and __LINE__ when a diagnostic is reported, would be my preference. However, that change would be very invasive and should probably be handled by a source to source transformation -- I did some of this by hand as a proof of concept, but doing all of clang manually would take quite a while, not to mention various tools that also use diagnostics.

capture FILE and LINE when a diagnostic is created and output it. This would make it easier to find the specific instance, and verify all instances are actually tested. Currently, it's almost impossible to determine if all instances are actually tested.

I reckon the first part (find the specific instance) could be useful, but I think that if you can force a backtrace on a specific DiagID then it becomes less useful. I disagree with the second part, can't you use our coverage bots and see if the all places where the diagnostic is emitted are covered to see if they are tested? It might be tedious to find these places, but maybe we can add a search for our coverage viewer so you quickly find the lines that have the name of diagnostic?

Agreed, but the problem with relying exclusively on coverage is that it can't cover the various permutations, e.g., "%select{A|B|C}0". It's pretty difficult to tell if A, B, and C were actually tested -- or if that makes a difference.

If we included enum name (and permutation) with __FILE__ and __LINE__ in the output, then we could quickly analyze the test output and produce a simple report. I think this would also be helpful in allowing test writers to see exactly which diagnostic report was triggered for each test.

hintonda updated this revision to Diff 105961.Jul 10 2017, 9:52 PM
  • Make option cc1 only. Rename function, and add test.

Currently looks like around 200k (4534 @ 33 byte avg length + ptr). If this is too much, we could make it conditional based on NDEBUG or some other macro at compile time.

I think this is mostly useful during development, so some conditional mechanism would make sense IMO. I think that it makes sense to avoid the growth of our release binaries if such growth can be avoided.

This type of behavior (either an assert/bt or coupled with debugger) could be useful as a quick and easy solution. However, capturing __FILE__ and __LINE__ when a diagnostic is reported, would be my preference. However, that change would be very invasive and should probably be handled by a source to source transformation -- I did some of this by hand as a proof of concept, but doing all of clang manually would take quite a while, not to mention various tools that also use diagnostics.

I can definitely see the usefulness of __FILE__ and __LINE__ markers. However, I think that should be a development only feature as well. I agree about the source to source transformation, a refactoring tool should handle it.

Agreed, but the problem with relying exclusively on coverage is that it can't cover the various permutations, e.g., "%select{A|B|C}0". It's pretty difficult to tell if A, B, and C were actually tested -- or if that makes a difference.

If we included enum name (and permutation) with __FILE__ and __LINE__ in the output, then we could quickly analyze the test output and produce a simple report. I think this would also be helpful in allowing test writers to see exactly which diagnostic report was triggered for each test.

That makes sense, thanks for pointing it out. I agree, It would be useful if we had such "coverage" reports for diagnostics.

hintonda updated this revision to Diff 106094.Jul 11 2017, 1:52 PM
  • Only maintain enum names in debug builds. Current cost of maintaining enum names is 176k.
rjmccall edited edge metadata.Jul 11 2017, 7:36 PM

This is a cute hack, but... I'm not sure I accept the premise that it's a noteworthy obstacle to Clang development to do two greps instead of one. And I don't think I've ever had to debug a diagnostic where FILE and LINE information would've been helpful.

Also, you could easily write a script that could automatically annotate arbitrary text with this information retroactively by turning the diagnostic text into a bunch of regular expressions, and then you wouldn't even need to re-run the compiler.

It's just an effort to make the code a bit more accessible, especially for new users (or ones not used to running find/grep).

Steve had suggested adding an option that took the entire message and matched it when it was produced. However, that won't work very well since the message isn't actually produced until just before it is printed, which means the assert/backtrace isn't in the correctly location if the it's delayed. That's why I chose to simply print just the enum name and index.

Adding __FILE__/__LINE__ info would help identify the exact location (could be multiple for the same error message), but due to the way the code is structured it isn't really doable.

It might be kinda fun writing the script you suggest -- I'll look into it, but printing the enum in the first place for little or no cost seems a bit more elegant.

To me, features that only serve to help compiler development need to meet a higher bar than this. This just seems really marginal.

Like Alex said, you should be able to pretty easily write a debugger script that breaks when it sees a specific diagnostic, or a diagnostic at a specific line and column. That would be quite welcome in utils/.

hintonda abandoned this revision.Jul 12 2017, 6:13 AM

Okay, sounds good. Look forward to seeing Alex's script...

arphaman added a comment.EditedJul 12 2017, 6:32 AM

My script relies on a hack to map the name of the diagnostic to the enum value. We need to come up with a better way to map the diagnostic name to the enum value. I propose a new utility tool that would take the name of the diagnostic and map it back to the enum value.

Not sure how to do this all in a script, but perhaps we could enhance diagtool to generate this mapping for you. It currently only lists warnings, but I don't think it would be difficult to extend it and generate a mapping you could use in your script.

Right. I was aware of the diagtool before, but didn't really look into what it did. TIL! It would make sense to add this kind of mapping functionality to that tool then.

I'd be happy to do that if it would help. If so, should I do it here create a new diff?

Perhaps we might even make sense add the ability to pass in a message and find the matching name/index.

I was impatient, so I already started on a patch for diagtool. I'll post it soon.

Great, thanks...