This is an archive of the discontinued LLVM Phabricator instance.

[Debug-Info][CodeView] Fix GUID string generation for MSVC generated objects.
ClosedPublic

Authored by CarlosAlbertoEnciso on May 25 2021, 8:24 AM.

Details

Summary

Fix invalid GUID string generation described by https://bugs.llvm.org/show_bug.cgi?id=50459

The valid format for a GUID is {XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}
where X is a hex digit (0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F).

The length of the individual components must be: 8, 4, 4, 4, 12.

Diff Detail

Event Timeline

CarlosAlbertoEnciso requested review of this revision.May 25 2021, 8:24 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 25 2021, 8:24 AM

This sounds like an appropriate place to apply Postel's Law: Be conservative in what you do, be liberal in what you accept from others. I'm be happy to always generate uppercase hex digits when formatting a GUID to registry format, but maybe we should also consider accepting upper or lowercase when reading a formatted GUID.

Some clarifications:

  • GUIDs are numbers not strings. Microsoft claims they are essentially UUIDs (per RFC 4122).
  • The cited document is actually specific to the string representation of GUIDs as required by Windows Installer. In fact, the braces indicate that it's the registry format. Windows installer--and other tools that use formatted GUIDs as registry keys--do string comparisons rather than converting the string representation to a numeric GUID that can be compared directly.
  • The bit about using uppercase for the hex digits in the string representation does not apply everywhere, not even in the Microsoft world. Microsoft's GUIDGEN tool formats GUIDs in various ways with respect to the case of the hex digits. And in RFC 4122 says string representations of UUIDs should use lowercase.
CarlosAlbertoEnciso added a comment.EditedMay 25 2021, 1:48 PM

This sounds like an appropriate place to apply Postel's Law: Be conservative in what you do, be liberal in what you accept from others. I'm be happy to always generate uppercase hex digits when formatting a GUID to registry format, but maybe we should also consider accepting upper or lowercase when reading a formatted GUID.

Some clarifications:

  • GUIDs are numbers not strings. Microsoft claims they are essentially UUIDs (per RFC 4122).
  • The cited document is actually specific to the string representation of GUIDs as required by Windows Installer. In fact, the braces indicate that it's the registry format. Windows installer--and other tools that use formatted GUIDs as registry keys--do string comparisons rather than converting the string representation to a numeric GUID that can be compared directly.
  • The bit about using uppercase for the hex digits in the string representation does not apply everywhere, not even in the Microsoft world. Microsoft's GUIDGEN tool formats GUIDs in various ways with respect to the case of the hex digits. And in RFC 4122 says string representations of UUIDs should use lowercase.

@amccarth Thanks for your feedback.

I have to upload a correct patch to include the binary files (Using arcanist), as the tests are failing due to invalid files.

Correct patch to include binary files used for the test.

I'm be happy to always generate uppercase hex digits when formatting a GUID to registry format, but maybe we should also consider accepting upper or lowercase when reading a formatted GUID.

My intention for this patch is to fix the fatal error in the yaml2obj tool.

I would suggest to open another change request to include the option to accept upper or lowercase. But if you feel that this patch should include that change, it is fine with me.

Thanks

I'm fine focusing on this part to get the bug fixed. We can consider looking at accepting upper and lower case letters in the future.

llvm/unittests/DebugInfo/CodeView/GUIDFormatTest.cpp
38

I see why it's done, but it's a bit confusing that this definition of MSGuid is different from the one in the formatting code. In fact, this one has 18 bytes of data and may contain padding.

If you were to move this definition down to the one place it's used, it might not be as confusing.

83

It's not clear to my why we need 16 of these tests. Is this overkill?

The names Data_xx don't really clue me in as to what's going on in each test set. I see the "Shifting xx" comments, but that still doesn't quite do it for me.

From the comments, I expected a set of tests for Variant 1 (UUIDs) and one for Variant 2 (MS/COM-style GUIDs).

I'm fine focusing on this part to get the bug fixed. We can consider looking at accepting upper and lower case letters in the future.

Thanks for your feedback. I will address your comments and update the patch.

Address comments from @amccarth:

  • Only the Variant 2 (MS/COM-style GUIDs) are used in the tests.
  • Too many test cases.
llvm/unittests/DebugInfo/CodeView/GUIDFormatTest.cpp
38

I see why it's done, but it's a bit confusing that this definition of MSGuid is different from the one in the formatting code. In fact, this one has 18 bytes of data and may contain padding.

To avoid confusion between those definitions, I have renamed the one used in the tests.

If you were to move this definition down to the one place it's used, it might not be as confusing.

Moved the definition down to the place where is used.

83

It's not clear to my why we need 16 of these tests. Is this overkill?

The idea was to shift a zero value across all components. But as you pointed it out, it can be overkill. I have removed most of the tests. Keeping just one that covers the corner cases.

The names Data_xx don't really clue me in as to what's going on in each test set. I see the "Shifting xx" comments, but that still doesn't quite do it for me.

Rename the test as 'ValidateFormat'.

From the comments, I expected a set of tests for Variant 1 (UUIDs) and one for Variant 2 (MS/COM-style GUIDs).

The test are intended only for the Variant 2. Added a explanatory note.

amccarth accepted this revision.Jun 11 2021, 3:01 PM

Thanks for the changes. This LGTM now.

llvm/unittests/DebugInfo/CodeView/GUIDFormatTest.cpp
94

nit: s/Shif/Shift/

This revision is now accepted and ready to land.Jun 11 2021, 3:01 PM

@amccarth: Thanks very much for your review.

llvm/unittests/DebugInfo/CodeView/GUIDFormatTest.cpp
94

Changed 'Shif' to 'Shift'

This revision was landed with ongoing or failed builds.Jun 14 2021, 10:54 PM
This revision was automatically updated to reflect the committed changes.