This is an archive of the discontinued LLVM Phabricator instance.

Support build-ids of other sizes than 16 in UUID::SetFromStringRef
ClosedPublic

Authored by jarin on May 28 2020, 1:15 PM.

Details

Summary

SBTarget::AddModule currently handles the UUID parameter in a very weird way: UUIDs with more than 16 bytes are trimmed to 16 bytes. On the other hand, shorter-than-16-bytes UUIDs are completely ignored. In this patch, we change the parsing code to handle UUIDs of arbitrary size.

To support arbitrary size UUIDs in SBTarget::AddModule, this patch changes UUID::SetFromStringRef to parse UUIDs of arbitrary length. We subtly change the semantics of SetFromStringRef - SetFromStringRef now only succeeds if the entire input is consumed to prevent some prefix-parsing confusion. This is up for discussion, but I believe this is more consistent - we always return false for invalid UUIDs rather than sometimes truncating to a valid prefix. Also, all the call-sites except the API and interpreter seem to expect to consume the entire input.

This also adds tests for adding existing modules 4-, 16-, and 20-byte build-ids. Finally, we took the liberty of testing the minidump scenario we care about - removing placeholder module from minidump and replacing it with the real module.

Diff Detail

Event Timeline

jarin created this revision.May 28 2020, 1:15 PM
jarin marked an inline comment as done.May 28 2020, 10:45 PM
jarin added inline comments.
lldb/source/API/SBTarget.cpp
1593 ↗(On Diff #267004)

The four lines above should perhaps be a method of UUID. Something like SetFromStringRef, but for any size. What do you think?

jarin updated this revision to Diff 267241.May 29 2020, 8:23 AM
jarin retitled this revision from Support build-ids of other sizes than 16 in SBTarget::AddModule to Support build-ids of other sizes than 16 in UUID::SetFromStringRef.
jarin edited the summary of this revision. (Show Details)
jankratochvil added inline comments.
lldb/source/Utility/UUID.cpp
92–93

Now the return type could be bool.

I haven't been paying close attention to this patch, but allowing 20 bytes makes sense. In GNU ld and gold, --build-id is --build-id=sha1 (20 bytes). The 3 linkers (plus LLD) don't have a way to produce a build ID longer than 20 bytes.

jarin updated this revision to Diff 267490.May 30 2020, 11:14 PM
jarin marked an inline comment as done.

Change SetFromStringRef to return bool.

labath added a reviewer: friss.
labath added inline comments.
lldb/source/Utility/UUID.cpp
87–88

I don't think we should be restricting the size here in any way. It is possible to produce larger build-ids already (-Wl,--build-id=0xlonghexstring), and the rest of the UUID class does support arbitrary sizes. Some tools (e.g. llvm-readelf) will choke on them, but let's try to not make lldb one of those tools.

Using super-short uuids is most likely a bad idea, and will result in a lot of collisions, but if someone really wants to use a 3-byte "uuid", I don't see a reason to stop him here.

jarin updated this revision to Diff 267739.Jun 1 2020, 2:32 PM
jarin edited the summary of this revision. (Show Details)

Removed size restrictions on UUIDs.

jarin marked an inline comment as done.Jun 1 2020, 2:34 PM
jarin added inline comments.
lldb/source/Utility/UUID.cpp
87–88

Good idea, that makes the code even simpler.

92–93

I was worried about having to touch even more call sites, but perhaps it is not too bad.

jarin updated this revision to Diff 267754.EditedJun 1 2020, 3:45 PM

Added a test for missing nibble in UUID.

labath accepted this revision.Jun 2 2020, 2:06 AM

Looks good. Let's just wait a while to see if Fred has any comments. If you haven't already, you can use that time to get commit access. :)

lldb/source/Utility/UUID.cpp
66–67

I guess now obsoletes Fred's D80807.

(Btw, I actually liked how Fred's solution rejects strings which end in a trailing dash.)

This revision is now accepted and ready to land.Jun 2 2020, 2:06 AM
friss added inline comments.Jun 2 2020, 9:23 AM
lldb/source/Utility/UUID.cpp
66–67

Yeah, I didn't have a strong opinion before, but given we want to reject a buffer that isn't parsed completely, I think it's better to reject a buffer ending with a -. As the code would test p.size() anyway, we might as well use p.size() >= 2 as the loop condition.

Otherwise this LGTM. Thanks!

jarin updated this revision to Diff 268497.Jun 4 2020, 8:54 AM

Exclude UUID strings ending with "-".

jarin marked an inline comment as done.Jun 4 2020, 8:55 AM
This revision was automatically updated to reflect the committed changes.