This is an archive of the discontinued LLVM Phabricator instance.

Represent invalid UUIDs as UUIDs with length zero
ClosedPublic

Authored by labath on Jun 22 2018, 6:00 AM.

Details

Summary

During the previous attempt to generalize the UUID class, it was
suggested that we represent invalid UUIDs as length zero (previously, we
used an all-zero UUID for that). This meant that some valid build-ids
could not be represented (it's possible however unlikely that a checksum of
some file would be zero) and complicated adding support for variable
length build-ids (should a 16-byte empty UUID compare equal to a 20-byte
empty UUID?).

This patch resolves these issues by introducing a canonical
representation for an invalid UUID. The slight complication here is that
some clients (MachO) actually use the all-zero notation to mean "no UUID
has been set". To keep this use case working, I have introduced an
additional argument to the UUID constructor, which specifies whether an
all-zero vector should be considered a valid UUID. For the usages where
the UUID data comes from a MachO file, I set this argument to false.

I did not introduce a similar argument to the string parsing function
with the rationalle that if somebody went through the trouble of
spelling it out as a bunch of zeroes, he probably really means that.

Diff Detail

Repository
rL LLVM

Event Timeline

labath created this revision.Jun 22 2018, 6:00 AM
lemo added a comment.Jun 22 2018, 10:52 AM

The slight complication here is that
some clients (MachO) actually use the all-zero notation to mean "no UUID
has been set". To keep this use case working, I have introduced an
additional argument to the UUID constructor, which specifies whether an
all-zero vector should be considered a valid UUID. For the usages where
the UUID data comes from a MachO file, I set this argument to false.

What is the distinction between "no UUID has been set" and "invalid UUID"? I find this subtlety confusing, can we simplify it to just valid/not-valid UUIDs?

include/lldb/Utility/UUID.h
31 ↗(On Diff #152460)

switch to llvm::SmallVector<uint8_t, 20> as suggested by Greg?

42 ↗(On Diff #152460)

If we define this we should define at least the copy constructor as well (rule of 3/5). In this case the cleanest solution may be to allow the implicit definitions (which would also allow the implicit move operations - defining assignment operator as defaulted would inhibit the implicit move SMFs)

52 ↗(On Diff #152460)

is this really needed? I'd prefer the truly explicit (pun intended) IsValid()

95 ↗(On Diff #152460)

these can be simplified if we use llvm::SmallVector (or std::vector)

The slight complication here is that
some clients (MachO) actually use the all-zero notation to mean "no UUID
has been set". To keep this use case working, I have introduced an
additional argument to the UUID constructor, which specifies whether an
all-zero vector should be considered a valid UUID. For the usages where
the UUID data comes from a MachO file, I set this argument to false.

What is the distinction between "no UUID has been set" and "invalid UUID"? I find this subtlety confusing, can we simplify it to just valid/not-valid UUIDs?

That is what I am trying to do (although not completely successfully, it seems ;) ). Once you have a UUID object around there should be no distinction. You either have a valid uuid (for now, consisting of 16 or 20 bytes), or you don't.

However, during parsing you need to know the meaning of a "0000...0" UUID. In a MachO file (at least based on the comments in the code) this value is used to denote the fact that the object file has no UUID. For elf, a "000..0" build-id is a perfectly valid identifier (and the lack of a build-id is denoted by the absence of the build-id section). The extra constructor argument is my way of trying to support both scenarios. The other possibility I see is to have a some kind of a factory function for one of the options (or both). I don't really have a preference between the two.

include/lldb/Utility/UUID.h
31 ↗(On Diff #152460)

I'm deliberately keeping that for a separate patch. Here, I just want to prepare the ground by defining the "invalid" UUID more clearly. The part with the arbitrary UUID length will come after that.

42 ↗(On Diff #152460)

Good point. I've deleted the copy constructor altogether.

52 ↗(On Diff #152460)

My main reason for an operator bool is that it allows the if-declaration syntax (if (UUID u = getUUID()) doSomethingUsefulWith(u); The most llvm-y solution would be not have neither of these methods and use Optional<UUID> when you don't know if you have one, but as far as operator bool vs. isValid goes, both styles are used in llvm (and lldb).

labath updated this revision to Diff 152530.Jun 22 2018, 12:03 PM

Delete copy constructor.

However, during parsing you need to know the meaning of a "0000...0" UUID.
In a MachO file (at least based on the comments in the code) this value is
used to denote the fact that the object file has no UUID. For elf, a
"000..0" build-id is a perfectly valid identifier (and the lack of a
build-id is denoted by the absence of the build-id section). The extra
constructor argument is my way of trying to support both scenarios. The
other possibility I see is to have a some kind of a factory function for
one of the options (or both). I don't really have a preference between the
two.

One solution might be to encapsulate the MachO convention in the MachO
code: check in there (maybe through a helper function) if the UUID is
"000...0" and map it to the empty UUID in that case. The UUID interface
would not have to know/care about this convention. Would this work?

One solution might be to encapsulate the MachO convention in the MachO
code: check in there (maybe through a helper function) if the UUID is
"000...0" and map it to the empty UUID in that case. The UUID interface
would not have to know/care about this convention. Would this work?

I'm not sure if there is a suitable place for that function. This is needed in "ObjectFileMachO" and two dynamic loader plugins.

lemo added a comment.Jun 22 2018, 12:54 PM

I'm not sure if there is a suitable place for that function. This is
needed in "ObjectFileMachO" and two dynamic loader plugins.

Then your factory idea may be the next best thing. While we're at it, maybe
we can remove the UUID::SetBytes() from the public interface, and make the
UUID an immutable value type:

Ex. instead of:

UUID uuid;
...
uuid.SetBytes(...)

We'd have:

UUID uuid;

uuid = UUID(...);
or
uuid = { ... };
or
uuid = UUID::factory(...);

What do you think?

Would love to remove the "accept_zeroes" argument everywhere. Too much matching happens in LLDB and we can't have multiple shared libraries claiming zeros as their UUID

include/lldb/Utility/UUID.h
54 ↗(On Diff #152530)

I am not sure accept_zeroes is a good idea. LLDB does a lot of matching based on UUID and we can't have multiple shared libraries claiming to have zeroes as their UUID. It will cause chaos. Zeroes as a build ID is not very useful

Would love to remove the "accept_zeroes" argument everywhere. Too much matching happens in LLDB and we can't have multiple shared libraries claiming zeros as their UUID

A zero UUID is a problem only if it someone put it there to mean "I don't know". "Real" UUIDs seem to acknowledge the concept of a nil value https://en.wikipedia.org/wiki/Universally_unique_identifier#Nil_UUID, and I guess that's what Mac is following. But we're using this class to represent other things besides the "real" UUIDs, and for those a zero value is as valid as any other number (it's extremely unlikely that a 20-byte hash will come out all-zero, but that isn't so much the case for a 32-bit CRC). I think the best we can do here is make the choice between the two treatments as explicit as possible. I hope the explicitly named constructor functions will be enough to achieve that.

labath updated this revision to Diff 152699.Jun 25 2018, 8:22 AM
  • Use static factory functions instead of the extra argument (the best names I could come up with is fromData and fromOptionalData).

We should allow 4 and 8 byte UUIDs as pointed out by inlined comments. This means we should probably modify the UUID dumper to handle those cases as well.

include/lldb/Utility/UUID.h
109 ↗(On Diff #152699)

Do we need this comment here? We currently take a 4 byte debug info CRC and call it a 16 byte UUID for no reason. Can we remove the need for this comment and allow any length?

source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
735–736 ↗(On Diff #152699)

Should we just save the UUId is 4 bytes long here?

741–742 ↗(On Diff #152699)

Should we just save the UUId is 8 bytes long here?

929 ↗(On Diff #152699)

Should we just save the UUId is 8 bytes long here?

938 ↗(On Diff #152699)

Should we just save the UUId is 4 bytes long here?

source/Utility/DataExtractor.cpp
1101 ↗(On Diff #152699)

This should take a UUID byte size as a second parameter and callers should be required to specify it explicitly

Factory methods make things much clearer.

labath added inline comments.Jun 25 2018, 8:41 AM
include/lldb/Utility/UUID.h
109 ↗(On Diff #152699)

I will do that in a follow-up patch. I wanted to keep this one for dealing with the whole zero issue.

source/Plugins/ObjectFile/ELF/ObjectFileELF.cpp
735–736 ↗(On Diff #152699)

I will fix this and the other issues in the next patch.

source/Utility/DataExtractor.cpp
1101 ↗(On Diff #152699)

Hmm.. this is actually unused. Can I just remove it?

clayborg accepted this revision.Jun 25 2018, 9:26 AM

Patch is good. Feel free to remove the DataExtractor::DumpUUID() in a separate NFC commit or just remove it in this patch

source/Utility/DataExtractor.cpp
1101 ↗(On Diff #152699)

Yes, please remove.

This revision is now accepted and ready to land.Jun 25 2018, 9:26 AM
This revision was automatically updated to reflect the committed changes.