Page MenuHomePhabricator

[PECOFF] Implementation of ObjectFilePECOFF:: GetUUID()
Needs ReviewPublic

Authored by asmith on Jan 2 2019, 4:04 PM.

Details

Summary

Provide an implementation of GetUUID() for remote debugging scenarios based on the md5 of the object's path.

Include a simple lit test that checks the first 8 bytes of the UUID is non-empty.

Diff Detail

Event Timeline

asmith created this revision.Jan 2 2019, 4:04 PM

Should we use the GUID from the COFF Debug Directory instead? It certainly seems more appropriate, if it's there. The UUID's purpose is to match symbol to the executable, so if you use a hash of the path it might solve this one problem, but won't solve the general case.

labath added a subscriber: labath.Jan 3 2019, 2:22 AM

I would definitely encourage using something better than the file checksum as UUID, if at all possible.

source/Plugins/ObjectFile/PECOFF/ObjectFilePECOFF.cpp
139

I think this should be fromData. The Optional is there to give special meaning to an all-zero checksum, but you don't need that here. Just because the md5 checksum comes out as all-zero, it doesn't mean it is not valid.

PS: I am responsible for the existence of this function, so I am to blame for any confusion. If you have any idea, how to make this api more clear, I'd like to hear it.

Hui added a subscriber: Hui.Jan 4 2019, 1:33 PM

Not quite sure but correct me if i am wrong.

(1) I think the Debug Directory is optional for COFF if it does have debug information and pdb to match with.

(2) The Debug Directory does not contain COFF timestamp.

Using md5 seems very tentative. Please elaborate how to leverage both COFF contents and the existing GUID mentioned?

In D56229#1346869, @Hui wrote:

Not quite sure but correct me if i am wrong.

(1) I think the Debug Directory is optional for COFF if it does have debug information and pdb to match with.

(2) The Debug Directory does not contain COFF timestamp.

Using md5 seems very tentative. Please elaborate how to leverage both COFF contents and the existing GUID mentioned?

Well, I guess I would ask what you want to do with the GUID? If you want to match it to a debug information file, then the Debug Directory is the correct way to do that, and using a hash of the file path will not even be helpful.

Another option would be to check for a debug directory of type IMAGE_DEBUG_TYPE_REPRO, and if that exists, then it means that the COFF timestamp is a hash of the binary, so it should be stable.

If neither of these is present, then I think we should simply return false from this function and not mislead the caller. The caller might wish to use special logic if the function returns false that says "if I couldn't get a UUID from the file, then try hashing the path and doing some kind of lookup based on that", but I don't think that should be part of this function.

The UUID that is used in ELF and Mach-o is designed to be something that is stable in a binary after it has been linked and should be the same before and after any kind of post production (stripping symbols, stripping section content to make a stand alone symbol file, etc). When someone types "target symbols add /path/to/symbol/file/a.out" we will grab its UUID and try to match it up with an existing object file.