This is an archive of the discontinued LLVM Phabricator instance.

[dwarf][NFC] Move expandBundle() to MachO.h
ClosedPublic

Authored by ellis on Dec 8 2021, 7:07 PM.

Details

Summary

The function expandBundle() is defined both in llvm-dwarfdump.cpp and
llvm-gsymutil.cpp in the exact same way. Reduce code duplication by
moving this function to the MachOObjectFile class.

Diff Detail

Unit TestsFailed

Event Timeline

ellis created this revision.Dec 8 2021, 7:07 PM
ellis added a project: debug-info.
ellis updated this revision to Diff 393231.Dec 9 2021, 11:34 AM

Rename function and change behavior slightly.

ellis published this revision for review.Dec 9 2021, 11:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 9 2021, 11:35 AM
clayborg added inline comments.Dec 9 2021, 4:15 PM
llvm/lib/Object/MachOObjectFile.cpp
4726–4728

My question here is: if an error is returned, will it make any sense to the caller? The only error returned here is if something goes wrong during directory iteration. Which could be a few things:

  • Path specifies a directory with ".dSYM" extension (do we are about case here?) and it doesn't the directory Path + "Context/Resources/DWARF" doesn't exist which would return something like "directory does not exist". Will it specify which directory doesn't exist? The user could specify "/path/to/foo.dSYM" that is a valid directory but it contains no DWARF bundle stuff inside. Just wondering if this will make sense to show to the user?
  • We can't get sys::fs::file_status on a file in the Path + "Context/Resources/DWARF" + File file which would return something like "permission denied" or something else. If the error doesn't contain the path that caused the problem the user will be confused.

And I wonder if the callers of this function should show those errors to the user and would it make any sense to the user? If the errors don't make sense, then it might be good to just return a "bool" instead of a "Expected<bool>", or just return a std::vector<std::string> that is empty or that contained any successful paths that were already extracted.

If we want to keep the errors we should make the error message more clear and include information about what path caused the problem with the EC error code appended. So for the first case where sys::fs::directory_iterator constructor fills in an error, maybe returning something like "unable to iterate over directory %s: %s" where the first %s is the Path and the second is the error string. Something similar for the second?

Let me know what your thoughts are.

llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp
660–668

Avoid having to take the error and test it by restructuring the code.

llvm/tools/llvm-gsymutil/llvm-gsymutil.cpp
396–405

Avoid having to take the error and test it by restructuring the code.

jhenderson added inline comments.Dec 10 2021, 12:32 AM
llvm/include/llvm/Object/MachO.h
655–657

Probably need to distinguish when the fails with an error versus when it will return false, in the comments.

658–659

a) I wouldn't bother with const on the StringRef, since they're inately const anyway (unless you reassign them of course).
b) I don't think "is*" is a sensible name for a function that does some vector population. Perhaps "loadDsymBundle" or "findDsymObjectMembers" or something to that effect.

llvm/lib/Object/MachOObjectFile.cpp
4726–4728

I think we should return the error, with additional context (i.e. what path is bad etc), and allow client code to decide what to do with it: lower-level libraries shouldn't generally assume that clients don't care about when errors exist.

When issue that is worth thinking about: we could get into a situation where the code returns an error, but the vector has been modified. I'd at least mention that in the event of an error, the vector's content is indeterminate, but perhaps better should be: make a copy of the vector and replace it, in the event of an error after the main for loop has been started.

4741
4752

This "ignore" is slightly surprising to me. Should we not emit an error of some kind for this case?

ellis updated this revision to Diff 393720.Dec 11 2021, 9:37 PM
ellis marked 5 inline comments as done.

Update.

llvm/lib/Object/MachOObjectFile.cpp
4726–4728

Thanks for the helpful comments @clayborg and @jhenderson! This function really has three uses

  1. Decide if the given path is a .dSYM.
    • From the original code this is done by simply checking if it is a directory that ends in .dSYM. If it's not a bundle, we return an empty list.
  2. Report any filesystem errors to the user.
    • This is rather important and I've updated the returned errors to hold more info.
  3. Provide a list of binaries in the .dSYM.

The biggest difference is when there are no binaries found but the path looks like a .dSYM.

$ tree
.
└── a.out.dSYM
    └── Contents
        └── Resources
            └── DWARF

4 directories, 0 files

# Original error:
$ llvm-dwarfdump a.out.dSYM/
error: a.out.dSYM/: Is a directory

# New error:
$ llvm-dwarfdump a.out.dSYM/
error: a.out.dSYM/: no objects found in dSYM bundle
4752

The rest of the file types seem to be uncommon and I'd rather not change the behavior of llvm-dwarfdump without knowing if .dSYMs ever use these other cases.

https://en.cppreference.com/w/cpp/filesystem/file_type

jhenderson added inline comments.Dec 13 2021, 12:10 AM
llvm/include/llvm/Object/MachO.h
655–657

I don't think the updated comment actually resolves this point. Without looking at the function signature for a moment, this comment implies that one of two things will be returned: either an empty vector or a non-empty vector, so returning an Expected isn't expected from the comment. I think it would be useful to add something high-level to explain when an error might be returned.

ellis updated this revision to Diff 393923.Dec 13 2021, 9:23 AM

Update comment.

ellis added inline comments.Dec 13 2021, 9:24 AM
llvm/include/llvm/Object/MachO.h
655–657

Thanks, I've updated the comment to explain when an error is returned.

jhenderson added inline comments.Dec 14 2021, 12:21 AM
llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp
665–666

My reading of the relatively recently updated coding standards is that if part of an if statement has braces, the corresponding else clause(s) also need braces. Applies in the llvm-gsymutil case too.

Looking at the findDsymObjectMembers, it looks like there is still missing context, in the event of a filesystem error, so you'd end up with a message that says something like: error: foo.dsym: access denied, which doesn't explain the what was happening at the time. I'm not sure the function itself should necessarily add this context, but could be persuaded it should. My thinking was that we should use the Prefix taking error method, and add the context that way, e.g. error("whilst loading objects from DSYM bundle", DsymObjectsOrErr.takeError()). I believe this would result in a message like: error: whilst loading objects from DSYM bundle: foo.dsym: access denied.

Thoughts? Downside is that the "no objects" case will end up repeating the context a bit more than is strictly necessary.

ellis added inline comments.Dec 14 2021, 10:51 AM
llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp
665–666

Looking at the findDsymObjectMembers, it looks like there is still missing context, in the event of a filesystem error, so you'd end up with a message that says something like: error: foo.dsym: access denied, which doesn't explain the what was happening at the time.

I created a fake dSYM with an object with the wrong permissions to simulate a filesystem error. I actually think the error message here is acceptable, but let me know what you think.

The other error cases are when the dSYM directory itself has the wrong permissions or does not have the Contents/Resources/DWARF structure. The error for b.dSYM might be the weakest/most confusing.

$ tree a.dSYM/ b.dSYM/ c.dSYM/
a.dSYM/
└── Contents
    └── Resources
        └── DWARF
            └── bad
b.dSYM/ [error opening dir]
c.dSYM/
$ llvm-dwarfdump a.dSYM
error: a.dSYM/Contents/Resources/DWARF/bad: Permission denied
$ llvm-dwarfdump b.dSYM
error: 'b.dSYM/Contents/Resources/DWARF': Permission denied
$ llvm-dwarfdump c.dSYM
error: 'c.dSYM/Contents/Resources/DWARF': No such file or directory

My thinking was that we should use the Prefix taking error method, and add the context that way

If we pass in the error method we will sacrifice some generality, but we gain the ability to have very specific error messages.

Another option would be to replace createFileError() with createStringError() with a more specific message.

ellis updated this revision to Diff 394313.Dec 14 2021, 10:52 AM

Add brackets.

jhenderson accepted this revision.Dec 15 2021, 12:57 AM

I'm happy to go with whatever others' preference is on the error messages. As you say, the context is reasonable. LGTM, but please wait for @clayborg too.

This revision is now accepted and ready to land.Dec 15 2021, 12:57 AM
clayborg accepted this revision.Dec 15 2021, 12:28 PM

LGTM!

llvm/lib/Object/MachOObjectFile.cpp
4728

That second error looks great. You might want to see what the error looks like if you have a ".dSYM" directory but with no paths inside of it. The directory iterator will return an error if any of the "Contents", "Resources" or "DWARF do not exist and that error might be a bit cryptic.

So a few extra things to try just to make sure errors are not too bad:

  • no user permissions for a valid dSYM bundle
  • missing directories inside a dSYM bundle ("Contents", "Resources", "DWARF")
ellis updated this revision to Diff 394721.Dec 15 2021, 7:08 PM

Add specific error message when bundle.dSYM/Contents/Resources/DWARF is not a directory.

This revision was landed with ongoing or failed builds.Dec 15 2021, 7:09 PM
This revision was automatically updated to reflect the committed changes.
jhenderson added inline comments.Dec 16 2021, 12:20 AM
llvm/lib/Object/MachOObjectFile.cpp
4738–4740

I guess this should have a test case somewhere :)