This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
include/llvm/DebugInfo/Symbolize/
-
llvm/
-
DebugInfo/
-
Symbolize/
1
Symbolize.h
-
lib/DebugInfo/Symbolize/
-
DebugInfo/
-
Symbolize/
6/8
Symbolize.cpp
-
test/tools/llvm-symbolizer/
-
tools/
-
llvm-symbolizer/
1/1
errors.test
-
input-base.test
-
tools/llvm-symbolizer/
-
llvm-symbolizer/
4/8
llvm-symbolizer.cpp

Differential D147652

[symbolizer] Check existence of input file in GNU mode
ClosedPublic

Authored by sepavloff on Apr 5 2023, 11:58 AM.

Download Raw Diff

Details

Reviewers

jhenderson
mgorny
MaskRay
dblaikie

Commits

rG1099208b991e: [symbolizer] Check existence of input file in GNU mode

Summary

GNU addr2line exits immediately if it cannot open the file specified as
executable/relocatable. In contrast llvm-addr2line does not exit and, if
addresses are not specified in command line, waits for input on stdin. This
causes the test compiler-rt/test/asan/TestCases/Posix/asan-symbolize-bad-path.cc to block
forever on Gentoo (see https://reviews.llvm.org/rG27c4777f41d2ab204c1cf84ff1cccd5ba41354da#1190273).
To fix this issue the behavior llvm-addr2line now exits if
executable/relocatable file cannot be opened.

It fixes https://github.com/llvm/llvm-project/issues/42099 (llvm-addr2line
does not exit when passed a non-existent file).

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

sepavloff created this revision.Apr 5 2023, 11:58 AM

Herald added a project: Restricted Project. · View Herald TranscriptApr 5 2023, 11:58 AM

Herald added a subscriber: hiraditya. · View Herald Transcript

sepavloff requested review of this revision.Apr 5 2023, 11:58 AM

Herald added a project: Restricted Project. · View Herald TranscriptApr 5 2023, 11:58 AM

Harbormaster completed remote builds in B223860: Diff 511179.Apr 5 2023, 12:56 PM

Thanks for the patch! I'm not convinced you've made the right change in the right place, and there are a number of other issues with this patch, which need to be addressed. See my inline comments for details.

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
715	Can this instead return an `Error` to be handled up the call stack? An Error contains as much information as a `std::string/bool` return pairing like you're doing. Doing this means the function boils down to: Error LLVMSymbolizer::findBinaryFile(StringRef BinName) { Expected<SymbolizableModule *> BinFile = getOrCreateModuleInfo(BinName.str()); return BinFile ? Error::success() : BinFile.takeError(); } That then raises the question: is the place you're currently calling this the right place? I suspect not, and that you should instead by failing and reporting the error around the point where `getOrCreateModuleInfo` is called "for real".
llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
483–484	Does llvm-addr2line already assume `a.out` is the input file if no input file is specified? If not, this seems like it's incorrect behaviour. I was under the impression that if no input file is specified, it entered "interactive" mode.
487	This is a different style to how other warnings and errors are printed for llvm-symbolizer (see above e.g. line 467 in this patch). It seems to me like a follow-up change to normalise the printing format for all warnings and errors to use `WithColor`, probably within `reportWarning`/`reportError` functions is in order. Are you happy to make that patch?

This revision now requires changes to proceed.Apr 6 2023, 12:14 AM

Do you need me to test it (it's some heavy building after all) or have you reproduced the problem? If the latter, I'd prefer to do full build of the new snapshot once the patch lands.

sepavloff added inline comments.Apr 7 2023, 1:47 AM

llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
483–484	`llvm-addr2line` did not assume `a.out` if input file was not specified. Instead it followed logic of `llvm-symbolizer` and expected binary file name in the input stream, provided that addresses were not specified in command line also. I was under the impression that if no input file is specified, it entered "interactive" mode. `llvm-symbolizer` and `addr2line` enter "interactive" mode if addresses are not specified in command line. If binary file is not specified as well, `llvm-symbolizer` tries reading both binary file and addresses. `addr2line` never reads binary file from input stream.
487	If `llvm-addr2line` were given wrong binary file, it printed message: $ llvm-addr2line -e inexistent 0x12 LLVMSymbolizer: error reading file: No such file or directory ??:0 Output of GNU `addr2line` is different: $ addr2line -e inexistent 0x12 addr2line: 'inexistent': No such file With this patch output of `llvm-addr2line` is closer to GNU utility: $ llvm-addr2line -e inexistent 0x12 llvm-addr2line: 'inexistent': No such file or directory Actually output of `llvm-symbolizer` is not consistent. `LLVMSymbolizer` is a class name, it does not represent utility. Such message banner is OK for debug dumps, but for a user it is inconvenient. It seems to me like a follow-up change to normalise the printing format for all warnings and errors to use WithColor, probably within reportWarning/reportError functions is in order. Are you happy to make that patch? The main problem with such patch is decision about message format. It GNU-compatible message would be used, which part of the message `llvm-addr2line: 'inexistent': No such file or directory` would be colored red? Should the message contain text `error:`? Can it break some user scenarios due to changed output? Technically such patch looks simple, I could make it.

Use Error as return value of findBinaryFile, add test for error message

Harbormaster completed remote builds in B224183: Diff 511641.Apr 7 2023, 2:16 AM

It seems to me like there are 4(?) different issues that have come up here, and they probably need splitting into separate patches, and possibly even an RFC or two. I think the separate issues are:

GNU addr2line defaults to reading from a.out if no input file is specified.
If the input file can't be found, llvm-addr2line and llvm-symbolizer both enter interactive mode (but any addresses without an accompanying file will result in an error), whereas addr2line emits an error and exits.
The error message returned (in interactive mode) from llvm-addr2line does not match the GNU addr2line error message.
The error message is printed directly within the LLVMSymbolizer library, so has weird formatting etc. More generally, the errors printed by llvm-symbolizer are inconsistently formatted.

Feel free to correct any of the above or add more.

My opinions on the topics are:

It should be easy for llvm-addr2line to default to a value (specifically "a.out") for the "-e" command-line arg, if no input file is explicitly specified. It should be possible to implement it in such a way that the rest of the code doesn't require any changes for this to just work. I don't think we want to change llvm-symbolizer behaviour here, and it's a big enough difference that the llvm-addr2line docs should be updated to note the difference.
IMHO, if an explicitly specified input file does not exist (at the time of command-line parsing at least), I could get behind an error message being reported by BOTH llvm-symbolizer and llvm-addr2line during command-line parsing (or more likely, slightly after, but before the business logic of the symbolizer code kicks in). We'd still need to check for file existence as in the current library code, in case the file has been deleted since command-line parsing happened. The early check should just be a file existence check (don't try to open the file), and should exit the program in the event of failure. It might be worth raising this issue on the mailing lists to confirm changing the llvm-symbolizer behaviour too is okay with others. My reasoning for why this should impact both tools is that a user has requested an explicit file, so if it doesn't exist, trying to use it doesn't make sense, but I could see theoretical use cases where the file is created after the opening of the program in interactive mode (so hence the need for discussion).
Re. error message format, I don't think an exact match of error message format should be necessary. That being said, I do think we should broadly match the error style emitted by our other binutils replacements (and certainly messages from the same tool should be formatted the same way). This goes for both llvm-symbolizer and llvm-addr2line. For example, if the input file is missing for llvm-nm, the following error is emitted on my Windows machine: "D:\llvm\build\Debug\bin\llvm-nm.exe: error: missing.bin: no such file or directory". The "error" is in red. Given we already deviate (slightly) from the GNU tools in the exact message format (GNU tools, apart from readelf, don't print the "error:" part), I think normalising with the rest of our tools makes more sense. Where there's a clear message from the OS, we should print that (along with any appropriate additional context). Where there isn't (e.g. you report an error on a missing file, without having attempted to actually open that file), it doesn't hurt to match the GNU message content, if the GNU one is sensible.
My opinions on library error reporting are documented in this lightning talk: https://www.youtube.com/watch?v=YSEY4pg1YB0. It's probably a non-trivial task (albeit hopefully not too complex), but in my opinion, this is a perfect case of where a callback should be used to report failures, rather than for the library to try to handle them directly.

sepavloff mentioned this in D148032: [symbolizer] Change error message if module not found.Apr 11 2023, 10:24 AM

thesamesam added a subscriber: thesamesam.Apr 12 2023, 1:43 AM

Would it be possible to revert the original changes then reland them with the improvements mentioned above?

Not trying to be difficult at all, it just affects our continuous downstream testing in Gentoo.

In D147652#4263811, @thesamesam wrote:

Would it be possible to revert the original changes then reland them with the improvements mentioned above?

Not trying to be difficult at all, it just affects our continuous downstream testing in Gentoo.

I've got no objections to that.

sepavloff mentioned this in rGba482dde664d: Revert "[symbolizer] Treat invalid address as addr2line does".Apr 13 2023, 1:56 AM

In D147652#4257424, @jhenderson wrote:

IMHO, if an explicitly specified input file does not exist (at the time of command-line parsing at least), I could get behind an error message being reported by BOTH llvm-symbolizer and llvm-addr2line during command-line parsing (or more likely, slightly after, but before the business logic of the symbolizer code kicks in). We'd still need to check for file existence as in the current library code, in case the file has been deleted since command-line parsing happened. The early check should just be a file existence check (don't try to open the file), and should exit the program in the event of failure. It might be worth raising this issue on the mailing lists to confirm changing the llvm-symbolizer behaviour too is okay with others. My reasoning for why this should impact both tools is that a user has requested an explicit file, so if it doesn't exist, trying to use it doesn't make sense, but I could see theoretical use cases where the file is created after the opening of the program in interactive mode (so hence the need for discussion).

Both llvm-symbolizer and GNU addr2line report error if an explicitly specified input file does not exist. They differ in what they do after. The difference can be shown in such invocation:

addr2line -e inexistent_file

Here no addresses are specified in command line and both utilities would read them from input stream. However addr2line finds that the source file does not exists and exits immediately, no attempt to read imput stream is made. It is reasonable because -e is the only way to provide input file. llvm-symbolize in this case tries to interpret addresses from input stream or command line. It does not make sense because in this case because in this case llvm-symbolizer cannot override binary file.

It looks like llvm-symbolizer also should check input file existence early. It need a separate patch, maybe someone uses such unusual mode.

When llvm-symbolizer check existence of binary file, it creates its representation in memory. Even if the file then disappears or is changed, that does not affect decoding process, with uses the cached representation.

Rebased, do not handle default a.out

In D147652#4267555, @sepavloff wrote:

In D147652#4257424, @jhenderson wrote:

It looks like llvm-symbolizer also should check input file existence early. It need a separate patch, maybe someone uses such unusual mode.

Right, my points above were all intended to be separate patches (potentially).

When llvm-symbolizer check existence of binary file, it creates its representation in memory. Even if the file then disappears or is changed, that does not affect decoding process, with uses the cached representation.

I was thinking the early check could just be a "file exists" check, and not to try to open it until later when it's actually used. That would fit better into how the code currently works.

Harbormaster completed remote builds in B225542: Diff 513500.Apr 14 2023, 3:11 AM

In D147652#4267604, @jhenderson wrote:

In D147652#4267555, @sepavloff wrote:

When llvm-symbolizer check existence of binary file, it creates its representation in memory. Even if the file then disappears or is changed, that does not affect decoding process, with uses the cached representation.

I was thinking the early check could just be a "file exists" check, and not to try to open it until later when it's actually used. That would fit better into how the code currently works.

It would not solve the problem in full. A file can exists but be inaccessible for user or have incorrect format. GNU addr2line in this case exits with error but llvm-addr2line waits for input addresses. It is safer to exit early because no useful job can be done if input file is wrong. To implement the safer solution, the check for existence should be made prior to the loop that waits input from stdin.

This complexity is caused by the fact that in native mode llvm-symbolizer can read binary file name from input stream. In this mode wrong input file is not a reason to stop working - the next data in stdin can query about different file. This is why llvm-symbolizer opens input file late, after it reads input stream.

To support both modes of llvm-symbolizer the existence check should be made in two places, prior to reading standard input and in the reading loop, depending on the used mode.

This complexity is caused by the fact that in native mode llvm-symbolizer can read binary file name from input stream. In this mode wrong input file is not a reason to stop working - the next data in stdin can query about different file. This is why llvm-symbolizer opens input file late, after it reads input stream.

Perhaps the APIs could be restructured such that the file is opened, then queries run on it - then in the addr2line mode, we could open the file once, then parse and run queries, and in the llvm-symbolizer mode we could parse, open file, run query?
(presumably more invasive refactoring than the current situation, but it would be good to actually open the file that gets used for the queries, rather than doing a separate file existence/open check)

In D147652#4311166, @dblaikie wrote:

This complexity is caused by the fact that in native mode llvm-symbolizer can read binary file name from input stream. In this mode wrong input file is not a reason to stop working - the next data in stdin can query about different file. This is why llvm-symbolizer opens input file late, after it reads input stream.

Perhaps the APIs could be restructured such that the file is opened, then queries run on it - then in the addr2line mode, we could open the file once, then parse and run queries, and in the llvm-symbolizer mode we could parse, open file, run query?

Actually llvm-symbolizer works in this way with this patch. When input file is tested for existence, findBinaryFile calls getOrCreateModuleInfo, which stores the file information in LLVMSymbolizer::Modules. When later on getOrCreateModuleInfo is called again in the address translation loop, the file information is taken from the cache. So the call to findBinaryFile works as a module prefetch. No performance or memory penalty.

In D147652#4314914, @sepavloff wrote:

In D147652#4311166, @dblaikie wrote:

This complexity is caused by the fact that in native mode llvm-symbolizer can read binary file name from input stream. In this mode wrong input file is not a reason to stop working - the next data in stdin can query about different file. This is why llvm-symbolizer opens input file late, after it reads input stream.

Perhaps the APIs could be restructured such that the file is opened, then queries run on it - then in the addr2line mode, we could open the file once, then parse and run queries, and in the llvm-symbolizer mode we could parse, open file, run query?

Actually llvm-symbolizer works in this way with this patch. When input file is tested for existence, findBinaryFile calls getOrCreateModuleInfo, which stores the file information in LLVMSymbolizer::Modules. When later on getOrCreateModuleInfo is called again in the address translation loop, the file information is taken from the cache. So the call to findBinaryFile works as a module prefetch. No performance or memory penalty.

Ah, fair enough - the indirection's a bit awkward (spooky action at a distance, etc) - but at least satisfies me on the "don't do two correlated file options/expect one result to haev any meaning for a later independent operation".

Check only file existence when checking command line argument

Harbormaster completed remote builds in B234816: Diff 526007.May 26 2023, 5:30 AM

Apologies for the delay in coming back to this: I've been caught up with several weeks worth of high priority work with my job, and haven't had the time to look at LLVM reviews beyond a quick glance.

It looks like llvm-symbolizer also should check input file existence early. It need a separate patch, maybe someone uses such unusual mode.

Unless the changes are significant, I don't see a reason to limit this change to llvm-addr2line only. If someone has specified -e, they won't ever be able to get useful output from llvm-symbolizer.

I'm okay with the direction @dblaikie has suggested ultimately, namely that we restructure the code so that we open the file early, as long as we keep the file open thereafter (so as not to open the file twice). However, I think it could be a separate subsequent patch, so that we can get this immediate improvement added.

By the way, there is prior art for existence checking long before the file is needed. Specifically, LLD checks that certain files exist early on, so that it doesn't need the relatively slow work of the link before discovering that the file doesn't exist or the directory is invalid in some manner etc.

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
566	Nit: have you run clang-format on this?
571–573	We have C++17 structured bindings now.
712	There's no need for doxygen comments in a .cpp file like this, as far as I know. Regardless, the comment is a bit stale: you're not opening the file here.
717	As above re. structured bindings.
llvm/test/tools/llvm-symbolizer/errors.test
2	"Inexistent" isn't a word - you probably mean "nonexistent". It would be good to have a test case that tries to open a directory.
llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
487	My strong preference is that libraries pass their error messages back up to the tool that invokes them, in one form or other (e.g. via a callback or an `Error`/`Expected`). As you say, end users don't care about the internal details like class names, so these shouldn't be in error messages. I don't think we aim for direct GNU compatibility with error messages, in general, though a leading toolname is considered important. There is already helpful functionality that handles all of this though. See https://github.com/llvm/llvm-project/blob/d1ef99fe1ce3e2996c317600a1398933b9078260/llvm/tools/llvm-objdump/llvm-objdump.cpp#L320 for an example (specifically, look how `WithColor` takes a tool name as an argument).

sepavloff added a child revision: D153219: [symbolizer] Exit early if input file is absent.Jun 18 2023, 12:21 AM

Updated patch

In D147652#4412823, @jhenderson wrote:

It looks like llvm-symbolizer also should check input file existence early. It need a separate patch, maybe someone uses such unusual mode.

Unless the changes are significant, I don't see a reason to limit this change to llvm-addr2line only. If someone has specified -e, they won't ever be able to get useful output from llvm-symbolizer.

The patch is in D153219. It is not large, but there are several tests that expect processing addresses in the case of nonexisting binary files. It is not clear if these tests were added for better coverage or there are real cases for such usage. In the latter case it is better to have a separate patch.

llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
487	The current implementation of error handling in llvm-symbolizer is very close to your preferences. Error messages are printed in `printError(const ErrorInfoBase &EI, StringRef Path)`, which is also passed as a callback to Printer classes. This function is very similar to `objdump::reportError` with one exception: it does not call `exit`. `llvm-symbolizer` can get things like binary file path from input stream. In such case immediate exit, if the file does not exist, is not suitable, because the next portion of data can provide valid binary file. So `exit` is called when it is clear that no useful work can be made.

Harbormaster completed remote builds in B239654: Diff 532453.Jun 18 2023, 2:52 AM

jhenderson added inline comments.Jun 19 2023, 1:09 AM

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
713	Perhaps `checkFileExists` or `checkIsFile` would be a better name? If you use one of those names, you can delete the comment entirely, as it doesn't do anything useful then.
llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
486	It doesn't need to be part of this change, but I don't think it would be unreasonable to add a `printError` overload that takes `Error` instead of `ErrorInfoBase` and does the `handleAllErrors` internally. It looks like there are at least 2 separate places now that would benefit from that.

Changed function name: findBinaryFile -> checkFileExists

sepavloff marked an inline comment as done.Jun 20 2023, 11:33 AM

sepavloff added inline comments.

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
713	Indeed, thank you!
llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp
486	Yes, it makes sense.

Harbormaster completed remote builds in B240063: Diff 533001.Jun 20 2023, 1:38 PM

LGTM, but would be good to get a second approval from someone else already involved in this patch before landing it.

This revision is now accepted and ready to land.Jun 21 2023, 1:22 AM

There is a time-of-check-to-time-of-use issue that we can ignore (as we may wait on the addresses), but otherwise the new behavior (also GNU's) seems to be what a casual user may expect.

llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h
193
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
714	fold the two `if` into one with `&&`

In D147652#4439871, @MaskRay wrote:

There is a time-of-check-to-time-of-use issue that we can ignore (as we may wait on the addresses),

I had the same concern there (& maybe @jhenderson did too) & claimed to be addressed/not a problem in https://reviews.llvm.org/D147652#4314914 but I didn't quite follow.

What do you mean "as we may wait on the address" - like we could deal with this problem later? Any particular reason to do that, rather than avoiding introducing the time-of-check-to-time-of-use issue in the first place?

MaskRay mentioned this in D153595: [llvm-addr2line] Replace checkFileExists with getOrCreateModuleInfo.Jun 22 2023, 4:11 PM

In D147652#4441851, @dblaikie wrote:

In D147652#4439871, @MaskRay wrote:

There is a time-of-check-to-time-of-use issue that we can ignore (as we may wait on the addresses),

I had the same concern there (& maybe @jhenderson did too) & claimed to be addressed/not a problem in https://reviews.llvm.org/D147652#4314914 but I didn't quite follow.

What do you mean "as we may wait on the address" - like we could deal with this problem later? Any particular reason to do that, rather than avoiding introducing the time-of-check-to-time-of-use issue in the first place?

Apologies, my stamp was a bit hand-wavy. I saw your commend and investigated a bit more. I think D153595 may be the direction?
I don't touch llvm-symbolizer as @sepavloff can do that in D153219.

jhenderson added a comment.Jun 23 2023, 12:14 AM

This comment was removed by jhenderson.

In D147652#4441851, @dblaikie wrote:

In D147652#4439871, @MaskRay wrote:

There is a time-of-check-to-time-of-use issue that we can ignore (as we may wait on the addresses),

I had the same concern there (& maybe @jhenderson did too) & claimed to be addressed/not a problem in https://reviews.llvm.org/D147652#4314914 but I didn't quite follow.

What do you mean "as we may wait on the address" - like we could deal with this problem later? Any particular reason to do that, rather than avoiding introducing the time-of-check-to-time-of-use issue in the first place?

Apologies, this might be all caused by me (and I figured given nobody else really pushed against it, others were fine with it). Specifically this comment:

IMHO, if an explicitly specified input file does not exist (at the time of command-line parsing at least), I could get behind an error message being reported by BOTH llvm-symbolizer and llvm-addr2line during command-line parsing (or more likely, slightly after, but before the business logic of the symbolizer code kicks in). We'd still need to check for file existence as in the current library code, in case the file has been deleted since command-line parsing happened. The early check should just be a file existence check (don't try to open the file), and should exit the program in the event of failure. It might be worth raising this issue on the mailing lists to confirm changing the llvm-symbolizer behaviour too is okay with others. My reasoning for why this should impact both tools is that a user has requested an explicit file, so if it doesn't exist, trying to use it doesn't make sense, but I could see theoretical use cases where the file is created after the opening of the program in interactive mode (so hence the need for discussion).

It was my suggestion to have an early file existence check, which didn't attempt to open the file properly, on the basis that the later code would still detect the issue at the time of use. We could then restructure the code later as @dblaikie had suggested. The early check is not too dissimilar to, I believe, a few instances in LLD, which provide early feedback on input options (also highlighted above). I don't recall why I didn't just go with the opening of the file at this point, rather than later (reading my previous comments, it was implied somewhere that it wasn't straightforward to open the file upfront for some reason).

Closed by commit rG1099208b991e: [symbolizer] Check existence of input file in GNU mode (authored by sepavloff). · Explain WhyJun 23 2023, 3:21 AM

This revision was automatically updated to reflect the committed changes.

sepavloff marked an inline comment as done.

sepavloff added a commit: rG1099208b991e: [symbolizer] Check existence of input file in GNU mode.

MaskRay mentioned this in rG620bff758d44: [llvm-addr2line] Replace checkFileExists with getOrCreateModuleInfo.Jun 23 2023, 10:04 AM

Revision Contents

Path

Size

llvm/

include/

llvm/

DebugInfo/

Symbolize/

Symbolize.h

2 lines

lib/

DebugInfo/

Symbolize/

Symbolize.cpp

10 lines

test/

tools/

llvm-symbolizer/

errors.test

3 lines

input-base.test

16 lines

tools/

llvm-symbolizer/

llvm-symbolizer.cpp

11 lines

Diff 513500

llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h

Show First 20 Lines • Show All 113 Lines • ▼ Show 20 Lines

public:

static std::string

DemangleName(const std::string &Name,

const SymbolizableModule *DbiModuleDescriptor);

void setBuildIDFetcher(std::unique_ptr<BuildIDFetcher> Fetcher) {

BIDFetcher = std::move(Fetcher);

}

Error findBinaryFile(StringRef Name);

private:

// Bundles together object file with code/data and object file with

// corresponding debug info. These objects can be the same.

using ObjectPair = std::pair<const ObjectFile *, const ObjectFile *>;

template <typename T>

Expected<DILineInfo>

symbolizeCodeCommon(const T &ModuleSpecifier,

▲ Show 20 Lines • Show All 53 Lines • ▼ Show 20 Lines

private:

/// architecture (e.g. if path refers to a Mach-O universal binary, only one

/// object file from it will be returned).

Expected<ObjectFile *> getOrCreateObject(const std::string &Path,

const std::string &ArchName);

/// Update the LRU cache order when a binary is accessed.

void recordAccess(CachedBinary &Bin);

std::map<std::string, std::unique_ptr<SymbolizableModule>, std::less<>>

MaskRayUnsubmitted

Not Done

void recordAccess(CachedBinary &Bin);

- /// Splits binary file name into file and architecture parts. For example,

+ /// Split binary file name into file and architecture parts. For example,

/// the name 'macho-universal:i386', will be split into 'macho-universal' and

MaskRay:

Modules;

StringMap<std::string> BuildIDPaths;

/// Contains cached results of getOrCreateObjectPair().

std::map<std::pair<std::string, std::string>, ObjectPair>

ObjectPairForPathArch;

/// Contains parsed binary for each path, or parsing error.

▲ Show 20 Lines • Show All 50 Lines • Show Last 20 Lines

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp

Show First 20 Lines • Show All 557 Lines • ▼ Show 20 Lines LLVMSymbolizer::getOrCreateModuleInfo(const std::string &ModuleName) {

// Verify that substring after colon form a valid arch name. // Verify that substring after colon form a valid arch name.

if (ColonPos != std::string::npos) { if (ColonPos != std::string::npos) {

std::string ArchStr = ModuleName.substr(ColonPos + 1); std::string ArchStr = ModuleName.substr(ColonPos + 1);

if (Triple(ArchStr).getArch() != Triple::UnknownArch) { if (Triple(ArchStr).getArch() != Triple::UnknownArch) {

BinaryName = ModuleName.substr(0, ColonPos); BinaryName = ModuleName.substr(0, ColonPos);

ArchName = ArchStr; ArchName = ArchStr;

} }

jhendersonUnsubmitted

Done

Nit: have you run clang-format on this?

jhenderson: Nit: have you run clang-format on this?

auto I = Modules.find(ModuleName); auto I = Modules.find(ModuleName);

if (I != Modules.end()) { if (I != Modules.end()) {

recordAccess(BinaryForPath.find(BinaryName)->second); recordAccess(BinaryForPath.find(BinaryName)->second);

return I->second.get(); return I->second.get();

} }

auto ObjectsOrErr = getOrCreateObjectPair(BinaryName, ArchName); auto ObjectsOrErr = getOrCreateObjectPair(BinaryName, ArchName);

jhendersonUnsubmitted

Done

LLVMSymbolizer::getOrCreateModuleInfo(const std::string &ModuleName) {

- std::string BinaryName;

- std::string ArchName;

- std::tie(BinaryName, ArchName) = splitBinaryFileName(ModuleName);

+ auto[BinaryName, ArchName] = splitBinaryFileName(ModuleName);

auto I = Modules.find(ModuleName);

We have C++17 structured bindings now.

jhenderson: We have C++17 structured bindings now.

if (!ObjectsOrErr) { if (!ObjectsOrErr) {

// Failed to find valid object file. // Failed to find valid object file.

Modules.emplace(ModuleName, std::unique_ptr<SymbolizableModule>()); Modules.emplace(ModuleName, std::unique_ptr<SymbolizableModule>());

return ObjectsOrErr.takeError(); return ObjectsOrErr.takeError();

} }

ObjectPair Objects = ObjectsOrErr.get(); ObjectPair Objects = ObjectsOrErr.get();

std::unique_ptr<DIContext> Context; std::unique_ptr<DIContext> Context;

▲ Show 20 Lines • Show All 120 Lines • ▼ Show 20 Lines if (DbiModuleDescriptor && DbiModuleDescriptor->isWin32Module()) {

// may also be applied on top of the Itanium or Rust name mangling. // may also be applied on top of the Itanium or Rust name mangling.

if (nonMicrosoftDemangle(DemangledCName.c_str(), Result)) if (nonMicrosoftDemangle(DemangledCName.c_str(), Result))

return Result; return Result;

return DemangledCName; return DemangledCName;

} }

return Name; return Name;

} }

/// Checks if the given file exists and can be opened as executable/relocatable

/// file.

/// \param BinName Name of the file to search for.

jhendersonUnsubmitted

Done

There's no need for doxygen comments in a .cpp file like this, as far as I know. Regardless, the comment is a bit stale: you're not opening the file here.

jhenderson: There's no need for doxygen comments in a .cpp file like this, as far as I know. Regardless…

/// \returns Error::success() if the file was successfully opened, error object

jhendersonUnsubmitted

Done

Perhaps checkFileExists or checkIsFile would be a better name? If you use one of those names, you can delete the comment entirely, as it doesn't do anything useful then.

jhenderson: Perhaps `checkFileExists` or `checkIsFile` would be a better name? If you use one of those…

sepavloffAuthorUnsubmitted

Done

Indeed, thank you!

sepavloff: Indeed, thank you!

/// otherwise.

MaskRayUnsubmitted

Not Done

fold the two if into one with &&

MaskRay: fold the two `if` into one with `&&`

Error LLVMSymbolizer::findBinaryFile(StringRef BinName) {

jhendersonUnsubmitted

Not Done

Can this instead return an Error to be handled up the call stack? An Error contains as much information as a std::string/bool return pairing like you're doing.

Doing this means the function boils down to:

Error LLVMSymbolizer::findBinaryFile(StringRef BinName) {
  Expected<SymbolizableModule *> BinFile = getOrCreateModuleInfo(BinName.str());
  return BinFile ? Error::success() : BinFile.takeError();
}

That then raises the question: is the place you're currently calling this the right place? I suspect not, and that you should instead by failing and reporting the error around the point where getOrCreateModuleInfo is called "for real".

jhenderson: Can this instead return an `Error` to be handled up the call stack? An Error contains as much…

Expected<SymbolizableModule *> BinFile = getOrCreateModuleInfo(BinName.str());

return BinFile ? Error::success() : BinFile.takeError();

jhendersonUnsubmitted

Done

As above re. structured bindings.

jhenderson: As above re. structured bindings.

}

void LLVMSymbolizer::recordAccess(CachedBinary &Bin) { void LLVMSymbolizer::recordAccess(CachedBinary &Bin) {

if (Bin->getBinary()) if (Bin->getBinary())

LRUBinaries.splice(LRUBinaries.end(), LRUBinaries, Bin.getIterator()); LRUBinaries.splice(LRUBinaries.end(), LRUBinaries, Bin.getIterator());

} }

void LLVMSymbolizer::pruneCache() { void LLVMSymbolizer::pruneCache() {

// Evict the LRU binary until the max cache size is reached or there's <= 1 // Evict the LRU binary until the max cache size is reached or there's <= 1

// item in the cache. The MRU binary is always kept to avoid thrashing if it's // item in the cache. The MRU binary is always kept to avoid thrashing if it's

Show All 24 Lines

llvm/test/tools/llvm-symbolizer/errors.test

This file was added.

				RUN: not llvm-addr2line -e %p/Inputs/inexistent 0x12 2>&1 \| FileCheck %s --check-prefix=CHECK-INEXISTENT-A2L -DMSG=%errc_ENOENT
				RUN: not llvm-addr2line -e %p/Inputs/inexistent 2>&1 \| FileCheck %s --check-prefix=CHECK-INEXISTENT-A2L -DMSG=%errc_ENOENT
				jhendersonUnsubmitted Done Reply Inline Actions "Inexistent" isn't a word - you probably mean "nonexistent". It would be good to have a test case that tries to open a directory. jhenderson: "Inexistent" isn't a word - you probably mean "nonexistent". It would be good to have a test…
				CHECK-INEXISTENT-A2L: llvm-addr2line{{.}}: error: '{{.}}Inputs/inexistent': [[MSG]]

llvm/test/tools/llvm-symbolizer/input-base.test

	# llvm-symbolizer infers the number base from the form of the address.			# llvm-symbolizer infers the number base from the form of the address.
	RUN: llvm-symbolizer -e /dev/null -a 0x1234 \| FileCheck %s			RUN: llvm-symbolizer -e /dev/null -a 0x1234 \| FileCheck %s
	RUN: llvm-symbolizer -e /dev/null -a 0X1234 \| FileCheck %s			RUN: llvm-symbolizer -e /dev/null -a 0X1234 \| FileCheck %s
	RUN: llvm-symbolizer -e /dev/null -a 4660 \| FileCheck %s			RUN: llvm-symbolizer -e /dev/null -a 4660 \| FileCheck %s
	RUN: llvm-symbolizer -e /dev/null -a 011064 \| FileCheck %s			RUN: llvm-symbolizer -e /dev/null -a 011064 \| FileCheck %s
	RUN: llvm-symbolizer -e /dev/null -a 0b1001000110100 \| FileCheck %s			RUN: llvm-symbolizer -e /dev/null -a 0b1001000110100 \| FileCheck %s
	RUN: llvm-symbolizer -e /dev/null -a 0B1001000110100 \| FileCheck %s			RUN: llvm-symbolizer -e /dev/null -a 0B1001000110100 \| FileCheck %s
	RUN: llvm-symbolizer -e /dev/null -a 0o11064 \| FileCheck %s			RUN: llvm-symbolizer -e /dev/null -a 0o11064 \| FileCheck %s

	# llvm-symbolizer / StringRef::getAsInteger only accepts the 0o prefix in lowercase.			# llvm-symbolizer / StringRef::getAsInteger only accepts the 0o prefix in lowercase.
	RUN: llvm-symbolizer -e /dev/null -a 0O1234 \| FileCheck %s --check-prefix=INVALID-NOT-OCTAL-UPPER			RUN: llvm-symbolizer -e /dev/null -a 0O1234 \| FileCheck %s --check-prefix=INVALID-NOT-OCTAL-UPPER

	# llvm-addr2line always requires hexadecimal, but accepts an optional 0x prefix.			# llvm-addr2line always requires hexadecimal, but accepts an optional 0x prefix.
	RUN: llvm-addr2line -e /dev/null -a 0x1234 \| FileCheck %s			RUN: llvm-addr2line -e %p/Inputs/addr.exe -a 0x1234 \| FileCheck %s
	RUN: llvm-addr2line -e /dev/null -a 0X1234 \| FileCheck %s			RUN: llvm-addr2line -e %p/Inputs/addr.exe -a 0X1234 \| FileCheck %s
	RUN: llvm-addr2line -e /dev/null -a 1234 \| FileCheck %s			RUN: llvm-addr2line -e %p/Inputs/addr.exe -a 1234 \| FileCheck %s
	RUN: llvm-addr2line -e /dev/null -a 01234 \| FileCheck %s			RUN: llvm-addr2line -e %p/Inputs/addr.exe -a 01234 \| FileCheck %s
	RUN: llvm-addr2line -e /dev/null -a 0b1010 \| FileCheck %s --check-prefix=HEXADECIMAL-NOT-BINARY			RUN: llvm-addr2line -e %p/Inputs/addr.exe -a 0b1010 \| FileCheck %s --check-prefix=HEXADECIMAL-NOT-BINARY
	RUN: llvm-addr2line -e /dev/null -a 0B1010 \| FileCheck %s --check-prefix=HEXADECIMAL-NOT-BINARY			RUN: llvm-addr2line -e %p/Inputs/addr.exe -a 0B1010 \| FileCheck %s --check-prefix=HEXADECIMAL-NOT-BINARY
	RUN: llvm-addr2line -e /dev/null -a 0o1234 \| FileCheck %s --check-prefix=INVALID-NOT-OCTAL-LOWER			RUN: llvm-addr2line -e %p/Inputs/addr.exe -a 0o1234 \| FileCheck %s --check-prefix=INVALID-NOT-OCTAL-LOWER
	RUN: llvm-addr2line -e /dev/null -a 0O1234 \| FileCheck %s --check-prefix=INVALID-NOT-OCTAL-UPPER			RUN: llvm-addr2line -e %p/Inputs/addr.exe -a 0O1234 \| FileCheck %s --check-prefix=INVALID-NOT-OCTAL-UPPER

	CHECK: 0x1234			CHECK: 0x1234
	CHECK-NEXT: ??			CHECK-NEXT: ??

	HEXADECIMAL-NOT-BINARY: 0xb1010			HEXADECIMAL-NOT-BINARY: 0xb1010
	HEXADECIMAL-NOT-BINARY: ??			HEXADECIMAL-NOT-BINARY: ??

	INVALID-NOT-OCTAL-LOWER: 0o1234			INVALID-NOT-OCTAL-LOWER: 0o1234
	INVALID-NOT-OCTAL-LOWER-NOT: ??			INVALID-NOT-OCTAL-LOWER-NOT: ??

	INVALID-NOT-OCTAL-UPPER: 0O1234			INVALID-NOT-OCTAL-UPPER: 0O1234
	INVALID-NOT-OCTAL-UPPER-NOT: ??			INVALID-NOT-OCTAL-UPPER-NOT: ??

llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp

Show First 20 Lines • Show All 472 Lines • ▼ Show 20 Lines	#endif
std::unique_ptr<DIPrinter> Printer;		std::unique_ptr<DIPrinter> Printer;
if (Style == OutputStyle::GNU)		if (Style == OutputStyle::GNU)
Printer = std::make_unique<GNUPrinter>(outs(), printError, Config);		Printer = std::make_unique<GNUPrinter>(outs(), printError, Config);
else if (Style == OutputStyle::JSON)		else if (Style == OutputStyle::JSON)
Printer = std::make_unique<JSONPrinter>(outs(), Config);		Printer = std::make_unique<JSONPrinter>(outs(), Config);
else		else
Printer = std::make_unique<LLVMPrinter>(outs(), printError, Config);		Printer = std::make_unique<LLVMPrinter>(outs(), printError, Config);

		StringRef InputFile = Args.getLastArgValue(OPT_obj_EQ);
		if (!InputFile.empty() && IsAddr2Line) {
		Error Status = Symbolizer.findBinaryFile(InputFile);
		if (Status) {
		jhendersonUnsubmitted Not Done Reply Inline Actions Does llvm-addr2line already assume `a.out` is the input file if no input file is specified? If not, this seems like it's incorrect behaviour. I was under the impression that if no input file is specified, it entered "interactive" mode. jhenderson: Does llvm-addr2line already assume `a.out` is the input file if no input file is specified? If…
		sepavloffAuthorUnsubmitted Done Reply Inline Actions `llvm-addr2line` did not assume `a.out` if input file was not specified. Instead it followed logic of `llvm-symbolizer` and expected binary file name in the input stream, provided that addresses were not specified in command line also. I was under the impression that if no input file is specified, it entered "interactive" mode. `llvm-symbolizer` and `addr2line` enter "interactive" mode if addresses are not specified in command line. If binary file is not specified as well, `llvm-symbolizer` tries reading both binary file and addresses. `addr2line` never reads binary file from input stream. sepavloff: `llvm-addr2line` did not assume `a.out` if input file was not specified. Instead it followed…
		handleAllErrors(std::move(Status), [&](const ErrorInfoBase &EI) {
		printError(EI, InputFile);
		jhendersonUnsubmitted Not Done Reply Inline Actions It doesn't need to be part of this change, but I don't think it would be unreasonable to add a `printError` overload that takes `Error` instead of `ErrorInfoBase` and does the `handleAllErrors` internally. It looks like there are at least 2 separate places now that would benefit from that. jhenderson: It doesn't need to be part of this change, but I don't think it would be unreasonable to add a…
		sepavloffAuthorUnsubmitted Done Reply Inline Actions Yes, it makes sense. sepavloff: Yes, it makes sense.
		});
		jhendersonUnsubmitted Not Done Reply Inline Actions This is a different style to how other warnings and errors are printed for llvm-symbolizer (see above e.g. line 467 in this patch). It seems to me like a follow-up change to normalise the printing format for all warnings and errors to use `WithColor`, probably within `reportWarning`/`reportError` functions is in order. Are you happy to make that patch? jhenderson: This is a different style to how other warnings and errors are printed for llvm-symbolizer (see…
		sepavloffAuthorUnsubmitted Done Reply Inline Actions If `llvm-addr2line` were given wrong binary file, it printed message: $ llvm-addr2line -e inexistent 0x12 LLVMSymbolizer: error reading file: No such file or directory ??:0 Output of GNU `addr2line` is different: $ addr2line -e inexistent 0x12 addr2line: 'inexistent': No such file With this patch output of `llvm-addr2line` is closer to GNU utility: $ llvm-addr2line -e inexistent 0x12 llvm-addr2line: 'inexistent': No such file or directory Actually output of `llvm-symbolizer` is not consistent. `LLVMSymbolizer` is a class name, it does not represent utility. Such message banner is OK for debug dumps, but for a user it is inconvenient. It seems to me like a follow-up change to normalise the printing format for all warnings and errors to use WithColor, probably within reportWarning/reportError functions is in order. Are you happy to make that patch? The main problem with such patch is decision about message format. It GNU-compatible message would be used, which part of the message `llvm-addr2line: 'inexistent': No such file or directory` would be colored red? Should the message contain text `error:`? Can it break some user scenarios due to changed output? Technically such patch looks simple, I could make it. sepavloff: If `llvm-addr2line` were given wrong binary file, it printed message: ``` $ llvm-addr2line -e…
		jhendersonUnsubmitted Not Done Reply Inline Actions My strong preference is that libraries pass their error messages back up to the tool that invokes them, in one form or other (e.g. via a callback or an `Error`/`Expected`). As you say, end users don't care about the internal details like class names, so these shouldn't be in error messages. I don't think we aim for direct GNU compatibility with error messages, in general, though a leading toolname is considered important. There is already helpful functionality that handles all of this though. See https://github.com/llvm/llvm-project/blob/d1ef99fe1ce3e2996c317600a1398933b9078260/llvm/tools/llvm-objdump/llvm-objdump.cpp#L320 for an example (specifically, look how `WithColor` takes a tool name as an argument). jhenderson: My strong preference is that libraries pass their error messages back up to the tool that…
		sepavloffAuthorUnsubmitted Done Reply Inline Actions The current implementation of error handling in llvm-symbolizer is very close to your preferences. Error messages are printed in `printError(const ErrorInfoBase &EI, StringRef Path)`, which is also passed as a callback to Printer classes. This function is very similar to `objdump::reportError` with one exception: it does not call `exit`. `llvm-symbolizer` can get things like binary file path from input stream. In such case immediate exit, if the file does not exist, is not suitable, because the next portion of data can provide valid binary file. So `exit` is called when it is clear that no useful work can be made. sepavloff: The current implementation of error handling in llvm-symbolizer is very close to your…
		return EXIT_FAILURE;
		}
		}

std::vector<std::string> InputAddresses = Args.getAllArgValues(OPT_INPUT);		std::vector<std::string> InputAddresses = Args.getAllArgValues(OPT_INPUT);
if (InputAddresses.empty()) {		if (InputAddresses.empty()) {
const int kMaxInputStringLength = 1024;		const int kMaxInputStringLength = 1024;
char InputString[kMaxInputStringLength];		char InputString[kMaxInputStringLength];

while (fgets(InputString, sizeof(InputString), stdin)) {		while (fgets(InputString, sizeof(InputString), stdin)) {
// Strip newline characters.		// Strip newline characters.
std::string StrippedInputString(InputString);		std::string StrippedInputString(InputString);
Show All 16 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[symbolizer] Check existence of input file in GNU modeClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 513500

llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp

llvm/test/tools/llvm-symbolizer/errors.test

llvm/test/tools/llvm-symbolizer/input-base.test

llvm/tools/llvm-symbolizer/llvm-symbolizer.cpp

[symbolizer] Check existence of input file in GNU mode
ClosedPublic