This is an archive of the discontinued LLVM Phabricator instance.

Improve llvm-symbolizer search logic for symlink
Needs ReviewPublic

Authored by yinghuitan on Nov 18 2022, 3:15 PM.

Details

Summary

I found our internal usage of LLDB failed symbolicate on Linux even though
llvm-symbolizer is next to it. Turns out that our lldb is distributed as
as symbolic link so printSymbolizedStackTrace() will try to find the
llvm-symbolizer next to the symlink instead of the real executable parent
directory.

This patch fixes this issue by adding an extra search path after resolving
symlink.

Diff Detail

Event Timeline

yinghuitan created this revision.Nov 18 2022, 3:15 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 18 2022, 3:15 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
yinghuitan requested review of this revision.Nov 18 2022, 3:15 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 18 2022, 3:15 PM
clayborg requested changes to this revision.Nov 18 2022, 4:29 PM
clayborg added inline comments.
llvm/lib/Support/Signals.cpp
141

You will run into lifetime issues with the returned StringRef in ResolvedParent if we have a symlink as the data in the ResolvedParent will point to data in the "Resolved" variable. This variable contains the storage for the string in this scope, which will free its string when it destructs before returning from this function. Most of the time it will be ok because it is on the stack, but we can check this in. I would suggest inlining this code into the function below so that object lifetimes stay around long enough to get used. Or you can store "llvm::SmallString<128> Resolved;" in the printSymbolizedStackTrace() function as a local, and pass a reference to it into this function. That way the buffer will stay along long enough for the StringRef to not point to stale data.

158–175

We should probably try to get the llvm-symbolizer as we did before, and if this fails, then fall back to doing the symlink thing. You might have a directory with newly builds binaries in it, and just use a symlink to point to an older binary.

So I would vote to:
1 - check as we did before
2 - if there is an error, fallback to trying the symlink method.

We are currently relying on the parent being a symlink. What is if the binary itself is a symlink and not the parent? I would think we would want to resolve the path to the binary first, then call llvm::sys::path::parent_path(), and then look for llvm-symbolizer. The above code removes the binary from the path first, then assumes the parent will be a symlink.

This revision now requires changes to proceed.Nov 18 2022, 4:29 PM
labath resigned from this revision.Nov 21 2022, 3:16 AM
labath added a subscriber: labath.

yeah, the problem with symlinks is that one can never know whether a particular symlink (in particular context) was meant to be treated transparently or not. it gets even worse if you have multiple symlinks in the path or the resolution chain, as realpath will resolve all of them, which may not be what you wanted.

I'm not saying that this change is bad, but just as an FYI, it may be possible to fix your problem by replacing the lldb symlink with a shell script like exec path/to/real/lldb "$@".

Resigning, as I don't feel comfortable approving changes in this part of the code.

yinghuitan added inline comments.Nov 21 2022, 9:32 AM
llvm/lib/Support/Signals.cpp
141

Thanks! I hate falling into this kind of trap, I really should remember return StringRef is like returning a pointer...

158–175

@clayborg , I think you might have read the code incorrectly:

  1. The code actually only deals with executable itself being symlink not its parent.
  2. The code only adds an extra search path. And sys::findProgramByName would use paths in priority order -- if it finds llvm-symbolizer in the first path it would try the second one (resolved symlink real path). Actually, there is indeed a bug due to refactoring -- I accidentally passed Parent instead "paths". I will fix that.

So it should do the logic what you suggested.

Address review comments

@labath, thanks. I wish I can control the symlink deployment process, but unfortunately I can't. I think it is generally preferable to make lldb crash handler deal correctly with it.
I agree we do not want to regress any of existing behavior. That's why the patch only adds an extra search path for sys::findProgramByName which will prefer the executable parent first and fallback to try the second added path (real path from symlink) if failed to find in first path.

This looks good to me now. This isn't my area of expertise though, I will add some people from the "git blame" to this revision to see what they think.

clayborg requested changes to this revision.Nov 21 2022, 11:40 AM

Need to fix the scoping of "Resolved" as mentioned in the inline comments.

llvm/lib/Support/Signals.cpp
166

Actually you need to move this out of the "if" statement so it lives long enough for the call to sys::findProgramByName() below.

This revision now requires changes to proceed.Nov 21 2022, 11:40 AM

Any chance/ability to test this?

aganea added inline comments.Nov 21 2022, 12:41 PM
llvm/lib/Support/Signals.cpp
165

I feel adding the paths variable is adding complexity for future readers. Also, do we really need is_symlink_file()? Can we just call real_path()?
What about just this: (and keep the comment)

if (!LLVMSymbolizerPathOrErr) {
   llvm::SmallString<128> Resolved;
   if (!llvm::sys::fs::real_path(Argv0, Resolved)) {
     Parent = llvm::sys::path::parent_path(Resolved);
     if (!Parent.empty())
       LLVMSymbolizerPathOrErr = sys::findProgramByName("llvm-symbolizer", Parent);
   }
}

@dblaikie, I did not find any existing test for it. Do you have any suggestion/example how to test this? We need to introduce a crash using this API and get the output and check the symbolication.

llvm/lib/Support/Signals.cpp
165

is_symlink_file() underlying uses stat() API to check symlink while real_path I assume is more expensive. Also, semantic wise, I do not think they equal -- if Argv0 is not a symlink, we do not want to retry sys::findProgramByName but real_path can't tell us about this fact.

I can refactor the code to retry sys::findProgramByName if the first try failed instead of relying on the paths ordering though.

yinghuitan added inline comments.Nov 21 2022, 3:04 PM
llvm/lib/Support/Signals.cpp
165

Sorry, if this is not clear. What I mean is, for non-symlink path, llvm::sys::fs::real_path() wouldn't fail, so we will unnecessarily redo sys::findProgramByName which we shouldn't.

Address review comment

@dblaikie, I did not find any existing test for it. Do you have any suggestion/example how to test this? We need to introduce a crash using this API and get the output and check the symbolication.

Yeah, not sure - I'd try replacing all this code with an assert(false) and running lldb's tests to see if anything fails - if not, might be we just assume it's too hard to test/not easy to introduce an intentional crash, or test the crash recovery in isolation - which is unfortunate, but wouldn't be totally surprising.

Changes look good, thanks.

@dblaikie, I did not find any existing test for it. Do you have any suggestion/example how to test this? We need to introduce a crash using this API and get the output and check the symbolication.

Yeah, not sure - I'd try replacing all this code with an assert(false) and running lldb's tests to see if anything fails - if not, might be we just assume it's too hard to test/not easy to introduce an intentional crash, or test the crash recovery in isolation - which is unfortunate, but wouldn't be totally surprising.

@yinghuitan I think you can take your inspiration from llvm/unittests/Support/CrashRecoveryTest.cpp, L155, TEST(CrashRecoveryTest, UnixCRCReturnCode) and one of the tests in llvm/unittests/Support/Path.cpp. The test would have to be Unix-only, there's no reliable/safe way to test symlinks on Windows.