Page MenuHomePhabricator

[llvm-objdump] Implement --prefix option
ClosedPublic

Authored by tinti on Jul 31 2020, 5:26 AM.

Details

Summary

[llvm-objdump] Implement --prefix option

The prefix given to --prefix will be added to GNU absolute paths when
used with --source option (source interleaved with the disassembly).

This matches GNU's objdump behavior.

GNU and C++17 rules for absolute paths are different.

Depends on D87667.
Fixes PR46368.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
jhenderson added inline comments.Aug 19 2020, 2:01 AM
llvm/docs/CommandGuide/llvm-objdump.rst
172
llvm/test/tools/llvm-objdump/X86/source-interleave-absolute-paths.test
1 ↗(On Diff #285905)

Please use double coment markers for true comments (as opposed to RUN/CHECK etc lines), i.e. ;;. This helps them stand out from the actual functional lines of the test. Older tests may not follow this approach, but we can for newer ones.

11–12 ↗(On Diff #285905)

No need for a double blank line here and below - just use a single one and make the comments stand out more as suggested above.

15 ↗(On Diff #285905)

Hmmm... this is interesting. With llvm-objdump, this test case fails for Windows. However, if I replace llvm-objdump with GNU objdump and run the same invocation, it works. I dug into this a little. The problem is with the meaning of "absolute paths" for Windows - it's ambiguous. C++17's filesystem::is_absolute, whose behaviour is matched by sys::path::is_absolute essentially says that the path must be unambiguous (i.e. that there must be no drive letter for Windows). However, other languages, e.g. Python, treat a path as absolute with or without the drive letter. GNU objdump appears to follow the Python etc approach, so '/Inputs\foo' is treated as an absolute path and is prefixed.

I think we need extra logic to get this to work on Windows, to ensure GNU compatibility. Basically, you need to identify the case where a path is absolute within the current drive (i.e. starts with '/', though it might be better, if possible to rely on the underlying functions to do that), and treat that as absolute. What you can't do is modify the behaviour of sys::path::is_absolute itself.

25 ↗(On Diff #285905)
28–29 ↗(On Diff #285905)

You can avoid the duplicate check pattern here, since the output should be the same as earlier.

2 ↗(On Diff #283794)

I've tested, and it doesn't work - see below for why.

Sorry for the delay in getting back to you - I wasn't working on Monday and had a busy day yesterday.

tinti marked an inline comment as done.Aug 24 2020, 2:22 PM
tinti added inline comments.
llvm/test/tools/llvm-objdump/X86/source-interleave-absolute-paths.test
15 ↗(On Diff #285905)

I did a small test.

bool is_absolute2(const llvm::Twine &path, llvm::sys::path::Style style) {
  if (style == llvm::sys::path::Style::windows) {
    llvm::SmallString<128> path_storage;
    llvm::StringRef p = path.toStringRef(path_storage);
    
    if (p.front() == '/' || p.front() == '\\')
      return true;

    if (p.size() >= 2 && (p[0] && p[1] == ':'))
      return true;
  }

  return llvm::sys::path::is_absolute(path, style); 
}

bool is_separator2(char value, llvm::sys::path::Style style) {
  // not needed
  return llvm::sys::path::is_separator(value, style);
}

...

  std::string test1("/input/foo.c");
  std::string test2("c:/input/foo.c");
  std::string test3("\\\\input\\foo.c");
  std::string test4("c:/");
  std::string test5("c:");
  std::string test6("//");
  std::string test7("\\");
  std::string test8(":");

  std::vector<std::string> tests;
  tests.push_back(test1);
  tests.push_back(test2);
  tests.push_back(test3);
  tests.push_back(test4);
  tests.push_back(test5);
  tests.push_back(test6);
  tests.push_back(test7);
  tests.push_back(test8);

  for (auto it = tests.begin(); it < tests.end(); ++it) {
    llvm::outs() << "is_absolute?(\"" << *it << "\")" << "\n";
    llvm::outs() << "  gnu::windows:   " << IS_DOS_ABSOLUTE_PATH(it->c_str()) << "\n";
    llvm::outs() << "  llvm::windows:  " << llvm::sys::path::is_absolute(*it, llvm::sys::path::Style::windows) << "\n";
    llvm::outs() << "  llvm2::windows: " << is_absolute2(*it, llvm::sys::path::Style::windows) << "\n";

    llvm::outs() << "  gnu::posix:     " << IS_UNIX_ABSOLUTE_PATH(it->c_str()) << "\n";
    llvm::outs() << "  llvm::posix:    " << llvm::sys::path::is_absolute(*it, llvm::sys::path::Style::posix) << "\n";
    llvm::outs() << "  llvm2::posix:   " << is_absolute2(*it, llvm::sys::path::Style::posix) << "\n";
    
    llvm::outs() << "is_separator?(\"" << *it << "\")" << "\n";
    llvm::outs() << "  gnu::windows:   " << IS_DOS_DIR_SEPARATOR(it->back()) << "\n";
    llvm::outs() << "  llvm::windows:  " << llvm::sys::path::is_separator(it->back(), llvm::sys::path::Style::windows) << "\n";
    llvm::outs() << "  llvm2::windows: " << is_separator2(it->back(), llvm::sys::path::Style::windows) << "\n";

    llvm::outs() << "  gnu::posix:     " << IS_UNIX_DIR_SEPARATOR(it->back()) << "\n";
    llvm::outs() << "  llvm::posix:    " << llvm::sys::path::is_separator(it->back(), llvm::sys::path::Style::posix) << "\n";
    llvm::outs() << "  llvm2::posix:   " << is_separator2(it->back(), llvm::sys::path::Style::posix) << "\n";
    llvm::outs() << "\n";
  }

The output is:

is_absolute?("/input/foo.c")
  gnu::windows:   1
  llvm::windows:  0
  llvm2::windows: 1
  gnu::posix:     1
  llvm::posix:    1
  llvm2::posix:   1
is_separator?("/input/foo.c")
  gnu::windows:   0
  llvm::windows:  0
  llvm2::windows: 0
  gnu::posix:     0
  llvm::posix:    0
  llvm2::posix:   0

is_absolute?("c:/input/foo.c")
  gnu::windows:   1
  llvm::windows:  1
  llvm2::windows: 1
  gnu::posix:     0
  llvm::posix:    0
  llvm2::posix:   0
is_separator?("c:/input/foo.c")
  gnu::windows:   0
  llvm::windows:  0
  llvm2::windows: 0
  gnu::posix:     0
  llvm::posix:    0
  llvm2::posix:   0

is_absolute?("\\input\foo.c")
  gnu::windows:   1
  llvm::windows:  1
  llvm2::windows: 1
  gnu::posix:     0
  llvm::posix:    0
  llvm2::posix:   0
is_separator?("\\input\foo.c")
  gnu::windows:   0
  llvm::windows:  0
  llvm2::windows: 0
  gnu::posix:     0
  llvm::posix:    0
  llvm2::posix:   0

is_absolute?("c:/")
  gnu::windows:   1
  llvm::windows:  1
  llvm2::windows: 1
  gnu::posix:     0
  llvm::posix:    0
  llvm2::posix:   0
is_separator?("c:/")
  gnu::windows:   1
  llvm::windows:  1
  llvm2::windows: 1
  gnu::posix:     1
  llvm::posix:    1
  llvm2::posix:   1

is_absolute?("c:")
  gnu::windows:   1
  llvm::windows:  0
  llvm2::windows: 1
  gnu::posix:     0
  llvm::posix:    0
  llvm2::posix:   0
is_separator?("c:")
  gnu::windows:   0
  llvm::windows:  0
  llvm2::windows: 0
  gnu::posix:     0
  llvm::posix:    0
  llvm2::posix:   0

is_absolute?("//")
  gnu::windows:   1
  llvm::windows:  0
  llvm2::windows: 1
  gnu::posix:     1
  llvm::posix:    1
  llvm2::posix:   1
is_separator?("//")
  gnu::windows:   1
  llvm::windows:  1
  llvm2::windows: 1
  gnu::posix:     1
  llvm::posix:    1
  llvm2::posix:   1

is_absolute?("\")
  gnu::windows:   1
  llvm::windows:  0
  llvm2::windows: 1
  gnu::posix:     0
  llvm::posix:    0
  llvm2::posix:   0
is_separator?("\")
  gnu::windows:   1
  llvm::windows:  1
  llvm2::windows: 1
  gnu::posix:     0
  llvm::posix:    0
  llvm2::posix:   0

is_absolute?(":")
  gnu::windows:   0
  llvm::windows:  0
  llvm2::windows: 0
  gnu::posix:     0
  llvm::posix:    0
  llvm2::posix:   0
is_separator?(":")
  gnu::windows:   0
  llvm::windows:  0
  llvm2::windows: 0
  gnu::posix:     0
  llvm::posix:    0
  llvm2::posix:   0

GNU and LLVM don't agree with /input/foo.c, c:, / and \. The is_absolute2 function agrees with GNU. I believe it is enough for this patch.

Working on it!

tinti updated this revision to Diff 287538.Aug 24 2020, 5:56 PM
tinti marked 6 inline comments as done.

Use ;; for comments.
Remove double blanks.
Remove unsupported for Windows [1].
Add helper function of objdump::is_absolute().
Remove duplicated checks.
Rename TRAILING check.

[1] Not tested.

llvm/test/tools/llvm-objdump/X86/source-interleave-absolute-paths.test
15 ↗(On Diff #285905)

I believe this time it will work. I have added a helper function bool objdumpIsAbsolute(const Twine &path, sys::path::Style style = sys::path::Style::native); at llvm-objdump.cpp. At first I have added it on llvm-objdump.h but I changed now. Should I keep it this way?

About Windows, would you mind to test again?

25 ↗(On Diff #285905)

Ok.

At least it appears that GNU's and LLVM's objdump agree on is_separator.

28–29 ↗(On Diff #285905)

I think I understand now what you were trying to say in the last replies.

I have changed to CHECK-MISS-PREFIX-FIX which produces the same output as the removed CHECK-TRAILING1.

jhenderson added inline comments.Aug 25 2020, 1:36 AM
llvm/docs/CommandGuide/llvm-objdump.rst
172

Ping this comment.

llvm/test/tools/llvm-objdump/X86/source-interleave-absolute-paths.test
7 ↗(On Diff #287538)

I'm not sure this or the above similar comment line are useful - this test doesn't have any control over how source-interleave.ll is generated, so the comment could become misleading. It's clear from hthe sed line that you are using the source-interleave.ll input too. No need to restate it.

29 ↗(On Diff #287538)

You don't need CHECK-TRAILING. You can use CHECK-BROKEN-PREFIX with -D ROOT=''.

15 ↗(On Diff #285905)

In general terms, put the function near to where it is actually used. Where you've got it is fine.

llvm/tools/llvm-objdump/llvm-objdump.cpp
424

We tend to refer to the GNU tools as "GNU objdump", "GNU objcopy" etc (without the "'s")

427–428
429–430

I'm not sure this last sentence makes sense to me. Maybe you can just delete it.

432

Please change all your variable names in this function to use LLVM naming styles. See the Coding Standards for the details.

433

You never specify the style parameter so unsurprisingly, the check for style == sys::path::Style::windows doesn't work, and the test still fails on Windows. You can see this by looking at the test failures listed by the pre-merge checks near the top of this page. You might want a call to real_style.

438

I think this will crash if LineInfo.FileName is empty. This might be caused by a malformed input, but we shouldn't crash still.

You also should consider crafting a test for this case.

Also, use sys::path::is_separator instead of directly querying the character.

tinti marked 4 inline comments as done.Aug 25 2020, 9:13 AM
tinti added inline comments.
llvm/test/tools/llvm-objdump/X86/source-interleave-absolute-paths.test
29 ↗(On Diff #287538)

In this case I believe I need.

CHECK-BROKEN-PREFIX uses %t2.
CHECK-TRAILING uses %t.

llvm/tools/llvm-objdump/llvm-objdump.cpp
433

You are correct. I miss understood the meaning of Style::native. I thought it would be set to either Style::posix or Style::native.

Unfortunately real_style is restricted to https://github.com/llvm-mirror/llvm/blob/master/lib/Support/Path.cpp#L39.

Thinking about it I can:

  • Add a #ifdef _WIN32 in llvm-objdump. I am really not willing to go with this one but I have seen in other tools such as llvm-objcopy [1].
  • Make real_style public or be able to query the style outside llvm::sys::path::* with a helper function. Doesn't look great also. It seems to be designed this way to avoid such queries.
  • Try another way to answer "was I built on Windows?"

LLDB has this issue too since GDB does use IS_DIR_ABSOLUTE [3]. It solves it in several ways (usage [4] and [5]):

With all this in mind I think I would go with Triple(sys::getProcessTriple()).isOSWindows() implementation. Correct me if I am wrong but on GNU you can only have one binary for one target whereas LLVM one may have one binary to many targets (I don't know if this applies here).

Taking into account CROSS-COMPILING between Linux and Windows: My guess would be to behave based on the running target rather than the compile target.

Out of ideas. It is becoming a bit complex.

Two more things. Should I care about __CYGWIN__, __MSDOS__, __OS2__? GNU tools care [2].

I would answer no. Maybe only care for __CYGWIN__.

git grep __CYGWIN__ | wc -l
43
git grep __MSDOS__ | wc -l
3
git grep __OS2__ | wc -l
3

Should I use isAlpha() like LLDB? There are some restriction about <drive letter>: format [6].

[1] See https://github.com/llvm/llvm-project/commit/6dc59629570ae6fe1836cff3ff53912917d17af7#diff-a074a8872d4fe193550d7a651ea18cabR229.
[2] https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=include/filenames.h;hb=1ab8d928977f4d9f137f972d03e079555d0f29fa#l35
[3] https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/buildsym.c;hb=1ab8d928977f4d9f137f972d03e079555d0f29fa#l519
[4] https://github.com/llvm/llvm-project/blob/e22f0dabcf976668d35595e17645095e97a8177a/lldb/source/Plugins/SymbolFile/Breakpad/SymbolFileBreakpad.cpp#L653.
[5] https://github.com/llvm/llvm-project/blob/bc9b6b33a0d5760370e72ae06c520c25aa8d61c6/lldb/source/Utility/FileSpec.cpp#L59
[6] https://en.wikipedia.org/wiki/Drive_letter_assignment#Common_assignments

tinti updated this revision to Diff 287689.Aug 25 2020, 9:16 AM
tinti marked an inline comment as done.

Just adding my work so far.

tinti added a subscriber: lhames.Aug 25 2020, 7:34 PM
tinti added inline comments.
llvm/tools/llvm-objdump/llvm-objdump.cpp
433

Even more complex: https://reviews.llvm.org/D34446

Please see @lhames last comment.

tinti updated this revision to Diff 287815.Aug 25 2020, 7:35 PM

Work done so far.

Having given it some thought, I think the right thing to do might be to put a method alongside the existing is_absolute method called something like either is_rooted, or is_gnu_style_absolute or similar. That function would then look something like this:

bool is_rooted(StringRef Path, Style St) {
  if (is_absolute(Path, St))
    return true;
  if (real_style(St) == windows)
    return !Path.empty() && is_separator(Path.front(), St);
  return false;
}

I would probably recommend that that be added in a separate patch, with dedicated gtest unit tests and reviewers taken from among those who have recently worked in this area (plus myself). You would then rebase this patch on top of that one.

llvm/test/tools/llvm-objdump/X86/source-interleave-absolute-paths.test
32 ↗(On Diff #287815)

Nit: please remember to end comments with a '.'.

29 ↗(On Diff #287538)

CHECK-BROKEN-PREFIX and CHECK-TRAILING do NOT use %t or %t2. What do you think the -DFILE bit of the FileCheck invocation does?

Please take this in the well-meaning way it's intended, but I'd highly recommend you read through the FileCheck documentation at https://llvm.org/docs/CommandGuide/FileCheck.html. It seems that you may have several misunderstandings about how FileCheck works.

llvm/tools/llvm-objdump/llvm-objdump.cpp
2977–2980

I don't think you need this additional complexity. is_separator uses native style by default, which should be sufficient for our purposes.

tinti added a comment.Aug 26 2020, 4:31 PM

Having given it some thought, I think the right thing to do might be to put a method alongside the existing is_absolute method called something like either is_rooted, or is_gnu_style_absolute or similar. That function would then look something like this:

bool is_rooted(StringRef Path, Style St) {
  if (is_absolute(Path, St))
    return true;
  if (real_style(St) == windows)
    return !Path.empty() && is_separator(Path.front(), St);
  return false;
}

I would probably recommend that that be added in a separate patch, with dedicated gtest unit tests and reviewers taken from among those who have recently worked in this area (plus myself). You would then rebase this patch on top of that one.

Ok. I will be working on that.

I prefer using is_gnu_style_absolute. It makes clear that this function was created for compatibility reasons.

llvm/test/tools/llvm-objdump/X86/source-interleave-absolute-paths.test
32 ↗(On Diff #287815)

Ok.

29 ↗(On Diff #287538)

You can be crystal clear with me.

I red again the docs. Now it is more clear what you were saying.

There is no need to use CHECK-TRAILING: warning: ... because I can reuse CHECK-BROKEN-PREFIX: warning: ... with different parameters from -D.

My understanding was wrong. Given a RUN command, I thought FileCheck would only look for CHECKs after it.

llvm/tools/llvm-objdump/llvm-objdump.cpp
2977–2980

Ok.

jhenderson added inline comments.Aug 27 2020, 12:26 AM
llvm/test/tools/llvm-objdump/X86/source-interleave-absolute-paths.test
29 ↗(On Diff #287538)

Yes, exactly. Perhaps the best way to think about it is to imagine all the CHECK patterns in a separate file. FileCheck is just another process, which happens to take an input file to parse patterns from, and another input (via stdin or --input-file), which contains the text to check. By convention, the same file is used for the lit directives and the FileCheck patterns (hence '%s' in the FileCheck invocation, which means "this file"), but the lit directives (e.g. RUN lines) and FileCheck directives (CHECK lines) are actually completely unrelated and parsed by two separate processes, and thus have no bearing on one another.

I'm not familiar with that testing, sorry. You'll need to get someone else who knows that area to answer you (based on what is already there, I'm guessing no, since the existing ones don't already, but I don't really know).

tinti marked 9 inline comments as done.Mon, Oct 5, 6:44 PM
tinti marked an inline comment as done.Mon, Oct 5, 7:49 PM

Hi @jhenderson,

I believe I should create more tests addressing:

  • On Windows trailing '\' should be removed.
  • On Posix trailing '\' should be kept.

As far as I know I would need to duplicate source-interleave-absolute-paths.test and add REQUIRES: x (or UNSUPPORTED: x).

Is there a better way of doing it?

Hi @jhenderson,

I believe I should create more tests addressing:

  • On Windows trailing '\' should be removed.
  • On Posix trailing '\' should be kept.

As far as I know I would need to duplicate source-interleave-absolute-paths.test and add REQUIRES: x (or UNSUPPORTED: x).

Is there a better way of doing it?

I believe where there's a difference in behaviour based on the OS, you should have two separate tests, which use REQUIRES and UNSUPPORTED accordingly. I'm not aware of a way within a single test to customise what is checked for otherwise.

tinti updated this revision to Diff 296828.Wed, Oct 7, 4:29 PM
tinti edited the summary of this revision. (Show Details)

Add specific tests.
Reword.

jhenderson added inline comments.Thu, Oct 8, 1:26 AM
llvm/docs/CommandGuide/llvm-objdump.rst
173

I don't think you need to mention GNU here. In general the concept of "absolute paths" is a little under-defined, plus people will likely expect the behaviour to match GNU.

llvm/docs/llvm-objdump.1
111

Ditto.

llvm/test/tools/llvm-objdump/X86/source-interleave-absolute-paths-all.test
1 ↗(On Diff #296828)

Maybe just drop "-all" from the test name. In fact, the test probably just wants to be named something like "prefix.test" or possibly "source-interleave-prefix.test" to show it tests the prefix option (in general, tests should be named after the option they are testing). This second point applies to all the new tests.

3–10 ↗(On Diff #296828)

I'd prefer it if you created these inputs immediately before their first use rather than early on. It would also be good if you named the output files according to the property they are trying to show.

llvm/test/tools/llvm-objdump/X86/source-interleave-absolute-paths-non-windows.test
1 ↗(On Diff #296828)

This comment is ambiguous - it could imply behaviour that is not specific to Windows, which would also apply to the generic tests in the -all test.

llvm/test/tools/llvm-objdump/X86/source-interleave-absolute-paths-windows.test
1 ↗(On Diff #282782)

Tests which only run on Windows should be added very carefully. Many developers don't have Windows testing method. When they do refactoring, they can easily neglect Windows-only tests. /\ has some pain but you can get round with {{[/\\]}}

I strongly disagree with making a test more general so that it isn't dependent on Windows behaviour - if we did that, we'd have no testing for Windows-specific behaviour which isn't good. I personally don't usually test my patches on Linux, as I almost exclusively develop on Windows. Does that mean I shouldn't write tests that are Linux-specific?

There are pre-merge bots that report test failures. Developers can and should rely on these to catch places where they aren't able to properly test things themselves locally.

To me, this test looks good.

llvm/tools/llvm-objdump/llvm-objdump.cpp
357

Same as for the documentation.

tinti marked 6 inline comments as done.Thu, Oct 8, 5:36 AM
tinti added inline comments.
llvm/test/tools/llvm-objdump/X86/source-interleave-absolute-paths-all.test
3–10 ↗(On Diff #296828)

Good point.

llvm/test/tools/llvm-objdump/X86/source-interleave-absolute-paths-non-windows.test
1 ↗(On Diff #296828)

I am struggling to find a good comment here.

Can I use 'POSIX' or 'unix-like' behavior? If yes, I would also rename the properly.

tinti updated this revision to Diff 296935.Thu, Oct 8, 5:43 AM
tinti marked 2 inline comments as done.

Reword comments.
Substitute CHECK-TRAILING.
Remove extra sed pipe.
Add test for relative paths.

jhenderson accepted this revision.Thu, Oct 8, 6:09 AM

LGTM, aside from some smaller formatting bits, but please wait for others.

llvm/test/tools/llvm-objdump/X86/source-interleave-prefix.test
4
20
32–33

Here and elsewhere, my personal preference is to format it as in the suggested edit. The motivation is 1) The first line clearly indicates that there is no more of the specific command indicated by that line, whilst 2) the indentation on the second line shows the line is a continuation of the previous line.

tinti updated this revision to Diff 296952.Thu, Oct 8, 6:35 AM
tinti marked 3 inline comments as done.

Reword.
Use James suggestions.

@MaskRay are you happy with the latest changes? If so, please change your review. Thanks!

MaskRay added inline comments.Thu, Oct 8, 12:02 PM
llvm/test/tools/llvm-objdump/X86/source-interleave-prefix.test
9

Since you have used = as the separator of a long option, consider changing --check-prefix to --check-prefix=

llvm/tools/llvm-objdump/llvm-objdump.cpp
1039

The two if can be merged

llvm/tools/llvm-objdump/llvm-objdump.h
52

Sort in an alphabetical order.

tinti updated this revision to Diff 297164.Fri, Oct 9, 2:45 AM
tinti marked 3 inline comments as done.

Concise use of = in tests.
Merge ifs.
Reorder alphabetically the options.

rengolin accepted this revision.Fri, Oct 16, 6:32 AM

@MaskRay ping?

You seem to have addressed his comments, and they were all in non-functional areas (comments, formatting, order), so, LGTM. Thanks!

MaskRay accepted this revision.Fri, Oct 16, 8:39 AM
This revision is now accepted and ready to land.Fri, Oct 16, 8:39 AM
This revision was automatically updated to reflect the committed changes.

This breaks tests on Windows: http://45.33.8.238/win/25999/step_11.txt

PTAL, and revert for now if it takes a while to fix.

Looks like you maybe need to replace a few / with {{[/\\]}}.

@MaskRay you seem to have fixed partially, and @thakis solution is better than what I did. To avoid a similar clash, I'll revert my change and if it breaks again, you can fix it directly.

thanks!

tinti added a comment.Fri, Oct 16, 1:53 PM

Thanks @MaskRay and @thakis.

@thakis I believe your solution will fix the issue.

I will be testing again.