This is an archive of the discontinued LLVM Phabricator instance.

[llvm-readobj][MachO] Add option to sort the symbol table before dumping (MachO only, for now).
ClosedPublic

Authored by oontvoo on Jan 6 2022, 7:41 PM.

Details

Summary

This would help making tests less brittle as the order will be fixed.

(see also PR/53026)

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
jhenderson added inline comments.Jan 10 2022, 12:22 AM
llvm/test/tools/llvm-readobj/MachO/stabs.yaml
56 ↗(On Diff #398213)

It's probably worth pulling this into a new test file entirely, so that you can have additional cases that trigger different tie-breaking behaviours.

llvm/tools/llvm-readobj/MachODumper.cpp
633–644
661–662

These should be const &, right? Otherwise, you end up copying the tuple contents.

669

This isn't a bug, merely a possible improvement.

673

Note that if you move this ListScope earlier, outside the if, you a) avoid the duplication with the else, and b), shouldn't need the + 1 I mentioned in my earlier comment about indentation.

llvm/tools/llvm-readobj/Opts.td
47
  1. "displaying" is the term used for other options.
  2. This option is (currently) Mach-O specific. Unless you plan on implementing it for other formats too, please move it to the Mach-O specific options block.
  3. These options are listed alphabetically within each block. Please maintain that order.
  4. Please remember to update the documentation for llvm-readobj (and llvm-readelf, if you are planning on this being a generic option), located at llvm/docs/CommandGuide.
  5. If this is going to be Mach-O specific (for now), I'd name the variable name accordingly (i.e. something like macho_sort_symbols. Also, it will need to have the grp_macho Group, like the other Mach-O specific options.
llvm/tools/llvm-readobj/llvm-readobj.cpp
111

Noting that this change isn't needed if you drop the reference to JSON output style from elsewhere in this patch.

280

This option is currently Mach-O specific, so move it accordingly, unless you plan on implementing other formats. Also, these lists are in alphabetical order. Please maintain that order.

llvm/tools/llvm-readobj/llvm-readobj.h
45

Noting that this change isn't needed if you drop the reference to JSON output style from elsewhere in this patch.

oontvoo marked 12 inline comments as done.Jan 10 2022, 10:02 AM
oontvoo added inline comments.
llvm/tools/llvm-readobj/MachODumper.cpp
645–649

Thanks! That does look better!

673

I initially moved it here (closer to where it is used) for clarity. Anyways, i can move it back.

llvm/tools/llvm-readobj/llvm-readobj.cpp
280

Yes, I plan to add that to at least, ELF - but dont want to do it in one patch. (esp. since the ELF-dumper is a bit more complex)
Fixed the ordering, though.

oontvoo updated this revision to Diff 398685.Jan 10 2022, 10:02 AM
oontvoo marked 2 inline comments as done.

addressed review comments

oontvoo updated this revision to Diff 398691.Jan 10 2022, 10:29 AM

undo changes in stabs.yaml

jhenderson added inline comments.Jan 11 2022, 1:27 AM
llvm/test/tools/llvm-readobj/MachO/stabs-sorted.yaml
2

I'd name this file sort-symbols.yaml to match the option name. Also, is it really relevant that "stabs" are mentioned in the comment? Are all symbols stabs? If not, do non-stab symbols get sorted too (note: not a Mach-O developer, so I may be talking nonsense!).

5

As you've now only got one CHECK pattern in this test, you can delete the --check-prefix option and just use CHECK: and CHECK-NEXT below.

However, as you're also playing around with whitespace, I'd recommend adding --strict-whitespace and --match-full-lines to the FileCheck command - this will ensure that all whitespace on all lines after the CHECK: must exactly match that in the output. Take a look at some other examples in other tests.

55

As you've now got a separate YAML, I'd change your symbol names to emphasise the differences, rather than being basically unrelated cruft copied over from the old test.

llvm/tools/llvm-readobj/MachODumper.cpp
638

I wouldn't bother with the assert here: none of Mach-O output expects JSON format, so adding an assert in one place makes it look like it needs sorting here, but not everywhere else.

670

I'm not clear on whether this unindent needs undoing or not after this. Probably you should manually inspect the output, both for multiple symbols, and for multiple operations, where symbol dumping happens before other output.

llvm/tools/llvm-readobj/Opts.td
47

Pinging the points in this comment (specifically the inline edit, and points 3 and 4).

oontvoo updated this revision to Diff 399423.Jan 12 2022, 12:41 PM
oontvoo marked 5 inline comments as done.

Addressed reviews comment:

  • expect strict white-spaces in test
  • add docs
  • keep the option in alphabetical order (also fix the text s/outputting/displaying)
  • removed unneeded assert and restore indent
llvm/test/tools/llvm-readobj/MachO/stabs-sorted.yaml
2

No, technically, not all symbols are STABS symbols .
( but non-stabs symbols wouldn't be printed here, so ... ) I was just trying to keep the names consistent with the other file (since both of these tested that STABS symbols can be dumped correctly).

55

Actually the names are quite realistic and are representative enough for what I wanted to test. I'm not really seeing why they need to change.

jhenderson added inline comments.Jan 13 2022, 12:29 AM
llvm/docs/CommandGuide/llvm-readobj.rst
111
llvm/test/tools/llvm-readobj/MachO/stabs-sorted.yaml
55

It's more about clarity of test. By using "realistic" symbol names, you're actually making it a little harder to see what is important in the testing, as people may just assume they are cruft leftover from how the test input was generated. On the other hand, if you used names like "a", "b", "c" etc, it would be very obvious if they are/are not sorted.

llvm/tools/llvm-readobj/Opts.td
40
oontvoo updated this revision to Diff 400047.Jan 14 2022, 9:26 AM
oontvoo marked 3 inline comments as done.

addressed review comments

llvm/test/tools/llvm-readobj/MachO/stabs-sorted.yaml
55

done - renamed the symbols and added a few more

@jhenderson : Do you have any other thought on this? Thanks! :)

@jhenderson : Do you have any other thought on this? Thanks! :)

Apologies for the delay, I was on paternity leave for 2 weeks, and am only now getting back to reviews.

Sorry I didn't spot this earlier: it's not obvious to me that --sort-symbols sorts by type. I'd expect it to sort by name, probably. I'm okay with a sort option, but I think we need to reconsider the UI, especially if you are planning on rolling out this option to other formats. There may come a point in the future that people want to sort by other fields. I don't think we need to support this right away, but we could name the switch something flexible enough. A couple of ideas:

  1. Simplest: --sort-symbols-by-type. Just a rename of the switch you've implemented here.
  2. More complicated, but perhaps better UI? --sort-symbols=type, allowing future extensibility i.e. the ability to add e.g. --sort-symbols=name or --sort-symbols=size at some point.

Thoughts?

llvm/test/tools/llvm-readobj/MachO/stabs-sorted.yaml
55

I missed the bit about the sorting being done by n_type (I assumed it was based on name). Sorry for the noise, but I suggest you change the names again, to name them after the n_type field they are for, e.g. "_section1", "_section2", "_symDebugTable1" etc.

@jhenderson : Do you have any other thought on this? Thanks! :)

Apologies for the delay, I was on paternity leave for 2 weeks, and am only now getting back to reviews.

no worries!

Sorry I didn't spot this earlier: it's not obvious to me that --sort-symbols sorts by type. I'd expect it to sort by name, probably. I'm okay with a sort option, but I think we need to reconsider the UI, especially if you are planning on rolling out this option to other formats.

Ah, right. This sorts by both name AND type. (type first, then name amongst the group of symbols of the same type).

There may come a point in the future that people want to sort by other fields. I don't think we need to support this right away, but we could name the switch something flexible enough. A couple of ideas:

  1. Simplest: --sort-symbols-by-type. Just a rename of the switch you've implemented here.
  2. More complicated, but perhaps better UI? --sort-symbols=type, allowing future extensibility i.e. the ability to add e.g. --sort-symbols=name or --sort-symbols=size at some point.

Thoughts?

Regarding (1): it's not entirely true that this only sorts by type.(as mentioned, it sorts by both type and name). The end goal here (for me) is to have a way to deterministically sort all the symbols. The reason I didn't go with sorting them simply by name was because maskray@ raised concerns earlier that it didn't make sense semantically (with which I agreed).

Regarding (2), how would multiple sorting types interact? eg., people specifying both --sort-symbols=name --sort-symbols=type. Does the order of the flag determine which one is the first sorting priority?

Regarding (1): it's not entirely true that this only sorts by type.(as mentioned, it sorts by both type and name). The end goal here (for me) is to have a way to deterministically sort all the symbols. The reason I didn't go with sorting them simply by name was because maskray@ raised concerns earlier that it didn't make sense semantically (with which I agreed).

Regarding (2), how would multiple sorting types interact? eg., people specifying both --sort-symbols=name --sort-symbols=type. Does the order of the flag determine which one is the first sorting priority?

Ah I missed that it sorted by name and type.

Regarding 2, I think flag order determining sort priority would be wonderful, although I'd say --sort-symbols=name,type does it like that. Maybe --sort-symbols=name --sort-symbols=type does too, or maybe it sorts by just one. I'm not sure honestly.

Thoughts @MaskRay?

MaskRay added a comment.EditedFeb 1 2022, 2:38 PM

I agree that extending --sort-symbols to --sort-symbols=<value> is useful, since people may want to support different ways.
For example GNU nm has --numeric-sort, --no-sort, --size-sort. This cannot be changed but retrospectively maybe --sort={numeric,size} is a better UI.

--sort-symbols=name --sort-symbols=type specifying multi sort keys may not be obvious. The most common UI is that the last option overrides previous ones.
--sort-symbols=name,type looks good to me to specify multi sort keys.

(I will add a note that stabs-sorted.yaml looks quite long. It'd be wonderful if creating a test has less boilerplate.
Hope someone in #lld-macho may consider this as yet another motivation to improve Mach-O yaml2obj...)

Note: it may be worth adding --sort-symbols= to llvm-objdump as well. It's good to spend some time on the design.

I agree that extending --sort-symbols to --sort-symbols=<value> is useful, since people may want to support different ways.
For example GNU nm has --numeric-sort, --no-sort, --size-sort. This cannot be changed but retrospectively maybe --sort={numeric,size} is a better UI.

--sort-symbols=name --sort-symbols=type specifying multi sort keys may not be obvious. The most common UI is that the last option overrides previous ones.
--sort-symbols=name,type looks good to me to specify multi sort keys.

Ok, SG!

Note: it may be worth adding --sort-symbols= to llvm-objdump as well. It's good to spend some time on the design.

Yep, planning on doing that that too. :)

ThankS!

oontvoo planned changes to this revision.Feb 3 2022, 6:56 AM
oontvoo updated this revision to Diff 411151.Feb 24 2022, 8:55 AM

rework the patch so that --sort-symbols take one or more keys to sort + updated tests and docs accordingly

PTAL! Thanks!

oontvoo updated this revision to Diff 411152.Feb 24 2022, 8:57 AM

minor cleanup

oontvoo added inline comments.Feb 24 2022, 9:58 AM
llvm/test/tools/llvm-readobj/MachO/stabs-sorted.yaml
55

Just to be clear, when we sort by sections, we actually sort by their encoded values (numeric values, eg, 0x2E for debug ) and not the section names. So naming "_section1", "_section2", etc, doesn't help clarifying that point.

I feel like there's an awful lot more code than is required for this patch. I stopped commenting on inidividual things partway through. I think the following outline should be sufficient. It probably isn't 100% correct, but you should be able to get the general idea from it.

// This code could all live in generic area, since this is generic behaviour.
bool compareSymName(SymbolRef LHS, SymbolRef RHS) {
  // Implementation left as an exercise for the reader. In essence:
  // return LHS.Name < RHS.Name
}

bool compareSymType(SymbolRef LHS, SymbolRef RHS) {
  // Implementation left as an exercise for the reader. In essence:
  // return LHS.Type < RHS.Type
}

class SymbolComparer {
public:
  using ComparePred = function_ref<bool(SymbolRef, SymbolRef)>;
  void add(ComparePred Pred) { Predicates.push_back(Pred); }

  bool operator()(SymbolRef LHS, SymbolRef RHS) {
    for(ComparePred Pred : Predicates) {
      if (Pred(LHS, RHS))
        return true;
      if (Pred(RHS, LHS))
        return false;
    }
    // All considered parameters are equal. This means that a SymbolComparer
    // taking an empty vector in the constructor will treat all symbols as equal.
    return false;
  }

private:
  SmallVector<ComparePred, 2> Predicates;
};

// Code in MachODumper.cpp, possibly even other files too.
void MachODumper::printSymbols(const SymbolComparer &SymCmp) {
  ListScope Group(W, "Symbols");
  auto SymbolRange = Obj->symbol();
  std::vector<SymbolRef> SortedSymbols(SymbolRange.begin(), SymbolRange.end());
  stable_sort(SortedSymbols, SymCmp);
  for (SymbolRef Symbol : SortedSymbols)
    printSymbol(Symbol);
}

// In main
SymbolComparer SymCmp;
if (Arg *A = Args.getLastArg(OPT_sort_symbols_EQ)) {
  const StringMap <SymbolComparer::ComparePred> KeyToPredMap =
    {{"name", compareSymName}, {"type", compareSymType}};
  for (StringRef KeyStr : llvm::split(A->getValue(), ",")) {
    auto Found = KeyToPredMap.find(KeyStr);
    if(Found == KeyToPredMap.end())
      error("--sort-symbols value should be 'name' or 'type', but was '" +
            Twine(KeyStr) + "'");
    else
      SymCmp.add(Found->getValue());
  }
}
llvm/docs/CommandGuide/llvm-readobj.rst
112
llvm/test/tools/llvm-readobj/MachO/stabs-sorted.yaml
10

Might be worth another case where just one of these values is specified, showing what happens in this case (e.g. just type means identical type symbols are left in their original order, or something like that).

As this test case is about whether or not the symbols are sorted, and not the formatting of the output, I'd remove the --strict-whitespace and --match-full-lines` options.

19–22

I don't think you need this block here.

24–32

Related to my above comment re. formatting, we don't need to show that the entire symbol printing is correct - that's the responsibility of a test that is testing the --symbols option rather than the --sort-symbols option. Instead, I'd just check the name and type fields. Something like this:

TYPE-NAME:      Name: _a
TYPE-NAME-NEXT: Type: Section
TYPE-NAME:      Name: _d
TYPE-NAME-NEXT: Type: Section
llvm/tools/llvm-readobj/MachODumper.cpp
95

Please only reformat the parts of the file that you've changed.

622–625

Just use a struct rather than a tuple. It'll be easier to reason with and you won't need these constants (which should be size_t anyway, since they're indexes).

634

Are you sure about this? They don't look all that temporary (as long as the object file is still in memory)...

oontvoo marked an inline comment as done.Feb 25 2022, 7:21 AM

I feel like there's an awful lot more code than is required for this patch. I stopped commenting on inidividual things partway through. I think the following outline should be sufficient. It probably isn't 100% correct, but you should be able to get the general idea from it.

// This code could all live in generic area, since this is generic behaviour.
bool compareSymName(SymbolRef LHS, SymbolRef RHS) {
  // Implementation left as an exercise for the reader. In essence:
  // return LHS.Name < RHS.Name
}

bool compareSymType(SymbolRef LHS, SymbolRef RHS) {
  // Implementation left as an exercise for the reader. In essence:
  // return LHS.Type < RHS.Type
}

class SymbolComparer {
public:
  using ComparePred = function_ref<bool(SymbolRef, SymbolRef)>;
  void add(ComparePred Pred) { Predicates.push_back(Pred); }

  bool operator()(SymbolRef LHS, SymbolRef RHS) {
    for(ComparePred Pred : Predicates) {
      if (Pred(LHS, RHS))
        return true;
      if (Pred(RHS, LHS))
        return false;
    }
    // All considered parameters are equal. This means that a SymbolComparer
    // taking an empty vector in the constructor will treat all symbols as equal.
    return false;
  }

private:
  SmallVector<ComparePred, 2> Predicates;
};

// Code in MachODumper.cpp, possibly even other files too.
void MachODumper::printSymbols(const SymbolComparer &SymCmp) {
  ListScope Group(W, "Symbols");
  auto SymbolRange = Obj->symbol();
  std::vector<SymbolRef> SortedSymbols(SymbolRange.begin(), SymbolRange.end());
  stable_sort(SortedSymbols, SymCmp);
  for (SymbolRef Symbol : SortedSymbols)
    printSymbol(Symbol);
}

// In main
SymbolComparer SymCmp;
if (Arg *A = Args.getLastArg(OPT_sort_symbols_EQ)) {
  const StringMap <SymbolComparer::ComparePred> KeyToPredMap =
    {{"name", compareSymName}, {"type", compareSymType}};
  for (StringRef KeyStr : llvm::split(A->getValue(), ",")) {
    auto Found = KeyToPredMap.find(KeyStr);
    if(Found == KeyToPredMap.end())
      error("--sort-symbols value should be 'name' or 'type', but was '" +
            Twine(KeyStr) + "'");
    else
      SymCmp.add(Found->getValue());
  }
}

I'm not sure that's much less code - the only substantial chunk of code right now is in MachOODumper.cpp, line ~602 to 722.
The rest is just one or two line of plumping the args through.

(Also there were some unrelated formatting changes when I ran clang-format ... will revert those)

llvm/test/tools/llvm-readobj/MachO/stabs-sorted.yaml
24–32

i can drop this in the other test (the NAME-TYPE one) but this was needed here to ensure the whitespace padding was done correctly.

llvm/tools/llvm-readobj/MachODumper.cpp
634

Are you sure about this? They don't look all that temporary (as long as the object file is still in memory)...

Yes - I've run the code with asan(previous version which simply stored the returned values from ->symbols() to a vector then sorted it) and got errors

I'm not sure that's much less code - the only substantial chunk of code right now is in MachOODumper.cpp, line ~602 to 722.
The rest is just one or two line of plumping the args through.

My suggested code has about 55 LOC in total, compared to over 70 in this patch just for the comparators as things stand, without even considering the other code. It's also conceptually simpler: simply store a list of functions to sort by upfront, which are akin to std::less, and therefore don't need "equals" comparison functions, and then use them directly in the sorter. This is before you consider other complications, such as the use of std::tuple and constants to look up in said tuple, or the need for enums in the argument parsing.

llvm/tools/llvm-readobj/MachODumper.cpp
634

Are you sure it was that bit of code causing the problem? I've inspected the implementation of symbols() and SymbolRef for Mach-O, and the SymbolRef in this case is just a pointer into the buffer stored by MachOObjectFile (see MachOObjectFile::getSymbolByIndex). As such, there should be no lifetime issues, as long as the object file is in existence. Please try again and analyse where the issue actually is, as it could be a bug elsewhere that this just exposed. I'm going to need more convincing than just "asan said so" that there's an issue storing a SymbolRef in a vector that has a lifetime less than that of the object file.

oontvoo planned changes to this revision.Feb 28 2022, 10:17 AM

please hold of reviewing ... i'll do some more digging and post the finding sooon

oontvoo updated this revision to Diff 417420.Mar 22 2022, 3:57 PM

Updated diff:

  • Rework patch per review request
  • Got rid of unrelated formating changes

Note: the impl of the predicates (eg., compareSymType/compareSymName) cannot be "shared" because different formats seem to have different ways to get these.

Herald added a project: Restricted Project. · View Herald TranscriptMar 22 2022, 3:57 PM
Herald added a subscriber: StephenFan. · View Herald Transcript

@jhenderson sorry this has taken awhile - was busy with other stuff - I've doubled checked the crash I was talking about - turned out it was because of getSymbolName() (completely unrelated) So yeah, you're right - storing the SymbolRef is safe. Reworked the patch per your suggestion (with some modification).
PTAL. thanks!

What happened to the CommandGuide update?

llvm/test/tools/llvm-readobj/MachO/stabs-sorted.yaml
10

Looks like you haven't addressed my test comments?

llvm/tools/llvm-readobj/MachODumper.cpp
627

Nit: don't start functions with blank lines.

I don't this is the place to be adding the predicates, because much of this code will end up being duplicated when other format support is added. Instead, I recommend moving it to just after the code that creates the ObjDumper class in llvm-readobj.cpp, and use virtual functions for the predicates, using std::bind tto handle the fact that they're member functions.

If you do that, there's no need for the sort key enum. Instead, you could delay processing the command-line option until you need it to identify which predicates to add.

llvm/tools/llvm-readobj/ObjDumper.h
41
44

Don't use final. It doesn't add any value and just makes later updates more complex.

64

We're inside the llvm namespace already.

llvm/tools/llvm-readobj/llvm-readobj.cpp
196–202

If sure about the warning, I recommend refactoring this to use the new warn function.

269
282–283

I think it would be simpler if this were an error. What's the motivation for a warning?

oontvoo updated this revision to Diff 418289.Mar 25 2022, 12:01 PM
oontvoo marked 12 inline comments as done.

addressed review comments + added additional tests

oontvoo added inline comments.Mar 25 2022, 12:04 PM
llvm/tools/llvm-readobj/MachODumper.cpp
627

If you do that, there's no need for the sort key enum. Instead, you could delay processing the command-line option until you need it to identify which predicates to add.

Implemented the virtual func parts as suggested but still keepng the enum (albeit local/static now) because it's weird to move the opts processing part outside of the parseOptions function.

llvm/tools/llvm-readobj/llvm-readobj.cpp
196–202

(removed)

282–283

(made it an error too)

jhenderson added inline comments.Mar 28 2022, 12:17 AM
llvm/test/tools/llvm-readobj/MachO/stabs-sorted.yaml
24–32

Whitespace padding isn't relevant to this test: the formatting should be tested in a basic symbol printing test, not in a test that is about the --sort-symbols option. Besides: without the --strict-whitespace and --match-full-lines FileCheck options, this doesn't actually verify the whitespace properly.

llvm/tools/llvm-readobj/MachODumper.cpp
638

We usually explicitly use llvm:: prefix for std:: algorithm variations like this, to show it's an LLVM extension we're making use of. It also facilitates find and replace, should a std:: version ever be added in the future.

jhenderson added inline comments.Mar 28 2022, 12:17 AM
llvm/tools/llvm-readobj/MachODumper.cpp
627

Yeah, I understand the concern. The downside with keeping it separate is that you have to effectively repeat the switch/case involved in option parsing in two places, which leads to ugly warts like the need for the assert(false) in the second one. Up to you though.

648

I might be mistaken, but I believe LLVM usually doesn't bother commenting out unused parameter names.

llvm/tools/llvm-readobj/ObjDumper.h
100
llvm/tools/llvm-readobj/llvm-readobj.cpp
91

UNSUPPORTED or UNKNOWN instead of UNSPEC.

360

llvm::Optional would be the more expressive form here. You could then pass it directly, rather than via the pointer, and have a None check instead of a nullptr check.

372

Here and below, I don't think you need the trailing return types.

381

Use case UNSPEC (renamed according to my earlier comment) here, instead of default, to take advantage of compiler warnings about not all cases being filled. See also https://llvm.org/docs/CodingStandards.html#don-t-use-default-labels-in-fully-covered-switches-over-enumerations.

382

Use llvm_unreachable rather than assert(false).

387
  1. Error and warning messages shouldn't end with a "."
  2. I have concerns here: this will stop dumping as soon as an unsupported file type is encountered (even in an archive), yet that is unlikely what the user wants to happen. They more likely want to continue dumping and just not sort in this case (imagine if they had mutliple different file format objects in their input). This should be at most a warning (including the input file name), although I could even see an argument for not having it at all. It also needs testing.
oontvoo marked 8 inline comments as done.Mar 28 2022, 8:09 AM
oontvoo added inline comments.
llvm/test/tools/llvm-readobj/MachO/stabs-sorted.yaml
24–32

the comment about white was out of date (no longer needed)

llvm/tools/llvm-readobj/MachODumper.cpp
627

(ack'ed - keeping this as is for now until other format started implementing the option)

llvm/tools/llvm-readobj/llvm-readobj.cpp
387
  1. Fixed
  2. If the format doesn't support --sort-symbols then the users shouldn't specify --sort-symbols. (Keeping it as error for consistency - similarly to how it handles unknown sort-key above, which is that if it doesn't understand it, then it's an error, rather than trying to guess. This file also doesn't have precedence for "warning")
oontvoo updated this revision to Diff 418592.Mar 28 2022, 8:09 AM
oontvoo marked 2 inline comments as done.

addressed review comment

jhenderson added inline comments.Mar 28 2022, 8:21 AM
llvm/test/tools/llvm-readobj/MachO/stabs-sorted.yaml
24–32

You can still simplify the checks as described in my original comment.

llvm/tools/llvm-readobj/MachODumper.cpp
648

Unaddressed but marked as done?

llvm/tools/llvm-readobj/llvm-readobj.cpp
360

I was mistaken to put the llvm:: qualifier before Optional. It looks like many (most?) instances don't use the qualifier for it or None, so please remove it.

372

Not addressed?

387

You're making the potentially incorrect assumption that all input are of the same format. If a user has two objects of different formats, they might want the symbols sorted for the ones that can be:

$ llvm-readobj elf.o macho.o --sort-symbols --symbols

Would result in an error, and nothing printed (not even macho.o's symbols).

oontvoo updated this revision to Diff 418603.Mar 28 2022, 8:49 AM
oontvoo marked 3 inline comments as done.

updated diff

oontvoo marked an inline comment as done.Mar 28 2022, 8:49 AM
oontvoo marked 2 inline comments as done.

Looks good aside from the test comment.

llvm/tools/llvm-readobj/llvm-readobj.cpp
387–390

This warning is untested. Unless you have patches lined up for all other formats, I'd add a test that shows what happens for a mixture of sortable and unsortable formats, e.g. llvm-readobj wasm.o macho.o elf.o

oontvoo updated this revision to Diff 418883.Mar 29 2022, 8:07 AM

added test for warning case

oontvoo marked an inline comment as done.Mar 29 2022, 8:07 AM
jhenderson added inline comments.Mar 29 2022, 8:19 AM
llvm/test/tools/llvm-readobj/MachO/stabs-sorted.yaml
13–15

As this logic is testing generic logic in the tool, rather than format-specific logic, I think you'd be better off pulling it into a separate test directly in the llvm-readobj directory. Additionally, I wouldn't use ELF, as ELF is a good candidate for the next format to support. I'd use one of the other input formats supported by llvm-readobj.

oontvoo added inline comments.Mar 29 2022, 8:28 AM
llvm/test/tools/llvm-readobj/MachO/stabs-sorted.yaml
13–15

As you've said, this test is temporary until the other formats are implemented. As such, I dont see why it needs to go out in a separate test file

jhenderson added inline comments.Mar 29 2022, 8:30 AM
llvm/test/tools/llvm-readobj/MachO/stabs-sorted.yaml
13–15

As an ELF developer, I wouldn't necessarily expect me adding support to the ELF layer to break a Mach-O test, which is currently what would happen.

oontvoo marked an inline comment as done.Mar 29 2022, 8:45 AM
oontvoo added inline comments.
llvm/test/tools/llvm-readobj/MachO/stabs-sorted.yaml
13–15

k

oontvoo updated this revision to Diff 418893.Mar 29 2022, 8:45 AM
oontvoo marked an inline comment as done.

added more tests

jhenderson added inline comments.Mar 29 2022, 11:45 PM
llvm/test/tools/llvm-readobj/MachO/stabs-sorted.yaml
231

Nit: missing newline at EOF.

llvm/test/tools/llvm-readobj/sort-symbols.test
1

Double # for comments in these tests, and remove the double space.

6

Any particular reason you've not included xcoff?

16

I'd consider pruning this back to just warning '{{.+}}_macho': to reduce the risk of false negatives due to a slight change in the warning message.

oontvoo marked 4 inline comments as done.Mar 30 2022, 6:24 AM
oontvoo added inline comments.
llvm/test/tools/llvm-readobj/sort-symbols.test
16

Wouldn't that make the test more prone to false positives? that is, if some new warning pops up somewhere else, this would trip. So i'm going to keep this.

oontvoo updated this revision to Diff 419127.Mar 30 2022, 6:24 AM
oontvoo marked an inline comment as done.

updated diff

jhenderson added inline comments.Mar 30 2022, 6:42 AM
llvm/test/tools/llvm-readobj/sort-symbols.test
16

Yes, it would, but we probably shouldn't be invoking behaviour that causes those warnings anyway, so they're harmless (it would be easy to fix the test if it did trigger one in the future).

The test as it was before the last edit demonstrates why overly strict CHECK-NOTs are not much use, because typos can cause them to pass spuriously.

69

Too many blank lines at EOF (should be exactly one \n at the end).

oontvoo updated this revision to Diff 419129.Mar 30 2022, 6:43 AM

removed blank line

oontvoo marked an inline comment as done.Mar 30 2022, 6:44 AM
oontvoo added inline comments.
llvm/test/tools/llvm-readobj/sort-symbols.test
16

But that's not this test's job to guard against *OTHER* kinds of warnings.

oontvoo marked 2 inline comments as done.Mar 30 2022, 6:44 AM
oontvoo added inline comments.
llvm/test/tools/llvm-readobj/sort-symbols.test
16

Furthermore, false negatives/brittle tests are just as frustrating.

oontvoo added inline comments.Mar 30 2022, 6:51 AM
llvm/test/tools/llvm-readobj/sort-symbols.test
16

Yes, it would, but we probably shouldn't be invoking behaviour that causes those warnings anyway, so they're harmless (it would be easy to fix the test if it did trigger one in the future).

I disagree with the assertion that it's easy to fix these. Imagine there were a dozen tests similar to this one which were not expecting some warnings, then someone added a new warning and they would have to go update all these tests, even though it's not their fault. (it is the test's fault that it casts too wide a nest on the warning).

oontvoo added inline comments.Mar 30 2022, 6:52 AM
llvm/test/tools/llvm-readobj/sort-symbols.test
16

s/nest/net

Do you have any other comment on this patch because it seems we've been back on forth for a very long time and it doesn't seem to get any more progress ...

jhenderson added inline comments.Mar 30 2022, 7:10 AM
llvm/test/tools/llvm-readobj/sort-symbols.test
16

@MaskRay, any thoughts on this or other aspectse of this patch?

As someone who has been stung way too many times by rotten tests caused by negative matches like this not actually catching the thing I'm expecting to be caught, I'm incredibly wary of strict -NOT patterns like this. This isn't some arbitrary concern: I've seen bugs in released products because of this exact kind of overly precise check pattern.

That being said, there is an alternative approach I think you could consider: stick the message after the colon in a FileCheck define and then use it in both the positive and negative matches. That way, if the message is changed, the positive matches will start failing, prompting the developer to update that check too, which in turn will ensure the CHECK-NOT doesn't rot (since it's testing guaranteed to be testing the exact same string).

# RUN: ... | FileCheck %s -DMSG="--sort-symbols is not supported yet for this format"

# CHECK: warning: '{{.+}}_coff': [[MSG]]
...
# CHECK-NOT: warning: '{{.+}}': [[MSG]]

(NB: I've left the file path loose in the negative match so that if somebody changes the input file name, the check pattern is still valid).

oontvoo added inline comments.Mar 30 2022, 7:26 AM
llvm/test/tools/llvm-readobj/sort-symbols.test
16

As someone who has been stung way too many times by rotten tests caused by negative matches like this not actually catching the thing I'm expecting to be caught, I'm incredibly wary of strict -NOT patterns like this. This isn't some arbitrary concern: I've seen bugs in released products because of this exact kind of overly precise check pattern.

Generally, maybe there's a point - but in this context specifically, what would be the bug if the warning were emitted for macho and not caught? The macho specific tests should have caught it (ie., changes in code logic)

And stated before, you were arguing for making this CHECK-NOT catch all the warnings, which I disagree with (reasons stated a few comments back).
So while it's true that the current set up isn't too ideal, given a choice of false neg vs false positive, I think most would agree that a test should learn toward false positive because false negatives tend to waste a lot of other people's time in debugging test failures.

That being said, there is an alternative approach I think you could consider: stick the message after the colon in a FileCheck define and then use it in both the positive and negative matches. That way, if the message is changed, the positive matches will start failing, prompting the developer to update that check too, which in turn will ensure the CHECK-NOT doesn't rot (since it's testing guaranteed to be testing the exact same string).

# RUN: ... | FileCheck %s -DMSG="--sort-symbols is not supported yet for this format"

# CHECK: warning: '{{.+}}_coff': [[MSG]]
...
# CHECK-NOT: warning: '{{.+}}': [[MSG]]

(NB: I've left the file path loose in the negative match so that if somebody changes the input file name, the check pattern is still valid).

oontvoo added inline comments.Mar 30 2022, 7:34 AM
llvm/test/tools/llvm-readobj/sort-symbols.test
16

(NB: I've left the file path loose in the negative match so that if somebody changes the input file name, the check pattern is still valid).

What would be the legitimate reason to change the macho filename and not update the test? Those names were chosen specifically to differentiate the different formats.

oontvoo updated this revision to Diff 419137.Mar 30 2022, 7:37 AM

pass MSG param to FileCheck

oontvoo updated this revision to Diff 419215.Mar 30 2022, 10:49 AM

updated diff

MaskRay added inline comments.Mar 30 2022, 10:59 AM
llvm/test/tools/llvm-readobj/MachO/stabs-sorted.yaml
2
llvm/test/tools/llvm-readobj/sort-symbols.test
16

The current # CHECK-NOT: warning '{{.+}}_macho': [[MSG]] usage looks quite nice, though you may omit : after warning. By saving MSG to a -D variable, you don't have to repeat the message.

I also wonder whether we can just use # CHECK-NOT: warning: to assert no extra warnings. To make the test more specific, we want to make the file as valid as possible and rule out possibilities for other warnings, then # CHECK-NOT: warning: suffices.

oontvoo updated this revision to Diff 419220.Mar 30 2022, 11:07 AM
oontvoo marked 2 inline comments as done.

updated diff

oontvoo added inline comments.Mar 30 2022, 11:11 AM
llvm/test/tools/llvm-readobj/sort-symbols.test
16

though you may omit : after warning. By saving MSG to a -D variable, you don't have to repeat the message.

I'm not sure factoring the : to MSG helps with readability - the CHECK statements would look like:

# CHECK: warning '{{.+}}_coff'[[MSG]]

which is not as good as the current form

oontvoo added inline comments.Mar 30 2022, 11:25 AM
llvm/test/tools/llvm-readobj/sort-symbols.test
16

I also wonder whether we can just use # CHECK-NOT: warning: to assert no extra warnings. To make the test more specific, we want to make the file as valid as possible and rule out possibilities for other warnings, then # CHECK-NOT: warning: suffices.

Why does only this test need to assert that there is no other warnings/errors? I've checked all other tests in this project, and none of them has a check that no additional warnings were emitted.

oontvoo added inline comments.Mar 30 2022, 11:38 AM
llvm/test/tools/llvm-readobj/sort-symbols.test
16

IOWs, I don't see why the burden has to be on this test to check that there is no other warnings. It should be as precise as possible in terms of what warning it does not expect and in this case, it does not expect this specific warning for the macho format. Any other warnings is another test's problem. (This test is named "sort-symbols.test" and not "all-warnings.test" for a reason)

If we want to have warning check tests, that's another discussion.

jhenderson accepted this revision.Mar 30 2022, 11:58 PM

LGTM now, thanks!

This revision is now accepted and ready to land.Mar 30 2022, 11:58 PM
This revision was landed with ongoing or failed builds.Mar 31 2022, 6:16 AM
This revision was automatically updated to reflect the committed changes.