This is an archive of the discontinued LLVM Phabricator instance.

Introducing llvm-cm: A Cost Model Tool
Needs ReviewPublic

Authored by JestrTulip on Jun 20 2023, 2:35 PM.

Details

Summary

Initial commit for llvm-cm as described in https://discourse.llvm.org/t/rfc-llvm-cm-cost-model-evaluation-for-object-files-machine-code/71502
The tool currently just counts instructions.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
mtrofin added inline comments.Jun 20 2023, 10:03 PM
llvm/tools/llvm-cm/llvm-cm.cpp
361

Are the valid values Start-inclusive and End-exclusive, i.e. should one of these be strict inequality?

374

you can probably factor this out in a function

394

remove commented code.

395

++NumInstructions

400

can you do

if (!DisAsm->getInstruction...) {

WithColor...
break;

}

JestrTulip marked 22 inline comments as done.

rebased from parent patch

mtrofin added inline comments.Jun 22 2023, 4:09 PM
llvm/tools/llvm-cm/llvm-cm.cpp
77

Keep and IncrementIndex can be init-ed at declaration (probably the anchor of this comment moved, but it's convenient :) )

123

nit: can you put /*param_name=*/ for the 0 here, too?

141

nit: can you add a TODO here if this is something that could be shared with llvm-objdump - so we don't forget.

170

const std::vector<StringRef>&

206

did you mean to put the error after error:?

212

when won't it be empty?

255

you can avoid having this variable around by just creating IP with AsmInfo->getAssemblerDialect(). Or const it.

257

I don't follow: in case what? :)

366

why do you need to pass Aliases here?

374

it's still not factored

ondrasej added inline comments.Jun 23 2023, 10:34 AM
llvm/test/tools/llvm-cm/inst_count.ll
15–16 ↗(On Diff #533432)

This might be a bit brittle - the best option would be to use assembly directly, if possible.

llvm/tools/llvm-cm/llvm-cm.cpp
72

I'd prefer to avoid using this type of macros:

  • they tend to break automatic code manipulation tools,
  • get brittle around code that uses commas,
  • they are not commonly used in the LLVM code base.

If you really want to use this pattern, I'd go with something like

class ExitIf {
   public:
    ExitIf(bool Cond) : Condition(Cond) {}
    ~ExitIf() {
        if (Condition) {
            std::cerr << MsgStream.str() << std::endl;
            exit(1);
        }
    }

    template <typename T>
    ExitIf& operator<<(const T& other) {
        MsgStream << other;
        return *this;
    }

   private:
    bool Condition;
    std::stringstream MsgStream;  // Or llvm::raw_ostream.
};

// Used as:
ExitIf(!Foo) << "Foo is not true :(";

It will format the message even if condition holds, but that's probably OK in this case (and can be fixed with a relatively simple macro).

There's also llvm::ExitOnError that might help in a few cases here.

126

Please add a period at the end of the comment.

Same for all the other comments.

JestrTulip marked 7 inline comments as done.

feedback

feedback + added periods to comments

mtrofin added inline comments.Jun 27 2023, 2:30 PM
llvm/test/tools/llvm-cm/inst_count.s
100 ↗(On Diff #535135)

might as well test how many instructions

llvm/tools/llvm-cm/llvm-cm.cpp
66

-mcpu=help works? if not (or not right now), remove that part from cl::desc.

71

nit: class ExitIf final

upon looking more closely at the usage pattern - @ondrasej , why raii and not just a function call? raii will exit at scope exit, which is undesirable (the goal is to exit right away). A function call "just works" because exit(1), no?

82

bool Condition =false or const bool Condition

181

nit: printFunctionNames - otherwise it sounds like you're printing the whole thing?

189

this is from subsequent patch?

JestrTulip marked 3 inline comments as done.

feedback

JestrTulip marked 3 inline comments as done.

feedback (mcpu, printFunctionNames, etc...)

mtrofin accepted this revision.Jun 28 2023, 9:52 AM

lgtm, please fix the ExitIf to use <<; also make sure function names are verbs

This revision is now accepted and ready to land.Jun 28 2023, 9:52 AM

feedback (error handling + small function name change)

jhenderson requested changes to this revision.Jun 29 2023, 12:21 AM
jhenderson added a subscriber: jhenderson.

I haven't attempted to review any of the logic of this - I wanted to take a quick whizz through the code to see what was going on and got sucked in by a rather hefty heap of style issues.

There seems to be very limited testing for what is a fairly chunky block of code here. You should have testing for all your error paths, as well as other other code paths you have created here, at the very least.

Is this intended to be a user-facing tool? If so, please create a user guide in llvm/docs/CommandGuide (and make sure to update the index there). That doesn't need to be in this patch, as long as it gets added at some point.

llvm/test/tools/llvm-cm/inst_count.s
113 ↗(On Diff #535593)

Get rid of all the blank lines at EOF. The file should end with precisely one \n, at the end of the last line containing text.

llvm/tools/llvm-cm/CMakeLists.txt
2

Get rid of commented out code here and below.

16

Now you have too many new lines (as noted above, files should end with precisely one \n).

llvm/tools/llvm-cm/llvm-cm.cpp
48
58

Would it make sense to using namespace llvm::object;?

72

This comment is probably unnecessary, but even if it were, it should be immediately next to the class in question, without any blank lines separating it.

74

What is this comment about?

75

Why is this a class when a simple function would do?

76

Have you run clang-format on your new code? Because this doesn't look formatted according to the standard format.

80

This isn't a normal way to print errors in LLVM tools. Please see existing examples in tools like llvm-objdump and llvm-readobj. In particular, you should be using WithColor for printing error messages. I see you've already implemented an error function below, so should you be using that?

81

Should this be std::exit?

90

Blank line after this function.

111

It's extremely unusual to pass in a vector by value. Should this be const & or &&?

117

Using consumeError in new code is usually a code smell. Is there a reason you don't report the Error (at least as a warning)?

122
133

This can just be object::SectionFilter, right? Same below at the return site.

135

Same as above re. passing by value.

138

I don't think it's a hard rule, but I've tended to see std::numeric_limits<uint64_t>::max() used rather than the C-style macros.

150

This seems like an unnecessary comment.

165

Any reason you can't do this upfront, like you did with the section filter stuff?

183

Ditto.

188

It's still unclear there's a 100% consensus, but at least it seems like this should be a static_cast rather than a C-style one (there's a debate about functional-style casts that's ongoing). See D151187.

189

No blank line at end of function, please.

192

This seems like a weird comment? Should this be a TODO too?

195

This seems like an unnecessary temporary variable. Can this and the next line be folded together?

200

Same as above.

212

size() returns a size_t, so we should match that type.

220–221

Is there a reason you're passing in the unique_ptr here by reference, rather than simply the underlying pointer (i.e. MCDisassembler *)?
Also, simple types like uint64_t are usually passed in by value, not by const &.

252

Do you need this new line? ExitIf already adds its own.

254

The usual style is that acronyms (i.e. in this case "BB" = "BasicBlock"), are all caps.

260

I'd add a blank line after this one, after the if block.

261

This just seems to be a comment that describes what the (relatively simple) code is doing. Is it really useful?

273

Comments like this are not useful. It simply is saying what the name of the function on the next line literally says it is doing. Use variable and function names to describe the what of what your code is doing, and comments only when that is insufficient (which should be rare) or where the "why" is important.

291

There are a lot of unnecessary blank lines within this function, which disrupt the flow of someone reading it. Keep your code grouped into logical bits, e.g. variable declarations and the function that then uses them all together.

Also don't initialise local variables until you need them. E.g. TheTarget isn't used until quite a way down from here, so move its initialization until then.

303

size_t

339

Please clean up all these double/triple blank lines.

341

Should this be a StringMap?

346

This variable is named as a verb, but it is a variable. Please use a noun or adjective phrase.

378

I suspect std::unordered_map is not the type you actually want. Take a look here: https://llvm.org/docs/ProgrammersManual.html#map-like-containers-std-map-densemap-etc

379

This should probably be called GetBBAddrMapping.

420

Don't start a block with a blank line.

464–465

Does int make sense here? Can they be negative? Should they actually be uint64_t?

473

Explicit return 0; at end of main is unnecessary.

This revision now requires changes to proceed.Jun 29 2023, 12:21 AM

Oh, and to add, the test you have added is failing according to the pre-merge CI.

I'll try to read this soon.

ondrasej added inline comments.Jun 29 2023, 2:26 AM
llvm/tools/llvm-cm/llvm-cm.cpp
71

As discussed offline yesterday: RAII is there to make the streaming operators after ExitIf() work.

With RAII, and with the intended use (ExitIf(Cond) << "Some message";), the following happens:

  1. a temporary ExitIf instance is created and takes note of Cond,
  2. all streaming operators are applied to the temporary (collecting the messages),
  3. it's a temporary value (there's no variable name), so its scope is limited to the statement where it appears.

Basically, it processes the chain of <<, and then immediately exits.

MaskRay requested changes to this revision.Jun 29 2023, 8:25 PM
MaskRay added inline comments.
llvm/tools/llvm-cm/llvm-cm.cpp
32

and clang-format this file.

283

We usually use a compact style. A variable declaration doesn't need a following blank line.

BinaryOrErr and Binary are quite related. Adding a blank line only harms readability.

291

delete blank line after std::string Error

301

We almost never use 2 blank lines for logical separation.

In this case, TrueFeatures is immediately used and should have no blank lines following it.

getFeatures only returns a non-empty string. You need an AArch32/RISC-V/Mips test to test it.

304

2-space indentation. omit braces for a single-line simple statement.

Features may start with - indicating a negative feature. It's incorrect to use TrueFeatures

JestrTulip marked 17 inline comments as done.

general style changes + feedback

RKSimon added a subscriber: RKSimon.Jul 1 2023, 9:32 AM

(style) Target specific tests should be put in a target subdirectory - in this case llvm/test/tools/llvm-cm//x86/inst_count.s etc. You can then add a lit.local.cfg file there as well that avoid you needing to add 'REQUIRES' to every test file

The pre-merge tests are still failing. Please take a look and fix the issue accordingly.

llvm/test/tools/llvm-cm/empty.s
8 ↗(On Diff #536396)

This will check that the literal string "{*}" does not appear in the output. I'm guessing that's not what you meant? If you want to check that the output is empty, you should replace your FileCheck call with count 0.

llvm/test/tools/llvm-cm/malformed.s
5 ↗(On Diff #536396)

There's no need for this text, so I'd just delete it.

llvm/tools/llvm-cm/llvm-cm.cpp
71

Why do you need RAII for this? You could just do string concatenation to get your final message, and pass that into a function that prints the message and then calls std::exit.

76

Ping - not addressed.

118
156

std::exit?

285

I don't think you want error: in the message here?

Also, test case?

Same throughout.

326

I don't think you want this blank line - the declarations are related to the loop, so they should be closely linked.

329

See my earlier comment about unhelpful comments. Same throughout.

416

Move this to where it is used. Same with a number of the other variables.

JestrTulip added inline comments.Jul 5 2023, 10:09 AM
llvm/tools/llvm-cm/llvm-cm.cpp
75

Going off @ondrasej's suggestion regarding RAII.

165

I was planning on including these changes in a later patch, since the overall scope of this one is so minimal, I wanted to avoid changing many other files.

189

it's for the current one, it maps the addresses we use for the MCInsts with the labels from the basic blocks

304

Changed to remove TrueFeatures and MAttrs

JestrTulip added inline comments.Jul 5 2023, 3:41 PM
llvm/tools/llvm-cm/llvm-cm.cpp
71

The reason for not doing string concatenation to get the message and passing the result into a function is mainly due of the reduced overhead (specifically, the temporary string objects) the class option provides. With string concatenation, in all cases, we must perform extra object allocations to get the final message.
However, with this implementation, we only assemble the message into the stringstream when there's an error.
I realize this was not exactly what original implementation did, but I've updated it alongside my most recent update. If this implementation is not preferred, I am perfectly able to accommodate.

76

I've run git clang-format on the most recent patch. Just to make sure I'm doing it correctly:
I first git added all the modified files,
I then ran git-clang format from the root of my repo
I then received the message "clang-format did not modify any files"

118

Regarding the test cases for aspects such as Section Name, a good portion of the error handling for disassembly was modeled off the error handling in tools such as objdump and mca. For this specific case, and many of those that are unaddressed within the program, I find it hard to come up with a way to properly test these occurrences.
For example, this error requires a Section to exist within the file, but its name to not be found.
If I could receive any ideas on how to properly test situations like this, i'd be very grateful.

jhenderson added inline comments.Jul 6 2023, 12:15 AM
llvm/test/tools/llvm-cm/X86/bad_triple.s
3

Do you actually need a valid object for this test? I would expect the check for a valid triple to occur before handling of the input file. If you do need a valid object, that's fine, but can it just be a trivial file, i.e. one made from an empty/single-line asm file?

35

It's easier to follow the test if this check appears before the large block of asm. Same generally goes throughout.

llvm/test/tools/llvm-cm/X86/empty.s
2

Let's be more specific: "Check that llvm-cm produces no output for an empty input file."

llvm/test/tools/llvm-cm/X86/malformed.s
2

Please be careful of trailing whitespace too. I see it in other test files.

llvm/test/tools/llvm-cm/X86/multi_funct.s
1 ↗(On Diff #537527)

I think a comment explaining each test would be a good idea.

Also "func" is a more common abbreviation for "function" than "funct".

Also, prefer - in test names over _ due to it being easier to type.

llvm/tools/llvm-cm/llvm-cm.cpp
71

LLVM has the Twine class to support efficient string concatenation, which would defer the real concatenation until needed (see https://llvm.org/docs/ProgrammersManual.html#the-twine-class). By passing the concatenated Twine to the error function for printing if the error is hit, you'll avoid any unnecessary string processing cost in the non-failing case, whilst keeping the interface simple. As a bonus, Twine takes constructors that do std::to_string-like conversions of its input, so that you can get the same functionality as the streaming option.

76

I personally don't use git-clang-format to do it - I use a clang-format tool that comes along with my Visual Studio setup, so I can't comment. Perhaps @MaskRay or another review could assist, because this is definitely not correctly formatted.

118

Take a look at the test cases for llvm-readobj - there are many examples where it uses yaml2obj to customise the input in some way, e.g. creating an input without a section header string table (see https://github.com/llvm/llvm-project/blob/main/llvm/test/tools/llvm-readobj/ELF/sections-no-section-header-string-table.test). You should be able to use yaml2obj to achieve a broken input in some manner using these examples (more can be found in the ELF yaml2obj tests too, if you want to explore the various options you have). NB: you don't need to test every possible failure mode that the underlying library can hit (these should already have testing elsewhere). Instead, you should make sure you have testing that covers the case where a particularly library function returns an error, to show that you handled the error correctly.

165

This is brand new code. You should really avoid duplicating code in new code, if at all possible. In other words, you should be refactoring the code you want to use in earlier patches, to make it shareable, and then base this patch on top of them.

290

Your ExitIf class writes a '\n' after the messsge, so there's no need for an additional one here and elsewhere you've added it.

326

As above, delete this blank line, so that the declarations tied to the loop are tightly linked to it.

JestrTulip updated this revision to Diff 537927.Jul 6 2023, 4:42 PM

refactoring + added test

JestrTulip added a comment.EditedJul 6 2023, 4:44 PM

There is a change in ELFObjectFile.h that is due to a bug that was found while the new tests were being written, the patch is linked here.

JestrTulip added inline comments.Jul 6 2023, 4:57 PM
llvm/tools/llvm-cm/llvm-cm.cpp
76

@MaskRay suggested

git diff -U0 --no-color --relative 'HEAD^' -- | llvm-project/clang/tools/clang-format/clang-format-diff.py -p1 -i

so I used that.

118

I do have a question regarding some of the tests. I've used yaml2obj to for some of them, but I was wondering how to create the necessary conditions for others, namely the error messages that occur despite a proper target (e.g. AsmInfo, FeatureVals, SubInfo, etc..). I was also wondering if there was a way to generate an object file with invalid sections, for this message.

if (!SecNameOrErr) {
    WithColor::warning(errs(), "llvm-cm")
        << "Failed to get section name: " << toString(SecNameOrErr.takeError())
        << "\n";
}
165

Regarding the code duplication, where should these shared functions be moved to. I was thinking about ObjectFile.cpp, but I am open to any suggestions.

added "error:" to CHECK lines

RKSimon added inline comments.Jul 7 2023, 2:40 AM
llvm/include/llvm/Object/ELFObjectFile.h
537

(style)

if (Expected<section_iterator> SecOrErr = getSymbolSection(Sym))
  return (*SecOrErr)->getName();
return SecOrErr.takeError();
n-omer added a subscriber: n-omer.Jul 7 2023, 3:50 AM
JestrTulip added a comment.EditedJul 13 2023, 1:51 PM

Hi All, have the concerns regarding this patch been sufficiently addressed, and is it ready to land? The RFC and motivation for this patch can be found here.

Matt added a subscriber: Matt.Jul 17 2023, 4:29 PM

My apologies for being slow in getting back to you - I had some time off and then have been busy catching up on all sorts of other reviews. By the way, feel free to ping the thread if it goes stale for a week.

I've not reviewed the main body of the code today - I ran out of time, but there should be plenty to get on with. If I missed any questions, please reask them.

llvm/test/tools/llvm-cm/X86/bb-addr-map.test
1

Nothing in this test is X86 specific, so move the test out of the X86 folder, so that it can be run on all targets.

1–2

Avoid trailing whitespace, and also this was wrapped rather prematurely.

6

I expect there's some additional context that this error could have. Why did the reading fail? At the moment, it's basically impossible for a user to be able to know how to resolve it.

llvm/test/tools/llvm-cm/X86/empty.s
1

I guess this is more grammatically correct, sorry :)

llvm/test/tools/llvm-cm/X86/inst_count.s
1

This is more of a title than a descriptive comment. Also the tool is llvm-cm not LLVM-CM (normally!).

I'd suggest the following: "This test shows that llvm-cm can count instructions correctly." or something to that effect.

5–6

Nit: typically, I encourage adding some spaces between the CHECK: and the text that is being checked for, to make it line up with the CHECK-NEXT lines.

On the other hand, why is the second of these not a CHECK-NEXT line?

19

There's a lot of junk in the asm that somewhat obscures what is actually interesting about the input, and therefore what you really are trying to test. At a guess, without looking at the code logic, what you're really interested in are 1) the symbols, 2) the instructions within a symbol, and 3) the BB structures. If that is indeed the case, I wonder whether using YAML would allow you to exercise greater control, without needing to spell out every part of the BB structure (it might not)? Could you use sequences of nop instructions for your purposes, or is there a need for them to be all different?

llvm/test/tools/llvm-cm/X86/multi-func.s
1

In what way is this test significantly different to inst_count.s? If both are actually needed, could the same simplifications to the asm be made?

Also, prefer - in test names over _ due to it being easier to type.

This comment applied to inst_count.s and bad_triple.s too (plus any other tests you write).

5
29–30

This applies more generally, I just placed my comment here somewhat arbitrarily :)

There's a weird inconsistency here between the capitalized "Number" and all-lower-case "total". Furthermore, it's not expecially easy to spot where the end of one function is and the start of the next. Can I suggest a) being consistent with your capitalization, and b) adding blank lines in the output between the total line and the next function? You might also want to include the function name in the total line too, since it could be a long way from the start of it, but I'm not too fussed by that necessarily.

llvm/test/tools/llvm-cm/X86/sections-no-symbol-name.test
1–3

Unnecessary wrapping - you could save a line by not wrapping so early.

Also, typo in "outouts".

That being said, I don't think this comment and test name exactly line up with what you test. It is normal for section symbols to not have names. Tools like llvm-objdump synthesise a name from the section name for a section symbol, so the error here is when a section symbol doesn't have a valid section index. You don't even need a section for the test to produce the same behaviour, if I'm not mistaken.

7

More context in this message please. Which symbol couldn't you get the name for (report the index).

22

Nit: trailing whitespace.

llvm/tools/llvm-cm/llvm-cm.cpp
1–2

Please fix your comment header.

32

This inline edit doesn't seem to have been addressed?

68

Why the trailing whitespace in the description?

74

Let's just spell it Condition. There's no need for the brevity.

80

Nit: new line required between functions/structs etc.

Also this struct should be in an anonymous namespace.

92

Not sure what relevance "tool" is in this name, so get rid of it.

Also, prefer west const, in keeping with the wider LLVM style.

93

ArrayRef?

97

You've got using namespace llvm::object at the top of this file, so there's no need for the qualifiers here.

98

In general, these sort of label comments are only added for literals, so I'd get rid of them from here.

101

Result.IncerementIndex is always going to be true here...

108

Keep all your error reporting functions together in one place.

110

This "reading file: " context isn't particularly useful, as it prevents this function from being used for non-file errors, e.g. command-line processing errors. Take a look at createFileError as a way of adding the file name to Error.

112

As requested before, please use std::exit

118

Re. AsmInfo etc, I don't know if there's a good way of testing those, so it may not be possible. Try searching through the other tools to see if any of them test them and if so, how they do so.

When you say "invalid sections" what do you mean? A section can be invalid in many different ways, and different mechanisms will be needed to test the different cases.

150

static?

165

getElfSymbolType sounds like it belongs in ELF.h or ELFObjectFile.h. collectBBtoAddressLabels probably doesn't belong in ObjectFile.h simply because it doesn't involve any use of ObjectFile, but I can't see a better location, so there's probably fine, or maybe even consider a new header.

184