Initial commit for llvm-cm as described in https://discourse.llvm.org/t/rfc-llvm-cm-cost-model-evaluation-for-object-files-machine-code/71502
The tool currently just counts instructions.
Details
- Reviewers
mtrofin kazu ondrasej jhenderson MaskRay
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/tools/llvm-cm/llvm-cm.cpp | ||
---|---|---|
360 | Are the valid values Start-inclusive and End-exclusive, i.e. should one of these be strict inequality? | |
373 | you can probably factor this out in a function | |
393 | remove commented code. | |
394 | ++NumInstructions | |
399 | can you do if (!DisAsm->getInstruction...) { WithColor... break; } |
llvm/tools/llvm-cm/llvm-cm.cpp | ||
---|---|---|
76 | Keep and IncrementIndex can be init-ed at declaration (probably the anchor of this comment moved, but it's convenient :) ) | |
123 | nit: can you put /*param_name=*/ for the 0 here, too? | |
141 | nit: can you add a TODO here if this is something that could be shared with llvm-objdump - so we don't forget. | |
170 | const std::vector<StringRef>& | |
206 | did you mean to put the error after error:? | |
212 | when won't it be empty? | |
255 | you can avoid having this variable around by just creating IP with AsmInfo->getAssemblerDialect(). Or const it. | |
257 | I don't follow: in case what? :) | |
366 | why do you need to pass Aliases here? | |
373 | it's still not factored |
llvm/test/tools/llvm-cm/inst_count.ll | ||
---|---|---|
16–17 | This might be a bit brittle - the best option would be to use assembly directly, if possible. | |
llvm/tools/llvm-cm/llvm-cm.cpp | ||
72 | I'd prefer to avoid using this type of macros:
If you really want to use this pattern, I'd go with something like class ExitIf { public: ExitIf(bool Cond) : Condition(Cond) {} ~ExitIf() { if (Condition) { std::cerr << MsgStream.str() << std::endl; exit(1); } } template <typename T> ExitIf& operator<<(const T& other) { MsgStream << other; return *this; } private: bool Condition; std::stringstream MsgStream; // Or llvm::raw_ostream. }; // Used as: ExitIf(!Foo) << "Foo is not true :("; It will format the message even if condition holds, but that's probably OK in this case (and can be fixed with a relatively simple macro). There's also llvm::ExitOnError that might help in a few cases here. | |
126 | Please add a period at the end of the comment. Same for all the other comments. |
llvm/test/tools/llvm-cm/inst_count.s | ||
---|---|---|
100 ↗ | (On Diff #535135) | might as well test how many instructions |
llvm/tools/llvm-cm/llvm-cm.cpp | ||
66 | -mcpu=help works? if not (or not right now), remove that part from cl::desc. | |
71 | nit: class ExitIf final upon looking more closely at the usage pattern - @ondrasej , why raii and not just a function call? raii will exit at scope exit, which is undesirable (the goal is to exit right away). A function call "just works" because exit(1), no? | |
82 | bool Condition =false or const bool Condition | |
181 | nit: printFunctionNames - otherwise it sounds like you're printing the whole thing? | |
189 | this is from subsequent patch? |
I haven't attempted to review any of the logic of this - I wanted to take a quick whizz through the code to see what was going on and got sucked in by a rather hefty heap of style issues.
There seems to be very limited testing for what is a fairly chunky block of code here. You should have testing for all your error paths, as well as other other code paths you have created here, at the very least.
Is this intended to be a user-facing tool? If so, please create a user guide in llvm/docs/CommandGuide (and make sure to update the index there). That doesn't need to be in this patch, as long as it gets added at some point.
llvm/test/tools/llvm-cm/inst_count.s | ||
---|---|---|
113 ↗ | (On Diff #535593) | Get rid of all the blank lines at EOF. The file should end with precisely one \n, at the end of the last line containing text. |
llvm/tools/llvm-cm/CMakeLists.txt | ||
2 | Get rid of commented out code here and below. | |
20 | Now you have too many new lines (as noted above, files should end with precisely one \n). | |
llvm/tools/llvm-cm/llvm-cm.cpp | ||
48 | ||
58 | Would it make sense to using namespace llvm::object;? | |
72 | This comment is probably unnecessary, but even if it were, it should be immediately next to the class in question, without any blank lines separating it. | |
74 | What is this comment about? | |
75 | Why is this a class when a simple function would do? | |
76 | Have you run clang-format on your new code? Because this doesn't look formatted according to the standard format. | |
80 | This isn't a normal way to print errors in LLVM tools. Please see existing examples in tools like llvm-objdump and llvm-readobj. In particular, you should be using WithColor for printing error messages. I see you've already implemented an error function below, so should you be using that? | |
81 | Should this be std::exit? | |
90 | Blank line after this function. | |
111 | It's extremely unusual to pass in a vector by value. Should this be const & or &&? | |
117 | Using consumeError in new code is usually a code smell. Is there a reason you don't report the Error (at least as a warning)? | |
122 | ||
133 | This can just be object::SectionFilter, right? Same below at the return site. | |
135 | Same as above re. passing by value. | |
138 | I don't think it's a hard rule, but I've tended to see std::numeric_limits<uint64_t>::max() used rather than the C-style macros. | |
150 | This seems like an unnecessary comment. | |
165 | Any reason you can't do this upfront, like you did with the section filter stuff? | |
183 | Ditto. | |
188 | It's still unclear there's a 100% consensus, but at least it seems like this should be a static_cast rather than a C-style one (there's a debate about functional-style casts that's ongoing). See D151187. | |
189 | No blank line at end of function, please. | |
192 | This seems like a weird comment? Should this be a TODO too? | |
195 | This seems like an unnecessary temporary variable. Can this and the next line be folded together? | |
200 | Same as above. | |
212 | size() returns a size_t, so we should match that type. | |
220–221 | Is there a reason you're passing in the unique_ptr here by reference, rather than simply the underlying pointer (i.e. MCDisassembler *)? | |
252 | Do you need this new line? ExitIf already adds its own. | |
254 | The usual style is that acronyms (i.e. in this case "BB" = "BasicBlock"), are all caps. | |
260 | I'd add a blank line after this one, after the if block. | |
261 | This just seems to be a comment that describes what the (relatively simple) code is doing. Is it really useful? | |
273 | Comments like this are not useful. It simply is saying what the name of the function on the next line literally says it is doing. Use variable and function names to describe the what of what your code is doing, and comments only when that is insufficient (which should be rare) or where the "why" is important. | |
291 | There are a lot of unnecessary blank lines within this function, which disrupt the flow of someone reading it. Keep your code grouped into logical bits, e.g. variable declarations and the function that then uses them all together. Also don't initialise local variables until you need them. E.g. TheTarget isn't used until quite a way down from here, so move its initialization until then. | |
303 | size_t | |
339 | Please clean up all these double/triple blank lines. | |
341 | Should this be a StringMap? | |
346 | This variable is named as a verb, but it is a variable. Please use a noun or adjective phrase. | |
378 | I suspect std::unordered_map is not the type you actually want. Take a look here: https://llvm.org/docs/ProgrammersManual.html#map-like-containers-std-map-densemap-etc | |
379 | This should probably be called GetBBAddrMapping. | |
420 | Don't start a block with a blank line. | |
464–465 | Does int make sense here? Can they be negative? Should they actually be uint64_t? | |
473 | Explicit return 0; at end of main is unnecessary. |
llvm/tools/llvm-cm/llvm-cm.cpp | ||
---|---|---|
71 | As discussed offline yesterday: RAII is there to make the streaming operators after ExitIf() work. With RAII, and with the intended use (ExitIf(Cond) << "Some message";), the following happens:
Basically, it processes the chain of <<, and then immediately exits. |
llvm/tools/llvm-cm/llvm-cm.cpp | ||
---|---|---|
32 | and clang-format this file. | |
283 | We usually use a compact style. A variable declaration doesn't need a following blank line. BinaryOrErr and Binary are quite related. Adding a blank line only harms readability. | |
291 | delete blank line after std::string Error | |
301 | We almost never use 2 blank lines for logical separation. In this case, TrueFeatures is immediately used and should have no blank lines following it. getFeatures only returns a non-empty string. You need an AArch32/RISC-V/Mips test to test it. | |
304 | 2-space indentation. omit braces for a single-line simple statement. Features may start with - indicating a negative feature. It's incorrect to use TrueFeatures |
(style) Target specific tests should be put in a target subdirectory - in this case llvm/test/tools/llvm-cm//x86/inst_count.s etc. You can then add a lit.local.cfg file there as well that avoid you needing to add 'REQUIRES' to every test file
The pre-merge tests are still failing. Please take a look and fix the issue accordingly.
llvm/test/tools/llvm-cm/empty.s | ||
---|---|---|
8 ↗ | (On Diff #536396) | This will check that the literal string "{*}" does not appear in the output. I'm guessing that's not what you meant? If you want to check that the output is empty, you should replace your FileCheck call with count 0. |
llvm/test/tools/llvm-cm/malformed.s | ||
5 ↗ | (On Diff #536396) | There's no need for this text, so I'd just delete it. |
llvm/tools/llvm-cm/llvm-cm.cpp | ||
71 | Why do you need RAII for this? You could just do string concatenation to get your final message, and pass that into a function that prints the message and then calls std::exit. | |
76 | Ping - not addressed. | |
118 | ||
156 | std::exit? | |
285 | I don't think you want error: in the message here? Also, test case? Same throughout. | |
326 | I don't think you want this blank line - the declarations are related to the loop, so they should be closely linked. | |
329 | See my earlier comment about unhelpful comments. Same throughout. | |
416 | Move this to where it is used. Same with a number of the other variables. |
llvm/tools/llvm-cm/llvm-cm.cpp | ||
---|---|---|
75 | Going off @ondrasej's suggestion regarding RAII. | |
165 | I was planning on including these changes in a later patch, since the overall scope of this one is so minimal, I wanted to avoid changing many other files. | |
189 | it's for the current one, it maps the addresses we use for the MCInsts with the labels from the basic blocks | |
304 | Changed to remove TrueFeatures and MAttrs |
llvm/tools/llvm-cm/llvm-cm.cpp | ||
---|---|---|
71 | The reason for not doing string concatenation to get the message and passing the result into a function is mainly due of the reduced overhead (specifically, the temporary string objects) the class option provides. With string concatenation, in all cases, we must perform extra object allocations to get the final message. | |
76 | I've run git clang-format on the most recent patch. Just to make sure I'm doing it correctly: | |
118 | Regarding the test cases for aspects such as Section Name, a good portion of the error handling for disassembly was modeled off the error handling in tools such as objdump and mca. For this specific case, and many of those that are unaddressed within the program, I find it hard to come up with a way to properly test these occurrences. |
llvm/test/tools/llvm-cm/X86/bad_triple.s | ||
---|---|---|
2 ↗ | (On Diff #537527) | Do you actually need a valid object for this test? I would expect the check for a valid triple to occur before handling of the input file. If you do need a valid object, that's fine, but can it just be a trivial file, i.e. one made from an empty/single-line asm file? |
34 ↗ | (On Diff #537527) | It's easier to follow the test if this check appears before the large block of asm. Same generally goes throughout. |
llvm/test/tools/llvm-cm/X86/empty.s | ||
1 ↗ | (On Diff #537527) | Let's be more specific: "Check that llvm-cm produces no output for an empty input file." |
llvm/test/tools/llvm-cm/X86/malformed.s | ||
1 ↗ | (On Diff #537527) | Please be careful of trailing whitespace too. I see it in other test files. |
llvm/test/tools/llvm-cm/X86/multi_funct.s | ||
1 ↗ | (On Diff #537527) | I think a comment explaining each test would be a good idea. Also "func" is a more common abbreviation for "function" than "funct". Also, prefer - in test names over _ due to it being easier to type. |
llvm/tools/llvm-cm/llvm-cm.cpp | ||
71 | LLVM has the Twine class to support efficient string concatenation, which would defer the real concatenation until needed (see https://llvm.org/docs/ProgrammersManual.html#the-twine-class). By passing the concatenated Twine to the error function for printing if the error is hit, you'll avoid any unnecessary string processing cost in the non-failing case, whilst keeping the interface simple. As a bonus, Twine takes constructors that do std::to_string-like conversions of its input, so that you can get the same functionality as the streaming option. | |
76 | I personally don't use git-clang-format to do it - I use a clang-format tool that comes along with my Visual Studio setup, so I can't comment. Perhaps @MaskRay or another review could assist, because this is definitely not correctly formatted. | |
118 | Take a look at the test cases for llvm-readobj - there are many examples where it uses yaml2obj to customise the input in some way, e.g. creating an input without a section header string table (see https://github.com/llvm/llvm-project/blob/main/llvm/test/tools/llvm-readobj/ELF/sections-no-section-header-string-table.test). You should be able to use yaml2obj to achieve a broken input in some manner using these examples (more can be found in the ELF yaml2obj tests too, if you want to explore the various options you have). NB: you don't need to test every possible failure mode that the underlying library can hit (these should already have testing elsewhere). Instead, you should make sure you have testing that covers the case where a particularly library function returns an error, to show that you handled the error correctly. | |
165 | This is brand new code. You should really avoid duplicating code in new code, if at all possible. In other words, you should be refactoring the code you want to use in earlier patches, to make it shareable, and then base this patch on top of them. | |
290 | Your ExitIf class writes a '\n' after the messsge, so there's no need for an additional one here and elsewhere you've added it. | |
326 | As above, delete this blank line, so that the declarations tied to the loop are tightly linked to it. |
There is a change in ELFObjectFile.h that is due to a bug that was found while the new tests were being written, the patch is linked here.
llvm/tools/llvm-cm/llvm-cm.cpp | ||
---|---|---|
76 | @MaskRay suggested git diff -U0 --no-color --relative 'HEAD^' -- | llvm-project/clang/tools/clang-format/clang-format-diff.py -p1 -i so I used that. | |
118 | I do have a question regarding some of the tests. I've used yaml2obj to for some of them, but I was wondering how to create the necessary conditions for others, namely the error messages that occur despite a proper target (e.g. AsmInfo, FeatureVals, SubInfo, etc..). I was also wondering if there was a way to generate an object file with invalid sections, for this message. if (!SecNameOrErr) { WithColor::warning(errs(), "llvm-cm") << "Failed to get section name: " << toString(SecNameOrErr.takeError()) << "\n"; } | |
165 | Regarding the code duplication, where should these shared functions be moved to. I was thinking about ObjectFile.cpp, but I am open to any suggestions. |
llvm/include/llvm/Object/ELFObjectFile.h | ||
---|---|---|
536 ↗ | (On Diff #538011) | (style) if (Expected<section_iterator> SecOrErr = getSymbolSection(Sym)) return (*SecOrErr)->getName(); return SecOrErr.takeError(); |
Hi All, have the concerns regarding this patch been sufficiently addressed, and is it ready to land? The RFC and motivation for this patch can be found here.
My apologies for being slow in getting back to you - I had some time off and then have been busy catching up on all sorts of other reviews. By the way, feel free to ping the thread if it goes stale for a week.
I've not reviewed the main body of the code today - I ran out of time, but there should be plenty to get on with. If I missed any questions, please reask them.
llvm/test/tools/llvm-cm/X86/bb-addr-map.test | ||
---|---|---|
1 ↗ | (On Diff #538866) | Nothing in this test is X86 specific, so move the test out of the X86 folder, so that it can be run on all targets. |
1–2 ↗ | (On Diff #538866) | Avoid trailing whitespace, and also this was wrapped rather prematurely. |
6 ↗ | (On Diff #538866) | I expect there's some additional context that this error could have. Why did the reading fail? At the moment, it's basically impossible for a user to be able to know how to resolve it. |
llvm/test/tools/llvm-cm/X86/empty.s | ||
1 ↗ | (On Diff #538866) | I guess this is more grammatically correct, sorry :) |
llvm/test/tools/llvm-cm/X86/inst_count.s | ||
1 ↗ | (On Diff #538866) | This is more of a title than a descriptive comment. Also the tool is llvm-cm not LLVM-CM (normally!). I'd suggest the following: "This test shows that llvm-cm can count instructions correctly." or something to that effect. |
5–6 ↗ | (On Diff #538866) | Nit: typically, I encourage adding some spaces between the CHECK: and the text that is being checked for, to make it line up with the CHECK-NEXT lines. On the other hand, why is the second of these not a CHECK-NEXT line? |
19 ↗ | (On Diff #538866) | There's a lot of junk in the asm that somewhat obscures what is actually interesting about the input, and therefore what you really are trying to test. At a guess, without looking at the code logic, what you're really interested in are 1) the symbols, 2) the instructions within a symbol, and 3) the BB structures. If that is indeed the case, I wonder whether using YAML would allow you to exercise greater control, without needing to spell out every part of the BB structure (it might not)? Could you use sequences of nop instructions for your purposes, or is there a need for them to be all different? |
llvm/test/tools/llvm-cm/X86/multi-func.s | ||
1 ↗ | (On Diff #538866) | In what way is this test significantly different to inst_count.s? If both are actually needed, could the same simplifications to the asm be made?
This comment applied to inst_count.s and bad_triple.s too (plus any other tests you write). |
5 ↗ | (On Diff #538866) | |
29–30 ↗ | (On Diff #538866) | This applies more generally, I just placed my comment here somewhat arbitrarily :) There's a weird inconsistency here between the capitalized "Number" and all-lower-case "total". Furthermore, it's not expecially easy to spot where the end of one function is and the start of the next. Can I suggest a) being consistent with your capitalization, and b) adding blank lines in the output between the total line and the next function? You might also want to include the function name in the total line too, since it could be a long way from the start of it, but I'm not too fussed by that necessarily. |
llvm/test/tools/llvm-cm/X86/sections-no-symbol-name.test | ||
1–3 ↗ | (On Diff #538866) | Unnecessary wrapping - you could save a line by not wrapping so early. Also, typo in "outouts". That being said, I don't think this comment and test name exactly line up with what you test. It is normal for section symbols to not have names. Tools like llvm-objdump synthesise a name from the section name for a section symbol, so the error here is when a section symbol doesn't have a valid section index. You don't even need a section for the test to produce the same behaviour, if I'm not mistaken. |
7 ↗ | (On Diff #538866) | More context in this message please. Which symbol couldn't you get the name for (report the index). |
22 ↗ | (On Diff #538866) | Nit: trailing whitespace. |
llvm/tools/llvm-cm/llvm-cm.cpp | ||
2–3 | Please fix your comment header. | |
32 | This inline edit doesn't seem to have been addressed? | |
69 | Why the trailing whitespace in the description? | |
75 | Let's just spell it Condition. There's no need for the brevity. | |
81 | Nit: new line required between functions/structs etc. Also this struct should be in an anonymous namespace. | |
93 | Not sure what relevance "tool" is in this name, so get rid of it. Also, prefer west const, in keeping with the wider LLVM style. | |
94 | ArrayRef? | |
98 | You've got using namespace llvm::object at the top of this file, so there's no need for the qualifiers here. | |
99 | In general, these sort of label comments are only added for literals, so I'd get rid of them from here. | |
102 | Result.IncerementIndex is always going to be true here... | |
109 | Keep all your error reporting functions together in one place. | |
111 | This "reading file: " context isn't particularly useful, as it prevents this function from being used for non-file errors, e.g. command-line processing errors. Take a look at createFileError as a way of adding the file name to Error. | |
113 | As requested before, please use std::exit | |
118 | Re. AsmInfo etc, I don't know if there's a good way of testing those, so it may not be possible. Try searching through the other tools to see if any of them test them and if so, how they do so. When you say "invalid sections" what do you mean? A section can be invalid in many different ways, and different mechanisms will be needed to test the different cases. | |
151 | static? | |
165 | getElfSymbolType sounds like it belongs in ELF.h or ELFObjectFile.h. collectBBtoAddressLabels probably doesn't belong in ObjectFile.h simply because it doesn't involve any use of ObjectFile, but I can't see a better location, so there's probably fine, or maybe even consider a new header. | |
185 |
This might be a bit brittle - the best option would be to use assembly directly, if possible.