Added a new llvm-canon tool which aims to transform LLVM Modules into a canonical form by reordering and renaming instructions while preserving the same semantics. This tool makes it easier to spot semantic differences while diffing two modules which have undergone different transformation passes.
Details
Diff Detail
Unit Tests
Event Timeline
Please run clang-format on the patch.
What is the plan for tests here?
tools/llvm-canon/CMakeLists.txt | ||
---|---|---|
9 ↗ | (On Diff #214431) | The filenames generally start from uppercase letter. |
tools/llvm-canon/canonicalizer.cpp | ||
---|---|---|
164 ↗ | (On Diff #214431) | Please use the range-based llvm::sort(Operands) instead of std::sort. |
256 ↗ | (On Diff #214431) | Use llvm::sort. |
285 ↗ | (On Diff #214431) | Ditto. |
366 ↗ | (On Diff #214431) | Ditto. |
467 ↗ | (On Diff #214431) | Use llvm::sort. |
I also recommend that you canonicalize PHI nodes. In past experiments looking for fixed points in the optimization pipeline, this came up as a significant issue. The order of the predecessors in the PHI operand lists don't carry any significance, also also sometimes a predecessor can be listed multiple times (always with the same corresponding value). It's probably best to canonicalize those so each predecessor is listed only once and the blocks appear in their natural order.
At a high level, I'd much rather see the underlying logic under lib/Transforms/Utils, and then we can just run this with opt and we don't need a separate utility.
tools/llvm-canon/canonicalizer.cpp | ||
---|---|---|
1 ↗ | (On Diff #214431) |
|
41 ↗ | (On Diff #214431) | Please remove unnecessary empty line. Same below. |
112 ↗ | (On Diff #214431) | Please remove unnecessary empty line. |
125 ↗ | (On Diff #214431) | Please remove unnecessary empty line. |
134 ↗ | (On Diff #214431) | Please remove unnecessary empty line. |
172 ↗ | (On Diff #214431) | Please remove unnecessary empty line. |
251 ↗ | (On Diff #214431) | Please remove unnecessary empty line. |
264 ↗ | (On Diff #214431) | Please remove unnecessary empty line. |
271 ↗ | (On Diff #214431) | Please remove unnecessary empty line. |
293 ↗ | (On Diff #214431) | Please remove unnecessary empty line. |
328 ↗ | (On Diff #214431) | Please remove unnecessary empty line. |
332 ↗ | (On Diff #214431) | Please remove unnecessary empty line. |
416 ↗ | (On Diff #214431) | Please remove unnecessary empty line. |
596 ↗ | (On Diff #214431) | Please add // namespace llvm |
598 ↗ | (On Diff #214431) | Please remove unnecessary empty lines. |
tools/llvm-canon/canonicalizer.h | ||
44 ↗ | (On Diff #214431) | Please use default member initialization for this and next members. |
99 ↗ | (On Diff #214431) | Please add // namespace llvm |
101 ↗ | (On Diff #214431) | Please add // LLVM_TOOLS_LLVM_CANON_CANONICALIZER_H |
tools/llvm-canon/llvm-canon.cpp | ||
1 ↗ | (On Diff #214431) | C++ -* is not needed for .cpp files. |
48 ↗ | (On Diff #214431) | Please remove unnecessary empty line. |
81 ↗ | (On Diff #214431) | Please remove unnecessary empty lines. |
Please also add documentation and mention new tool in LLVM documentation and Release Notes.
tools/llvm-canon/canonicalizer.cpp | ||
---|---|---|
88 ↗ | (On Diff #214431) | What is this magic number? Could this be generated with something like srand at the begining of runOnFunction? |
111 ↗ | (On Diff #214431) | This function is a little short and only used in one place, could it be dropped and just have for (auto &I : Outputs) nameInstruction(I); put in its place at the call site? |
129 ↗ | (On Diff #214431) | clang-format should clean up the formatting here. |
154 ↗ | (On Diff #214431) | Remove isa<Value>(OP). Seems redundant. As far as I know, every LLVM object is a Value. |
169 ↗ | (On Diff #214431) | don't think you need -> std::string here. |
175 ↗ | (On Diff #214431) | Again, consider dropping magic numbers. Come up with something else, like setting based on srand() |
185 ↗ | (On Diff #214431) | drop braces |
267 ↗ | (On Diff #214431) | Drop magic number. |
277 ↗ | (On Diff #214431) | Drop the braces here. |
297 ↗ | (On Diff #214431) | Drop braces: // In case of CallInst, consider callee in the instruction name. if (const CallInst *CI = dyn_cast<CallInst>(I)) if (const Function *F = CI->getCalledFunction()) Name.append(F->getName()); |
307 ↗ | (On Diff #214431) | Drop braces. |
334 ↗ | (On Diff #214431) | Drop braces |
351 ↗ | (On Diff #214431) | Drop braces and use ternary: for (auto &OP : I->operands()) if (const Instruction *IOP = dyn_cast<Instruction>(OP)) Operands.push_back( (I->getName().substr(0, 2) == "op" || I->getName().substr(0, 2) == "vl") ? // Regular/initial instruction with canonicalized name. IOP->getName().substr(0, 7)) : // User-named instruction, don't substring. Operands.push_back(IOP->getName()); |
395 ↗ | (On Diff #214431) | Drop braces |
tools/llvm-canon/canonicalizer.h | ||
1 ↗ | (On Diff #214431) | Move to somewhere in llvm/lib/Transforms. Also, run clang-format on this entire file. |
31 ↗ | (On Diff #214431) | Rename to IRCanonicalizerPass |
42 ↗ | (On Diff #214431) | Do you really need both constructors? Why not Canonicalizer() = delete; Canonicalizer(bool renameAll = false, bool foldPreoutputs = false) : ModulePass(ID), renameAll(renameAll), foldPreoutputs(foldPreoutputs) {} |
53 ↗ | (On Diff #214431) | change rename and fold to RenameAll and FoldPreoutputs. They can be named the same for initializers. |
65 ↗ | (On Diff #214431) | Rename to start with upper case. I believe that is still currently the coding standard. https://llvm.org/docs/CodingStandards.html#the-low-level-issues |
65 ↗ | (On Diff #214431) | Change to RenameAll |
67 ↗ | (On Diff #214431) | change to FoldPreoutputs |
tools/llvm-canon/canonicalizer.cpp | ||
---|---|---|
53 ↗ | (On Diff #214431) | you can include "llvm/IR/InstIterator.h" and use for (auto &I : instructions(F)) |
74 ↗ | (On Diff #214431) | Is this comment really necessary? It feels like it is repeating the if statement in the opposite order. |
88 ↗ | (On Diff #214431) | I might be missing something, but don't we want this algorithm to produce the same result if it's run on the "same" (two identical functions, modulo renaming, etc) function twice in a row? |
203 ↗ | (On Diff #214431) | Make this an early return at the very start of the function? This entire method has no effects when this if statement is false, correct? |
237 ↗ | (On Diff #214431) | Please remove unnecessary comment |
tools/llvm-canon/canonicalizer.h | ||
94 ↗ | (On Diff #214431) | High level comment about the header: A lot of functions take function pointers instead of references, and yet they are never check for nullptr. This makes me believe you really wanted them to be references. In fact, if you check how those functions are called, the Instructions were originally references, and before calling the helper functions you are always doing &I |
tools/llvm-canon/canonicalizer.cpp | ||
---|---|---|
235 ↗ | (On Diff #214431) | Isn't this a reference to a pointer? I think you meant auto *OP. The reason I'm saying this is because you use dyn_cast below, and yet dyn_cast doesn't work with references. |
298 ↗ | (On Diff #214431) | Guidelines suggest using auto * when copying pointers. https://llvm.org/docs/CodingStandards.html#beware-unnecessary-copies-with-auto |
329 ↗ | (On Diff #214431) | typo: flag |
Seems like a great idea!
Could we have an option to only rename without reordering? I have found, in the past, some issue that were order sensitive, but rarely name sensitive, and it would be great to be able to debug those.
Also, in my personal implementation I had missed anonymous types and anonymous global variable, I don't know if we captured those here.
Note that I'm not sure if we can name all metadata or function attribute lists.
First of all, thank you for your valuable feedback!
Sure, will run clang-format!
When it comes to tests, I don't have any plan yet. I am not sure if testing every scenario is the best solution here - the canonicalization techniques may change easily as these are my 'best' approaches.
Can you think of any way how to test it sensibly?
I will take a look at PHI nodes and move the pass to Transforms.
I will add the option to run just naming or reordering - both stages are independent. We also haven't considered anonymous types and anonymous global variables.
I would like to thank everyone for your valuable feedback! I have fixed the code and moved the pass to lib/Tranfroms/Utils. I hope I have correctly integrated the pass with the rest of the LLVM (we should have some checklist for that).
tools/llvm-canon/canonicalizer.cpp | ||
---|---|---|
88 ↗ | (On Diff #214431) | You are right, this number cannot be generated by srand. We want the canonicalization to be deterministic. |
We have been experimenting with various ways of reordering output instructions hoping to add it now, but it looks to be much tougher than we thought. We hope to add it in a next commit.
@hfinkel
Now the canonicalizer sorts values in PHI nodes. After a discussion, I have decided not to remove duplicates. Those duplicates could come from some other passes and in my opinion, the canonicalizer should make them stand out instead of removing them.
Values are sorted alphabetically according to canonicalized names of corresponding basic blocks.
I am open to suggestions. I would like to ask for a final review of the updated diff. Especially I would like to know if I have integrated the pass with the rest of the LLVM correctly.
Now the canonicalizer sorts values in PHI nodes. After a discussion, I have decided not to remove duplicates. Those duplicates could come from some other passes and in my opinion, the canonicalizer should make them stand out instead of removing them.
I suppose that you mean that, if passes are introducing duplicates, that's something that we'd rather fix? That might be true. I'm okay with proceeding on this basis. If we need the deduplicating behavior we'll find out.
Yes, this is exactly what I meant . We will see how this works, as you said alternatively we can add that later.
Does the rest of the code look good to you? I will need someone to commit this patch for me (I don't have commit rights).
Gentle ping ;)
I would like to ask someone to commit this for me. I don't have commit rights.
Some initial comments.
In general:
- Don't spell out type if you just used *cast<???>
- Don't drop */& after auto
- Do end files with newline
- Consider small-size optimization. Please try to see if some of these std::string can be replaced with reasonably-sized llvm::SmallString<?>
- Please consider preallocating some strings
- This needs a bit more refactoring i think
docs/Passes.rst | ||
---|---|---|
693 ↗ | (On Diff #216009) | Too many - |
include/llvm/Transforms/Utils/IRCanonicalizer.h | ||
96 ↗ | (On Diff #216009) | Please make sure that files end with newlines |
lib/Transforms/Utils/IRCanonicalizer.cpp | ||
37–45 ↗ | (On Diff #216009) | Should these have defaults? |
57–78 ↗ | (On Diff #216009) | runOnFunction() ? |
73 ↗ | (On Diff #216009) | if(auto *PN = dyn_cast<PHINode>(&I)) |
150–170 ↗ | (On Diff #216009) | This block will result in most of memory nagging in this pass. |
151 ↗ | (On Diff #216009) | Can you make any reasonable guess as to what would be 90'th percentile of Operand string length? |
166 ↗ | (On Diff #216009) | It would be really good to predict+preallocate the size here. |
183 ↗ | (On Diff #216009) | const int& output = ? |
190 ↗ | (On Diff #216009) | auto* CI |
219 ↗ | (On Diff #216009) | Same comments as for the previous function |
263 ↗ | (On Diff #216009) | auto* IOP |
270 ↗ | (On Diff #216009) | const int Code = ? |
277 ↗ | (On Diff #216009) | const auto *CI |
305 ↗ | (On Diff #216009) | auto* |
306 ↗ | (On Diff #216009) | auto* |
331–335 ↗ | (On Diff #216009) | I'm sensing a repetitive pattern. I think you want to refactor it. |
388 ↗ | (On Diff #216009) | auto* |
427–430 ↗ | (On Diff #216009) | llvm::less_first |
504 ↗ | (On Diff #216009) | !I->user_empty() ? |
I don't know the current process but I think you should ask for them and commit it yourself so that you get the credit (and the blame :-P ) for this work.
PL
lib/Transforms/Utils/IRCanonicalizer.cpp | ||
---|---|---|
37–45 ↗ | (On Diff #216009) | They are all false by default, this is why I haven't explicitly stated their value. I don't think this will change in the future. |
57–78 ↗ | (On Diff #216009) | Yes! I don't know why I haven't changed this any earlier. |
166 ↗ | (On Diff #216009) | Changed to standard for-loop and moved to the end of the function. |
331–335 ↗ | (On Diff #216009) | The pattern only repeats for creating the operand list. |
Updated the diff for the new revision, refactored naming functions, accepted suggestions by lebedev.
Thank you for the review! @lebedev.ri
Is the code ready for the mainline?
Just a drive by comment from someone interested in this pass.
docs/Passes.rst | ||
---|---|---|
697 ↗ | (On Diff #222320) | It looks like this sentence is not finished. |
First of all, I am sorry for such a late reply (had many things going on recently). I have updated the patch for the upstream version of the LLVM. Thanks to @aykevl I have corrected the docs/Passes.rst file. Additionaly, I have added a new flag which enables/disables sorting and reordering operands in commutative instructions.
In the meantime, the project has been presented at the LLVM Developers' Meeting 2019 in San Jose. You may want to check out the slides or watch the presentation.
I would like to thank everyone who came to the presentation for all the valuable feedback and support! It was really nice to see you all.
Hopefully, the code looks good now. I would like to ask for further comments and eventual LGTM so the code can be committed to the mainline.
I've got a few nits. Will do a second pass shortly.
llvm/lib/Transforms/Utils/IRCanonicalizer.cpp | ||
---|---|---|
176 | get this to conform to llvm style (ie OutputFootprint) | |
179 | Output as well. | |
235 | nit: auto *IOP | |
327 | nit: HasCanonicalName | |
371 | not: auto *IOP | |
422 | not: auto *VOP | |
440 | nit: Position | |
462 | nit: LHS and RHS | |
518 | for (const auto &OP : I->operands()) if (isa<Instruction>(OP)) return false; // Found non-immediate operand (instruction). | |
549 | unsigned Count = 0; for (const auto &B : *Func) { for (const auto &E : B) { if (&E == I) Outputs.insert(Count); Count++; } } | |
563 | nit: auto *U and auto *UI |
llvm/include/llvm/Transforms/Utils/IRCanonicalizer.h | ||
---|---|---|
1 ↗ | (On Diff #257243) | Does this need to be a separate header? Can the class be contained in a anonymous namespace in the .cpp file (IRCanonicalizer.cpp) like some of the other passes? |
61 ↗ | (On Diff #257243) | Is there any significance of this magic number? Can you either set this by a cl::opt or generated at runtime (maybe something using srand (time(NULL))) ? |
Thank you @plotfi for review! I will update the diff in a second.
llvm/include/llvm/Transforms/Utils/IRCanonicalizer.h | ||
---|---|---|
61 ↗ | (On Diff #257243) | There is no significance in this particular number but it needs to be consistent among all canonicalized modules (so this shouldn't be set by cl::opt or randomly generated). I have used particularly this number since it has been used in many other places in LLVM, for example here. |
llvm/test/Transforms/IRCanonicalizer/reordering-instructions.ll | ||
---|---|---|
6 | consider making the last 3 check lines "CHECK-NEXT" | |
tools/llvm-canon/canonicalizer.cpp | ||
88 ↗ | (On Diff #214431) | Ah yes, that makes a lot of sense. The MIR Canonicalizer ran into the same sort of issue. That's why the value-numbering-esque rewrite of it doesn't hash certain types of MachineOperands that might be different run to run. |
@plotfi Should I create a new review so that the HarborMaster will be able run the builds after the fix?
If updating this review did not trigger it, go ahead and create a new one. Sorry for the late reply @mpaszkowski.
Updated the patch to for the new version of LLVM. Currently the pass still utilizes the legacy pass manager. The pass will be ported to the new pass manager in a separate review.
clang-format: please reformat the code