Page MenuHomePhabricator

llvm-canon
Needs ReviewPublic

Authored by mpaszkowski on Aug 9 2019, 1:33 PM.

Details

Summary

Added a new llvm-canon tool which aims to transform LLVM Modules into a canonical form by reordering and renaming instructions while preserving the same semantics. This tool makes it easier to spot semantic differences while diffing two modules which have undergone different transformation passes.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Eugene.Zelenko added inline comments.Aug 9 2019, 4:42 PM
tools/llvm-canon/canonicalizer.cpp
1 ↗(On Diff #214431)
  • C++ -* is not needed for .cpp files.
41 ↗(On Diff #214431)

Please remove unnecessary empty line. Same below.

112 ↗(On Diff #214431)

Please remove unnecessary empty line.

125 ↗(On Diff #214431)

Please remove unnecessary empty line.

134 ↗(On Diff #214431)

Please remove unnecessary empty line.

172 ↗(On Diff #214431)

Please remove unnecessary empty line.

251 ↗(On Diff #214431)

Please remove unnecessary empty line.

264 ↗(On Diff #214431)

Please remove unnecessary empty line.

271 ↗(On Diff #214431)

Please remove unnecessary empty line.

293 ↗(On Diff #214431)

Please remove unnecessary empty line.

328 ↗(On Diff #214431)

Please remove unnecessary empty line.

332 ↗(On Diff #214431)

Please remove unnecessary empty line.

416 ↗(On Diff #214431)

Please remove unnecessary empty line.

596 ↗(On Diff #214431)

Please add // namespace llvm

598 ↗(On Diff #214431)

Please remove unnecessary empty lines.

tools/llvm-canon/canonicalizer.h
101 ↗(On Diff #214431)

Please add // LLVM_TOOLS_LLVM_CANON_CANONICALIZER_H

tools/llvm-canon/llvm-canon.cpp
1 ↗(On Diff #214431)

C++ -* is not needed for .cpp files.

48 ↗(On Diff #214431)

Please remove unnecessary empty line.

81 ↗(On Diff #214431)

Please remove unnecessary empty lines.

Please also add documentation and mention new tool in LLVM documentation and Release Notes.

plotfi added a subscriber: plotfi.Aug 9 2019, 5:06 PM
plotfi added inline comments.Aug 9 2019, 8:29 PM
tools/llvm-canon/canonicalizer.cpp
88 ↗(On Diff #214431)

What is this magic number? Could this be generated with something like srand at the begining of runOnFunction?

111 ↗(On Diff #214431)

This function is a little short and only used in one place, could it be dropped and just have

for (auto &I : Outputs)
    nameInstruction(I);

put in its place at the call site?

129 ↗(On Diff #214431)

clang-format should clean up the formatting here.

154 ↗(On Diff #214431)

Remove isa<Value>(OP). Seems redundant. As far as I know, every LLVM object is a Value.

169 ↗(On Diff #214431)

don't think you need -> std::string here.

175 ↗(On Diff #214431)

Again, consider dropping magic numbers. Come up with something else, like setting based on srand()

185 ↗(On Diff #214431)

drop braces

267 ↗(On Diff #214431)

Drop magic number.

277 ↗(On Diff #214431)

Drop the braces here.

297 ↗(On Diff #214431)

Drop braces:

// In case of CallInst, consider callee in the instruction name.
if (const CallInst *CI = dyn_cast<CallInst>(I))
    if (const Function *F = CI->getCalledFunction())
        Name.append(F->getName());
307 ↗(On Diff #214431)

Drop braces.

334 ↗(On Diff #214431)

Drop braces

351 ↗(On Diff #214431)

Drop braces and use ternary:

for (auto &OP : I->operands())
    if (const Instruction *IOP = dyn_cast<Instruction>(OP))
            Operands.push_back(
                  (I->getName().substr(0, 2) == "op" || I->getName().substr(0, 2) == "vl") ?
                  // Regular/initial instruction with canonicalized name.
                  IOP->getName().substr(0, 7)) :
                  // User-named instruction, don't substring.
                  Operands.push_back(IOP->getName());
395 ↗(On Diff #214431)

Drop braces

tools/llvm-canon/canonicalizer.h
1 ↗(On Diff #214431)

Move to somewhere in llvm/lib/Transforms.

Also, run clang-format on this entire file.

31 ↗(On Diff #214431)

Rename to IRCanonicalizerPass

42 ↗(On Diff #214431)

Do you really need both constructors? Why not

Canonicalizer() = delete;
Canonicalizer(bool renameAll = false, bool foldPreoutputs = false)
    : ModulePass(ID),
    renameAll(renameAll),
    foldPreoutputs(foldPreoutputs) {}
53 ↗(On Diff #214431)

change rename and fold to RenameAll and FoldPreoutputs. They can be named the same for initializers.

65 ↗(On Diff #214431)

Rename to start with upper case. I believe that is still currently the coding standard.

https://llvm.org/docs/CodingStandards.html#the-low-level-issues

65 ↗(On Diff #214431)

Change to RenameAll

67 ↗(On Diff #214431)

change to FoldPreoutputs

plotfi added a comment.Aug 9 2019, 8:30 PM

Please add lit tests.

plotfi added a subscriber: compnerd.Aug 9 2019, 8:30 PM
fdeazeve added inline comments.
tools/llvm-canon/canonicalizer.cpp
53 ↗(On Diff #214431)

you can include "llvm/IR/InstIterator.h"

and use for (auto &I : instructions(F))

74 ↗(On Diff #214431)

Is this comment really necessary? It feels like it is repeating the if statement in the opposite order.

88 ↗(On Diff #214431)

I might be missing something, but don't we want this algorithm to produce the same result if it's run on the "same" (two identical functions, modulo renaming, etc) function twice in a row?

203 ↗(On Diff #214431)

Make this an early return at the very start of the function? This entire method has no effects when this if statement is false, correct?

235 ↗(On Diff #214431)

Isn't this a reference to a pointer? I think you meant auto *OP. The reason I'm saying this is because you use dyn_cast below, and yet dyn_cast doesn't work with references.

237 ↗(On Diff #214431)

Please remove unnecessary comment

298 ↗(On Diff #214431)

Guidelines suggest using auto * when copying pointers. https://llvm.org/docs/CodingStandards.html#beware-unnecessary-copies-with-auto

329 ↗(On Diff #214431)

typo: flag

tools/llvm-canon/canonicalizer.h
94 ↗(On Diff #214431)

High level comment about the header:

A lot of functions take function pointers instead of references, and yet they are never check for nullptr. This makes me believe you really wanted them to be references. In fact, if you check how those functions are called, the Instructions were originally references, and before calling the helper functions you are always doing &I

Seems like a great idea!

Could we have an option to only rename without reordering? I have found, in the past, some issue that were order sensitive, but rarely name sensitive, and it would be great to be able to debug those.
Also, in my personal implementation I had missed anonymous types and anonymous global variable, I don't know if we captured those here.

Note that I'm not sure if we can name all metadata or function attribute lists.

First of all, thank you for your valuable feedback!

Please run clang-format on the patch.
What is the plan for tests here?

Sure, will run clang-format!

When it comes to tests, I don't have any plan yet. I am not sure if testing every scenario is the best solution here - the canonicalization techniques may change easily as these are my 'best' approaches.
Can you think of any way how to test it sensibly?

I also recommend that you canonicalize PHI nodes. In past experiments looking for fixed points in the optimization pipeline, this came up as a significant issue. The order of the predecessors in the PHI operand lists don't carry any significance, also also sometimes a predecessor can be listed multiple times (always with the same corresponding value). It's probably best to canonicalize those so each predecessor is listed only once and the blocks appear in their natural order.

At a high level, I'd much rather see the underlying logic under lib/Transforms/Utils, and then we can just run this with opt and we don't need a separate utility.

I will take a look at PHI nodes and move the pass to Transforms.

Seems like a great idea!

Could we have an option to only rename without reordering? I have found, in the past, some issue that were order sensitive, but rarely name sensitive, and it would be great to be able to debug those.
Also, in my personal implementation I had missed anonymous types and anonymous global variable, I don't know if we captured those here.

Note that I'm not sure if we can name all metadata or function attribute lists.

I will add the option to run just naming or reordering - both stages are independent. We also haven't considered anonymous types and anonymous global variables.

I would like to thank everyone for your valuable feedback! I have fixed the code and moved the pass to lib/Tranfroms/Utils. I hope I have correctly integrated the pass with the rest of the LLVM (we should have some checklist for that).

mpaszkowski added inline comments.Aug 18 2019, 12:46 PM
tools/llvm-canon/canonicalizer.cpp
88 ↗(On Diff #214431)

You are right, this number cannot be generated by srand. We want the canonicalization to be deterministic.

mpaszkowski added a reviewer: hfinkel.
  • PHI node canonicalization
  • Tests
  • Release notes
  • Docs
mpaszkowski added a comment.EditedAug 19 2019, 4:17 PM

We have been experimenting with various ways of reordering output instructions hoping to add it now, but it looks to be much tougher than we thought. We hope to add it in a next commit.

@hfinkel
Now the canonicalizer sorts values in PHI nodes. After a discussion, I have decided not to remove duplicates. Those duplicates could come from some other passes and in my opinion, the canonicalizer should make them stand out instead of removing them.
Values are sorted alphabetically according to canonicalized names of corresponding basic blocks.

I am open to suggestions. I would like to ask for a final review of the updated diff. Especially I would like to know if I have integrated the pass with the rest of the LLVM correctly.

Now the canonicalizer sorts values in PHI nodes. After a discussion, I have decided not to remove duplicates. Those duplicates could come from some other passes and in my opinion, the canonicalizer should make them stand out instead of removing them.

I suppose that you mean that, if passes are introducing duplicates, that's something that we'd rather fix? That might be true. I'm okay with proceeding on this basis. If we need the deduplicating behavior we'll find out.

Now the canonicalizer sorts values in PHI nodes. After a discussion, I have decided not to remove duplicates. Those duplicates could come from some other passes and in my opinion, the canonicalizer should make them stand out instead of removing them.

I suppose that you mean that, if passes are introducing duplicates, that's something that we'd rather fix? That might be true. I'm okay with proceeding on this basis. If we need the deduplicating behavior we'll find out.

Yes, this is exactly what I meant . We will see how this works, as you said alternatively we can add that later.

Does the rest of the code look good to you? I will need someone to commit this patch for me (I don't have commit rights).

Looking a lot better.

Gentle ping ;)

I would like to ask someone to commit this for me. I don't have commit rights.

Some initial comments.
In general:

  • Don't spell out type if you just used *cast<???>
  • Don't drop */& after auto
  • Do end files with newline
  • Consider small-size optimization. Please try to see if some of these std::string can be replaced with reasonably-sized llvm::SmallString<?>
  • Please consider preallocating some strings
  • This needs a bit more refactoring i think
docs/Passes.rst
693

Too many -

include/llvm/Transforms/Utils/IRCanonicalizer.h
96

Please make sure that files end with newlines

lib/Transforms/Utils/IRCanonicalizer.cpp
38–46

Should these have defaults?

58–79

runOnFunction() ?

74

if(auto *PN = dyn_cast<PHINode>(&I))

151–171

This block will result in most of memory nagging in this pass.

152

Can you make any reasonable guess as to what would be 90'th percentile of Operand string length?
Maybe try using SmallString<64>.

167

It would be really good to predict+preallocate the size here.

184

const int& output = ?
The type is not clear to me here.

191

auto* CI

220

Same comments as for the previous function

264

auto* IOP

271

const int Code = ?

278

const auto *CI

306

auto*

307

auto*

332–336

I'm sensing a repetitive pattern. I think you want to refactor it.

389

auto*

428–431

llvm::less_first

505

!I->user_empty() ?

Gentle ping ;)

I would like to ask someone to commit this for me. I don't have commit rights.

I don't know the current process but I think you should ask for them and commit it yourself so that you get the credit (and the blame :-P ) for this work.

PL

mpaszkowski marked 16 inline comments as done.Sun, Sep 29, 5:35 AM
mpaszkowski added inline comments.
lib/Transforms/Utils/IRCanonicalizer.cpp
38–46

They are all false by default, this is why I haven't explicitly stated their value. I don't think this will change in the future.

58–79

Yes! I don't know why I haven't changed this any earlier.

167

Changed to standard for-loop and moved to the end of the function.

332–336

The pattern only repeats for creating the operand list.

mpaszkowski marked 2 inline comments as done.

Updated the diff for the new revision, refactored naming functions, accepted suggestions by lebedev.
Thank you for the review! @lebedev.ri
Is the code ready for the mainline?