This is an archive of the discontinued LLVM Phabricator instance.

llvm-canon
Needs ReviewPublic

Authored by mpaszkowski on Aug 9 2019, 1:33 PM.

Details

Summary

Added a new llvm-canon tool which aims to transform LLVM Modules into a canonical form by reordering and renaming instructions while preserving the same semantics. This tool makes it easier to spot semantic differences while diffing two modules which have undergone different transformation passes.

Diff Detail

Event Timeline

mpaszkowski created this revision.Aug 9 2019, 1:33 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 9 2019, 1:33 PM

Please run clang-format on the patch.
What is the plan for tests here?

tools/llvm-canon/CMakeLists.txt
9 ↗(On Diff #214431)

The filenames generally start from uppercase letter.

mgrang added inline comments.Aug 9 2019, 2:00 PM
tools/llvm-canon/canonicalizer.cpp
164 ↗(On Diff #214431)

Please use the range-based llvm::sort(Operands) instead of std::sort.

See https://llvm.org/docs/CodingStandards.html#beware-of-non-deterministic-sorting-order-of-equal-elements

256 ↗(On Diff #214431)

Use llvm::sort.
Braces are not needed for single statement if's.

285 ↗(On Diff #214431)

Ditto.

366 ↗(On Diff #214431)

Ditto.

467 ↗(On Diff #214431)

Use llvm::sort.

Nicola added a subscriber: Nicola.Aug 9 2019, 2:47 PM
hfinkel added a subscriber: hfinkel.Aug 9 2019, 2:56 PM

I also recommend that you canonicalize PHI nodes. In past experiments looking for fixed points in the optimization pipeline, this came up as a significant issue. The order of the predecessors in the PHI operand lists don't carry any significance, also also sometimes a predecessor can be listed multiple times (always with the same corresponding value). It's probably best to canonicalize those so each predecessor is listed only once and the blocks appear in their natural order.

At a high level, I'd much rather see the underlying logic under lib/Transforms/Utils, and then we can just run this with opt and we don't need a separate utility.

Eugene.Zelenko added inline comments.
tools/llvm-canon/canonicalizer.cpp
1 ↗(On Diff #214431)
  • C++ -* is not needed for .cpp files.
41 ↗(On Diff #214431)

Please remove unnecessary empty line. Same below.

112 ↗(On Diff #214431)

Please remove unnecessary empty line.

125 ↗(On Diff #214431)

Please remove unnecessary empty line.

134 ↗(On Diff #214431)

Please remove unnecessary empty line.

172 ↗(On Diff #214431)

Please remove unnecessary empty line.

251 ↗(On Diff #214431)

Please remove unnecessary empty line.

264 ↗(On Diff #214431)

Please remove unnecessary empty line.

271 ↗(On Diff #214431)

Please remove unnecessary empty line.

293 ↗(On Diff #214431)

Please remove unnecessary empty line.

328 ↗(On Diff #214431)

Please remove unnecessary empty line.

332 ↗(On Diff #214431)

Please remove unnecessary empty line.

416 ↗(On Diff #214431)

Please remove unnecessary empty line.

596 ↗(On Diff #214431)

Please add // namespace llvm

598 ↗(On Diff #214431)

Please remove unnecessary empty lines.

tools/llvm-canon/canonicalizer.h
44 ↗(On Diff #214431)

Please use default member initialization for this and next members.

99 ↗(On Diff #214431)

Please add // namespace llvm

101 ↗(On Diff #214431)

Please add // LLVM_TOOLS_LLVM_CANON_CANONICALIZER_H

tools/llvm-canon/llvm-canon.cpp
1 ↗(On Diff #214431)

C++ -* is not needed for .cpp files.

48 ↗(On Diff #214431)

Please remove unnecessary empty line.

81 ↗(On Diff #214431)

Please remove unnecessary empty lines.

Please also add documentation and mention new tool in LLVM documentation and Release Notes.

plotfi added a subscriber: plotfi.Aug 9 2019, 5:06 PM
plotfi added inline comments.Aug 9 2019, 8:29 PM
tools/llvm-canon/canonicalizer.cpp
88 ↗(On Diff #214431)

What is this magic number? Could this be generated with something like srand at the begining of runOnFunction?

111 ↗(On Diff #214431)

This function is a little short and only used in one place, could it be dropped and just have

for (auto &I : Outputs)
    nameInstruction(I);

put in its place at the call site?

129 ↗(On Diff #214431)

clang-format should clean up the formatting here.

154 ↗(On Diff #214431)

Remove isa<Value>(OP). Seems redundant. As far as I know, every LLVM object is a Value.

169 ↗(On Diff #214431)

don't think you need -> std::string here.

175 ↗(On Diff #214431)

Again, consider dropping magic numbers. Come up with something else, like setting based on srand()

185 ↗(On Diff #214431)

drop braces

267 ↗(On Diff #214431)

Drop magic number.

277 ↗(On Diff #214431)

Drop the braces here.

297 ↗(On Diff #214431)

Drop braces:

// In case of CallInst, consider callee in the instruction name.
if (const CallInst *CI = dyn_cast<CallInst>(I))
    if (const Function *F = CI->getCalledFunction())
        Name.append(F->getName());
307 ↗(On Diff #214431)

Drop braces.

334 ↗(On Diff #214431)

Drop braces

351 ↗(On Diff #214431)

Drop braces and use ternary:

for (auto &OP : I->operands())
    if (const Instruction *IOP = dyn_cast<Instruction>(OP))
            Operands.push_back(
                  (I->getName().substr(0, 2) == "op" || I->getName().substr(0, 2) == "vl") ?
                  // Regular/initial instruction with canonicalized name.
                  IOP->getName().substr(0, 7)) :
                  // User-named instruction, don't substring.
                  Operands.push_back(IOP->getName());
395 ↗(On Diff #214431)

Drop braces

tools/llvm-canon/canonicalizer.h
1 ↗(On Diff #214431)

Move to somewhere in llvm/lib/Transforms.

Also, run clang-format on this entire file.

31 ↗(On Diff #214431)

Rename to IRCanonicalizerPass

42 ↗(On Diff #214431)

Do you really need both constructors? Why not

Canonicalizer() = delete;
Canonicalizer(bool renameAll = false, bool foldPreoutputs = false)
    : ModulePass(ID),
    renameAll(renameAll),
    foldPreoutputs(foldPreoutputs) {}
53 ↗(On Diff #214431)

change rename and fold to RenameAll and FoldPreoutputs. They can be named the same for initializers.

65 ↗(On Diff #214431)

Rename to start with upper case. I believe that is still currently the coding standard.

https://llvm.org/docs/CodingStandards.html#the-low-level-issues

65 ↗(On Diff #214431)

Change to RenameAll

67 ↗(On Diff #214431)

change to FoldPreoutputs

plotfi added a comment.Aug 9 2019, 8:30 PM

Please add lit tests.

plotfi added a subscriber: compnerd.Aug 9 2019, 8:30 PM
fdeazeve added inline comments.
tools/llvm-canon/canonicalizer.cpp
53 ↗(On Diff #214431)

you can include "llvm/IR/InstIterator.h"

and use for (auto &I : instructions(F))

74 ↗(On Diff #214431)

Is this comment really necessary? It feels like it is repeating the if statement in the opposite order.

88 ↗(On Diff #214431)

I might be missing something, but don't we want this algorithm to produce the same result if it's run on the "same" (two identical functions, modulo renaming, etc) function twice in a row?

203 ↗(On Diff #214431)

Make this an early return at the very start of the function? This entire method has no effects when this if statement is false, correct?

237 ↗(On Diff #214431)

Please remove unnecessary comment

tools/llvm-canon/canonicalizer.h
94 ↗(On Diff #214431)

High level comment about the header:

A lot of functions take function pointers instead of references, and yet they are never check for nullptr. This makes me believe you really wanted them to be references. In fact, if you check how those functions are called, the Instructions were originally references, and before calling the helper functions you are always doing &I

fdeazeve added inline comments.Aug 10 2019, 11:03 AM
tools/llvm-canon/canonicalizer.cpp
235 ↗(On Diff #214431)

Isn't this a reference to a pointer? I think you meant auto *OP. The reason I'm saying this is because you use dyn_cast below, and yet dyn_cast doesn't work with references.

298 ↗(On Diff #214431)

Guidelines suggest using auto * when copying pointers. https://llvm.org/docs/CodingStandards.html#beware-unnecessary-copies-with-auto

329 ↗(On Diff #214431)

typo: flag

Seems like a great idea!

Could we have an option to only rename without reordering? I have found, in the past, some issue that were order sensitive, but rarely name sensitive, and it would be great to be able to debug those.
Also, in my personal implementation I had missed anonymous types and anonymous global variable, I don't know if we captured those here.

Note that I'm not sure if we can name all metadata or function attribute lists.

First of all, thank you for your valuable feedback!

Please run clang-format on the patch.
What is the plan for tests here?

Sure, will run clang-format!

When it comes to tests, I don't have any plan yet. I am not sure if testing every scenario is the best solution here - the canonicalization techniques may change easily as these are my 'best' approaches.
Can you think of any way how to test it sensibly?

I also recommend that you canonicalize PHI nodes. In past experiments looking for fixed points in the optimization pipeline, this came up as a significant issue. The order of the predecessors in the PHI operand lists don't carry any significance, also also sometimes a predecessor can be listed multiple times (always with the same corresponding value). It's probably best to canonicalize those so each predecessor is listed only once and the blocks appear in their natural order.

At a high level, I'd much rather see the underlying logic under lib/Transforms/Utils, and then we can just run this with opt and we don't need a separate utility.

I will take a look at PHI nodes and move the pass to Transforms.

Seems like a great idea!

Could we have an option to only rename without reordering? I have found, in the past, some issue that were order sensitive, but rarely name sensitive, and it would be great to be able to debug those.
Also, in my personal implementation I had missed anonymous types and anonymous global variable, I don't know if we captured those here.

Note that I'm not sure if we can name all metadata or function attribute lists.

I will add the option to run just naming or reordering - both stages are independent. We also haven't considered anonymous types and anonymous global variables.

I would like to thank everyone for your valuable feedback! I have fixed the code and moved the pass to lib/Tranfroms/Utils. I hope I have correctly integrated the pass with the rest of the LLVM (we should have some checklist for that).

mpaszkowski added inline comments.Aug 18 2019, 12:46 PM
tools/llvm-canon/canonicalizer.cpp
88 ↗(On Diff #214431)

You are right, this number cannot be generated by srand. We want the canonicalization to be deterministic.

mpaszkowski added a reviewer: hfinkel.
  • PHI node canonicalization
  • Tests
  • Release notes
  • Docs
mpaszkowski added a comment.EditedAug 19 2019, 4:17 PM

We have been experimenting with various ways of reordering output instructions hoping to add it now, but it looks to be much tougher than we thought. We hope to add it in a next commit.

@hfinkel
Now the canonicalizer sorts values in PHI nodes. After a discussion, I have decided not to remove duplicates. Those duplicates could come from some other passes and in my opinion, the canonicalizer should make them stand out instead of removing them.
Values are sorted alphabetically according to canonicalized names of corresponding basic blocks.

I am open to suggestions. I would like to ask for a final review of the updated diff. Especially I would like to know if I have integrated the pass with the rest of the LLVM correctly.

Now the canonicalizer sorts values in PHI nodes. After a discussion, I have decided not to remove duplicates. Those duplicates could come from some other passes and in my opinion, the canonicalizer should make them stand out instead of removing them.

I suppose that you mean that, if passes are introducing duplicates, that's something that we'd rather fix? That might be true. I'm okay with proceeding on this basis. If we need the deduplicating behavior we'll find out.

Now the canonicalizer sorts values in PHI nodes. After a discussion, I have decided not to remove duplicates. Those duplicates could come from some other passes and in my opinion, the canonicalizer should make them stand out instead of removing them.

I suppose that you mean that, if passes are introducing duplicates, that's something that we'd rather fix? That might be true. I'm okay with proceeding on this basis. If we need the deduplicating behavior we'll find out.

Yes, this is exactly what I meant . We will see how this works, as you said alternatively we can add that later.

Does the rest of the code look good to you? I will need someone to commit this patch for me (I don't have commit rights).

Looking a lot better.

Gentle ping ;)

I would like to ask someone to commit this for me. I don't have commit rights.

Some initial comments.
In general:

  • Don't spell out type if you just used *cast<???>
  • Don't drop */& after auto
  • Do end files with newline
  • Consider small-size optimization. Please try to see if some of these std::string can be replaced with reasonably-sized llvm::SmallString<?>
  • Please consider preallocating some strings
  • This needs a bit more refactoring i think
docs/Passes.rst
693 ↗(On Diff #216009)

Too many -

include/llvm/Transforms/Utils/IRCanonicalizer.h
96 ↗(On Diff #216009)

Please make sure that files end with newlines

lib/Transforms/Utils/IRCanonicalizer.cpp
37–45 ↗(On Diff #216009)

Should these have defaults?

57–78 ↗(On Diff #216009)

runOnFunction() ?

73 ↗(On Diff #216009)

if(auto *PN = dyn_cast<PHINode>(&I))

150–170 ↗(On Diff #216009)

This block will result in most of memory nagging in this pass.

151 ↗(On Diff #216009)

Can you make any reasonable guess as to what would be 90'th percentile of Operand string length?
Maybe try using SmallString<64>.

166 ↗(On Diff #216009)

It would be really good to predict+preallocate the size here.

183 ↗(On Diff #216009)

const int& output = ?
The type is not clear to me here.

190 ↗(On Diff #216009)

auto* CI

219 ↗(On Diff #216009)

Same comments as for the previous function

263 ↗(On Diff #216009)

auto* IOP

270 ↗(On Diff #216009)

const int Code = ?

277 ↗(On Diff #216009)

const auto *CI

305 ↗(On Diff #216009)

auto*

306 ↗(On Diff #216009)

auto*

331–335 ↗(On Diff #216009)

I'm sensing a repetitive pattern. I think you want to refactor it.

388 ↗(On Diff #216009)

auto*

427–430 ↗(On Diff #216009)

llvm::less_first

504 ↗(On Diff #216009)

!I->user_empty() ?

Gentle ping ;)

I would like to ask someone to commit this for me. I don't have commit rights.

I don't know the current process but I think you should ask for them and commit it yourself so that you get the credit (and the blame :-P ) for this work.

PL

mpaszkowski marked 16 inline comments as done.Sep 29 2019, 5:35 AM
mpaszkowski added inline comments.
lib/Transforms/Utils/IRCanonicalizer.cpp
37–45 ↗(On Diff #216009)

They are all false by default, this is why I haven't explicitly stated their value. I don't think this will change in the future.

57–78 ↗(On Diff #216009)

Yes! I don't know why I haven't changed this any earlier.

166 ↗(On Diff #216009)

Changed to standard for-loop and moved to the end of the function.

331–335 ↗(On Diff #216009)

The pattern only repeats for creating the operand list.

mpaszkowski marked 2 inline comments as done.

Updated the diff for the new revision, refactored naming functions, accepted suggestions by lebedev.
Thank you for the review! @lebedev.ri
Is the code ready for the mainline?

aykevl added a subscriber: aykevl.Apr 9 2020, 2:43 PM

Just a drive by comment from someone interested in this pass.

docs/Passes.rst
697 ↗(On Diff #222320)

It looks like this sentence is not finished.

mpaszkowski marked an inline comment as done.Apr 14 2020, 2:35 AM

Thanks, will fix that!

mpaszkowski added a reviewer: plotfi.

First of all, I am sorry for such a late reply (had many things going on recently). I have updated the patch for the upstream version of the LLVM. Thanks to @aykevl I have corrected the docs/Passes.rst file. Additionaly, I have added a new flag which enables/disables sorting and reordering operands in commutative instructions.

In the meantime, the project has been presented at the LLVM Developers' Meeting 2019 in San Jose. You may want to check out the slides or watch the presentation.

I would like to thank everyone who came to the presentation for all the valuable feedback and support! It was really nice to see you all.

Hopefully, the code looks good now. I would like to ask for further comments and eventual LGTM so the code can be committed to the mainline.

Gentle ping! Is the code ready for the mainline?
Could you @plotfi take a look?

I've got a few nits. Will do a second pass shortly.

llvm/lib/Transforms/Utils/IRCanonicalizer.cpp
176

get this to conform to llvm style (ie OutputFootprint)

179

Output as well.

235

nit: auto *IOP

327

nit: HasCanonicalName

371

not: auto *IOP

422

not: auto *VOP

440

nit: Position

462

nit: LHS and RHS

518
for (const auto &OP : I->operands())
  if (isa<Instruction>(OP))
    return false; // Found non-immediate operand (instruction).
549
unsigned Count = 0;
for (const auto &B : *Func) {
  for (const auto &E : B) {
    if (&E == I)
      Outputs.insert(Count);
    Count++;
  }
}
563

nit: auto *U and auto *UI

Please update the diff so the new HarborMaster setup can run some tests on it :-)

plotfi added inline comments.Apr 25 2020, 9:08 PM
llvm/include/llvm/Transforms/Utils/IRCanonicalizer.h
1 ↗(On Diff #257243)

Does this need to be a separate header? Can the class be contained in a anonymous namespace in the .cpp file (IRCanonicalizer.cpp) like some of the other passes?

61 ↗(On Diff #257243)

Is there any significance of this magic number? Can you either set this by a cl::opt or generated at runtime (maybe something using srand (time(NULL))) ?

plotfi added a comment.May 1 2020, 6:47 PM

Ping? Any Update on this? I think this is close to an LGTM.

mpaszkowski marked 12 inline comments as done.May 2 2020, 1:23 PM

Thank you @plotfi for review! I will update the diff in a second.

llvm/include/llvm/Transforms/Utils/IRCanonicalizer.h
61 ↗(On Diff #257243)

There is no significance in this particular number but it needs to be consistent among all canonicalized modules (so this shouldn't be set by cl::opt or randomly generated). I have used particularly this number since it has been used in many other places in LLVM, for example here.

Updated the diff with suggestions from @plotfi.

How do we get HarborMaster to run tests on the patch?

plotfi added a comment.May 9 2020, 7:04 PM

How do we get HarborMaster to run tests on the patch?

I thought it just does it. Strange. Maybe because this is an older diff?

plotfi added inline comments.May 9 2020, 7:08 PM
llvm/test/Transforms/IRCanonicalizer/reordering-instructions.ll
6

consider making the last 3 check lines "CHECK-NEXT"

tools/llvm-canon/canonicalizer.cpp
88 ↗(On Diff #214431)

Ah yes, that makes a lot of sense. The MIR Canonicalizer ran into the same sort of issue. That's why the value-numbering-esque rewrite of it doesn't hash certain types of MachineOperands that might be different run to run.

plotfi added a comment.May 9 2020, 7:08 PM

Nice work. I think this LGTM.

Nice work. I think this LGTM.

Thank you @plotfi ! I will update one of the tests and request commit access :)

This revision was not accepted when it landed; it landed in state Needs Review.May 23 2020, 4:12 AM
This revision was automatically updated to reflect the committed changes.

@plotfi Should I create a new review so that the HarborMaster will be able run the builds after the fix?

@plotfi Should I create a new review so that the HarborMaster will be able run the builds after the fix?

If updating this review did not trigger it, go ahead and create a new one. Sorry for the late reply @mpaszkowski.

Sorry, I will update the review and commit again at the end of next week.

uenoku added a subscriber: uenoku.Sep 11 2020, 11:29 AM

Updated the patch to for the new version of LLVM. Currently the pass still utilizes the legacy pass manager. The pass will be ported to the new pass manager in a separate review.