Page MenuHomePhabricator

[DomTree] Replace ChildrenGetter with GraphTraits over GraphDiff.
ClosedPublic

Authored by asbirlea on Apr 2 2020, 3:53 PM.

Details

Summary

This replaces the logic inChildrenGetter inside the DominatorTree with
GraphTraits over a GraphDiff object, an object which encapsulated the
view of the previous CFG.

The simplification step removing the extensions in clang which use DominatorTree and also filters for nullptr are deferred to a subsequent cleanup patch.

Diff Detail

Event Timeline

asbirlea created this revision.Apr 2 2020, 3:53 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 2 2020, 3:53 PM

I sent this out to get some general feedback, but I'd like to be able to simplify the fact that this is replacing a for over ChildrenGetter with a for over children<type>(pair), plus 5 lines to set up the type and pair; one simplification is around getting the type inside children (GraphDiffBBPair) and another around having the BUI or not; for the latter I'm considering updating a few places such that BUI is never nullptr.

Couple of small nits, but I'll leave most of the review to someoen else here - I /think/ it's beyond my context/experience (but if necessary, poke me and I can look more/ask more questions/etc)

llvm/include/llvm/Support/GenericDomTreeConstruction.h
340–341

No need for make_pair if you're going to specify the types on the LHS, could write this as:

std::pair<...> GDNodePair(GDTmp, N);

(similarly two instances further down in this patch - make_pair or explicit type, not both)

Or you could use auto on the LHS given the types of the two parameters seem close enough that it's not too surprising what make_pair produces, maybe even just roll it all in together:

return !empty(children<GraphDiffBBPair>(std::make_pair(GDTmp, N)));
342–346

Probably write this as (if I'm understanding the code/intent correctly):

return !empty(children<GraphDiffBBPair>(GDNodePair));
kuhar added inline comments.Apr 2 2020, 7:54 PM
llvm/include/llvm/IR/CFGDiff.h
199 ↗(On Diff #254638)

You can use two helper functions taking integral_constant<bool, true/false> tags -- should be less verbose

llvm/include/llvm/Support/GenericDomTreeConstruction.h
89

nit: Why pointer instead of keeping a reference?

202

nit: use using instead of typefef.

203

nit: can we use std::conditional_t<...> in c++14 to get rid of typename and ::type? Or does it make some buildbots unhappy?

204

nit: I don't find this name very informative. Maybe something like DirectedBB?

208

std::pair<const GraphDiffT *, NodePtr> GDNodePair(GDTmp, BB)

210

Won't children infer the template parameter based on the passes value? I don't get why the template argument type is a pair where the second argument is a directed nodeptr, but the runtime value is always a plain nodeptr. Couldn't the second pair type also be directed for GDNodePair?

211

Could this new code be a hoisted into a helper function?

1121

Could you add a comment next to the nullptr with the argument name?

1519

Does this care about the direction (pre- or post-) at all, or does it need some CFG view? Why is pre-view incorrect?

kuhar added inline comments.Apr 2 2020, 7:58 PM
llvm/include/llvm/Support/GenericDomTreeConstruction.h
211

Or alternatively, could the old ChildrenGetter be implemented with these 5 magic lines?

asbirlea updated this revision to Diff 254943.Apr 3 2020, 4:33 PM
asbirlea marked 16 inline comments as done.

Address comments.

llvm/include/llvm/Support/GenericDomTreeConstruction.h
210

The directed nodeptr is the equivalent of a bool inside GraphDiff translating to what kind of children do you want; the second pair argument bears no info of that kind, only the actual NodePtr for which we desire the children.

211

I think it looks much cleaner after the latest update, let me know what you think.

1519

The purpose of this method was to update the DT assuming an additional list of updates (the argument given). In practice, the BUI argument is ignored in the CalculateFromScratch call below. Hence we need the additional changes to support this kind of updates inside CalculateFromScratch and the other methods.

kuhar added a comment.Apr 5 2020, 7:28 PM

Address comments.

Thanks for the changes and explanations. It think with a few more tweaks this will be a good refactoring step towards phasing out BUI.

llvm/include/llvm/IR/CFGDiff.h
198 ↗(On Diff #254943)

What benefit does an anonymous namespace in a header have over a named one, e.g., detail/impl? Doesn't it make it more difficult to deduplicate symbols across different translation units? Not sure what the best practices are in C++ now in this area.

llvm/include/llvm/Support/GenericDomTreeConstruction.h
339

This pattern repeats a few times. Could it be pushed into a helper function to get something like this?

return !empty(children<GraphDiffNodePair>(GetGraphDiffNodePair(BUI, N)));
kuhar added inline comments.Apr 5 2020, 7:31 PM
dblaikie added inline comments.Apr 5 2020, 11:02 PM
llvm/include/llvm/IR/CFGDiff.h
198 ↗(On Diff #254943)

Oh, yeah, sorry I didn't spot that - yeah, anonymous namespaces in header files are a no-go. This should be a detail (detail outnumbers impl 361 to 41 in the LLVM project) namespace within the llvm namespace as suggested.

asbirlea marked 4 inline comments as done.Apr 7 2020, 4:28 PM
asbirlea added inline comments.
llvm/include/llvm/Support/GenericDomTreeConstruction.h
339

My dilemma here is how to not allocate a GraphDiffT instance. There are 4 cases where the pattern is inside a static method and once in a class method. I used a stack var for the 4 cases and a unique_ptr for the class method.

Sure, I can do:

static auto getGDNPair(BUI, EmptyGD, N) {
return std::make_pair<>(BUI ? BUI->PreViewCFG : EmptyGD, N);
}
{
...
GraphDiffT EmptyGD;
return !empty(children<GraphDiffNodePair>(getGDNPair(BUI, &EmptyGD, N)));
}

But it felt like moving one line at the callsite to a oneline helper is not making things much better.

I was hoping for something more along the lines of:

return !empty(children<GraphDiffNodePair>({getGD(BUI), N});

Or, ensure BUI is always non-null, and simplify to:

return !empty(children<GraphDiffNodePair>({BUI->PreViewCFG, N});

Thoughts?

asbirlea updated this revision to Diff 255852.Apr 7 2020, 4:29 PM
asbirlea marked an inline comment as done.

Name anonymous namespace.

kuhar accepted this revision.Apr 8 2020, 7:41 PM

Looks good to me overall. I don't want to block it over the cosmetic issues like allocating the empty GD object.

llvm/include/llvm/Support/GenericDomTreeConstruction.h
339

Does allocating a GD have a big cost?

I think you could get away with returning a temporary GD object that would die as soon as the expression evaluation ends -- should be no different than placing it on the stack just before the function call.

Overall, I still don't understand why you try to avoid creating a wrapper function / struct that returns children, and call childredn<...>(...) directly. Either way, I being able to write:

return !empty(children<GraphDiffNodePair>({getGD(BUI), N});

or

return !empty(getChildren<GraphDiffNodePair>(BUI, N));

would definitely be concise and readable enough IMO.

This revision is now accepted and ready to land.Apr 8 2020, 7:41 PM
asbirlea marked 2 inline comments as done.Apr 9 2020, 5:37 PM

Thank you for all the comments.

llvm/include/llvm/Support/GenericDomTreeConstruction.h
339

Does allocating a GD have a big cost?

AFAIU, dynamically allocating the object does, vs having a stack variable.

I'll do a follow-up cleanup attempt.

This revision was automatically updated to reflect the committed changes.

It looks like this broke the windows lldb buildbot:

lab.llvm.org:8011/builders/lldb-x64-windows-ninja/builds/15542

It already had a test failure, so you probably didn’t get the email.

@asbirlea This has broken the MLIR build. Could you please build with -DLLVM_ENABLE_PROJECTS=mlir?

FAILED: tools/mlir/lib/Analysis/CMakeFiles/MLIRAnalysis.dir/Dominance.cpp.o 
/usr/lib64/ccache/clang++  -DGTEST_HAS_RTTI=0 -DMLIR_CUDA_CONVERSIONS_ENABLED=1 -D_DEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Itools/mlir/lib/Analysis -I/home/uday/llvm-project/mlir/lib/Analysis -I/usr/include/libxml2 -Iinclude -I/home/uday/llvm-project/llvm/include -I/home/uday/llvm-project/mlir/include -Itools/mlir/include -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fdiagnostics-color -ffunction-sections -fdata-sections -O2     -fno-exceptions -fno-rtti -UNDEBUG -std=c++14 -MD -MT tools/mlir/lib/Analysis/CMakeFiles/MLIRAnalysis.dir/Dominance.cpp.o -MF tools/mlir/lib/Analysis/CMakeFiles/MLIRAnalysis.dir/Dominance.cpp.o.d -o tools/mlir/lib/Analysis/CMakeFiles/MLIRAnalysis.dir/Dominance.cpp.o -c /home/uday/llvm-project/mlir/lib/Analysis/Dominance.cpp
In file included from /home/uday/llvm-project/mlir/lib/Analysis/Dominance.cpp:14:
In file included from /home/uday/llvm-project/mlir/include/mlir/Analysis/Dominance.h:12:
In file included from /home/uday/llvm-project/mlir/include/mlir/IR/RegionGraphTraits.h:18:
In file included from /home/uday/llvm-project/mlir/include/mlir/IR/Region.h:16:
In file included from /home/uday/llvm-project/mlir/include/mlir/IR/Block.h:16:
In file included from /home/uday/llvm-project/mlir/include/mlir/IR/BlockSupport.h:16:
In file included from /home/uday/llvm-project/mlir/include/mlir/IR/Value.h:16:
In file included from /home/uday/llvm-project/mlir/include/mlir/IR/Types.h:12:
In file included from /home/uday/llvm-project/mlir/include/mlir/IR/TypeSupport.h:17:
In file included from /home/uday/llvm-project/mlir/include/mlir/IR/StorageUniquerSupport.h:17:
In file included from /home/uday/llvm-project/mlir/include/mlir/Support/STLExtras.h:18:
In file included from /home/uday/llvm-project/llvm/include/llvm/ADT/STLExtras.h:20:
/home/uday/llvm-project/llvm/include/llvm/ADT/iterator.h:366:17: error: indirection requires pointer operand ('const mlir::PredecessorIterator' invalid)
    NR.second = *this->I;
mehdi_amini added inline comments.
llvm/include/llvm/Support/GenericDomTree.h
31

This looks like a layering violation to me here: I wouldn't expect a generic utility in Support to introduce a dependency on IR.

I speculatively reverted in https://github.com/llvm/llvm-project/commit/57d2d48399b63c0ef1dda490fdaf28efbb80c2c0 while we can figure it out.

@asbirlea This has broken the MLIR build. Could you please build with -DLLVM_ENABLE_PROJECTS=mlir?

If you use arcanist, you get pre-merge testing on Phabricator :)

dblaikie added inline comments.Apr 10 2020, 12:15 AM
llvm/include/llvm/Support/GenericDomTree.h
31

Yep - agreed on the layering violation. Thanks for the catch/revert!

I /think/ CFGDiff.h is independent of IR & shouldn't include IR/CFG.h and then can move into Support. Ah, yeah, when I did 30804d0a3fb23325c4b455108e59d00213b83891 & related work, that's what made it independent of IR.

Moved it over to Support in a838aadae3f20b9644e2ad553d3d558a91fd8fd7

mehdi_amini added inline comments.Apr 10 2020, 12:39 AM
llvm/include/llvm/Support/GenericDomTree.h
31
mehdi_amini added inline comments.Apr 10 2020, 12:47 AM
llvm/include/llvm/Support/GenericDomTree.h
31

Actually reverted again as it breaks the MLIR build in a different way, I won't be able to look into this in more details right now.

(I ran ninja check but I forgot that check is not check-all...)

nikic added a subscriber: nikic.Apr 10 2020, 1:36 AM

This change causes a 2.5% compile-time regression across the board, see http://llvm-compile-time-tracker.com/compare.php?from=37bcf2df01cfa47e4509a5d225a23e2ca95005e6&to=a90374988e4eb8c50d91e11f4e61cdbd5debb235&stat=instructions. Can you please try to find a cheaper way to approach this issue?

asbirlea reopened this revision.Apr 30 2020, 5:28 PM

I'm working to update this revision to address the compile-time regression.
Re-opening so I can rebase and add misc fixes and will mark as changes planned until I address the compile-time issue.

This revision is now accepted and ready to land.Apr 30 2020, 5:28 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 30 2020, 5:28 PM
asbirlea planned changes to this revision.Apr 30 2020, 5:30 PM
asbirlea updated this revision to Diff 276890.Jul 9 2020, 6:23 PM

Updated to include the part of the patch that's moving the Updates to a CFGDiff object. Splitting off from the clean-up work merging the two branches when BUI is null.
This patch does not exhibit the compile-time regression which caused it to be reverted previously.

This revision is now accepted and ready to land.Jul 9 2020, 6:23 PM
asbirlea edited the summary of this revision. (Show Details)Jul 9 2020, 6:26 PM
asbirlea added a subscriber: nhaehnle.
asbirlea updated this revision to Diff 276891.Jul 9 2020, 6:29 PM

Nit: re-add consts

Thank you for the testing. Could you help with with instructions on how to run the tracker myself?
My local testing showed a negligible regression for mafft and a negligible improvement on other benchmarks, so it looked like noise on average.

nikic added a comment.Jul 10 2020, 1:03 PM

Thank you for the testing. Could you help with with instructions on how to run the tracker myself?
My local testing showed a negligible regression for mafft and a negligible improvement on other benchmarks, so it looked like noise on average.

The tracker just compiles test-suite under perf stat using the cached cmake configs. If I pick out the file with the largest regression (mafft constants.c in ReleaseThinLTO config with 18% regression) I can reproduce this locally as follows:

perf stat /home/nikic/llvm-project/build/bin/clang -DNDEBUG  -O3 -fomit-frame-pointer -flto=thin -DNDEBUG   -w -Werror=date-time -DLLVM -MD -MT MultiSource/Benchmarks/mafft/CMakeFiles/pairlocalalign.dir/constants.c.o -MF MultiSource/Benchmarks/mafft/CMakeFiles/pairlocalalign.dir/constants.c.o.d -o MultiSource/Benchmarks/mafft/CMakeFiles/pairlocalalign.dir/constants.c.o   -c ../MultiSource/Benchmarks/mafft/constants.c

This gives me 3.5M instructions before and 4.2M instructions after. Those particular numbers are for an assertion-enabled build (the numbers on the website are without assertions.)

asbirlea updated this revision to Diff 280326.Jul 23 2020, 10:12 PM

The increase in number of instructions and cycles was caused by reversing the order in which the updates are applied by GraphDiff.
I'll look into re-adding some of the original cleanups in a follow-up patch.

The increase in number of instructions and cycles was caused by reversing the order in which the updates are applied by GraphDiff.
I'll look into re-adding some of the original cleanups in a follow-up patch.

Fascinating - and you were able to reproduce this by just reversing the order in the old code (using ChildrenGetter/building a vector and returning it - not using all the iterator adapters, etc) and that the performance problem was no longer present in the new code if you changed its order to match?

Huh. (again, sorry this has been such a slog :/)

Yes. The difference in perf in instructions and cycles count is due to reversing the update order.
I'm still not seeing meaningful changes in wall-time even with the update order reversed, but reliable wall-time testing is hard.

I do have changes that

  • drop the GraphTraits usage entirely in favor of the getChildren() method in GraphDiff
  • refactor the hashmaps in GraphDiff

but I plan to send those separately.

nikic added a comment.Jul 24 2020, 2:16 PM

Numbers for the newest version: https://llvm-compile-time-tracker.com/compare.php?from=183342c0a9850e60dd7a004b651c83dfb3a7d25e&to=eabcf534fe8760d14c95b40f07c61450c819d643&stat=instructions

This is now a geomean 0.2% regressions, and I don't see any large outliers for individual files either (everything is below 2%). So this should be fine now, at least where compile-time impact is concerned :)

Great, thank you for confirming :-)

This revision was automatically updated to reflect the committed changes.