This is an archive of the discontinued LLVM Phabricator instance.

[JumpThreading] Preservation of DT and LVI across the pass
ClosedPublic

Authored by brzycki on Nov 16 2017, 12:38 PM.

Details

Summary

See D37528 for a previous (non-deferred) version of this patch and its description.

Preserves dominance in a deferred manner using a new class DeferredDominance. This reduces the performance impact of updating the DominatorTree at every edge insertion and deletion. A user may call DDT->flush() within JumpThreading for an up-to-date DT. This patch currently has one flush() at the end of runImpl() to ensure DT is preserved across the pass.
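
The deferred scheme described above can be pictured with a small standalone sketch. This is not the DeferredDominance class from the patch: the name DeferredEdgeUpdates, the integer block ids, and the plain edge set standing in for the DominatorTree are all assumptions made for illustration. It only shows the idea of queuing edge insertions and deletions and applying them in one batch on flush().

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Toy stand-in for CFG edges; blocks are plain integers here.
using Edge = std::pair<int, int>;

enum class UpdateKind { Insert, Delete };

struct PendingUpdate {
  UpdateKind Kind;
  Edge E;
};

// Hypothetical illustration only: queues edge updates and applies them
// in one batch on flush(), instead of updating on every change.
class DeferredEdgeUpdates {
public:
  explicit DeferredEdgeUpdates(std::set<Edge> &Edges) : Edges(Edges) {}

  void insertEdge(int From, int To) {
    Pending.push_back({UpdateKind::Insert, {From, To}});
  }
  void deleteEdge(int From, int To) {
    Pending.push_back({UpdateKind::Delete, {From, To}});
  }

  // True while there are queued updates that have not been applied yet.
  bool pending() const { return !Pending.empty(); }

  // Apply all queued updates in order and return the up-to-date edge set.
  std::set<Edge> &flush() {
    for (const PendingUpdate &U : Pending) {
      if (U.Kind == UpdateKind::Insert)
        Edges.insert(U.E);
      else
        Edges.erase(U.E);
    }
    Pending.clear();
    return Edges;
  }

private:
  std::set<Edge> &Edges;
  std::vector<PendingUpdate> Pending;
};
```

In this sketch the underlying edge set stays stale until flush() is called, which mirrors why the patch needs a flush() at the end of runImpl() before the tree is reported as preserved.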

LVI is also preserved to help subsequent passes such as CorrelatedValuePropagation. LVI is simpler to maintain and is done immediately (not deferred). The code to perform the preservation was minimally altered and was simply marked as preserved for the PassManager to be informed.

This extends the analysis available to JumpThreading for future enhancements. One example is loop boundary threading.

Event Timeline

brzycki created this revision.Nov 16 2017, 12:38 PM
brzycki updated this revision to Diff 125642.Dec 5 2017, 3:51 PM

This version passes make check and test-suite runs. It adds about 0.7-1.5% overhead to JumpThreading depending on the test case (and the run taken).

brzycki updated this revision to Diff 125749.Dec 6 2017, 9:19 AM
brzycki retitled this revision from [JumpThreading] (WIP) Deferred preservation of DT and LVI across the pass. to [JumpThreading] Deferred preservation of DT and LVI across the pass..
brzycki edited the summary of this revision.

Comment and whitespace cleanup. NFC.

brzycki updated this revision to Diff 125778.Dec 6 2017, 11:49 AM

Ran lvi-tristate.ll through opt -metarenamer for consistency with the rest of LLVM. Also did another round of whitespace and comment fixes.

sebpop accepted this revision.Dec 6 2017, 7:54 PM

The patch looks good to me.
Please address the few inline comments and commit.

Thanks!

llvm/lib/Transforms/Scalar/JumpThreading.cpp
2037

Please add braces around the else clause.

llvm/lib/Transforms/Utils/Local.cpp
1868–1872

Please move the ++I back in the body of the for loop and remove --I.

This revision is now accepted and ready to land.Dec 6 2017, 7:54 PM
kuhar edited edge metadata.Dec 7 2017, 9:59 AM

Great work! I don't have time to check the jump-threading, but the dominance part looks solid. I only found a couple of issues.

llvm/include/llvm/IR/DeferredDominance.h
53 ↗(On Diff #125778)

Can the DT ever be null within the class? If not, maybe it would make sense to keep a reference instead?

102 ↗(On Diff #125778)

Maybe it'd make more sense to return a reference?

127 ↗(On Diff #125778)

Shouldn't this be guarded by a macro? LLVM_DUMP_METHOD
Can dump() be marked as const?

llvm/lib/IR/Dominators.cpp
402

This doesn't seem to modify any internal state, const should work here.

403

I think this should be llvm::dbgs

llvm/lib/Transforms/Scalar/JumpThreading.cpp
2477

I think we can reserve (2 * size(successors(SplitBB)) + 3) here to have a single allocation
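
To illustrate the single-allocation suggestion, here is a minimal standalone sketch using std::vector. The Update alias and the function name are hypothetical, and the 2 * N + 3 bound is simply taken from the comment above. The point is only that one reserve() call up front means the vector does not reallocate while it is filled to that size.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Update = std::pair<int, int>; // hypothetical (From, To) edge update

// Builds 2 * NumSuccs + 3 updates after a single up-front reservation.
// Returns true if the capacity never changed while the vector was filled,
// i.e. no reallocation happened after reserve().
bool fillWithoutReallocation(std::size_t NumSuccs) {
  std::vector<Update> Updates;
  Updates.reserve(2 * NumSuccs + 3);
  const std::size_t Cap = Updates.capacity();
  for (std::size_t I = 0; I < 2 * NumSuccs + 3; ++I)
    Updates.push_back({0, static_cast<int>(I)});
  return Updates.capacity() == Cap;
}
```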

brzycki updated this revision to Diff 125992.Dec 7 2017, 10:35 AM

Updated diff with @sebpop 's comments and rebased to tip.

brzycki updated this revision to Diff 126031.Dec 7 2017, 1:36 PM
brzycki marked 7 inline comments as done.

Added updates from @kuhar 's review (Thanks Kuba!).

brzycki updated this revision to Diff 126034.Dec 7 2017, 1:46 PM
brzycki marked an inline comment as done.

Comment cleanup to better match the code. NFC.

kuhar accepted this revision.Dec 7 2017, 1:49 PM

LGTM

@dberlin Any objections to me committing this code?

llvm/include/llvm/IR/DeferredDominance.h
53 ↗(On Diff #125778)

It cannot be nullptr. I have changed it to a ref.

127 ↗(On Diff #125778)

I wasn't aware of these wrapper macros for dump, thanks for pointing them out. Added.

llvm/lib/IR/Dominators.cpp
402

Agreed.

llvm/lib/Transforms/Scalar/JumpThreading.cpp
2477

I did this and also added others where the size was trivial to calculate.

kuhar added inline comments.Dec 8 2017, 1:17 PM
llvm/lib/Transforms/Scalar/JumpThreading.cpp
288

One remark: why new here instead of just putting it on the stack? If you need to pass the ownership around, who deallocates it?
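
The ownership question can be shown with a small standalone sketch. Tracker is a hypothetical type that merely counts live instances; it is not part of the patch. A stack object is destroyed automatically at scope exit, while an object created with new stays alive until some owner explicitly deletes it.

```cpp
#include <cassert>

// Hypothetical helper type that counts live instances, standing in for
// any wrapper object a pass might create.
struct Tracker {
  static int Live;
  Tracker() { ++Live; }
  ~Tracker() { --Live; }
  Tracker(const Tracker &) = delete;
  Tracker &operator=(const Tracker &) = delete;
};
int Tracker::Live = 0;

// Stack allocation: the object is destroyed automatically on scope exit.
int liveAfterStackUse() {
  {
    Tracker T;
    (void)T;
  }
  return Tracker::Live;
}

// Heap allocation: the object stays alive until someone calls delete.
int liveBeforeManualDelete() {
  Tracker *T = new Tracker();
  int Count = Tracker::Live; // counts the heap object, which is still alive
  delete T;                  // the owner has to remember this step
  return Count;
}
```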

brzycki updated this revision to Diff 126468.Dec 11 2017, 4:21 PM
brzycki marked an inline comment as done.

Removed usage of "new" and rebased to tip.

kuhar added inline comments.Dec 12 2017, 7:11 AM
llvm/lib/Transforms/Scalar/JumpThreading.cpp
288

Nit: DeferredDominance DDT = DeferredDominance(*DT) can be just DeferredDominance DDT(*DT);

brzycki updated this revision to Diff 126562.Dec 12 2017, 9:20 AM
brzycki marked an inline comment as done.

Cleaned up DDT init nit.

brzycki updated this revision to Diff 126608.Dec 12 2017, 12:45 PM

Fixed incorrectly-sized Update.reserve() call and resolved merge conflict to rebase onto tip.

brzycki retitled this revision from [JumpThreading] Deferred preservation of DT and LVI across the pass. to [JumpThreading] Preservation of DT and LVI across the pass.Dec 13 2017, 12:41 PM
brzycki edited the summary of this revision.
This revision was automatically updated to reflect the committed changes.
brzycki reopened this revision.Dec 14 2017, 2:27 PM

This patch caused a build break when attempting to use -fmodules due to a cyclic header dependency chain. I've refactored the patch to no longer use DeferredDominance.h (the code now lives as a class in Dominators.h). I also moved one of the methods into Dominators.cpp that needed Constants.h and Instructions.h. @sebpop and @kuhar could you please review again?

llvm/lib/Transforms/Scalar/JumpThreading.cpp
288

Ah, I forgot I had that. This is stale code from an older attempt at working the class into JumpThreading. I'll fix it in the next diff. Thanks for catching this.

This revision is now accepted and ready to land.Dec 14 2017, 2:27 PM
brzycki updated this revision to Diff 127027.Dec 14 2017, 2:28 PM

Refactored patch to prevent -fmodules circular dependency errors.

kuhar added a comment.Dec 14 2017, 2:32 PM

Which files and includes formed the cycle? There are a lot of files that require Dominators.h, and I'm not convinced adding a new class to it is a good idea, given that it currently has just one user.

Which files and includes formed the cycle? There are a lot of files that require Dominators.h, and I'm not convinced adding a new class to it is a good idea, given that it currently has just one user.

Hi @kuhar , this started with a BuildBot failure. After investigating further I recreated the problem locally and found the following error message:

[291/1841] Building CXX object lib/IR/CMakeFiles/LLVMCore.dir/IRBuilder.cpp.o
FAILED: lib/IR/CMakeFiles/LLVMCore.dir/IRBuilder.cpp.o
/work/brzycki/tip1/install/bin/clang++  -DGTEST_HAS_RTTI=0 -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/IR -I/work/brzycki/stage1/llvm-project/llvm/lib/IR -I/usr/include/libxml2 -Iinclude -I/work/brzycki/stage1/llvm-project/llvm/include -stdlib=libc++ -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -std=c++11 -fmodules -fmodules-cache-path=/work/brzycki/stage1/build/module.cache -Xclang -fmodules-local-submodule-visibility -Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wcovered-switch-default -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fcolor-diagnostics -ffunction-sections -fdata-sections -O3 -DNDEBUG    -fno-exceptions -fno-rtti -MD -MT lib/IR/CMakeFiles/LLVMCore.dir/IRBuilder.cpp.o -MF lib/IR/CMakeFiles/LLVMCore.dir/IRBuilder.cpp.o.d -o lib/IR/CMakeFiles/LLVMCore.dir/IRBuilder.cpp.o -c /work/brzycki/stage1/llvm-project/llvm/lib/IR/IRBuilder.cpp
While building module 'LLVM_intrinsic_gen' imported from /work/brzycki/stage1/llvm-project/llvm/lib/IR/IRBuilder.cpp:15:
While building module 'LLVM_IR' imported from /work/brzycki/stage1/llvm-project/llvm/include/llvm/IR/Argument.h:20:
In file included from <module-includes>:30:
/work/brzycki/stage1/llvm-project/llvm/include/llvm/IR/DeferredDominance.h:20:10: fatal error: cyclic dependency in module 'LLVM_intrinsic_gen': LLVM_intrinsic_gen -> LLVM_IR -> LLVM_intrinsic_gen
#include "llvm/IR/Dominators.h"
         ^
While building module 'LLVM_intrinsic_gen' imported from /work/brzycki/stage1/llvm-project/llvm/lib/IR/IRBuilder.cpp:15:
In file included from <module-includes>:1:
/work/brzycki/stage1/llvm-project/llvm/include/llvm/IR/Argument.h:20:10: fatal error: could not build module 'LLVM_IR'
#include "llvm/IR/Value.h"
 ~~~~~~~~^~~~~~~~~~~~~~~~~
/work/brzycki/stage1/llvm-project/llvm/lib/IR/IRBuilder.cpp:15:10: fatal error: could not build module 'LLVM_intrinsic_gen'
#include "llvm/IR/IRBuilder.h"
 ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
3 errors generated.

This is the first of many showing the circular problem. What's worse is that if I move only the offending method (delBB) into Dominators.cpp I start to get another set of cyclic errors due to the relationship between DeferredDominance.h and Dominators.h. I tried to keep DeferredDominance.h but could not find a way that satisfied -fmodules.

brzycki added a comment.EditedDec 14 2017, 3:42 PM

Here is another failure linking it directly to DeferredDominance.h:

[292/1841] Building CXX object lib/IR/CMakeFiles/LLVMCore.dir/BasicBlock.cpp.o
FAILED: lib/IR/CMakeFiles/LLVMCore.dir/BasicBlock.cpp.o
/work/brzycki/tip1/install/bin/clang++  -DGTEST_HAS_RTTI=0 -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/IR -I/work/brzycki/stage1/llvm-project/llvm/lib/IR -I/usr/include/libxml2 -Iinclude -I/work/brzycki/stage1/llvm-project/llvm/include -stdlib=libc++ -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -std=c++11 -fmodules -fmodules-cache-path=/work/brzycki/stage1/build/module.cache -Xclang -fmodules-local-submodule-visibility -Wall -W -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wcovered-switch-default -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fcolor-diagnostics -ffunction-sections -fdata-sections -O3 -DNDEBUG    -fno-exceptions -fno-rtti -MD -MT lib/IR/CMakeFiles/LLVMCore.dir/BasicBlock.cpp.o -MF lib/IR/CMakeFiles/LLVMCore.dir/BasicBlock.cpp.o.d -o lib/IR/CMakeFiles/LLVMCore.dir/BasicBlock.cpp.o -c /work/brzycki/stage1/llvm-project/llvm/lib/IR/BasicBlock.cpp
While building module 'LLVM_IR' imported from /work/brzycki/stage1/llvm-project/llvm/lib/IR/BasicBlock.cpp:14:
While building module 'LLVM_intrinsic_gen' imported from /work/brzycki/stage1/llvm-project/llvm/include/llvm/IR/DeferredDominance.h:20:
In file included from <module-includes>:1:
/work/brzycki/stage1/llvm-project/llvm/include/llvm/IR/Argument.h:20:10: fatal error: cyclic dependency in module 'LLVM_IR': LLVM_IR -> LLVM_intrinsic_gen -> LLVM_IR
#include "llvm/IR/Value.h"
         ^
While building module 'LLVM_IR' imported from /work/brzycki/stage1/llvm-project/llvm/lib/IR/BasicBlock.cpp:14:
In file included from <module-includes>:30:
/work/brzycki/stage1/llvm-project/llvm/include/llvm/IR/DeferredDominance.h:20:10: fatal error: could not build module 'LLVM_intrinsic_gen'
#include "llvm/IR/Dominators.h"
 ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~
/work/brzycki/stage1/llvm-project/llvm/lib/IR/BasicBlock.cpp:14:10: fatal error: could not build module 'LLVM_IR'
#include "llvm/IR/BasicBlock.h"
 ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~
3 errors generated.

Which files and includes formed the cycle? There are a lot of files that require Dominators.h, and I'm not convinced adding a new class to it is a good idea, given that it currently has just one user.

@kuhar would it help if I put all the larger functions in Dominators.cpp to prevent aggressive inlining? That would make it a relatively small class.

kuhar added a comment.Dec 15 2017, 8:29 AM

Which files and includes formed the cycle? There are a lot of files that require Dominators.h, and I'm not convinced adding a new class to it is a good idea, given that it currently has just one user.

@kuhar would it help if I put all the larger functions in Dominators.cpp to prevent aggressive inlining? That would make it a relatively small class.

That sounds better.

I wonder what caused the original cycle. I'm not entirely sure how modules are built, but have you tried getting rid of the include in Local.h? If that's not enough to break the cycle, I don't have any idea better than yours.

brzycki added a subscriber: rsmith.Dec 18 2017, 8:42 AM

I wonder what caused the original cycle. I'm not entirely sure how modules are built, but have you tried getting rid of the include in Local.h? If that's not enough to break the cycle, I don't have any idea better than yours.

It's a bit of a mystery to me how -fmodules actually works. I am getting cyclic errors when compiling a file that does not include either of Dominators.h or DeferredDominance.h. I have emailed @rsmith requesting his expertise if he can help us understand this.

I'm currently working on a version of the patch with no functions inlined in the header. I should have the updated diff available for review shortly.

brzycki updated this revision to Diff 127381.Dec 18 2017, 9:27 AM

Update patch with a minimal class definition of DeferredDominance in Dominators.h. Also removed an unnecessary #include of Dominators.h in JumpThreading.h

sebpop accepted this revision.Dec 18 2017, 9:54 AM

LGTM.
-fmodules can be fought in another patch.

kuhar accepted this revision.Dec 18 2017, 11:26 AM

LGTM

brzycki updated this revision to Diff 128529.Jan 3 2018, 9:07 AM

Getting patch ready to commit again. Rebased to llvm tip, ran check, check-all, and test-suite on x86_64. Also passed building with LLVM_ENABLE_MODULES=1.

sebpop accepted this revision.Jan 3 2018, 1:06 PM

Looks good to me. Please commit. Thanks Brian!

This revision was automatically updated to reflect the committed changes.
rnk added a subscriber: rnk.Jan 4 2018, 3:22 PM

I'm seeing crashes while building Chromium that I think are caused by this change. The stack trace follows. I will revert and prepare a standalone reproducer.
https://ci.chromium.org/buildbot/chromium.clang/ToTLinux/1285
https://logs.chromium.org/v/?s=chromium%2Fbb%2Fchromium.clang%2FToTLinux%2F1285%2F%2B%2Frecipes%2Fsteps%2Fcompile%2F0%2Fstdout

FAILED: obj/third_party/flatbuffers/compiler_files/idl_parser.o 
../../third_party/llvm-build/Release+Asserts/bin/clang++ -MMD -MF obj/third_party/flatbuffers/compiler_files/idl_parser.o.d -DV8_DEPRECATION_WARNINGS -DUSE_UDEV -DUSE_AURA=1 -DUSE_GLIB=1 -DUSE_NSS_CERTS=1 -DUSE_X11=1 -DFULL_SAFE_BROWSING -DSAFE_BROWSING_CSD -DSAFE_BROWSING_DB_LOCAL -DCHROMIUM_BUILD -DFIELDTRIAL_TESTING_ENABLED -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -DCR_CLANG_REVISION=\"321826\" -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -DCOMPONENT_BUILD -DNDEBUG -DNVALGRIND -DDYNAMIC_ANNOTATIONS_ENABLED=0 -I../../third_party/flatbuffers/src/grpc -I../.. -Igen -I../../third_party/flatbuffers/src/include -fno-strict-aliasing --param=ssp-buffer-size=4 -fstack-protector -Wno-builtin-macro-redefined -D__DATE__= -D__TIME__= -D__TIMESTAMP__= -funwind-tables -fPIC -pipe -B../../third_party/binutils/Linux_x64/Release/bin -pthread -fcolor-diagnostics -Xclang -mllvm -Xclang -instcombine-lower-dbg-declare=0 -no-canonical-prefixes -m64 -march=x86-64 -Wall -Werror -Wextra -Wno-missing-field-initializers -Wno-unused-parameter -Wno-c++11-narrowing -Wno-covered-switch-default -Wno-unneeded-internal-declaration -Wno-inconsistent-missing-override -Wno-undefined-var-template -Wno-nonportable-include-path -Wno-address-of-packed-member -Wno-unused-lambda-capture -Wno-user-defined-warnings -Wno-enum-compare-switch -Wno-tautological-unsigned-zero-compare -Wno-null-pointer-arithmetic -Wno-tautological-constant-compare -Wtautological-constant-out-of-range-compare -O2 -fno-ident -fdata-sections -ffunction-sections -fno-omit-frame-pointer -g2 -ggnu-pubnames -fvisibility=hidden -Wheader-hygiene -Wstring-conversion -Wtautological-overlap-compare -Wno-exit-time-destructors -std=gnu++14 -fno-exceptions -fno-rtti -nostdinc++ -isystem../../buildtools/third_party/libc++/trunk/include -isystem../../buildtools/third_party/libc++abi/trunk/include --sysroot=../../build/linux/debian_stretch_amd64-sysroot -fvisibility-inlines-hidden -c 
../../third_party/flatbuffers/src/src/idl_parser.cpp -o obj/third_party/flatbuffers/compiler_files/idl_parser.o
clang++: /b/c/b/ToTLinux/src/third_party/llvm/include/llvm/Support/GenericDomTreeConstruction.h:1107: static void llvm::DomTreeBuilder::SemiNCAInfo<DomTreeT>::EraseNode(DomTreeT&, llvm::DomTreeBuilder::SemiNCAInfo<DomTreeT>::TreeNodePtr) [with DomTreeT = llvm::DominatorTreeBase<llvm::BasicBlock, false>; llvm::DomTreeBuilder::SemiNCAInfo<DomTreeT>::TreeNodePtr = llvm::DomTreeNodeBase<llvm::BasicBlock>*; typename DomTreeT::NodeType = llvm::BasicBlock]: Assertion `TN->getNumChildren() == 0 && "Not a tree leaf"' failed.
#0 0x000000000222e74a llvm::sys::PrintStackTrace(llvm::raw_ostream&) (../../third_party/llvm-build/Release+Asserts/bin/clang+++0x222e74a)
#1 0x000000000222c82e llvm::sys::RunSignalHandlers() (../../third_party/llvm-build/Release+Asserts/bin/clang+++0x222c82e)
#2 0x000000000222c992 SignalHandler(int) (../../third_party/llvm-build/Release+Asserts/bin/clang+++0x222c992)
#3 0x00007ff2fe151330 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x10330)
#4 0x00007ff2fcd40c37 gsignal /build/eglibc-SvCtMH/eglibc-2.19/signal/../nptl/sysdeps/unix/sysv/linux/raise.c:56:0
#5 0x00007ff2fcd44028 abort /build/eglibc-SvCtMH/eglibc-2.19/stdlib/abort.c:91:0
#6 0x00007ff2fcd39bf6 __assert_fail_base /build/eglibc-SvCtMH/eglibc-2.19/assert/assert.c:92:0
#7 0x00007ff2fcd39ca2 (/lib/x86_64-linux-gnu/libc.so.6+0x2fca2)
#8 0x0000000001d5b3f6 llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::DeleteUnreachable(llvm::DominatorTreeBase<llvm::BasicBlock, false>&, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::BatchUpdateInfo*, llvm::DomTreeNodeBase<llvm::BasicBlock>*) (../../third_party/llvm-build/Release+Asserts/bin/clang+++0x1d5b3f6)
#9 0x0000000001d5b786 llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::DeleteEdge(llvm::DominatorTreeBase<llvm::BasicBlock, false>&, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::BatchUpdateInfo*, llvm::BasicBlock*, llvm::BasicBlock*) (../../third_party/llvm-build/Release+Asserts/bin/clang+++0x1d5b786)
#10 0x0000000001d640aa llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::ApplyNextUpdate(llvm::DominatorTreeBase<llvm::BasicBlock, false>&, llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::BatchUpdateInfo&) (../../third_party/llvm-build/Release+Asserts/bin/clang+++0x1d640aa)
#11 0x0000000001d66e09 llvm::DomTreeBuilder::SemiNCAInfo<llvm::DominatorTreeBase<llvm::BasicBlock, false> >::ApplyUpdates(llvm::DominatorTreeBase<llvm::BasicBlock, false>&, llvm::ArrayRef<llvm::DomTreeBuilder::Update<llvm::BasicBlock*> >) (../../third_party/llvm-build/Release+Asserts/bin/clang+++0x1d66e09)
#12 0x0000000001d67020 llvm::DeferredDominance::flush() (../../third_party/llvm-build/Release+Asserts/bin/clang+++0x1d67020)
#13 0x0000000002086930 llvm::JumpThreadingPass::runImpl(llvm::Function&, llvm::TargetLibraryInfo*, llvm::LazyValueInfo*, llvm::AAResults*, llvm::DeferredDominance*, bool, std::unique_ptr<llvm::BlockFrequencyInfo, std::default_delete<llvm::BlockFrequencyInfo> >, std::unique_ptr<llvm::BranchProbabilityInfo, std::default_delete<llvm::BranchProbabilityInfo> >) (../../third_party/llvm-build/Release+Asserts/bin/clang+++0x2086930)
#14 0x0000000002086f56 (anonymous namespace)::JumpThreading::runOnFunction(llvm::Function&) [clone .part.671] (../../third_party/llvm-build/Release+Asserts/bin/clang+++0x2086f56)
#15 0x0000000001da84b3 llvm::FPPassManager::runOnFunction(llvm::Function&) (../../third_party/llvm-build/Release+Asserts/bin/clang+++0x1da84b3)
#16 0x0000000001875f1f (anonymous namespace)::CGPassManager::runOnModule(llvm::Module&) (../../third_party/llvm-build/Release+Asserts/bin/clang+++0x1875f1f)
#17 0x0000000001da8dff llvm::legacy::PassManagerImpl::run(llvm::Module&) (../../third_party/llvm-build/Release+Asserts/bin/clang+++0x1da8dff)
#18 0x00000000023c4a8b (anonymous namespace)::EmitAssemblyHelper::EmitAssembly(clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) (../../third_party/llvm-build/Release+Asserts/bin/clang+++0x23c4a8b)
#19 0x00000000023c5e07 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) (../../third_party/llvm-build/Release+Asserts/bin/clang+++0x23c5e07)
#20 0x0000000002ad9bab clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (../../third_party/llvm-build/Release+Asserts/bin/clang+++0x2ad9bab)
#21 0x0000000002efd652 clang::ParseAST(clang::Sema&, bool, bool) (../../third_party/llvm-build/Release+Asserts/bin/clang+++0x2efd652)
#22 0x0000000002ad91ef clang::CodeGenAction::ExecuteAction() (../../third_party/llvm-build/Release+Asserts/bin/clang+++0x2ad91ef)
#23 0x000000000277deb6 clang::FrontendAction::Execute() (../../third_party/llvm-build/Release+Asserts/bin/clang+++0x277deb6)
#24 0x000000000275462e clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (../../third_party/llvm-build/Release+Asserts/bin/clang+++0x275462e)
#25 0x000000000280fafb clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (../../third_party/llvm-build/Release+Asserts/bin/clang+++0x280fafb)
#26 0x0000000000bf1f98 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (../../third_party/llvm-build/Release+Asserts/bin/clang+++0xbf1f98)
#27 0x0000000000b836eb main (../../third_party/llvm-build/Release+Asserts/bin/clang+++0xb836eb)
#28 0x00007ff2fcd2bf45 __libc_start_main /build/eglibc-SvCtMH/eglibc-2.19/csu/libc-start.c:321:0
#29 0x0000000000bed219 _start (../../third_party/llvm-build/Release+Asserts/bin/clang+++0xbed219)
rnk added a comment.Jan 4 2018, 3:56 PM

I made a reproducer and am reducing it. I'll upload it some time later today.

rnk added a comment.Jan 4 2018, 5:18 PM

CReduce got this reduction, but I think you can go further:

struct a {
  int b;
};
class c {
public:
  c(bool) : d() {}
  ~c() {}
  bool e() { return d; }
  bool d;
};
class f : a {
  c g();
  c h();
  c i();
};
enum { j };
c f::h() {
  switch (b)
  case j: {
    auto a(i());
    if (a.e())
      return a;
  }
    return 0;
}
c f::i() {
  {
    auto b(g());
    if (b.e())
      return b;
  }
  return 0;
}

Use the command line clang -cc1 idl_parser.cpp -S -emit-llvm -O2 -o t.ll.

brzycki reopened this revision.Jan 5 2018, 8:47 AM

The patch was reverted in r321832 by @rnk for the following reason:

This reverts r321825, it causes crashes in Chromium. Reproducer forthcoming.

This revision is now accepted and ready to land.Jan 5 2018, 8:47 AM
In D40146#967965, @rnk wrote:

CReduce got this reduction, but I think you can go further:

Thank you @rnk for reducing the test case. I don't think you need to go further, I'll convert to llvm bytecode inside gdb and reduce with bugpoint once I verify the crash locally. Once there I'll post the .ll.

This is the reduced llvm IR that triggers the failure:

%struct.hoge = type { i8 }

define void @widget(%struct.hoge* noalias sret %arg) local_unnamed_addr #0 align 2 personality i8* bitcast (i32 (...)* @quux to i8*) {
bb:
  %tmp = load i32, i32* undef, align 4
  %tmp1 = icmp eq i32 %tmp, 0
  br i1 %tmp1, label %bb2, label %bb13

bb2:
  %tmp3 = getelementptr inbounds %struct.hoge, %struct.hoge* %arg, i64 0, i32 0
  %tmp4 = load i8, i8* %tmp3, align 1
  %tmp5 = icmp eq i8 %tmp4, 0
  br i1 %tmp5, label %bb6, label %bb7

bb6:
  store i8 0, i8* %tmp3, align 1
  br label %bb7

bb7:
  %tmp8 = load i8, i8* %tmp3, align 1
  %tmp9 = icmp ne i8 %tmp8, 0
  %tmp10 = select i1 %tmp9, i1 true, i1 false
  br i1 %tmp10, label %bb12, label %bb11

bb11:
  br label %bb12

bb12:
  br i1 %tmp9, label %bb14, label %bb13

bb13:
  unreachable

bb14:
  ret void
}

declare i32 @quux(...)

And compile with opt -S -jump-threading <foo.ll

I'm debugging it now and will add to ddt-crash.ll when I re-submit with a fix.

brzycki updated this revision to Diff 128964.EditedJan 8 2018, 11:46 AM

Fixed assert() caused when compiling Chrome.

Comments are inline describing where the patch was applied.

@kuhar if you have the time I would appreciate your feedback on the fix and if you're still ok with the patch.

llvm/lib/Transforms/Utils/Local.cpp
673

I did not see this fail in the specific case of the Chrome assert(). However, it is entirely possible on this code path for the same issue to occur which is why I added the check.

977

This check is the fix for the assert() discovered by @rnk . The shape of the IR had an existing edge from *I to Succ. Inserting was incorrect since the DT already had a pre-existing edge. It caused the balancing of Insertions/Deletions to be incorrect.
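
The guard described above can be sketched in isolation. This is not the code from Local.cpp: the function name, the integer block ids, and the plain vectors are assumptions for illustration. It shows the idea of queuing an Insert update only when the source block does not already have an edge to the destination, so that insertions and deletions stay balanced.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>; // (From, To), blocks modelled as ints

// Records an Insert update only when From does not already branch to To.
// Returns true if an update was queued.
bool queueInsertIfNew(const std::vector<int> &SuccessorsOfFrom, int From,
                      int To, std::vector<Edge> &QueuedInserts) {
  if (std::find(SuccessorsOfFrom.begin(), SuccessorsOfFrom.end(), To) !=
      SuccessorsOfFrom.end())
    return false; // edge already exists; a second Insert would be unbalanced
  QueuedInserts.push_back({From, To});
  return true;
}
```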

llvm/test/Transforms/JumpThreading/ddt-crash.ll
269

The reduced test case, discovered when compiling Chrome, that crashed LLVM.

kuhar added a comment.Jan 8 2018, 2:29 PM

I'm a little bit worried that the tests depend on jumpthreading and it's not obvious what the sequence of operations on DDT is. It's difficult to come up with CFG for all the edge cases and I think it would be better to have something equivalent to the tests for DT and PDT incremental updates.
I'm OK with reapplying this, but when it causes a couple more crashes I'd strongly consider some more exhaustive testing approach.

llvm/lib/Transforms/Utils/Local.cpp
673

Looks reasonable.

You can use llvm::find here, something like this if (llvm::find(successors(*I), DestBB) == succ_end(*I))

977

Just like above.

llvm/test/Transforms/JumpThreading/ddt-crash.ll
269

Wouldn't it be better to add it in a separate file?

brzycki marked 9 inline comments as done.EditedJan 9 2018, 2:16 PM
brzycki added a subscriber: kuba.

I'm OK with reapplying this, but when it causes a couple more crashes I'd strongly consider some more exhaustive testing approach.

Thanks @kuhar for the review. I agree with you that DDT should have a unittest module and I've started work on it. I have the skeleton in place and am working on the actual tests now. I'm still trying to determine the set of unique CFG shapes/mutations to test for.

brzycki updated this revision to Diff 129330.Jan 10 2018, 1:27 PM

Fixed Chrome build patch with suggestions from @kuhar . Added a first pass at DDT unit tests.

brzycki updated this revision to Diff 129526.Jan 11 2018, 2:26 PM

Fixed comments in unit test.

kuhar accepted this revision.Jan 11 2018, 5:49 PM

Thanks for the tests, they look good to me. Found only some nits.

llvm/unittests/IR/DeferredDominanceTest.cpp
10 ↗(On Diff #129526)

Unused?

57 ↗(On Diff #129526)

While it's unlikely to make a difference, it would be better to use ASSERT(DDT.flush().verify()) instead -- it's able to catch many more errors and explain them. And the old verifier asserts upon failure, which can be surprising.

70 ↗(On Diff #129526)

Nit: s/delete/Delete + missing comma at the end

95 ↗(On Diff #129526)

!isa<UnreachableInst> would be more readable, or ASSERT_NOT(isa<...>(...)), if the library supports it.

99 ↗(On Diff #129526)

!isa<UnreachableInst> would be more readable

105 ↗(On Diff #129526)

Better to use ASSERT(...verify())

132 ↗(On Diff #129526)

verify()

211 ↗(On Diff #129526)

verify()

223 ↗(On Diff #129526)

verify()

226 ↗(On Diff #129526)

!isa

230 ↗(On Diff #129526)

!isa

237 ↗(On Diff #129526)

verify()

306 ↗(On Diff #129526)

!isa

329 ↗(On Diff #129526)

!isa

333 ↗(On Diff #129526)

isa

344 ↗(On Diff #129526)

verify()

brzycki marked 16 inline comments as done.Jan 12 2018, 12:15 PM

@kuba Thank you again for the careful review. I'm uploading a new diff shortly. If you could give it a quick LGTM I'll try to push the patch to tip again today.

llvm/unittests/IR/DeferredDominanceTest.cpp
10 ↗(On Diff #129526)

Yes. Removed.

57 ↗(On Diff #129526)

Done. I used ASSERT_TRUE(DDT.flush().verify()) because ASSERT() isn't a valid macro.

95 ↗(On Diff #129526)

I'll use isa<> with the ASSERT_TRUE() and ASSERT_FALSE() macros.

brzycki updated this revision to Diff 129681.Jan 12 2018, 12:16 PM
brzycki marked 3 inline comments as done.

Update nits in the unit test for clarity.

kuhar accepted this revision.Jan 12 2018, 12:22 PM

Looks good now, thanks for the changes!

llvm/unittests/IR/DeferredDominanceTest.cpp
212 ↗(On Diff #129681)

Nit: missing comma.

brzycki updated this revision to Diff 129688.Jan 12 2018, 12:46 PM

Fixed punctuation in unit test comment. NFC.

brzycki marked an inline comment as done.Jan 12 2018, 12:46 PM

Preparing to commit to SVN.

This revision was automatically updated to reflect the committed changes.
MatzeB added a subscriber: MatzeB.Jan 18 2018, 2:51 PM

I'm currently seeing release+assert clang compilers apparently being stuck in an endless loop in the llvm test-suite (well I aborted after 10 minutes for a single file) hanging in some dominance tree verification code when I inspect the backtrace in lldb. Could this be caused by this commit?

kuhar added a comment.Jan 18 2018, 3:06 PM

I'm currently seeing release+assert clang compilers apparently being stuck in an endless loop in the llvm test-suite (well I aborted after 10 minutes for a single file) hanging in some dominance tree verification code when I inspect the backtrace in lldb. Could this be caused by this commit?

Are you compiling with EXPENSIVE_CHECKS enabled? The full domtree verification is O(n^3), but is only run with expensive checks or -verify-dom-info, and can blow up on some weird CFG-s.

I'm currently seeing release+assert clang compilers apparently being stuck in an endless loop in the llvm test-suite (well I aborted after 10 minutes for a single file) hanging in some dominance tree verification code when I inspect the backtrace in lldb. Could this be caused by this commit?

Are you compiling with EXPENSIVE_CHECKS enabled? The full domtree verification is O(n^3), but is only run with expensive checks or -verify-dom-info, and can blow up on some weird CFG-s.

Oh indeed, looks like my build had EXPENSIVE_CHECKS enabled (which was not intentional in this case)...

That said, it's sad not to be able to compile the llvm test-suite with expensive checks enabled (in 60min at least)...

kuhar added a comment.Jan 18 2018, 3:23 PM

That said, it's sad not to be able to compile the llvm test-suite with expensive checks enabled (in 60min at least)...

What kind of machine are you using and how many threads do you allow it to create to run the test suite?

@dannyb

It's possible to make the verification O(N) but it's algorithmically incredibly complicated :)

Alternatively, we could make DT.verify() only run sibling property check when -verify-dom-info is set, and make it incomplete but O(N^2) with just EXPENSIVE_CHECKS. In my experience 99% of DT problems are caught in parent property check, so maybe that would be good enough?
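
For readers unfamiliar with the parent property mentioned above, here is a rough standalone sketch of such a check. It is not LLVM's verifier. It assumes blocks are integers, every block is reachable from the root, and the claimed tree is given as an IDom array; all names are hypothetical. The idea, as I understand the property, is that removing a node's claimed parent from the CFG must make that node unreachable from the root. This version runs one traversal per node, so it is quadratic rather than cubic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Marks every block reachable from Root while pretending Skipped is absent.
std::vector<bool> reachableWithout(const std::vector<std::vector<int>> &Succs,
                                   int Root, int Skipped) {
  std::vector<bool> Seen(Succs.size(), false);
  if (Root == Skipped)
    return Seen; // nothing is reachable once the root itself is removed
  std::vector<int> Work{Root};
  Seen[Root] = true;
  while (!Work.empty()) {
    int B = Work.back();
    Work.pop_back();
    for (int S : Succs[B]) {
      if (S == Skipped || Seen[S])
        continue;
      Seen[S] = true;
      Work.push_back(S);
    }
  }
  return Seen;
}

// IDom[B] is the claimed immediate dominator of block B (-1 for the root).
// Returns true if removing each claimed parent disconnects its child.
bool hasParentProperty(const std::vector<std::vector<int>> &Succs,
                       const std::vector<int> &IDom, int Root) {
  for (std::size_t B = 0; B < IDom.size(); ++B) {
    if (IDom[B] < 0)
      continue; // the root has no parent to check
    std::vector<bool> Seen = reachableWithout(Succs, Root, IDom[B]);
    if (Seen[B])
      return false; // B is reachable without its parent: the tree is wrong
  }
  return true;
}
```

On a diamond CFG (0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3), a tree that claims block 1 as the parent of block 3 fails this check, because 3 is still reachable through 2.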

That said, it's sad not to be able to compile the llvm test-suite with expensive checks enabled (in 60min at least)...

What kind of machine are you using and how many threads do you allow it to create to run the test suite?

That build was on a Core i5, 4 cores @ 3.4 GHz, with ninja spawning 6 clang processes (cross-compiling to aarch64).

Some files like sqlite3.c felt like they would never finish...

I can't speak about the differences, but if the verifier is so slow that people will never finish the test-suite with expensive checks, then we won't catch problems there either...

kuhar added a comment.Edited Jan 18 2018, 3:34 PM

Can you try timing it and setting the timeout higher? It would be interesting to get more data on how it performs. So far the only report I remember was about a 2-core iMac or Mac mini taking ~7 hours, but that's rather on the extreme side of the spectrum...

Just finished this:

time /Users/mbraun/lntenv/sandbox/test_suite_device_636d761d305a435b9592881371973ec4/compiler/bin/clang -DNDEBUG -I/Users/mbraun/lntenv/sandbox/test_suite_device_636d761d305a435b9592881371973ec4/test-suite/MultiSource/Applications/sqlite3 -IMultiSource/Applications/sqlite3 -B /Applications/Xcode.app/Contents/Developer/Toolchains/iOS11.1.xctoolchain/usr/bin -O3 -DNDEBUG -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS11.1.Internal.sdk -w -Werror=date-time -DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1 -DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1 -DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1 -DHAVE_UNISTD_H=1 -DSQLITE_OMIT_LOAD_EXTENSION=1 -DSQLITE_THREADSAFE=0 -I. -MD -MT MultiSource/Applications/sqlite3/CMakeFiles/sqlite3.dir/sqlite3.c.o -MF MultiSource/Applications/sqlite3/CMakeFiles/sqlite3.dir/sqlite3.c.o.d -o MultiSource/Applications/sqlite3/CMakeFiles/sqlite3.dir/sqlite3.c.o -c /Users/mbraun/lntenv/sandbox/test_suite_device_636d761d305a435b9592881371973ec4/test-suite/MultiSource/Applications/sqlite3/sqlite3.c -ftime-report
===-------------------------------------------------------------------------===
                         Miscellaneous Ungrouped Timers
===-------------------------------------------------------------------------===

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  6370.2168 (100.0%)  14.7216 ( 99.9%)  6384.9384 (100.0%)  7828.5324 (100.0%)  Code Generation Time
   0.2016 (  0.0%)   0.0144 (  0.1%)   0.2160 (  0.0%)   0.2402 (  0.0%)  LLVM IR Generation Time
  6370.4184 (100.0%)  14.7360 (100.0%)  6385.1544 (100.0%)  7828.7725 (100.0%)  Total

===-------------------------------------------------------------------------===
                              Register Allocation
===-------------------------------------------------------------------------===
  Total Execution Time: 0.4495 seconds (0.4523 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.4150 ( 92.9%)   0.0015 ( 49.4%)   0.4165 ( 92.7%)   0.4193 ( 92.7%)  Global Splitting
   0.0171 (  3.8%)   0.0004 ( 13.9%)   0.0176 (  3.9%)   0.0176 (  3.9%)  Spiller
   0.0106 (  2.4%)   0.0009 ( 30.6%)   0.0115 (  2.6%)   0.0115 (  2.5%)  Evict
   0.0037 (  0.8%)   0.0002 (  5.9%)   0.0039 (  0.9%)   0.0038 (  0.9%)  Seed Live Regs
   0.0001 (  0.0%)   0.0000 (  0.2%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Local Splitting
   0.4465 (100.0%)   0.0030 (100.0%)   0.4495 (100.0%)   0.4523 (100.0%)  Total

===-------------------------------------------------------------------------===
                      Instruction Selection and Scheduling
===-------------------------------------------------------------------------===
  Total Execution Time: 2.8514 seconds (2.8514 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.7690 ( 27.6%)   0.0069 ( 10.6%)   0.7759 ( 27.2%)   0.7765 ( 27.2%)  Type Legalization
   0.5942 ( 21.3%)   0.0068 ( 10.4%)   0.6010 ( 21.1%)   0.6008 ( 21.1%)  Instruction Selection
   0.4215 ( 15.1%)   0.0066 ( 10.1%)   0.4281 ( 15.0%)   0.4282 ( 15.0%)  DAG Legalization
   0.3966 ( 14.2%)   0.0094 ( 14.4%)   0.4060 ( 14.2%)   0.4068 ( 14.3%)  DAG Combining 1
   0.2121 (  7.6%)   0.0066 ( 10.1%)   0.2187 (  7.7%)   0.2185 (  7.7%)  DAG Combining 2
   0.1681 (  6.0%)   0.0077 ( 11.9%)   0.1759 (  6.2%)   0.1753 (  6.1%)  Instruction Scheduling
   0.0892 (  3.2%)   0.0070 ( 10.8%)   0.0962 (  3.4%)   0.0961 (  3.4%)  Instruction Creation
   0.0848 (  3.0%)   0.0015 (  2.3%)   0.0863 (  3.0%)   0.0864 (  3.0%)  DAG Combining after legalize types
   0.0322 (  1.2%)   0.0063 (  9.7%)   0.0385 (  1.4%)   0.0383 (  1.3%)  Instruction Scheduling Cleanup
   0.0174 (  0.6%)   0.0064 (  9.8%)   0.0238 (  0.8%)   0.0235 (  0.8%)  Vector Legalization
   0.0007 (  0.0%)   0.0000 (  0.0%)   0.0007 (  0.0%)   0.0007 (  0.0%)  Type Legalization 2
   0.0002 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)  DAG Combining after legalize vectors
   2.7863 (100.0%)   0.0651 (100.0%)   2.8514 (100.0%)   2.8514 (100.0%)  Total

===-------------------------------------------------------------------------===
                                 DWARF Emission
===-------------------------------------------------------------------------===
  Total Execution Time: 0.0029 seconds (0.0029 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0010 ( 57.7%)   0.0005 ( 43.8%)   0.0015 ( 52.3%)   0.0015 ( 52.5%)  DWARF Exception Writer
   0.0007 ( 40.2%)   0.0006 ( 56.0%)   0.0013 ( 46.3%)   0.0013 ( 46.1%)  Debug Info Emission
   0.0000 (  2.1%)   0.0000 (  0.2%)   0.0000 (  1.4%)   0.0000 (  1.3%)  DWARF Debug Writer
   0.0018 (100.0%)   0.0011 (100.0%)   0.0029 (100.0%)   0.0029 (100.0%)  Total

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 6383.6554 seconds (7826.9875 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  1811.0822 ( 28.4%)   4.0908 ( 28.5%)  1815.1730 ( 28.4%)  2283.3040 ( 29.2%)  Dominator Tree Construction
  1382.5721 ( 21.7%)   3.1086 ( 21.7%)  1385.6807 ( 21.7%)  1746.6788 ( 22.3%)  Dominator Tree Construction
  1017.1956 ( 16.0%)   2.2479 ( 15.7%)  1019.4436 ( 16.0%)  1282.1470 ( 16.4%)  Dominator Tree Construction
  698.5721 ( 11.0%)   1.6101 ( 11.2%)  700.1822 ( 11.0%)  881.0982 ( 11.3%)  Dominator Tree Construction
  591.7651 (  9.3%)   1.0344 (  7.2%)  592.7994 (  9.3%)  648.0347 (  8.3%)  Dominator Tree Construction
  366.1419 (  5.7%)   0.8411 (  5.9%)  366.9830 (  5.7%)  461.8269 (  5.9%)  Dominator Tree Construction
  260.9003 (  4.1%)   0.1933 (  1.3%)  261.0936 (  4.1%)  261.1696 (  3.3%)  Dominator Tree Construction
  17.1747 (  0.3%)   0.0606 (  0.4%)  17.2353 (  0.3%)  21.9036 (  0.3%)  Dominator Tree Construction
   9.6751 (  0.2%)   0.0234 (  0.2%)   9.6985 (  0.2%)  12.1725 (  0.2%)  Unroll loops
   8.6642 (  0.1%)   0.0201 (  0.1%)   8.6843 (  0.1%)  10.9126 (  0.1%)  Dominator Tree Construction
   7.3346 (  0.1%)   0.0175 (  0.1%)   7.3521 (  0.1%)   9.2415 (  0.1%)  Dominator Tree Construction
   8.2538 (  0.1%)   0.0315 (  0.2%)   8.2853 (  0.1%)   8.2870 (  0.1%)  Verify generated machine code
   8.2532 (  0.1%)   0.0292 (  0.2%)   8.2824 (  0.1%)   8.2841 (  0.1%)  Verify generated machine code
   8.2382 (  0.1%)   0.0271 (  0.2%)   8.2654 (  0.1%)   8.2677 (  0.1%)  Verify generated machine code
   7.9754 (  0.1%)   0.0311 (  0.2%)   8.0065 (  0.1%)   8.0083 (  0.1%)  Verify generated machine code
   7.8422 (  0.1%)   0.0405 (  0.3%)   7.8828 (  0.1%)   7.8843 (  0.1%)  Verify generated machine code
   7.8252 (  0.1%)   0.0231 (  0.2%)   7.8483 (  0.1%)   7.8504 (  0.1%)  Verify generated machine code
   7.6636 (  0.1%)   0.0252 (  0.2%)   7.6889 (  0.1%)   7.6907 (  0.1%)  Verify generated machine code
   7.6414 (  0.1%)   0.0302 (  0.2%)   7.6716 (  0.1%)   7.6736 (  0.1%)  Verify generated machine code
   6.0292 (  0.1%)   0.0141 (  0.1%)   6.0433 (  0.1%)   7.6656 (  0.1%)  Dominator Tree Construction
   7.6319 (  0.1%)   0.0283 (  0.2%)   7.6602 (  0.1%)   7.6616 (  0.1%)  Verify generated machine code
   7.6192 (  0.1%)   0.0290 (  0.2%)   7.6483 (  0.1%)   7.6509 (  0.1%)  Verify generated machine code
   7.6059 (  0.1%)   0.0307 (  0.2%)   7.6366 (  0.1%)   7.6380 (  0.1%)  Verify generated machine code
   7.6062 (  0.1%)   0.0277 (  0.2%)   7.6339 (  0.1%)   7.6353 (  0.1%)  Verify generated machine code
   7.5977 (  0.1%)   0.0263 (  0.2%)   7.6240 (  0.1%)   7.6275 (  0.1%)  Verify generated machine code
   7.5916 (  0.1%)   0.0262 (  0.2%)   7.6179 (  0.1%)   7.6196 (  0.1%)  Verify generated machine code
   7.5885 (  0.1%)   0.0276 (  0.2%)   7.6161 (  0.1%)   7.6178 (  0.1%)  Verify generated machine code
   7.0549 (  0.1%)   0.0309 (  0.2%)   7.0858 (  0.1%)   7.1104 (  0.1%)  Verify generated machine code
   6.7323 (  0.1%)   0.0297 (  0.2%)   6.7620 (  0.1%)   6.7643 (  0.1%)  Verify generated machine code
   6.7279 (  0.1%)   0.0283 (  0.2%)   6.7563 (  0.1%)   6.7581 (  0.1%)  Verify generated machine code
   6.5326 (  0.1%)   0.0313 (  0.2%)   6.5640 (  0.1%)   6.5701 (  0.1%)  Verify generated machine code
   3.6521 (  0.1%)   0.0138 (  0.1%)   3.6658 (  0.1%)   4.5258 (  0.1%)  Natural Loop Information
   3.5962 (  0.1%)   0.1294 (  0.9%)   3.7256 (  0.1%)   3.7271 (  0.0%)  AArch64 Instruction Selection
   2.9391 (  0.0%)   0.0073 (  0.1%)   2.9464 (  0.0%)   3.7231 (  0.0%)  Dominator Tree Construction
   2.9442 (  0.0%)   0.0111 (  0.1%)   2.9553 (  0.0%)   3.7108 (  0.0%)  Natural Loop Information
   2.8810 (  0.0%)   0.0071 (  0.0%)   2.8881 (  0.0%)   3.6305 (  0.0%)  Conditionally eliminate dead library calls
   2.1528 (  0.0%)   0.0078 (  0.1%)   2.1606 (  0.0%)   2.6682 (  0.0%)  Natural Loop Information
   2.4722 (  0.0%)   0.0024 (  0.0%)   2.4746 (  0.0%)   2.4753 (  0.0%)  Dominator Tree Construction
   2.4647 (  0.0%)   0.0019 (  0.0%)   2.4666 (  0.0%)   2.4675 (  0.0%)  Dominator Tree Construction
   2.2790 (  0.0%)   0.0020 (  0.0%)   2.2809 (  0.0%)   2.2817 (  0.0%)  Dominator Tree Construction
   2.2531 (  0.0%)   0.0025 (  0.0%)   2.2556 (  0.0%)   2.2569 (  0.0%)  Dominator Tree Construction
   1.4120 (  0.0%)   0.0059 (  0.0%)   1.4178 (  0.0%)   1.8723 (  0.0%)  Natural Loop Information
   1.1641 (  0.0%)   0.0061 (  0.0%)   1.1702 (  0.0%)   1.4764 (  0.0%)  Global Value Numbering
   1.2461 (  0.0%)   0.0036 (  0.0%)   1.2496 (  0.0%)   1.3694 (  0.0%)  Natural Loop Information
   0.9376 (  0.0%)   0.0070 (  0.0%)   0.9446 (  0.0%)   1.1897 (  0.0%)  Function Integration/Inlining
   0.7963 (  0.0%)   0.0036 (  0.0%)   0.7999 (  0.0%)   1.0116 (  0.0%)  Natural Loop Information
   0.5423 (  0.0%)   0.0036 (  0.0%)   0.5459 (  0.0%)   0.6912 (  0.0%)  Loop-Closed SSA Form Pass
   0.6131 (  0.0%)   0.0063 (  0.0%)   0.6193 (  0.0%)   0.6229 (  0.0%)  Greedy Register Allocator
   0.5654 (  0.0%)   0.0021 (  0.0%)   0.5675 (  0.0%)   0.5673 (  0.0%)  Natural Loop Information
   0.4210 (  0.0%)   0.0029 (  0.0%)   0.4239 (  0.0%)   0.5253 (  0.0%)  Loop-Closed SSA Form Pass
   0.3984 (  0.0%)   0.0027 (  0.0%)   0.4011 (  0.0%)   0.4982 (  0.0%)  Combine redundant instructions
   0.4655 (  0.0%)   0.0041 (  0.0%)   0.4697 (  0.0%)   0.4696 (  0.0%)  MachineDominator Tree Construction
   0.3331 (  0.0%)   0.0015 (  0.0%)   0.3346 (  0.0%)   0.4409 (  0.0%)  Combine redundant instructions
   0.3392 (  0.0%)   0.0023 (  0.0%)   0.3415 (  0.0%)   0.4190 (  0.0%)  Memory SSA
   0.3212 (  0.0%)   0.0020 (  0.0%)   0.3231 (  0.0%)   0.4088 (  0.0%)  Induction Variable Simplification
   0.3208 (  0.0%)   0.0013 (  0.0%)   0.3220 (  0.0%)   0.4033 (  0.0%)  Combine redundant instructions
   0.3170 (  0.0%)   0.0014 (  0.0%)   0.3184 (  0.0%)   0.3923 (  0.0%)  Combine redundant instructions
   0.3066 (  0.0%)   0.0021 (  0.0%)   0.3087 (  0.0%)   0.3766 (  0.0%)  Dominator Tree Construction
   0.2977 (  0.0%)   0.0018 (  0.0%)   0.2995 (  0.0%)   0.3759 (  0.0%)  Dominator Tree Construction
   0.2807 (  0.0%)   0.0015 (  0.0%)   0.2822 (  0.0%)   0.3461 (  0.0%)  Jump Threading
   0.2826 (  0.0%)   0.0017 (  0.0%)   0.2843 (  0.0%)   0.3406 (  0.0%)  Loop Invariant Code Motion
   0.2603 (  0.0%)   0.0014 (  0.0%)   0.2617 (  0.0%)   0.3391 (  0.0%)  Combine redundant instructions
   0.2491 (  0.0%)   0.0009 (  0.0%)   0.2500 (  0.0%)   0.3350 (  0.0%)  Combine redundant instructions
   0.2363 (  0.0%)   0.0026 (  0.0%)   0.2389 (  0.0%)   0.3077 (  0.0%)  Value Propagation
   0.2211 (  0.0%)   0.0023 (  0.0%)   0.2234 (  0.0%)   0.2958 (  0.0%)  Value Propagation
   0.2552 (  0.0%)   0.0014 (  0.0%)   0.2566 (  0.0%)   0.2904 (  0.0%)  Loop Invariant Code Motion
   0.2106 (  0.0%)   0.0012 (  0.0%)   0.2118 (  0.0%)   0.2593 (  0.0%)  Loop Invariant Code Motion
   0.2006 (  0.0%)   0.0011 (  0.0%)   0.2017 (  0.0%)   0.2477 (  0.0%)  Jump Threading
   0.1906 (  0.0%)   0.0008 (  0.0%)   0.1914 (  0.0%)   0.2445 (  0.0%)  Combine redundant instructions
   0.1688 (  0.0%)   0.0012 (  0.0%)   0.1701 (  0.0%)   0.2214 (  0.0%)  Combine redundant instructions
   0.2186 (  0.0%)   0.0005 (  0.0%)   0.2191 (  0.0%)   0.2190 (  0.0%)  CodeGen Prepare
   0.2039 (  0.0%)   0.0011 (  0.0%)   0.2050 (  0.0%)   0.2052 (  0.0%)  Loop Strength Reduction
   0.1543 (  0.0%)   0.0012 (  0.0%)   0.1555 (  0.0%)   0.1961 (  0.0%)  Dominator Tree Construction
   0.1449 (  0.0%)   0.0013 (  0.0%)   0.1462 (  0.0%)   0.1786 (  0.0%)  Loop-Closed SSA Form Pass
   0.1419 (  0.0%)   0.0009 (  0.0%)   0.1428 (  0.0%)   0.1740 (  0.0%)  Loop-Closed SSA Form Pass
   0.1378 (  0.0%)   0.0009 (  0.0%)   0.1387 (  0.0%)   0.1699 (  0.0%)  Loop-Closed SSA Form Pass
   0.1411 (  0.0%)   0.0009 (  0.0%)   0.1421 (  0.0%)   0.1679 (  0.0%)  Loop-Closed SSA Form Pass
   0.1325 (  0.0%)   0.0011 (  0.0%)   0.1336 (  0.0%)   0.1608 (  0.0%)  Unroll loops
   0.1148 (  0.0%)   0.0009 (  0.0%)   0.1157 (  0.0%)   0.1537 (  0.0%)  Early CSE w/ MemorySSA
   0.1457 (  0.0%)   0.0004 (  0.0%)   0.1460 (  0.0%)   0.1461 (  0.0%)  Machine Instruction Scheduler
   0.1085 (  0.0%)   0.0014 (  0.0%)   0.1099 (  0.0%)   0.1438 (  0.0%)  Rotate Loops
   0.1248 (  0.0%)   0.0007 (  0.0%)   0.1255 (  0.0%)   0.1376 (  0.0%)  Loop-Closed SSA Form Pass
   0.1098 (  0.0%)   0.0012 (  0.0%)   0.1111 (  0.0%)   0.1338 (  0.0%)  SROA
   0.1266 (  0.0%)   0.0014 (  0.0%)   0.1279 (  0.0%)   0.1279 (  0.0%)  Live Interval Analysis
   0.1229 (  0.0%)   0.0013 (  0.0%)   0.1242 (  0.0%)   0.1241 (  0.0%)  Induction Variable Users
   0.1173 (  0.0%)   0.0004 (  0.0%)   0.1177 (  0.0%)   0.1179 (  0.0%)  Control Flow Optimizer
   0.1097 (  0.0%)   0.0013 (  0.0%)   0.1109 (  0.0%)   0.1109 (  0.0%)  MachineDominator Tree Construction
   0.1093 (  0.0%)   0.0004 (  0.0%)   0.1097 (  0.0%)   0.1104 (  0.0%)  Branch Probability Basic Block Placement
   0.0864 (  0.0%)   0.0005 (  0.0%)   0.0869 (  0.0%)   0.1040 (  0.0%)  SLP Vectorizer
   0.0992 (  0.0%)   0.0004 (  0.0%)   0.0996 (  0.0%)   0.0995 (  0.0%)  Simple Register Coalescing
   0.0934 (  0.0%)   0.0007 (  0.0%)   0.0942 (  0.0%)   0.0942 (  0.0%)  Live Variable Analysis
   0.0662 (  0.0%)   0.0008 (  0.0%)   0.0670 (  0.0%)   0.0925 (  0.0%)  Reassociate expressions
   0.0714 (  0.0%)   0.0023 (  0.0%)   0.0737 (  0.0%)   0.0922 (  0.0%)  Called Value Propagation
   0.0730 (  0.0%)   0.0007 (  0.0%)   0.0738 (  0.0%)   0.0906 (  0.0%)  Simplify the CFG
   0.0708 (  0.0%)   0.0004 (  0.0%)   0.0712 (  0.0%)   0.0893 (  0.0%)  Simplify the CFG
   0.0646 (  0.0%)   0.0007 (  0.0%)   0.0653 (  0.0%)   0.0856 (  0.0%)  Simplify the CFG
   0.0646 (  0.0%)   0.0014 (  0.0%)   0.0660 (  0.0%)   0.0816 (  0.0%)  Sparse Conditional Constant Propagation
   0.0559 (  0.0%)   0.0012 (  0.0%)   0.0571 (  0.0%)   0.0801 (  0.0%)  Unswitch loops
   0.0744 (  0.0%)   0.0006 (  0.0%)   0.0751 (  0.0%)   0.0773 (  0.0%)  Simplify the CFG
   0.0651 (  0.0%)   0.0007 (  0.0%)   0.0658 (  0.0%)   0.0751 (  0.0%)  Dead Store Elimination
   0.0740 (  0.0%)   0.0007 (  0.0%)   0.0746 (  0.0%)   0.0746 (  0.0%)  Verify generated machine code
   0.0543 (  0.0%)   0.0008 (  0.0%)   0.0551 (  0.0%)   0.0728 (  0.0%)  Module Verifier
   0.0710 (  0.0%)   0.0008 (  0.0%)   0.0718 (  0.0%)   0.0728 (  0.0%)  Verify generated machine code
   0.0694 (  0.0%)   0.0013 (  0.0%)   0.0707 (  0.0%)   0.0712 (  0.0%)  Verify generated machine code
   0.0688 (  0.0%)   0.0007 (  0.0%)   0.0695 (  0.0%)   0.0708 (  0.0%)  Verify generated machine code
   0.0685 (  0.0%)   0.0006 (  0.0%)   0.0690 (  0.0%)   0.0691 (  0.0%)  Verify generated machine code
   0.0381 (  0.0%)   0.0009 (  0.0%)   0.0390 (  0.0%)   0.0688 (  0.0%)  Natural Loop Information
   0.0674 (  0.0%)   0.0007 (  0.0%)   0.0681 (  0.0%)   0.0688 (  0.0%)  Verify generated machine code
   0.0677 (  0.0%)   0.0005 (  0.0%)   0.0683 (  0.0%)   0.0683 (  0.0%)  Verify generated machine code
   0.0676 (  0.0%)   0.0006 (  0.0%)   0.0682 (  0.0%)   0.0683 (  0.0%)  Verify generated machine code
   0.0676 (  0.0%)   0.0006 (  0.0%)   0.0682 (  0.0%)   0.0682 (  0.0%)  Verify generated machine code
   0.0672 (  0.0%)   0.0003 (  0.0%)   0.0675 (  0.0%)   0.0676 (  0.0%)  Verify generated machine code
   0.0661 (  0.0%)   0.0006 (  0.0%)   0.0667 (  0.0%)   0.0670 (  0.0%)  Verify generated machine code
   0.0661 (  0.0%)   0.0006 (  0.0%)   0.0668 (  0.0%)   0.0670 (  0.0%)  Verify generated machine code
   0.0653 (  0.0%)   0.0006 (  0.0%)   0.0658 (  0.0%)   0.0658 (  0.0%)  Verify generated machine code
   0.0640 (  0.0%)   0.0005 (  0.0%)   0.0644 (  0.0%)   0.0644 (  0.0%)  Module Verifier
   0.0407 (  0.0%)   0.0005 (  0.0%)   0.0411 (  0.0%)   0.0634 (  0.0%)  Loop Vectorization
   0.0510 (  0.0%)   0.0007 (  0.0%)   0.0516 (  0.0%)   0.0631 (  0.0%)  Simplify the CFG
   0.0478 (  0.0%)   0.0003 (  0.0%)   0.0481 (  0.0%)   0.0617 (  0.0%)  Simplify the CFG
   0.0591 (  0.0%)   0.0005 (  0.0%)   0.0597 (  0.0%)   0.0613 (  0.0%)  Verify generated machine code
   0.0551 (  0.0%)   0.0007 (  0.0%)   0.0558 (  0.0%)   0.0608 (  0.0%)  Aggressive Dead Code Elimination
   0.0576 (  0.0%)   0.0008 (  0.0%)   0.0584 (  0.0%)   0.0607 (  0.0%)  Verify generated machine code
   0.0600 (  0.0%)   0.0003 (  0.0%)   0.0603 (  0.0%)   0.0603 (  0.0%)  Verify generated machine code
   0.0570 (  0.0%)   0.0005 (  0.0%)   0.0575 (  0.0%)   0.0595 (  0.0%)  Verify generated machine code
   0.0581 (  0.0%)   0.0003 (  0.0%)   0.0584 (  0.0%)   0.0584 (  0.0%)  Verify generated machine code
   0.0571 (  0.0%)   0.0003 (  0.0%)   0.0574 (  0.0%)   0.0574 (  0.0%)  Machine code sinking
   0.0568 (  0.0%)   0.0003 (  0.0%)   0.0571 (  0.0%)   0.0571 (  0.0%)  Module Verifier
   0.0545 (  0.0%)   0.0020 (  0.0%)   0.0565 (  0.0%)   0.0566 (  0.0%)  AArch64 Assembly Printer
   0.0387 (  0.0%)   0.0009 (  0.0%)   0.0396 (  0.0%)   0.0564 (  0.0%)  Natural Loop Information
   0.0424 (  0.0%)   0.0007 (  0.0%)   0.0431 (  0.0%)   0.0562 (  0.0%)  Early CSE
   0.0492 (  0.0%)   0.0003 (  0.0%)   0.0496 (  0.0%)   0.0545 (  0.0%)  Remove redundant instructions
   0.0358 (  0.0%)   0.0009 (  0.0%)   0.0368 (  0.0%)   0.0522 (  0.0%)  Natural Loop Information
   0.0388 (  0.0%)   0.0026 (  0.0%)   0.0414 (  0.0%)   0.0516 (  0.0%)  Interprocedural Sparse Conditional Constant Propagation
   0.0506 (  0.0%)   0.0003 (  0.0%)   0.0509 (  0.0%)   0.0510 (  0.0%)  Eliminate PHI nodes for register allocation
   0.0463 (  0.0%)   0.0007 (  0.0%)   0.0470 (  0.0%)   0.0475 (  0.0%)  MachineDominator Tree Construction
   0.0385 (  0.0%)   0.0007 (  0.0%)   0.0392 (  0.0%)   0.0462 (  0.0%)  Bit-Tracking Dead Code Elimination
   0.0431 (  0.0%)   0.0008 (  0.0%)   0.0438 (  0.0%)   0.0439 (  0.0%)  Merge disjoint stack slots
   0.0371 (  0.0%)   0.0005 (  0.0%)   0.0376 (  0.0%)   0.0438 (  0.0%)  Loop Load Elimination
   0.0329 (  0.0%)   0.0006 (  0.0%)   0.0335 (  0.0%)   0.0432 (  0.0%)  Simplify the CFG
   0.0239 (  0.0%)   0.0003 (  0.0%)   0.0242 (  0.0%)   0.0430 (  0.0%)  Branch Probability Analysis
   0.0419 (  0.0%)   0.0003 (  0.0%)   0.0422 (  0.0%)   0.0421 (  0.0%)  Simplify the CFG
   0.0328 (  0.0%)   0.0009 (  0.0%)   0.0337 (  0.0%)   0.0417 (  0.0%)  Post-Dominator Tree Construction
   0.0336 (  0.0%)   0.0008 (  0.0%)   0.0344 (  0.0%)   0.0414 (  0.0%)  Simplify the CFG
   0.0147 (  0.0%)   0.0012 (  0.0%)   0.0158 (  0.0%)   0.0368 (  0.0%)  Recognize loop idioms
   0.0290 (  0.0%)   0.0006 (  0.0%)   0.0296 (  0.0%)   0.0365 (  0.0%)  Branch Probability Analysis
   0.0268 (  0.0%)   0.0006 (  0.0%)   0.0275 (  0.0%)   0.0365 (  0.0%)  SROA
   0.0196 (  0.0%)   0.0027 (  0.0%)   0.0223 (  0.0%)   0.0359 (  0.0%)  Scalar Evolution Analysis
   0.0333 (  0.0%)   0.0003 (  0.0%)   0.0336 (  0.0%)   0.0336 (  0.0%)  Virtual Register Rewriter
   0.0332 (  0.0%)   0.0003 (  0.0%)   0.0335 (  0.0%)   0.0335 (  0.0%)  Machine Loop Invariant Code Motion
   0.0322 (  0.0%)   0.0003 (  0.0%)   0.0325 (  0.0%)   0.0325 (  0.0%)  AArch64 load / store optimization pass
   0.0318 (  0.0%)   0.0003 (  0.0%)   0.0321 (  0.0%)   0.0321 (  0.0%)  Machine InstCombiner
   0.0313 (  0.0%)   0.0004 (  0.0%)   0.0317 (  0.0%)   0.0316 (  0.0%)  MachineDominator Tree Construction
   0.0201 (  0.0%)   0.0007 (  0.0%)   0.0208 (  0.0%)   0.0315 (  0.0%)  Block Frequency Analysis
   0.0297 (  0.0%)   0.0003 (  0.0%)   0.0301 (  0.0%)   0.0300 (  0.0%)  Machine Common Subexpression Elimination
   0.0144 (  0.0%)   0.0029 (  0.0%)   0.0173 (  0.0%)   0.0293 (  0.0%)  Canonicalize natural loops
   0.0186 (  0.0%)   0.0106 (  0.1%)   0.0292 (  0.0%)   0.0288 (  0.0%)  Insert stack protectors
   0.0212 (  0.0%)   0.0004 (  0.0%)   0.0216 (  0.0%)   0.0288 (  0.0%)  Branch Probability Analysis
   0.0283 (  0.0%)   0.0003 (  0.0%)   0.0286 (  0.0%)   0.0285 (  0.0%)  Peephole Optimizations
   0.0097 (  0.0%)   0.0128 (  0.1%)   0.0224 (  0.0%)   0.0278 (  0.0%)  Globals Alias Analysis
   0.0187 (  0.0%)   0.0006 (  0.0%)   0.0193 (  0.0%)   0.0265 (  0.0%)  Deduce function attributes
   0.0216 (  0.0%)   0.0006 (  0.0%)   0.0222 (  0.0%)   0.0260 (  0.0%)  Dominator Tree Construction
   0.0088 (  0.0%)   0.0154 (  0.1%)   0.0242 (  0.0%)   0.0244 (  0.0%)  Machine Module Information
   0.0163 (  0.0%)   0.0006 (  0.0%)   0.0169 (  0.0%)   0.0241 (  0.0%)  Tail Call Elimination
   0.0233 (  0.0%)   0.0004 (  0.0%)   0.0237 (  0.0%)   0.0237 (  0.0%)  Natural Loop Information
   0.0220 (  0.0%)   0.0004 (  0.0%)   0.0224 (  0.0%)   0.0224 (  0.0%)  Natural Loop Information
   0.0205 (  0.0%)   0.0001 (  0.0%)   0.0206 (  0.0%)   0.0220 (  0.0%)  Global Variable Optimizer
   0.0211 (  0.0%)   0.0003 (  0.0%)   0.0214 (  0.0%)   0.0213 (  0.0%)  MachinePostDominator Tree Construction
   0.0125 (  0.0%)   0.0024 (  0.0%)   0.0149 (  0.0%)   0.0212 (  0.0%)  Canonicalize natural loops
   0.0133 (  0.0%)   0.0006 (  0.0%)   0.0139 (  0.0%)   0.0211 (  0.0%)  Natural Loop Information
   0.0203 (  0.0%)   0.0002 (  0.0%)   0.0206 (  0.0%)   0.0206 (  0.0%)  Prologue/Epilogue Insertion & Frame Finalization
   0.0189 (  0.0%)   0.0005 (  0.0%)   0.0194 (  0.0%)   0.0204 (  0.0%)  Branch Probability Analysis
   0.0082 (  0.0%)   0.0120 (  0.1%)   0.0202 (  0.0%)   0.0202 (  0.0%)  Function Alias Analysis Results
   0.0146 (  0.0%)   0.0008 (  0.0%)   0.0155 (  0.0%)   0.0199 (  0.0%)  Natural Loop Information
   0.0194 (  0.0%)   0.0003 (  0.0%)   0.0196 (  0.0%)   0.0199 (  0.0%)  MachinePostDominator Tree Construction
   0.0190 (  0.0%)   0.0007 (  0.0%)   0.0197 (  0.0%)   0.0197 (  0.0%)  MachinePostDominator Tree Construction
   0.0138 (  0.0%)   0.0006 (  0.0%)   0.0143 (  0.0%)   0.0194 (  0.0%)  Natural Loop Information
   0.0181 (  0.0%)   0.0003 (  0.0%)   0.0183 (  0.0%)   0.0184 (  0.0%)  Branch Probability Analysis
   0.0142 (  0.0%)   0.0004 (  0.0%)   0.0146 (  0.0%)   0.0182 (  0.0%)  Block Frequency Analysis
   0.0178 (  0.0%)   0.0002 (  0.0%)   0.0180 (  0.0%)   0.0180 (  0.0%)  Machine Copy Propagation Pass
   0.0131 (  0.0%)   0.0009 (  0.0%)   0.0140 (  0.0%)   0.0180 (  0.0%)  Lazy Value Information Analysis
   0.0175 (  0.0%)   0.0005 (  0.0%)   0.0179 (  0.0%)   0.0179 (  0.0%)  Branch Probability Analysis
   0.0172 (  0.0%)   0.0003 (  0.0%)   0.0174 (  0.0%)   0.0174 (  0.0%)  MachineDominator Tree Construction
   0.0163 (  0.0%)   0.0003 (  0.0%)   0.0166 (  0.0%)   0.0169 (  0.0%)  MachineDominator Tree Construction
   0.0157 (  0.0%)   0.0006 (  0.0%)   0.0163 (  0.0%)   0.0163 (  0.0%)  MemCpy Optimization
   0.0126 (  0.0%)   0.0037 (  0.0%)   0.0163 (  0.0%)   0.0163 (  0.0%)  Free MachineFunction
   0.0160 (  0.0%)   0.0003 (  0.0%)   0.0163 (  0.0%)   0.0163 (  0.0%)  Two-Address instruction pass
   0.0111 (  0.0%)   0.0008 (  0.0%)   0.0118 (  0.0%)   0.0158 (  0.0%)  Canonicalize natural loops
   0.0155 (  0.0%)   0.0003 (  0.0%)   0.0158 (  0.0%)   0.0157 (  0.0%)  Dominator Tree Construction
   0.0147 (  0.0%)   0.0003 (  0.0%)   0.0150 (  0.0%)   0.0152 (  0.0%)  MachineDominator Tree Construction
   0.0134 (  0.0%)   0.0008 (  0.0%)   0.0143 (  0.0%)   0.0152 (  0.0%)  Lazy Value Information Analysis
   0.0080 (  0.0%)   0.0000 (  0.0%)   0.0080 (  0.0%)   0.0148 (  0.0%)  CallGraph Construction
   0.0118 (  0.0%)   0.0006 (  0.0%)   0.0124 (  0.0%)   0.0148 (  0.0%)  Block Frequency Analysis
   0.0026 (  0.0%)   0.0006 (  0.0%)   0.0032 (  0.0%)   0.0145 (  0.0%)  Scalar Evolution Analysis
   0.0142 (  0.0%)   0.0003 (  0.0%)   0.0144 (  0.0%)   0.0144 (  0.0%)  AArch64 Conditional Compares
   0.0130 (  0.0%)   0.0008 (  0.0%)   0.0138 (  0.0%)   0.0138 (  0.0%)  Machine Block Frequency Analysis
   0.0125 (  0.0%)   0.0012 (  0.0%)   0.0138 (  0.0%)   0.0138 (  0.0%)  Canonicalize natural loops
   0.0082 (  0.0%)   0.0003 (  0.0%)   0.0085 (  0.0%)   0.0133 (  0.0%)  Float to int
   0.0095 (  0.0%)   0.0033 (  0.0%)   0.0128 (  0.0%)   0.0128 (  0.0%)  Machine Natural Loop Construction
   0.0080 (  0.0%)   0.0000 (  0.0%)   0.0080 (  0.0%)   0.0127 (  0.0%)  CallGraph Construction
   0.0102 (  0.0%)   0.0025 (  0.0%)   0.0127 (  0.0%)   0.0126 (  0.0%)  Slot index numbering
   0.0119 (  0.0%)   0.0002 (  0.0%)   0.0122 (  0.0%)   0.0122 (  0.0%)  Block Frequency Analysis
   0.0114 (  0.0%)   0.0003 (  0.0%)   0.0118 (  0.0%)   0.0118 (  0.0%)  Block Frequency Analysis
   0.0051 (  0.0%)   0.0033 (  0.0%)   0.0084 (  0.0%)   0.0116 (  0.0%)  Function Alias Analysis Results
   0.0111 (  0.0%)   0.0003 (  0.0%)   0.0115 (  0.0%)   0.0115 (  0.0%)  Machine Block Frequency Analysis
   0.0110 (  0.0%)   0.0004 (  0.0%)   0.0114 (  0.0%)   0.0114 (  0.0%)  Machine Block Frequency Analysis
   0.0105 (  0.0%)   0.0007 (  0.0%)   0.0112 (  0.0%)   0.0112 (  0.0%)  Rotate Loops
   0.0098 (  0.0%)   0.0011 (  0.0%)   0.0109 (  0.0%)   0.0109 (  0.0%)  Delete dead loops
   0.0105 (  0.0%)   0.0002 (  0.0%)   0.0108 (  0.0%)   0.0108 (  0.0%)  Machine Block Frequency Analysis
   0.0103 (  0.0%)   0.0003 (  0.0%)   0.0106 (  0.0%)   0.0106 (  0.0%)  Remove dead machine instructions
   0.0102 (  0.0%)   0.0003 (  0.0%)   0.0105 (  0.0%)   0.0105 (  0.0%)  Constant Hoisting
   0.0095 (  0.0%)   0.0010 (  0.0%)   0.0105 (  0.0%)   0.0104 (  0.0%)  Scalar Evolution Analysis
   0.0100 (  0.0%)   0.0003 (  0.0%)   0.0103 (  0.0%)   0.0104 (  0.0%)  AArch64 Collect Linker Optimization Hint (LOH)
   0.0094 (  0.0%)   0.0008 (  0.0%)   0.0102 (  0.0%)   0.0102 (  0.0%)  Machine Natural Loop Construction
   0.0042 (  0.0%)   0.0021 (  0.0%)   0.0063 (  0.0%)   0.0099 (  0.0%)  Function Alias Analysis Results
   0.0045 (  0.0%)   0.0003 (  0.0%)   0.0047 (  0.0%)   0.0099 (  0.0%)  Canonicalize natural loops
   0.0076 (  0.0%)   0.0021 (  0.0%)   0.0097 (  0.0%)   0.0098 (  0.0%)  Scalar Evolution Analysis
   0.0090 (  0.0%)   0.0007 (  0.0%)   0.0097 (  0.0%)   0.0097 (  0.0%)  Canonicalize natural loops
   0.0087 (  0.0%)   0.0008 (  0.0%)   0.0095 (  0.0%)   0.0095 (  0.0%)  Slot index numbering
   0.0036 (  0.0%)   0.0005 (  0.0%)   0.0041 (  0.0%)   0.0094 (  0.0%)  Function Alias Analysis Results
   0.0091 (  0.0%)   0.0004 (  0.0%)   0.0095 (  0.0%)   0.0094 (  0.0%)  Remove dead machine instructions
   0.0088 (  0.0%)   0.0002 (  0.0%)   0.0090 (  0.0%)   0.0090 (  0.0%)  Machine Natural Loop Construction
   0.0087 (  0.0%)   0.0002 (  0.0%)   0.0090 (  0.0%)   0.0090 (  0.0%)  Machine Loop Invariant Code Motion
   0.0085 (  0.0%)   0.0006 (  0.0%)   0.0090 (  0.0%)   0.0090 (  0.0%)  Canonicalize natural loops
   0.0087 (  0.0%)   0.0002 (  0.0%)   0.0090 (  0.0%)   0.0090 (  0.0%)  Natural Loop Information
   0.0027 (  0.0%)   0.0018 (  0.0%)   0.0045 (  0.0%)   0.0089 (  0.0%)  Function Alias Analysis Results
   0.0089 (  0.0%)   0.0000 (  0.0%)   0.0089 (  0.0%)   0.0089 (  0.0%)  Global Variable Optimizer
   0.0040 (  0.0%)   0.0030 (  0.0%)   0.0070 (  0.0%)   0.0088 (  0.0%)  Globals Alias Analysis
   0.0083 (  0.0%)   0.0003 (  0.0%)   0.0085 (  0.0%)   0.0088 (  0.0%)  Machine Natural Loop Construction
   0.0080 (  0.0%)   0.0005 (  0.0%)   0.0085 (  0.0%)   0.0087 (  0.0%)  Machine Natural Loop Construction
   0.0082 (  0.0%)   0.0002 (  0.0%)   0.0085 (  0.0%)   0.0085 (  0.0%)  Loop Data Prefetch
   0.0079 (  0.0%)   0.0006 (  0.0%)   0.0085 (  0.0%)   0.0084 (  0.0%)  Promote 'by reference' arguments to scalars
   0.0037 (  0.0%)   0.0046 (  0.0%)   0.0083 (  0.0%)   0.0084 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0078 (  0.0%)   0.0004 (  0.0%)   0.0082 (  0.0%)   0.0082 (  0.0%)  Machine Natural Loop Construction
   0.0074 (  0.0%)   0.0003 (  0.0%)   0.0077 (  0.0%)   0.0078 (  0.0%)  Machine Natural Loop Construction
   0.0045 (  0.0%)   0.0000 (  0.0%)   0.0045 (  0.0%)   0.0078 (  0.0%)  Dead Argument Elimination
   0.0075 (  0.0%)   0.0002 (  0.0%)   0.0077 (  0.0%)   0.0077 (  0.0%)  AArch64 Dead register definitions
   0.0027 (  0.0%)   0.0026 (  0.0%)   0.0054 (  0.0%)   0.0074 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0073 (  0.0%)   0.0000 (  0.0%)   0.0074 (  0.0%)   0.0074 (  0.0%)  AArch64 Promote Constant
   0.0049 (  0.0%)   0.0006 (  0.0%)   0.0055 (  0.0%)   0.0073 (  0.0%)  Memory Dependence Analysis
   0.0066 (  0.0%)   0.0006 (  0.0%)   0.0073 (  0.0%)   0.0072 (  0.0%)  Scalar Evolution Analysis
   0.0016 (  0.0%)   0.0024 (  0.0%)   0.0040 (  0.0%)   0.0072 (  0.0%)  LCSSA Verifier
   0.0011 (  0.0%)   0.0008 (  0.0%)   0.0019 (  0.0%)   0.0072 (  0.0%)  Lazy Branch Probability Analysis
   0.0069 (  0.0%)   0.0002 (  0.0%)   0.0071 (  0.0%)   0.0071 (  0.0%)  Expand Atomic instructions
   0.0069 (  0.0%)   0.0002 (  0.0%)   0.0071 (  0.0%)   0.0071 (  0.0%)  Early If-Conversion
   0.0066 (  0.0%)   0.0002 (  0.0%)   0.0069 (  0.0%)   0.0069 (  0.0%)  AArch64 pseudo instruction expansion pass
   0.0065 (  0.0%)   0.0003 (  0.0%)   0.0068 (  0.0%)   0.0068 (  0.0%)  Branch relaxation pass
   0.0035 (  0.0%)   0.0005 (  0.0%)   0.0040 (  0.0%)   0.0065 (  0.0%)  Function Alias Analysis Results
   0.0055 (  0.0%)   0.0010 (  0.0%)   0.0065 (  0.0%)   0.0065 (  0.0%)  Scalar Evolution Analysis
   0.0059 (  0.0%)   0.0003 (  0.0%)   0.0062 (  0.0%)   0.0062 (  0.0%)  Stack Slot Coloring
   0.0050 (  0.0%)   0.0007 (  0.0%)   0.0058 (  0.0%)   0.0060 (  0.0%)  Canonicalize natural loops
   0.0035 (  0.0%)   0.0005 (  0.0%)   0.0040 (  0.0%)   0.0060 (  0.0%)  Call-site splitting
   0.0054 (  0.0%)   0.0006 (  0.0%)   0.0060 (  0.0%)   0.0060 (  0.0%)  MergedLoadStoreMotion
   0.0057 (  0.0%)   0.0002 (  0.0%)   0.0059 (  0.0%)   0.0059 (  0.0%)  AArch64 Condition Optimizer
   0.0037 (  0.0%)   0.0008 (  0.0%)   0.0045 (  0.0%)   0.0059 (  0.0%)  Function Alias Analysis Results
   0.0055 (  0.0%)   0.0002 (  0.0%)   0.0057 (  0.0%)   0.0057 (  0.0%)  Expand memcmp() to load/stores
   0.0035 (  0.0%)   0.0010 (  0.0%)   0.0046 (  0.0%)   0.0056 (  0.0%)  Function Alias Analysis Results
   0.0013 (  0.0%)   0.0005 (  0.0%)   0.0018 (  0.0%)   0.0054 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0047 (  0.0%)   0.0006 (  0.0%)   0.0053 (  0.0%)   0.0053 (  0.0%)  Remove unused exception handling info
   0.0049 (  0.0%)   0.0002 (  0.0%)   0.0051 (  0.0%)   0.0051 (  0.0%)  Tail Duplication
   0.0007 (  0.0%)   0.0006 (  0.0%)   0.0012 (  0.0%)   0.0051 (  0.0%)  Optimization Remark Emitter
   0.0043 (  0.0%)   0.0007 (  0.0%)   0.0050 (  0.0%)   0.0050 (  0.0%)  Canonicalize natural loops
   0.0022 (  0.0%)   0.0027 (  0.0%)   0.0049 (  0.0%)   0.0049 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0045 (  0.0%)   0.0002 (  0.0%)   0.0047 (  0.0%)   0.0048 (  0.0%)  Remove unreachable machine basic blocks
   0.0047 (  0.0%)   0.0000 (  0.0%)   0.0047 (  0.0%)   0.0047 (  0.0%)  CallGraph Construction
   0.0019 (  0.0%)   0.0029 (  0.0%)   0.0048 (  0.0%)   0.0047 (  0.0%)  LCSSA Verifier
   0.0007 (  0.0%)   0.0008 (  0.0%)   0.0014 (  0.0%)   0.0047 (  0.0%)  Lazy Block Frequency Analysis
   0.0044 (  0.0%)   0.0002 (  0.0%)   0.0047 (  0.0%)   0.0046 (  0.0%)  Shrink Wrapping analysis
   0.0040 (  0.0%)   0.0005 (  0.0%)   0.0045 (  0.0%)   0.0046 (  0.0%)  PGOMemOPSize
   0.0043 (  0.0%)   0.0002 (  0.0%)   0.0045 (  0.0%)   0.0045 (  0.0%)  Post-RA pseudo instruction expansion pass
   0.0036 (  0.0%)   0.0009 (  0.0%)   0.0044 (  0.0%)   0.0044 (  0.0%)  Function Alias Analysis Results
   0.0004 (  0.0%)   0.0005 (  0.0%)   0.0009 (  0.0%)   0.0043 (  0.0%)  Lazy Block Frequency Analysis
   0.0038 (  0.0%)   0.0003 (  0.0%)   0.0041 (  0.0%)   0.0042 (  0.0%)  AArch64 Redundant Copy Elimination
   0.0036 (  0.0%)   0.0005 (  0.0%)   0.0041 (  0.0%)   0.0042 (  0.0%)  Function Alias Analysis Results
   0.0033 (  0.0%)   0.0008 (  0.0%)   0.0042 (  0.0%)   0.0041 (  0.0%)  Function Alias Analysis Results
   0.0036 (  0.0%)   0.0005 (  0.0%)   0.0041 (  0.0%)   0.0041 (  0.0%)  Function Alias Analysis Results
   0.0038 (  0.0%)   0.0002 (  0.0%)   0.0041 (  0.0%)   0.0041 (  0.0%)  Scalar Evolution Analysis
   0.0035 (  0.0%)   0.0005 (  0.0%)   0.0040 (  0.0%)   0.0040 (  0.0%)  Function Alias Analysis Results
   0.0038 (  0.0%)   0.0003 (  0.0%)   0.0041 (  0.0%)   0.0040 (  0.0%)  Hoist/decompose integer division and remainder
   0.0038 (  0.0%)   0.0002 (  0.0%)   0.0040 (  0.0%)   0.0040 (  0.0%)  Tail Duplication
   0.0037 (  0.0%)   0.0003 (  0.0%)   0.0040 (  0.0%)   0.0040 (  0.0%)  Scalar Evolution Analysis
   0.0037 (  0.0%)   0.0002 (  0.0%)   0.0039 (  0.0%)   0.0039 (  0.0%)  Partially inline calls to library functions
   0.0034 (  0.0%)   0.0005 (  0.0%)   0.0039 (  0.0%)   0.0039 (  0.0%)  Function Alias Analysis Results
   0.0033 (  0.0%)   0.0002 (  0.0%)   0.0036 (  0.0%)   0.0035 (  0.0%)  AArch64 Store Pair Suppression
   0.0028 (  0.0%)   0.0007 (  0.0%)   0.0035 (  0.0%)   0.0035 (  0.0%)  Scalar Evolution Analysis
   0.0026 (  0.0%)   0.0008 (  0.0%)   0.0034 (  0.0%)   0.0034 (  0.0%)  Function Alias Analysis Results
   0.0032 (  0.0%)   0.0002 (  0.0%)   0.0034 (  0.0%)   0.0034 (  0.0%)  Remove unreachable blocks from the CFG
   0.0030 (  0.0%)   0.0002 (  0.0%)   0.0032 (  0.0%)   0.0032 (  0.0%)  Scalar Evolution Analysis
   0.0025 (  0.0%)   0.0007 (  0.0%)   0.0032 (  0.0%)   0.0032 (  0.0%)  Loop Sink
   0.0025 (  0.0%)   0.0006 (  0.0%)   0.0030 (  0.0%)   0.0030 (  0.0%)  Scalar Evolution Analysis
   0.0028 (  0.0%)   0.0002 (  0.0%)   0.0030 (  0.0%)   0.0030 (  0.0%)  Interleaved Access Pass
   0.0027 (  0.0%)   0.0002 (  0.0%)   0.0030 (  0.0%)   0.0030 (  0.0%)  AArch64 Conditional Branch Tuning
   0.0023 (  0.0%)   0.0006 (  0.0%)   0.0029 (  0.0%)   0.0029 (  0.0%)  Speculatively execute instructions if target has divergent branches
   0.0026 (  0.0%)   0.0002 (  0.0%)   0.0028 (  0.0%)   0.0028 (  0.0%)  Process Implicit Definitions
   0.0017 (  0.0%)   0.0011 (  0.0%)   0.0028 (  0.0%)   0.0028 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0017 (  0.0%)   0.0010 (  0.0%)   0.0028 (  0.0%)   0.0028 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0022 (  0.0%)   0.0005 (  0.0%)   0.0026 (  0.0%)   0.0027 (  0.0%)  Machine Trace Metrics
   0.0027 (  0.0%)   0.0000 (  0.0%)   0.0027 (  0.0%)   0.0027 (  0.0%)  Dead Global Elimination
   0.0024 (  0.0%)   0.0002 (  0.0%)   0.0026 (  0.0%)   0.0027 (  0.0%)  Local Stack Slot Allocation
   0.0024 (  0.0%)   0.0002 (  0.0%)   0.0026 (  0.0%)   0.0026 (  0.0%)  Optimize machine instruction PHIs
   0.0022 (  0.0%)   0.0003 (  0.0%)   0.0025 (  0.0%)   0.0026 (  0.0%)  Debug Variable Analysis
   0.0018 (  0.0%)   0.0008 (  0.0%)   0.0026 (  0.0%)   0.0026 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0023 (  0.0%)   0.0002 (  0.0%)   0.0026 (  0.0%)   0.0026 (  0.0%)  Expand ISel Pseudo-instructions
   0.0020 (  0.0%)   0.0006 (  0.0%)   0.0025 (  0.0%)   0.0026 (  0.0%)  Function Alias Analysis Results
   0.0026 (  0.0%)   0.0000 (  0.0%)   0.0026 (  0.0%)   0.0026 (  0.0%)  Dead Global Elimination
   0.0015 (  0.0%)   0.0010 (  0.0%)   0.0026 (  0.0%)   0.0025 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0023 (  0.0%)   0.0002 (  0.0%)   0.0025 (  0.0%)   0.0025 (  0.0%)  Loop Access Analysis
   0.0004 (  0.0%)   0.0002 (  0.0%)   0.0006 (  0.0%)   0.0025 (  0.0%)  Loop Access Analysis
   0.0014 (  0.0%)   0.0011 (  0.0%)   0.0025 (  0.0%)   0.0024 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0022 (  0.0%)   0.0002 (  0.0%)   0.0024 (  0.0%)   0.0024 (  0.0%)  Scalarize Masked Memory Intrinsics
   0.0016 (  0.0%)   0.0006 (  0.0%)   0.0021 (  0.0%)   0.0023 (  0.0%)  Function Alias Analysis Results
   0.0015 (  0.0%)   0.0008 (  0.0%)   0.0023 (  0.0%)   0.0023 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0020 (  0.0%)   0.0002 (  0.0%)   0.0022 (  0.0%)   0.0022 (  0.0%)  Expand reduction intrinsics
   0.0009 (  0.0%)   0.0013 (  0.0%)   0.0022 (  0.0%)   0.0021 (  0.0%)  LCSSA Verifier
   0.0015 (  0.0%)   0.0006 (  0.0%)   0.0021 (  0.0%)   0.0021 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0013 (  0.0%)   0.0008 (  0.0%)   0.0021 (  0.0%)   0.0021 (  0.0%)  Memory Dependence Analysis
   0.0015 (  0.0%)   0.0006 (  0.0%)   0.0021 (  0.0%)   0.0020 (  0.0%)  Demanded bits analysis
   0.0017 (  0.0%)   0.0003 (  0.0%)   0.0020 (  0.0%)   0.0020 (  0.0%)  Function Alias Analysis Results
   0.0016 (  0.0%)   0.0003 (  0.0%)   0.0020 (  0.0%)   0.0020 (  0.0%)  Bundle Machine CFG Edges
   0.0017 (  0.0%)   0.0002 (  0.0%)   0.0019 (  0.0%)   0.0020 (  0.0%)  Function Alias Analysis Results
   0.0014 (  0.0%)   0.0005 (  0.0%)   0.0020 (  0.0%)   0.0019 (  0.0%)  Lower 'expect' Intrinsics
   0.0011 (  0.0%)   0.0009 (  0.0%)   0.0019 (  0.0%)   0.0019 (  0.0%)  Lazy Branch Probability Analysis
   0.0010 (  0.0%)   0.0008 (  0.0%)   0.0018 (  0.0%)   0.0018 (  0.0%)  Lazy Branch Probability Analysis
   0.0015 (  0.0%)   0.0003 (  0.0%)   0.0018 (  0.0%)   0.0018 (  0.0%)  Function Alias Analysis Results
   0.0010 (  0.0%)   0.0009 (  0.0%)   0.0018 (  0.0%)   0.0018 (  0.0%)  Lazy Branch Probability Analysis
   0.0008 (  0.0%)   0.0010 (  0.0%)   0.0018 (  0.0%)   0.0018 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0012 (  0.0%)   0.0005 (  0.0%)   0.0017 (  0.0%)   0.0017 (  0.0%)  Promote Memory to Register
   0.0008 (  0.0%)   0.0005 (  0.0%)   0.0013 (  0.0%)   0.0017 (  0.0%)  Lazy Branch Probability Analysis
   0.0008 (  0.0%)   0.0008 (  0.0%)   0.0016 (  0.0%)   0.0017 (  0.0%)  Lazy Branch Probability Analysis
   0.0009 (  0.0%)   0.0008 (  0.0%)   0.0017 (  0.0%)   0.0016 (  0.0%)  Memory Dependence Analysis
   0.0013 (  0.0%)   0.0002 (  0.0%)   0.0015 (  0.0%)   0.0016 (  0.0%)  Exception handling preparation
   0.0011 (  0.0%)   0.0004 (  0.0%)   0.0015 (  0.0%)   0.0015 (  0.0%)  Spill Code Placement Analysis
   0.0009 (  0.0%)   0.0006 (  0.0%)   0.0015 (  0.0%)   0.0015 (  0.0%)  Machine Trace Metrics
   0.0007 (  0.0%)   0.0008 (  0.0%)   0.0015 (  0.0%)   0.0015 (  0.0%)  Lazy Block Frequency Analysis
   0.0006 (  0.0%)   0.0008 (  0.0%)   0.0014 (  0.0%)   0.0014 (  0.0%)  Lazy Block Frequency Analysis
   0.0012 (  0.0%)   0.0002 (  0.0%)   0.0014 (  0.0%)   0.0014 (  0.0%)  Loop Distribution
   0.0010 (  0.0%)   0.0003 (  0.0%)   0.0014 (  0.0%)   0.0014 (  0.0%)  Live Register Matrix
   0.0008 (  0.0%)   0.0006 (  0.0%)   0.0014 (  0.0%)   0.0013 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0006 (  0.0%)   0.0008 (  0.0%)   0.0014 (  0.0%)   0.0013 (  0.0%)  Lazy Block Frequency Analysis
   0.0006 (  0.0%)   0.0008 (  0.0%)   0.0013 (  0.0%)   0.0013 (  0.0%)  Lazy Block Frequency Analysis
   0.0006 (  0.0%)   0.0006 (  0.0%)   0.0012 (  0.0%)   0.0013 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0010 (  0.0%)   0.0002 (  0.0%)   0.0013 (  0.0%)   0.0012 (  0.0%)  AArch64 SIMD instructions optimization pass
   0.0005 (  0.0%)   0.0007 (  0.0%)   0.0012 (  0.0%)   0.0012 (  0.0%)  LCSSA Verifier
   0.0007 (  0.0%)   0.0006 (  0.0%)   0.0013 (  0.0%)   0.0012 (  0.0%)  Optimization Remark Emitter
   0.0005 (  0.0%)   0.0007 (  0.0%)   0.0012 (  0.0%)   0.0012 (  0.0%)  LCSSA Verifier
   0.0007 (  0.0%)   0.0005 (  0.0%)   0.0012 (  0.0%)   0.0012 (  0.0%)  Lazy Branch Probability Analysis
   0.0005 (  0.0%)   0.0007 (  0.0%)   0.0012 (  0.0%)   0.0012 (  0.0%)  LCSSA Verifier
   0.0005 (  0.0%)   0.0007 (  0.0%)   0.0012 (  0.0%)   0.0012 (  0.0%)  LCSSA Verifier
   0.0007 (  0.0%)   0.0005 (  0.0%)   0.0012 (  0.0%)   0.0012 (  0.0%)  Optimization Remark Emitter
   0.0006 (  0.0%)   0.0005 (  0.0%)   0.0011 (  0.0%)   0.0012 (  0.0%)  Optimization Remark Emitter
   0.0006 (  0.0%)   0.0006 (  0.0%)   0.0012 (  0.0%)   0.0012 (  0.0%)  Optimization Remark Emitter
   0.0006 (  0.0%)   0.0005 (  0.0%)   0.0011 (  0.0%)   0.0012 (  0.0%)  Lazy Branch Probability Analysis
   0.0008 (  0.0%)   0.0004 (  0.0%)   0.0012 (  0.0%)   0.0012 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0007 (  0.0%)   0.0005 (  0.0%)   0.0012 (  0.0%)   0.0011 (  0.0%)  Live Stack Slot Analysis
   0.0006 (  0.0%)   0.0005 (  0.0%)   0.0011 (  0.0%)   0.0011 (  0.0%)  Lazy Branch Probability Analysis
   0.0006 (  0.0%)   0.0005 (  0.0%)   0.0011 (  0.0%)   0.0011 (  0.0%)  Optimization Remark Emitter
   0.0006 (  0.0%)   0.0006 (  0.0%)   0.0011 (  0.0%)   0.0011 (  0.0%)  Lazy Block Frequency Analysis
   0.0006 (  0.0%)   0.0005 (  0.0%)   0.0011 (  0.0%)   0.0011 (  0.0%)  Instrument function entry/exit with calls to e.g. mcount() (pre inlining)
   0.0006 (  0.0%)   0.0005 (  0.0%)   0.0011 (  0.0%)   0.0011 (  0.0%)  Optimization Remark Emitter
   0.0006 (  0.0%)   0.0005 (  0.0%)   0.0011 (  0.0%)   0.0011 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0006 (  0.0%)   0.0005 (  0.0%)   0.0011 (  0.0%)   0.0011 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0005 (  0.0%)   0.0005 (  0.0%)   0.0010 (  0.0%)   0.0010 (  0.0%)  Optimization Remark Emitter
   0.0007 (  0.0%)   0.0003 (  0.0%)   0.0010 (  0.0%)   0.0010 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0008 (  0.0%)   0.0002 (  0.0%)   0.0010 (  0.0%)   0.0010 (  0.0%)  Function Alias Analysis Results
   0.0004 (  0.0%)   0.0005 (  0.0%)   0.0009 (  0.0%)   0.0010 (  0.0%)  Lazy Block Frequency Analysis
   0.0005 (  0.0%)   0.0003 (  0.0%)   0.0009 (  0.0%)   0.0009 (  0.0%)  Lazy Branch Probability Analysis
   0.0005 (  0.0%)   0.0004 (  0.0%)   0.0009 (  0.0%)   0.0009 (  0.0%)  Lazy Branch Probability Analysis
   0.0004 (  0.0%)   0.0005 (  0.0%)   0.0008 (  0.0%)   0.0008 (  0.0%)  Lazy Block Frequency Analysis
   0.0005 (  0.0%)   0.0003 (  0.0%)   0.0008 (  0.0%)   0.0008 (  0.0%)  Lazy Branch Probability Analysis
   0.0006 (  0.0%)   0.0002 (  0.0%)   0.0008 (  0.0%)   0.0008 (  0.0%)  Falkor HW Prefetch Fix
   0.0005 (  0.0%)   0.0003 (  0.0%)   0.0008 (  0.0%)   0.0008 (  0.0%)  Virtual Register Map
   0.0005 (  0.0%)   0.0002 (  0.0%)   0.0007 (  0.0%)   0.0007 (  0.0%)  Alignment from assumptions
   0.0004 (  0.0%)   0.0002 (  0.0%)   0.0007 (  0.0%)   0.0007 (  0.0%)  Lazy Branch Probability Analysis
   0.0003 (  0.0%)   0.0004 (  0.0%)   0.0007 (  0.0%)   0.0007 (  0.0%)  Lazy Block Frequency Analysis
   0.0003 (  0.0%)   0.0003 (  0.0%)   0.0007 (  0.0%)   0.0007 (  0.0%)  Lazy Machine Block Frequency Analysis
   0.0003 (  0.0%)   0.0004 (  0.0%)   0.0007 (  0.0%)   0.0007 (  0.0%)  Lazy Block Frequency Analysis
   0.0003 (  0.0%)   0.0004 (  0.0%)   0.0007 (  0.0%)   0.0007 (  0.0%)  Lazy Machine Block Frequency Analysis
   0.0004 (  0.0%)   0.0002 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)  Basic Alias Analysis (stateless AA impl)
   0.0006 (  0.0%)   0.0000 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)  Deduce function attributes in RPO
   0.0003 (  0.0%)   0.0004 (  0.0%)   0.0007 (  0.0%)   0.0006 (  0.0%)  Lazy Block Frequency Analysis
   0.0003 (  0.0%)   0.0003 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)  Lazy Branch Probability Analysis
   0.0004 (  0.0%)   0.0002 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)  Machine Optimization Remark Emitter
   0.0004 (  0.0%)   0.0003 (  0.0%)   0.0007 (  0.0%)   0.0006 (  0.0%)  Loop Access Analysis
   0.0003 (  0.0%)   0.0002 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)  Optimization Remark Emitter
   0.0004 (  0.0%)   0.0003 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)  Machine Optimization Remark Emitter
   0.0004 (  0.0%)   0.0002 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)  Implement the 'patchable-function' attribute
   0.0003 (  0.0%)   0.0002 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)  Optimization Remark Emitter
   0.0003 (  0.0%)   0.0002 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)  Optimization Remark Emitter
   0.0003 (  0.0%)   0.0002 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)  Insert XRay ops
   0.0003 (  0.0%)   0.0003 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)  Lazy Branch Probability Analysis
   0.0003 (  0.0%)   0.0002 (  0.0%)   0.0005 (  0.0%)   0.0006 (  0.0%)  Demanded bits analysis
   0.0003 (  0.0%)   0.0002 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)  Optimization Remark Emitter
   0.0003 (  0.0%)   0.0003 (  0.0%)   0.0006 (  0.0%)   0.0006 (  0.0%)  Optimization Remark Emitter
   0.0003 (  0.0%)   0.0002 (  0.0%)   0.0005 (  0.0%)   0.0006 (  0.0%)  Insert fentry calls
   0.0003 (  0.0%)   0.0002 (  0.0%)   0.0006 (  0.0%)   0.0005 (  0.0%)  Machine Optimization Remark Emitter
   0.0005 (  0.0%)   0.0000 (  0.0%)   0.0005 (  0.0%)   0.0005 (  0.0%)  Merge Duplicate Global Constants
   0.0003 (  0.0%)   0.0002 (  0.0%)   0.0005 (  0.0%)   0.0005 (  0.0%)  StackMap Liveness Analysis
   0.0003 (  0.0%)   0.0002 (  0.0%)   0.0006 (  0.0%)   0.0005 (  0.0%)  PostRA Machine Instruction Scheduler
   0.0003 (  0.0%)   0.0003 (  0.0%)   0.0005 (  0.0%)   0.0005 (  0.0%)  Live DEBUG_VALUE analysis
   0.0003 (  0.0%)   0.0002 (  0.0%)   0.0005 (  0.0%)   0.0005 (  0.0%)  Optimization Remark Emitter
   0.0003 (  0.0%)   0.0002 (  0.0%)   0.0005 (  0.0%)   0.0005 (  0.0%)  Falkor HW Prefetch Fix Late Phase
   0.0003 (  0.0%)   0.0003 (  0.0%)   0.0005 (  0.0%)   0.0005 (  0.0%)  Optimization Remark Emitter
   0.0003 (  0.0%)   0.0002 (  0.0%)   0.0005 (  0.0%)   0.0005 (  0.0%)  A57 FP Anti-dependency breaker
   0.0003 (  0.0%)   0.0002 (  0.0%)   0.0005 (  0.0%)   0.0005 (  0.0%)  Detect Dead Lanes
   0.0002 (  0.0%)   0.0002 (  0.0%)   0.0005 (  0.0%)   0.0005 (  0.0%)  Demanded bits analysis
   0.0003 (  0.0%)   0.0002 (  0.0%)   0.0005 (  0.0%)   0.0005 (  0.0%)  Optimization Remark Emitter
   0.0002 (  0.0%)   0.0003 (  0.0%)   0.0005 (  0.0%)   0.0005 (  0.0%)  Lazy Block Frequency Analysis
   0.0002 (  0.0%)   0.0002 (  0.0%)   0.0005 (  0.0%)   0.0005 (  0.0%)  Lazy Machine Block Frequency Analysis
   0.0003 (  0.0%)   0.0002 (  0.0%)   0.0005 (  0.0%)   0.0005 (  0.0%)  Instrument function entry/exit with calls to e.g. mcount() (post inlining)
   0.0003 (  0.0%)   0.0002 (  0.0%)   0.0005 (  0.0%)   0.0005 (  0.0%)  Rename Disconnected Subregister Components
   0.0002 (  0.0%)   0.0003 (  0.0%)   0.0005 (  0.0%)   0.0005 (  0.0%)  Contiguously Lay Out Funclets
   0.0002 (  0.0%)   0.0002 (  0.0%)   0.0005 (  0.0%)   0.0005 (  0.0%)  Safe Stack instrumentation pass
   0.0002 (  0.0%)   0.0002 (  0.0%)   0.0005 (  0.0%)   0.0005 (  0.0%)  Analyze Machine Code For Garbage Collection
   0.0002 (  0.0%)   0.0002 (  0.0%)   0.0005 (  0.0%)   0.0004 (  0.0%)  Lazy Block Frequency Analysis
   0.0002 (  0.0%)   0.0002 (  0.0%)   0.0004 (  0.0%)   0.0004 (  0.0%)  Merge internal globals
   0.0002 (  0.0%)   0.0002 (  0.0%)   0.0004 (  0.0%)   0.0004 (  0.0%)  Lazy Block Frequency Analysis
   0.0002 (  0.0%)   0.0002 (  0.0%)   0.0004 (  0.0%)   0.0004 (  0.0%)  Shadow Stack GC Lowering
   0.0002 (  0.0%)   0.0002 (  0.0%)   0.0004 (  0.0%)   0.0004 (  0.0%)  Lower Garbage Collection Instructions
   0.0002 (  0.0%)   0.0003 (  0.0%)   0.0004 (  0.0%)   0.0004 (  0.0%)  Target Library Information
   0.0002 (  0.0%)   0.0003 (  0.0%)   0.0004 (  0.0%)   0.0004 (  0.0%)  ObjC ARC contraction
   0.0002 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)  Dominator Tree Construction
   0.0001 (  0.0%)   0.0001 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)  Create Garbage Collector Module Metadata
   0.0002 (  0.0%)   0.0000 (  0.0%)   0.0002 (  0.0%)   0.0002 (  0.0%)  Infer set function attributes
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Assumption Cache Tracker
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Assumption Cache Tracker
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Strip Unused Function Prototypes
   0.0001 (  0.0%)   0.0000 (  0.0%)   0.0001 (  0.0%)   0.0001 (  0.0%)  Eliminate Available Externally Globals
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Pre-ISel Intrinsic Lowering
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Force set function attributes
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Rewrite Symbols
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Transform Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Pass Configuration
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Transform Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Profile summary info
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Scoped NoAlias Alias Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Profile summary info
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  A No-Op Barrier Pass
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Type-Based Alias Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Scoped NoAlias Alias Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Target Library Information
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Machine Branch Probability Analysis
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)  Type-Based Alias Analysis
  6369.3137 (100.0%)  14.3417 (100.0%)  6383.6554 (100.0%)  7826.9875 (100.0%)  Total

===-------------------------------------------------------------------------===
                          Clang front-end time report
===-------------------------------------------------------------------------===
  Total Execution Time: 6385.4041 seconds (7829.0221 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  6370.6518 (100.0%)  14.7523 (100.0%)  6385.4041 (100.0%)  7829.0221 (100.0%)  Clang front-end timer
  6370.6518 (100.0%)  14.7523 (100.0%)  6385.4041 (100.0%)  7829.0221 (100.0%)  Total


real	130m29.070s
user	106m10.663s

@MatzeB wrote:
Some files like sqlite3.c felt like they would never finish...

sqlite3.c has some ridiculously large functions that emit 3000+ basic blocks per function. It doesn't surprise me you see the worst timing on them, and it's even worse if I perform the DT updates incrementally instead of in one large batch. On my test machine, an Intel(R) Xeon(R) CPU E5-4660 v4 @ 2.20GHz, it takes test-suite's sqlite3.c around 7-8 minutes to compile when using a Debug compiler. Release is much faster, somewhere around 15-25 seconds.

fhahn added a subscriber: fhahn.Mar 19 2018, 8:42 AM
fhahn added inline comments.
llvm/trunk/lib/Transforms/Scalar/JumpThreading.cpp
2610 ↗(On Diff #129694)

I think DuplicateInstructionsInSplitBetween can easily update the DT itself: D44629. Is it better to do the update incrementally here or make DuplicateInstructionsInSplitBetween take care of it (via splitBlock)? Letting functions update the DT themselves seems to be what similar functions do.

We do not have access to the plain DT here though, which makes that slightly trickier and that's why I want to check what's better before doing any changes here :)

brzycki added inline comments.Mar 19 2018, 9:08 AM
llvm/trunk/lib/Transforms/Scalar/JumpThreading.cpp
2610 ↗(On Diff #129694)

Hi @fhahn , when I initially did the work to preserve DT across JumpThreading I tried to do two things:

  1. Minimize the changes to external calls.
  2. Update the DT real-time.

Compile-time testing showed that updating the DT in real time was just too slow. This not-so-great compromise is the deferred class we have today. Any call to DDT->flush() incurs a compile-time penalty, which is why you can find external calls that take DT as a parameter where I didn't pass it: doing so would require me to flush first. I'd also like to avoid having multiple functions that take both a DT and a DDT parameter. It complicates function signatures and positional nullptr placement, and adds more branchy checks inside these routines. I know that @kuhar is leading a GSoC topic to unify the DT/PDT/DDT interface and it may be worth following that before spending any of your time trying to hack on this mess...

You are more than welcome to see if you can make changes here, just make sure test-suite ctmark isn't adversely impacted when you do. :)
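
To illustrate the trade-off described above, here is a minimal, self-contained sketch of deferring edge updates and applying them in one batch. It is not the DeferredDominance implementation from this patch: ToyDomTree, DeferredUpdater, EdgeUpdate and all their members are hypothetical names, and the "tree" only counts how often it is asked to recompute, under the assumption that each recomputation is the expensive step.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; none of these are LLVM APIs.
enum class UpdateKind { Insert, Delete };

struct EdgeUpdate {
  UpdateKind Kind;
  int From; // source block id
  int To;   // destination block id
};

// Toy "dominator tree" that records how many times it had to recompute.
// Assumption: every non-empty applyUpdates() call is one costly update.
struct ToyDomTree {
  std::size_t Recomputations = 0;
  std::size_t EdgesApplied = 0;

  void applyUpdates(const std::vector<EdgeUpdate> &Updates) {
    if (Updates.empty())
      return; // nothing changed, nothing to recompute
    ++Recomputations;
    EdgesApplied += Updates.size();
  }
};

// Queues CFG edge changes and applies them only when flush() is called.
class DeferredUpdater {
  ToyDomTree &DT;
  std::vector<EdgeUpdate> Pending;

public:
  explicit DeferredUpdater(ToyDomTree &T) : DT(T) {}

  // The tree is stale while updates are pending, so the toy insists on a
  // flush before the updater goes away.
  ~DeferredUpdater() { assert(Pending.empty() && "unflushed updates"); }

  void insertEdge(int From, int To) {
    Pending.push_back({UpdateKind::Insert, From, To});
  }

  void deleteEdge(int From, int To) {
    Pending.push_back({UpdateKind::Delete, From, To});
  }

  bool pending() const { return !Pending.empty(); }

  // Applies every queued update in one batch and returns the current tree.
  ToyDomTree &flush() {
    DT.applyUpdates(Pending);
    Pending.clear();
    return DT;
  }
};
```

In this toy model, queuing 1000 edge changes and flushing once triggers a single recomputation, whereas applying each change immediately triggers 1000. A caller that needs an up-to-date tree in the middle of the pass pays for a flush at that point, which is the cost the comment above is trying to avoid.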

+1. Real time updating in the presence of how JT operates right now is too
costly.
The lazy analysis it uses requires that the DT be updated at all times, forever
(instead of say, once an iteration).
There is no way to update the domtree 100000 times a pass and make it fast
:)
What we should be targeting is O(10)-O(100) updates.

fhahn added a comment.Mar 19 2018, 1:33 PM

+1. Real time updating in the presence of how JT operates right now is too
costly.
The lazy analysis it uses requires that the DT be updated at all times, forever
(instead of say, once an iteration).
There is no way to update the domtree 100000 times a pass and make it fast
:)
What we should be targeting is O(10)-O(100) updates.

Thanks for the feedback. As DuplicateInstructionsInSplitBetween does not take a dominator tree as an argument, it might be better to pass in a DeferredDominance object to update the DT to start with and do the update incrementally. I am just trying to figure out the best way going forward, as I need to update the DT after DuplicateInstructionsInSplitBetween in D43173 too :)

This is a bit late, but we're currently trying to upgrade LLVM in Rust and we've run into a performance regression bisected to this change. If those here familiar with this commit could help investigate, that'd be much appreciated!