Previously this assumed that pointsToConstantMemory implies dereferenceable.
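To make the distinction concrete, here is a small Python model. It is not LLVM code: `PointerFacts`, `is_dereferenceable_old` and `is_dereferenceable_new` are hypothetical names chosen for illustration. The model assumes each pointer carries three independent facts: whether its memory is known to be constant, how many bytes are known to be dereferenceable, and its known alignment. The "old" helper encodes the shortcut described above. The "new" one accepts a load only when an explicit dereferenceability fact covers the whole access.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PointerFacts:
    """Hypothetical summary of what analyses know about one pointer."""
    points_to_constant_memory: bool       # the memory is never written
    dereferenceable_bytes: Optional[int]  # bytes known safe to read, if any
    known_alignment: int                  # known alignment in bytes


def is_dereferenceable_new(p: PointerFacts, size: int, align: int) -> bool:
    # Only an explicit dereferenceability fact covering the whole access,
    # together with sufficient alignment, makes the load safe to speculate.
    return (
        p.dereferenceable_bytes is not None
        and p.dereferenceable_bytes >= size
        and p.known_alignment % align == 0
    )


def is_dereferenceable_old(p: PointerFacts, size: int, align: int) -> bool:
    # The shortcut being removed: "constant memory" was taken to mean
    # "dereferenceable", even with no size information at all.
    if p.points_to_constant_memory:
        return True
    return is_dereferenceable_new(p, size, align)


# Reported as constant, but nothing is known about its extent
# (e.g. an arbitrary pointer into a read-only address space).
unknown_extent = PointerFacts(True, None, 1)
# Explicitly known to be 16 bytes dereferenceable and 16-byte aligned.
known_object = PointerFacts(True, 16, 16)

print(is_dereferenceable_old(unknown_extent, 4, 4))  # shortcut says yes
print(is_dereferenceable_new(unknown_extent, 4, 4))  # no supporting fact
print(is_dereferenceable_new(known_object, 8, 8))
```

The point of the model is that constness and dereferenceability are separate facts. Read-only memory can still be out of bounds for a particular access, so a speculated load needs its extent and alignment established on their own.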
LGTM
Looks like the test changes come from analyzing "undef" and AMDGPU addrspace 4. Those changes seem reasonable. (I'm a little surprised only AMDGPU tests required changes.)
Matt, did you see my comments on https://reviews.llvm.org/rGbb70b5d40652207c0bd3d385def10ef3ef1d45b4? The summary is that clang codegen time has significantly increased on some of our code after this patch, pushing the compile times far beyond the time limit. Here's a reduced test case:
This is how the output of clang --target=x86_64--linux-gnu -O1 -c q.cc -ftime-report changes after this commit:
diff -u clang-llvmorg-16-init-4458-gc812b4a1d895.time clang-llvmorg-16-init-4459-gbb70b5d40652.time
--- clang-llvmorg-16-init-4458-gc812b4a1d895.time 2022-09-19 13:41:47.202346523 +0200
+++ clang-llvmorg-16-init-4459-gbb70b5d40652.time 2022-09-19 13:41:27.602138226 +0200
@@ -1,99 +1,99 @@
 ===-------------------------------------------------------------------------===
                       ... Pass execution timing report ...
 ===-------------------------------------------------------------------------===
-  Total Execution Time: 2.9992 seconds (2.9992 wall clock)
+  Total Execution Time: 2.9692 seconds (2.9693 wall clock)
...
 ===-------------------------------------------------------------------------===
                          Miscellaneous Ungrouped Timers
 ===-------------------------------------------------------------------------===
    ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
-   3.2616 ( 97.5%)   0.4235 ( 98.4%)   3.6851 ( 97.6%)   3.6854 ( 97.6%)  Code Generation Time
-   0.0839 (  2.5%)   0.0070 (  1.6%)   0.0909 (  2.4%)   0.0909 (  2.4%)  LLVM IR Generation Time
-   3.3454 (100.0%)   0.4305 (100.0%)   3.7759 (100.0%)   3.7762 (100.0%)  Total
+  14.4539 ( 99.4%)   0.3530 ( 97.1%)  14.8068 ( 99.4%)  14.8079 ( 99.4%)  Code Generation Time
+   0.0818 (  0.6%)   0.0105 (  2.9%)   0.0924 (  0.6%)   0.0924 (  0.6%)  LLVM IR Generation Time
+  14.5357 (100.0%)   0.3635 (100.0%)  14.8992 (100.0%)  14.9002 (100.0%)  Total
...
 ===-------------------------------------------------------------------------===
                       ... Pass execution timing report ...
 ===-------------------------------------------------------------------------===
-  Total Execution Time: 2.0628 seconds (2.0630 wall clock)
+  Total Execution Time: 13.1945 seconds (13.1955 wall clock)
    ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
-   0.8411 ( 51.2%)   0.2109 ( 50.3%)   1.0520 ( 51.0%)   1.0521 ( 51.0%)  X86 DAG->DAG Instruction Selection
...
+  12.0451 ( 93.7%)   0.1443 ( 42.5%)  12.1893 ( 92.4%)  12.1902 ( 92.4%)  X86 DAG->DAG Instruction Selection
...
 ===-------------------------------------------------------------------------===
                          Clang front-end time report
 ===-------------------------------------------------------------------------===
-  Total Execution Time: 3.9323 seconds (3.9326 wall clock)
+  Total Execution Time: 15.0529 seconds (15.0539 wall clock)
    ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
-   3.4902 (100.0%)   0.4421 (100.0%)   3.9323 (100.0%)   3.9326 (100.0%)  Clang front-end timer
-   3.4902 (100.0%)   0.4421 (100.0%)   3.9323 (100.0%)   3.9326 (100.0%)  Total
+  14.6894 (100.0%)   0.3635 (100.0%)  15.0529 (100.0%)  15.0539 (100.0%)  Clang front-end timer
+  14.6894 (100.0%)   0.3635 (100.0%)  15.0529 (100.0%)  15.0539 (100.0%)  Total
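A quick back-of-the-envelope pass over this report shows where the extra time goes. The sketch below is in Python. The numbers are the User+System seconds copied from the `-ftime-report` diff, and the helper names are illustrative only.

```python
# User+System seconds copied from the -ftime-report diff.
before = {"front-end total": 3.9323, "code generation": 3.6851, "X86 ISel": 1.0520}
after = {"front-end total": 15.0529, "code generation": 14.8068, "X86 ISel": 12.1893}


def slowdown(key: str) -> float:
    """Ratio of new to old time for one timer."""
    return after[key] / before[key]


def added_seconds(key: str) -> float:
    """Absolute time added by the commit for one timer."""
    return after[key] - before[key]


for key in before:
    print(f"{key:>16}: {slowdown(key):5.1f}x slower, +{added_seconds(key):.2f} s")

# Fraction of the total front-end regression that shows up in instruction selection.
isel_share = added_seconds("X86 ISel") / added_seconds("front-end total")
print(f"ISel share of the added time: {isel_share:.2f}")
```

The instruction-selection delta (about 11.14 s) is essentially the whole front-end delta (about 11.12 s). The ratio lands slightly above 1.0, which presumably reflects measurement noise. Either way, effectively all of the regression is in X86 DAG->DAG Instruction Selection.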
After the commit, Clang's execution-time profile looks like this:
-   99.51%     0.00%  clang-llvmorg-1  clang-llvmorg-16-init-4459-gbb70b5d40652  [.] clang_main
  - clang_main
    - 99.50% clang::driver::Driver::ExecuteCompilation
        clang::driver::Compilation::ExecuteJobs
        clang::driver::Compilation::ExecuteCommand
        clang::driver::CC1Command::Execute
        llvm::CrashRecoveryContext::RunSafely
        llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char> >*, bool*) const::$_1>
        ExecuteCC1Tool
      - cc1_main
        - 99.47% clang::ExecuteCompilerInvocation
          - 99.47% clang::CompilerInstance::ExecuteAction
            - 99.46% clang::FrontendAction::Execute
              - clang::ParseAST
                - 97.83% clang::BackendConsumer::HandleTranslationUnit
                  - 97.83% clang::EmitBackendOutput
                    - 87.02% llvm::legacy::PassManagerImpl::run
                      - 86.96% llvm::FPPassManager::runOnModule
                        - llvm::FPPassManager::runOnFunction
                          - 85.90% llvm::MachineFunctionPass::runOnFunction
                            - 80.45% (anonymous namespace)::X86DAGToDAGISel::runOnMachineFunction
                              - llvm::SelectionDAGISel::runOnMachineFunction
                                - 80.30% llvm::SelectionDAGISel::SelectAllBasicBlocks
                                  - 73.55% llvm::SelectionDAGISel::SelectBasicBlock
                                    - 73.53% llvm::SelectionDAGBuilder::visit
                                      - 72.72% llvm::SelectionDAGBuilder::visitLoad
                                        - 72.30% llvm::isDereferenceableAndAlignedPointer
                                          - 72.29% isDereferenceableAndAlignedPointer
                                            - 63.27% isDereferenceableAndAlignedPointer
                                                llvm::getKnowledgeForValue
                                              9.00% llvm::getKnowledgeForValue
                                  + 6.70% llvm::SelectionDAGISel::CodeGenAndEmitDAG
                            + 1.88% llvm::X86AsmPrinter::runOnMachineFunction
                    + 10.79% (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline
                + 1.03% clang::Parser::ParseTopLevelDecl
                + 0.60% clang::BackendConsumer::HandleTopLevelDecl
Before the commit it looked like this:
-   98.11%     0.00%  clang-llvmorg-1  clang-llvmorg-16-init-4458-gc812b4a1d895  [.] clang_main
  - clang_main
    - 98.07% clang::driver::Driver::ExecuteCompilation
        clang::driver::Compilation::ExecuteJobs
        clang::driver::Compilation::ExecuteCommand
        clang::driver::CC1Command::Execute
        llvm::CrashRecoveryContext::RunSafely
        llvm::function_ref<void ()>::callback_fn<clang::driver::CC1Command::Execute(llvm::ArrayRef<llvm::Optional<llvm::StringRef> >, std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char> >*, bool*) const::$_1>
        ExecuteCC1Tool
      - cc1_main
        - 97.97% clang::ExecuteCompilerInvocation
          - clang::CompilerInstance::ExecuteAction
            - 97.95% clang::FrontendAction::Execute
              - clang::ParseAST
                - 91.46% clang::BackendConsumer::HandleTranslationUnit
                  - 91.46% clang::EmitBackendOutput
                    - 50.30% llvm::legacy::PassManagerImpl::run
                      - 50.11% llvm::FPPassManager::runOnModule
                        - llvm::FPPassManager::runOnFunction
                          - 46.10% llvm::MachineFunctionPass::runOnFunction
                            - 25.78% (anonymous namespace)::X86DAGToDAGISel::runOnMachineFunction
                              - llvm::SelectionDAGISel::runOnMachineFunction
                                - 25.27% llvm::SelectionDAGISel::SelectAllBasicBlocks
                                  - 22.34% llvm::SelectionDAGISel::CodeGenAndEmitDAG
                                    - 8.32% llvm::SelectionDAG::Combine
                                      - 5.12% (anonymous namespace)::DAGCombiner::combine
                                        + 1.24% (anonymous namespace)::DAGCombiner::visitADD
                                          0.75% (anonymous namespace)::DAGCombiner::visitOR
                                          0.65% (anonymous namespace)::DAGCombiner::visitAND
                                          0.52% llvm::X86TargetLowering::PerformDAGCombine
                                          0.51% (anonymous namespace)::DAGCombiner::visitXOR
                                        1.19% (anonymous namespace)::DAGCombiner::AddToWorklist
                                    + 3.13% llvm::Timer::stopTimer
                                    + 2.73% llvm::Timer::startTimer
                                    + 2.55% llvm::SelectionDAGISel::DoInstructionSelection
                                    + 1.67% (anonymous namespace)::ScheduleDAGRRList::Schedule
                                    + 1.31% llvm::ScheduleDAGSDNodes::EmitSchedule
                                    + 0.88% llvm::SelectionDAG::LegalizeTypes
                                      0.74% llvm::SelectionDAG::Legalize
                                  + 2.73% llvm::SelectionDAGISel::SelectBasicBlock
                            - 7.05% llvm::X86AsmPrinter::runOnMachineFunction
                              - 7.05% llvm::AsmPrinter::emitFunctionBody
                                - 3.24% llvm::Timer::startTimer
                                  - 3.06% __GI___getrusage
                                    + 2.52% entry_SYSCALL_64
                                      0.53% syscall_return_via_sysret
                                + 3.08% llvm::Timer::stopTimer
                                  0.51% llvm::X86AsmPrinter::emitInstruction
                            + 1.23% (anonymous namespace)::MachineScheduler::runOnMachineFunction
                            + 1.04% (anonymous namespace)::DeadMachineInstructionElim::runOnMachineFunction
                            + 1.00% llvm::MachineDominatorTree::runOnMachineFunction
                            + 0.83% (anonymous namespace)::RegisterCoalescer::runOnMachineFunction
                              0.74% llvm::RAGreedy::runOnMachineFunction
                              0.70% (anonymous namespace)::MachineCSE::runOnMachineFunction
                            + 0.63% llvm::LiveIntervals::runOnMachineFunction
This is currently blocking our internal release. If you don't see an obvious fix, could you revert the patch while you work on this?
Thanks!
I was expecting this to be a runaway case in isDereferenceableAndAlignedPointer, but this function simply has over 27,000 loads in it, and they all take a similar amount of time.
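A uniform per-load cost fits two different cost shapes, and varying the number of loads would tell them apart. The Python sketch below is a purely hypothetical model. The function names and the "scan every use" behavior are assumptions made for illustration, not a description of LLVM's implementation.

- **Model A:** each query does a fixed amount of work, so the total grows linearly with the load count.
- **Model B:** each query walks a use list holding one entry per load of the same base pointer. Every load in a given function still costs the same, but that cost itself grows with the load count, so the total is quadratic.

```python
def total_steps_fixed_cost(num_loads: int, work_per_query: int = 1) -> int:
    """Model A (assumed): every query does a constant amount of work."""
    return num_loads * work_per_query


def total_steps_use_scan(num_loads: int) -> int:
    """Model B (assumed): every query scans a use list that holds one entry
    per load of the same base pointer."""
    uses = list(range(num_loads))  # stand-in for the base pointer's users
    steps = 0
    for _load in range(num_loads):  # one query per load
        for _use in uses:           # each query walks every use
            steps += 1
    return steps


for n in (500, 1000, 2000):
    a = total_steps_fixed_cost(n)
    b = total_steps_use_scan(n)
    print(f"{n:5d} loads: model A {a:>9,} steps ({a // n} per load), "
          f"model B {b:>9,} steps ({b // n} per load)")
```

In both models every load within one function takes about the same time, which matches the observation. The difference appears only when the load count changes. Doubling the loads doubles the total in model A but quadruples it in model B.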