Page MenuHomePhabricator

alexfh (Alexander Kornienko)
User

Projects

User does not belong to any projects.

User Details

User Since
Jul 10 2012, 10:35 AM (514 w, 6 d)

Recent Activity

Today

alexfh updated subscribers of D114171: [SLP]Improve reductions analysis and emission, part 1..

And finally a reduced test case:
$ cat q.cc
struct S {

template<int N>
bool f() const;

};
int f(const S& s) {

int d = 0;
if (s.f<1>()) ++d;
// 4998 lines skipped
if (s.f<5000>()) ++d;
if (d == 0) {
  return s.f<-1>();
}
return 0;

}
$ ./clang-11004 --target=x86_64--linux-gnu -O1 -fslp-vectorize -c -xc++

q.cc -o q.o -ftime-report

... Pass execution timing report ...

-------------------------------------------------------------------------

Total Execution Time: 8.7398 seconds (8.8367 wall clock)
Mon, May 23, 2:16 AM · Restricted Project, Restricted Project

Fri, May 20

alexfh added a comment to D114171: [SLP]Improve reductions analysis and emission, part 1..

Aha, I committed 4e271fc49517362a9333371fb1ab7e865d4c1b0e earlier today, which should improve it. Try to update the compiler.

Thanks! This patch makes SLP vectorizer pass much faster on the problematic input, but it doesn't completely compensate the slowdown introduced here. This is how -ftime-report looks like after 4e271fc49517362a9333371fb1ab7e865d4c1b0e:

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 308.9466 seconds (308.9519 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  132.7263 ( 45.9%)   0.0001 (  0.0%)  132.7264 ( 43.0%)  132.7351 ( 43.0%)  SLPVectorizerPass
  48.0866 ( 16.6%)   5.7199 ( 29.3%)  53.8065 ( 17.4%)  53.8123 ( 17.4%)  ModuleInlinerWrapperPass
  47.3402 ( 16.4%)   5.4604 ( 27.9%)  52.8007 ( 17.1%)  52.8060 ( 17.1%)  DevirtSCCRepeatedPass
  22.2568 (  7.7%)   0.3222 (  1.6%)  22.5790 (  7.3%)  22.5785 (  7.3%)  GVNPass
  11.3194 (  3.9%)   1.0507 (  5.4%)  12.3701 (  4.0%)  12.3520 (  4.0%)  InstCombinePass
   4.8834 (  1.7%)   1.2548 (  6.4%)   6.1382 (  2.0%)   6.1350 (  2.0%)  InlinerPass

And this is how it looked at 38d0df557706940af5d7110bdf662590449f8a60 (the closest commit before 7ea03f0b4e4ec5d91d48ba2976f5adc299089ffd where I could compile clang):

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 181.4693 seconds (181.4723 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  48.1675 ( 29.7%)   5.6593 ( 29.3%)  53.8269 ( 29.7%)  53.8332 ( 29.7%)  ModuleInlinerWrapperPass
  47.4270 ( 29.2%)   5.3983 ( 28.0%)  52.8253 ( 29.1%)  52.8315 ( 29.1%)  DevirtSCCRepeatedPass
  22.1909 ( 13.7%)   0.2916 (  1.5%)  22.4824 ( 12.4%)  22.4825 ( 12.4%)  GVNPass
  11.1989 (  6.9%)   1.0292 (  5.3%)  12.2281 (  6.7%)  12.2204 (  6.7%)  InstCombinePass
   4.9943 (  3.1%)   1.1928 (  6.2%)   6.1871 (  3.4%)   6.1842 (  3.4%)  InlinerPass
   5.2197 (  3.2%)   0.0000 (  0.0%)   5.2198 (  2.9%)   5.2201 (  2.9%)  SLPVectorizerPass

Note the 5s -> 130s jump in time spent in SLPVectorizerPass. I'll grab the profile for the updated binary, but it will take some time. And yes, still trying to reduce the test case.

The perf profile should help, thanks

Looking at this I wonder whether SmallPtrSet is not that small? :)

-   86.47%     0.00%  clang    clang               [.] cc1_main                                                                                                                                                                                     ▒
   - cc1_main                                                                                                                                                                                                                                       ▒
      - 86.47% clang::ExecuteCompilerInvocation                                                                                                                                                                                                     ▒
         - 86.47% clang::CompilerInstance::ExecuteAction                                                                                                                                                                                            ▒
            - 86.47% clang::FrontendAction::Execute                                                                                                                                                                                                 ▒
               - 86.47% clang::ParseAST                                                                                                                                                                                                             ▒
                  - 80.62% clang::BackendConsumer::HandleTranslationUnit                                                                                                                                                                            ▒
                     - 80.17% clang::EmitBackendOutput                                                                                                                                                                                              ▒
                        - 63.55% (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline                                                                                                                                                 ▒
                           - 63.55% llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run                                                                                                                                       ▒
                              - 46.20% llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module>>::run                                                                  ▒
                                 - 46.17% llvm::ModuleToFunctionPassAdaptor::run                                                                                                                                                                    ▒
                                    - 45.90% llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run                 ▒
                                       - 45.89% llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function>>::run                                                                                                                       ▒
                                          - 43.23% llvm::detail::PassModel<llvm::Function, llvm::SLPVectorizerPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function>>::run                                                            ▒
                                               llvm::SLPVectorizerPass::run                                                                                                                                                                         ▒
                                             - llvm::SLPVectorizerPass::runImpl                                                                                                                                                                     ▒
                                                - 43.20% llvm::SLPVectorizerPass::vectorizeChainsInBlock                                                                                                                                            ▒
                                                   - 42.84% llvm::SLPVectorizerPass::vectorizeSimpleInstructions                                                                                                                                    ▒
                                                      - 41.02% llvm::SLPVectorizerPass::vectorizeRootInstruction                                                                                                                                    ▒
                                                         - 40.93% (anonymous namespace)::HorizontalReduction::tryToReduce                                                                                                                           ▒
                                                            - 18.19% llvm::slpvectorizer::BoUpSLP::buildTree                                                                                                                                        ▒
                                                                 9.89% llvm::SmallPtrSetImplBase::insert_imp_big                                                                                                                                    ▒
                                                               - 6.24% llvm::slpvectorizer::BoUpSLP::buildTree_rec                                                                                                                                  ▒
                                                                  - 3.23% llvm::slpvectorizer::BoUpSLP::buildTree_rec(llvm::ArrayRef<llvm::Value*>, unsigned int, llvm::slpvectorizer::BoUpSLP::EdgeInfo const&)::$_32::operator()                  ▒
                                                                     + 1.61% llvm::DenseMapBase<llvm::DenseMap<llvm::Value*, unsigned int, llvm::DenseMapInfo<llvm::Value*, void>, llvm::detail::DenseMapPair<llvm::Value*, unsigned int> >, llvm::V▒
                                                                    0.85% getSameOpcode                                                                                                                                                             ▒
                                                                  - 0.78% llvm::slpvectorizer::BoUpSLP::newTreeEntry                                                                                                                                ▒
                                                                       0.72% llvm::SmallPtrSetImplBase::insert_imp_big                                                                                                                              ▒
                                                                 0.98% llvm::SmallPtrSetImplBase::FindBucketFor                                                                                                                                     ▒
                                                              8.57% llvm::SmallPtrSetImplBase::FindBucketFor                                                                                                                                        ▒
                                                            + 4.71% llvm::MapVector<llvm::Value*, llvm::SmallVector<llvm::Instruction*, 2u>, llvm::DenseMap<llvm::Value*, unsigned int, llvm::DenseMapInfo<llvm::Value*, void>, llvm::detail::DenseM▒
                                                              3.23% (anonymous namespace)::HorizontalReduction::tryToReduce(llvm::slpvectorizer::BoUpSLP&, llvm::TargetTransformInfo*)::{lambda(bool)#1}::operator()                                ▒
                                                              0.70% memset                                                                                                                                                                          ▒
                                                      + 1.82% tryToVectorizeSequence<llvm::Value>                                                                                                                                                   ▒

Ok, thanks, will improve it on Monday. We can avoid multiple creation of SmallPtrSet and I'll check for other possible optimizations too.

Fri, May 20, 5:33 PM · Restricted Project, Restricted Project
alexfh added a comment to D114171: [SLP]Improve reductions analysis and emission, part 1..

Aha, I committed 4e271fc49517362a9333371fb1ab7e865d4c1b0e earlier today, which should improve it. Try to update the compiler.

Thanks! This patch makes SLP vectorizer pass much faster on the problematic input, but it doesn't completely compensate the slowdown introduced here. This is how -ftime-report looks like after 4e271fc49517362a9333371fb1ab7e865d4c1b0e:

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 308.9466 seconds (308.9519 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  132.7263 ( 45.9%)   0.0001 (  0.0%)  132.7264 ( 43.0%)  132.7351 ( 43.0%)  SLPVectorizerPass
  48.0866 ( 16.6%)   5.7199 ( 29.3%)  53.8065 ( 17.4%)  53.8123 ( 17.4%)  ModuleInlinerWrapperPass
  47.3402 ( 16.4%)   5.4604 ( 27.9%)  52.8007 ( 17.1%)  52.8060 ( 17.1%)  DevirtSCCRepeatedPass
  22.2568 (  7.7%)   0.3222 (  1.6%)  22.5790 (  7.3%)  22.5785 (  7.3%)  GVNPass
  11.3194 (  3.9%)   1.0507 (  5.4%)  12.3701 (  4.0%)  12.3520 (  4.0%)  InstCombinePass
   4.8834 (  1.7%)   1.2548 (  6.4%)   6.1382 (  2.0%)   6.1350 (  2.0%)  InlinerPass

And this is how it looked at 38d0df557706940af5d7110bdf662590449f8a60 (the closest commit before 7ea03f0b4e4ec5d91d48ba2976f5adc299089ffd where I could compile clang):

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 181.4693 seconds (181.4723 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  48.1675 ( 29.7%)   5.6593 ( 29.3%)  53.8269 ( 29.7%)  53.8332 ( 29.7%)  ModuleInlinerWrapperPass
  47.4270 ( 29.2%)   5.3983 ( 28.0%)  52.8253 ( 29.1%)  52.8315 ( 29.1%)  DevirtSCCRepeatedPass
  22.1909 ( 13.7%)   0.2916 (  1.5%)  22.4824 ( 12.4%)  22.4825 ( 12.4%)  GVNPass
  11.1989 (  6.9%)   1.0292 (  5.3%)  12.2281 (  6.7%)  12.2204 (  6.7%)  InstCombinePass
   4.9943 (  3.1%)   1.1928 (  6.2%)   6.1871 (  3.4%)   6.1842 (  3.4%)  InlinerPass
   5.2197 (  3.2%)   0.0000 (  0.0%)   5.2198 (  2.9%)   5.2201 (  2.9%)  SLPVectorizerPass

Note the 5s -> 130s jump in time spent in SLPVectorizerPass. I'll grab the profile for the updated binary, but it will take some time. And yes, still trying to reduce the test case.

The perf profile should help, thanks

Fri, May 20, 5:22 PM · Restricted Project, Restricted Project
alexfh added a comment to D114171: [SLP]Improve reductions analysis and emission, part 1..

Aha, I committed 4e271fc49517362a9333371fb1ab7e865d4c1b0e earlier today, which should improve it. Try to update the compiler.

Fri, May 20, 4:58 PM · Restricted Project, Restricted Project
alexfh added a comment to D114171: [SLP]Improve reductions analysis and emission, part 1..

No reduced test case, but one observation is that most samples in the slow version are around this (preexisting code):

Fri, May 20, 10:36 AM · Restricted Project, Restricted Project

Thu, May 19

alexfh added a comment to D114171: [SLP]Improve reductions analysis and emission, part 1..

I've noticed that this patch (7ea03f0b4e4ec5d91d48ba2976f5adc299089ffd) significantly increases compile time for a certain type of code (functions with a large number of branches on top level). In extreme cases (generated code with really large functions) this leads to tens of minutes of time spent in SLP vectorizer:

===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 1895.1082 seconds (1895.2216 wall clock)
Thu, May 19, 8:13 PM · Restricted Project, Restricted Project

Mon, May 9

alexfh added a comment to D123720: [VPlan] Replace use of needsVectorIV with VPlan user check..

The last recommit of this patch (8b48223447311af8b3022697dd58858e1ce6975f "Recommit ""[VPlan] Remove uneeded needsVectorIV check.""") causes the compiler to crash on some of our code. A reduced test case:

$ cat q.cc
int a[2];
int b;
bool c() {
  bool d;
  for (int e; e; e--) {
    d = !d;
    b += d ?: a[e];
  }
  return b;
}
$ ./clang -cc1 -triple x86_64--linux-gnu -emit-obj -O2 -vectorize-loops q.cc
Stack dump:
0.      Program arguments: ./clang -cc1 -triple x86_64--linux-gnu -emit-obj -O2 -vectorize-loops q.cc
1.      <eof> parser at end of file
2.      Optimizer
 #0 0x000055cb5104bfa8 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (./clang+0x6e4bfa8)
 #1 0x000055cb5104c3c5 SignalHandler(int) (./clang+0x6e4c3c5)
 #2 0x00007f464c15b750 __restore_rt (/usr/grte/v5/lib64/libpthread.so.0+0x15750)
 #3 0x000055cb4d2f15fc llvm::VPlanTransforms::optimizeInductions(llvm::VPlan&, llvm::ScalarEvolution&) (./clang+0x30f15fc)
 #4 0x000055cb4d2eae74 llvm::LoopVectorizationPlanner::buildVPlanWithVPRecipes(llvm::VFRange&, llvm::SmallPtrSetImpl<llvm::Instruction*>&, llvm::MapVector<llvm::Instruction*, llvm::Instruction*, llvm::DenseMap<llvm::Instruction*, unsigned int, llvm::DenseMapInfo<llvm::Instruction*, void>, llvm::detail::DenseMapPair<llvm::Instruction*, unsigned int> >, std::__u::vector<std::__u::pair<llvm::Instruction*, llvm::Instruction*>, std::__u::allocator<std::__u::pair<llvm::Instruction*, llvm::Instruction*> > > > const&) (./clang+0x30eae74)
 #5 0x000055cb4d2e903b llvm::LoopVectorizationPlanner::buildVPlansWithVPRecipes(llvm::ElementCount, llvm::ElementCount) (./clang+0x30e903b)
 #6 0x000055cb4e865734 llvm::LoopVectorizePass::processLoop(llvm::Loop*) (.cold) (./clang+0x4665734)
 #7 0x000055cb4d143dcf llvm::LoopVectorizePass::runImpl(llvm::Function&, llvm::ScalarEvolution&, llvm::LoopInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::BlockFrequencyInfo&, llvm::TargetLibraryInfo*, llvm::DemandedBits&, llvm::AAResults&, llvm::AssumptionCache&, std::__u::function<llvm::LoopAccessInfo const& (llvm::Loop&)>&, llvm::OptimizationRemarkEmitter&, llvm::ProfileSummaryInfo*) (./clang+0x2f43dcf)
 #8 0x000055cb4d143b7d llvm::LoopVectorizePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (./clang+0x2f43b7d)
 #9 0x000055cb4d143832 llvm::detail::PassModel<llvm::Function, llvm::LoopVectorizePass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (./clang+0x2f43832)
#10 0x000055cb4db331fd llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (./clang+0x39331fd)
#11 0x000055cb4db32db2 llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) (./clang+0x3932db2)
#12 0x000055cb4d95fd16 llvm::ModuleToFunctionPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (./clang+0x375fd16)
#13 0x000055cb4d3cca92 llvm::detail::PassModel<llvm::Module, llvm::ModuleToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (./clang+0x31cca92)
#14 0x000055cb4d1fdbe4 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) (./clang+0x2ffdbe4)
#15 0x000055cb4d233efd (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline(clang::BackendAction, std::__u::unique_ptr<llvm::raw_pwrite_stream, std::__u::default_delete<llvm::raw_pwrite_stream> >&, std::__u::unique_ptr<llvm::ToolOutputFile, std::__u::default_delete<llvm::ToolOutputFile> >&) (./clang+0x3033efd)
#16 0x000055cb4d22cee5 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, std::__u::unique_ptr<llvm::raw_pwrite_stream, std::__u::default_delete<llvm::raw_pwrite_stream> >) (./clang+0x302cee5)
#17 0x000055cb4dd2e253 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (./clang+0x3b2e253)
#18 0x000055cb4d226af4 clang::ParseAST(clang::Sema&, bool, bool) (./clang+0x3026af4)
#19 0x000055cb4d308654 clang::FrontendAction::Execute() (./clang+0x3108654)
#20 0x000055cb4d308472 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (./clang+0x3108472)
#21 0x000055cb4d3082e5 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (./clang+0x31082e5)
#22 0x000055cb4dcf3fca cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (./clang+0x3af3fca)
#23 0x000055cb4d3cab81 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&) (./clang+0x31cab81)
#24 0x000055cb4dbdddf2 main (./clang+0x39dddf2)
#25 0x00007f464bfe98d3 __libc_start_main (/usr/grte/v5/lib64/libc.so.6+0x628d3)
#26 0x000055cb4d3ca9fa _start (./clang+0x31ca9fa)
Segmentation fault
Mon, May 9, 9:13 AM · Restricted Project, Restricted Project

Mar 24 2022

alexfh added a comment to D121306: [Sema] add warning for tautological FP compare with literal.

What's the right fix for this warning, using a literal of the same floating point type? Did you consider adding a fixit hint to the diagnostic?

Mar 24 2022, 10:08 AM · Restricted Project, Restricted Project

Feb 10 2022

alexfh added a comment to D115955: [SLP]Alternate vectorization for cmp instructions..

So far I have a stack trace from a release clang build:

 #0 0x000055efacbf6c29 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int)
 #1 0x000055efacbf7065 SignalHandler(int)
 #2 0x00007f1f25022750 __restore_rt
 #3 0x000055efa973210b (anonymous namespace)::LazyValueInfoImpl::getEdgeValue(llvm::Value*, llvm::BasicBlock*, llvm::BasicBlock*, llvm::Instruction*)
 #4 0x000055efa972db80 (anonymous namespace)::LazyValueInfoImpl::solve() (.llvm.1834014330660040286)
 #5 0x000055efa972a2dd (anonymous namespace)::LazyValueInfoImpl::getValueInBlock(llvm::Value*, llvm::BasicBlock*, llvm::Instruction*)
 #6 0x000055efa978882b runImpl(llvm::Function&, llvm::LazyValueInfo*, llvm::DominatorTree*, llvm::SimplifyQuery const&)
 #7 0x000055efa9787755 llvm::CorrelatedValuePropagationPass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&)
 #8 0x000055efa97873b2 llvm::detail::PassModel<llvm::Function, llvm::CorrelatedValuePropagationPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&)
 #9 0x000055efa983d69a llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&)
#10 0x000055efa983d2b2 llvm::detail::PassModel<llvm::Function, llvm::PassManager<llvm::Function, llvm::AnalysisManager<llvm::Function> >, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Function> >::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&)
#11 0x000055efa999414e llvm::CGSCCToFunctionPassAdaptor::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&)
#12 0x000055efa9993b32 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::CGSCCToFunctionPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&)
#13 0x000055efa9940a64 llvm::PassManager<llvm::LazyCallGraph::SCC, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) 
#14 0x000055efa99406f2 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::PassManager<llvm::LazyCallGraph::SCC, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&)
#15 0x000055efa987de69 llvm::DevirtSCCRepeatedPass::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&)
#16 0x000055efa987dc52 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::DevirtSCCRepeatedPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) 
#17 0x000055efa98934a0 llvm::ModuleToPostOrderCGSCCPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&)
#18 0x000055efa919ddb2 llvm::detail::PassModel<llvm::Module, llvm::ModuleToPostOrderCGSCCPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&)
#19 0x000055efa8f4a1c4 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&)
#20 0x000055efa8f49e4a llvm::ModuleInlinerWrapperPass::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&)
#21 0x000055efa8f49c92 llvm::detail::PassModel<llvm::Module, llvm::ModuleInlinerWrapperPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&)
#22 0x000055efa8f4a1c4 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&)
#23 0x000055efa8f8c260 (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline(clang::BackendAction, std::__u::unique_ptr<llvm::raw_pwrite_stream, std::__u::default_delete<llvm::raw_pwrite_stream>>&, std::__u::unique_ptr<llvm::ToolOutputFile, std::__u::default_delete<llvm::ToolOutputFile> >&)
#24 0x000055efa8f84bb2 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, std::__u::unique_ptr<llvm::raw_pwrite_stream, std::__u::default_delete<llvm::raw_pwrite_stream> >) 
#25 0x000055efa9a4b8d1 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&)
#26 0x000055efa8f7e180 clang::ParseAST(clang::Sema&, bool, bool)
#27 0x000055efa90338f5 clang::FrontendAction::Execute()
#28 0x000055efa903372e clang::CompilerInstance::ExecuteAction(clang::FrontendAction&)
#29 0x000055efa90335a1 clang::ExecuteCompilerInvocation(clang::CompilerInstance*)
#30 0x000055efa9a15d58 cc1_main(llvm::ArrayRef<char const*>, char const*, void*)
#31 0x000055efa9112f7e ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&)
#32 0x000055efa98b2a2d main
Feb 10 2022, 8:02 AM · Restricted Project
alexfh added a comment to D115955: [SLP]Alternate vectorization for cmp instructions..

I think, the last commit (ad2a0ccf8f9816fd7805138bbf582117246f5489) still has some problems. We are seeing sporadic clang crashes after this commit. I'm trying to confirm this by comparing this revision with the previous one and trying to get a test case.

Feb 10 2022, 6:22 AM · Restricted Project

Feb 7 2022

alexfh added a reverting change for rGf9e3ca542ec1: [ELF] Move Symbol::needsTlsLd to config->needsTlsLd. NFC: rGec8a693717b1: Revert "[ELF] Move Symbol::needsTlsLd to config->needsTlsLd. NFC".
Feb 7 2022, 10:00 AM
alexfh committed rGec8a693717b1: Revert "[ELF] Move Symbol::needsTlsLd to config->needsTlsLd. NFC" (authored by alexfh).
Revert "[ELF] Move Symbol::needsTlsLd to config->needsTlsLd. NFC"
Feb 7 2022, 10:00 AM

Jan 28 2022

alexfh accepted D117376: Remove reference type when checking const structs.

Looks good. Thanks for the fix!

Jan 28 2022, 5:54 AM · Restricted Project

Jan 27 2022

alexfh added a comment to D116821: [DebugInfo][InstrRef] Move instr-ref controlling flag out of TargetOptions.

I reduced our problematic file to a function with a couple thousand variables of a type containing a few std::function<>s. This can be demonstrated by this synthetic test case (compiled on x86-64 linux with libc++):

#include <functional>
#include <vector>
namespace n {
Jan 27 2022, 6:16 PM · Restricted Project, Restricted Project

Jan 24 2022

alexfh added a comment to D116821: [DebugInfo][InstrRef] Move instr-ref controlling flag out of TargetOptions.

I found one particularly bad input. Here what time -v clang -fno-omit-frame-pointer -fcommon -no-canonical-prefixes -fdiagnostics-show-option -fmessage-length=0 -fno-exceptions -fbracket-depth=768 -fno-strict-aliasing -fmerge-all-constants -flax-vector-conversions=all -funsigned-char -Xclang=-fuse-ctor-homing -fcolor-diagnostics --cuda-host-only -faligned-allocation -fnew-alignment=8 -fshow-overloads=best -mllvm -disable-binop-extract-shuffle -fdebug-types-section -O2 -momit-leaf-frame-pointer -ffp-contract=off --target=x86_64-unknown-linux-gnu -fno-unique-section-names -maes -m64 -mcx16 -msse4.2 -mpclmul -mprefer-vector-width=128 -gmlt -gz=zlib -fdebug-default-version=5 -fsanitize=address,bool,bounds,builtin,float-cast-overflow,integer-divide-by-zero,null,return,returns-nonnull-attribute,shift-exponent,signed-integer-overflow,unreachable,vla-bound -fsanitize-address-globals-dead-stripping -fsanitize-trap=null -fno-sanitize-recover=all -fno-omit-frame-pointer -fsized-deallocation -fPIE -fdirect-access-external-data -ffunction-sections -fdata-sections -pthread -fno-sanitize-trap=all -std=gnu++17 -stdlib=libc++ -Wno-deprecated-declarations -Wno-inconsistent-missing-override -save-stats -c q.cc -o q.o says 1. at rG74db5c8c95e2aed40d288bb2df92eb859b87c827 (with the new behavior disabled) and 2. at rG80532ebb508d0ca62f96df5f253db8caed969397 (right before rG74db5c8c95e2aed40d288bb2df92eb859b87c827):

1
User time (seconds): 190.92
System time (seconds): 2.16
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 3:13.10
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 4926000
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 27080
Voluntary context switches: 8
Involuntary context switches: 579
Swaps: 0
File system inputs: 0
File system outputs: 54168
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
2
User time (seconds): 255.60
System time (seconds): 8.09
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 4:23.71
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 27272308
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 32213
Voluntary context switches: 8
Involuntary context switches: 779
Swaps: 0
File system inputs: 0
File system outputs: 54168
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Jan 24 2022, 6:40 PM · Restricted Project, Restricted Project

Jan 21 2022

alexfh added a comment to D116821: [DebugInfo][InstrRef] Move instr-ref controlling flag out of TargetOptions.

Hi Jeremy,
after this patch we see significantly increased compile-time memory usage, which makes a large number of targets to fail to compile with the (already quite large) memory limit we have on our build cluster. I'll try to get specific numbers and maybe an example, but it seems like this patch has raised enough concerns to maybe flip the default back while all of these are being addressed?

Jan 21 2022, 6:03 PM · Restricted Project, Restricted Project
alexfh added a comment to D116048: [clang][CodeGen][UBSan] VLA size checking for unsigned integer parameter.

We've started seeing some tests dying with SIGILL (illegal instruction) after this patch. I'm working on a reduced repro, but please take a look, if you can find any issues yourself.

Jan 21 2022, 5:38 PM · Restricted Project
alexfh added a comment to D116048: [clang][CodeGen][UBSan] VLA size checking for unsigned integer parameter.

We've started seeing some tests dying with SIGILL (illegal instruction) after this patch. I'm working on a reduced repro, but please take a look, if you can find any issues yourself.

Jan 21 2022, 8:54 AM · Restricted Project

Jan 17 2022

alexfh added a comment to D49864: [clang-tidy] The script clang-tidy-diff.py doesn't accept 'pass by' options (--).

Do you need help landing the change?

Jan 17 2022, 8:33 AM · Restricted Project, Restricted Project
alexfh accepted D49864: [clang-tidy] The script clang-tidy-diff.py doesn't accept 'pass by' options (--).

Looks good with one suggestion.

Jan 17 2022, 8:29 AM · Restricted Project, Restricted Project
alexfh added a comment to D117460: [clang-tidy][NFC] Reduce map lookups in IncludeSorter.

I wonder what motivated the patch. Is this a performance optimization? If so, do you have any measurements?

Jan 17 2022, 7:26 AM · Restricted Project, Restricted Project

Dec 16 2021

alexfh added a comment to D113578: [SCEV] Apply loop guards when computing max BTC for arbitrary steps..

Scratch that. It wasn't a valid experiment at all. Looking further.

Ok, please let me know if there's a regression. Otherwise I'll go ahead with the fix, as the compile-time-tracker shows this to be compile-time neutral (or a tiny bit positive).

Dec 16 2021, 6:59 PM · Restricted Project

Dec 15 2021

alexfh added a comment to D113578: [SCEV] Apply loop guards when computing max BTC for arbitrary steps..

I took a closer look and confirmed that the patch indeed slightly increases the call stack for the reproducer and this causes the stack overflow.

The issue is roughly the following: we apply the loop guards to the last loop in a large chain of loops. The loop guard itself is the exit condition of an earlier loop. To construct the SCEV, we compute the exit count along one of the code paths. This in turn applies loop guards which is in an earlier predecessor loop. To construct the SCEV, we again need to compute the exit count of this loop, applying loop guards and so on.

This can be fixed by adjusting the order we apply loop guards. At the moment, the guard closest to the starting loop is evaluated first, triggering the large evaluation chain. If we instead apply the earliest guard first no excessive call chain is needed. This seems to have a tiny positive compile-time impact http://llvm-compile-time-tracker.com/compare.php?from=529833377ccdf4381f8bc9961bfa96ec4f5e2eed&to=b07452a70ac22ae4b1f0dbf4b84df3ee44c171a1&stat=instructions

It is not quite NFC, because the order in which guards are applied can impact the result in some cases. I looked into some and so far it looks like this new order makes applying loop guards slightly more effective. I am planning on isolating a test that shows the improved analysis results and commit the fix soonish.

Thanks! I guess, I can come up a couple of distinct test cases that have the same effect (compiler stack overflow). Would you like to have a look at them as well?

The fix is available here: https://github.com/llvm/llvm-project/commit/b07452a70ac22ae4b1f0dbf4b84df3ee44c171a1, in case you want to give it a try against the other test cases.

The patch fixes the crash, but it looks like the compilation time with the patch grows by a factor of ~7 for one of the problematic files. I'll try to make a cleaner experiment with two compilers built with exactly the same compiler options, but I doubt it will change much.

Dec 15 2021, 10:07 AM · Restricted Project
alexfh added a comment to D113578: [SCEV] Apply loop guards when computing max BTC for arbitrary steps..

I took a closer look and confirmed that the patch indeed slightly increases the call stack for the reproducer and this causes the stack overflow.

The issue is roughly the following: we apply the loop guards to the last loop in a large chain of loops. The loop guard itself is the exit condition of an earlier loop. To construct the SCEV, we compute the exit count along one of the code paths. This in turn applies loop guards which is in an earlier predecessor loop. To construct the SCEV, we again need to compute the exit count of this loop, applying loop guards and so on.

This can be fixed by adjusting the order we apply loop guards. At the moment, the guard closest to the starting loop is evaluated first, triggering the large evaluation chain. If we instead apply the earliest guard first no excessive call chain is needed. This seems to have a tiny positive compile-time impact http://llvm-compile-time-tracker.com/compare.php?from=529833377ccdf4381f8bc9961bfa96ec4f5e2eed&to=b07452a70ac22ae4b1f0dbf4b84df3ee44c171a1&stat=instructions

It is not quite NFC, because the order in which guards are applied can impact the result in some cases. I looked into some and so far it looks like this new order makes applying loop guards slightly more effective. I am planning on isolating a test that shows the improved analysis results and commit the fix soonish.

Thanks! I guess, I can come up a couple of distinct test cases that have the same effect (compiler stack overflow). Would you like to have a look at them as well?

The fix is available here: https://github.com/llvm/llvm-project/commit/b07452a70ac22ae4b1f0dbf4b84df3ee44c171a1, in case you want to give it a try against the other test cases.

Dec 15 2021, 9:29 AM · Restricted Project
alexfh added a comment to D113578: [SCEV] Apply loop guards when computing max BTC for arbitrary steps..

(general discussion, not necessarily related to this patch)

Dec 15 2021, 8:45 AM · Restricted Project
alexfh added a comment to D113578: [SCEV] Apply loop guards when computing max BTC for arbitrary steps..

I took a closer look and confirmed that the patch indeed slightly increases the call stack for the reproducer and this causes the stack overflow.

The issue is roughly the following: we apply the loop guards to the last loop in a large chain of loops. The loop guard itself is the exit condition of an earlier loop. To construct the SCEV, we compute the exit count along one of the code paths. This in turn applies loop guards which is in an earlier predecessor loop. To construct the SCEV, we again need to compute the exit count of this loop, applying loop guards and so on.

This can be fixed by adjusting the order we apply loop guards. At the moment, the guard closest to the starting loop is evaluated first, triggering the large evaluation chain. If we instead apply the earliest guard first no excessive call chain is needed. This seems to have a tiny positive compile-time impact http://llvm-compile-time-tracker.com/compare.php?from=529833377ccdf4381f8bc9961bfa96ec4f5e2eed&to=b07452a70ac22ae4b1f0dbf4b84df3ee44c171a1&stat=instructions

It is not quite NFC, because the order in which guards are applied can impact the result in some cases. I looked into some and so far it looks like this new order makes applying loop guards slightly more effective. I am planning on isolating a test that shows the improved analysis results and commit the fix soonish.

Dec 15 2021, 8:20 AM · Restricted Project

Dec 14 2021

alexfh added a comment to D113578: [SCEV] Apply loop guards when computing max BTC for arbitrary steps..

Unfortunately, the burden of mitigating the underlying issue in practice frequently lies with the author of the patch exposing the issue (unless someone else volunteers to do this).

This isn't quite as clear cut as you make it out to be here. Yes, we will frequently fix issues exposed in the process of introducing an unrelated bug. However, the standard is not (and has never been) any reported issue which happens to be exposed. We will certainly fix upstream tests, common workloads, and quickly reported issues, but at some point, the responsibility shifts to the downstream maintainer. That's one of the reasons rapid reporting is so strongly encouraged.

To be clear, I'm not commenting on which this situation might be. I'm just making the general point that this is a lot more complicated than your wording implies.

Dec 14 2021, 9:20 AM · Restricted Project
alexfh added a comment to D113578: [SCEV] Apply loop guards when computing max BTC for arbitrary steps..

Thanks for the heads-up, I'll take a look now. But from the stack trace you shared, I doubt that the patch itself is causing the crash, but rather exposes an existing bug in GetMinTrailingZeros/computeKnownBits

Dec 14 2021, 8:42 AM · Restricted Project

Dec 13 2021

alexfh added a comment to D113578: [SCEV] Apply loop guards when computing max BTC for arbitrary steps..

A bit cleaner test case:

Dec 13 2021, 6:32 AM · Restricted Project

Dec 12 2021

alexfh added a comment to D113578: [SCEV] Apply loop guards when computing max BTC for arbitrary steps..

I observe the problem on Linux on x86-64.

Dec 12 2021, 3:59 PM · Restricted Project
alexfh added a comment to D113578: [SCEV] Apply loop guards when computing max BTC for arbitrary steps..

I've created a standalone test case for the issue:

Dec 12 2021, 3:55 PM · Restricted Project

Dec 10 2021

alexfh added a comment to D113578: [SCEV] Apply loop guards when computing max BTC for arbitrary steps..

Hi Florian,

Dec 10 2021, 9:29 AM · Restricted Project
alexfh added reviewers for D115528: Revert "X86InstrInfo: Support immediates that are +1/-1 different in optimizeCompareInstr": MatzeB, rupprecht.
Dec 10 2021, 8:17 AM · Restricted Project
alexfh accepted D115528: Revert "X86InstrInfo: Support immediates that are +1/-1 different in optimizeCompareInstr".

LG per https://reviews.llvm.org/D110867#3185369

Dec 10 2021, 8:16 AM · Restricted Project

Nov 22 2021

alexfh added a comment to rGc93f93b2e3f2: Revert "Revert "Recommit "Revert "[CVP] processSwitch: Remove default case when….

Reduced the problematic file to this:

Nov 22 2021, 3:02 PM
alexfh added a comment to D106056: [CVP] processSwitch: Remove default case when switch cover all possible values..

FYI, the last re-commit of this patch introduced another compile time regression. See https://reviews.llvm.org/rGc93f93b2e3f28997f794265089fb8138dd5b5f13 for more information.

Nov 22 2021, 10:41 AM · Restricted Project
alexfh added a comment to rGc93f93b2e3f2: Revert "Revert "Recommit "Revert "[CVP] processSwitch: Remove default case when….

Early heads up that this revision causes a large regression in compilation time for some of our internal source files: we are seeing an almost 20x increase in compilation times for some files (from 42s to 728s).

I'm working on a reproducer which I'm going to upload when ready.

Nov 22 2021, 9:24 AM
alexfh updated subscribers of rGc93f93b2e3f2: Revert "Revert "Recommit "Revert "[CVP] processSwitch: Remove default case when….
Nov 22 2021, 9:23 AM

Nov 19 2021

alexfh updated subscribers of rGc93f93b2e3f2: Revert "Revert "Recommit "Revert "[CVP] processSwitch: Remove default case when….
Nov 19 2021, 8:34 AM

Nov 17 2021

alexfh requested review of D114111: Remove a useless temporary of a base class type..
Nov 17 2021, 10:57 AM · Restricted Project
alexfh added a comment to D113148: Add new clang-tidy check for string_view(nullptr).

This looks really good. An impressive effort! Just a few changes. Also, please ping Alex to get his opinion on the high level issue.

Nov 17 2021, 10:43 AM · Restricted Project

Nov 15 2021

alexfh added a comment to D113027: [libcxx] Remove nonstandard _FilesystemClock::{to,from}_time_t.

https://reviews.llvm.org/D113430 landed now, which should have implemented the standard methods instead.

There is code out there, which already relies on these non-standard functions and it's going to be broken by this change without a reasonable alternative (migrating the code to C++20 is not a real alternative).

Why is migrating to C++20 not a real alternative? Just curious.

I would suggest we do this:

  1. Add the methods back unless _LIBCPP_ENABLE_REMOVED_FILE_CLOCK_TIME_T_METHODS is defined.
  2. Update the release note to mention that folks can use that macro to re-enable the methods temporarily, but that the option will be removed in two releases so they should move to C++20.

Whoever is broken by this will see the breakage, then read the release notes, define _LIBCPP_ENABLE_REMOVED_FILE_CLOCK_TIME_T_METHODS, and make a note to move to C++20 before they are broken for good.

Nov 15 2021, 6:04 PM · Restricted Project
alexfh added a comment to D113027: [libcxx] Remove nonstandard _FilesystemClock::{to,from}_time_t.

This revision removes the only available methods to convert std::chrono::file_time_type to/from anything else.

The standard says std::chrono::file_time_type::clock should have methods: to_sys() and from_sys() to convert to/from std::chrono::system_time.

Is it possible to revert this change until the standard methods are implemented?

https://reviews.llvm.org/D113430 landed now, which should have implemented the standard methods instead.

Nov 15 2021, 6:41 AM · Restricted Project

Oct 20 2021

alexfh added a comment to D111109: AddGlobalAnnotations for function with or without function body..

Reduced the test further to:

struct k {
  ~k() __attribute__((annotate(""))) {}
};
void m() { k(); }
Oct 20 2021, 4:10 PM · Restricted Project
alexfh added a comment to D111109: AddGlobalAnnotations for function with or without function body..

Reduced test case:

template <class a> void b(a);
template <typename c> class d {
public:
  class e {
  public:
    c *f();
  };
  e *g();
};
template <class> class j;
template <class h, class... i> class j<h(i...)> {
public:
  class k {
  public:
    k(int *);
    ~k();
  };
  int n();
  d<int *> l;
};
template <class h, class... i> int j<h(i...)>::n() {
  auto m = l.g()->f();
  k a(*m);
  b(a);
}
template <class h, class... i>
j<h(i...)>::k::~k() __attribute__((annotate(""))) {}
class o {
  int m_fn4();
  j<int()> p;
};
int o::m_fn4() { p.n(); }
Oct 20 2021, 2:19 PM · Restricted Project

Oct 19 2021

alexfh added a comment to D111109: AddGlobalAnnotations for function with or without function body..

It looks like this patch causes clang to crash on valid code with the following stack trace:

 #0 0x0000560442f12f45 SignalHandler(int)
 #1 0x00007ffac7c789a0 __restore_rt
 #2 0x0000560440083992 llvm::LazyCallGraph::Node::populateSlow()
 #3 0x0000560440082a60 llvm::LazyCallGraph::buildRefSCCs()
 #4 0x000056043fef166d llvm::ModuleToPostOrderCGSCCPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) 
 #5 0x000056043f8852f2 llvm::detail::PassModel<llvm::Module, llvm::ModuleToPostOrderCGSCCPassAdaptor, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) 
 #6 0x000056043f617204 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&)
 #7 0x000056043f616e34 llvm::ModuleInlinerWrapperPass::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) 
 #8 0x000056043f616cb2 llvm::detail::PassModel<llvm::Module, llvm::ModuleInlinerWrapperPass, llvm::PreservedAnalyses, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) 
 #9 0x000056043f617204 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module> >::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&)
#10 0x0000560440100459 (anonymous namespace)::EmitAssemblyHelper::RunOptimizationPipeline(clang::BackendAction, std::__u::unique_ptr<llvm::raw_pwrite_stream, std::__u::default_delete<llvm::raw_pwrite_stream> >&, std::__u::unique_ptr<llvm::ToolOutputFile, std::__u::default_delete<llvm::ToolOutputFile> >&)
#11 0x000056043f63dff7 clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::StringRef, llvm::Module*, clang::BackendAction, std::__u::unique_ptr<llvm::raw_pwrite_stream, std::__u::default_delete<llvm::raw_pwrite_stream> >)
#12 0x00005604400f6703 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) 
#13 0x000056043f638300 clang::ParseAST(clang::Sema&, bool, bool)
#14 0x000056043f6f96d5 clang::FrontendAction::Execute()
#15 0x000056043f6f9515 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) 
#16 0x000056043f6f9381 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) 
#17 0x00005604400b84f8 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) 
#18 0x000056043f7a7a79 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&)
#19 0x000056043ff4b6f4 main
#20 0x00007ffac7ae6bbd __libc_start_main
#21 0x000056043f815679 _start

Working on a reduced test.

Oct 19 2021, 7:21 AM · Restricted Project

Oct 1 2021

alexfh added a comment to D110613: [Taildup] Don't tail-duplicate loop header with multiple successors as its latches.

So this change blocks optimization from D106056, @alexfh would you like to workaround with -mllvm -tail-dup-indirect-size=4?

Oct 1 2021, 4:35 PM · Restricted Project
alexfh removed a reviewer for D110019: [gn build] improve write_cmake_config to be truthy and exception friendly: alexfh.
Oct 1 2021, 4:14 PM · Restricted Project
alexfh added a comment to D49864: [clang-tidy] The script clang-tidy-diff.py doesn't accept 'pass by' options (--).

That makes sense.
Should I add it somewhere? Or do I need to talk to someone?

Oct 1 2021, 4:08 PM · Restricted Project, Restricted Project

Sep 29 2021

alexfh added a comment to D106056: [CVP] processSwitch: Remove default case when switch cover all possible values..

@alexfh Turns out it was caused by EarlyTailDuplicatePass, would you try with -mllvm -disable-early-taildup=true?

Sep 29 2021, 4:24 PM · Restricted Project

Sep 27 2021

alexfh added a comment to D106056: [CVP] processSwitch: Remove default case when switch cover all possible values..

We'd appreciate a prompt fix or a workaround. If neither is possible, please revert the commit while working on a proper solution.

Well, it is not "common" code, switch 1000 cases. But easy fix is probably to introduce some reasonable limit when to bail out.

switch 1000 cases is not the issue, use loop induction variable as condition is. It cause LVI recursive call SolveBlockValue.

@xbolva00 I was wrong. The compiler time regression is caused by CodeGen Passes rather than CVP or LVI.

Sep 27 2021, 4:45 AM · Restricted Project
alexfh added a comment to D106056: [CVP] processSwitch: Remove default case when switch cover all possible values..

We'd appreciate a prompt fix or a workaround. If neither is possible, please revert the commit while working on a proper solution.

Sep 27 2021, 1:14 AM · Restricted Project
alexfh added a comment to D106056: [CVP] processSwitch: Remove default case when switch cover all possible values..

Reduced test case:

Sep 27 2021, 1:07 AM · Restricted Project

Sep 26 2021

alexfh added a comment to D106056: [CVP] processSwitch: Remove default case when switch cover all possible values..

FYI, we see a ~5x increase in compile time on some of our code (mainly on some large protobuf-generated files) after rG8ba2adcf9e54b34ba8efa73ac0d81a1192e4f614. We're working on an isolated test case.

With switches with the huge number of cases?

Sep 26 2021, 10:02 PM · Restricted Project

Sep 25 2021

alexfh added a comment to D34654: Allow passing a regex for headers to exclude from clang-tidy.

Repeating my question from an earlier comment: would a header glob list (similar to how the format of the -checks= flag) be enough for the use cases folks have? On the one hand glob list doesn't support all the features of regular expressions, but are they all useful for matching paths? Glob list in clang-tidy currently supports set union (similar to regex |) and <any subsequence> - * (regex .*). If needed, the support can be expanded with ?, character classes, character ranges and/or other features of POSIX globs (https://man7.org/linux/man-pages/man7/glob.7.html). On the flipside, glob list has a cleaner syntax (no need to quote characters common in paths - like .), and allows to easily express exclusion of subsets. It should be a convenient tool to represent a set of files / directories. In comparison to the proposed header-filter + exclude-header-filter glob list makes it possible to naturally express restrictions similar to "everything under a/ (except for everything under a/b/ (except for everything under a/b/c/))" - a/,-a/b/,a/b/c/.

Sep 25 2021, 3:26 PM · Restricted Project, Restricted Project

Sep 24 2021

alexfh added a comment to D106056: [CVP] processSwitch: Remove default case when switch cover all possible values..

FYI, we see a ~5x increase in compile time on some of our code (mainly on some large protobuf-generated files) after rG8ba2adcf9e54b34ba8efa73ac0d81a1192e4f614. We're working on an isolated test case.

Sep 24 2021, 2:32 PM · Restricted Project

Sep 15 2021

alexfh added a comment to D105020: [SLP]Improve graph reordering..

Here is a repro I found

$ cat ./repro.cc
#include <iostream>
namespace a {
__attribute__((noinline)) void b(uint8_t *c, float *d, uint16_t e, uint16_t f) {
  uint16_t g = f;
  uint16_t h = c[0] << 2 | c[4] & 3;
  uint16_t l = c[1] << 2 | c[4] >> 2 & 3;
  uint16_t v = c[2] << 2 | 3;
  uint16_t j = c[3] << 2 | c[4] >> 6;
  *(d - 1) = float(l - e) / g;
  *(d - 2) = float(h - e) / g;
  *(d - 3) = float(j - e) / g;
  *(d - 4) = float(v - e) / g;
}
void bar(uint8_t *buffer, size_t k, size_t height, size_t, uint16_t e,
         uint16_t f, float *m) {
  int n = 24, g = f;
  float *t = &m[(24 - 2) * 4];
  size_t u = 2 * k;
  for (int o; o < height; o++) {
    uint8_t *p = buffer;
    uint8_t *q = p;
    float *r = &t[u];
    for (int a = 0; a < u; a += 4) {
      b(q, r, e, g);
      r -= 4;
    }
    t -= n;
  }
}
}  // namespace a
int main() {
  uint8_t s[]{255, 0, 255, 0, 51};
  float m[4 * 24]{};
  a::bar(s, 4, 4, 0, 0, 1023, m);
  int *out = reinterpret_cast<int *>(m);
  int64_t sum;
  for (int i = 0; i < sizeof(m) / sizeof(int); i++) sum += out[i];
  std::cout << sum;
}
$ clang-before -O2 -std=gnu++17 repro.cc
$ ./a.out
17045651456
$ clang-after -O2 -std=gnu++17 repro.cc
$ ./a.out
16609148640

Sorry, the reproducer is not correct. It has not initialized variables, writes after array bounds etc. Unable to use it as a reproducer for the investigation.

So if they are not able to produce UB-free reproducer, you should recommit the patch.

I believe they have a reproducer and tried to reduce it but the tool just extracted/removed too much code here :). I'll try to check manually some parts of the code and wait for the actual reproducer, will recommit the patch in a couple of days if would not be able to find a bug/no correct reproducer provided.

Sep 15 2021, 9:26 AM · Restricted Project

Sep 1 2021

alexfh committed rG893ac53afc1a: Fix -Wunused-variable (authored by alexfh).
Fix -Wunused-variable
Sep 1 2021, 2:30 AM

Aug 31 2021

alexfh added inline comments to D108968: [DIArgList] Re-unique after changing operands to fix non-determinism.
Aug 31 2021, 2:25 PM · Restricted Project

Aug 30 2021

alexfh added a comment to D105020: [SLP]Improve graph reordering..

Some of our tests started failing after rG84cbd71c95923f9912512f3051c6ab548a99e016 (and previously after rGa28234e37af877b2b4a23c2091c27fa18c155f9a). Could you revert it again while we're working on a test case?

Aug 30 2021, 3:20 AM · Restricted Project

Aug 27 2021

alexfh added a comment to D108821: [Codegen][X86] EltsFromConsecutiveLoads(): if only have AVX1, ensure that the "load" is actually foldable (PR51615).

Thanks for the prompt fix! One nit.

Aug 27 2021, 10:09 AM · Restricted Project

Aug 25 2021

alexfh added a comment to D106450: [InstCombine] Fold (gep (oneuse(gep Ptr, Idx0)), Idx1) -> (gep Ptr, (add Idx0, Idx1)) (PR51069).

Please can somebody confirm that rG10c982e0b3e6d46d1fe288d7dbe0a393c65a640f reverts the regressions you were seeing

Aug 25 2021, 3:19 PM · Restricted Project

Aug 23 2021

alexfh added a comment to D106450: [InstCombine] Fold (gep (oneuse(gep Ptr, Idx0)), Idx1) -> (gep Ptr, (add Idx0, Idx1)) (PR51069).

Please can somebody confirm that rG10c982e0b3e6d46d1fe288d7dbe0a393c65a640f reverts the regressions you were seeing

Aug 23 2021, 2:36 PM · Restricted Project
alexfh added a comment to D108485: [DSE] Check post-dominance for malloc+memset->calloc transform..

I think, it's the right first step, and if there's a need to make the transformation support more use cases like checking the result of malloc for null, that can be added as a separate step.

Aug 23 2021, 9:54 AM · Restricted Project
alexfh accepted D108485: [DSE] Check post-dominance for malloc+memset->calloc transform..

Looks good to me!

Aug 23 2021, 9:53 AM · Restricted Project
alexfh added a comment to D106450: [InstCombine] Fold (gep (oneuse(gep Ptr, Idx0)), Idx1) -> (gep Ptr, (add Idx0, Idx1)) (PR51069).

Given that we have a test case and multiple people have reported regressions, can we revert in the meantime?

Especially if there's no obvious and clear forward fix, I'd appreciate if you could unblock us by reverting for now. Thanks!

Except that regresses other benchmarks that I was addressing with this patch - bullet etc.

Aug 23 2021, 9:51 AM · Restricted Project

Aug 20 2021

alexfh added a comment to D103009: [DSE] Transform memset + malloc --> calloc (PR25892).

I found and reduced a test case that shows a too eager replacement of malloc with calloc: https://gcc.godbolt.org/z/dTjonof74
(...) This is quite a serious problem. Would you please revert while working on a fix?

First of all just to make it clear - in this particular case GCC does exactly same: https://gcc.godbolt.org/z/qbE6Kxdnv unless you pass different flags.

Aug 20 2021, 7:49 AM · Restricted Project

Aug 19 2021

alexfh added a comment to D103009: [DSE] Transform memset + malloc --> calloc (PR25892).

Well, I am not sure, your code compiled with GCC will have same issue.

Aug 19 2021, 4:59 PM · Restricted Project
alexfh added a comment to D103009: [DSE] Transform memset + malloc --> calloc (PR25892).

I think i see where this is going - the just-malloced, but never touched memory
doesn't need to be actually backed by an actual pages (see overcommit),
while i guess calloc doesn't just mark the pages as zeroed-out,
but actually marks them dirty and needed to be allocated,
at least not unless you happen to allocate in multiples of page size?

Aug 19 2021, 3:54 PM · Restricted Project
alexfh added a comment to D106450: [InstCombine] Fold (gep (oneuse(gep Ptr, Idx0)), Idx1) -> (gep Ptr, (add Idx0, Idx1)) (PR51069).

Given that we have a test case and multiple people have reported regressions, can we revert in the meantime?

Aug 19 2021, 3:15 PM · Restricted Project
alexfh added a comment to D103009: [DSE] Transform memset + malloc --> calloc (PR25892).

This commit seems to cause memory usage (rss) increase in MariaDB's mysqld by a factor of 4. Looking back into the history, I found that the previous commit here caused the same regression, but we quickly picked up the revert and moved on. Now I'm trying to isolate the problem, but it's taking time.
So far, my sole hypothesis is that malloc + memset of a smaller size can still be converted to calloc. But I have no evidence so far. Naive attempts to synthetically reproduce this didn't work. Maybe this only happens when some UB is in place, but again, I have nothing to back this up.

Given this is a quite serious regression, can we roll this back while I'm investigating?

You should be able to use flag -fno-builtin-calloc to disable this transformation.

This transformation is correct and valid; GCC does it as well. No reason to revert, not justified. You should check asm diffs w and w/o patch for any surprises.

Aug 19 2021, 2:58 PM · Restricted Project
alexfh added a comment to D103009: [DSE] Transform memset + malloc --> calloc (PR25892).

This commit seems to cause memory usage (rss) increase in MariaDB's mysqld by a factor of 4. Looking back into the history, I found that the previous commit here caused the same regression, but we quickly picked up the revert and moved on. Now I'm trying to isolate the problem, but it's taking time.
So far, my sole hypothesis is that malloc + memset of a smaller size can still be converted to calloc. But I have no evidence so far. Naive attempts to synthetically reproduce this didn't work. Maybe this only happens when some UB is in place, but again, I have nothing to back this up.

Aug 19 2021, 5:32 AM · Restricted Project

Aug 18 2021

alexfh updated subscribers of D106450: [InstCombine] Fold (gep (oneuse(gep Ptr, Idx0)), Idx1) -> (gep Ptr, (add Idx0, Idx1)) (PR51069).

@RKSimon we've discovered that this commit is causing substantial performance regressions (~11% runtime) of some of our benchmarks on different x86 microarchitectures (haswell, skylake). @wmi is trying to get an isolated test.

Aug 18 2021, 10:46 AM · Restricted Project

Jul 30 2021

alexfh added a comment to D104601: [Preprocessor] Implement -fminimize-whitespace..

@romanovvlad This is due to -fms-extensions. It Expands __Pragma to #pragma instead of keeping it a __Pragma. This is a one-line fix.

@alexfh This was fixed by D106924

Jul 30 2021, 6:34 PM · Restricted Project
alexfh added a comment to D104601: [Preprocessor] Implement -fminimize-whitespace..

Looking at this once again, I'm pretty sure that reverting this is much safer than trying to fix forward. Unless there's a really trivial fix to the outstanding issues, I'll revert this later today.

Jul 30 2021, 6:31 AM · Restricted Project

Jul 29 2021

alexfh added a comment to D104601: [Preprocessor] Implement -fminimize-whitespace..

This commit changes the behavior of clang -E -P even when no -fminimize-whitespace is used. This breaks certain use cases like using clang to preprocess files for flex, which turns out to be sensitive to the presence of line breaks in places where C++ compilers aren't.

An isolated test case:

$ clang-old -E -x c++ -P - -o /tmp/pp.good
#define I(x, ...) \
 x { return X##x; }



#ifndef A
#define A(op, x) I(op, x)
#endif

A(foo, {







})
A(bar, {})
$ cat /tmp/pp.good
foo { return Xfoo; }
bar { return Xbar; }
$ clang-new -E -x c++ -P - -o /tmp/pp.bad
#define I(x, ...) \
 x { return X##x; }



#ifndef A
#define A(op, x) I(op, x)
#endif

A(foo, {







})
A(bar, {})
$ cat /tmp/pp.bad
foo { return Xfoo; }bar { return Xbar; }

Please fix or revert the commit.

Thanks!

Jul 29 2021, 4:18 PM · Restricted Project
alexfh added a comment to D104601: [Preprocessor] Implement -fminimize-whitespace..

This commit changes the behavior of clang -E -P even when no -fminimize-whitespace is used. This breaks certain use cases like using clang to preprocess files for flex, which turns out to be sensitive to the presence of line breaks in places where C++ compilers aren't.

Jul 29 2021, 4:07 PM · Restricted Project

Jul 14 2021

alexfh added a comment to D105890: [NFC] Add paranthesis around logical expression to silence -Wlogical-op-parentheses warning..

I think this is incorrect. See D105892 and https://reviews.llvm.org/D104915#2873851 . Don't blindly land changes to suppress warnings – the warnings exist to tell us something :)

Jul 14 2021, 3:06 PM · Restricted Project

Jul 13 2021

alexfh committed rGe9533b849207: [NFC] Add paranthesis around logical expression to silence -Wlogical-op… (authored by bgraur).
[NFC] Add paranthesis around logical expression to silence -Wlogical-op…
Jul 13 2021, 6:54 AM
alexfh closed D105890: [NFC] Add paranthesis around logical expression to silence -Wlogical-op-parentheses warning..
Jul 13 2021, 6:54 AM · Restricted Project
alexfh accepted D105890: [NFC] Add paranthesis around logical expression to silence -Wlogical-op-parentheses warning..

LG. I can commit the patch for you.

Jul 13 2021, 6:36 AM · Restricted Project

Jul 1 2021

alexfh accepted D105275: [SLP]Fix gathering of the scalars by not ignoring UndefValues..

I have no idea what this code is doing, but just based on the fact that it fixes the problem, it looks good to me ;)

Jul 1 2021, 9:18 AM · Restricted Project
alexfh added a comment to D105275: [SLP]Fix gathering of the scalars by not ignoring UndefValues..

The patch fixes the segfaults we've seen after https://reviews.llvm.org/D103458.

Jul 1 2021, 9:07 AM · Restricted Project
alexfh added inline comments to D105275: [SLP]Fix gathering of the scalars by not ignoring UndefValues..
Jul 1 2021, 8:01 AM · Restricted Project

Jun 28 2021

alexfh added a comment to D49864: [clang-tidy] The script clang-tidy-diff.py doesn't accept 'pass by' options (--).

Function-wise the patch looks good, but I'm somewhat concerned about silently breaking users. At the very least there should be a relevant entry in the release notes.

Jun 28 2021, 12:42 PM · Restricted Project, Restricted Project
alexfh added a comment to D69560: [clang-tidy] Add 'bugprone-easily-swappable-parameters' check.

I thought I sent this batch of comments loong ago =\

Jun 28 2021, 12:29 PM · Restricted Project, Restricted Project

Jun 10 2021

alexfh added a reverting change for rG08664d005c02: [Verifier] Speed up and parallelize dominance checking. NFC: rGad6a84f82c45: Revert "[Verifier] Speed up and parallelize dominance checking. NFC".
Jun 10 2021, 1:01 AM
alexfh committed rGad6a84f82c45: Revert "[Verifier] Speed up and parallelize dominance checking. NFC" (authored by alexfh).
Revert "[Verifier] Speed up and parallelize dominance checking. NFC"
Jun 10 2021, 1:01 AM
alexfh added a reverting change for D103373: [Verifier] Speed up and parallelize dominance checking. NFC: rGad6a84f82c45: Revert "[Verifier] Speed up and parallelize dominance checking. NFC".
Jun 10 2021, 1:01 AM · Restricted Project

May 9 2021

alexfh added a comment to D69560: [clang-tidy] Add 'bugprone-easily-swappable-parameters' check.

BTW, any ideas why "patch application failed"? (See the Build status in the Diff Detail section)

May 9 2021, 8:38 PM · Restricted Project, Restricted Project
alexfh added a comment to D69560: [clang-tidy] Add 'bugprone-easily-swappable-parameters' check.

Thanks for the great work! This is mostly looking fine, but I've added a few comments. I could only make a quite superficial review so far, but I'll try to take a deeper look into this soon.

May 9 2021, 8:34 PM · Restricted Project, Restricted Project

Apr 12 2021

alexfh committed rG8a944d82cd14: [clang-tidy] Add option to ignore macros in readability-function-cognitive… (authored by massberg).
[clang-tidy] Add option to ignore macros in readability-function-cognitive…
Apr 12 2021, 9:47 AM
alexfh committed rG8883cb3e4004: Fix nits. (authored by alexfh).
Fix nits.
Apr 12 2021, 9:47 AM
alexfh closed D98070: [clang-tidy] Add option to ignore macros in readability-function-cognitive-complexity check..
Apr 12 2021, 9:46 AM · Restricted Project, Restricted Project
alexfh accepted D98070: [clang-tidy] Add option to ignore macros in readability-function-cognitive-complexity check..

LG with a couple of nits.

Apr 12 2021, 9:13 AM · Restricted Project, Restricted Project

Mar 10 2021

alexfh committed rG481079e2841f: [NFC] Unify FIME with FIXME in comments (authored by b1f6c1c4).
[NFC] Unify FIME with FIXME in comments
Mar 10 2021, 5:13 AM
alexfh closed D98321: [NFC] Unify FIME with FIXME in comments.
Mar 10 2021, 5:12 AM · Restricted Project, Restricted Project
alexfh accepted D98321: [NFC] Unify FIME with FIXME in comments.

Looks good. Thanks for the fix! I'll get it landed for you.

Mar 10 2021, 4:57 AM · Restricted Project, Restricted Project

Feb 23 2021

alexfh added a comment to D97288: Added `Follow` parameter to llvm::vfs::FileSystem::status().

How strong is the need for this?
This adds complexity to a widely implemented and used interface, and the combination of virtual + default parameters can be at least a little confusing.

Feb 23 2021, 4:44 PM · Restricted Project, Restricted Project
alexfh requested review of D97288: Added `Follow` parameter to llvm::vfs::FileSystem::status().
Feb 23 2021, 7:00 AM · Restricted Project, Restricted Project