Page MenuHomePhabricator

[NewPM] Make GlobalsAA available earlier in the pipeline

Authored by aeubanks on Apr 21 2021, 9:15 PM.



A future change will cause fewer analyses to be invalidated. Currently,
the passes before the inliner pipeline don't have access to GlobalsAA
since it can only be used by AAManager if it is already available. We
happen to invalidate AAManager before the inliner pipeline so that it
can be rebuilt and use GlobalsAA. But if AAManager is never invalidated,
the inliner pipeline may not have access to GlobalsAA.

Diff Detail

Unit TestsFailed

0 msx64 debian > libomptarget.mapping::declare_mapper_nested_default_mappers_array.cpp
Script: -- : 'RUN: at line 1'; echo ignored-command

Event Timeline

aeubanks created this revision.Apr 21 2021, 9:15 PM
aeubanks requested review of this revision.Apr 21 2021, 9:15 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptApr 21 2021, 9:15 PM
nikic added a subscriber: nikic.Apr 22 2021, 2:01 PM

@aeubanks It seems like this has non-trivial impact on code size (, so probably the compile-time change is a side-effect of that.

ychen added a comment.Apr 22 2021, 2:33 PM

That sounds like less powerful optimizations due to the GlobalsAA movement. If adding a GlobalsAA plus D100917 is a win overall, that should work.

Looking at one of the examples in llvm-test-suite that regressed, this seems to be regressing some very simple code:

void f(double *a) {
  double c[20] = {
      0.1051, 0.0157, 0.0185, 0.0089, 0.0219, 0.0141, 0.0097,
      0.0758, 0.0168, 0.1188, 0.1635, 0.0112, 0.0333, 0.0777,
      0.0260, 0.0568, 0.0523, 0.0223, 0.0324, 0.1195,

  for (int i = 0; i < 20; i++)
    a[i] = c[i];

Before, it would just turn it into a memcpy.
With this change, it becomes a bunch of individual stores.

Looks like [1] is to blame. The extra AA causes it to fire and introduce the individual stores.


Actually, the InstCombine transform allows LoopFullUnrollPass to unroll the copy loop which results in a bunch of stores. Previously during LoopFullUnrollPass we wouldn't see the global within the loop.

ah the issue is LoopIdiomRecognize can't recognize the following as a memcpy:

@__const.f.c = private unnamed_addr constant [20 x double] [double 1.051000e-01, double 1.570000e-02, double 1.850000e-02, double 0x3F823A29C779A6B5, double 2.190000e-02, double 1.410000e-02, double 9.700000e-03, double 7.580000e-02, double 1.680000e-02, double 1.188000e-01, double 1.635000e-01, double 1.120000e-02, double 3.330000e-02, double 7.770000e-02, double 2.600000e-02, double 5.680000e-02, double 5.230000e-02, double 2.230000e-02, double 3.240000e-02, double 1.195000e-01], align 16

; Function Attrs: nofree norecurse nosync nounwind uwtable writeonly
define dso_local void @f(double* nocapture %a) local_unnamed_addr #0 {
  br label %for.body

for.cond.cleanup:                                 ; preds = %for.body
  ret void

for.body:                                         ; preds = %entry, %for.body
  %indvars.iv = phi i64 [ 0, %entry ], [, %for.body ]
  %arrayidx = getelementptr inbounds [20 x double], [20 x double]* @__const.f.c, i64 0, i64 %indvars.iv
  %0 = load double, double* %arrayidx, align 8, !tbaa !4
  %arrayidx2 = getelementptr inbounds double, double* %a, i64 %indvars.iv
  store double %0, double* %arrayidx2, align 8, !tbaa !4 = add nuw nsw i64 %indvars.iv, 1
  %exitcond = icmp ne i64, 20
  br i1 %exitcond, label %for.body, label %for.cond.cleanup, !llvm.loop !8

because it thinks the store may affect the load, even though the load is from a constant global.

aeubanks abandoned this revision.Apr 27 2021, 5:05 PM

turns out it was an issue with GlobalsAA not being recalculated and being too conservative
anyway, I don't think adding GlobalsAA early on really does much, adding it to the inliner pipeline seems good enough to properly fix GlobalsAA