A future change will cause fewer analyses to be invalidated. Currently,
the passes before the inliner pipeline don't have access to GlobalsAA
since it can only be used by AAManager if it is already available. We
happen to invalidate AAManager before the inliner pipeline so that it
can be rebuilt and use GlobalsAA. But if AAManager is never invalidated,
the inliner pipeline may not have access to GlobalsAA.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
this seems to have some compile time improvements?
http://llvm-compile-time-tracker.com/compare.php?from=a62cbd9a0211d08bace8794b435996890feb44d4&to=7805d7f72c337bfcc2fbc2dc9d2b9ac23474d5d9&stat=instructions
@aeubanks It seems like this has non-trivial impact on code size (http://llvm-compile-time-tracker.com/compare.php?from=a62cbd9a0211d08bace8794b435996890feb44d4&to=7805d7f72c337bfcc2fbc2dc9d2b9ac23474d5d9&stat=size-text), so probably the compile-time change is a side-effect of that.
That sounds like less powerful optimizations due to the GlobalsAA movement. If adding a GlobalsAA plus D100917 is a win overall, that should work.
Looking at one of the examples in llvm-test-suite that regressed, this seems to be regressing some very simple code:
void f(double *a) { double c[20] = { 0.1051, 0.0157, 0.0185, 0.0089, 0.0219, 0.0141, 0.0097, 0.0758, 0.0168, 0.1188, 0.1635, 0.0112, 0.0333, 0.0777, 0.0260, 0.0568, 0.0523, 0.0223, 0.0324, 0.1195, }; for (int i = 0; i < 20; i++) a[i] = c[i]; }
Before, it would just turn it into a memcpy.
With this change, it becomes a bunch of individual stores.
Looks like [1] is to blame. The extra AA causes it to fire and introduce the individual stores.
Actually, the InstCombine transform allows LoopFullUnrollPass to unroll the copy loop which results in a bunch of stores. Previously during LoopFullUnrollPass we wouldn't see the global within the loop.
ah the issue is LoopIdiomRecognize can't recognize the following as a memcpy:
@__const.f.c = private unnamed_addr constant [20 x double] [double 1.051000e-01, double 1.570000e-02, double 1.850000e-02, double 0x3F823A29C779A6B5, double 2.190000e-02, double 1.410000e-02, double 9.700000e-03, double 7.580000e-02, double 1.680000e-02, double 1.188000e-01, double 1.635000e-01, double 1.120000e-02, double 3.330000e-02, double 7.770000e-02, double 2.600000e-02, double 5.680000e-02, double 5.230000e-02, double 2.230000e-02, double 3.240000e-02, double 1.195000e-01], align 16 ; Function Attrs: nofree norecurse nosync nounwind uwtable writeonly define dso_local void @f(double* nocapture %a) local_unnamed_addr #0 { entry: br label %for.body for.cond.cleanup: ; preds = %for.body ret void for.body: ; preds = %entry, %for.body %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ] %arrayidx = getelementptr inbounds [20 x double], [20 x double]* @__const.f.c, i64 0, i64 %indvars.iv %0 = load double, double* %arrayidx, align 8, !tbaa !4 %arrayidx2 = getelementptr inbounds double, double* %a, i64 %indvars.iv store double %0, double* %arrayidx2, align 8, !tbaa !4 %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 %exitcond = icmp ne i64 %indvars.iv.next, 20 br i1 %exitcond, label %for.body, label %for.cond.cleanup, !llvm.loop !8 }
because it thinks the store may affect the load, even though the load is from a constant global.
turns out it was an issue with GlobalsAA not being recalculated and being too conservative
anyway, I don't think adding GlobalsAA early on really does much, adding it to the inliner pipeline seems good enough
https://reviews.llvm.org/D101379 to properly fix GlobalsAA