This patch fixes a compile-time issue I was seeing in
SPEC2017/cam4, reducing the compile time for
one file from over 80 minutes to under 10 seconds.
The test changes seem reasonable, although I am not too sure
about the change in the LazyValueAnalysis dump, as I am not familiar with it.
Essentially, the file I was compiling had many lines of Fortran code of this
form: arr(:ncol,:) = 0. The 2D array dimensions in this case are statically
known.
When compiling with flang-new, each such assignment is converted into a
nested loop with a store. Just before Jump Threading runs, we have tens of
thousands of branches containing GEPs, memsets and memcpys, and each
threadable chain is very long as well.
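
For readers less familiar with Fortran array syntax, here is a rough C++
sketch of the loop nest that one such assignment corresponds to. It is only
a conceptual equivalent, not the IR flang-new emits; the extents kRows and
kCols and the function name zeroLeadingRows are hypothetical and chosen
purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical extents; in the real file they are statically known constants.
constexpr std::size_t kRows = 16;
constexpr std::size_t kCols = 8;

// Conceptual equivalent of the Fortran statement arr(:ncol,:) = 0.
// Storage is column-major, as in Fortran: element (i, j) is at i + j * kRows.
void zeroLeadingRows(std::vector<float> &arr, std::size_t ncol) {
  assert(ncol <= kRows && arr.size() >= kRows * kCols);
  for (std::size_t j = 0; j < kCols; ++j) {    // second dimension, full range
    for (std::size_t i = 0; i < ncol; ++i) {   // first dimension, 1..ncol in Fortran terms
      arr[i + j * kRows] = 0.0f;               // one address computation plus one store
    }
  }
}
```

Every such statement in the source contributes its own loop nest, with its
own loop branches, to the function.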
With the current top-down approach, 90% of the time was spent
renaming non-local uses of instructions in updateSSA(). Profiling
showed that about half of that time was spent in the
SSAUpdater::FindAvailableVals() --> FindExistingPHI() call. This is
because we were accumulating new PHIs as we kept threading the
successor BBs.
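
To illustrate why accumulating PHIs hurts, here is a toy C++ model. It is
not LLVM's SSAUpdater code: ToyPhi and comparisonsForSteps are hypothetical
names, and the model simply assumes that every step scans all PHIs created
so far for a match before adding a new one. Under that assumption the total
work grows quadratically with the number of steps.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for a PHI node: just a list of incoming value ids.
struct ToyPhi {
  std::vector<int> incoming;
};

// Counts how many PHI comparisons are made when `steps` distinct PHIs are
// added one at a time, each preceded by a linear search for an existing match.
std::size_t comparisonsForSteps(std::size_t steps) {
  std::vector<ToyPhi> phis;
  std::size_t comparisons = 0;
  for (std::size_t s = 0; s < steps; ++s) {
    ToyPhi wanted{{static_cast<int>(s)}};  // each step needs a PHI not seen before
    bool found = false;
    for (const ToyPhi &p : phis) {         // linear scan over accumulated PHIs
      ++comparisons;
      if (p.incoming == wanted.incoming) {
        found = true;
        break;
      }
    }
    if (!found)
      phis.push_back(wanted);              // the list keeps growing
  }
  return comparisons;
}
```

In this model, n steps cost n * (n - 1) / 2 comparisons, so 1000 steps
already need 499500 of them.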
I was not able to reduce a test case showing the high compile time, but
the file is cldwat2m_macro.f90 and I compiled it with -O3.