User Details
- User Since
- Feb 10 2016, 9:51 AM (398 w, 5 d)
Aug 29 2023
I thought the zero count is from the most significant bit but I may be wrong. In any case, your point is valid. I see how you can construct an example where the .first fields are not equal, but then the popcount and countl_zero are both equal, leading to a "false" on the LessThan call for both (a,b) and (b,a) comparison.
The change proposed here does provide an absolute ordering so it resolves the issue you found and it's reasonable to go with it. I'm not sure it's the "desired" one (or if it even matters). It will rely more often on the Idx default, than on the two comparisons, and perhaps that's ok.
A more classical comparison would be for LessThan to provide a full <; so comparing on the .first for equivalency, but then determining the follow up on conditions which defines a strict ordering (e.g. bitsToDouble()?).
Proving the size is non-zero is a higher bar than proving the zero case and eliminating the memcpy in those cases. So this seems reasonable to me.
Aug 15 2023
LGTM.
Another alternative is creating a createMemoryAccess API which does not take the defining access, but likely not worth it.
I'm fairly certain we cannot keep the defining access when inserting Defs in the general case, in particular when needing to insert new Phis.
Aug 14 2023
Aug 10 2023
Aug 7 2023
Can you replace this with an assert instead?
Aug 3 2023
Jun 14 2023
lgtm
Jun 12 2023
May 24 2023
This went through a few rounds of testing on our internal benchmarks and it's at a point where there are no meaningful run-time regressions observed, but the compile-time improvements remain and are significant enough to move fwd with the change.
Approving to move things forward. If folks do encounter regressions from this change before or after landing, please flag them here.
May 22 2023
Thank you, no issues after the fix patches afaict.
May 19 2023
I still don't have all the answers, here's what I know so far:
- The test in question is publicly available in https://github.com/google/jax/blob/main/tests/lax_scipy_test.py
- With the 3 patches reverted from yesterday, the test passes up until https://github.com/llvm/llvm-project/commit/494dee0f7a7701a57f7c5b755b4133844d0dcbdf, when it fails again
- With local revert of 494dee0f vs ToT now, and looking only at one of the failures (with 201 modules) I only see a difference in the attached module ( )
- Running opt -O3 at ToT gives different output than the one dumped by that test, so here's the optimized with at 494dee0f ( ) and at the revision right before ( )
Hi Matt,
May 18 2023
Apr 11 2023
lgtm.
Mar 27 2023
Nit: the diffs would be easier to analyze with "-fno-discard-value-names".
Mar 18 2023
Mar 16 2023
Mar 15 2023
Adding a line in release notes might be useful; next to a note with the removal of legacy PM pieces (e.g. PassManagerBuilder).
Mar 7 2023
Yeah, I noticed right after I sent this patch out :-).
Mar 2 2023
Thank you for updating this!
Feb 28 2023
Wondering if having the flag would still be useful if you want this statistic and not everything else in stats. But I don't think we particularly need that now, so works for me and flag can always be re-added.
Feb 15 2023
Feb 14 2023
lgtm with comments addressed.
Feb 9 2023
Feb 6 2023
SGTM, thank you Florian!
Jan 30 2023
Jan 26 2023
Following this change, the following emits warnings: https://godbolt.org/z/cvGdMWqEa, https://godbolt.org/z/GhTP85WzE
Jan 25 2023
Jan 24 2023
Heads-up that we're seeing crashes due to the recommit of this change.
msan reports "use-of-uninitialized-value" in multiple places, where input is clearly initialized.
Jan 23 2023
Re-ping to current reviewers. Any thoughts on the above @cor3ntin? Is there someone who you recommend look into this issue?
Jan 18 2023
Thank you @fhahn!
Jan 17 2023
Reverting should be acceptable here.
AFAIU, the norm for work with high impact (adding a pass to the pipeline is one) is to do the work itself under a flag so the work itself does not get reverted if there are issues, and for the flag flip patch (such as this one) to be reverted if issues arise.
When the issue is resolved, the flag flip patch should be re-landed.
Regressing a test by ~1000%, seems like a solid reason to revert if a solution is not immediately available, especially with having a reproducer available that confirms the issue.
Jan 12 2023
I'm ok to not bikeshed this one at this time :-).
In my opinion, this is fine.
I did now and it also looks clean. Commit with 200?
Jan 11 2023
I'm curious where all the time is spent in the 200s to 2s case. Is Eager strategy just as slow?
Seeing a regression after this change. Filed: https://github.com/llvm/llvm-project/issues/59951
Jan 10 2023
I'm seeing one instance of increase in compile-time with the 100 limit and a couple of small run time regressions. The results look somewhat cleaner with a cap of 300. Could we use 300 as an intermediate step?
Jan 9 2023
Jan 6 2023
Gentle ping. Can you propose an alternate patch to fix this crash? The github issue is 6 months old and similar crashes are reported due to this.
Jan 3 2023
Dec 15 2022
Rebase to trigger pre-merge checks?
lgtm.
Reducing this limit is certainly the right thing to do. To what value so as to not cause large regressions is a question I'd like to answer with some experimental data.
I'll start some experiments and come back to this.
Dec 13 2022
Dec 12 2022
Dec 5 2022
lgtm.
Dec 1 2022
Considering the previous reviewer comments, this lgtm.
Nov 29 2022
Could you add updated compile-time impact measurements after the latest rebase?
Nov 28 2022
Nov 17 2022
Green light perf-wise.
I cannot comment on whether the position is "the right" one though. I'm deferring to the other reviewers.
Nov 16 2022
As far as performance, this looks fine. Not seeing measurable gains.
Nov 15 2022
Makes sense to use BAA at call sites if the usage is such!
Missed the dependent patch on LoopAccessAnalysis, now it makes sense :-)
+1 on making the AST only work on immutable IR.
IIUC compile time impact for adding another SROA (the one outside LTO) is negligible?
Nov 11 2022
The original starts crashing with -O2 with this revision, but crashes with -O0 before and after this revision, so the issue looks preexisting.
The reduced test from @rupprecht no longer crashes with -O0 before this revision.
If this becomes an issue for resolving, we could share the un-reduced offline.
Nov 2 2022
Nov 1 2022
It seems unnecessary to force all uses to instantiate and pass the BatchAA.
Can this remain part of the ClobberWalkerBase instead? Default the calls to getClobberingMemoryAccess to instantiate a BatchAA internally and pass to the ClobberWalker.
Then, there can be an additional API added to override the above behavior when useful (known as safe): either passing a given BatchAA like current patch, or, keeping a BatchAA in the ClobberWalker itself instead of the AA, and being able to "set" it to not re-initialize it on each call (and perhaps "reset" to reseting each time).
What do you think?
LGTM.
Thank you for adding this!
Oct 25 2022
Simplify conditions in canPeel.
Simplified to only the changes in canPeel.