- User Since
- May 5 2018, 9:37 AM (118 w, 6 d)
Thank you for picking this up. Some pointers from my side:
- I would recommend against including GlobalISel support in the initial implementation. Unless trivial, SDAG and GlobalISel changes should never be made in the same patch, as the reviewers for these parts of the codebase are essentially disjoint.
- You may want to replace the VT specifying the saturation width with a simple constant integer operand. This is the approach that the fixed-point ISD opcodes went with, and it should make legalization a bit simpler.
- I originally restricted this patch to a minimum viable implementation, in the hope that it would make review easier and get this landed quickly. Given how things turned out, it might make sense to put the soft float and vector legalization (which are part of D54696) back into this patch, so that legalization support is complete and this can stand on its own.
- For soft float legalization, I would switch from libcall legalization to expanding and recursively legalizing. Adding so many new compiler-rt functions is an unnecessary burden.
- Finally, something worth mentioning is that the legalization implemented here is not compatible with trapping fptoi (at least one of the expansions isn't). I don't believe trapping fptoi's are actually legal per langref, but I've also seen people adjust x86 fptoi lowering to work with trapping fptoi at some point, so I'm a bit confused on what the state here is.
The regression here seems to be because operations on an empty GraphDiff don't optimize out. Not going through GraphDiff but using the same basic logic (https://github.com/llvm/llvm-project/commit/4c6a5de8131183ff88f52cc3dda67180e31501a1 -- going through a separate method seems to be necessary for NRVO) we get back at least part of it: https://llvm-compile-time-tracker.com/compare.php?from=38537307e502c1ac9a09e6f75f9208db1327a0bf&to=4c6a5de8131183ff88f52cc3dda67180e31501a1&stat=instructions
Wed, Aug 12
Tue, Aug 11
D85684 has landed now. After rebasing, you can also use SQ.getWithoutUndef() to reduce the boilerplate for creating a new SimplifyQuery.
D85684 has landed, so we can try reapplying this change.
@aqjune Thank you! So my mistake here is that it's not valid to use "let undef = xxx" type reasoning to construct counter examples, one has to always consider a superposition of all possible values for undef at the same time.
Mon, Aug 10
Oops, looks like we were working on this in parallel. I just submitted D85684 for the InstSimplify part of the problem.
Looks good to me. You already test the zext/trunc cases. I think this fold is also not problematic wrt https://bugs.llvm.org/show_bug.cgi?id=34548.
This LGTM, with some thoughts on how this could be improved in a followup.
Sun, Aug 9
Abandoning this revision, as I expect this to get superseded by the LangRef patch for D85393 or a variation thereon, which will have to define both what constitutes forward progress and how the attributes either opts into or out of it.
Sat, Aug 8
This has since been addressed by D82444.
Fri, Aug 7
I've tried to get this in for a long time, but I don't think there's enough interest in this functionality. Rust ended up doing the saturation itself, even though that produces code that is much worse than a custom lowered intrinsic. At some point you have to cut your losses :)
Thu, Aug 6
LGTM. There are various ways in which the value choice can be improved, but this looks like a good starting point.
Wed, Aug 5
Just as another possibility, we could simply always fold freeze undef to zeroinitializer in InstCombine (regardless of number of uses). This is not always the optimal fold, but usually a good one. At least at this point, I don't think freeze undef is expected to be common and it's more important to get rid of the freeze than find the best replacement.
Tue, Aug 4
Mon, Aug 3
For the record, this was a 1% compile-time regression on SPASS (https://llvm-compile-time-tracker.com/compare.php?from=7ba82a7320df82d07d3d5679bce89b14526b536c&to=ee1c12708a4519361729205168dedb2b61bc2638&stat=instructions). This seems to be caused due to a 2.3% increase in code size (https://llvm-compile-time-tracker.com/compare.php?from=7ba82a7320df82d07d3d5679bce89b14526b536c&to=ee1c12708a4519361729205168dedb2b61bc2638&stat=size-text), with some of the files getting 20-30% larger. Not particularly unexpected for this kind of change though.
Here's the numbers for this patch: https://llvm-compile-time-tracker.com/compare.php?from=4fdc4d892b988bb9f2e06c3440971d28d6361722&to=63b8e6e79174411d9340790fa5e4f67bc73620a0&stat=instructions There's a measurable impact, but doesn't seem particularly concerning (geomean 0.1% regression).
Sun, Aug 2
Sat, Aug 1
LG. I was wondering if we can use any of our more powerful negation machinery here, but I don't think so, because it would end up looping infinitely.
LG. This has some overlap with D85043, which will already catch the cases where the inner abs has poison=true, but not those with poison=false, so this seems reasonable as an explicit fold. One could make an argument that this should live in InstCombine and try to preserve the outer poison flag, but this doesn't seem particularly important to me.
I don't really know what the tradeoff would be between using abandon() here and using InvalidateAnalysisPass. I went with this variant as @asbirlea indicated in https://reviews.llvm.org/D70376#2151283 that this would be preferred, as InvalidateAnalysisPass is really more intended for testing purposes.
Fri, Jul 31
Yes, we should be fine compile-time wise. The icmp + select representation of abs implicitly did this as well, because we fold comparisons against zero more aggressively than others.
Please take a look at how Instruction::clone() is implemented, using a protected cloneImpl() method on each instruction type. I would suggest to follow the same pattern, and make these type-specific methods protected (and drop the redundant doc comment on each one of them).
Drop -lvi-no-abandon flag, update pipeline tests.
Assuming D84792 lands, would performing the two SimplifyBinOp queries for L and R with CanUseUndef=false solve this problem as well? I would prefer that if it works.
Thu, Jul 30
Here is how updating one of the tests would look like: https://gist.github.com/nikic/577dead281f1aee10b492c3fc7130f5b
Do these show up in practice?
Wed, Jul 29
Set AllowUndef=false, switch tests to use !range instead of comparisons.
I don't think this is the right approach to the problem. Can you give some more information on which places in LLVM assume that only bitcasts feed into lifetime intrinsics?
Okay, after looking at this again I think you're on the right track here.
Is the goal here that it's legal to replace the output value of SimplifyInstruction with the input value, in addition to the normal replacement of the input value with the output value? Or do we specifically care about literal UndefValue constants somehow, as opposed to values which are undef? If we had a PoisonValue constant, would we also need to exclude it?
This change had a significant negative compile-time impact: https://llvm-compile-time-tracker.com/compare.php?from=0b161def6cacff1a63d3cf1a1efe95b550814d7a&to=e22de4e46d1dd1aacc3a7060d24bcbe89908ba6c&stat=instructions
Tue, Jul 28
LG as well.
Mon, Jul 27
It looks like currently we only transform this for the single-use case, but not the multi-use case: https://godbolt.org/z/9oYeEq
But (if my undef reasoning is correct), this is not as easy as always returning the existing operand. In the case of undefs, we need to clamp to the limit constant. This is similar to what we do with the saturating math intrinsics.
As a possible followup: This code could be generalized a bit by using "constant && !constexpr" for non-freeze and "guaranteed-not-undef" (without any constant checks) for freeze, with some appropriate renames. This would basically change the transform from splitting into "constant" and "non-constant" inputs, into "foldable" and "non-foldable" inputs (where "foldable = constant" for most cases).