@efriedma When using AssertingVH a bunch of the existing JT/CVP tests start to fail, so I think we don't need a test case that produces an actual collision in the map.
Reopening as this has been reverted in rG7a3ad48d6de0e79a92361252a815b894565b9a0f.
Thu, Nov 14
Assuming this is a legitimate mis-compile, the root cause might be the same as D70103, just for the pointer dereference cache, rather than overdefined value cache.
Sat, Nov 9
Fri, Nov 8
Rebase over changes to pointer handling.
I feel like the names are a bit inconsistent with existing uadd_sat etc, but utrunc_sat seems worse than truncUSat, so let's just go with it...
Thu, Nov 7
Unlike ConstantRange::shl(), these are precise.
Make eraseValueFromPerBlockValueCache a free function, add todo.
Wed, Nov 6
LGTM. addWithNoWrap() has some more tests for specific inputs, but I don't think we really need them here.
If that's the case, I'm not sure whether we should rely on that; it seems rather subtle.
Now that I've added the EXPECT_EQ(CR.isEmptySet(), AllOverflow); test, that seems like a safe assumption to me? Seems like a net win to me.
@lebedev.ri I suspect this is due to the intersection with the raw add() range. Maybe it happens that the intersection between add() and [MAX, MAX] is always empty in the case where the overflow check triggers? Can you try removing that intersection and see what happens?
Tue, Nov 5
Can you give some context on the problem you're trying to solve? This doesn't look quite right, but with some context I may be able to suggest a cleaner approach.
@lebedev.ri Weirdly that does not match the results I get:
Sun, Nov 3
Rebase and use new matchers.
Sat, Nov 2
Fri, Nov 1
@lebedev.ri I added a couple of comments on the JT tests, probably after you opened the tab ^^ The reason for the diffs is that we now evaluate the condition to a constant, while previously these were phi threaded. It has a bit of an odd effect here, but I would expect that in practice folding a branch completely is generally preferable.
Thu, Oct 31
Ugh, I now see that this is actually intentional. I think any changes to the default behavior of update_test_checks should be discussed on llvm-dev in the future. These kinds of changes are quite disruptive and in this case imho don't pay for themselves.
Even if --function-signature is not present, this generates spurious diffs:
Wed, Oct 30
Mon, Oct 28
@lebedev.ri Might use m_UAddWithOverflow(), iirc it handles all these edge cases.
Sun, Oct 27
Sat, Oct 26
So, this looks fine, but I'm still not quite clear on the use-case. I thought this might be useful for computing ranges of bit ops, but now that I see the implementation, I don't think that's the case. We just get the known top bits, but lose any information about the low bits (which can still be used, though in an operation-specific manner).
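To illustrate the "known top bits" point, here is a minimal sketch of one plausible way to derive known bits from an unsigned, non-wrapping range. It is not the implementation under review: the `KnownTopBits` struct and `knownBitsFromRange` function are hypothetical names, and plain 32-bit integers stand in for `APInt`. Every value in [Lo, Hi] shares the common leading-bit prefix of Lo and Hi, so only those bits are reported as known and everything below the highest differing bit is dropped.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a KnownBits-like result: a bit set in Zeros is
// known to be 0, a bit set in Ones is known to be 1.
struct KnownTopBits {
  uint32_t Zeros;
  uint32_t Ones;
};

// Assumes an unsigned, non-wrapping range with Lo <= Hi.
KnownTopBits knownBitsFromRange(uint32_t Lo, uint32_t Hi) {
  // Bits where Lo and Hi differ.
  uint32_t Unknown = Lo ^ Hi;
  // Smear the highest differing bit downwards, so that every bit at or
  // below it is treated as unknown.
  Unknown |= Unknown >> 1;
  Unknown |= Unknown >> 2;
  Unknown |= Unknown >> 4;
  Unknown |= Unknown >> 8;
  Unknown |= Unknown >> 16;
  uint32_t PrefixMask = ~Unknown;
  // Within the common prefix, the bits of Lo (equivalently Hi) are known.
  return {~Lo & PrefixMask, Lo & PrefixMask};
}
```

For the range [6, 9] this reports only that bits 4 and above are zero, even though the range itself carries more information about the low bits than that.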
I initially thought that there might be a cache invalidation problem here, because we recently started to take nowrap flags into account when computing ranges, and we might end up reasoning based on the no longer present nowrap flags. In particular I had something like this in mind:
LGTM, though I'd suggest a more compact name.
Fri, Oct 25
Thu, Oct 24
Possible use case: A better implementation for binaryAnd/binaryOr. Conceptually, those can be implemented by doing:
Wed, Oct 23
Thanks for looking into this! I think we should definitely do this change as the functionality already exists, but based on the numbers, it probably doesn't make sense to invest in NoWrap implementations of other operators unless we also need them for something else (SCEV was the original motivation here).
Tue, Oct 22
This needs a rebase after D68926. Possibly some parts of it will no longer make sense (though the USUBSAT case at least should still be a win...)
Mon, Oct 21
LGTM. Small overall effect, but I think it still makes sense for completeness, and as far as I know this shouldn't be particularly expensive.
Oct 20 2019
Oct 19 2019
Oct 18 2019
LGTM, we already do the same for div/rem.
Oct 17 2019
Could you please also include test coverage for the smin/smax case? Otherwise this looks good.
Oct 16 2019
@lebedev.ri I believe that pattern is in principle handled in canonicalizeSaturatedAdd(); it just depends on the way in which the overflow check is expressed. Apparently it checks for the ~X u< Y ? -1 : X + Y pattern, but not for (X + Y) < X ? -1 : X + Y, which is our canonical non-constant add overflow check form, I think.
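The two overflow-check forms mentioned here compute the same unsigned saturating add. The sketch below checks that exhaustively for 8-bit values; the function names are made up for illustration and this is ordinary C++, not the InstCombine matcher itself.

```cpp
#include <cassert>
#include <cstdint>

// Form described as matched today: ~X u< Y ? -1 : X + Y.
// For 8-bit X, ~X equals 255 - X, so ~X u< Y holds exactly when X + Y > 255.
uint8_t satAddNotForm(uint8_t X, uint8_t Y) {
  uint8_t NotX = static_cast<uint8_t>(~X);
  return NotX < Y ? 0xFF : static_cast<uint8_t>(X + Y);
}

// Form described as the canonical check: (X + Y) u< X ? -1 : X + Y.
// The wrapped sum is smaller than X exactly when the add overflowed.
uint8_t satAddSumForm(uint8_t X, uint8_t Y) {
  uint8_t Sum = static_cast<uint8_t>(X + Y); // wraps modulo 256
  return Sum < X ? 0xFF : Sum;
}

// Compares both forms over all 65536 input pairs.
bool saturatedAddFormsAgree() {
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned Y = 0; Y < 256; ++Y)
      if (satAddNotForm(static_cast<uint8_t>(X), static_cast<uint8_t>(Y)) !=
          satAddSumForm(static_cast<uint8_t>(X), static_cast<uint8_t>(Y)))
        return false;
  return true;
}
```

Since both forms agree on every input, recognizing the second form as a saturating add as well would be semantically safe; whether it is profitable is a separate question.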
A bit surprised that this pattern is not picked up by instcombine: https://godbolt.org/z/hP4wyd
As the property you're ultimately using here is that zext x u<= sext x, would it make more sense to include that as part of isKnownViaNonRecursiveReasoning()? A corollary is that sext x s<= zext x.
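Both orderings can be checked exhaustively for a small width. The sketch below extends 8-bit values to 16 bits; the helper names are hypothetical, and the signed comparison is done by flipping the sign bit so the code does not depend on signed conversion behavior.

```cpp
#include <cassert>
#include <cstdint>

// zext i8 -> i16: the upper bits are filled with zeros.
uint16_t zext8to16(uint8_t X) { return X; }

// sext i8 -> i16: the upper bits are filled with copies of the sign bit.
uint16_t sext8to16(uint8_t X) {
  return (X & 0x80) ? static_cast<uint16_t>(0xFF00u | X)
                    : static_cast<uint16_t>(X);
}

// Signed <= on 16-bit patterns: flipping the sign bit maps signed order
// onto unsigned order.
bool sle16(uint16_t A, uint16_t B) {
  return (A ^ 0x8000u) <= (B ^ 0x8000u);
}

// Checks zext x u<= sext x and sext x s<= zext x for every 8-bit x.
bool extensionOrderingHolds() {
  for (unsigned V = 0; V < 256; ++V) {
    uint8_t X = static_cast<uint8_t>(V);
    uint16_t Z = zext8to16(X);
    uint16_t S = sext8to16(X);
    if (!(Z <= S))     // unsigned comparison
      return false;
    if (!sle16(S, Z))  // signed comparison
      return false;
  }
  return true;
}
```

For non-negative x the two extensions are identical. For negative x, sext sets the high bits, which makes it larger as an unsigned value and smaller as a signed value.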
Oct 14 2019
Overall this still looks good, and some of the tests (like AArch64/*_sat.ll) would look better than they do right now with appropriate zeroext/signext attributes. I think it's reasonable to expect that extension will often be free when doing promotion.
Oct 12 2019
I'd like to try to extend ConstantRange::makeGuaranteedNoWrapRegion() to deal with Instruction::Shl, so I believe I need rounding right shifts.
I'm wondering if we can't extend Float2Int to convert these to operations on integers. I'm assuming it currently doesn't due to a FIXME: Handle select and phi nodes.
The signed cases are a bit of a mixed bag in isolation, but will probably do better inside a loop or with adjacent instructions.
Generally looks good to me, I'm only wondering whether the trunc is the right place to start the match. Starting from the min/max we could match a larger set of patterns, in particular those where the result of the saturation is still extended to a larger type -- for example doing a 16-bit saturating add but continuing with a 32-bit result.
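As a source-level illustration of the case described here (a hedged sketch with hypothetical function names, not the IR pattern from the patch): the add is widened, clamped to the 16-bit range with min/max, and the result stays 32-bit, so there is no trunc to anchor the match on.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// 16-bit signed saturating add whose result continues as a 32-bit value.
// The clamp (min/max) is the last operation of the pattern; no truncation
// follows it.
int32_t satAdd16Wide(int16_t A, int16_t B) {
  int32_t Sum = static_cast<int32_t>(A) + static_cast<int32_t>(B);
  return std::min<int32_t>(std::max<int32_t>(Sum, INT16_MIN), INT16_MAX);
}

// The same computation followed by a truncation back to 16 bits, which is
// the shape a trunc-rooted match would see.
int16_t satAdd16Narrow(int16_t A, int16_t B) {
  return static_cast<int16_t>(satAdd16Wide(A, B));
}
```

A match rooted at the min/max would cover both functions, while a match rooted at the trunc only covers the second.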
Oct 10 2019
Oct 8 2019
@hsaito We've put quite a bit of effort into making sure that saturating intrinsics optimize as well as or better than expanded IR sequences, which is why they are indeed considered canonical IR. Whether intrinsics are canonical needs to be decided on a case-by-case basis; there is no general rule about it. For example, the unsigned mul overflow intrinsic is considered canonical (for obvious reasons -- the alternatives tend to be much more expensive) and is formed in InstCombine. Unsigned add/sub overflow, on the other hand, are not, because they tend to optimize much worse than expanded IR. Those are formed in CGP instead.
This still uses the existing promotion when the promoted add/sub_sat is legal. In many situations (but not all) it would probably be better to just perform the new promotion.
LGTM. Especially as we already have the corresponding ashr to lshr transform, this seems like an obvious extension.