Thanks for breaking this up - LGTM.
See inline for a couple of nits.
Note: memcmp expansion for x86 was reduced with https://reviews.llvm.org/rL308986 because it caused a perf regression in PR33914 ( https://bugs.llvm.org/show_bug.cgi?id=33914 ).
There's a proposal to revert the expansion for x86 entirely in PR34032 ( https://bugs.llvm.org/show_bug.cgi?id=34032 ) because of a related perf regression and compile-time explosion in CGP (PR33900 - https://bugs.llvm.org/show_bug.cgi?id=33900 ). IMO, we need to solve the CGP addressing-mode bug anyway, but just so everyone is aware - memcmp expansion for x86 is still shaky at this point.
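For readers following along, the expansion under discussion replaces a small fixed-size memcmp call with direct loads and an integer compare. A rough C-level sketch of the idea for a 4-byte equality test (the function name is made up for illustration; the real transform happens in the backend on IR, not in C):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration of what memcmp(a, b, 4) == 0 can expand to:
   one 32-bit load from each side plus a single compare, instead of a
   library call. memcpy is used for the loads so the sketch avoids
   strict-aliasing and alignment pitfalls. */
static int memcmp4_eq_expanded(const void *a, const void *b) {
    uint32_t x, y;
    memcpy(&x, a, 4);   /* one 4-byte load */
    memcpy(&y, b, 4);   /* one 4-byte load */
    return x == y;      /* single integer compare */
}
```

The perf questions in the linked bugs are about when this kind of expansion beats the library call (and the compile-time cost of generating it), not about its correctness.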
Thu, Aug 17
Remove the general case fold which actually isn't a win if vselect is fast. We only convert the vselect when there's a clear optimization now.
On Thu, Aug 17, 2017 at 3:52 PM, <email@example.com> wrote:
btw, to note, blendv *is* a "fast op" on most arches; what makes it one extra uop is the fact that the V form takes 3 inputs. iirc, it's the same reason that ADC, CMOV, etc. can't be 1 uop (2 inputs + flags). the forms that don't take 3 inputs are usually 1 uop.
Adding some more x86 experts.
> For x86, blendv* is always a multi-uop / multi-cycle instruction according to Agner's docs
Are you sure?
Bulldozer, Piledriver, Ryzen, and Skylake seem to list PBLENDVB and BLENDVPS as 1 uop.
Thanks for the updates. I have no other suggestions.
Wed, Aug 16
I checked in the tests from cmov-promotion.ll at rL311052, so we'd see the baseline codegen. Can you rebase after that?
Just to make sure I'm understanding: the multiply patch (D36679) is independent of this?
Tue, Aug 15
Can we have the ashr fix as a separate patch before the mul patch? If yes, make tests for just that diff and split this up?
LGTM with change of the test diffs so this one goes in first.
Can you commit this one first with at least one test independent of those in D36498 to avoid the extra inst regressions seen there?
The comment about multi-uses of the compare in D36711 made me wonder what we're doing here. Should we check one-use before trying to transform?
I think all of the non-multiply tests are simplified after rL310942. Can you double-check that? If so, you can remove those tests from this patch. Feel free to add them as an independent NFC commit if they cover some case that isn't covered already.
It's worth noting that we could convert any select of constants to math ops. I was planning a follow-up to https://reviews.llvm.org/rL310717 that would do that, but there might be cases where we would prefer to select cmov? If not, that would make this patch obsolete.
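To make the "select of constants to math ops" point concrete, here is a minimal C sketch of the identity involved (constants and function name are arbitrary examples, not the actual fold):

```c
#include <assert.h>
#include <stdint.h>

/* Branch-free rewrite of a select of two constants:
   cond ? c1 : c2  ==  c2 + cond * (c1 - c2), where cond is 0 or 1.
   Whether this beats a cmov/select is exactly the target-dependent
   question raised in the review. */
static int32_t select_as_math(int cond, int32_t c1, int32_t c2) {
    return c2 + (int32_t)(cond != 0) * (c1 - c2);
}
```

When c1 - c2 is a power of two, the multiply further reduces to a shift, which is where this tends to be a clear win.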
Mon, Aug 14
Just to make sure I'm seeing this correctly: this patch should contain additional tests for ule/uge, and ult/ugt should be added to the step before this (D36593)?
Wait - this does have a testable change, right? You can check the ult/ugt cases from InstSimplify with this:
I've been operating under the assumption that we want to transform in the opposite direction in IR. Ie, preserve and probably create more select-of-constants (see https://reviews.llvm.org/D24480 which I am planning to abandon). We should be able to reduce selects to logic/math in the DAG where it makes sense (eg, https://reviews.llvm.org/rL310717 ). Some reasons to prefer select were noted in:
Sun, Aug 13
LGTM - thanks!
Sat, Aug 12
Replace the assume deletion with a comment about why it can't happen. :)
Patch updated - sorry, the last diff was not the one I intended to upload. This is NFC from the last rev, but:
- Includes Hal's comment about range metadata.
- Has CHECK lines for the poison_on_call_user_is_ok() test.
Fri, Aug 11
Patch updated - mostly just cutting and pasting the code Hal posted:
- Make isExternallyVisible() check the calculated demanded bits rather than a list of opcodes.
- Add a check for and delete llvm.assume.
- Use a loop (worklist) rather than recursion to avoid blowing up the stack.
- Add an exclusion list based on opcodes to stop the search.
- Add a test to show that the exclusion list works for a call inst.
- Add a TODO comment with Nuno's suggestion from the bug report to make this better.
Thu, Aug 10
This patch as uploaded makes even less sense to me than before:
- There are no tests to show the diff.
- Instead of new tests, the only evidence of a functional difference is a changed assertion in an unrelated test?
Wed, Aug 9
As shown in PR34097 ( https://bugs.llvm.org/show_bug.cgi?id=34097 ), I failed to account for the subtract overflow case, so add a check for that.
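As a loose sketch of the kind of guard being added (this is illustrative C using a GCC/Clang builtin, not the actual compiler code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of guarding a fold against subtract overflow: only treat
   a - b as usable when the signed subtraction cannot wrap.
   __builtin_sub_overflow is a GCC/Clang extension that reports
   whether the result overflowed. */
static bool sub_no_overflow(int32_t a, int32_t b, int32_t *out) {
    return !__builtin_sub_overflow(a, b, out); /* true when safe */
}
```

The PR34097 failure was precisely a case where the transform assumed the subtraction was safe when it could in fact overflow.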
Patch updated - no functional changes intended from the previous rev, but cleaner:
- Use isa<VectorType> in the assert and the callers' type check to make it clearer that vectors are allowed.
- Use m_SpecificInt in pattern matches to reduce code.
Mon, Aug 7
- Add an assert for the shouldChangeType() guard that's in the caller function.
- Replace the explicit match for zext of the shift value with a call to MaskedValueIsZero().
- Remove the MaskedValueIsZero() check for the shift amount and generate our own masks to make the transform safe:
- Add test to exercise #2 and #3.
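The "generate our own masks" item above can be sketched at the C level like this (function name and widths are made-up examples; the real change operates on IR shifts, where an amount >= the bit width is poison, so inserting the mask is safe):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: instead of proving via MaskedValueIsZero() that a shift
   amount already fits the narrow type, explicitly mask it to the
   narrow type's legal range so the narrowed shift is always safe. */
static uint8_t shl_narrowed(uint8_t x, uint32_t wide_amt) {
    uint32_t amt = wide_amt & 7;  /* clamp to 0..7 for an 8-bit shift */
    return (uint8_t)(x << amt);
}
```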
Wed, Aug 2
Please add the sext test to this patch. I assume a zext variant already exists - if not, we could use one of those too.
That way we'll be sure that the intended folds are still happening via other mechanisms.
LGTM. I think it makes sense for the same reason as the zext case.
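For context on why the sext and zext cases need separate handling: sign-extending an i1 yields 0 or all-ones, while zero-extending it yields 0 or 1, so folds involving xor/and behave differently on the two forms. A scalar C model:

```c
#include <assert.h>
#include <stdint.h>

/* Models of extending an i1 (boolean) to i32:
   sext gives 0 / -1 (all bits set), zext gives 0 / 1.
   Folds that are valid for one extension are not automatically
   valid for the other. */
static int32_t sext_i1(int b) { return b ? -1 : 0; }
static int32_t zext_i1(int b) { return b ? 1 : 0; }
```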
Double-check to make sure, but I think the sext variant isn't handled if we remove this block (sadly, there's no regression test for it):
I pushed 'test7' through llc for x86 and PPC64LE, and no problems. But then I tried AArch64 and ARM, and they went nuts whether it was an 'xor' or an 'and':