- User Since
- Apr 22 2015, 3:01 PM (221 w, 2 d)
Jul 8 2016
Jul 7 2016
OK, sounds reasonable.
Jul 6 2016
What have you done to prove to yourself that this code is no longer necessary?
If unnecessary, then both before and after the change, the code for 401.bzip2 from spec20006 ought to
Jun 15 2016
Jun 14 2016
Ping. I believe I have addressed all comments.
Jun 10 2016
Addressed comments from Ahmed. Got rid of typedef, and created MIR based test.
Jun 9 2016
Updated changes so this will continue to promote 16 bit compares to 32 bits if one of the compare
operands is a constant. This addresses Eli Friedman's comment.
Jun 8 2016
Updated the comment in the test.
Jun 7 2016
Added minimalist (as possible) test, and changed to use auto as suggested.
I'd love to add a test case. Of the spec2000 in tests, 175, 176, 252, 254, 256, and 300 all hit instances where
this new code finds new opportunities when going forward, that are missed when going backwards.
May 31 2016
May 27 2016
Addressed comments from Dave & Quentin.
Updated test to use MIR output, and fixed Dave's minor comments on 32-bit and 16-bit.
May 26 2016
Great, I'll fix the small things Dave pointed out and get this committed.
May 25 2016
May 9 2016
May 6 2016
Nice changes. I think those greatly improve the readability.
May 5 2016
Did you add tests to check that lt/gt conditions don't get transformed?
The code generally looks good. I don't see any tests that explicitly test this code though:
Apr 27 2016
Apr 25 2016
Apr 14 2016
Replied to inline comments.
Apr 13 2016
See inline comments.
Apr 8 2016
LGTM. One nit, fix or not at your discretion.
Added inline comment.
Yes, I have run perf results locally on a number of different kinds of boxes. As stated this does generate exactly the code desired
for PR23155 (modulo the alignment differences noted as having some performance effect in that PR).
Apr 7 2016
From: Simon Pilgrim [mailto:firstname.lastname@example.org]
Sent: Thursday, April 07, 2016 11:03 AM
To: Smith, Kevin B <email@example.com>; Kreitzer, David L
Cc: firstname.lastname@example.org; email@example.com
Subject: Re: [PATCH] D18850: [X86]: Fix for PR27251
RKSimon added a subscriber: RKSimon.
RKSimon added a comment.
> Comment at: test/CodeGen/X86/vector-blend.ll:1011
> @@ -1010,3 +1010,3 @@
> ; SSE2-NEXT: pslld $31, %xmm1
> ; SSE2-NEXT: psrad $31, %xmm1
> ; SSE2-NEXT: pxor %xmm1, %xmm0
> I know this isn't related to your change, but the redundant shifts here are
> pretty gross.
Why are the shifts redundant? One is a shift left by 31, and the other is a
arithmetic shift right by 31.This has the effect of propagating bit 0 through all 32 bits of the vector
For the pre-SSE41 cases its doing ashr( shl( lshr( v, 31 ), 31 ), 31) which
should be combined to ashr( v, 31 ) - this isn't that difficult.
For the SSE41 cases we don't need the shifts at all as (v)blendvps will select
elements based on the sign bit alone, but the vselect/blendv relationship is
rather nasty (change in input type behaviour after legalization) and I can
imagine a number of problems getting it to work cleanly - I've hit some of
these before trying to do late constant folding of vselect.
From: David Kreitzer [mailto:firstname.lastname@example.org]
Sent: Thursday, April 07, 2016 8:22 AM
To: Smith, Kevin B <email@example.com>;
firstname.lastname@example.org; Kreitzer, David L <email@example.com>
Subject: Re: [PATCH] D18850: [X86]: Fix for PR27251
DavidKreitzer added a comment.
Comment at: test/CodeGen/X86/vector-blend.ll:1011
@@ -1010,3 +1010,3 @@
; SSE2-NEXT: pslld $31, %xmm1
; SSE2-NEXT: psrad $31, %xmm1
; SSE2-NEXT: pxor %xmm1, %xmm0
I know this isn't related to your change, but the redundant shifts here are
Thank you for the quick review Ahmed. I addressed both your comments.
Addressed Ahmed's review comments.
Apr 6 2016
Mar 28 2016
Feb 21 2016
Feb 19 2016
Thanks for the tips Sanjay.
Feb 18 2016
Doh - missed one instance. Now fixed.
Fixes problem that Quentin pointed out.
Sanjay - See if you like this form for testing with fixup-byte-word-insts both on and off. If you prefer this, then I
could make similar changes to the few other tests that are also affected when fixup-byte-word-insts would change
to be turned on by default.
Feb 12 2016
Feb 11 2016
Addressed inline comments.
New revision should address most of the comments.
Feb 10 2016
Thanks for the comments. I'll work on fixing all those, and have a new version early tomorrow.
Sanjay - Thanks for the encouraging words. I'll be happy to commit whenever folks think that is appropriate.
From: llvm-commits [mailto:firstname.lastname@example.org] On Behalf
Of Sanjay Patel via llvm-commits
Sent: Wednesday, February 10, 2016 10:58 AM
To: email@example.com; firstname.lastname@example.org;
Subject: Re: [PATCH] D17041: [X86] Don't assume that a shuffle operand is
#0: it isn't for VPERMV.
spatel added a comment.
..but now that I look it up, the AVX-512 intrinsics use the instruction order
*sigh*Back to square one.
Wow. Is it too late for Intel to fix/deprecate/rename those intrinsics? If the
argument is that the intrinsics should match the asm, then what happened
with the AVX2 vperm variants?
Feb 9 2016
Jan 21 2016
Two nits. Up to you whether they are worth changing. Either way LGTM.
Jan 15 2016
Dec 15 2015
Dec 3 2015
Nov 9 2015
Oct 21 2015
I would still have preferred that in the "only correct for call-site" case that the cfi_adjust occurred after the pushes rather than before. But I don't feel strongly about that, and so, am good with the changes now.
Oct 15 2015
Please see inline comments.
Oct 14 2015
Oct 7 2015
Sep 9 2015
Abandoned changes due to Quentin's opposition.
Sep 8 2015
There are a handful (6-8) tests that currently are checking for movw, where either movzwl or movw would be legal choices. There are quite a few more that specifically expect movb where movzbl would also be legal. My intent was to add a test that tested for the specific movb vs movzbl or movw vs movzwl when the transformations to change these opcodes got added.
Changed title, and explanation as Sanjay suggested. My expectation is that these tests should allow either the zero-extending
loads, or the plain byte/short instructions, as either are correct for the cpus specified.