This is an archive of the discontinued LLVM Phabricator instance.

[x86] use psubus for more vsetcc lowering (PR39859)
ClosedPublic

Authored by spatel on Apr 17 2019, 2:51 PM.

Details

Summary

Circling back to a leftover bit from PR39859:
https://bugs.llvm.org/show_bug.cgi?id=39859#c1

...we have this opportunity to use 'psubus'. It looks counter-intuitive in the test diffs because the new sequence is one instruction longer, but llvm-mca shows it as the better perf option for both Haswell and Jaguar. We already do this transform for the SETULT predicate, so this would make the code more symmetrical too. If we have pminub/pminuw, we prefer those, so this should not affect anything but pre-SSE4.1 subtargets.

$ cat before.s 
	movdqa	-16(%rip), %xmm2    ## xmm2 = [32768,32768,32768,32768,32768,32768,32768,32768]
	pxor	%xmm0, %xmm2
	pcmpgtw	-32(%rip), %xmm2 ## xmm2 = [255,255,255,255,255,255,255,255]
	pand	%xmm2, %xmm0
	pandn	%xmm1, %xmm2
	por	%xmm2, %xmm0

$ cat after.s 
	movdqa	-16(%rip), %xmm2    ## xmm2 = [256,256,256,256,256,256,256,256]
	psubusw	%xmm0, %xmm2
	pxor	%xmm3, %xmm3
	pcmpeqw	%xmm2, %xmm3
	pand	%xmm3, %xmm0
	pandn	%xmm1, %xmm3
	por	%xmm3, %xmm0

$ llvm-mca before.s -mcpu=haswell
Iterations:        100
Instructions:      600
Total Cycles:      909
Total uOps:        700

Dispatch Width:    4
uOps Per Cycle:    0.77
IPC:               0.66
Block RThroughput: 1.8


$ llvm-mca after.s -mcpu=haswell
Iterations:        100
Instructions:      700
Total Cycles:      409
Total uOps:        700

Dispatch Width:    4
uOps Per Cycle:    1.71
IPC:               1.71
Block RThroughput: 1.8
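
For readers who want to convince themselves that the psubus form computes the same predicate, here is a minimal scalar sketch. It assumes the rewrite is "x >u C becomes usubsat(C + 1, x) == 0" and, for the existing SETULT case, "x <u C becomes usubsat(x, C - 1) == 0". That is my reading of the listings above, not code taken from the patch. The names `usubsat16`, `ugtViaPsubus`, `ultViaPsubus`, and `checkPsubusRewrites` are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one 16-bit lane of psubusw: unsigned subtract that
// clamps at zero instead of wrapping.
uint16_t usubsat16(uint16_t a, uint16_t b) {
  return a > b ? static_cast<uint16_t>(a - b) : 0;
}

// x >u c, rewritten as usubsat(c + 1, x) == 0.
// Only valid when c + 1 does not wrap, i.e. c != 0xFFFF.
bool ugtViaPsubus(uint16_t x, uint16_t c) {
  return usubsat16(static_cast<uint16_t>(c + 1), x) == 0;
}

// x <u c, rewritten as usubsat(x, c - 1) == 0.
// Only valid when c - 1 does not wrap, i.e. c != 0.
bool ultViaPsubus(uint16_t x, uint16_t c) {
  return usubsat16(x, static_cast<uint16_t>(c - 1)) == 0;
}

// Compare both rewrites against plain unsigned compares for every
// 16-bit x and a handful of boundary constants (assumed test values).
void checkPsubusRewrites() {
  const uint16_t consts[] = {1, 2, 255, 256, 32767, 32768, 65534};
  for (uint16_t c : consts) {
    for (uint32_t x = 0; x <= 0xFFFF; ++x) {
      const uint16_t lane = static_cast<uint16_t>(x);
      assert(ugtViaPsubus(lane, c) == (lane > c));
      assert(ultViaPsubus(lane, c) == (lane < c));
    }
  }
}
```

With c = 255 this corresponds to the after.s listing: the constant becomes 256, psubusw computes 256 - x with saturation, and pcmpeqw against zero produces the select mask. The wrap-around restrictions in the comments are the reason the constant has to be checked before it is adjusted.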

Event Timeline

spatel created this revision. Apr 17 2019, 2:51 PM
Herald added a project: Restricted Project. Apr 17 2019, 2:51 PM
craig.topper added inline comments. Apr 17 2019, 6:33 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
19749

Would it make sense to merge this with decrementVectorConstant using a flag or something?

spatel marked an inline comment as done. Apr 18 2019, 6:28 AM
spatel added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
19749

Yes - I just did the slightly quicker copy/paste version first to see if there were any objections to the direction. Will update.

spatel updated this revision to Diff 195729. Apr 18 2019, 6:36 AM

Patch updated:
Add inc or dec param to helper function to reduce code duplication.
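
As a rough illustration of what a shared helper with an inc/dec parameter could look like, here is a standalone sketch. It is not the code from the patch: the real helper lives in X86ISelLowering.cpp and works on DAG nodes, whereas this version uses a plain `std::vector<uint16_t>` as a stand-in for a constant vector. The name `incDecConstants` and its signature are hypothetical, and the bail-out on wrap-around is an assumption about what such a helper needs to do.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a helper that adjusts every element of a
// constant vector by one. isInc selects +1 or -1. Returns false and
// leaves `out` untouched if any element would wrap, because the compare
// rewrite that relies on the adjusted constant would then be wrong.
bool incDecConstants(const std::vector<uint16_t> &in, bool isInc,
                     std::vector<uint16_t> &out) {
  std::vector<uint16_t> result;
  result.reserve(in.size());
  for (uint16_t elt : in) {
    // 0xFFFF + 1 and 0 - 1 both wrap in 16 bits, so give up.
    if (isInc ? elt == 0xFFFF : elt == 0)
      return false;
    result.push_back(static_cast<uint16_t>(isInc ? elt + 1 : elt - 1));
  }
  out.swap(result);
  return true;
}
```

A single function with a flag keeps the wrap-around check and the element loop in one place, which is the duplication the inline comment was pointing at.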

RKSimon accepted this revision. Apr 23 2019, 4:28 AM

LGTM

This revision is now accepted and ready to land. Apr 23 2019, 4:28 AM
This revision was automatically updated to reflect the committed changes.