This is an archive of the discontinued LLVM Phabricator instance.

[CostModel] Return TCC_Expensive for non-speculatable ctlz/cttz.
Needs ReviewPublic

Authored by fhahn on Oct 16 2020, 11:35 AM.

Details

Summary

Before 871556a494552c0f503eec17055f075bcd859937, we would return
TCC_Expensive for non-speculatable CTTZ/CTLZ, but the patch removed the
exit. I am not sure if that was intentional, but it seems now we also
treat un-speculatable CTLZ/CTTZ as non-expensive.

See the change in the test case. The current speculation limit in
SimplifyCFG is set so that a single expensive instruction can be
speculated, but in test9_loop and expensive and a cheap instruction
needs speculating, pushing it over the limit.

Note that currently the X86 backend considers CTTZ/CTLZ as cheap to speculate
on architectures like haswell or skylake, where it is actually quite expensive.
But that is a separate issue.

Diff Detail

Event Timeline

fhahn created this revision.Oct 16 2020, 11:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 16 2020, 11:35 AM
Herald added a subscriber: pengfei. · View Herald Transcript
fhahn requested review of this revision.Oct 16 2020, 11:35 AM

Doesn't this affect tests in test/Analysis/CostModel/X86 ?
We might need to enclose the speculatable -> basic vs. expensive
difference inside the vector checks.

fhahn updated this revision to Diff 298717.Oct 16 2020, 12:21 PM

Doesn't this affect tests in test/Analysis/CostModel/X86 ?
We might need to enclose the speculatable -> basic vs. expensive
difference inside the vector checks.

I originally only updated the simplifycfg checks. Now all impacted tests should be updated.

This might be restoring the old behavior, but I want to make sure we are ok with the potential regressions. If not, we should make some adjustments to the x86 model first.

The SLP v4i32 diff with zero-is-undef set would be something like this, and it's hard to justify IMO:

	bsfl	src32(%rip), %eax
	bsfl	src32+4(%rip), %ecx
	bsfl	src32+8(%rip), %edx
	bsfl	src32+12(%rip), %esi
	movl	%eax, dst32(%rip)
	movl	%ecx, dst32+4(%rip)
	movl	%edx, dst32+8(%rip)
	movl	%esi, dst32+12(%rip)
	retq

vs.

	movdqa	src32(%rip), %xmm0
	pcmpeqd	%xmm1, %xmm1
	paddd	%xmm0, %xmm1
	pandn	%xmm1, %xmm0
	movdqa	%xmm0, %xmm1
	psrlw	$1, %xmm1
	pand	.LCPI0_0(%rip), %xmm1
	psubb	%xmm1, %xmm0
	movdqa	.LCPI0_1(%rip), %xmm1           # xmm1 = [51,51,51,51,51,51,51,51,51,51,51,51,51,51,51,51]
	movdqa	%xmm0, %xmm2
	pand	%xmm1, %xmm2
	psrlw	$2, %xmm0
	pand	%xmm1, %xmm0
	paddb	%xmm2, %xmm0
	movdqa	%xmm0, %xmm1
	psrlw	$4, %xmm1
	paddb	%xmm0, %xmm1
	pand	.LCPI0_2(%rip), %xmm1
	pxor	%xmm0, %xmm0
	movdqa	%xmm1, %xmm2
	punpckhdq	%xmm0, %xmm2            # xmm2 = xmm2[2],xmm0[2],xmm2[3],xmm0[3]
	psadbw	%xmm0, %xmm2
	punpckldq	%xmm0, %xmm1            # xmm1 = xmm1[0],xmm0[0],xmm1[1],xmm0[1]
	psadbw	%xmm0, %xmm1
	packuswb	%xmm2, %xmm1
	movdqa	%xmm1, dst32(%rip)
	retq
llvm/test/CodeGen/X86/dagcombine-select.ll
438–441

I think we would view this as a regression based on https://bugs.llvm.org/PR46203 / 2328cab16ccd8f17fee782c29fb844662c089fbb

Do we need to adjust the isCheapToSpeculateXXXX APIs to acknowledge the zero-is-undef parameter of the intrinsic?

RKSimon added inline comments.Oct 22 2020, 1:53 AM
llvm/test/CodeGen/X86/dagcombine-select.ll
438–441

Yes, it feels like we're going to need some combo of checks for zero-is-undef and if the value is known-never-zero if we can.

lebedev.ri resigned from this revision.Jan 12 2023, 5:19 PM

This review seems to be stuck/dead, consider abandoning if no longer relevant.

Herald added a project: Restricted Project. · View Herald TranscriptJan 12 2023, 5:19 PM