Details

Reviewers

fhahn
nikic
spatel
dmgreen
lebedev.ri

Commits

rGc89d9d8a48c0: [TTI] Consider select form of and/or i1 as having arithmetic cost

Summary

This patch makes the cost of select i1 a, b, false equal to that of and i1 a, b,
and the cost of select i1 a, true, b equal to that of or i1 a, b.

Until now, InstCombine folded these selects into and/or i1, but that transformation is poison-unsafe.
This patch is a step towards removing the unsafe transformation; D93065 links the relevant transformations.
These selects should be lowered to the same assembly as and/or i1, so their cost should be equivalent.
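To make "poison-unsafe" concrete, here is a minimal sketch of a three-valued model of i1 (false, true, poison). The names I1, selectI1, andI1, orI1 and checkPoisonCounterexamples are hypothetical and exist only for this illustration; they are not LLVM APIs. The model assumes the usual IR rules: select propagates poison only from its condition and from the arm it picks, while and/or propagate poison from either operand.

```cpp
#include <cassert>

// Toy model of an i1 value (an assumption for illustration, not LLVM code).
enum class I1 { False, True, Poison };

// select cond, onTrue, onFalse: only the condition and the chosen arm matter.
I1 selectI1(I1 cond, I1 onTrue, I1 onFalse) {
  if (cond == I1::Poison) return I1::Poison;
  return cond == I1::True ? onTrue : onFalse;
}

// and: poison in either operand poisons the result.
I1 andI1(I1 a, I1 b) {
  if (a == I1::Poison || b == I1::Poison) return I1::Poison;
  return (a == I1::True && b == I1::True) ? I1::True : I1::False;
}

// or: poison in either operand poisons the result.
I1 orI1(I1 a, I1 b) {
  if (a == I1::Poison || b == I1::Poison) return I1::Poison;
  return (a == I1::True || b == I1::True) ? I1::True : I1::False;
}

// The two inputs on which the select form and the and/or form disagree.
void checkPoisonCounterexamples() {
  // select false, poison, false is false, but and false, poison is poison.
  assert(selectI1(I1::False, I1::Poison, I1::False) == I1::False);
  assert(andI1(I1::False, I1::Poison) == I1::Poison);
  // select true, true, poison is true, but or true, poison is poison.
  assert(selectI1(I1::True, I1::True, I1::Poison) == I1::True);
  assert(orI1(I1::True, I1::Poison) == I1::Poison);
}
```

In both cases the and/or form is poison where the select form is a defined value, so rewriting select into and/or can introduce poison that was not there before.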

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

aqjune requested review of this revision. · Feb 23 2021, 11:11 PM

aqjune created this revision.

Herald added a project: Restricted Project. · Feb 23 2021, 11:11 PM

Herald added a subscriber: llvm-commits.

This patch has tests missing, and I'd like to add these. Where should I go to do this?

Harbormaster completed remote builds in B90544: Diff 325992. · Feb 24 2021, 12:20 AM

In D97360#2584048, @aqjune wrote:

This patch has tests missing, and I'd like to add these. Where should I go to do this?

llvm/test/Analysis/CostModel/

Thanks!
Tests added

Herald added subscribers: frasercrmck, kerbowa, luismarques and 23 others. · Feb 24 2021, 6:23 PM

Update tests to check both throughput and size costs

Clang-format

Harbormaster completed remote builds in B90724: Diff 326260. · Feb 24 2021, 9:20 PM

Harbormaster completed remote builds in B90726: Diff 326263. · Feb 24 2021, 9:30 PM

Harbormaster completed remote builds in B90727: Diff 326264. · Feb 24 2021, 9:50 PM

I can't tell from the tests (so we should probably add at least one to confirm): do we expect the same behavior for vector types?

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
Line 997: It would be nice to have a code comment here to show the expected patterns:
// select x, y, false --> x & y
// select x, true, y --> x | y
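The two patterns in the inline comment above can be sketched as a small classifier. This is a simplified stand-in, not the code in TargetTransformInfoImpl.h: Opcode, Operand, SelectI1, costOpcodeFor, unitCost and selectCost are hypothetical names, and the unit costs are made-up numbers chosen only to show that a matching select is charged like and/or.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins; these are not LLVM's classes.
enum class Opcode { And, Or, Select };
enum class Operand { Variable, ConstTrue, ConstFalse };

// An i1 select: select cond, onTrue, onFalse.
struct SelectI1 {
  Operand cond;
  Operand onTrue;
  Operand onFalse;
};

// Pick the opcode whose cost should be charged for this select.
Opcode costOpcodeFor(const SelectI1 &s) {
  if (s.onFalse == Operand::ConstFalse)
    return Opcode::And; // select x, y, false --> x & y
  if (s.onTrue == Operand::ConstTrue)
    return Opcode::Or; // select x, true, y --> x | y
  return Opcode::Select; // a genuine select keeps its own cost
}

// Made-up unit costs, for illustration only.
int unitCost(Opcode op) { return op == Opcode::Select ? 2 : 1; }

// Cost of a select under the rule sketched above.
int selectCost(const SelectI1 &s) { return unitCost(costOpcodeFor(s)); }
```

A select whose arms are both variables falls through to the ordinary select cost; only the two constant-arm shapes are treated as arithmetic.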

Yes, vectors are expected to have the same cost. I'll update tests for them.

In D97360#2594268, @aqjune wrote:

Yes, vectors are expected to have the same cost. I'll update tests for them.

Thanks - IIUC, we're also going to need some codegen changes to avoid regressions. For example (the icmps are to avoid diffs due to lack of signext/zeroext specifiers on parameters with vector type):

define <4 x i1> @b(<4 x i32> %x, <4 x i32> %y) {
  %c1 = icmp eq <4 x i32> %x, <i32 42, i32 42, i32 42, i32 42>
  %c2 = icmp sgt <4 x i32> %y, <i32 42, i32 42, i32 42, i32 42>
  %s = select <4 x i1> %c1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i1> %c2
  ret <4 x i1> %s
}
define <4 x i1> @b2(<4 x i32> %x, <4 x i32> %y) {
  %c1 = icmp eq <4 x i32> %x, <i32 42, i32 42, i32 42, i32 42>
  %c2 = icmp sgt <4 x i32> %y, <i32 42, i32 42, i32 42, i32 42>
  %s = or <4 x i1> %c1, %c2
  ret <4 x i1> %s
}

$ llc -o - logical.ll -mattr=avx -mtriple=x86_64--
_b:                                     ## @b
	vmovdqa	LCPI2_0(%rip), %xmm2            ## xmm2 = [42,42,42,42]
	vpcmpeqd	%xmm2, %xmm0, %xmm0
	vpcmpgtd	%xmm2, %xmm1, %xmm1
	vblendvps	%xmm0, LCPI2_1(%rip), %xmm1, %xmm0
	retq
_b2:                                    ## @b2
	vmovdqa	LCPI3_0(%rip), %xmm2            ## xmm2 = [42,42,42,42]
	vpcmpeqd	%xmm2, %xmm0, %xmm0
	vpcmpgtd	%xmm2, %xmm1, %xmm1
	vpor	%xmm1, %xmm0, %xmm0
	retq
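As a sanity check on the example above, the following sketch models @b and @b2 on plain 4-lane boolean masks and confirms that the two forms agree on every input when no lane is poison. Mask4, selectTrueElse, orMask and formsAgreeOnAllInputs are hypothetical names for this illustration, and poison is deliberately not modeled.

```cpp
#include <array>
#include <cstddef>

// Four i1 lanes, modeled as plain booleans (poison is not modeled here).
using Mask4 = std::array<bool, 4>;

// Lane-wise select c1, true, c2 (the shape used in @b).
Mask4 selectTrueElse(const Mask4 &c1, const Mask4 &c2) {
  Mask4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = c1[i] ? true : c2[i];
  return r;
}

// Lane-wise or c1, c2 (the shape used in @b2).
Mask4 orMask(const Mask4 &c1, const Mask4 &c2) {
  Mask4 r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = c1[i] || c2[i];
  return r;
}

// Exhaustively compare both forms over all 16 x 16 pairs of masks.
bool formsAgreeOnAllInputs() {
  for (unsigned a = 0; a < 16; ++a) {
    for (unsigned b = 0; b < 16; ++b) {
      Mask4 c1{}, c2{};
      for (std::size_t i = 0; i < 4; ++i) {
        c1[i] = ((a >> i) & 1u) != 0;
        c2[i] = ((b >> i) & 1u) != 0;
      }
      if (selectTrueElse(c1, c2) != orMask(c1, c2))
        return false;
    }
  }
  return true;
}
```

Since the results match lane by lane, the longer blend sequence for @b is a missed optimization rather than a difference in behavior on defined inputs.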

add tests for vector operands

Thank you for the info..!
I'll check whether regressions occur on other targets as well and write patches for any I find.

The select forms of and/or are currently folded eagerly by InstCombine, so hopefully they'll look fine for now.

aqjune marked an inline comment as done. · Mar 1 2021, 8:27 AM

Would adding 'select -> and/or i1' transformation into the end of CodeGenPrepare::runOnFunction work?
I see that there are quite many targets that generate different assemblies for select vs. and/or i1.

spatel added inline comments. · Mar 1 2021, 8:58 AM

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
Line 1001: Assert that Op[0/1]->getType()->getScalarSizeInBits() == 1?
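A rough sketch of the suggested assertion, using a toy type in place of LLVM's Type. ToyType, isI1OrI1Vector and checkLogicalOperands are hypothetical names; only the method name getScalarSizeInBits is borrowed from the comment above, and here it simply returns the element width so that i1 and vectors of i1 are treated the same way.

```cpp
#include <cassert>

// Toy stand-in for a type: element width plus lane count (1 lane = scalar).
// This is an assumption for illustration, not LLVM's Type class.
struct ToyType {
  unsigned elementBits;
  unsigned numLanes;

  // Width of one element, ignoring how many lanes there are.
  unsigned getScalarSizeInBits() const { return elementBits; }
};

// True for i1 and for vectors of i1 such as <4 x i1>.
bool isI1OrI1Vector(const ToyType &t) { return t.getScalarSizeInBits() == 1; }

// The check suggested in the inline comment, applied to both operands.
void checkLogicalOperands(const ToyType &op0, const ToyType &op1) {
  assert(isI1OrI1Vector(op0) && isI1OrI1Vector(op1) &&
         "logical and/or operands must be i1 or vectors of i1");
}
```

Checking the scalar width rather than the whole type is what lets one assertion cover both the scalar and the vector tests discussed earlier in the review.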

In D97360#2594420, @aqjune wrote:

Would adding 'select -> and/or i1' transformation into the end of CodeGenPrepare::runOnFunction work?
I see that there are quite many targets that generate different assemblies for select vs. and/or i1.

No, I think we should adjust some code in DAGCombiner for this. I was already looking at it, so I can continue if you want to keep working on the IR side.

LGTM.

This revision is now accepted and ready to land. · Mar 1 2021, 9:00 AM

In D97360#2594461, @spatel wrote:

In D97360#2594420, @aqjune wrote:

Would adding 'select -> and/or i1' transformation into the end of CodeGenPrepare::runOnFunction work?
I see that there are quite many targets that generate different assemblies for select vs. and/or i1.

No, I think we should adjust some code in DAGCombiner for this. I was already looking at it, so I can continue if you want to keep working on the IR side.

LGTM.