This is an archive of the discontinued LLVM Phabricator instance.

[TTI] Consider select form of and/or i1 as having arithmetic cost
ClosedPublic

Authored by aqjune on Feb 23 2021, 11:11 PM.

Details

Summary

This is a patch that updates the cost of select i1 a, b, false to be equivalent to that of and i1 a, b
as well as the cost of select i1 a, true, b equivalent to or i1 a, b.

Until now, these selects were folded into and/or i1 by InstCombine, but the transformation is poison-unsafe.
This is a step towards removing the unsafe transformation. D93065 has relevant transformations linked.
These selects should be translated into the assemblies as and/or i1 do in the same manner. The cost should be equivalent.

Diff Detail

Event Timeline

aqjune requested review of this revision.Feb 23 2021, 11:11 PM
aqjune created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 23 2021, 11:11 PM

This patch has tests missing, and I'd like to add these. Where should I go to do this?

lebedev.ri added a subscriber: lebedev.ri.

This patch has tests missing, and I'd like to add these. Where should I go to do this?

llvm/test/Analysis/CostModel/

aqjune updated this revision to Diff 326260.Feb 24 2021, 6:23 PM

Thanks!
Tests added

aqjune updated this revision to Diff 326263.Feb 24 2021, 6:37 PM

Update tests to test throughput and size both

aqjune updated this revision to Diff 326264.Feb 24 2021, 6:39 PM

Clang-format

spatel added a comment.Mar 1 2021, 7:39 AM

I can't tell from the tests (so we should probably add at least 1 to confirm) - do we expect the same behavior for vector types?

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
997

It would be nice to have a code comment here to show the expected patterns:

// select x, y, false --> x & y
// select x, true, y --> x | y
aqjune added a comment.Mar 1 2021, 7:41 AM

Yes, vectors are expected to have the same cost. I'll update tests for them.

spatel added a comment.Mar 1 2021, 8:03 AM

Yes, vectors are expected to have the same cost. I'll update tests for them.

Thanks - IIUC, we're also going to need to some codegen changes to avoid regressions. For example (the icmps are to avoid diffs due to lack of signext/zeroext specifiers on parameters with vector type):

define <4 x i1> @b(<4 x i32> %x, <4 x i32> %y) {
  %c1 = icmp eq <4 x i32> %x, <i32 42, i32 42, i32 42, i32 42>
  %c2 = icmp sgt <4 x i32> %y, <i32 42, i32 42, i32 42, i32 42>
  %s = select <4 x i1> %c1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i1> %c2
  ret <4 x i1> %s
}
define <4 x i1> @b2(<4 x i32> %x, <4 x i32> %y) {
  %c1 = icmp eq <4 x i32> %x, <i32 42, i32 42, i32 42, i32 42>
  %c2 = icmp sgt <4 x i32> %y, <i32 42, i32 42, i32 42, i32 42>
  %s = or <4 x i1> %c1, %c2
  ret <4 x i1> %s
}
$ llc -o - logical.ll -mattr=avx -mtriple=x86_64--
_b:                                     ## @b
	vmovdqa	LCPI2_0(%rip), %xmm2            ## xmm2 = [42,42,42,42]
	vpcmpeqd	%xmm2, %xmm0, %xmm0
	vpcmpgtd	%xmm2, %xmm1, %xmm1
	vblendvps	%xmm0, LCPI2_1(%rip), %xmm1, %xmm0
	retq
_b2:                                    ## @b2
	vmovdqa	LCPI3_0(%rip), %xmm2            ## xmm2 = [42,42,42,42]
	vpcmpeqd	%xmm2, %xmm0, %xmm0
	vpcmpgtd	%xmm2, %xmm1, %xmm1
	vpor	%xmm1, %xmm0, %xmm0
	retq
aqjune updated this revision to Diff 327125.Mar 1 2021, 8:21 AM

add tests for vector operands

aqjune added a comment.Mar 1 2021, 8:26 AM

Thanks - IIUC, we're also going to need to some codegen changes to avoid regressions. For example (the icmps are to avoid diffs due to lack of signext/zeroext specifiers on parameters with vector type):

define <4 x i1> @b(<4 x i32> %x, <4 x i32> %y) {
  %c1 = icmp eq <4 x i32> %x, <i32 42, i32 42, i32 42, i32 42>
  %c2 = icmp sgt <4 x i32> %y, <i32 42, i32 42, i32 42, i32 42>
  %s = select <4 x i1> %c1, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i1> %c2
  ret <4 x i1> %s
}
define <4 x i1> @b2(<4 x i32> %x, <4 x i32> %y) {
  %c1 = icmp eq <4 x i32> %x, <i32 42, i32 42, i32 42, i32 42>
  %c2 = icmp sgt <4 x i32> %y, <i32 42, i32 42, i32 42, i32 42>
  %s = or <4 x i1> %c1, %c2
  ret <4 x i1> %s
}
$ llc -o - logical.ll -mattr=avx -mtriple=x86_64--
_b:                                     ## @b
	vmovdqa	LCPI2_0(%rip), %xmm2            ## xmm2 = [42,42,42,42]
	vpcmpeqd	%xmm2, %xmm0, %xmm0
	vpcmpgtd	%xmm2, %xmm1, %xmm1
	vblendvps	%xmm0, LCPI2_1(%rip), %xmm1, %xmm0
	retq
_b2:                                    ## @b2
	vmovdqa	LCPI3_0(%rip), %xmm2            ## xmm2 = [42,42,42,42]
	vpcmpeqd	%xmm2, %xmm0, %xmm0
	vpcmpgtd	%xmm2, %xmm1, %xmm1
	vpor	%xmm1, %xmm0, %xmm0
	retq

Thank you for the info..!
I'll check whether regression happens with different targets as well and make patches if any.

select form of and/or operations are eagerly folded by InstCombine currently, so they'll look good hopefully.

aqjune marked an inline comment as done.Mar 1 2021, 8:27 AM
aqjune added a comment.Mar 1 2021, 8:51 AM

Would adding 'select -> and/or i1' transformation into the end of CodeGenPrepare::runOnFunction work?
I see that there are quite many targets that generate different assemblies for select vs. and/or i1.

spatel added inline comments.Mar 1 2021, 8:58 AM
llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
1001

Assert that Op[0/1]->getType()->getScalarSizeInBits() == 1 ?

spatel accepted this revision.Mar 1 2021, 9:00 AM

Would adding 'select -> and/or i1' transformation into the end of CodeGenPrepare::runOnFunction work?
I see that there are quite many targets that generate different assemblies for select vs. and/or i1.

No, I think we should adjust some code in DAGCombiner for this. I was already looking at it, so I can continue if you want to keep working on the IR side.

LGTM.

This revision is now accepted and ready to land.Mar 1 2021, 9:00 AM
aqjune added a comment.Mar 1 2021, 9:04 AM

Would adding 'select -> and/or i1' transformation into the end of CodeGenPrepare::runOnFunction work?
I see that there are quite many targets that generate different assemblies for select vs. and/or i1.

No, I think we should adjust some code in DAGCombiner for this. I was already looking at it, so I can continue if you want to keep working on the IR side.

LGTM.

This is great, thank you!

This revision was landed with ongoing or failed builds.Mar 1 2021, 9:18 AM
This revision was automatically updated to reflect the committed changes.
aqjune marked an inline comment as done.Mar 1 2021, 9:27 AM