This is an archive of the discontinued LLVM Phabricator instance.

[VectorCombine] scalarize compares with insertelement operand(s)
ClosedPublic

Authored by spatel on Jun 11 2020, 7:52 AM.

Details

Summary

Generalize scalarization (recently enhanced with D80885) to allow compares as well as binops.
Similar to binops, we are avoiding scalarization of a loaded value because that could avoid a register transfer in codegen.
This requires 1 extra predicate that I am aware of: we do not want to scalarize the condition value of a vector select. That might also invert a transform that we do in instcombine that prefers a vector condition operand for a vector select.

I think this is the final step in solving PR37463:
https://bugs.llvm.org/show_bug.cgi?id=37463

Diff Detail

Event Timeline

spatel created this revision.Jun 11 2020, 7:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 11 2020, 7:52 AM
lebedev.ri added inline comments.Jun 13 2020, 9:59 AM
llvm/lib/Transforms/Vectorize/VectorCombine.cpp
326–327

This needs more words i'd say.

https://godbolt.org/z/4tqgwP
I believe the problem we are preventing is the differences in bit patterns for true/false between scalar and vector comparisons.

And there are other variants on other targets. (there's enum, can't find it right now)

Can we just model that new extra cost?

lebedev.ri requested changes to this revision.Jun 16 2020, 5:02 AM
This revision now requires changes to proceed.Jun 16 2020, 5:02 AM
spatel marked an inline comment as done.Jun 16 2020, 7:39 AM
spatel added inline comments.
llvm/lib/Transforms/Vectorize/VectorCombine.cpp
326–327

Yes, the bool format (enum BooleanContent in SDAG) is one problem. The other is the potential transfer between register files as the x86 example demonstrates.

IMO, this is similar to the existing bailout for loaded operands below here. We can try to improve the cost model to deal with larger patterns, but the ability to handle this situation doesn't exist currently AFAIK. We don't have a way to specify the combined cost of the not-yet-created scalar op + insertelt with existing users of the vector op.

spatel updated this revision to Diff 271097.Jun 16 2020, 7:47 AM

Patch updated:
Added comments to better explain the select-user bailout and TODO for possible cost model improvement.

This revision is now accepted and ready to land.Jun 16 2020, 8:18 AM
This revision was automatically updated to reflect the committed changes.

Vector combine dies on assertion with this test case.

; opt -vector-combine t.ll -S

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%struct.s = type { i8* }

; Function Attrs: nounwind uwtable
define <4 x i1> @test(%struct.s** nocapture readonly %t1) #0 {

%t5 = insertelement <4 x %struct.s**> undef, %struct.s** %t1, i32 0
%t6 = icmp ne <4 x %struct.s**> %t5, zeroinitializer
ret <4 x i1> %t6

}

attributes #0 = { nounwind uwtable }

Vector combine dies on assertion with this test case.

; opt -vector-combine t.ll -S

target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

%struct.s = type { i8* }

; Function Attrs: nounwind uwtable
define <4 x i1> @test(%struct.s** nocapture readonly %t1) #0 {

%t5 = insertelement <4 x %struct.s**> undef, %struct.s** %t1, i32 0
%t6 = icmp ne <4 x %struct.s**> %t5, zeroinitializer
ret <4 x i1> %t6

}

attributes #0 = { nounwind uwtable }

Thanks - seems like the assert just became too strict for compares, so I changed it here:
rG741e20f3d61