This is an archive of the discontinued LLVM Phabricator instance.

Differential D66095

[InstCombine] canonicalize a scalar-select-of-vectors to vector select
Closed, Public

Authored by spatel on Aug 12 2019, 10:26 AM.

Details

Reviewers

vporpo
ABataev
dtemirbulatov
efriedma
lebedev.ri
nikic

Commits

rG39eb2324f7ec: [InstCombine] canonicalize a scalar-select-of-vectors to vector select
rL369140: [InstCombine] canonicalize a scalar-select-of-vectors to vector select

Summary

This pattern may arise with an enhancement to SLP vectorization suggested in PR42755:
https://bugs.llvm.org/show_bug.cgi?id=42755

For all in-tree targets that I looked at, codegen looks better when we change to a vector select, so I think this is safe to do without a cost model (in other words, as a target-independent canonicalization).

For example, if the condition of the select is a scalar, we end up with something like this on x86:

	vpcmpgtd	%xmm0, %xmm1, %xmm0
	vpextrb	$12, %xmm0, %eax
	testb	$1, %al
	jne	LBB0_2
## %bb.1:
	vmovaps	%xmm3, %xmm2
LBB0_2:
	vmovaps	%xmm2, %xmm0

Rather than the splat-condition variant:

	vpcmpgtd	%xmm0, %xmm1, %xmm0
	vpshufd	$255, %xmm0, %xmm0      ## xmm0 = xmm0[3,3,3,3]
	vblendvps	%xmm0, %xmm2, %xmm3, %xmm0

Diff Detail

Repository: rL LLVM

Event Timeline

spatel created this revision. Aug 12 2019, 10:26 AM

Herald added a project: Restricted Project. Aug 12 2019, 10:26 AM

Herald added subscribers: hiraditya, mcrosier.

spatel added reviewers: efriedma, lebedev.ri, nikic. Aug 13 2019, 9:16 AM

The codegen improvement is really a workaround for a bug in X86TargetLowering::LowerSELECT(), not dissimilar to https://bugs.llvm.org/show_bug.cgi?id=42903
It never even tries to lower the integer vector select to a blend; it produces a CMOV instead.

Optimized type-legalized selection DAG: %bb.0 'extract_cond:'
SelectionDAG has 17 nodes:
  t0: ch = EntryToken
            t6: v4i32,ch = CopyFromReg t0, Register:v4i32 %2
          t21: v16i8 = bitcast t6
        t23: i8 = extract_vector_elt t21, Constant:i64<12>
      t20: i8 = and t23, Constant:i8<1>
      t2: v4i32,ch = CopyFromReg t0, Register:v4i32 %0
      t4: v4i32,ch = CopyFromReg t0, Register:v4i32 %1
    t11: v4i32 = select t20, t2, t4
  t14: ch,glue = CopyToReg t0, Register:v4i32 $xmm0, t11
  t15: ch = X86ISD::RET_FLAG t14, TargetConstant:i32<0>, Register:v4i32 $xmm0, t14:1



Legalizing vector op: t11: v4i32 = select t20, t2, t4
Trying custom legalization
Creating constant: t24: i8 = Constant<5>
Creating constant: t25: i8 = Constant<0>
Creating new node: t26: i32 = X86ISD::CMP t20, Constant:i8<0>
Creating new node: t27: v4i32 = X86ISD::CMOV t4, t2, Constant:i8<5>, t26
Successfully custom legalized node
Vector-legalized selection DAG: %bb.0 'extract_cond:'
SelectionDAG has 20 nodes:
  t0: ch = EntryToken
      t4: v4i32,ch = CopyFromReg t0, Register:v4i32 %1
      t2: v4i32,ch = CopyFromReg t0, Register:v4i32 %0
              t6: v4i32,ch = CopyFromReg t0, Register:v4i32 %2
            t21: v16i8 = bitcast t6
          t23: i8 = extract_vector_elt t21, Constant:i64<12>
        t20: i8 = and t23, Constant:i8<1>
      t26: i32 = X86ISD::CMP t20, Constant:i8<0>
    t27: v4i32 = X86ISD::CMOV t4, t2, Constant:i8<5>, t26
  t14: ch,glue = CopyToReg t0, Register:v4i32 $xmm0, t27
  t15: ch = X86ISD::RET_FLAG t14, TargetConstant:i32<0>, Register:v4i32 $xmm0, t14:1

I'm honestly not sure here whether I would consider splat-of-i1 or i1 more canonical;
I would kind of have guessed i1, since it is a single bit while the vector isn't.

In D66095#1627455, @lebedev.ri wrote:

I'm honestly not sure here whether I would consider splat-of-i1 or i1 more canonical;
I would kind of have guessed i1, since it is a single bit while the vector isn't.

Yes, I can see that argument. However, if we consider the i1 canonical, then we need to reverse this patch + add the codegen fixup + make sure that IR transforms based on vector splat patterns can match that potential variation of the pattern. That seems like a lot more work than what we have here: try to convert a select with vector true/false operands to also have a vector condition because we know that plays nicely with vector code in general.

In D66095#1627478, @spatel wrote:

Yes, I can see that argument.

However, if we consider the i1 canonical, then we need to reverse this patch +

True.

add the codegen fixup

To be noted, that codegen fix is wanted regardless.

+ make sure that IR transforms based on vector splat patterns can match that potential variation of the pattern.

I don't have a good enough grasp of this to guess what those would be.

That seems like a lot more work than what we have here: try to convert a select with vector true/false operands to also have a vector condition because we know that plays nicely with vector code in general.

I'd just point out that this is only the case because we don't currently do the opposite transform (select based on splat -> select-of-vecs) in DAGCombine.
But yeah, I suspect this may be more canonical; e.g. on x86 there is no i128 blend, only <? X ?> blends.

Looks OK to me, but it would be good for someone else to double-check.

This revision is now accepted and ready to land. Aug 14 2019, 2:46 PM

What about very long vectors that do not fit into a single register? Is it cost-effective for such vectors too?

In D66095#1630408, @ABataev wrote:

What about very long vectors that do not fit into a single register? Is it cost-effective for such vectors too?

We would split the long vectors into values that fit the target registers in the backend. At that point, the target can decide if N vector selects are better or worse than a transfer to scalar compare and branch. As @lebedev.ri mentioned, we're missing that logic in SDAG, but given that this transform produces the better code for the default case, I don't think we need to make this patch dependent on backend fixups.

In D66095#1631495, @spatel wrote:

We would split the long vectors into values that fit the target registers in the backend. At that point, the target can decide if N vector selects are better or worse than a transfer to scalar compare and branch. As @lebedev.ri mentioned, we're missing that logic in SDAG, but given that this transform produces the better code for the default case, I don't think we need to make this patch dependent on backend fixups.

But it adds extra cost for vector splitting. Maybe limit the size of the vectors in the patch?

In D66095#1631501, @ABataev wrote:

But it adds extra cost for vector splitting. Maybe limit the size of the vectors in the patch?

How would we do that? This is canonicalization, so we are not using any target-dependent information here. Presumably, whoever or whatever created the illegal vectors in the first place knows that codegen will have to alter those ops to create legal code, so that will be handled by a pass that has a cost model.

In D66095#1631524, @spatel wrote:

How would we do that? This is canonicalization, so we are not using any target-dependent information here. Presumably, whoever or whatever created the illegal vectors in the first place knows that codegen will have to alter those ops to create legal code, so that will be handled by a pass that has a cost model.

It means that you can make a transformation that may be less cost-effective than the original code.

In D66095#1631547, @ABataev wrote:

It means that you can make a transformation that may be less cost-effective than the original code.

That's correct. As with much of instcombine, this is not proposed as an optimization, just a canonicalization. If there's reason to believe that this will induce more perf regressions than wins, then we should prepare the backend to reverse the transform in advance. But as I wrote before, I don't see that happening for the common cases/targets that I looked at.

In D66095#1631547, @ABataev wrote:

It means that you can make a transformation that may be less cost-effective than the original code.

I agree with @spatel: there is no TTI in InstCombine, nor should there be.

In general, I do think this fold is the opposite of the correct canonicalization,
but as @spatel notes, it clearly produces better results as of this moment.
Things can be readjusted later.

Closed by commit rL369140: [InstCombine] canonicalize a scalar-select-of-vectors to vector select (authored by spatel). Aug 16 2019, 11:50 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                               Size
llvm/trunk/lib/Transforms/InstCombine/InstCombineSelect.cpp        27 lines
llvm/trunk/test/Transforms/InstCombine/select-extractelement.ll    14 lines

Diff 215650

llvm/trunk/lib/Transforms/InstCombine/InstCombineSelect.cpp

... (1,690 unchanged lines hidden; enclosing context: if (Elt->isOneValue()) {)
       return nullptr;
     }
   }

   return new ShuffleVectorInst(SI.getTrueValue(), SI.getFalseValue(),
                                ConstantVector::get(Mask));
 }

+/// If we have a select of vectors with a scalar condition, try to convert that
+/// to a vector select by splatting the condition. A splat may get folded with
+/// other operations in IR and having all operands of a select be vector types
+/// is likely better for vector codegen.
+static Instruction *canonicalizeScalarSelectOfVecs(
+    SelectInst &Sel, InstCombiner::BuilderTy &Builder) {
+  Type *Ty = Sel.getType();
+  if (!Ty->isVectorTy())
+    return nullptr;
+
+  // We can replace a single-use extract with constant index.
+  Value *Cond = Sel.getCondition();
+  if (!match(Cond, m_OneUse(m_ExtractElement(m_Value(), m_ConstantInt()))))
+    return nullptr;
+
+  // select (extelt V, Index), T, F --> select (splat V, Index), T, F
+  // Splatting the extracted condition reduces code (we could directly create a
+  // splat shuffle of the source vector to eliminate the intermediate step).
+  unsigned NumElts = Ty->getVectorNumElements();
+  Value *SplatCond = Builder.CreateVectorSplat(NumElts, Cond);
+  Sel.setCondition(SplatCond);
+  return &Sel;
+}
+
 /// Reuse bitcasted operands between a compare and select:
 /// select (cmp (bitcast C), (bitcast D)), (bitcast' C), (bitcast' D) -->
 /// bitcast (select (cmp (bitcast C), (bitcast D)), (bitcast C), (bitcast D))
 static Instruction *foldSelectCmpBitcasts(SelectInst &Sel,
                                           InstCombiner::BuilderTy &Builder) {
   Value *Cond = Sel.getCondition();
   Value *TVal = Sel.getTrueValue();
   Value *FVal = Sel.getFalseValue();
... (280 unchanged lines hidden; enclosing context: Instruction *InstCombiner::visitSelectInst(SelectInst &SI) {)

   if (Value *V = SimplifySelectInst(CondVal, TrueVal, FalseVal,
                                     SQ.getWithInstruction(&SI)))
     return replaceInstUsesWith(SI, V);

   if (Instruction *I = canonicalizeSelectToShuffle(SI))
     return I;

+  if (Instruction *I = canonicalizeScalarSelectOfVecs(SI, Builder))
+    return I;
+
   // Canonicalize a one-use integer compare with a non-canonical predicate by
   // inverting the predicate and swapping the select operands. This matches a
   // compare canonicalization for conditional branches.
   // TODO: Should we do the same for FP compares?
   CmpInst::Predicate Pred;
   if (match(CondVal, m_OneUse(m_ICmp(Pred, m_Value(), m_Value()))) &&
       !isCanonicalPredicate(Pred)) {
     // Swap true/false values and condition.
... (441 unchanged lines hidden to end of file)

llvm/trunk/test/Transforms/InstCombine/select-extractelement.ll

... (140 unchanged lines hidden; enclosing context: entry:)
   %a.sink3 = select i1 %tobool11, <4 x float> %a, <4 x float> %b
   %10 = extractelement <4 x float> %a.sink3, i32 3
   %11 = insertelement <4 x float> %8, float %10, i32 3
   ret <4 x float> %11
 }

 define <4 x i32> @extract_cond(<4 x i32> %x, <4 x i32> %y, <4 x i1> %condv) {
 ; CHECK-LABEL: @extract_cond(
-; CHECK-NEXT: [[COND:%.*]] = extractelement <4 x i1> [[CONDV:%.*]], i32 3
-; CHECK-NEXT: [[R:%.*]] = select i1 [[COND]], <4 x i32> [[X:%.*]], <4 x i32> [[Y:%.*]]
+; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <4 x i1> [[CONDV:%.*]], <4 x i1> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
+; CHECK-NEXT: [[R:%.*]] = select <4 x i1> [[DOTSPLAT]], <4 x i32> [[X:%.*]], <4 x i32> [[Y:%.*]]
 ; CHECK-NEXT: ret <4 x i32> [[R]]
 ;
   %cond = extractelement <4 x i1> %condv, i32 3
   %r = select i1 %cond, <4 x i32> %x, <4 x i32> %y
   ret <4 x i32> %r
 }

 define <4 x i32> @splat_cond(<4 x i32> %x, <4 x i32> %y, <4 x i1> %condv) {
 ; CHECK-LABEL: @splat_cond(
 ; CHECK-NEXT: [[SPLATCOND:%.*]] = shufflevector <4 x i1> [[CONDV:%.*]], <4 x i1> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
 ; CHECK-NEXT: [[R:%.*]] = select <4 x i1> [[SPLATCOND]], <4 x i32> [[X:%.*]], <4 x i32> [[Y:%.*]]
 ; CHECK-NEXT: ret <4 x i32> [[R]]
 ;
   %splatcond = shufflevector <4 x i1> %condv, <4 x i1> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3>
   %r = select <4 x i1> %splatcond, <4 x i32> %x, <4 x i32> %y
   ret <4 x i32> %r
 }

 declare void @extra_use(i1)

+; Negative test
+
 define <4 x i32> @extract_cond_extra_use(<4 x i32> %x, <4 x i32> %y, <4 x i1> %condv) {
 ; CHECK-LABEL: @extract_cond_extra_use(
 ; CHECK-NEXT: [[COND:%.*]] = extractelement <4 x i1> [[CONDV:%.*]], i32 3
 ; CHECK-NEXT: call void @extra_use(i1 [[COND]])
 ; CHECK-NEXT: [[R:%.*]] = select i1 [[COND]], <4 x i32> [[X:%.*]], <4 x i32> [[Y:%.*]]
 ; CHECK-NEXT: ret <4 x i32> [[R]]
 ;
   %cond = extractelement <4 x i1> %condv, i32 3
   call void @extra_use(i1 %cond)
   %r = select i1 %cond, <4 x i32> %x, <4 x i32> %y
   ret <4 x i32> %r
 }

+; Negative test
+
 define <4 x i32> @extract_cond_variable_index(<4 x i32> %x, <4 x i32> %y, <4 x i1> %condv, i32 %index) {
 ; CHECK-LABEL: @extract_cond_variable_index(
 ; CHECK-NEXT: [[COND:%.*]] = extractelement <4 x i1> [[CONDV:%.*]], i32 [[INDEX:%.*]]
 ; CHECK-NEXT: [[R:%.*]] = select i1 [[COND]], <4 x i32> [[X:%.*]], <4 x i32> [[Y:%.*]]
 ; CHECK-NEXT: ret <4 x i32> [[R]]
 ;
   %cond = extractelement <4 x i1> %condv, i32 %index
   %r = select i1 %cond, <4 x i32> %x, <4 x i32> %y
   ret <4 x i32> %r
 }

+; IR shuffle can alter the number of elements in the vector, so this is ok.
+
 define <4 x i32> @extract_cond_type_mismatch(<4 x i32> %x, <4 x i32> %y, <5 x i1> %condv) {
 ; CHECK-LABEL: @extract_cond_type_mismatch(
-; CHECK-NEXT: [[COND:%.*]] = extractelement <5 x i1> [[CONDV:%.*]], i32 1
-; CHECK-NEXT: [[R:%.*]] = select i1 [[COND]], <4 x i32> [[X:%.*]], <4 x i32> [[Y:%.*]]
+; CHECK-NEXT: [[DOTSPLAT:%.*]] = shufflevector <5 x i1> [[CONDV:%.*]], <5 x i1> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1>
+; CHECK-NEXT: [[R:%.*]] = select <4 x i1> [[DOTSPLAT]], <4 x i32> [[X:%.*]], <4 x i32> [[Y:%.*]]
 ; CHECK-NEXT: ret <4 x i32> [[R]]
 ;
   %cond = extractelement <5 x i1> %condv, i32 1
   %r = select i1 %cond, <4 x i32> %x, <4 x i32> %y
   ret <4 x i32> %r
 }


 attributes #0 = { nounwind ssp uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "stack-protector-buffer-size"="8" "unsafe-fp-math"="false" "use-soft-float"="false" }