This is an archive of the discontinued LLVM Phabricator instance.

[X86] Combine constant vector inputs for FMA
ClosedPublic

Authored by e-kud on Mar 20 2023, 7:49 PM.

Diff Detail

Event Timeline

e-kud created this revision.Mar 20 2023, 7:49 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 20 2023, 7:49 PM
e-kud updated this revision to Diff 507565.Mar 22 2023, 5:55 PM

Limit to constant splat vectors.

e-kud updated this revision to Diff 508868.Mar 27 2023, 7:05 PM

Rebased

e-kud published this revision for review.Mar 27 2023, 7:09 PM
e-kud added reviewers: pengfei, goldstein.w.n, RKSimon.
Herald added a project: Restricted Project. · View Herald TranscriptMar 27 2023, 7:09 PM
pengfei added inline comments.Mar 27 2023, 7:36 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
54037

Why just limit to splat constant?

RKSimon added inline comments.Mar 28 2023, 3:49 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
54037

Also - why the AVX2 and hasOneUse limits?

RKSimon added inline comments.Mar 28 2023, 3:51 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
54037

Sorry - the hasOneUse is probably necessary to stop ping-ponging between negated constants

goldstein.w.n added inline comments.Mar 28 2023, 10:20 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
54037

Sorry - the hasOneUse is probably necessary to stop ping-ponging between negated constants

That should be commented explicitly as these subtle checks to prevent infinite loops can easily be removed accidentally.

Also why AVX2 still?

54037

Also - why the AVX2 and hasOneUse limits?

54037

Why just limit to splat constant?

Should be match on ImmConstant no?

e-kud added inline comments.Mar 28 2023, 7:02 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
54037

Also - why the AVX2 and hasOneUse limits?

Why just limit to splat constant?

This is to limit to broadcasts, probably it is better to work with broadcast explicitly.

I tried more general approach but it seems not much profitable. E.g.

@@ -1,7 +1,7 @@
-; FMA3-LABEL: negated_constant_v4f64:
 ; FMA3:       # %bb.0:
 ; FMA3-NEXT:    vmovapd {{.*#+}} ymm2 = [-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1]
-; FMA3-NEXT:    vfmadd213pd {{.*#+}} ymm2 = (ymm0 * ymm2) + ymm1
-; FMA3-NEXT:    vfmadd231pd {{.*#+}} ymm1 = (ymm0 * mem) + ymm1
-; FMA3-NEXT:    vaddpd %ymm1, %ymm2, %ymm0
+; FMA3-NEXT:    vmovapd %ymm2, %ymm3
+; FMA3-NEXT:    vfmadd213pd {{.*#+}} ymm3 = (ymm0 * ymm3) + ymm1
+; FMA3-NEXT:    vfnmadd213pd {{.*#+}} ymm2 = -(ymm0 * ymm2) + ymm1
+; FMA3-NEXT:    vaddpd %ymm2, %ymm3, %ymm0
 ; FMA3-NEXT:    retq

What do you think?

54037

Should be match on ImmConstant no?

AFAIK, m_* matching works with IR. Here we work with DAG. Am I missing something?

54037

Sorry - the hasOneUse is probably necessary to stop ping-ponging between negated constants

Yes. The problem appears when each constant initially has several users, then we start ping-ponging between them. To solve it thoroughly we need to check whether a constant can be eliminated completely: take two constant vectors and check that all users are fmas. It seems out of combiner context to me, doesn't it? However, I see some code that checks instruction's users during combining, maybe, it is the right way.

e-kud updated this revision to Diff 509881.Mar 30 2023, 7:30 PM

Addressed the comments.

pengfei added inline comments.Mar 30 2023, 7:58 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
53984–53985

Do we need to check it again given it's the only condition checked in line 54073?

53986

This doesn't explain why we need to iterate all its uses?

53988

== is intended? Shouldn't it be covered by User->getOpcode() != ISD::FMA?

53996

Maybe use cast?

53999–54000

Any case can fall into here?

54008–54010

Is the comment for line 54015? Why checking the use again?

54012

ditto.

llvm/test/CodeGen/X86/fma-fneg-combine-2.ll
132–133

We can make the elements different now, right?

e-kud updated this revision to Diff 510166.Mar 31 2023, 8:37 PM
e-kud marked 7 inline comments as done.

Addressed comments.
Added comments.
Fixed negative value search in case of undefs.
Updated tests.

e-kud added inline comments.Mar 31 2023, 8:39 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
53984–53985

Do we need to check it again given it's the only condition checked in line 54073?

Dropped it.

53986

This doesn't explain why we need to iterate all its uses?

I added comments describing the approach. Generally we don't want to optimize anything if there are other non-FMA consumers of constant BUILD_VECTOR. Because it will be loaded anyway. So, we want to understand which one (inverted or original) can be eliminated. If both can be eliminated, we need to fix a particular vector because we don't have a context to understand which one is original and which one is inverted even among a single instruction operands.

If there is a concern about complexity of this check, it can be replaced with simplier hasOneUse. It limits the scope though.

53996

Maybe use cast?

53999–54000

Maybe use cast?

Any case can fall into here?

This is completely OK to have a constant vector with undef values, e.g <2 x double> <double undef, double undef>. There are some tests with such vectors. So, we can't use cast here. I assumed that during optimization we may obtain such the constant.

54008–54010

I've described approach above. Here we check users of an inverted vector.

pengfei accepted this revision.Mar 31 2023, 10:57 PM

Thanks @e-kud, LGTM.

This revision is now accepted and ready to land.Mar 31 2023, 10:57 PM
e-kud requested review of this revision.Apr 7 2023, 5:33 PM
RKSimon added inline comments.Apr 13 2023, 8:16 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
54178

Couldn't both V and NV have negative values now that they aren't splats?

e-kud marked an inline comment as done.Apr 14 2023, 8:04 PM
e-kud added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
54178

It seems to me that they can't. If V is <x0, x1, ..., xn> then NV is <-x0, -x1, ..., -xn>. So, V and NV always have elements of different sign. The corner case is V consisting only of undefs. But original and negated versions of such vector are the same. So, everything should be handled well.

goldstein.w.n added inline comments.Apr 25 2023, 4:46 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
54173

persistently

54178

But couldn't both V and NV contain some negative and some positive values?

e-kud marked 8 inline comments as done.Apr 26 2023, 7:11 PM
e-kud added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
54178

The idea here is to prefer a vector that has the first negative operand. We don't care about other operands. A negative value can be only in V or NV not in both.

But here is a pitfall that instead of FP value we may meet undef, That's why we iterate over operands until the first non-undef value is met. V and NV have undefs on same places.

So, V and NV may contain both negative and positive values but we consider only the first non-undef operand.

e-kud updated this revision to Diff 521700.May 12 2023, 10:02 AM
e-kud marked an inline comment as done.

Rebase. Spelling.

e-kud marked 2 inline comments as done.May 12 2023, 10:05 AM

@e-kud do you want me to commit if for you? Or you are waiting for another approval?

e-kud added a comment.May 15 2023, 5:11 PM

@e-kud do you want me to commit if for you? Or you are waiting for another approval?

@pengfei yes, I'll appreciate it! I wanted additionally check on SPEC: there are minor binary size reductions and several cases where this combine applies -- eliminated broadcasts as well as using a register operand instead of memory.

craig.topper added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
54487

Can we use llvm::any_of or !llvm::all_of with a lambda?

54495

Capitalize op

54510

Can we use llvm::any_of or !llvm::all_of with a lambda?

54514

"consistently" is probably a better word than "peristently"

54517

llvm::any_of

e-kud updated this revision to Diff 522410.May 15 2023, 7:23 PM

Addressed the comments.

e-kud marked 5 inline comments as done.May 15 2023, 7:27 PM
e-kud added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
54514

Indeed :)

54517

I don't think that any_of is much better here. We need to stop traversing after the first non-undef value (this is controlled by return value of the lambda) and return true or false (using a variable from context). Then we generally ignore return result of any_of and use only the variable with the result. It looks artificial to me. Probably I'm missing something.

Added an extra comment here.

craig.topper added inline comments.May 15 2023, 7:37 PM
llvm/lib/Target/X86/X86ISelLowering.cpp
54517

Oops. I think I missed the break statement.

This revision was not accepted when it landed; it landed in state Needs Review.May 17 2023, 12:21 AM
This revision was automatically updated to reflect the committed changes.