Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
54037 | Why just limit to splat constant? |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
54037 | Also - why the AVX2 and hasOneUse limits? |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
54037 | Sorry - the hasOneUse is probably necessary to stop ping-ponging between negated constants |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
54037 |
That should be commented explicitly as these subtle checks to prevent infinite loops can easily be removed accidentally. Also why AVX2 still? | |
54037 |
| |
54037 |
Should be match on ImmConstant no? |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
54037 |
This is to limit to broadcasts, probably it is better to work with broadcast explicitly. I tried more general approach but it seems not much profitable. E.g. @@ -1,7 +1,7 @@ -; FMA3-LABEL: negated_constant_v4f64: ; FMA3: # %bb.0: ; FMA3-NEXT: vmovapd {{.*#+}} ymm2 = [-5.0E-1,-5.0E-1,-5.0E-1,-5.0E-1] -; FMA3-NEXT: vfmadd213pd {{.*#+}} ymm2 = (ymm0 * ymm2) + ymm1 -; FMA3-NEXT: vfmadd231pd {{.*#+}} ymm1 = (ymm0 * mem) + ymm1 -; FMA3-NEXT: vaddpd %ymm1, %ymm2, %ymm0 +; FMA3-NEXT: vmovapd %ymm2, %ymm3 +; FMA3-NEXT: vfmadd213pd {{.*#+}} ymm3 = (ymm0 * ymm3) + ymm1 +; FMA3-NEXT: vfnmadd213pd {{.*#+}} ymm2 = -(ymm0 * ymm2) + ymm1 +; FMA3-NEXT: vaddpd %ymm2, %ymm3, %ymm0 ; FMA3-NEXT: retq What do you think? | |
54037 |
AFAIK, m_* matching works with IR. Here we work with DAG. Am I missing something? | |
54037 |
Yes. The problem appears when each constant initially has several users, then we start ping-ponging between them. To solve it thoroughly we need to check whether a constant can be eliminated completely: take two constant vectors and check that all users are fmas. It seems out of combiner context to me, doesn't it? However, I see some code that checks instruction's users during combining, maybe, it is the right way. |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
53984–53985 | Do we need to check it again given it's the only condition checked in line 54073? | |
53986 | This doesn't explain why we need to iterate all its uses? | |
53988 | == is intended? Shouldn't it be covered by User->getOpcode() != ISD::FMA? | |
53996 | Maybe use cast? | |
53999–54000 | Any case can fall into here? | |
54008–54010 | Is the comment for line 54015? Why checking the use again? | |
54012 | ditto. | |
llvm/test/CodeGen/X86/fma-fneg-combine-2.ll | ||
132 | We can make the elements different now, right? |
Addressed comments.
Added comments.
Fixed negative value search in case of undefs.
Updated tests.
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
53984–53985 |
Dropped it. | |
53986 |
I added comments describing the approach. Generally we don't want to optimize anything if there are other non-FMA consumers of constant BUILD_VECTOR. Because it will be loaded anyway. So, we want to understand which one (inverted or original) can be eliminated. If both can be eliminated, we need to fix a particular vector because we don't have a context to understand which one is original and which one is inverted even among a single instruction operands. If there is a concern about complexity of this check, it can be replaced with simplier hasOneUse. It limits the scope though. | |
53996 |
| |
53999–54000 |
This is completely OK to have a constant vector with undef values, e.g <2 x double> <double undef, double undef>. There are some tests with such vectors. So, we can't use cast here. I assumed that during optimization we may obtain such the constant. | |
54008–54010 | I've described approach above. Here we check users of an inverted vector. |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
54178 | Couldn't both V and NV have negative values now that they aren't splats? |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
54178 | It seems to me that they can't. If V is <x0, x1, ..., xn> then NV is <-x0, -x1, ..., -xn>. So, V and NV always have elements of different sign. The corner case is V consisting only of undefs. But original and negated versions of such vector are the same. So, everything should be handled well. |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
54178 | The idea here is to prefer a vector that has the first negative operand. We don't care about other operands. A negative value can be only in V or NV not in both. But here is a pitfall that instead of FP value we may meet undef, That's why we iterate over operands until the first non-undef value is met. V and NV have undefs on same places. So, V and NV may contain both negative and positive values but we consider only the first non-undef operand. |
@pengfei yes, I'll appreciate it! I wanted additionally check on SPEC: there are minor binary size reductions and several cases where this combine applies -- eliminated broadcasts as well as using a register operand instead of memory.
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
54514 | Indeed :) | |
54517 | I don't think that any_of is much better here. We need to stop traversing after the first non-undef value (this is controlled by return value of the lambda) and return true or false (using a variable from context). Then we generally ignore return result of any_of and use only the variable with the result. It looks artificial to me. Probably I'm missing something. Added an extra comment here. |
llvm/lib/Target/X86/X86ISelLowering.cpp | ||
---|---|---|
54517 | Oops. I think I missed the break statement. |
Do we need to check it again given it's the only condition checked in line 54073?