This is an archive of the discontinued LLVM Phabricator instance.

Expand constrained FP operations
ClosedPublic

Authored by cameron.mcinally on May 29 2018, 11:50 AM.

Details

Diff Detail

Event Timeline

Corrected a small whitespace problem.

Updated this patch to handle a constrained vector POWI. The arguments to that function do not have a uniform type.

I have not included a test case for POWI since the intrinsic declaration also needs to be updated to handle vectors. That is a separate issue and probably deserves its own Diff.

Updated this patch to handle a constrained vector POWI. The arguments to that function do not have a uniform type.

I have not included a test case for POWI since the intrinsic declaration also needs to be updated to handle vectors. That is a separate issue and probably deserves its own Diff.

The LLVM LangRef states that the POWI intrinsic always takes a scalar i32 as the 2nd argument, even when the 1st argument is a vector. That seems wrong, but I'll save the llvm-dev discussion for another time.

I'll have to special case the POWI expansion in light of this...

Apologies for the churn. I've left out POWI for now since it's such an oddball. Will handle it in another Diff.

I'll let this Diff sit for review now...

I'm adding Craig Topper as a reviewer because he knows the vector selection DAG stuff better than I do. The constrained FP handling looks OK to me.

test/CodeGen/X86/vector-constrained-fp-intrinsics.ll
2

I'm guessing you copied these run lines from somewhere else. It doesn't look like you need multiple runs or the extra check prefixes.

6

I think this check needs to do more than this. You should be verifying the complete expansion.

Also, can you add checks for some other instructions. Some of these won't require expansion, right?

craig.topper added inline comments.Jun 7 2018, 12:16 PM
lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
1134

This makes an ArrayRef to a temporary std::initializer_list. This is very unsafe.

You just want

EVT ValuesVTs[] = {EltVT, MVT::Other};
1158

Add SDValue in front of this and remove the earlier declaration.

1164

Add SDValue in front of these and remvoe the earlier declaration.

cameron.mcinally marked 4 inline comments as done.

Make updates as needed from the code review.

Thanks, guys.

I'd also like to point out that this Diff will slightly clobber Ulrich's D45576. But, I suspect that it will be a small change to accommodate the difference.

test/CodeGen/X86/vector-constrained-fp-intrinsics.ll
6

I think this check needs to do more than this. You should be verifying the complete expansion.

Thanks. I've added a check for the upper POW.

Also, can you add checks for some other instructions. Some of these won't require expansion, right?

POW is the only strict operation that I've found that currently needs expansion on X86. That said, I have a full set of tests that cover each strict operation. I could add them if you'd like, but it's a bit superfluous for this change. I do intend to add the full set once other patches are sent upstream. Thought?

test/CodeGen/X86/vector-constrained-fp-intrinsics.ll
6

I'd really like to see checks of the entire sequence to which this gets expanded. That seems like the only way to be certain it was expanded correctly. In some of the other tests I took a shortcut similar to what you did here because there was a 1-to-1 correspondence between the intrinsic and the instruction to which it was lowered. Even in that case it was probably not an ideal test.

Regarding instructions that aren't supposed to be expanded, what I would like to see here is a test that uses a vector form of the constrained intrinsic and a check that verifies that we generated the corresponding vector instruction.

Non-string RINT and NEARBYINT should also be Expand on pre-SSE4.1 targets. So that would be mean we should expand the strict versions too.

Err that should have said non-strict not non-string

Non-string RINT and NEARBYINT should also be Expand on pre-SSE4.1 targets. So that would be mean we should expand the strict versions too.

Ah, you're right. I do not have tests for those locally. I'll add a few.

test/CodeGen/X86/vector-constrained-fp-intrinsics.ll
6

Andrew, just to be clear, you'd like to check for more than the Op+MOVH? There's not a lot to the expansion with these. The only thing I left out of the checks is the loads to feed the POW. The rest is epilogue code.

I suspect you missed the change I made to check for the MOVH, but maybe I'm mistaken?

Or maybe you're suggesting a more complicated vector operation? One where we have to generate two scalar operations and then shuffle them back together? I could see some value in that.

%bb.0: # %entry
pushq %rax
.cfi_def_cfa_offset 16
movsd .LCPI0_0(%rip), %xmm0 # xmm0 = mem[0],zero
movsd .LCPI0_1(%rip), %xmm1 # xmm1 = mem[0],zero
callq pow
movlhps %xmm0, %xmm0 # xmm0 = xmm0[0,0]
popq %rax
.cfi_def_cfa_offset 8
retq

Non-string RINT and NEARBYINT should also be Expand on pre-SSE4.1 targets. So that would be mean we should expand the strict versions too.

Ah, you're right. I do not have tests for those locally. I'll add a few.

Looking closer... the math lib calls are also expanded.

New patch for conversation...

  • Added more vector tests: -> Arith ops remain vector -> Math lib ops are expanded -> Rounding ops are expanded

TODO: Work on the CHECKs to make sure everything is covered.

Are you aware of the update_llc_test_checks.py script that will autogenerate the CHECKs for you?

Update diff after running update_llc_test_checks.py.

Are you aware of the update_llc_test_checks.py script that will autogenerate the CHECKs for you?

Oh, nice. Thanks!

I don't get a lot of opportunities to upstream changes, so it's safe to assume my LLVM knowledge is at least 5 years old. ;)

It feels like we should be testing non-splat vectors here so we get two calls to the library functions in the output. Rather than one call and a reuse of the result.

It feels like we should be testing non-splat vectors here so we get two calls to the library functions in the output. Rather than one call and a reuse of the result.

Ok, that seems reasonable...

Would you be okay with somewhat random input to the upper operation? It should be okay functionally, since only the first trap is reported. I ask since it can be tricky to find inputs that can trap for each of the different operations. There are finicky details that are unique to each operation. Seems unnecessarily cumbersome IMO. Thoughts?

Update tests to remove splat inputs.

Notice that the input values are not checked, so there's a modicum of false security there. The low/high parts of the result could be incorrectly swapped and the test would not catch it. Other than that, the test is more robust since two operations are generated for expanded operations.

craig.topper accepted this revision.Jun 12 2018, 2:59 PM

LGTM. And I spoke to Andrew at work and he says it looks good to him to.

This revision is now accepted and ready to land.Jun 12 2018, 2:59 PM
This revision was automatically updated to reflect the committed changes.