This is an archive of the discontinued LLVM Phabricator instance.

[SelectionDAG] Fixed f16-from-vector promotion problem
ClosedPublic

Authored by tpr on Dec 12 2017, 12:07 PM.

Details

Summary

In the case of an fp_extend of v1f16 to v1f32 where the v1f16 is the
result of a bitcast from i16, avoid creating an illegal fp16_to_fp where
the input is not a vector and the result is a v1f32.

Event Timeline

tpr created this revision.Dec 12 2017, 12:07 PM
arsenm added inline comments.Dec 12 2017, 12:33 PM
lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
1977 ↗(On Diff #126599)

I don't think here should be encountering a v1* anything. i.e. getTypeAction(v1f16) should not be a TypePromoteFloat?

1979 ↗(On Diff #126599)

Shouldn't this be v1f16?

test/CodeGen/AMDGPU/unpack-half.ll
10

instnamer and check something

21

You can probably replace the intrinsics with a regular load and store

tpr added inline comments.Dec 12 2017, 2:41 PM
lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
1977 ↗(On Diff #126599)

OK, thanks, I'll have a look at that.

1979 ↗(On Diff #126599)

I believe the idea of fp16_to_fp is that its input is an f16 but as an i16 type. That's what the code seems to be doing. This all seems to be to cope with f16 on hw that does not implement f16, so it does not do any of this on AMDGPU gfx800 and later because that implements hw f16.

test/CodeGen/AMDGPU/unpack-half.ll
10

Do you think I need to check something even though the point of this test is to check that the compiler does not get a fatal error?

What do you mean "instnamer"?

arsenm added inline comments.Dec 12 2017, 3:01 PM
test/CodeGen/AMDGPU/unpack-half.ll
10

Usually tests without checks are frowned upon. For this it should reduce down to maybe one conversion instruction and the store.

Run the pass instnamer on the test. I don't like committing tests with anonymous values since it makes them harder to modify in the future.

tpr added inline comments.Dec 13 2017, 1:29 AM
lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp
1977 ↗(On Diff #126599)

getTypeAction(v1f16) is TypeScalarizeVector, but it is still getting here.

tpr updated this revision to Diff 126696.Dec 13 2017, 2:15 AM

Ran instnamer on the test.

tpr updated this revision to Diff 126698.Dec 13 2017, 2:18 AM

Also added a check for the v_cvt_f32_f16 instruction to the test.

tpr marked 4 inline comments as done.Dec 13 2017, 2:21 AM
tpr added inline comments.
test/CodeGen/AMDGPU/unpack-half.ll
21

That made it crash, probably because I didn't get the right address space or something. I think it's easier to leave it with the intrinsics from the original reproducer.

Are you running with asserts disabled? I see:
ScalarizeVectorOperand Op #0: t34: f32 = fp16_to_fp t25

LLVM ERROR: Do not know how to scalarize this operator's operand!

and sure enough ScalarizeVectorOperand doesn't handle this.

test/CodeGen/AMDGPU/unpack-half.ll
21

Replacing this with
%tmp = load volatile float, float addrspace(1)* undef
and
store volatile i32 %tmp6, i32 addrspace(1)* undef

works for me

tpr marked an inline comment as done.Dec 13 2017, 8:55 AM

Are you running with asserts disabled? I see:
ScalarizeVectorOperand Op #0: t34: f32 = fp16_to_fp t25

LLVM ERROR: Do not know how to scalarize this operator's operand!

and sure enough ScalarizeVectorOperand doesn't handle this.

Yes; that's the bug I'm fixing. :-)

In D41126#953997, @tpr wrote:

Are you running with asserts disabled? I see:
ScalarizeVectorOperand Op #0: t34: f32 = fp16_to_fp t25

LLVM ERROR: Do not know how to scalarize this operator's operand!

and sure enough ScalarizeVectorOperand doesn't handle this.

Yes; that's the bug I'm fixing. :-)

But your patch is hacking around it in PromoteFloatRes_BITCAST rather than handling the missing scalarization it's complaining about there

tpr added a comment.Dec 13 2017, 12:46 PM

It can't handle it because it is a fp16_to_fp whose input is v1i16 and result is f32. My theory was that that is illegal, and I needed to fix where that was being generated, which is what I have done.

Do you think my theory is wrong?

tpr added a comment.Dec 13 2017, 12:47 PM

Sorry, I mean the other way round: the fp16_to_fp has input i16 and result v1f32, and my theory was that this is illegal.

In D41126#954337, @tpr wrote:

Sorry, I mean the other way round: the fp16_to_fp has input i16 and result v1f32, and my theory was that this is illegal.

OK, I see. ScalarizeVectorOperand doesn't handle this, but it's already broken at this point because it isn't v1*<->v1*. Maybe bitcast is special, but I still would not expect to see a v1 input here. You aren't just doing a promote here, you are also manually doing the scalarize. The equivalent bitcast promote for integer types does not worry about this, but both should have the same problem. I think something is off, and ScalarizeVecOp_BITCAST should be fixing this

tpr added a comment.Dec 13 2017, 2:29 PM

OK, I'll have another look tomorrow. The reason I thought that was the right place to do it is that it is folding an i16 -> v1i16 bitcast in to the float promotion without thinking about the fact that the bitcast also adds vectorness.

Thanks.

tpr added a comment.Dec 15 2017, 10:12 AM

So are you saying that having a i16 -> v1i16 bitcast there is bad?

Legalizing node: t28: v1i16 = truncate t26
Analyzing result type: v1i16
Scalarize node result 0: t28: v1i16 = truncate t26

Creating new node: t34: i16 = truncate t11
Legalizing node: t30: v1f16 = bitcast t28
Analyzing result type: v1f16
Scalarize node result 0: t30: v1f16 = bitcast t28

Creating new node: t35: f16 = bitcast t28
Legalizing node: t32: v1f32 = fp_extend t30
Analyzing result type: v1f32
Scalarize node result 0: t32: v1f32 = fp_extend t30

Creating new node: t36: f32 = fp_extend t35
Legalizing node: t35: f16 = bitcast t28
Analyzing result type: f16

When it scalarizes the t28 = trunc, it creates a new t34. But then when it scalarizes t30 = bitcast, it creates t35 = bitcast without bothering to scalarize its input t28 to the already existing t34. That's why I've got a i16 -> v1i16 bitcast. Is that bad?

tpr updated this revision to Diff 127244.Dec 16 2017, 7:52 AM

V2: The fix is now to avoid vector scalarization creating a v1->scalar bitcast.

tpr updated this revision to Diff 127247.Dec 16 2017, 8:19 AM

Addressed review comments on test.

tpr updated this revision to Diff 127248.Dec 16 2017, 8:20 AM

Another test tidy-up.

tpr marked an inline comment as done.Dec 16 2017, 8:21 AM
arsenm added inline comments.Dec 18 2017, 9:27 AM
lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
172–178

There is already ScalarizeVecOp_BITCAST, so it seems the intent was this is a separate step. My question is more of why isn't that happening already in this example

tpr added inline comments.Dec 19 2017, 9:34 AM
lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
172–178

Looking at how a single node is handled in DAGTypeLegalizer::run:

First it looks at the result type. If it needs legalizing, it calls the applicable function (ScalarizeVectorResult in our case), and then skips past the code that checks the operand types. It is assuming that the ScalarizeVecRes_* also scalarized the operand(s) where applicable. That seems to be true for everything except bitcast, presumably because the operand type is less predictable.

So that is why it is not calling ScalarizeVecOp_BITCAST.

Can you think of a better way of fixing this? Maybe special case bitcast so, after creating the new one in ScalarizeVecRes_BITCAST, it somehow gets added back to the worklist so its operand gets scanned? Or shall we stick with the fix I have?

tpr added a comment.Jan 4 2018, 11:23 AM

If there are no further comments, I'll stick with the fix I now have and land it.

arsenm accepted this revision.Jan 9 2018, 7:59 AM

LGTM

This revision is now accepted and ready to land.Jan 9 2018, 7:59 AM
arsenm added inline comments.Jan 9 2018, 8:03 AM
lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
175

Can you add an assert that the v1 type isn't legal?

This revision was automatically updated to reflect the committed changes.
samparker added inline comments.
lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
175

Hi,
The added assert is causing issues in our AArch64 tests... why is it necessary?

arsenm added inline comments.Jan 11 2018, 7:27 AM
lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
175

It probably isn't necessary, but unlike the other operations handled here, it would make sense for bitcast to not separately scalarize its operand if v1 is a legal type. However, v1 as legal is a degenerate case which probably should not be allowed. It looks like AArch64 is using this as a hack for some reason with a FIXME about it.

samparker added inline comments.Jan 11 2018, 7:53 AM
lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
175

Could you point me to this FIXME please? I'm still confused as to why we should prevent a scalar from being produced just because the vector is legal, is this because of how the legalizer is expected to operate?

arsenm added inline comments.Jan 11 2018, 7:58 AM
lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
175

AArch64ISelLowering.cpp:596

if (Subtarget->hasNEON()) {
  // FIXME: v1f64 shouldn't be legal if we can avoid it, because it leads to
  // silliness like this:

Bitcast is possibly special because it is a conversion. If the legalizer is expecting to leave some v1 operations in it, then it would probably be required to leave the bitcast in case some operation was expecting to legalize in terms of a bitcast from an 1 vector.