In the case of an fp_extend of v1f16 to v1f32 where the v1f16 is the
result of a bitcast from i16, avoid creating an illegal fp16_to_fp where
the input is not a vector and the result is a v1f32.
Details
Diff Detail
- Build Status
Buildable 13209 Build 13209: arc lint + arc unit
Event Timeline
lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp | ||
---|---|---|
1977 ↗ | (On Diff #126599) | I don't think here should be encountering a v1* anything. i.e. getTypeAction(v1f16) should not be a TypePromoteFloat? |
1979 ↗ | (On Diff #126599) | Shouldn't this be v1f16? |
test/CodeGen/AMDGPU/unpack-half.ll | ||
10 | instnamer and check something | |
21 | You can probably replace the intrinsics with a regular load and store |
lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp | ||
---|---|---|
1977 ↗ | (On Diff #126599) | OK, thanks, I'll have a look at that. |
1979 ↗ | (On Diff #126599) | I believe the idea of fp16_to_fp is that its input is an f16 but as an i16 type. That's what the code seems to be doing. This all seems to be to cope with f16 on hw that does not implement f16, so it does not do any of this on AMDGPU gfx800 and later because that implements hw f16. |
test/CodeGen/AMDGPU/unpack-half.ll | ||
10 | Do you think I need to check something even though the point of this test is to check that the compiler does not get a fatal error? What do you mean "instnamer"? |
test/CodeGen/AMDGPU/unpack-half.ll | ||
---|---|---|
10 | Usually tests without checks are frowned upon. For this it should reduce down to maybe one conversion instruction and the store. Run the pass instnamer on the test. I don't like committing tests with anonymous values since it makes them harder to modify in the future. |
lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp | ||
---|---|---|
1977 ↗ | (On Diff #126599) | getTypeAction(v1f16) is TypeScalarizeVector, but it is still getting here. |
test/CodeGen/AMDGPU/unpack-half.ll | ||
---|---|---|
21 | That made it crash, probably because I didn't get the right address space or something. I think it's easier to leave it with the intrinsics from the original reproducer. |
Are you running with asserts disabled? I see:
ScalarizeVectorOperand Op #0: t34: f32 = fp16_to_fp t25
LLVM ERROR: Do not know how to scalarize this operator's operand!
and sure enough ScalarizeVectorOperand doesn't handle this.
test/CodeGen/AMDGPU/unpack-half.ll | ||
---|---|---|
21 | Replacing this with works for me |
Are you running with asserts disabled? I see:
ScalarizeVectorOperand Op #0: t34: f32 = fp16_to_fp t25LLVM ERROR: Do not know how to scalarize this operator's operand!
and sure enough ScalarizeVectorOperand doesn't handle this.
Yes; that's the bug I'm fixing. :-)
But your patch is hacking around it in PromoteFloatRes_BITCAST rather than handling the missing scalarization it's complaining about there
It can't handle it because it is a fp16_to_fp whose input is v1i16 and result is f32. My theory was that that is illegal, and I needed to fix where that was being generated, which is what I have done.
Do you think my theory is wrong?
Sorry, I mean the other way round: the fp16_to_fp has input i16 and result v1f32, and my theory was that this is illegal.
OK, I see. ScalarizeVectorOperand doesn't handle this, but it's already broken at this point because it isn't v1*<->v1*. Maybe bitcast is special, but I still would not expect to see a v1 input here. You aren't just doing a promote here, you are also manually doing the scalarize. The equivalent bitcast promote for integer types does not worry about this, but both should have the same problem. I think something is off, and ScalarizeVecOp_BITCAST should be fixing this
OK, I'll have another look tomorrow. The reason I thought that was the right place to do it is that it is folding an i16 -> v1i16 bitcast in to the float promotion without thinking about the fact that the bitcast also adds vectorness.
Thanks.
So are you saying that having a i16 -> v1i16 bitcast there is bad?
Legalizing node: t28: v1i16 = truncate t26
Analyzing result type: v1i16
Scalarize node result 0: t28: v1i16 = truncate t26
Creating new node: t34: i16 = truncate t11
Legalizing node: t30: v1f16 = bitcast t28
Analyzing result type: v1f16
Scalarize node result 0: t30: v1f16 = bitcast t28
Creating new node: t35: f16 = bitcast t28
Legalizing node: t32: v1f32 = fp_extend t30
Analyzing result type: v1f32
Scalarize node result 0: t32: v1f32 = fp_extend t30
Creating new node: t36: f32 = fp_extend t35
Legalizing node: t35: f16 = bitcast t28
Analyzing result type: f16
When it scalarizes the t28 = trunc, it creates a new t34. But then when it scalarizes t30 = bitcast, it creates t35 = bitcast without bothering to scalarize its input t28 to the already existing t34. That's why I've got a i16 -> v1i16 bitcast. Is that bad?
lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp | ||
---|---|---|
172–178 | There is already ScalarizeVecOp_BITCAST, so it seems the intent was this is a separate step. My question is more of why isn't that happening already in this example |
lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp | ||
---|---|---|
172–178 | Looking at how a single node is handled in DAGTypeLegalizer::run: First it looks at the result type. If it needs legalizing, it calls the applicable function (ScalarizeVectorResult in our case), and then skips past the code that checks the operand types. It is assuming that the ScalarizeVecRes_* also scalarized the operand(s) where applicable. That seems to be true for everything except bitcast, presumably because the operand type is less predictable. So that is why it is not calling ScalarizeVecOp_BITCAST. Can you think of a better way of fixing this? Maybe special case bitcast so, after creating the new one in ScalarizeVecRes_BITCAST, it somehow gets added back to the worklist so its operand gets scanned? Or shall we stick with the fix I have? |
lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp | ||
---|---|---|
175 | Can you add an assert that the v1 type isn't legal? |
lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp | ||
---|---|---|
175 | Hi, |
lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp | ||
---|---|---|
175 | It probably isn't necessary, but unlike the other operations handled here, it would make sense for bitcast to not separately scalarize its operand if v1 is a legal type. However, v1 as legal is a degenerate case which probably should not be allowed. It looks like AArch64 is using this as a hack for some reason with a FIXME about it. |
lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp | ||
---|---|---|
175 | Could you point me to this FIXME please? I'm still confused as to why we should prevent a scalar from being produced just because the vector is legal, is this because of how the legalizer is expected to operate? |
lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp | ||
---|---|---|
175 | AArch64ISelLowering.cpp:596 if (Subtarget->hasNEON()) { // FIXME: v1f64 shouldn't be legal if we can avoid it, because it leads to // silliness like this: Bitcast is possibly special because it is a conversion. If the legalizer is expecting to leave some v1 operations in it, then it would probably be required to leave the bitcast in case some operation was expecting to legalize in terms of a bitcast from an 1 vector. |
Can you add an assert that the v1 type isn't legal?