v2f16 is a special case in NVPTX. v4f16 may be loaded as a pair of v2f16
and that was not previously handled correctly by tryLDGLDU()
Details
Diff Detail
- Build Status
Buildable 16833 Build 16833: arc lint + arc unit
Event Timeline
llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp | ||
---|---|---|
1244 | I played with this and found that the only f16 vector that can reach here is v2f16, due to the way that the legalizer is set up for the target. If we can somehow make v4f16 reach here, I believe that we will still see an error in routine GetConvertOpcode, similar to what we see for v2f16 before this fix. For this reason, I prefer an assertion, something like this: if (EltVT == MVT::v2f16) { assert(EltVT.getVectorNumElements() == 2 && "missed legalizing vector-of-f16"); } else { NumElts = EltVT.getVectorNumElements(); EltVT = EltVT.getVectorElementType(); } |
lgtm modulo Bixia's comment.
llvm/test/CodeGen/NVPTX/ldg-invariant.ll | ||
---|---|---|
17 | This could use a comma or something somewhere :) |
llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp | ||
---|---|---|
1244 | Please take a look at the test cases in ldg-invariants.ll in this patch. v8f16 variant also gets here. Legalizing node: t6: v8f16,ch = load<(invariant load 16 from %ir.ptr, addrspace 1)> t0, t23, undef:i64 ... Legalizing node: t24: v2f16,v2f16,v2f16,v2f16,ch = NVPTXISD::LoadV4<(invariant load 16 from %ir.ptr, addrspace 1)> t0, t23, undef:i64, Constant:i64<0> Potentially v4f16 could also get here as a load of a pair of v2f16 elements. If you insist on assertion, it should accept all three cases. |
llvm/lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp | ||
---|---|---|
1244 | What do we do if NumElts % 2 != 0? Can we assert the situation, assuming that the legalizer can guarantee this? |
Made the check for even number of elements an assertion.
Cosmetic typo fix in the tests.
I played with this and found that the only f16 vector that can reach here is v2f16, due to the way that the legalizer is set up for the target. If we can somehow make v4f16 reach here, I believe that we will still see an error in routine GetConvertOpcode, similar to what we see for v2f16 before this fix. For this reason, I prefer an assertion, something like this:
if (EltVT == MVT::v2f16) {
} else {
}