It's quite inconvenient when dealing with G_EXTRACT_VEC_ELT of <4xs8>,0 where s8 is not legal in the target - typically one ends up with G_UNMERGE_VALUES + G_LSHR by 8 which expands to three instructions.
Instead allow the destination type to be larger than source scalar type and it's higher bits are any extended. This is similar to DAG's opcode.
Note - I couldn't really find a use in tree - so I added a unit test.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
I always thought this behavior in SelectionDAG was error prone. I think I broke this on every combine I ever wrote for extract_vector_elt. Should there be a separate extending opcode instead? Although that would probably make some combines more painful. In GlobalISel we already have an explicit G_BUILD_VECTOR_TRUNC instead of the old ISD::BUILD_VECTOR truncate behavior, so that would match. However I don't think G_BUILD_VECTOR_TRUNC is well developed enough to declare that was definitely a good decision yet.
lib/CodeGen/MachineVerifier.cpp | ||
---|---|---|
1276 | braces | |
1281 | braces |
Personally in an out of tree backend, I find BUILD_VEC_TRUNC to be very elegant when trying to legalize/optimize vector types whose scalar types are illegal in the architecture otherwise needing several unmerges + shifts + ands.
I'm fine with adding a separate opcode (say G_EXTRACT_VECTOR_ELT_ANYEXT - suggestions accepted).
We need to do something similar for INSERT_VEC_ELT which can implicitly truncate.
One question I've been thinking about is whether implicit extension is needed for G_INSERT/G_EXTRACT. I figured extract_vector_elt/insert_vector_elt with a constant could always be implemented as a G_EXTRACT/G_INSERT.
G_EXTRACT and G_INSERT are way more powerful instructions IMO and going from EXTRACT_VEC_ELT to EXTRACT seems to adding complexity (where as I'd prefer the other way if possible as sort of a canonicalization). I'm curious to hear others thoughts on this as well.
I think this should be symmetric with G_BUILD_VECTOR_TRUNC, so either a separate opcode or re-merge BUILD_VECTOR_TRUNC with BUILD_VECTOR
braces