As discussed on https://reviews.llvm.org/D148347, we could handle vector shuffle masks that contain splats more efficiently with dup.
Comment Actions
When I mentioned the test cases in D149638, I was thinking of cases like test11 in build-vector-two-dup.ll. For values already in vector registers, replacing a tbl with three shuffle instructions probably isn't an improvement (particularly on newer cores where tbl is fast).
Comment Actions
Thanks for the kind comment.
From the diff of the test output, I was not sure this transformation is useful, even though it does not use the constant pool... As you mentioned, in a loop, the constant pool load could be hoisted...
Let me close this patch.