This fixes lowering of mm*_undefined* intrinsics to use freeze poison instead of zeroinitializer.
(mentioned & discussed in D103874)
Details
- Reviewers
efriedma spatel craig.topper RKSimon
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I couldn't find end-to-end tests for checking assembly generation.
To check whether this is working ok, which tests should I write and how would it look like?
There are tests like test/CodeGen/X86/avx-intrinsics-fast-isel.ll that are supposed to contain the IR the frontend generates. They mostly contain optimized IR, but then run fast-isel in the backend. I don't think all intrinsics are tested this way.
We may want to update the code in X86ISelLowering getAVX2GatherNode and getGatherNode to replace freeze+poison on Src with a zero vector. We already do this when the Src is undef.
- Update llvm's fast-isels tests for undefined intrinsics to compile freeze(poison)
- Update X86ISelLowering's getAVX2GatherNode and getGatherNode to consider freeze(poison) as well
- Update DAGCombiner to fold bitcast(freeze(poison)) -> freeze(poison)
Is this actually better in any meaningful way? InstCombine will turn freeze poison into zeroinitializer, and until then this is just a completely opaque value.
I think to correctly emit IR for intrinsics like mm256_castsi128_si256 (D103874 has more context) efficient handling of this kind of pattern is necessary:
%v = freeze <n x ty> poison %w = shufflevector %a, %v, mask
The zeroinitializer folding is done by InstCombine's visitFreeze, which should be fixed maybe.
I'll play with some patterns and create patches for this.