This is an archive of the discontinued LLVM Phabricator instance.

[x86] fix mm*_undefined* intrinsics to use arbitrary frozen bit pattern
Needs ReviewPublic

Authored by aqjune on Jun 23 2021, 9:03 AM.

Details

Summary

This fixes lowering of mm*_undefined* intrinsics to use freeze poison instead of zeroinitializer.
(mentioned & discussed in D103874)

Diff Detail

Event Timeline

aqjune created this revision.Jun 23 2021, 9:03 AM
aqjune requested review of this revision.Jun 23 2021, 9:03 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 23 2021, 9:03 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

I couldn't find end-to-end tests for checking assembly generation.
To check whether this is working ok, which tests should I write and how would it look like?

I couldn't find end-to-end tests for checking assembly generation.
To check whether this is working ok, which tests should I write and how would it look like?

There are tests like test/CodeGen/X86/avx-intrinsics-fast-isel.ll that are supposed to contain the IR the frontend generates. They mostly contain optimized IR, but then run fast-isel in the backend. I don't think all intrinsics are tested this way.

We may want to update the code in X86ISelLowering getAVX2GatherNode and getGatherNode to replace freeze+poison on Src with a zero vector. We already do this when the Src is undef.

aqjune updated this revision to Diff 354675.Jun 26 2021, 7:26 AM
  • Update llvm's fast-isels tests for undefined intrinsics to compile freeze(poison)
  • Update X86ISelLowering's getAVX2GatherNode and getGatherNode to consider freeze(poison) as well
  • Update DAGCombiner to fold bitcast(freeze(poison)) -> freeze(poison)
aqjune updated this revision to Diff 354678.Jun 26 2021, 7:31 AM

Minor fixes

aqjune updated this revision to Diff 354682.Jun 26 2021, 7:35 AM
  • Update llvm/test/CodeGen/X86/sse-intrinsics-fast-isel.ll as well

I couldn't find end-to-end tests for checking assembly generation.
To check whether this is working ok, which tests should I write and how would it look like?

There are tests like test/CodeGen/X86/avx-intrinsics-fast-isel.ll that are supposed to contain the IR the frontend generates. They mostly contain optimized IR, but then run fast-isel in the backend. I don't think all intrinsics are tested this way.

Thank you for the info. I updated three *-fast-isel.ll files to check this.

nikic added a subscriber: nikic.Jun 26 2021, 8:32 AM

Is this actually better in any meaningful way? InstCombine will turn freeze poison into zeroinitializer, and until then this is just a completely opaque value.

Is this actually better in any meaningful way? InstCombine will turn freeze poison into zeroinitializer, and until then this is just a completely opaque value.

I think to correctly emit IR for intrinsics like mm256_castsi128_si256 (D103874 has more context) efficient handling of this kind of pattern is necessary:

%v = freeze <n x ty> poison
%w = shufflevector %a, %v, mask

The zeroinitializer folding is done by InstCombine's visitFreeze, which should be fixed maybe.
I'll play with some patterns and create patches for this.