This is an archive of the discontinued LLVM Phabricator instance.

[X86] Rewrite to the vXi1 subvector insertion code to not rely on the value of bits that might be undef
ClosedPublic

Authored by craig.topper on Oct 1 2019, 6:11 PM.

Details

Summary

The previous code tried to do a trick where we would extract the subvector from the location we were inserting. Then xor that with the new value. Take the xored value and clear out the bits above the subvector size. Then shift that xored subvector to the insert location. And finally xor that with the original vector. Since the old subvector was used in both xors, this would leave just the new subvector at the inserted location. Since the surrounding bits had been zeroed no other bits of the original vector would be modified.

Unfortunately, if the old subvector came from undef we might aggressively propagate the undef. Then we end up with the XORs not cancelling because they aren't using the same value for the two uses of the old subvector. @bkramer gave me a case that demonstrated this, but we haven't reduced it enough to make it easily readable to see what's happening.

This patch uses a safer, but more costly approach. It isolate the bits above the insertion and bits below the insert point and ORs those together leaving 0 for the insertion location. Then widens the subvector with 0s in the upper bits, shifts it into position with 0s in the lower bits. Then we do another OR.

The test case changes are pretty terrible to read so please check the logic carefully.

Diff Detail

Event Timeline

craig.topper created this revision.Oct 1 2019, 6:11 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 1 2019, 6:11 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
bkramer requested changes to this revision.Oct 2 2019, 1:32 AM

With the suggested changes this passes the test suite from https://github.com/google/jax with AVX512 enabled. I still don't see why the original code would be wrong though.

llvm/lib/Target/X86/X86ISelLowering.cpp
5785

s/Vec/Low/

5789

This should be something like IdxVal + SubVecNumElems

5792

s/Vec/High/

This revision now requires changes to proceed.Oct 2 2019, 1:32 AM

Address review comments.

This revision is now accepted and ready to land.Oct 2 2019, 10:38 AM
This revision was automatically updated to reflect the committed changes.
llvm/lib/Target/X86/X86ISelLowering.cpp