This is an archive of the discontinued LLVM Phabricator instance.

[AVX-512] Fix accidental uses of AH/BH/CH/DH after copies to/from mask registers
ClosedPublic

Authored by craig.topper on Mar 14 2017, 11:59 PM.

Details

Summary

We've had several bugs(PR32256, PR32241) recently that resulted from usages of AH/BH/CH/DH either before or after a copy to/from a mask register.

This ultimately occurs because we create COPY_TO_REGCLASS with VK1 and GR8. Then in CopyToFromAsymmetricReg in X86InstrInfo we find a 32-bit super register for the GR8 to emit the KMOV with. But as these tests are demonstrating, its possible for the GR8 register to be a high register and we end up doing an accidental extra or insert from bits 15:8.

I made several attempts to fix this, but after the last test case that reopened PR32256, I think the best way forward is to stop making copies directly between mask registers and GR8/GR16. Instead I think we should restrict to only copies between mask registers and GR32/GR64 and use EXTRACT_SUBREG/INSERT_SUBREG to handle the conversion from GR32 to GR16/8 or vice versa.

Unfortunately, this complicates fastisel a bit more now to create the subreg extracts where we used to create GR8 copies. We can probably make a helper function to bring down the repitition.

This does result in KMOVD being used for copies when BWI is available because we don't know the original mask register size. This will cause a lot of deltas on tests because we have to split the checks for KMOVD vs KMOVW based on BWI.

This patch is a work in progress that I wanted to get feed back on. I still need to finish regenerating the rest of the ll test files. I also plan to clean up the formatting in the isel patterns.

Diff Detail

Repository
rL LLVM

Event Timeline

craig.topper created this revision.Mar 14 2017, 11:59 PM

Fix a couple additional fast isel bugs.

craig.topper retitled this revision from [AVX-512] WIP patch for fixing accidental uses of AH/BH/CH/DH after copies to/from mask registers to [AVX-512] Fix accidental uses of AH/BH/CH/DH after copies to/from mask registers.

Update all of the tests.

There are a couple spots in fast isel I'm not handling now. Materializing non-zero constants and loading an i1.

zvi added a subscriber: guyblank.Mar 19 2017, 2:56 AM

hey Craig,

Following the RFC, I'm working on a patch which will make i1 illegal. I plan to upload it for review in the near future.
This should solve several issues regarding K registers, including the one you described, since the i1 will get promoted to i8 just like without AVX-512.
I've checked on my branch and PR32256 is in fact solved there.

That's great news! But this problem isn't unique to VK1. This test fails in 32-bit mode with avx512vl

define void @test_cmp_d_256(<8 x i32> %a0, <8 x i32> %a1, i8* %foo) {

%res0 = call i8 @llvm.x86.avx512.mask.cmp.d.256(<8 x i32> %a0, <8 x i32> %a1, i32 0, i8 -1)
%res1 = call i8 @llvm.x86.avx512.mask.cmp.d.256(<8 x i32> %a0, <8 x i32> %a1, i32 1, i8 -1)
%res2 = call i8 @llvm.x86.avx512.mask.cmp.d.256(<8 x i32> %a0, <8 x i32> %a1, i32 2, i8 -1)
%res3 = call i8 @llvm.x86.avx512.mask.cmp.d.256(<8 x i32> %a0, <8 x i32> %a1, i32 3, i8 -1)
%res4 = call i8 @llvm.x86.avx512.mask.cmp.d.256(<8 x i32> %a0, <8 x i32> %a1, i32 4, i8 -1)
%res5 = call i8 @llvm.x86.avx512.mask.cmp.d.256(<8 x i32> %a0, <8 x i32> %a1, i32 5, i8 -1)
%res6 = call i8 @llvm.x86.avx512.mask.cmp.d.256(<8 x i32> %a0, <8 x i32> %a1, i32 6, i8 -1)
%res7 = call i8 @llvm.x86.avx512.mask.cmp.d.256(<8 x i32> %a0, <8 x i32> %a1, i32 7, i8 -1)
%vec0 = add i8 %res0, %res1
%vec1 = add i8 %res2, %res3
%vec2 = add i8 %res4, %res5
%vec3 = add i8 %res5, %res6
%vec4 = and i8 %vec0, %vec1
%vec5 = and i8 %vec2, %vec3
%vec6 = add i8 %vec4, %vec5
store i8 %vec6, i8* %foo
ret void

}

declare i8 @llvm.x86.avx512.mask.cmp.d.256(<8 x i32>, <8 x i32>, i32, i8) nounwind readnone

zvi edited edge metadata.Mar 25 2017, 11:59 PM

LGTM. I think it would be best if Elena or Guy took a look too.

lib/Target/X86/X86FastISel.cpp
547 ↗(On Diff #92255)

Consider refactoring this recurring patterns in a follow-up patch.

test/CodeGen/X86/avx512-calling-conv.ll
301 ↗(On Diff #92255)

Don't know the answer to this from the top of my head, but is the combination

movb
kmovd

Inferior to

movb
kmovb

with respect to partial register stall?

test/CodeGen/X86/avx512-intrinsics.ll
92 ↗(On Diff #92255)

Can these unrelated changed be committed separately?

craig.topper added inline comments.Mar 26 2017, 12:08 AM
lib/Target/X86/X86FastISel.cpp
547 ↗(On Diff #92255)

Once Guy makes i1 illegal this probably goes away?

test/CodeGen/X86/avx512-intrinsics.ll
92 ↗(On Diff #92255)

This patch made this appear. Its not there without this patch because we were much more silently converting ax to eax before. Now because we use INSERT_SUBREG this shows up.

delena edited edge metadata.Mar 26 2017, 12:16 AM

I assume it's ok meanwhile.

zvi added inline comments.Mar 26 2017, 4:40 AM
lib/Target/X86/X86FastISel.cpp
547 ↗(On Diff #92255)

True

test/CodeGen/X86/avx512-intrinsics.ll
92 ↗(On Diff #92255)

Ok, thanks.

Assuming that was Elena's approval. Is this ok to go in?

delena accepted this revision.Mar 27 2017, 11:08 PM
This revision is now accepted and ready to land.Mar 27 2017, 11:08 PM
This revision was automatically updated to reflect the committed changes.