This is an archive of the discontinued LLVM Phabricator instance.

[X86] Remove (V)MOV64toSDrr/m and (V)MOVDI2SSrr/m. Use 128-bit result MOVD/MOVQ and COPY_TO_REGCLASS instead
ClosedPublic

Authored by craig.topper on Apr 27 2019, 3:02 AM.

Details

Summary

The register forms of these instructions are CodeGenOnly instructions that cover
the GR32->FR32 and GR64->FR64 bitcasts. There is a similar set of instructions for
the opposite bitcasts. Because their patterns use bitcasts, these instructions get
marked as "bitcast" machine instructions as well. The peephole pass is able to
look through these, as well as other copies, to try to avoid register bank copies.

Because FR32/FR64/VR128 are all coalescable to each other, we can end up in a
situation where a GR32->FR32->VR128->FR64->GR64 sequence can be reduced to a
GR32->GR64 copy, which the copyPhysReg code can't handle.
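
As an illustration, here is a minimal IR sketch of the kind of GPR<->FPR round trip that produces such a chain of copies. This is a hand-written reduction, not the PR41619 reproducer, and the function name is made up:

define i64 @gpr_fpr_roundtrip(i32 %x) {
  %f = bitcast i32 %x to float                           ; GR32->FR32 move
  %v = insertelement <4 x float> undef, float %f, i32 0  ; FR32 value viewed as VR128
  %d = bitcast <4 x float> %v to <2 x double>
  %e = extractelement <2 x double> %d, i32 0             ; VR128->FR64 copy
  %r = bitcast double %e to i64                          ; FR64->GR64 move
  ret i64 %r
}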

To prevent this, this patch removes one set of the 'bitcast' instructions. So
now we can only go GR32->VR128->FR32 or GR64->VR128->FR64. The instruction that
converts from GR32/GR64->VR128 has no special significance to the peephole pass
and won't be looked through.

I guess the other option would be to add support to copyPhysReg to just promote
the GR32->GR64 to a GR64->GR64 copy. The upper bits were basically undefined
anyway. But removing the CodeGenOnly instruction in favor of one that won't be
optimized seemed safer.

I deleted the peephole test because it couldn't be made to work with the bitcast
instructions removed.

The load versions of the instructions were unnecessary: the pattern that selects
them contains a bitcasted load, which should never happen.
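
For reference, a minimal IR sketch of source that would naively produce such a (bitcast (load)) pattern (hypothetical function name); as noted above, this should never survive to selection, since the combiner is expected to turn it into a plain float load:

define float @load_then_bitcast(i32* %p) {
  %i = load i32, i32* %p        ; i32 load...
  %f = bitcast i32 %i to float  ; ...bitcast to f32, normally folded into a float load
  ret float %f
}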

Fixes PR41619.

Diff Detail

Repository
rL LLVM

Event Timeline

craig.topper created this revision. · Apr 27 2019, 3:02 AM
Herald added a project: Restricted Project. · Apr 27 2019, 3:02 AM
Herald added a subscriber: hiraditya.
craig.topper added a comment. (edited) · Apr 27 2019, 3:50 AM

Hmm... this fixes the original problem, but I can create a similar problem with mask registers using this test compiled for avx512bw. Maybe the peephole pass needs to be limited?

define i32 @foo(double %blah) {
  %z = bitcast double %blah to i64
  %y = trunc i64 %z to i32
  %a = bitcast i32 %y to <32 x i1>
  %b = shufflevector <32 x i1> %a, <32 x i1> undef, <64 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13, i32 14, i32 15, i32 16, i32 17, i32 18, i32 19, i32 20, i32 21, i32 22, i32 23, i32 24, i32 25, i32 26, i32 27, i32 28, i32 29, i32 30, i32 31, i32 32, i32 33, i32 34, i32 35, i32 36, i32 37, i32 38, i32 39, i32 40, i32 41, i32 42, i32 43, i32 44, i32 45, i32 46, i32 47, i32 48, i32 49, i32 50, i32 51, i32 52, i32 53, i32 54, i32 55, i32 56, i32 57, i32 58, i32 59, i32 60, i32 61, i32 62, i32 63>
  %c = bitcast <64 x i1> %b to i64
  %d = trunc i64 %c to i32
  ret i32 %d
}

I was halfway through doing my own analysis when I saw this!

RKSimon accepted this revision. · Apr 27 2019, 7:18 AM

I'm happy for this to go in now, but if you'd prefer to get to the bottom of the mask register issue first, that's fine.

This revision is now accepted and ready to land. · Apr 27 2019, 7:18 AM
This revision was automatically updated to reflect the committed changes.

I've committed this as is for now. I'll keep looking at the mask issue.

llvm/trunk/test/CodeGen/X86/fast-isel-fneg.ll