This is an archive of the discontinued LLVM Phabricator instance.

Model sqrtsd as a binary operation with one source operand tied to the destination (PR14221)
ClosedPublic

Authored by spatel on Jan 8 2015, 2:16 PM.

Details

Summary

This patch attempts to fix the following miscompile:

define void @sqrtsd(<2 x double> %a) nounwind uwtable ssp {
  %0 = tail call <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double> %a) nounwind 
  %a0 = extractelement <2 x double> %0, i32 0
  %conv = fptrunc double %a0 to float
  %a1 = extractelement <2 x double> %0, i32 1
  %conv3 = fptrunc double %a1 to float
  tail call void @callee2(float %conv, float %conv3) nounwind
  ret void
}

Current codegen:

sqrtsd	%xmm0, %xmm1        ## high element of %xmm1 is undef here
xorps	%xmm0, %xmm0
cvtsd2ss	%xmm1, %xmm0
shufpd	$1, %xmm1, %xmm1
cvtsd2ss	%xmm1, %xmm1 ## operating on undef value
jmp	_callee

This is a continuation of http://llvm.org/viewvc/llvm-project?view=revision&revision=224624 ( http://reviews.llvm.org/D6330 ) which was itself a continuation of r167064 ( http://llvm.org/viewvc/llvm-project?view=revision&revision=167064 ).

All of these patches are partial fixes for PR14221 ( http://llvm.org/bugs/show_bug.cgi?id=14221 ); this should be the final patch needed to resolve that bug.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel updated this revision to Diff 17911.Jan 8 2015, 2:16 PM
spatel retitled this revision from to Model sqrtsd as a binary operation with one source operand tied to the destination (PR14221).
spatel updated this object.
spatel edited the test plan for this revision. (Show Details)
spatel added reviewers: delena, hliao, nadav.
spatel added a subscriber: Unknown Object (MLST).
mkuper added a subscriber: mkuper.Jan 11 2015, 6:44 AM

Sorry I'm looking at this so late - missed the sqrtss version of it.

The problem I see with this is that it impacts correct code that doesn't want a destructive update, and can use the flexibility we currently have.
E.g, for this:

declare <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double>)

define void @sqrtsd(<2 x double> %a, double* %p0, double* %p1) nounwind uwtable ssp {
  %res = tail call <2 x double> @llvm.x86.sse2.sqrt.sd(<2 x double> %a) nounwind 
  %new = extractelement <2 x double> %res, i32 0
  store double %new, double* %p0
  %orig = extractelement <2 x double> %a, i32 0
  store double %orig, double* %p1
  ret void
}

We now get:

movapd  (%rcx), %xmm0
sqrtsd  %xmm0, %xmm1
movlpd  %xmm1, (%rdx)
movlpd  %xmm0, (%r8)

And with the patch, we'll get:

movapd  (%rcx), %xmm0
movapd  %xmm0, %xmm1
sqrtsd  %xmm1, %xmm1
movlpd  %xmm1, (%rdx)
movlpd  %xmm0, (%r8)

Do you think we can fix the correctness issue without introducing extra copies?

Do you think we can fix the correctness issue without introducing extra copies?

Hi Michael -
Thanks for looking at this patch. I agree that we'll produce suboptimal code for your test case, and I think this will match the existing suboptimal codegen for the equivalent test cases using sqrtss, rsqrtss, and rcpss. I'm not sure how to solve this; it seems like it will require writing custom lowering code rather than tablegen patterns? A few points to consider before we tackle that:

  1. Correctness is more important than the minor perf problem.
  1. The perf problem is limited only to legacy SSE codegen; with AVX, we don't have to worry about the implicit source op.
  1. There's some weirdness in particular with the sqrtsd case: Intel/AMD defined the C intrinsic with 2 input operands. Why is the LLVM intrinsic defined with only 1 input? If we fix that definition, I think we can solve the problem for sqrtsd (but not sqrtss, rsqrtss, rcpss).

My preference is to commit this patch to preserve correctness (finally close PR14221) and then file the perf bug separately...unless you see a quick fix that we can implement to solve all of the problems in one patch?

mkuper edited edge metadata.Jan 12 2015, 4:29 AM

I agree with you, of course, correctness before performance.
I'm ok with this going in and opening a bug, but I'd prefer if someone who actually has experience with the patterns give the LGTM.

mkuper edited edge metadata.Jan 12 2015, 4:29 AM
mkuper added a subscriber: RKSimon.

Thanks, Michael. I filed the perf bug for the existinging SSE1 defs here:
http://llvm.org/bugs/show_bug.cgi?id=22206

This revision was automatically updated to reflect the committed changes.