This is an archive of the discontinued LLVM Phabricator instance.

[X86] Always assign reassoc flag for intrinsics *reduce_add/mul_ps/pd.
ClosedPublic

Authored by pengfei on Feb 7 2021, 6:58 PM.

Details

Summary

Intrinsics *reduce_add/mul_ps/pd have assumption that the elements in
the vector are reassociable. So we need to always assign the reassoc
flag when we call _mm_reduce_* intrinsics.

Diff Detail

Event Timeline

pengfei requested review of this revision.Feb 7 2021, 6:58 PM
pengfei created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 7 2021, 6:58 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
spatel added inline comments.Feb 8 2021, 8:28 AM
clang/lib/CodeGen/CGBuiltin.cpp
13829

I haven't looked at this part of the compiler in a long time, so I was wondering how we handle FMF scope. It looks like there is already an FMFGuard object in place -- CodeGenFunction::CGFPOptionsRAII(). So setting FMF here will not affect anything but this CreateCall().

Does that match your understanding? Should we have an extra regression test to make sure that does not change?

I am imagining something like:

double test_mm512_reduce_add_pd(__m512d __W, double ExtraAddOp) {
  double S = _mm512_reduce_add_pd(__W) + ExtraAddOp;
  return S;
}

Then we could confirm that reassoc is not applied to the fadd that follows the reduction call.

13829

Currently (and we could say that this is an LLVM codegen bug), we will not generate the optimal/expected reduction with reassoc alone.

I think the x86 reduction definition is implicitly assuming that -0.0 is not meaningful here, so we should add nsz too.

The backend is expecting an explicit nsz on this op. Ie, I see this x86 asm currently with only reassoc:

	vextractf64x4	$1, %zmm0, %ymm1
	vaddpd	%zmm1, %zmm0, %zmm0
	vextractf128	$1, %ymm0, %xmm1
	vaddpd	%xmm1, %xmm0, %xmm0
	vpermilpd	$1, %xmm0, %xmm1      
	vaddsd	%xmm1, %xmm0, %xmm0 
	vxorpd	%xmm1, %xmm1, %xmm1   <--- create 0.0
	vaddsd	%xmm1, %xmm0, %xmm0   <--- add it to the reduction result

Alternatively (and I'm not sure where it is specified), we could replace the default 0.0 argument with -0.0?

clang/lib/Headers/avx512fintrin.h
9300

This is an existing text bug, but if we are changing this text, we might as well fix it in this patch - I'm not sure what "off" refers to here. Should that be "order"?

9303

Typo: "floating-point types"

9304

Also mention that sign of zero is indeterminate. We might use the LangRef text as a model for what to say here:
https://llvm.org/docs/LangRef.html#llvm-vector-reduce-fadd-intrinsic

spatel added inline comments.Feb 8 2021, 8:51 AM
clang/lib/Headers/avx512fintrin.h
9355

Ah - this is where the +0.0 is specified. This should be -0.0. We could still add 'nsz' flag to be safe.

9365

This also should be changed to -0.0?

pengfei updated this revision to Diff 322344.Feb 9 2021, 4:34 AM
pengfei marked 7 inline comments as done.

Address Sanjay's comments. Thanks for the thoroughly review!

clang/lib/CodeGen/CGBuiltin.cpp
13829

Confirmed by new tests.

13829

I think there's no such assumption for fadd/fmul instructions. We do have it for fmin/fmax. So I think we don't need to add nsz here.

clang/lib/Headers/avx512fintrin.h
9304

Got it. Thanks!

9355

-0.0 can fix the problem. But we don't need to add 'nsz'. We can add it if we can find a corner case.

spatel accepted this revision.Feb 9 2021, 4:52 AM

LGTM

This revision is now accepted and ready to land.Feb 9 2021, 4:52 AM
This revision was landed with ongoing or failed builds.Feb 9 2021, 5:14 AM
This revision was automatically updated to reflect the committed changes.