Details
Diff Detail
Event Timeline
Thanks for working on this Alex - I've a concern with the 128->64 bit folded store.
lib/Target/X86/X86InstrSSE.td | ||
---|---|---|
8523 | While VCVTPS2PHrr does write to an entire xmm (result half floats to lower 64-bits, and clears the upper 64-bits), VCVTPS2PHmr only writes out 64-bits to memory (so with this pattern the upper 64-bits would be undefined / not touched). Please can you update the pattern to use a suitable extract? |
Thanks Alex, there is a 32-bit X86 test failure that needs dealing with but is good otherwise.
test/CodeGen/X86/f16c-intrinsics.ll | ||
---|---|---|
111 | This fails on the X86 tests at the moment, but passes the X86-64 version fine. |
Storing i64 on i386 with SSE2 is broken in general:
https://llvm.org/bugs/show_bug.cgi?id=23049
I'm almost afraid to ask, but why are these intrinsics defined with integer vectors?
def int_x86_vcvtps2ph_128 : GCCBuiltin<"__builtin_ia32_vcvtps2ph">, Intrinsic<[llvm_v8i16_ty], [llvm_v4f32_ty, llvm_i32_ty], [IntrNoMem]>;
MIPS has v8f16 defs...so this should be possible for x86 too?
On Fri, Mar 27, 2015 at 5:40 PM, Ahmed Bougacha <ahmed.bougacha@gmail.com> wrote:
On Fri, Mar 27, 2015 at 3:53 PM, Sanjay Patel <spatel@rotateright.com> wrote: > I'm almost afraid to ask, *can see where this is going* > but why are these intrinsics defined with integer vectors? > > def int_x86_vcvtps2ph_128 : GCCBuiltin<"__builtin_ia32_vcvtps2ph">, > Intrinsic<[llvm_v8i16_ty], [llvm_v4f32_ty, llvm_i32_ty], > [IntrNoMem]>; > > MIPS has v8f16 defs... Oh god, why? > so this should be possible for x86 too? It should, but would just be asking for trouble, with no benefit whatsoever. With f16/half you're opening the door to any kind of half operation, and you really don't want to do that (given that there's no such thing except for these conversions, pretty much anywhere except GPUs). On most targets, half is softened by the frontend to i16, and this avoids a whole class of problems. If the ps2ph builtin returned <8 x half>, I (or some optimization) could write: %1 = ..ps2ph.. %2 = ..ps2ph.. %l = fadd <8 x half> %1, %2 And this would just crash the compiler. Also, clang emits __fp16 as i16, and it's reasonable to match that in the vector builtins (though I'm not sure there's much interaction). Ideally, we would always use f16/half, and let legalization take care of it. Until then, stick with integers ;)
Thanks, Ahmed! I don't know what MIPS is doing with v8f16, but I just took that as a grep-based existence proof. :)
I see now that X86ISelLowering has:
// TODO: when we have SSE, these could be more efficient, by using movd/movq. if (!X86ScalarSSEf64) { setOperationAction(ISD::BITCAST , MVT::f32 , Expand); ...
I haven't looked into how that part works before, but I'm guessing that's where I need to hack to get i64 stores working for i386 in general...and that'll solve this problem with f16 intrinsics too.
Let the yak shaving begin: D8691 is the first of hopefully not too many steps to fix this problem.
This fixes the 64-bit store on x86-32:
http://reviews.llvm.org/rL235460
So unless there are any other objections, this patch should be good to go.
With the -mtriple fix to the test case LGTM
test/CodeGen/X86/f16c-intrinsics.ll | ||
---|---|---|
3 | Please can you replace -march with a -mtriple equivalent - it prevents the tests failing on windows build/tests |
While VCVTPS2PHrr does write to an entire xmm (result half floats to lower 64-bits, and clears the upper 64-bits), VCVTPS2PHmr only writes out 64-bits to memory (so with this pattern the upper 64-bits would be undefined / not touched). Please can you update the pattern to use a suitable extract?