Add 2 vmulxh_lane vector intrinsics that were commented out.
Diff Detail
Event Timeline
| include/clang/Basic/arm_neon.td | ||
|---|---|---|
| 1504 | I found that unfortunately it's not that straightforward. This leads to wrong code generation as it is generating a fmul instead of fmulx. I am suspecting this instruction description should be using OP_SCALAR_MULX_LN, but also the type decls are wrong. Need to dig a bit further here. | |
| include/clang/Basic/arm_neon.td | ||
|---|---|---|
| 1504 | Sorry for confusion as the commented code was never intended to be used and it is a copy of the code for the intrinsic vmulh_lane(). It was done that way in order to point out that vmulh_lane() and vmulxh_lane() intrinsics should be implemented in a similar way. The only useful thing in the commented code is the explanation that we need the scalar intrinsic vmulxh_f16() which was implemented in the scalar intrinsic patch later on. If we look at how vmulh_lane (a, b, lane) is implemented: Similarly, I thought at the time that vmulxh_lane (a, b, lane) can be implemented:  I am not sure now that we can easily use scalar intrinsic while generating the arm_neon.h file. In case we can not do that, I am thinking that the frontend should generate a new builtin for intrinsic vmulxh_lane() that the backend recognizes and generate the right code for it which is fmulx h0, h0, v1.h[lane]. If you made or will be making progress on this, then that is great. Otherwise, I can look at a frontend solution for it. | |
| include/clang/Basic/arm_neon.td | ||
|---|---|---|
| 1504 | Hi Abderrazek, defm FMULX : SIMDThreeSameVectorFP<0,0,0b011,"fmulx", int_aarch64_neon_fmulx>; If you have the bandwidth to pick this up, that would be great; I started looking into the other failing AArch64 vector intrinsics. Cheers, | |
Was not able to update this particular review with the new code, So I created a new one in https://reviews.llvm.org/D44591
I manage to reuse the mulx scalar intrinsic work, not exactly calling the fp16 scalar intrinsic itself which is not available here but the same frontend codegen work with an extract instruction before that.
I found that unfortunately it's not that straightforward. This leads to wrong code generation as it is generating a fmul instead of fmulx. I am suspecting this instruction description should be using OP_SCALAR_MULX_LN, but also the type decls are wrong. Need to dig a bit further here.