This patch implements two builtins specified in D111529.
The last one, __builtin_reduce_add, will be separated into another patch.
Repository: rG LLVM Github Monorepo
LGTM, thanks!
The last one, __builtin_reduce_add, will be separated into another patch.
Are you planning on putting up a patch for this one as well? What makes add a bit different is that ‘llvm.vector.reduce.fadd.*’ can only perform reductions either in the original order or in an unspecified order. For the extension, we need a particular evaluation order (reduction tree adding adjacent element pairs). Technically this order is required for all reduction builtins, but for integers the order doesn't matter, same for min/max.
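To make the ordering point concrete, here is a minimal sketch in plain C++ (not Clang or LLVM code) that contrasts an in-order reduction with the adjacent-pair tree for four floats. The function names `reduce_sequential` and `reduce_pairwise` are hypothetical, and the sketch assumes IEEE-754 single precision with default rounding and no fast-math flags.

```cpp
#include <array>

// In-order reduction: (((start + e0) + e1) + e2) + e3.
// This models the "original order" behaviour of a sequential fadd reduction.
float reduce_sequential(float start, const std::array<float, 4> &v) {
  float acc = start;
  for (float e : v)
    acc += e;
  return acc;
}

// Adjacent-pair tree: (e0 + e1) + (e2 + e3).
// This models the evaluation order the extension asks for.
float reduce_pairwise(const std::array<float, 4> &v) {
  return (v[0] + v[1]) + (v[2] + v[3]);
}
```

With an input such as `{1e8f, 1.0f, -1e8f, 1.0f}`, the two orders round at different points and can produce different sums, which is why the order has to be pinned down for floating-point add but not for integer add or min/max.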
| clang/lib/Sema/SemaChecking.cpp | ||
|---|---|---|
| 2238–2241 | nit: Those .... vectors .... | |
Sorry about the late response. Yeah, I'm trying to work on this builtin too, but honestly I don't know if I can do it, since all my previous work has been more or less boilerplate. I have read the whole discussion on the mailing list and the related LLVM IR reference, but I'm still a little confused.
So this builtin is different because the LLVM intrinsic is declared like this:
declare float @llvm.vector.reduce.fadd.v4f32(float %start_value, <4 x float> %a)
declare double @llvm.vector.reduce.fadd.v2f64(double %start_value, <2 x double> %a)
And it performs a sequential reduction, which is not what we want, right? We need it to reduce like:
[e3, e2, e1, e0] => (e3, e2) + (e1, e0)
Does that mean we should do something like a for loop? Or recursive calls? Or change the order of the elements in the vector?
Another thing that confuses me is the requirement to pad identity elements after the last element to widen the vector out to a power of 2. According to the IR reference, is the neutral value just zero?
The last confusing point is %start_value: do we simply treat it as 0?
I would appreciate it if you can give me any hints, which I think is very helpful to my LLVM learning :-)
@junaire did you already get commit access or should I commit this change on your behalf?
One way to go about it would be to extend the @llvm.vector.reduce.fadd to take another integer or boolean argument indicating the order to apply.
Targets that support such horizontal add instructions, like AArch64, can then lower the intrinsic call directly to the right instructions. Otherwise we can generate the right instruction sequence for the reduction. We know the number of vector elements, so there should be no need for a loop or recursion; we can just generate instructions for the full tree (extract the lanes using shufflevector and add them).
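As a rough illustration of the "full tree" idea, the sketch below models each tree level in plain C++ with `std::array`. It is not LLVM codegen: `add_adjacent_pairs`, `reduce_add_tree4` and `reduce_add_tree3` are hypothetical names, and each level only stands in for what would be a pair of shufflevector instructions (even and odd lanes) followed by one fadd. The 3-lane case assumes padding with -0.0, the identity value suggested in this thread.

```cpp
#include <array>
#include <cstddef>

// One level of the tree: add adjacent lanes, halving the vector width.
// In IR this would roughly be two shuffles (even lanes, odd lanes) and an fadd.
template <std::size_t N>
std::array<float, N / 2> add_adjacent_pairs(const std::array<float, N> &v) {
  static_assert(N >= 2 && N % 2 == 0, "width must be even");
  std::array<float, N / 2> out{};
  for (std::size_t i = 0; i < N / 2; ++i)
    out[i] = v[2 * i] + v[2 * i + 1];
  return out;
}

// Fully unrolled tree for 4 lanes: (e0 + e1) + (e2 + e3).
// The number of levels is fixed by the vector width, so no runtime loop
// over levels or recursion is needed.
float reduce_add_tree4(const std::array<float, 4> &v) {
  std::array<float, 2> level1 = add_adjacent_pairs(v);
  std::array<float, 1> level2 = add_adjacent_pairs(level1);
  return level2[0];
}

// A 3-lane vector is first widened to 4 lanes with the identity element
// (assumed here to be -0.0), then reduced with the same tree.
float reduce_add_tree3(const std::array<float, 3> &v) {
  std::array<float, 4> padded = {v[0], v[1], v[2], -0.0f};
  return reduce_add_tree4(padded);
}
```

Because the width is a compile-time constant, a code generator could emit the two levels directly, which matches the point above that the instruction sequence can simply be written out for the whole tree.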
Another thing that confuses me is the requirement to pad identity elements after the last element to widen the vector out to a power of 2. According to the IR reference, is the neutral value just zero?
The last confusing point is %start_value: do we simply treat it as 0?
For fadd reductions it should be -0.0 I think.
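A small check of why -0.0 rather than +0.0 is the natural identity for floating-point add, assuming IEEE-754 arithmetic in the default round-to-nearest mode and no fast-math flags. The helper name `preserves` is hypothetical and exists only for this illustration.

```cpp
#include <cmath>

// Returns true if adding `identity` to x leaves x unchanged,
// including the sign bit of a zero result.
bool preserves(float identity, float x) {
  float r = identity + x;
  return r == x && std::signbit(r) == std::signbit(x);
}
```

Under those assumptions, `preserves(-0.0f, x)` holds for `x = +0.0f` and `x = -0.0f` alike, while `preserves(0.0f, -0.0f)` fails because `+0.0 + -0.0` yields `+0.0`. So padding or seeding the reduction with +0.0 could flip the sign of an all-negative-zero input.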
I would appreciate it if you can give me any hints, which I think is very helpful to my LLVM learning :-)
@junaire did you already get commit access or should I commit this change on your behalf?
Yeah, I already have commit access, just waiting for your approval ;D