- User Since
- May 10 2016, 6:42 AM (315 w, 3 d)
Wed, May 25
Can we test some illegal type here (e.g. vscale x 16 x i64). The patch includes logic for that so it may be worth having it.
Thu, May 19
Fri, May 13
I added the link but I forgot the Differential Revision: marker. This was fixed in 189ca6958e84
Accidents happen.. Thanks for the prompt review @craig.topper.
I'll post a fix shortly.
Thu, Apr 28
Just to understand the design principle behind these patches: adding FRM as Use to the v*cvt instructions (like we did in D121087 ) would not help us to implement floor and ceil. I imagine one option could be adding specific pseudos for round up and round down and then a later pass could set the rounding mode and restore it later. However, this would not give great code generation and it adds even more instructions to our already large number of pseudos. So your approach goes by adding a new operand with the rounding mode, similar to the scalar operations, and in D123371 you propose a pass that adjusts the FRM register.
Wed, Apr 27
Apr 25 2022
Apr 21 2022
One interesting thing is that computeIncomingVLVTYPE doesn't seem to be fully aligned with what emitVSETVLIs will do. If the latter chooses to skip a vsetvl then the Exit of that basic block might potentially be different to the one that we determined in computeVLVTYPEChanges and computeIncomingVLVTYPE.
Apr 7 2022
Other than the nit above, this LGTM.
This looks good to me, based on what we discussed on D123179 with this change the stack looks like this
I'm ok with this once D123178 lands.
Apr 6 2022
This is a sensible low hanging fruit we want to have until VP_SETCC is taught more tricks.
LGTM. Thanks for the cleanup @craig.topper !
LGTM. Thanks @craig.topper !
Apr 1 2022
Mar 30 2022
I must be missing something very obvious: wouldn't 2.0 round to 2 and then give us i1 0 because its LSB is 0?
Mar 24 2022
I understand we need this for proper modelling of FRM when lowering constrained FP operations
Mar 23 2022
Mar 22 2022
- Improve the test with assertions verified by test_folding.py as suggested
Yes, SNGL is a specific intrinsic (defined in section 16.8 in Fortran 2018 standard), while DBLE is a generic intrinsic (defined in section 16.9). Intrinsic procedure call resolution (Probe in lib/Evaluate/intrinsics.cpp) replaces specific names by the related generic names. So SGNL is replaced by REAL there, but DBLE is not. So your patch in folding makes sense to me.
I was wondering why similar intrinsics SNGL and FLOAT work already but they seem to be lowered earlier to REAL, does this make sense?
Mar 17 2022
LGTM. Thanks for the cleanup @craig.topper
Mar 9 2022
I fear that adding more custom combined nodes would get unwieldy as they don't scale particularly well: we'd probably want corresponding ones for integer madd/macc, for narrowing operations, etc. They may inhibit theoretical optimizations we can perform on generic VP nodes. But the key word is "theoretical" and so that's just a gut reaction. We do have custom nodes for all of the widening operations, so it's feasible.
Hi @arcbbb I may be missing context here
Let's do that, otherwise there is doubt when to use one approach or the other.
Mar 7 2022
Ok, regarding the testing in sink-splat-operands-commute.ll I took the relevant operations in sink-splat-operands.ll and swapped the input so the broadcast operand is the first one. They generate the same input (the only difference is the basic block IDs).
Mar 2 2022
Thanks for the cleanup @khchen !
Feb 24 2022
I understand srcval here is
Feb 16 2022
LGTM. Thanks @khchen!
LGTM. Thanks @khchen!
Feb 14 2022
Feb 9 2022
To clarify I understand the motivation of this: lowering won't have to special case every constant in the AVL operand just in case it is VLMaxSentinel, right?
Feb 7 2022
Post-commit review: LGTM
Feb 2 2022
Ok thanks for the clarifications!
Jan 27 2022
Just to confirm, this is not fixed vector specific, is it?
Jan 26 2022
Looks reasonable to me. Thans @khchen!
Jan 20 2022
Jan 7 2022
Dec 30 2021
Dec 22 2021
Though perhaps we can improve the generation of -0.0 in another patch.
Dec 1 2021
Thanks for the patch @victor-eds!
Nov 18 2021
Nov 11 2021
Being myself far from an expert in floating-point, the logic and generated code seem correct to me.
Oct 27 2021
I'm curious about how we handle _Float16 here.
Oct 22 2021
Looks good to me too. Thanks a lot @craig.topper !
Oct 20 2021
Oct 18 2021
Thanks @dcaballe for the patch. A bit surprising that we didn't notice this earlier but I guess masking does not get used that often.
Oct 12 2021
No objection with this, I guess it is for consistency with GPRs, right?
Oct 1 2021
Sep 30 2021
Sep 22 2021
I think this is reasonable. I wonder if you have a small test that shows we can avoid copies this way. Unless I missed one case, the updates to the tests only show different registers being used (I understand they're small enough and copies are not a problem for them).
Sep 7 2021
Hi @jdoerfert, we (BSC) may be able to work on this but we don't want to step on each one toes. Are there plans to push this forward (by you or someone else)?
Sep 6 2021
Sep 2 2021
Sep 1 2021
LGTM. This will enable using those in VPlan as a later followup of @vkmr patches.
LGTM! Thanks @frasercrmck !
LGTM. Thanks @frasercrmck !
Aug 19 2021
I'm not super familiar with this mechanism. But reading the docs I think this is correct.
Aug 18 2021
Aug 13 2021
Aug 11 2021
Sorry for the delay. LGTM.
Aug 6 2021
Thanks for the prompt review @ABataev! I'll push this shortly.
Aug 2 2021
For the case of pass-thru, does it make sense to give it a different name like vp.overwrite / vp.insert / vp.update / vp.merge (in lack of better names). When there is pass through, this intrinsic looks to me it can be understood as first taking the whole on_false value and then, to build the result, selectively (as defined by the mask below the EVL) replacing elements with the corresponding values from on_true.
Jul 28 2021
I'm curious: why can't we apply a similar approach to loads as well? Don't they compute the EEW and EMUL in a similar way?
Jul 22 2021
It would allow us to remove some code from the vsetvli insertion pass if these instructions weren't special.
OTOH, it is perhaps confusing to provide vl for the intrinsic when the instruction doesn't need it. We'll need to rework isel patterns and may not be able to use ISD::EXTRACT_VECTOR_ELT directly for floating point.
I think we need to weigh the options here. I'm leaning towards adding the VL to the intrinsics and instructions, but I might be persuaded to just keep this patch.