Hi Luke, thanks for sharing your thoughts. I agree with your analysis. The in-tree vector extension that I am aware of that supports first faulting loads is Arm's SVE. While I work on Arm's MVE, I hope and think this is useful for SVE (and other targets) too, i.e. I think ffirst mask capability can be used. But since the devil is in the details here, an implementation would need to prove this. Hopefully that happens soon.
Sep 9 2020
Aug 28 2020
the description in LLVMRef.rst reminds me very much of "data-dependent fail on first". in particular, that on "overflow" the m[i] values are set to False.
Feb 14 2020
Perhaps this would be cleared up if I had a better understanding of what you were saying.
ah... ah... you can't. at least, the last version of the RVV spec that i read (7?) still explicity states, "regardless of what *you* want VL to be set to, the *hardware* gets to decide exactly what value *actually* goes into the VL CSR".
the only guarantee that you have is that you will find that if you set VL to a non-zero value, you will find that, when you read it immediately after setting, it will be non-zero.
I don't know where you have gotten this idea, it has never been true for as long as I can recall. While RVV implementations have some freedom in how they set VL, there are also lots of rules governing their behavior. Most relevantly, since October 2018 (spec version 0.5-draft), programs requesting something less than or equal to the maximum VL will get exactly that number as VL, no something smaller. And even before that change, there were long-standing significant restrictions on how VL is determined beyond what you claim (see the linked commit).
Feb 12 2020
MMX does use the X87 FP register file, but they can't coexist at the same. The first use of MMX marks the X87 register stack as occupied. I can't remember if it alters the data or not. An explicit emms instruction has to be done at the end of the MMX code to erase the MMX data and make the registers usable for X87 again.
I think i wasn't clear: what i meant to say is that we will not decide how MVL is defined/queried/set in the scope of this RFC... potentially leading to the situation that every target comes with its own set of target intrinsics to do so.
i have a suggestion. for SimpleV we.definitely need to have an explicit way to specify MVL. this because it is literally specifying precisely how many scalar registers are to be allocated for a vector op.
Would it work for you if we leave the definition of MVL for scalable types to the targets?
Feb 6 2020
TTI would tell front ends and optimizations that %evl is a no-go for your target. Is this enough discouragement?
In theory, yes. In practice, it will depend on how optimizations make use of that information. Your explanation of how the ExpandVectorPredicationPass will make this palatable to the backend worries me a little, because it essentially means that optimizations don't have to care that the target doesn't support this feature. They can generate IR that uses it and EVPP will smooth over it. Obviously, we could handle this on a case-by-case basis as it comes up. As you say, TTI will provide sufficient information for passes to make the decision.
Feb 6 2019
remember that both RVV and SV support the exact same vscale (variable-length multiples of elements) concept.
We've already agreed that the unit of the vlen parameter is just the element type (that is the sub-vector element type in the vector-of-subvector interpretation of scalable types).
Feb 5 2019
Certainly, that's why I say it's secondary and mostly a hint that something is amiss with "the current thinking". In fact, I am by now inclined to propose that Jacob and collaborators start out by expressing their architecture's operations with target-specific intrinsics that also use the attributes introduced here (especially since none of the typical vectorizers are equipped to generate the sort of code they want from scalar code using e.g. float4 types).