Fri, Jun 18
Thanks to @miyuki for repeating the previous benchmark with this version of the patch (and on the same machine as before, which was better than I could have done).
Thu, Jun 17
@miyuki is working on other things at the moment, and I've picked this up.
Tue, Jun 8
Mon, Jun 7
Looks like an obviously good thing, and I only have one nitpick.
May 12 2021
May 11 2021
For example, that's how I tested the refactoring in D72690.
For this kind of pure refactoring on a .td file, my usual approach to testing it is to run the file through Tablegen without any output option (i.e. just in the default -print-records mode), before and after the change. In this case, I'd think, you ought to be able to expect bit-for-bit identical output – and if you get it, then you can be pretty confident that all the directly useful output modes won't have changed either.
Apr 26 2021
That's a particularly striking improvement for the vectors of u8 :-)
Apr 21 2021
Added a unit test on the LLVM side.
Yes, it looks easy enough to add something in llvm/unittests/ADT/TripleTest.cpp to directly test getARMCPUForArch.
Mar 29 2021
Mar 4 2021
Very nice – it hadn't even occurred to me that you might be able to find the branch-target operand automatically as well as identifying Arm/Thumb.
Mar 3 2021
LGTM, but to show willing I've spotted a typo in a comment :-)
Mar 2 2021
Feb 18 2021
Looks nicer now, thanks!
Feb 17 2021
Feb 16 2021
Oh yes, I must have seen if Rn == '1101' && W == '1' and absentmindedly interpreted the W as a write to memory, not a write back to the register.
Feb 15 2021
I was lucky to notice it at all – I only spotted it because one of the test changes in the previous version of the patch did lift an increment of SP past two load instructions. As best I could tell it wasn't harmful in that particular case, but that clued me in to look more closely and see whether it could have been harmful in another case.
Feb 12 2021
What happens about a case like the following:
mov r0, sp ldrd r1, r2, [sp] ldr r3, [r0] add sp, sp, #8
In this case, the LDR of r3 is not directly reading or modifying SP, but it is nonetheless dependent on SP not having been incremented yet, because it's accessing the stack slot that becomes invalid as soon as SP is increased. I don't see any check for that situation here.
Feb 11 2021
The table-lookup strategy seems like overkill to me. As far as I can see, the integer mapping required is: 0→3, 3→2. 2→1, 1→0. In other words, all four input values (that can be handled at all) are just reduced by 1, mod 4. So instead of (147 >> (value << 1)) & 3, you could compute (value - 1) & 3, surely more cheaply.
Ahem. Now actually tested, and with the necessary extra enum definition.
Feb 8 2021
You've got a typo in the subject line :-) v2f64, not v2f62.
Feb 5 2021
Feb 4 2021
Jan 28 2021
Jan 27 2021
Jan 20 2021
Thanks. LGTM now.
Jan 19 2021
Dec 8 2020
That seems like a reasonable precaution to me, yes – with only one VPR, there's only one previous value that it might be of obviously low cost to VPNOT.
Dec 2 2020
Should there be a range limit somewhere in this logic, beyond just 'in the same basic block'? I worry slightly that if there's a very long basic block with two completely separate predicated sections, and they happen to reuse the same constant, then it might not be a win to do this change. Something along the lines of
Dec 1 2020
Nov 30 2020
Nov 27 2020
Nov 20 2020
Nov 19 2020
Oct 16 2020
Thanks for the quick review – sorry I wasn't on the ball enough to fold D89547 into this one too...
Oct 14 2020
Thanks for the quick response!
Hi @lebedev.ri , it looks as if this commit is causing an assertion failure compiling the following example code.
void bar(void), baz(unsigned);
Sep 3 2020
Aug 7 2020
Aug 6 2020
Aug 3 2020
Language-wise I think https://wg21.link/p1467 is where C++ is going, and C is taking a similar approach.
Jul 31 2020
You mean: only aarch64 backend supports lowering bfloat16 vectors at the moment?
Is that true of all vector bfloat implementations? It seems like arithmetic on these types is something implementations would likely support.
Jul 20 2020
Jul 14 2020
Jul 3 2020
@jdoerfert , @clementval : over in D83032 is a polished-up version of the script I used to check where the missing deps needed to go. Might be useful for the next problem of this kind. But I'm not sure who to get to review it; perhaps one of you might look at it?
Jul 2 2020
Jul 1 2020
LGTM. So later the DEPENDS omp_gen that are in clang subdirectories could be removed right?
[facepalm] Thank you. I carefully wrote a revised description, but forgot to upload it to this issue.
Here's a completely different patch, which adds all the missing dependencies on OMP.h.inc in the clang subdirectory in one go.
Jun 30 2020
Cannot you just add a depends for your failing case?
Unfortunately, I had to revert this because it caused a buildbot failure: rG39ea5d74b283d5a42f34b856d22bfaf806a1c907
Jun 29 2020
@clementval , are you happy for me to commit this patch?
Jun 26 2020
I wondered about that. I think it may well mean some of those DEPENDS can be removed, but I'm not sure how to be certain of it :-)