- User Since
- May 24 2016, 8:35 AM (220 w, 3 d)
@SjoerdMeijer I ran into a potential use for this patch in the reduction code I was working on. I think this looks good in general I just had a question about the fallback stratergy.
Now just the Vectorizer part and an option to test it with, plus a new test.
OK, Thanks. I can see how that one would be wrong. Thanks for the example! We'll see what we can do.
How confident are we that the repo.cc file is the cause of the bug? As opposed to another object file? Like I said, this change unfortunately makes quite a few differences and it's a bit hard to tell which one is at fault at the moment! I don't think we saw anything fail locally from this (but I may have just missed it), so I'm trying to figure out how to pin this down.
Thanks. It's hard to tell in all that where the problem might be, there are quite a few changes in all! And I didn't spot anything obvious by eyeballing it.
This seems good to me.
OK. Thanks. LGTM
Wed, Aug 12
Thanks for taking a look. I will update this soon...
Sounds like an improvement to me. Better than using TCK_CodeSize and the option is a nice addition.
Tue, Aug 11
Nice. Sounds good to me.
P.S I agree TCK_SizeAndLatency sounds better than Codesize, for the transforms.
My results seem to agree, that this improves things and doesn't break anything (they were only codesize benchmarks, but they all built OK).
Mon, Aug 10
Sun, Aug 9
After the rewrite of this pass to use activelanemask, that's essentially all there's left here
Sat, Aug 8
Yeah, nice. Sounds good to me.
Fri, Aug 7
Thu, Aug 6
Sounds good to me then, thanks.
Wed, Aug 5
Thanks. Sorry about that, it appears clang doesn't like this code as much as gcc did and then my internet went out whilst I was trying to figure out what was wrong.
Registers are free unless you go over a limit. And I wouldn't expect it to be these individual cost functions that attempted to guess whether it was over that limit.
Do you mean it's the cost of a phi that is altering things? They certainly sounds like they should be free most of the time. Even for codesize I would expect them to be folded away a lot of the time. You just have to get the inputs to share a register after all.
I kind of think this should be the default (plus it's perhaps a little strange for -march=thumbv8.1-m to have a difference branch cost to -march=thumbv8.1-m+mve).
Tue, Aug 4
Sounds good. Let me know if you find anything.
OK. Lets give this another go. There is one additional test update where we manage to convert a loop to a memcpy call.
Thanks. LGTM, in that this work the same as DAGCombiner.
I was looking at this pass for something else recently. I'm not sure if the legality checks are really necessary in here. We might want to change it to just check for active lane masks and convert them to vctp's (when legal). I think that will always be better in terms of code quality than the expansion of the active lane mask, no matter if we end up transforming to a tail predicated loop or not.
Mon, Aug 3
The last time me and Sjoerd talked about it, we figured this wouldn't fix the issue exactly (it only fuses them during scheduling, you can still spill between the t2LoopDec and the End), and as we need a proper solution anyway this might not end up being useful on it's own.
Thanks. LGTM with a very minor nitpick
Quick question - what is the expected behaviour? Do we just never expect to see an bf16 add, and if we do it's a fatal error? Or is some form of automatic promotion expected to happen?
Sun, Aug 2
Oh right. Sorry I saw that bug but didn't put two and two together somehow.
Sat, Aug 1
Thanks to Adenilson and Hans I believe chromium has now fixed the issue their end. I have also ran another bootstrap and the llvm-test-suite, which are both still doing OK.
Fri, Jul 31
I think we are still at the behests of LSR's cost modeling I'm afraid. This can just do slightly better at fixing up the results afterwards, it can slightly improve things in case ISel comes up with something unoptimal.
Now includes a quick codesize metric, to try and detect cases where a t2LDRi12 can be shrunk to tLDRi.
Thu, Jul 30
Wed, Jul 29
Tue, Jul 28
Thanks. LGTM, but please give others a day in case they have comments.
Mon, Jul 27
Thu, Jul 23
Wed, Jul 22
OK. Hopefully fixed in 411eb87c7962ec817ab6bf7aa3c737a3159d2d4e. Thanks for the report/reproducer, and sorry I didn't catch it earlier.
Yeah OK. This should be simple enough. It's missing the predicate off VMUL_qr. I'll just run a quick check-all and submit the fix.