- User Since
- May 24 2016, 8:35 AM (225 w, 2 d)
Nice. I was worried this would be a bit convoluted, but it's clean like this just setting UP.UnrollAndJam.
I have the feeling that this was protecting us from more than just things that would become vpt blocks. I think we have a check that the original was only a single loop? That would keep things simpler at least, to mostly icmp + select (and maybe zext(icmp), but as far as I understand that should be fine). Things like min/max/abs should all work OK, and select on their own. But what about the saturating intrinsics we have? vqmovn for example.
I guess in my mind unroll and jam was a lot like runtime loop unrolling, which has to be enabled per target.
Wed, Sep 16
Thanks for making the extra fp16 patterns too. LGTM
Sounds like a good idea in the short term. Thanks
Tue, Sep 15
Rebase and rename backwards function.
Rebase. I don't claim to be a big debug expert, I'm not sure if the debug value on bitcasts/phis is super important.
Mon, Sep 14
This looks good to me. It may be worth adding some fmin/fmax tests with the fadds, just because they can end up being a little different.
Yep, that's the plan. It will need a number of other patches though. I think one for adding a cost to predicated blocks is important, along with the one for costing masked loads more correctly under MVE.
Sun, Sep 13
Sat, Sep 12
STRWpost should be $x0 = STRWpost $xzr, $x0. The concept of scheduling a store can be a little strange, but the address update should come as the first operand.
Fri, Sep 11
I've tried to rewrite this to something less convoluted. Hopefully it's a little more straightforward than the nested code from before.
Thu, Sep 10
Nice one. LGTM
The TTI changes here look OK to me. I think it is probably worth trying to make the ARM check a bit more specific though. We are trying to match a SSAT, not any min/max. It's worth making sure that the subtarget will have an SSAT instruction, and that the size of the type we will be saturating to will fit into an i32.
This seems OK to me, if you can get some others to agree.
It could probably be done either way, but this seemed simpler when I wrote it. I was trying to keep the functions cleaner. Otherwise they will have to start looking backwards from the mul (which has two operands the same) up to the extend to check it's type.
Sounds good. Thanks.
Wed, Sep 9
It's probably worth adding something to the commit message about how llvm will canonicalize to different patterns than the old code would select from, and that this is updating it to the new expected form.
Tue, Sep 8
Mon, Sep 7
There are some tests for 64bit reductions. We will probably want to enable inloop reductions for them in the future too, as we have the instructions. That will require a lot of costmodel improvements though.
Sun, Sep 6
Our downstream schedulers override pickNode/pickNext and have a certain amount of lookahead. We probably wouldn't be able to make use of this. I have in the past wanted to write a heuristic that this would fit this really nicely - it needed the last scheduled instruction and would bias based on whether they could "overlap". Unfortunately I never managed to get it to actually make performance reliably better.
I would expect the tests in llvm to only be subset of the possible second order effects that can come up from changing this. Perhaps abs is simple enough, but you might want to run the llvm test suite to see if there are any other interesting binary differences.
Fri, Sep 4
Thu, Sep 3
Hello. This looks interesting. I've run into places before where we just wanted to add an extra heuristics. I think some of our downstream schedules go a bit beyond this and would still need to override tryCandidate (the merge is going to be a pain!), but I've run into situations where it would definitely be useful.
Wed, Sep 2
Tue, Sep 1
rGffd0b31c7cba is the revert.
Mon, Aug 31
Hello. Sorry I wasn't sent any buildbot failure emails, perhaps because I was not the author. Thanks for letting us know.
Sure can. Will do..
Sun, Aug 30
Rebase and ping.
Sat, Aug 29
Yeah. I'm happy with this. Looking forward to seeing any follow ups.
Fri, Aug 28
I asked Anna to put this patch together so that we had it when we needed it. I've been testing things out and I don't see any real reason not to go for it now though. Performance looks OK, if a bit conservative now that we have the higher costs. And I've not found any correctness issues. I've been trying any code I can find with a forced vector user cost and not run into any problems.
Thu, Aug 27
Thanks. Sounds good to me.
align2 test now actually has an alignment of 2.
Oh yeah :(