- User Since
- Feb 13 2015, 12:29 AM (175 w, 3 d)
Wed, Jun 6
Tue, Jun 5
Mon, Jun 4
Sun, Jun 3
Fri, Jun 1
Revert changes in LanaiAluCode.h
The changes in LanaiAluCode.h are indeed unrelated; let me remove them.
Wed, May 30
Add a mention of the change in the release notes.
That being said, you are correct: I should mention this in the release notes and warn on the mailing list.
There are numerous problems with ADDC/ADDE and their subtraction equivalents, specifically because they rely on glue :) In addition, removing them doesn't really simplify much, because UADDO and friends still need to be supported regardless.
Sun, May 27
May 26 2018
Merge flipBooleanConstant into flipBoolean
Add more specific test cases.
May 9 2018
May 6 2018
Feb 23 2018
Add bugfix for Hexagon and rebase
Jan 31 2018
Add a comment to explain the check for MVT::i16.
Reduce the diff to simply avoid doing the high register trick.
It turns out that there is a bug in the other optimization made in the original diff, so it seems like a better idea to split it out into several smaller steps to ensure progress.
Jan 30 2018
OK, it sounds like this isn't the right approach; closing this one.
Jan 29 2018
Remove leftover DEBUG
Merge the code paths for testl and testw
@niravd It is rebased on top of D42615. Alternatively, it can be rebased on top of D42646 with identical results. As per the discussion with @craig.topper, the test instruction has higher throughput than bt, so using test may be the preferable solution. Either way, the problem with the high-register trick has been identified, and there are several possible solutions.
Jan 28 2018
On second look, when I disable this trick, I only see improvements across the various test cases. I'm not sure what the impact on real-world code is, but I'm not convinced this trick is worth doing at all at this stage.
It looks like this ends up being a problem when the value is in EDI due to the calling convention: unlike EAX, EBX, ECX and EDX, EDI has no high-byte subregister, so the value has to be copied first. So a few questions come to mind:
1/ Shouldn't this optimization be done only after register allocation, and only if the selected register allows for it? It would occasionally fail to apply because the register allocator didn't choose a suitable register, but that's probably preferable to the extra copies.
2/ Is it possible to hint to the register allocator that something is desirable? For instance, that we would like this value to be in a GR8_NOREX register, but that if it isn't, no copy should be created for it?
Jan 27 2018
@niravd After proposing D42615, it seems like using test instead of bt is the right thing to do, as it has higher throughput, unless more work is required to materialize the constant. There is a bug in the materialization of test that causes it to sometimes create a needless copy. This needs to be fixed regardless of what this diff does. I think we should proceed with this one.
@craig.topper I had no idea bt has lower throughput than test; I assumed it was the same. If that's the case, then this approach doesn't make much sense.
Rebase on top of D42615, which ensures there are no more regressions for patterns involving test/bt.
Jan 1 2018
Dec 14 2017
Dec 13 2017
Nov 10 2017
Nov 9 2017
Make sure we do not transform if x can be NaN. Also do the transformation for fsub, and add arguments in various tests to make sure they don't get folded and continue to test what they are supposed to.
Oct 14 2017
Not relevant anymore.
This is not relevant anymore.
Aug 14 2017
Aug 9 2017
@RKSimon I'll find a way to make that fast, or find an alternative, such as activating it only in some specific situations. In addition to solving my specific problem, it seems to improve numerous other things, especially for the AMD backend. In any case, I think D33840 is a good change and we should proceed with it.
Jul 31 2017
OK, benchmarks: compiling clang from a bitcode (.bc) file containing clang in its entirety. With the patch: