- User Since
- May 24 2016, 8:35 AM (129 w, 5 d)
Tue, Nov 13
I'm happy to change stuff around. I believe it's the way it is at the moment because of the place this is called, half way into a constructor (making virtuals difficult) and some archs requiring different pieces of info (like aarch64 being different for JIT's). I figured if in doubt, don't change too much at once.
Hello. I agree that this looks like an improvement. Can you add a testcase?
Wed, Nov 7
Cleanup using tablegen classes.
Tue, Nov 6
Unfortunately, this also increased codesize a little at -Oz, which I will have to look into.
Mon, Nov 5
(but can cause problems for cases where the blocks are not in the form they will appear in assembly).
I'm not sure what sort of issue you're running into here?
Sun, Nov 4
Sat, Nov 3
Hello! Drive by comments. I've not looked in any depth.
Thu, Nov 1
Wed, Oct 31
Thu, Oct 25
I've managed to convince myself that this looks OK. A couple of nits depending on what you think of them.
Nice cleanup. LGTM
I was thinking about how this might affect other little cores like the A53/A55, especially around the dual issue on q registers. I don't think it will make much difference though, and the CSE benefits look like a bigger win.
Wed, Oct 24
Now ignores loops that will never be executed. I also have some code that uses SCEV to calculate if the backedge count is <= 1 and allow inlining there. It doesn't seem to come up very often though and needed some plumbing to get SE's/TLI's in the right places.
Mon, Oct 22
Anyway, this looks good to me.
Interesting. I didn't realise it worked like that. I had presumed that lot of passes would have to be taught about the optional defs, as opposed to them not being marked as dead correctly.
Yep, this came up a few times in the tests I ran. Looks like a nice improvement, and I don't think it complicate things too much.
Oct 18 2018
Oct 16 2018
This looks like something that would be useful for us, where some of our benchmarks are very low noise. I currently have a couple of test-ish LNT instances with changes similar to this, either changing this value or setting ignore_small=False from the daily report page.
Oct 15 2018
I think we should deal with do while in another patch.
Yeah, defo. I just need to come up with a sensible way to fix it. I feel some of this is pushing against the edges of what instcombine should be doing, but I'll keep pushing until someone tells me to stop.
Oct 14 2018
Interesting. Nice improvements. What about small trees? It would seem that any tree less that 75 nodes would always be recalculated. Do the timings you ran include things to show that this is better? Or was that just looking at larger trees at the time?
Oct 12 2018
Yeah, I agree. LGTM.
Nice idea. The rtabi seems to says:
Write functions return the value written, read functions the value read.
Yeah, that looks like similar IR to what I was looking at. The vectorised version on Skylake (https://godbolt.org/z/RBS2Os) has a lot of shuffling, perhaps that's deemed unprofitable on Goldmont?
Oct 11 2018
Oh, no. That's not what I wanted to hear. I presume we are looking at the same bit of code!
Whilst we're here, can anyone think of a good way to simplify:
(S + -32) - (-32 & (S + umax(31 - S, -32)))
That's the "do" case. I think if we distribute the -32& through the add, that with the rest of instcombine + cse + instcombine again does get us down to:
S & 31
OK, now one bit ol' matcher. Thanks for the suggestions.
This is the page I look at as reference for alive coding:
Oct 10 2018
I ran some benchmarks for thumb1, they look great.
Just added a new test, and changed some of the others around a little.
I've taken the demand parts parts out of this, and added some extra tests for the do / while and signed /unsigned cases.
Luckily, it appears that Alive can do countLeadingOnes too (although I don't see it anywhere in the sources I have).
Updated to use activeBits. Thanks for the suggestions.
Oct 9 2018
Ah, I must have been looking at the wrong bit of code. Thanks!
It turns out the other case I ran into above ((S + -32) - (32 & (S + umax(31 - S, -32)))) was from do loops, not while loops. Signed will also be different to unsigned, with signed cases not having quite as small simplified forms.
These are the cases:
With some possible simplifications:
Hello, see the part about context in https://llvm.org/docs/Phabricator.html#phabricator-request-review-web. It's easier to review things if we can see the code around the patch as well as the code in the patch.
Hello. Very nice. I don't think I can speak to much of the detail here, especially the Hexagon parts, but can you:
- Add full context to the patch (-U99999)
- Replace the copyright headers to be more "llvmy"
Oct 5 2018
Oct 2 2018
I've added a new memcpy test from the original reproducer. It's a byte memcpy (people seem to love writing those), which I think is worth focusing on because its small, but still increases codesize. It expands to:
Unfortunately, since I did this everything seems to have changed and we now end up with something like:
(S + -32) - (32 & (S + umax(31 - S, -32)))
Let me know if you see anything funny from this patch.
Oct 1 2018
For loop-noinline.ll the "Simplified" instructions (the ones that cost nothing) appear to be:
3 x phis
one of the geps in the loop
the gep outside the loop
Sep 30 2018
Yep sure, with 52177, I no longer have a motivating case for this.
Sep 28 2018
Sep 26 2018
Sep 25 2018
Added a comment and correct spelling
LGTM, with one minor Nit.
Sep 24 2018
Turns out there wasn't any conflict, but I've tried to clean this up a little and add a few more tests.
Sounds good. I'm happy with this, if no one else has any issues.
Sep 21 2018
This does seem, from the tests I ran, to reduce codesize on average. Especially on Thumb1.
Is this because we can just use a MOVS and wont have to fill in any higher bits? And MOVS's aren't trivially rematerialisable? And Thumb2/Arm are handled by getT2SOImmVal?
Sep 20 2018
Sep 19 2018
Minor edit to test (and renamed it to tailcall-dup.ll).
Added a tailcall duplication opt test to Transforms/CodeGenPrepare.
Sep 18 2018
Thanks for pointing to D52070. I think I saw that (it gave me the idea for this), but hadn't realised it had come back out.
Sorry for the delay. Sure, I can do that.
OK. Thanks guys.