- User Since
- Nov 5 2014, 12:57 AM (167 w, 2 d)
Dec 6 2017
Nov 6 2017
I'm not sure whether it's the right fix.
I would expect the clobbers list of an instruction to be precise - if it says it's clobbered, you can't rely on it being preserved across the call, whatever the circumstances. So maybe the right thing to do would be to filter reserved registers out of the clobber list for these invokes when they're lowered?
Oct 16 2017
Sep 5 2017
Aug 9 2017
Aug 8 2017
I think the revision summary doesn't fully represent what's going on here at this point. :-)
Jul 27 2017
Jul 18 2017
Anyway, I agree this isn't the right fix, but my gut feeling is that the right fix is to actually allow the builder to create a bitcast between a pointer and a pointer-sized float.
I don't see anything in the langref that makes the semantics of this undefined. Renato, what kind of semantic issues do you see here?
I agree the vectorizer can "safely" assume this case is ok because we know the original datatype was a pointer anyway and this is part of a strided load, but I feel we'd be opening a can of worms if we start allowing any float<->pointer conversion by default.
For example, a different case would be pointer->float->double->pointer. C has automatic promotions, and some corner cases may slip and create that sequence, which would destroy the bit-pattern and therefore the memory address.
So, if we can do this in this specific case, Manoj's current fix is "better" than moving it up CreateBitOrPointerCast, because we know what the semantics is. Or maybe we just load them as "data" (i32/i64?) and then bitcast safely?
Jul 17 2017
Should I update the Builder's CreateBitOrPointerCast function to handle float to pointer casts using float -> int and int -> pointer casts? I can do that or I can create a local pass specific casting function to handle float -> ptr + other CreateBitOrPointerCast routines used here.
Why can't you do the bitcast directly inside CreateBitOrPointerCast()? You'll also want to upadte isBitOrNoopPointerCastable(),
Anyway, you probably want a different set of reviewers for that patch - I'm really not the authority on that, and at least Renato seems to have a completely different opinion.
We don't really have a policy on this either way, I think.
I certainly don't have any strong opinions about it.
Jul 12 2017
Jul 11 2017
This looks wrong.
In particular, it would break for something like:
Jul 10 2017
Jul 9 2017
Re overflow - the point is that getOrCreateTripCount() returns, basically, PSE.getBackedgeTakenCount() + 1, and that may overflow, so the "trip count" may end up being 0 if the backedge taken count is 0. I don't think this is outdated, and this is behavior we want to preserve. But this patch should preserve this behavior IIUC. Can you make sure there's a test for this?
Jul 6 2017
I've reverted this in r307334 because it breaks a lot of clang tests with:
Jun 28 2017
Jun 26 2017
LGTM, but please wait for Matt.
Jun 25 2017
Jun 23 2017
Makes sense to me, modulo a couple of nits.
Jun 19 2017
Jun 9 2017
Jun 4 2017
Sorry it's taking me so long to get to this.
I've only looked at parts of the code, more forthcoming in the next few days. :-)
Jun 2 2017
This LGTM in terms of being "the right thing to do", but I'm really not sure whether the branch information we have right now is precise enough right now to enable this, especially when using sampling-based fdo.
wmi/dehao, any chance you could check if this causes regressions for us?
May 11 2017
May 9 2017
LGTM, with a nit.
May 8 2017
Apr 26 2017
Apr 25 2017
Adding some target people - I think they ought to care about this more than I do. :-)
Ok, for fast-math, I think this makes sense.
Honestly, I don't feel strongly about this.
Apr 24 2017
When I tried to play with this kind of thing, I got rather inconsistent performance results, because of things like a cold loop within a relatively hot loop.
Do you have any performance numbers for this?
I'm not sure. Do we currently (I mean, without this patch) do anything with FP IVs that violates spec?
Yes, that should have been "trip count - 1", sorry.
The original PR looks fishy to me, but I agree this is a real issue regardless.
Makes sense. LGTM.
Apr 20 2017
What does MSVC do for this? Is MSVC's behavior predicated on /guard:cf? Is it ok to generate empty functions without /guard:cf? Crash on calling an empty function seems like somewhat non-intuitive behavior.
Also, what about thumbv7-windows?
Apr 19 2017
Apr 17 2017
This looks ok, in the sense that it doesn't make things any worse, and will catch more cases - but I think it, unfortunately, doesn't really make things better either.
I don't think we can generalize this (because it blows up exponentially), so eventually we'll have to fix the underlying problems with the SLP "tree".
Apr 13 2017
Apr 12 2017
LGTM with minor nit about comment.
LGTM, except for the comments.
Apr 10 2017
This makes sense to me, but I don't understand why the Scale == 1 requirement was there to begin with.
+ctopper for another look.
Apr 6 2017
I suggest we leave the PMADDUBSW discussion for a separate patch.
Apr 3 2017
Ohhh, I think I got it now. Sorry for the noise, let me think about it. :-)
**haven't really looked.
Apr 2 2017
Mar 24 2017
LGTM, but please fix the test before committing.
Mar 23 2017
Mar 21 2017
Mar 19 2017
Sorry for the delay, Matt.
Mar 16 2017
Re the PS - I don't know.
As I wrote above - UnrollLoop() verifies the latch is conditional, but doesn't verify the two edges are actually going to the header and the exit block. But we haven't actually *seen* any issues with this, as far as I know.
Sorry, lost track of this. I'll try to get to this today/tomorrow, unless Elena does first.
Mar 15 2017
Mar 10 2017
Mar 8 2017
(If you're not committing the addMetaData fix immediately after this, please put a FIXME above the "if (VF > 1)")
The SLP part LGTM - it's less broken than the rest of the code, and we need to fix SLP itself to fix it (and avoid things like the Container hack...)
Thank you, this will be incredibly helpful. I don't think it quite works, though - see inline.
Mar 7 2017
LGTM, modulo the minor issues inline.
I haven't looked at the patch in detail yet, but I don't understand the rationale for putting it in the SLP vectorizer.
If transforming this pattern is a win, it's a win regardless of whether it originated in the SLP vectorizer, or just happened to appear in the IR.
This makes sense. Some minor comments inline.
Oy. That's not good. Removal from SetVector (except with pop_back) is linear-time... :-\
Mar 6 2017
Do you need someone to check this in for you?
Also, forgot it before, sorry - please add a test. (No need for a huge test, you can force a small threshold.)