- User Since
- Nov 12 2014, 1:58 PM (132 w, 2 d)
Can you provide such a test?
Wed, May 24
I can see the bloat as we inline with a ridicolously high threshold before instrumenting without DCE'ing, so the patch seems a decent idea.
Also, DCE is basically free compile-time wise. If you want to improve I'd do it on the DCE side instead of the instrumentation side, but that needs a larger discussion which definitely doesn't fit in the margin of this page.
This is fine assuming you audited all the callers (a quick grep shows just these two).
Thanks for your first contribution, great GSoC start!
One note: this manifests with ThinLTO, but it might be a more generic metadata issue. Are you able to reproduce without ThinLTO (and maybe with a single file?)
If not, can you explain why?
Agree with Simon, this needs to be split in two patches.
Tue, May 23
Fine by me. Thanks!
Also, r303704 for the cleanup.
Mon, May 22
The homegrown dominance frontier logic is not great, but I don't think it's a significant blocker and can be refactored in the future.
i.e. I'm much more interested in understanding whether this thing stick together (including the non trivial unswitch, once you get to it).
I'm happy with this going in when Danny/Sanjoy are.
This is, unfortunately, not ideal, as full unrolling can be beneficial in -Os.
Michael (@mzolotukhin) did you get a chance to re-evaluate this?
Do you have an example where this triggers?
Also, please note this patch contains also the clang bits. you may want to apply them separately.
I'm not entirely sure whether you need cl::ZeroOrMore. Few comments, with that, LG.
Feel free to submit once addressed without another round trip.
Can you make the same change in the future pass manager pipeline?
Sun, May 21
Sat, May 20
Note: whether it's a good idea to declare pthread_self() with attribute const is a different story, but I'll leave the answer to the glibc developers :)
OK, I think I found out the cause. I guess this patch was wrong, my bad.
GCC doesn't do anything special with pthread_self per-se.
What GCC does is speculating the glibc implementation of pthread_self is declared with __attribute__(const).
The semantic of the attribute is that of "The const attribute is specified as asserting that the function does not examine any data except the arguments. " 
If the function has no arguments, it has to return the same value every time.
This is a really nice cleanup and I think we can greatly benefit from it in other passes as well :)
Some comments inline.
Looks okay, my POSIX was rusty so I looked at it again and this seems OK to speculate as it has no side-effects.
The thread id is guaranteed to be unique across all *running* threads, but can be reused when a threads joins and another one is created. I don't think this matters for this optimization.
I looked very closely at the opengroup spec and I can't find a paragraph stating that the return of pthread_self() is guaranteed to be unique for the thread lifetime.
I expect any sane implementation to do that, and it's probably implicit.
This one is implemented correctly, but, can you put together all your private patches and post for review? That would clarify the general picture for reviewers.
LGTM (assuming you grepped and replaced all the uses in the documentation).
Do you have some C code where this triggers? Can you provide an example?
Fri, May 19
LGTM. I see Peter confirmed this is the correct fix in the PR, and I think so too.
Correct version of the patch & clang-formatted.
Updated after Sanjay's/Jacob's feedback. Thanks for your review!
So what youre suggesting here is to canonicalize and remove this Xform entirely?
Thu, May 18
This also refactors the codepath for clarity (as suggested by Sanjay).
Wed, May 17
Tue, May 16
As we did in NewGVN I think it's important to add a comment explaining why this doesn't matter, so, @mgrang, you might want to do that instead.
Nevermind, we always remove the copy so that shouldn't matter.
uh, I'm very confused at this point, I thought this was resulting in non-deterministic output?
Where did you encounter this?
I'm a little surprised that bootstrapping without LTO takes more time than bootstrapping with LTO, but assuming numbers are sane, the compile time impact is not terrible.
In particular, the time spent in EliminatePHI goes from
92.7229 ( 59.4%) 0.0100 ( 1.1%) 92.7329 ( 59.0%) 92.7333 ( 59.0%) Eliminate PHI nodes for register allocation
107.5286 ( 62.0%) 0.0090 ( 1.1%) 107.5375 ( 61.7%) 107.5381 ( 61.7%) Eliminate PHI nodes for register allocation
The test could probably be further reduced, but I think it's fine now.
The change looked good to me a while ago, and still looks good.
Please wait and check if Craig is happy with this.
This is fine. It's just a theoretical concern, but OK. Please keep in mind that the transform you're proposing won't work if the first icmp is located in a different position in the function, for that you need value numbering.
The wording is OK. What kind of infloop are you talking about here? Can you please elaborate?
Mon, May 15
updated, now with testcase.
Sorry, I didn't do git add of the testcase, I'll do in my next upload.