- User Since
- Jul 14 2014, 6:06 PM (275 w, 22 h)
Sep 11 2015
LGTM, there is just an assert to fix.
Sep 4 2015
It sounds great to me (although a lot of work for such a corner case, but I don't see any better solution).
Aug 31 2015
After thinking some more about it, I realized that having the load stay non-atomic is actually unsound because there might be a race, which would make it undefined behaviour.
Sadly, making it monotonic would still introduce these redundant cmpxchg8b/16b since monotonic requires the access to be atomic.
I don't know well enough the semantics of unordered accesses to know if they would do the trick, and the llvm reference manual description of them is fairly lightweight on details.
Jul 23 2015
LGTM for the integer part.
Nov 12 2014
To dberlin: I looked at this article, but they explain well that the reason min-cut is so expensive for PRE is because it must be repeated for each computation in the function (of which there can be 10s of thousand in very large function) and must look at a potentially huge graph. In comparison we only run this twice: once for hwsync and one for lwsync. Furthermore, because the graph is stopped by any memory access (and not just use/kill of some very specific computation as in PRE), I expect each of these runs of min-cut to be quite cheap. I have not had the time to benchmark the compile-time cost of this pass (deadline tomorrow for PLDI..), but in summary I expect it to be small, even for large functions full of fences.
Oct 31 2014
Oct 27 2014
Oct 21 2014
Initialize the FlowGraphNode::Next iterator more cleanly
Add an option "-view-fences-pre-flow-graph" to visualize the graph being used,
per the request of jfb (it makes things much easier to debug).
Oct 17 2014
Oct 16 2014
Apply clang-format, fix small style issues pointed by jfb.
I haven't done the bigger modifications yet.
I missed your idea of running this pass for C++11 fences instead. In short: this is probably useless, as C++11 fences are expected to be rather rare (compared to atomic accesses that result in target-specific fences), and the semantics of C++11 fences are hairy enough I don't want to try without a clear benefit. So while the infrastructure could support it, I don't plan on doing it.
Thanks for the review !
Oct 13 2014
Oct 9 2014
I don't know enough about this topic to comment on the details of the ideas (although they seem reasonable and match what had been discussed on the mailing list), but I have a read part of the patch and found a few typos and have a few inline comments.
Oct 8 2014
Follow jfb suggestion.
Rebase on trunk + use doxygen trick suggested by jfb
Oct 7 2014
Apply jfb suggestions.
Based on an offline comment by jf, improve the patch so that the immediate is
merged into the addq.
Oct 3 2014
I will add the test with ppc440.
Oct 2 2014
Rebase on trunk.
Duplicate with commit r218923 by Hal FInkel.
Add tests for indexed loads/stores
Thank you ! This paragraph of the documentation was indeed what I was
thinking about, when saying that atomicity is not a problem, I should have
made it more explicit. I will try to generate tests for the indexed
versions and add them in.
Sep 29 2014
I don't think I know this code well enough to give the LGTM, but the general approach looks good.
Also, thank you for the detailed testing/explanation.
Sep 25 2014
Sep 23 2014
Add requested comments
Partial cleanup of the control-flow + extra test + fixed formatting/typos.
Thanks for the review. Answers inline, and patch next.
Use CHECK-LABEL and CHECK-NEXT
Preserve FIXME comment, and refine tests.
The tests for cmpxchg16b were indeed testing for the lock prefix, but not the tests for the 8b version. I will upload a new patch with an improved test (and preserved FIXME).
Sep 22 2014
Thanks for the review, I will fix the comments as described below.
Sep 19 2014
Doing this optimization so late was looking increasingly brittle as it must interact with AtomicExpandPass earlier at the IR level.
I reimplemented it at that level in D5422, and it turned out a lot cleaner.
Sep 18 2014
Cleanup of some now redundant code that I had missed.
Sep 17 2014
Sep 16 2014
Rebase on trunk.
There is no barrier generated by the other path, the entire operation (store/load/rmw/whatever) is turned into a libcall on these old processors. It is tested (summarily) at the end of atomic-load-store.ll at least.
It mostly seems reasonable to me, but I don't know this area of the code, and do not feel at all comfortable giving an LGTM, so someone more experienced should look at it.
I will commit it soon (currently testing after rebase on trunk).
Sep 12 2014
Thanks jf for catching this forgotten test!
Sep 11 2014
I did not know that ppc is also supposed to be a generic processor. I can
either add HasSYNC to it, or remove it from generic (and change the broken
tests). Which would you prefer ?
I had originally left HasSYNC out of generic, but added it in because it broke a bunch of tests (which did not specify anything about the processor and yet were expecting syncs). Should I instead modify all of these tests to use a given processor ? If so which one ?
Sep 10 2014
Sep 5 2014
Sep 4 2014
Fix typo and add comment.
Add a Fixme about the use of IsBookE flag for sync support.
Fix formatting based on hfinkel comments.
Sep 3 2014
Ok, can the revision be accepted then ? Thanks.
Fix wrong update
MakeDMB -> makeDMB
Thanks for the review, I will modify the comments accordingly.
There was already a lot of tests for atomics on X86 (test/CodeGen/X86/atomic*), so I did not modify them in this patch, just checked that they still pass.
Thanks for the review !
Sep 2 2014
Got a lgtm from Nadav Rotem on the mailing list.
Aug 29 2014
Fix the formatting