User Details
- User Since
- Aug 11 2015, 1:08 PM (397 w, 4 d)
Fri, Mar 24
OK, talked to some more people. Fences are fine inside the branch.
Thu, Mar 23
- fixes to frame lowering
- fixes to frame lowering
- disable the always inlining pass while testing
I'm not confident that an acq_rel exchange in combination with the fences is correct. If it is correct, I think it must be suboptimal. Why have you gone with that implementation?
- start testing, fix an inversion bug caught
Error handling could be better but otherwise looks ok here
More tests, great! Do you know how the first one fails on amdgpu?
Wed, Mar 22
The argv/env setup is the same as on amdgpu, with different names for the allocators. Maybe pull that into a header that takes functions for the alloc/memcpy and call it from both loaders?
Tue, Mar 21
- with frame setup, without tests, rebased on a refactored main
OK then as written this is definitely going to blow up on us. We shouldn't implement the general purpose lock API if it deadlocks unless called in a very specific situation.
LGTM but Matt's the expert here
Mon, Mar 20
I'm really sure that locks at thread scope do not work on amdgpu or pre-volta nvptx. One of the threads wins the cas, all the others do not, and it immediately deadlocks.
Fri, Mar 17
Libc could walk the ctors array before main and the dtors after, as normal function pointers. We might even be able to persuade hip or openmp to lower their ctor/dtors as function pointer in the same arrays, where they happen to launch a special kernel that walks at least one of them.
Global ctor/dtor is tricky. I have a suspicion hip and openmp have done different things there. When to run the dtor is a challenge, and how many threads the ctors run with may have implementation divergence.
Wed, Mar 15
Do we actually have seq_cst ordering on GPUS? It means every thread sees the same ordering, which I'd guess has to be done by a RMW atomic operations. Maybe a fetch_add. Plus these aren't scoped, so the memory underlying it has to be accessible from all threads, which probably means it goes to host shared memory. Aka fetch_add on CPU memory to get the ordering relation. That seems expensive to the extent that it probably isn't implemented. @arsenm how do we usually deal with that?
The GPU plumbing looks ok to me. Implementing the minimal can-write-stderr version first is good too, details can be revised in tree as needed.
Mon, Mar 13
Adding the amdgpu reviewer group. This change might be contentious - we currently treat various broken code as calls to trap on the grounds that those functions might not be executed. By analogy, functions that call undefined functions but are not themselves called might be considered not a problem.
Sun, Mar 12
Fri, Mar 10
Thu, Mar 9
I think that's all comments fixed except for teaching the verifier about range metadata which should be done separately to this.
- review comments, update tests
That is great news, phabricator's list of branches has mislead me. Thanks!
Duplicating a comment from the commit thread so it's easier for me to find later.
Wed, Mar 8
You've applied this to the release branches going back as far as 14. It's a user facing breaking change. As in people who have a working openmp toolchain and update to the point release which looks like a semantic versioning thing indicating minor bugfixes will experience immediate cessation of function. I consider that very poor user experience and do not agree with the scope of breakage.
Reporting after another round of discussion. I don't have much support from the llvm openmp working group that we should maintain the current divergence from libc++ and the like so I'm backing down. Let's revert this and regress for all users who don't install globally.
Mon, Mar 6
Looking at AMDGPUPromoteAlloca.cpp, the spilling to LDS is disabled for any function which accesses any global that represents dynamic LDS.
I'm happy with this but agree that "what might be a system path?" is a tricky heuristic. What we want is to exclude the places that the application will search anyway, but that's OS dependent. I'm mostly worried about toolchains installed under /opt or under $HOME so this is good for me. It's also easy to pull later if we reconsider.
Sat, Feb 25
This is still a NFC in terms of machine code but introducing the metadata during LDS lowering introduces a lot of churn into the tests, leaving updating those for a later patch.
- drop one function, move a second out of the header
Needs to be rebased on D144221
- factor out the md parse
- Set the metadata in lowermodulelds - test noise update still todo
Fri, Feb 24
- rebase
Checked in the new test case (codegen for hybrid path, previously only checking IR) before this patch in order to highlight the s_trap -> working change. Will commit this as is to catch up with rocm and then change to absolute_symbols (as in D144221), patching trunk first so that rocm picks up the change without manual intervention.
- update test, no longer emitting traps
Feb 23 2023
Feb 22 2023
If fedora has a specific set of paths it rejects in DT_RUNPATH
- and libomptarget is on one of those by default
- and user executables find it there
- and we can detect we're building on fedora
then we can disable the rpath assignment for fedora installs to system root, while keeping openmp binaries working when the toolchain is installed elsewhere.
- Change to hybrid. Reintroduces code block deleted by git misfire
NVM the above, all the other call sites to addLTOOptions have that test in them so it must be fine
Marking this as "no" because as far as I can tell it'll stop anyone running openmp built from source which constitutes a severe regression and I am struggling to find information on what Fedora are doing here. This link https://fedoraproject.org/wiki/Changes/Broken_RPATH_will_fail_rpmbuild suggests changing clang to not set rpath when it would be aiming at a "system directory", which is unfortunately platform specific magic strings, would be fine. That is, maybe Fedora is OK with setting RPATH as long as it isn't set to /usr/lib64 or possibly other unspecified strings.
So what is this configuration file? Joseph found a Gentoo blog post https://blogs.gentoo.org/mgorny/2022/10/07/clang-in-gentoo-now-sets-default-runtimes-via-config-file/ and I don't have a clang.cfg file in my install dir
I don't know how this configuration file works. Running clang from the build directory is useful when developing but to avoid user facing regression we'd need it to run successfully from the install directory. The use case is someone builds trunk or a release on a HPC machine and expects it to run without hacking with the environment variables, which is especially important as module systems managing the machine are likely to be hacking with the environment variables.
Feb 21 2023
I don't mind hugely what mechanism is used but would really like clang++ -fopenmp foo.cpp to build a program that runs. How can we preserve that 'works' feature without setting rpath on the binary?
Feb 16 2023
- delete a print statement
Created D144233 (which was supposed to be a diff relative to this patch but actually has all the line noise in it anyway).
That is not ready to go but shows how function scope external LDS can be layered on the same metadata scheme proposed here. Main difference is that we really don't know that address until codegen time, at least as presently implemented.
Feb 15 2023
Feb 13 2023
I don't rate the copy&paste but I suppose we can fix that later if we choose to. The implementation is probably alright as it was copied from things that are probably alright, and if it turns out to be buggy, we'll fix it when that arises.
I believe coarse/fine grain and accessibility are orthogonal concepts. If the coarse grain pool happens to have the accessibility properties of "target_host", then it'll work. That may be target specific and even order of pool iteration specific.
Feb 4 2023
Added some people I can remember from the original discussion.
Jan 30 2023
Shall we wait to see if people using the nextgen report breakages before deleting the current/previous? I don't think the latest branch has actually shipped to anyone yet
Jan 25 2023
Jan 24 2023
That works if you have one toolchain installed globally or you are happy to cobble together a working system using environment variables. If you have multiple toolchains, they can't all sit in the global directory. If you don't have root, you can't install them there.
Jan 23 2023
I think we should land this (sounds like we need to default a test to the old plugin as part of the commit, if our CI is running?) asap. We don't really have the test coverage or resources to reliably manage both new and old plugins over the long haul.
Jan 21 2023
If you don't mind keeping an eye on CI and reverting it on breakage feel free to land it, I'm away from my desk
I think it's good to go. If this is something you've tried in qmcpack so we have a real world example of it improving things then even better.
- drop atomic
- address review feedback
Jan 20 2023
- simplify
- clean up
- clean up
This is trending in the direction of D142008. It's essentially the same workaround but restructured aiming to minimise blast radius.