- User Since
- Aug 11 2015, 1:08 PM (296 w, 1 d)
Change requested is to not change the handling of region_address. I haven't looked up what that is but I am sure the module lds pass doesn't do anything with it.
I agree with the intent. This would be a good change to run through the internal CI before landing.
I believe we only require a value for '-march=' to unblock running tests on CI machines. I'd guess you're referring to target id stuff where clang fills in reasonable defaults already.
Yep, thanks for factoring this out
Mon, Apr 12
New tests look good to me, thanks.
Interesting. Reduction across lanes in warp? If so, this is probably a way to handle the last step reduction for openmp reductions
The algorithm I had in mind was along the lines of:
for each LDS variable: if should-transform create 16 bit integer in LDS initialize that global with (constexpr) address of variable replace all uses of variable with a (constexpr) access through new pointer
Thu, Apr 8
Nice. Thank you for fixing the oversight.
Wed, Apr 7
Right, we don't really have minutes for the weekly call.
We'll have slightly indirect testing once this is used to enable D99656. There are two pieces that can be tested:
Implementing GNU libelf interface as-is via LLVM ELF is problematic because this will require const casting of constant references/pointers exposed by LLVM ELF.
I'm happy with this as-is. @jdoerfert is this close enough to what you expected when we discussed this offline?
Tue, Apr 6
This change is partly motivated by wanting to check in runtime tests for openmp that execute on whatever hardware is available locally. It is functionally similar to an out of tree bash script called mygpu that contains manually curated tables of pci.ids and to a python script called rocm_agent_enumerator that calls a c++ tool called rocminfo and tries to parse the output, with a different table of pci.ids for when that fails.
Sat, Apr 3
Nice, thanks! I think the function in the devicertl is dead with this change, maybe remove that too?
Wed, Mar 31
I guess there's no XFAIL equivalent here? In that case we should probably leave off the RUN line for the tests that can't work, as they'll otherwise break the build once an amdgpu CI machine goes live
Tue, Mar 30
That's a lot of code. Mostly straightforward, though the raw new/delete makes me slightly nervous. I think it's almost all here to abstract over libelf or llvm's elf library with a higher level interface. I haven't read through it carefully yet.
Mon, Mar 29
Bug https://bugs.llvm.org/show_bug.cgi?id=49764 points at this change as increasing device memory use. Previously the bitcode and the archive were both linked, where this file was always in the archive. After this commit, no files will have been extracted from the archive (and shortly afterwards the archive was dropped entirely).
I don't understand why fine grained attributes would be introduced and then all the fine grained attributes set on the function. I also don't understand why setting the attributes on a function based on which they call is considered difficult. We don't seem to be reaching consensus. I'll leave the assessment of this patch to others.
Context in https://bugs.llvm.org/show_bug.cgi?id=49752 is that this resolves a regression in stack usage from D94315. This change looks good. I'm not totally sold on using a function call boundary to convey invariants on ICV, but that's an existing property.
Fri, Mar 26
The general problem looks harder but important to fix. Finding the right headers but the wrong shared library is bad, and iirc we currently have to use LD_LIBRARY_PATH to bodge the latter which is not a good UX.
Arrived here from D99402. Complicated! Perhaps simpler today than in 2019 due to the monorepo, and if I understand the above, somewhat separable.
Thu, Mar 25
Tue, Mar 23
I can't work out which LDS variables you intend to replace with pointers from the code. Could you spell out what the condition under which you intend to replace one is?
This is much more complicated than I expected. Perhaps because it's been written intertwined with the lowering pass, instead of as an optimisation that executes beforehand?
Mon, Mar 22
Adding Ron in case he can run this against the amd-stg-open branch, and as fore warning of merge conflicts if not.
I'm not blocking. It doesn't merge cleanly into rocm so it's hard to run against aomp's testing.
Fri, Mar 19
No longer useful as a reference, the good parts of this are in main now.
Great context, thanks guys. I had missed that part of the compiler.
Thu, Mar 18
I failed to apply this to amd-stg-open to test, but may be able to run qmcpack against llvm main now. Will try that.
Ah, apologies. My 'land it' script has clobbered the author field. Will need to look up how to not do that in future
Sure. The process for requesting access has changed since I did it, can't remember what it is now.
We should pull this from the 12 release. Lots of effort at the last minute to stop a complicated patch asserting, after it has been patched several times already, is unlikely to yield a stable release.
Looks good to me. This will presumably increase the uses of the lower module lds pass. Maybe prudent to run it through the gerrit CI infra before landing?
Reviewer list derived from git blame + frequency.
This is really interesting. The idea seems to be to choose the dispatch parameters based on the kernel metadata and the limits of the machine.
Looks good to me, thanks.
Wed, Mar 17
Reverted, D98746 attracted requests for improvement that I don't have a timeline for. This will unblock people building openmp with a clang that doesn't have amdgpu enabled.
Tue, Mar 16
One of the drawbacks of limited trunk testing of openmp is that we're reliant on out of trunk people noticing something looks odd. I don't want to set a precedent of downstream forks reverting patches that fail local testing, as that'll remove a bunch of the ad hoc testing we do have.
I'm starting to have doubts about the thread safety of this library in general so would lean towards removing the commit entirely such that the remainder is easier to reason about. That way we can be fairly sure we've removed whatever bug this introduced so have ~ one fewer race to try to pin down.
Thanks. I'm going to wait for some of the rocm people to pass judgement too as this code path is shared with hip / opencl etc.
That's three independent reports of stuff breaking after this patch. There are a bunch of locks and condition variables involved, and it looks suspicious to me that the introduced variables are volatile but not atomic.
Do you know how to do so? I'm an hour into cmake documentation, stack overflow and trial and error with zero progress.
There is a failure mode here. If clang is built with an llvm that has the amdgpu target disabled, it will fail to compile this library. This is because clang sets the flag amdhsa-code-object-version for amdgpu, but that flag is defined in llvm, and is excluded from the build if the amdgpu target is disabled. That can be fixed by either not building openmp, or by enabling the target, but the error message the user gets is poor:
Note to self - there is ongoing interest in minimising the LDS usage of applications. This patch allocates the struct in every kernel (see the call to markUsedByKernel, it is applied exactly once to each kernel), in order to support calls to functions that make use of that struct.
Mar 15 2021
Yep. Also, DEVICE now always expands to whitespace, so that can go.
- drop accidental amdgcn part
I'm going to abandon this. I'm not confident that a cuda toolkit that is newer than the compiler will work with it correctly and would prefer it take some jury rigging on the end users part to put the two together.
Apologies for missing this. Looks reasonable to me. I'll wire up the amdgcn plugin when I find some time.
Agreed. Lack of save temps is causing grief when debugging, so I keep on applying this patch locally. Let's go with this for now. and change to something better when we think of it.