- User Since
- Aug 11 2015, 1:08 PM (266 w, 5 d)
Fri, Sep 18
An additional repo which llvm, openmp, clang etc implicitly depend on would also work. It's roughly equivalent to treating llvm itself as that root dependency.
I strongly dislike build scripts pulling files off the internet. It's bad for machines that aren't connected to the web and upsets people in security
Thu, Sep 17
Thu, Sep 3
It might be dead. Difficult to tell. The control flow being spread across codegen and the runtime obfuscates things.
This can't break nvptx as WARPSIZE == DS_Max_Warp_Number.
Fri, Aug 28
Nice! Thank you, there is way too much copy and paste between these files.
Wed, Aug 26
I'm unable to find a good reference for what this frequency should be, so am considering unconditionally returning zero from this function. That won't 'work' in any useful sense, but it's an improvement on failing to link because __clock64 is undefined.
Tue, Aug 25
The duplication from libc++ doesn't feel good but I can see how it happened. Dependency breaking etc.
This looks good to me. I can't claim to have read the test case in detail. Some style suggestions inline around the vector, but they're not blocking.
Mon, Aug 24
The reference counting to free is interesting. I don't think I've seen that on a bump allocator before. It doesn't allow memory reuse within a slab. I wonder whether there is usually some outstanding reference into the slab for most of the execution.
Sat, Aug 22
Aug 20 2020
As a heads up, I'm told this breaks amdgpu tests. @ronlieb is looking at the merge from upstream, don't have any more details at this time. The basic idea of wrapping device alloc seems likely to be sound for all targets so I'd guess we've run into a bug in this patch.
This was written as part of debugging a port of the deviceRTL to openmp. It probably isn't necessary for that goal after all, but replacing benign UB with .b in a number of places seems worth having anyway.
- remove now-dead casts
This is a good direction. Packing copies together is likely to be faster.
Aug 19 2020
This can't be used in the libomptarget plugins or in the deviceRTL, as both are required to compile without the llvm sources available.
Works out cleanly, thanks for the suggestion.
- Use more specific diagnostic
How does one opt out of this scheme?
Aug 17 2020
Well volunteered, thank you very much!
Aug 16 2020
Seems OK to me as a local fix. I'm not sure about encoding 'missing condition' as 0, but that's a preexisting choice.
Link error seems reasonable to me. Thanks
Patch is good, thanks.
Aug 15 2020
There are some parts of this that I'm hoping to push down into the hsa/rocr layer. And other parts that can be deleted without replacement. I will make this better. Good to have clang dev unblocked in the meantime.
Aug 14 2020
Fix proposed at D85990. Thanks for the detailed bug report!
Aug 13 2020
If I recall correctly, &foo with variants of foo returns a pointer to the base. If we have no base, and disable_implicit_base, what does &foo yield? It should probably be a compilation error with some descriptive message
Thanks! Probably good to implement this just at the time math headers needs it as that gives us a real world example of the change working.
I think this is reasonable. It's unfortunate to have isnan return bool or int depending on the system headers, but considering we have that in a language that doesn't mangle the return type into the name the workaround seems OK.
Aug 12 2020
It definitely can and should be tested. Instantiate on a device that uses host malloc/free for the functions and stress test it under valgrind.
LGTM then. Calling into the plugin to do the bulk alloc/free is nice.
OK, cool. If we're open to changing the implementation later this is fine by me. An instance per host thread is likely to be better than all the internal locks. Couple of minor comments above.
- Fix typo in comment
- Insert plugin in chronological order
Aug 11 2020
Nice. What makes it an extension? 5.0 / 2.3.5 claims "and where variant-func-id is the name of a function variant that is either a base language identifier or, for C++, a template-id." which suggests this could be always-on
I'm still doubtful about this. Bump allocate + no-op free is fast unless the GPU runs out of memory before the arena can be dropped. The list and mutex construction is unusual for an allocator.
Aug 5 2020
Jul 30 2020
I'm not totally sure about this. There's a lot of test setup relative to the test case itself, and I can't work out by reading the test itself what it's checking. What are all the mock classes for?
Jul 29 2020
We probably do want a macro to indicate 'compiling for amdgcn as the device half of a combined host+device language'. I'm having a tough time with the control flow in this header so we probably want tests to check the overall behaviour is as intended. E.g. static assert + various language modes.
Jul 28 2020
Jul 27 2020
Not Johannes, but LGTM. I didn't realise our tests could pick up a different runtime to the one just built, that was a nasty failure mode. Thanks for the patch
Jul 26 2020
A macro for wavefront size would make targeting gfx10 for openmp easier.
Jul 24 2020
- Fix missing underscores
Jul 23 2020
Jul 15 2020
I think this is good. It's dangerous, but it's also undocumented and has unsafe in the name.
Jul 14 2020
Nice, thanks. All my concerns were addressed by the above revision.
I think there's an unfortunate interaction with link time optimisation here. If there are external regions, but their code is combined with llvm-link before codegen, then a user could reasonably assume this flag is safe.
Agreed on tests. I like the mechanism - passing a string through to the backend as a way to dispatch between isa properties looks cleanly extensible. We probably do want to emit a warning when the backend claims it doesn't know anything about said string as it'll be prone to typos.
Jul 13 2020
This is indeed the direction I had in mind. Broad strokes look right. I probably wouldn't notice an accidental change amidst the whitespace reshuffle. Very happy to read through line by line if you can split the whitespace change out.
Jul 10 2020
Fine by me. Let's get nvptx working properly in tree now and work out how to wire up amdgcn subsequently. I'm sure a reasonable abstraction will present itself.
Fair enough, stateful it is then.
Jul 9 2020
Changing to getGridValue would be useful for sharing parts of this with amdgcn.