LGTM. However, I would like to hear Tony's opinion.
Today
Yesterday
LGTM. Thanks.
In D101117#2710368, @JonChesterfield wrote: I'm not very pleased with this. There's too much state to move around when representing the failure and the control flow is probably harder to read than it was in the original. May take another pass at it.
nice. Thanks.
make it NFC for non-hipRTC
LGTM. Thanks.
Did you check whether this patch will cause regressions in the OpenCL conformance tests? If it does not, I am OK with it.
Wed, Apr 21
revised by Artem's comments.
In D99233#2656446, @tra wrote: I'm concerned that if we make it a top-level option, it's likely to be cargo-culted and (mis)used as a sledgehammer in cases where it's not needed. I think the option should remain hidden.
While thresholds do need to be tweaked, in my experience it happens very rarely.
When it does, most of the time it's sufficient to apply __attribute__((always_inline)) to a few functions where it matters.
If AMDGPU bumps into the limit too often, perhaps it's the default threshold value that needs to be changed.
Currently ROCm builds all math libs and frameworks with an LLVM option that inlines all functions for the AMDGPU target. We cannot simply remove that option and use the default inline threshold, since that would cause performance degradation. We cannot use -mllvm -inline-threshold=x directly either, since it would affect both host and device compilation. We need an option that sets the inline threshold for the GPU only so that we can fine-tune it. I agree that this option should be hidden, since it is intended for compiler development.
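As a rough illustration of the per-function alternative mentioned in the quoted comment, the sketch below forces inlining of one small helper instead of changing any threshold. The function names (mul_add, dot) are hypothetical and not taken from any patch under review, and the attribute spelling assumes a GCC- or Clang-compatible compiler.

```cpp
#include <cassert>

// Hypothetical hot helper. The attribute asks the compiler to inline this
// function at its call sites regardless of the inline threshold.
__attribute__((always_inline)) inline float mul_add(float acc, float a, float b) {
  return acc + a * b;
}

// Hypothetical caller standing in for a performance-critical routine.
float dot(const float *x, const float *y, int n) {
  assert(n >= 0);
  float acc = 0.0f;
  for (int i = 0; i < n; ++i)
    acc = mul_add(acc, x[i], y[i]);
  return acc;
}
```

This only helps when the set of functions that matter is small and known; it does not replace a target-wide threshold when a whole library was tuned under inline-everything behavior.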
Tue, Apr 20
revised by Matt's comments
fix hip::numeric_type
revised by Matt's comments
The recent change https://reviews.llvm.org/D96280 caused some difficulty for this patch. I would appreciate some suggestions.
There are still pre-merge failures. You may need to update your test.
revised by Artem's comments
The pre-merge failures are due to formatting issues. Better to clean them up.
Mon, Apr 19
rebase
Rebase and fix HIP header bug exposed by this patch.
ping
Sun, Apr 18
Revised by Matt's comments.
Sat, Apr 17
Fri, Apr 16
Thu, Apr 15
LGTM. Thanks!
revised tests by Artem's comments.
revised error msg by Aaron's comments
Add GlobalDCE before internalization pass.
In D98783#2657656, @rjmccall wrote: In D98783#2642269, @tra wrote: We may want to add someone with more expertise with the IR as a reviewer. I'd like an educated opinion on whether the invisible dangling IR is something that needs fixing in general or if it's OK to just clean it up in this particular case. Or both.
@rjmccall, @rsmith -- do you have any suggestions -- either on the subject of the invisible dangling IR or on who may be the right person to talk to?
If Clang is creating constants unnecessarily, we should try to avoid that on general compile time / memory usage grounds unless doing so is a serious problem. But I don't think it should be our responsibility to GC unused constant expressions in order to make DCE work.
It's extremely common to create bitcast global constants. So the fact that this happens with addrspacecasts but hasn't been a persistent problem with bitcasts makes me suspect that DCE actually tries to handle this, but something about what it's doing isn't aware of addrspacecast. And indeed, LLVM's GlobalDCE seems to call a method called removeDeadConstantUsers() before concluding that a constant can't be thrown away. So either you're using a different transform that needs to do the same thing, or something is stopping removeDeadConstantUsers() from eliminating this addrspacecast.
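The idea described in this comment can be sketched with a toy model: before deciding that a global is still in use, drop any users that are themselves unused constant expressions (such as a dangling bitcast or addrspacecast). The Node type and both function names below are hypothetical stand-ins, not LLVM classes or LLVM's actual implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy stand-in for an IR value; NOT an LLVM class.
struct Node {
  bool isConstantExpr = false;  // e.g. a bitcast or addrspacecast constant expression
  std::vector<Node *> users;    // values that reference this node
};

// Remove users that are constant expressions with no remaining users of
// their own. Non-constant users (instructions, initializers) are kept.
void dropDeadConstantUsers(Node &n) {
  auto isDead = [](Node *u) {
    if (!u->isConstantExpr)
      return false;
    dropDeadConstantUsers(*u);  // a cast of a cast may also be dead
    return u->users.empty();
  };
  n.users.erase(std::remove_if(n.users.begin(), n.users.end(), isDead),
                n.users.end());
}

// A global is removable in this model once no live users remain.
bool isUnused(Node &global) {
  dropDeadConstantUsers(global);
  return global.users.empty();
}
```

In this model a global whose only user is a dangling cast is reported as unused, while a cast that is still referenced by an instruction keeps the global alive.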
Fri, Apr 9
Thu, Apr 8
Any other concerns? Thanks.
Wed, Apr 7
Fix test failure on Windows. Need to specify the triple since it affects name mangling.
LGTM. Thanks.
LGTM about AMDGPU toolchain change. Thanks.
Tue, Apr 6
In D99683#2672668, @tejohnson wrote: In D99683#2672578, @yaxunl wrote: In D99683#2672554, @tejohnson wrote: This raises some higher level questions for me:
First, how will you deal with other corner cases that won't or cannot be imported right now? While enabling importing of noinline functions and cranking up the threshold will get the majority of functions imported, there are cases that we still won't import (functions/vars that are interposable, certain funcs/vars that cannot be renamed, most non-const variables with non-trivial initializers).
We will document the limitations of ThinLTO support in the HIP toolchain and recommend that users not use ThinLTO in those corner cases.
Second, force importing of everything transitively referenced defeats the purpose of ThinLTO and would probably make it worse than regular LTO. The main entry module will need to import everything transitively referenced from there, so everything not dead in the binary, which should make that module post importing equivalent to a regular LTO module. In addition, every other module needs to transitively import everything referenced from those modules, making them very large depending on how many leaf vs non-leaf functions and variables they contain. What is the goal of doing ThinLTO in this case?
The objective is to improve optimization/codegen time by using the multithreading of ThinLTO. For example, I have 10 modules each containing a kernel. In full LTO linking, I get one big module containing 10 kernels with all functions inlined, and I have one thread for optimization/codegen. With ThinLTO, I get one kernel in each module, with all functions inlined. AMDGPU internalization and global DCE will remove functions not used by that kernel in each module. I will get 10 threads, each doing optimization/codegen for one kernel. Theoretically, there could be a 10x speedup.
That will work as long as there are no dependence edges anywhere between the kernels. Is this a library that has a bunch of totally independent kernels only called externally?
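The parallelism argument in this exchange can be sketched as below: if each module holds one kernel plus everything it calls, the modules can be compiled on separate threads. Module, compileModule, and compileAll are hypothetical names, and the "compile" step is a placeholder; the sketch assumes the modules are fully independent, which is exactly the condition questioned above.

```cpp
#include <future>
#include <string>
#include <vector>

// Hypothetical module: one kernel with all of its callees already imported.
struct Module {
  std::string kernel;
};

// Placeholder for per-module optimization and codegen.
std::string compileModule(const Module &m) {
  return m.kernel + ".o";
}

// Launch one asynchronous job per module and collect results in input order.
std::vector<std::string> compileAll(const std::vector<Module> &modules) {
  std::vector<std::future<std::string>> jobs;
  jobs.reserve(modules.size());
  for (const Module &m : modules)
    jobs.push_back(std::async(std::launch::async,
                              [&m] { return compileModule(m); }));

  std::vector<std::string> objects;
  objects.reserve(jobs.size());
  for (auto &job : jobs)
    objects.push_back(job.get());
  return objects;
}
```

With 10 such modules this launches 10 jobs; any speedup depends on the jobs being similar in size and on there being no cross-module dependence edges.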
revised by Richard's comments. Check function-scope static var.
In D99683#2672554, @tejohnson wrote: This raises some higher level questions for me:
First, how will you deal with other corner cases that won't or cannot be imported right now? While enabling importing of noinline functions and cranking up the threshold will get the majority of functions imported, there are cases that we still won't import (functions/vars that are interposable, certain funcs/vars that cannot be renamed, most non-const variables with non-trivial initializers).
We will document the limitations of ThinLTO support in the HIP toolchain and recommend that users not use ThinLTO in those corner cases.
ping.
rebase
revert the change about option
In D99683#2669136, @yaxunl wrote: In D99683#2669080, @tejohnson wrote: In D99683#2669047, @yaxunl wrote: In D99683#2664674, @tejohnson wrote: I haven't looked extensively yet, but why import noinline functions? Also, please add a patch description.
AMDGPU backend does not support linking of object files containing external symbols, i.e. one object file calling a function defined in another object file. Therefore the LLVM module passed to AMDGPU backend needs to contain definitions of all callees, even if a callee has noinline attribute. To support backends like this, the function importer needs to be able to import functions with noinline attribute. Therefore we add an LLVM option for allowing that, which is off by default. We have comments at line 70 of HIP.cpp about this.
How does a non-LTO build work, or is (full) LTO currently required? Because with ThinLTO we only import functions that are externally defined but referenced in the current module. Also, when ThinLTO imports functions it makes them available_externally, which means they are dropped and made external symbols again after inlining. So anything imported but not inlined will go back to being an external symbol.
The AMDGPU backend uses full LTO for linking by default. It does not support non-LTO linking. Currently, it inlines all functions except kernels. However, we want to be able to not inline all functions. Is it OK to add an LLVM option to mark imported functions as linkonce_odr so that the AMDGPU backend can keep the definitions of the imported functions?
revised by John's comments
Mon, Apr 5
Revised by John's comments. Do not allow atomic fetch add with x86_fp80.
LGTM. Thanks.
In D99683#2669080, @tejohnson wrote: In D99683#2669047, @yaxunl wrote: In D99683#2664674, @tejohnson wrote: I haven't looked extensively yet, but why import noinline functions? Also, please add a patch description.
AMDGPU backend does not support linking of object files containing external symbols, i.e. one object file calling a function defined in another object file. Therefore the LLVM module passed to AMDGPU backend needs to contain definitions of all callees, even if a callee has noinline attribute. To support backends like this, the function importer needs to be able to import functions with noinline attribute. Therefore we add an LLVM option for allowing that, which is off by default. We have comments at line 70 of HIP.cpp about this.
How does a non-LTO build work, or is (full) LTO currently required? Because with ThinLTO we only import functions that are externally defined but referenced in the current module. Also, when ThinLTO imports functions it makes them available_externally, which means they are dropped and made external symbols again after inlining. So anything imported but not inlined will go back to being an external symbol.
Revise by Artem's comments. Add patch description.
In D99683#2664674, @tejohnson wrote: I haven't looked extensively yet, but why import noinline functions? Also, please add a patch description.
Sun, Apr 4
rebase and add tests for old options
LGTM. Thanks.
LGTM. Thanks.
Thu, Apr 1
I have tested that removing this did not cause regressions in our CI.