Introduce two new AAs. AAICVTrackerFunctionReturned which checks if a
function can have a unique ICV value after it is finished, and
AAICVCallSiteReturned which checks AAICVTrackerFunctionReturned for a
call site. This enables us to check the value of a call and if it
changes the ICV. This also changes the approach in
getReplacementValues() to a worklist-based approach so we can explore
all relevant BBs.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Note: I've added D85052 as a parent revision. However this can also be considered as a slight change in approach. If needed, those 2 can be merged together.
Things like ICVValue struct could be deleted if this approach turns out to be the one we go with. But I'd like to hear from others first.
Cool! Can you describe in the commit message in more detail what is different in this approach, how it works in general. Maybe pros and cons compared to before?
llvm/test/Transforms/OpenMP/icv_tracking.ll | ||
---|---|---|
158 | Can we have more negative test cases too?
|
We need a test with more basic blocks between the setter and getter.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp | ||
---|---|---|
1173 | The comment of getUniqueValue is not helpful. It looks like this will compute the value of ICV after the function was called, correct? Whatever it does, write it here ;) | |
1263 | If this is only for calls, why not take a CallBase &CB instead? | |
1277–1279 | UniqueReplVal should not be needed anywhere here. | |
1365 | We should have a worklist of instructions we need to look at. Initially it is the argument, if we go through the prdedecessor blocks we add their terminator to the worklist. I'm not sure we need the explorer at all in that case. |
Unclear if we still want D85052 to go in before this, since most of the login is replaced with this patch.
Should I just merge the good parts with this patch?
llvm/test/Transforms/OpenMP/icv_tracking.ll | ||
---|---|---|
155 | this should not happen. The weak function should not be used in IPO. I guess we could make the AA actually work on all IRPositions. (call site) return for the value after a call, and function for the body. Then, I hope, the Attributor should not give you the Return value if you query the call site return value of a weak function. |
Yes I think so.
TBH, I'm thinking we should embrace the IRPositions and Attributor design fully, as it seems a bit messy right now (= we reason about returns, calls, and instructions all somewhat together).
- AAICVTrackerFunction for the function body, given an ICV and an instruction, what is the value by backwards lookup (as you have it now). When you hit a call, ask AAICVTrackerCallSiteReturn
- AAICVTrackerFunctionReturn which should forward queries to AAICVTrackerFunction if it is not invalid. The queries are done for all return inst.
- AAICVTrackerCallSiteReturn which should forward queries to AAICVTrackerFunctionReturn.
WDYT?
llvm/lib/Transforms/IPO/OpenMPOpt.cpp | ||
---|---|---|
1170 | I am unsure about the difference of these two functions from the declarations. If we need two member functions we should name them appropriately and elaborate in the description. | |
1173 | The description is now better, we should also change the name to match it. Maybe getICVValueAfterOnReturn ? |
At first, I thought it wouldn't be very useful to have AAs for multiple positions since we only interact with the function one.
Now that I actually tried to make it work, turns out it is quiet a nice clean up and makes much more sense.
I'll update shortly.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp | ||
---|---|---|
1170 | With the new AAs for CSRet and Ret positions, one of those goes away and I think it will be clearer then. |
- Add AAICVTrackerCallSite
- Move manifest to AAICVTrackerCallSite
- Only register callsite AAs for getters.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp | ||
---|---|---|
1364 | Turns out hasTrackedValue is actually not needed. As for caching, values are kind of cached inside appropriate AAICVTrackerFunctionReturned, AAICVTrackerCallSiteReturned or AAICVTrackerCallSite |
Looks much better now. Mostly minor comments.
llvm/lib/Transforms/IPO/OpenMPOpt.cpp | ||
---|---|---|
945 | Nit: We can now get rid of the outer loop and the conditional as well by using the SCC version of foreachUse. | |
1240 | Don't we need to check if we already assumed this value? | |
1252 | What is this about? | |
1256 | I feel this update doesn't need to look at everything but only the things we have in the cache. Right now it will look at all calls for all ICVs, which is probably not necessary. | |
1273 | Arguably, the value for the ICV is the getter call, so I. | |
1458 | This seems unfortunate but OK for now. |
addressing comments
llvm/lib/Transforms/IPO/OpenMPOpt.cpp | ||
---|---|---|
1252 | As per IRC discussion, added TODO to avoid this. Probably in a follow up, I'll try to introduce new state and possibly make other AAs not depend on function AA as much. This applies to the comment bellow as well. | |
1273 | The reason we return None here is because getters don't change the ICV. |
I'll revert this one in a second as it breaks some codes (at least on power).
The reason is that this extended the Attributor scope to the module slice. The Attributor thinks all of this is the SCC and when it tries to remove a function we see an assertion (as the function is actually not in the SCC).
The fix is to teach the Attributor about the module slice properly. Maybe in a pre-commit. For now we just revert this to unblock the build.
Here is the reproducer we should include in the OpenMP tests as well:
define internal fastcc void @"_omp$reduction$reduction_func14"() unnamed_addr { %call = call i8 @_ZStplIdESt7complexIT_ERKS2_S4_() ret void } define linkonce_odr hidden i8 @_ZStplIdESt7complexIT_ERKS2_S4_() local_unnamed_addr { ret i8 undef } declare void @__omp_offloading_2b_4010cad__ZN11qmcplusplus7ompBLAS17gemv_batched_implIfEEiRiciiPKT_PKS5_iS7_iS5_PKPS3_ii_l148(i64, i64, i64, float**, float**, i64, float**, float*, float*, i64) local_unnamed_addr declare dso_local fastcc void @__kmpc_for_static_init_8u() unnamed_addr !nvvm.annotations = !{!0} !0 = !{void (i64, i64, i64, float**, float**, i64, float**, float*, float*, i64)* @__omp_offloading_2b_4010cad__ZN11qmcplusplus7ompBLAS17gemv_batched_implIfEEiRiciiPKT_PKS5_iS7_iS5_PKPS3_ii_l148, !"kernel", i32 1}
Nit: We can now get rid of the outer loop and the conditional as well by using the SCC version of foreachUse.