Starting from clang-12 openmp started generating internal global variable with got relocation even when static relocation enabled.
In clang-11 shouldAssumeDSOLocal was assuming it dso_local based on static relocation model.
Since shouldAssumeDSOLocal is cleaned up now for respecting dso_local generated from frontend, marking openmp internal globals as dso_local.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
clang/lib/CodeGen/CGOpenMPRuntime.cpp | ||
---|---|---|
2186–2187 | I would not do it here since the linkage is changing in some places. Better to make this kind of change explicitly where required + need to add tests. |
I think these are two separate fixes. TargetMachine::shouldAssumeDSOLocal() change might make sense,
but as the comment there notes, this really should be fixed in in CGOpenMPRuntime::getOrCreateInternalVariable() itself (is that not the right default?),
and in it's every user that incorrectly unmarks said global as being `dso_local.
Please see this revision https://reviews.llvm.org/differential/diff/367953/
Should I get back to this?
If you read the comment in TargetMachine::shouldAssumeDSOLocal: this is the wrong direction. dso_local is assumed to be marked by the frontend.
Direct accesses and GOT accesses are trade-offs. Direct accesses are not always preferred. The MachO special case is an unfortunate case for their xnu kernel, not a good example here.
I do not understand why the most original diff here was wrong.
Is is the wrong default for getOrCreateInternalVariable() to default to dso_local?
Need to check that the linkage/visibility of the created vars is not changing in place. If it is not changing, then it is good to go. Otherwise, need to add some another function/extra parameter with linkage/visibility settings.
I'm having trouble parsing the comment, but i don't really see why the fact that some users may be changing linkage
should deter us from having the right default. If some users change linkage, then regardless of where this is dealt with,
the users will have to be adjusted, just a different set of them. It seems to me that fixing the ones that already mess with linkage is the right choice,
not fixing the ones that *don't* mess with linkage.
Yes, but the fix should not introduce bugs, if possible. Need to check the users and make them work correctly too, that's why need to check the users and if some of them change the linkage/visibility, then need to add a new function or add an extra parameter to track whether the need to add dso_local attribute.
If some users change linkage, then regardless of where this is dealt with, the users will have to be adjusted, just a different set of them. It seems to me that fixing the ones that already mess with linkage is the right choice, not fixing the ones that *don't* mess with linkage.
That's all I'm asking, actually. Or just keep the original code (if no time to go through the users) and fix dso_local only where required.
@MaskRay I would like to know more about these trade-offs details, any supporting documents will do.
Considering below testcase, can you shed some light how code generated by llc-12 is better than llc-11 for given testcase?
https://godbolt.org/z/x9xojWb58
And FYI this testcase does not work when build as Linux Kernel Module. LKM loader throw error("Unknown rela relocation: 42")?
This is a very minor issue. First, global variable access is rarely a performance bottleneck.
Second, if the symbol turns out to be non-preemptible (which implies that it is defined in the component), the R_X86_64_REX_GOTPCRELX GOT indirection can be optimized out.
The only minor issue is slightly longer code sequence.
And FYI this testcase does not work when build as Linux Kernel Module. LKM loader throw error("Unknown rela relocation: 42")?
This is a kernel issue. Please mention the justification (is it related to OpenMP?) in the first place.
The kernel can be compiled in -fpie mode. In this mode, taking the address of a default-visibility undefined symbol uses R_X86_64_REX_GOTPCRELX.
So the kernel should support R_X86_64_REX_GOTPCRELX anyway.
If we think the optimization is meaningful:
Depending on the property of .gomp_critical_user_.atomic_reduction.var different DSOLocal strategies should be used.
If it is TU-local, make it local linkage. If it is linked image local, make it hidden visibility.
If it may be defined in a shared object and shared with other shared objects or the main executable, (not so sure because such symbol interposition does not work in other binary formats), use dso_preemptable as is.
I believe the current forced dso_local is definitely incorrect because it may break -fpic -shared links.
@kamleshbhalui ,make the changes accordingly that honour -fpic and don't mark dsolocal in this case.
I would not do it here since the linkage is changing in some places. Better to make this kind of change explicitly where required + need to add tests.