If you read the comment in TargetMachine::shouldAssumeDSOLocal: this is the wrong direction. dso_local is assumed to be marked by the frontend.
Direct accesses and GOT accesses are trade-offs. Direct accesses are not always preferred. The MachO special case is an unfortunate case for their xnu kernel, not a good example here.
@MaskRay I would like to know more about these trade-offs details, any supporting documents will do.
Considering below testcase, can you shed some light how code generated by llc-12 is better than llc-11 for given testcase?
This is a very minor issue. First, global variable access is rarely a performance bottleneck.
Second, if the symbol turns out to be non-preemptible (which implies that it is defined in the component), the R_X86_64_REX_GOTPCRELX GOT indirection can be optimized out.
The only minor issue is slightly longer code sequence.
And FYI this testcase does not work when build as Linux Kernel Module. LKM loader throw error("Unknown rela relocation: 42")?
This is a kernel issue. Please mention the justification (is it related to OpenMP?) in the first place.
The kernel can be compiled in -fpie mode. In this mode, taking the address of a default-visibility undefined symbol uses R_X86_64_REX_GOTPCRELX.
So the kernel should support R_X86_64_REX_GOTPCRELX anyway.
If we think the optimization is meaningful:
Depending on the property of .gomp_critical_user_.atomic_reduction.var different DSOLocal strategies should be used.
If it is TU-local, make it local linkage. If it is linked image local, make it hidden visibility.
If it may be defined in a shared object and shared with other shared objects or the main executable, (not so sure because such symbol interposition does not work in other binary formats), use dso_preemptable as is.
I believe the current forced dso_local is definitely incorrect because it may break -fpic -shared links.
@kamleshbhalui ,make the changes accordingly that honour -fpic and don't mark dsolocal in this case.