This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP] Refactor/Rework topology discovery code
ClosedPublic

Authored by jlpeyton on Apr 21 2021, 1:58 PM.

Details

Summary

This patch does the following:

  1. Introduces kmp_topology_t as the runtime-friendly structure (the corresponding global variable is __kmp_topology) that represents the exact topology structure, which can vary widely among current and future machine architectures. For example, an SNC-4 enabled KNL machine can have 5 topology layers with Hwloc. The current design is not easy to expand beyond the assumed three-layer topology (socket, core, and thread), so a rework capable of using the existing KMP_AFFINITY mechanisms was required.

This new topology structure has:

  • The depth and types of the topology
  • Ratio count for each consecutive level (e.g., number of cores per socket, number of threads per core)
  • Absolute count for each level (e.g., 2 sockets, 16 cores, 32 threads)
  • Equivalent topology layer map (e.g., Numa domain is equivalent to socket, L1/L2 cache equivalent to core)
  • Whether it is uniform or not

The hardware threads are represented with the kmp_hw_thread_t structure. This structure contains the ids (e.g., socket 0, core 1, thread 0) and other information carried over from the previous Address structure. The kmp_topology_t structure contains an array of these.
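To make the relationship between ratio counts, absolute counts, and uniformity concrete, here is a minimal sketch. It is not the actual kmp_topology_t definition: the names Topology, HwThread, LayerType, and build_topology are hypothetical, the equivalent-layer map is left out, and the uniformity rule (product of the per-level ratios equals the number of hardware threads) is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical layer types; the runtime defines its own enumeration.
enum class LayerType { Socket, Core, Thread };

// One hardware thread: one id per topology level, outermost level first
// (e.g., {socket 0, core 1, thread 0}).
struct HwThread {
  std::vector<int> ids;
};

// Illustrative stand-in for the topology structure described above.
struct Topology {
  std::vector<LayerType> types; // depth == types.size()
  std::vector<int> ratio;       // max units per parent unit at each level
  std::vector<int> count;       // absolute number of units at each level
  bool uniform = false;
  std::vector<HwThread> hw_threads;
};

// Derive ratio, count, and uniformity from the hardware thread ids.
// Assumes every hardware thread appears exactly once.
Topology build_topology(std::vector<LayerType> types,
                        std::vector<HwThread> threads) {
  Topology t;
  t.types = std::move(types);
  t.hw_threads = std::move(threads);
  const std::size_t depth = t.types.size();
  t.ratio.assign(depth, 0);
  t.count.assign(depth, 0);
  for (std::size_t level = 0; level < depth; ++level) {
    // Map each parent (the ids above this level) to its distinct children.
    std::map<std::vector<int>, std::set<int>> children;
    for (const HwThread &hw : t.hw_threads) {
      assert(hw.ids.size() == depth);
      std::vector<int> parent(hw.ids.begin(), hw.ids.begin() + level);
      children[parent].insert(hw.ids[level]);
    }
    for (const auto &kv : children) {
      const int n = static_cast<int>(kv.second.size());
      t.count[level] += n;                         // absolute count
      t.ratio[level] = std::max(t.ratio[level], n); // ratio to the parent
    }
  }
  // Assumed rule: uniform when the ratios multiply out to the thread count.
  std::size_t product = 1;
  for (int r : t.ratio)
    product *= static_cast<std::size_t>(r);
  t.uniform = (product == t.hw_threads.size());
  return t;
}
```

For a machine with 2 sockets, 2 cores per socket, and 2 threads per core, this sketch yields ratios of 2, 2, 2 and absolute counts of 2, 4, 8. Removing a single hardware thread leaves the ratios unchanged but makes the topology non-uniform.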

  2. Generalizes the KMP_HW_SUBSET environment variable for the new kmp_topology_t structure. The algorithm doesn't assume any order among tiles/numa domains/sockets/cores/threads. Instead, it parses the environment variable, makes sure it is consistent with the detected topology (taking equivalent layers into account), and then trims away the unneeded subset of hardware threads. To enable this, a new kmp_hw_subset_t structure is introduced which contains a vector of items (hardware type, number the user wants, offset). Any keyword within __kmp_hw_get_keyword() can be used as a name and can also be shortened, e.g., KMP_HW_SUBSET=1s,2numa,4tile,2c,3t
  3. Reworks the topology detection functions so they are simpler and perform the single task of detecting the topology. Printing and all canonicalization are now done afterwards, which removes many lines of duplicated code.
  4. Adds the new TR8 ll_caches and numa_domains places to OMP_PLACES and, consequently, to KMP_AFFINITY's granularity setting. In fact, all the names within __kmp_hw_get_keyword() are available for use in OMP_PLACES or KMP_AFFINITY's granularity setting.
  5. Generalizes many places where the allowed names in affinity settings were listed explicitly inside if() conditions, so expanding the topology names is less burdensome in the future.
  6. Adds CPUID leaf 4 cache detection to the existing x2apic id method so equivalent caches can be detected (in particular for the ll_caches place).
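The KMP_HW_SUBSET parsing step described in the list above can be sketched roughly as follows. This is an illustration only, not the runtime's parser: SubsetItem, kKeywords, match_keyword, and parse_hw_subset are hypothetical names, the keyword list is a stand-in for whatever __kmp_hw_get_keyword() returns, and the `@` separator for the offset, the mandatory leading count, and the first-match rule for shortened names are all assumptions. The consistency check against the detected topology and the trimming step are not shown.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// One parsed item: (hardware type, number the user wants, offset).
struct SubsetItem {
  std::string type;
  int num;
  int offset;
};

// Hypothetical keyword list. The first match wins, so "t" resolves to
// "thread" here because it is listed before "tile" (an assumption).
static const char *const kKeywords[] = {"socket", "numa", "core", "thread",
                                        "tile"};

// Return the first keyword that starts with `abbrev`, or nullptr.
const char *match_keyword(const std::string &abbrev) {
  if (abbrev.empty())
    return nullptr;
  for (const char *kw : kKeywords)
    if (std::string(kw).compare(0, abbrev.size(), abbrev) == 0)
      return kw;
  return nullptr;
}

// Parse e.g. "1s,2numa,4tile,2c,3t" into items, in the order given.
// Returns false on malformed input.
bool parse_hw_subset(const std::string &spec, std::vector<SubsetItem> &out) {
  out.clear();
  std::size_t pos = 0;
  while (pos <= spec.size()) {
    std::size_t end = spec.find(',', pos);
    if (end == std::string::npos)
      end = spec.size();
    const std::string tok = spec.substr(pos, end - pos);
    // Leading count (assumed mandatory in this sketch).
    std::size_t i = 0;
    int num = 0;
    while (i < tok.size() && std::isdigit(static_cast<unsigned char>(tok[i])))
      num = num * 10 + (tok[i++] - '0');
    if (i == 0)
      return false;
    // Layer name, possibly shortened.
    std::size_t j = i;
    while (j < tok.size() && std::isalpha(static_cast<unsigned char>(tok[j])))
      ++j;
    const char *type = match_keyword(tok.substr(i, j - i));
    if (type == nullptr)
      return false;
    // Optional "@offset" suffix (assumed syntax).
    int offset = 0;
    if (j < tok.size()) {
      if (tok[j] != '@' || j + 1 == tok.size())
        return false;
      for (std::size_t k = j + 1; k < tok.size(); ++k) {
        if (!std::isdigit(static_cast<unsigned char>(tok[k])))
          return false;
        offset = offset * 10 + (tok[k] - '0');
      }
    }
    out.push_back({type, num, offset});
    pos = end + 1;
  }
  return true;
}
```

Because the items are stored in the order the user wrote them, nothing in the parser assumes a fixed nesting of sockets, cores, and threads; any ordering constraint would be applied later, against the detected topology.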

Event Timeline

jlpeyton created this revision. Apr 21 2021, 1:58 PM
jlpeyton requested review of this revision. Apr 21 2021, 1:58 PM
This revision is now accepted and ready to land. May 3 2021, 10:42 AM
This revision was landed with ongoing or failed builds. May 3 2021, 4:04 PM
This revision was automatically updated to reflect the committed changes.
protze.joachim added inline comments.
openmp/runtime/src/kmp_affinity.cpp
2602–2605

This patch leaves the variable prod unused. Is it ok to just remove it? I'd include this in an unused-variable cleanup patch.