[OpenMP] Refactor/Rework topology discovery code

Authored by jlpeyton on Apr 16 2021, 2:30 PM.

This patch does the following:

  1. Introduce kmp_topology_t as the runtime-friendly structure (the
corresponding global variable is __kmp_topology) to determine the
exact machine topology, which can vary widely among current and future
architectures. The current design is not easy to expand beyond the
assumed three-layer topology (sockets, cores, and threads), so a rework
capable of using the existing KMP_AFFINITY mechanisms is required.

This new topology structure has:

  • The depth and types of the topology
  • Ratio count for each consecutive level (e.g., number of cores per socket, number of threads per core)
  • Absolute count for each level (e.g., 2 sockets, 16 cores, 32 threads)
  • Equivalent topology layer map (e.g., NUMA domain is equivalent to socket, L1/L2 cache is equivalent to core)
  • Whether it is uniform or not

The hardware threads are represented with the kmp_hw_thread_t
structure. This structure contains the ids (e.g., socket 0, core 1,
thread 0) and other information grabbed from the previous Address
structure. The kmp_topology_t structure contains an array of these.
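
As a rough illustration of the fields listed above, the following is a
minimal sketch, not the actual libomp definitions. The names HwType,
HwThread, Topology, and build_topology are hypothetical, and the
equivalent-layer map is left out for brevity. It derives the absolute
count, the per-parent ratio, and the uniformity flag from an array of
per-thread ids.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins; NOT the real kmp_topology_t or
// kmp_hw_thread_t definitions.
enum class HwType { Socket, Core, Thread };

struct HwThread {
  std::vector<int> ids; // one id per level, outermost first, e.g. {0, 1, 0}
};

struct Topology {
  std::vector<HwType> types; // type of each level; size() is the depth
  std::vector<int> count;    // absolute number of units at each level
  std::vector<int> ratio;    // largest number of units under one parent
  bool uniform = true;       // true if every parent has the same fan-out
  std::vector<HwThread> hw_threads;
};

Topology build_topology(std::vector<HwType> types,
                        std::vector<HwThread> threads) {
  Topology topo;
  topo.types = std::move(types);
  topo.hw_threads = std::move(threads);
  const std::size_t depth = topo.types.size();
  topo.count.assign(depth, 0);
  topo.ratio.assign(depth, 0);
  for (std::size_t level = 0; level < depth; ++level) {
    // Map each parent (the id prefix above this level) to its distinct
    // child ids at this level.
    std::map<std::vector<int>, std::set<int>> children;
    for (const HwThread &hw : topo.hw_threads) {
      assert(hw.ids.size() == depth);
      std::vector<int> parent(
          hw.ids.begin(),
          hw.ids.begin() + static_cast<std::ptrdiff_t>(level));
      children[parent].insert(hw.ids[level]);
    }
    for (const auto &entry : children) {
      const int n = static_cast<int>(entry.second.size());
      topo.count[level] += n;
      topo.ratio[level] = std::max(topo.ratio[level], n);
    }
    // The level is uniform only if every parent holds `ratio` children.
    if (topo.count[level] !=
        topo.ratio[level] * static_cast<int>(children.size()))
      topo.uniform = false;
  }
  return topo;
}
```

For a machine with 2 sockets, 2 cores per socket, and 2 threads per
core, this sketch yields counts of 2, 4, and 8 and a ratio of 2 at
every level.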

  2. Generalize the KMP_HW_SUBSET environment variable for the new
kmp_topology_t structure. The algorithm doesn't assume any order among
tiles, NUMA domains, sockets, cores, and threads. Instead it just parses
the environment variable, makes sure it is consistent with the detected
topology (including taking into account equivalent layers), and then
trims away the unneeded subset of hardware threads. To enable this, a
new kmp_hw_subset_t structure is introduced which contains a vector of
items (hardware type, number user wants, offset). Any keyword within
__kmp_hw_get_keyword() can be used as a name and can be shortened as
well. For example,
KMP_HW_SUBSET=1s,2numa,4tile,2c,3t can be used on the KNL SNC-4 machine.
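
The parsing step described above could look roughly like the sketch
below. It is an illustration under stated assumptions, not the runtime's
parser: SubsetItem, kKeywords, match_keyword, and parse_hw_subset are
hypothetical names, the keyword list is a small illustrative subset, a
shortened name is resolved to the first keyword it is a prefix of, a
count is required on every item, and an optional offset is assumed to be
written as "@N". Consistency checking against the detected topology and
the trimming of hardware threads are not shown.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical item mirroring "(hardware type, number user wants, offset)".
struct SubsetItem {
  std::string type; // canonical keyword, e.g. "socket"
  int num;          // number of units requested
  int offset;       // units to skip first (0 when not given)
};

// Illustrative keyword list only; the real names come from
// __kmp_hw_get_keyword(). Order matters for ambiguous prefixes like "t".
static const char *const kKeywords[] = {"socket", "numa_domain", "core",
                                        "thread", "tile"};

// Resolve a possibly shortened name to the first keyword it prefixes.
std::optional<std::string> match_keyword(const std::string &name) {
  if (name.empty())
    return std::nullopt;
  for (const char *kw : kKeywords) {
    const std::string keyword(kw);
    if (keyword.compare(0, name.size(), name) == 0)
      return keyword;
  }
  return std::nullopt;
}

// Parse e.g. "1s,2numa,4tile,2c,3t" into items; nullopt on any error.
std::optional<std::vector<SubsetItem>>
parse_hw_subset(const std::string &value) {
  std::vector<SubsetItem> items;
  std::stringstream stream(value);
  std::string tok;
  while (std::getline(stream, tok, ',')) {
    std::size_t i = 0;
    int num = 0;
    while (i < tok.size() && std::isdigit(static_cast<unsigned char>(tok[i])))
      num = num * 10 + (tok[i++] - '0');
    if (i == 0 || num == 0)
      return std::nullopt; // this sketch requires a positive count
    assert(num > 0);
    const std::size_t at = tok.find('@', i);
    const std::string name =
        tok.substr(i, at == std::string::npos ? std::string::npos : at - i);
    int offset = 0;
    if (at != std::string::npos) {
      if (at + 1 >= tok.size())
        return std::nullopt; // "@" with nothing after it
      for (std::size_t j = at + 1; j < tok.size(); ++j) {
        if (!std::isdigit(static_cast<unsigned char>(tok[j])))
          return std::nullopt;
        offset = offset * 10 + (tok[j] - '0');
      }
    }
    const std::optional<std::string> keyword = match_keyword(name);
    if (!keyword)
      return std::nullopt;
    items.push_back(SubsetItem{*keyword, num, offset});
  }
  if (items.empty())
    return std::nullopt;
  return items;
}
```

Because the items are kept in a plain vector in the order given, the
sketch makes no assumption about which layer comes first, which matches
the order-independence described above.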

  3. Simplify topology detection functions so they only do the singular
task of detecting the machine's topology. Printing and all
canonicalizing functionality are now done afterwards, which eliminates
many lines of duplicated code.

  4. Add new ll_caches and numa_domains to OMP_PLACES and,
consequently, KMP_AFFINITY's granularity setting. All the names within
__kmp_hw_get_keyword() are available for use in OMP_PLACES or
KMP_AFFINITY's granularity setting.

  5. Simplify and future-proof code where explicit lists of allowed
affinity settings keywords appear inside if() conditions.

  6. Add x86 CPUID leaf 4 cache detection to the existing x2apic id method
so equivalent caches could be detected (in particular for the ll_caches
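
For readers unfamiliar with CPUID leaf 4, the sketch below shows how its
EAX output is commonly decoded and combined with an x2APIC id to decide
which hardware threads share a cache. It is a hedged illustration, not
the code added by this patch: CacheInfo, decode_leaf4_eax, mask_width,
and cache_id are hypothetical names, and the register value is passed in
as a plain integer so the example does not execute CPUID itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical holder for the leaf 4 EAX fields used here.
struct CacheInfo {
  unsigned type;        // bits 4:0; 0 = no more caches, 1 = data,
                        // 2 = instruction, 3 = unified
  unsigned level;       // bits 7:5; cache level (1, 2, 3, ...)
  unsigned max_sharing; // bits 25:14 plus 1; max addressable logical
                        // processor ids sharing this cache
};

// Decode the EAX value returned by CPUID leaf 4 for one subleaf.
CacheInfo decode_leaf4_eax(std::uint32_t eax) {
  CacheInfo info;
  info.type = eax & 0x1Fu;
  info.level = (eax >> 5) & 0x7u;
  info.max_sharing = ((eax >> 14) & 0xFFFu) + 1u;
  return info;
}

// Smallest width w such that (1 << w) >= n, i.e. ceil(log2(n)).
unsigned mask_width(unsigned n) {
  assert(n >= 1);
  unsigned w = 0;
  while ((1u << w) < n)
    ++w;
  return w;
}

// Hardware threads whose x2APIC ids give the same cache_id share the cache.
std::uint32_t cache_id(std::uint32_t x2apic_id, const CacheInfo &cache) {
  return x2apic_id >> mask_width(cache.max_sharing);
}
```

Two caches at different levels with the same sharing width produce the
same grouping of threads, which is one way equivalent layers could be
recognized.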

Differential Revision: https://reviews.llvm.org/D100997


Committed by jlpeyton on May 3 2021, 4:00 PM.