This is an archive of the discontinued LLVM Phabricator instance.

[OpenMP][WIP] Make the kmp_depend_info type fit in 128 bits.
Needs RevisionPublic

Authored by jdoerfert on Dec 29 2019, 11:17 PM.

Details

Reviewers
Hahnfeld
Summary
NOTE: The clang codegen change is missing but this simplifies a OMPIRBuilder patch and makes the struct more compact. This is not to be commited like this!

The old type wasted 64 bits on a length value that is not used. While we
probably need to use the length value eventually, supporting a 64 bit
extend seems excessive. With this patch we have 61 bit extend (=length)
and we use the 3 highest bits for the kind encoding. Alternatively, we
could put some bits in the lower part of the base pointer.

Diff Detail

Event Timeline

jdoerfert created this revision.Dec 29 2019, 11:17 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 29 2019, 11:17 PM

Unit tests: pass. 61134 tests passed, 0 failed and 728 were skipped.

clang-tidy: fail. Please fix clang-tidy findings.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

This & D71987 - please note that if those types are not internal to the library,
(and they are not since they are produced by clang codegen), i suspect they can not
just be changed as it would cause an API/ABI break for already-compiled code,
and even newly-compiled code with previous compiler releases.
I would expect that some versioning is needed..

Hahnfeld requested changes to this revision.Dec 30 2019, 2:45 AM
Hahnfeld added a subscriber: Hahnfeld.

Yes, this must not be changed for compatibility reasons. If this really brings a benefit, we need new entry functions and update the compiler(s).

This revision now requires changes to proceed.Dec 30 2019, 2:45 AM

Do we have a consensus / policy on ABI breaks?

Breaking ABI will inconvenience people mixing parts of the toolchain from different releases and different vendors. Never breaking ABI means code complexity and performance degradation.

Do we have a consensus / policy on ABI breaks?

Breaking ABI will inconvenience people mixing parts of the toolchain from different releases and different vendors. Never breaking ABI means code complexity and performance degradation.

Right now libomp.so isn't versioned, meaning only a single version can be installed at the same time
(at least that is what i see in debian). So API/ABI break as such will mean that if libomp.so
from git master is installed, clang-9-based openmp will no longer be usable, only clang from git master.
That's kinda not great to put it mildly.

I"d like to understand that use case. Someone compiles libomp from trunk, pastes it over the version from repos, but leaves the rest of the toolchain unchanged. Breakage seems a fair result.

I've always shipped toolchains as a cohesive package. Various binaries, libraries, header files install to some directory and reference each other. Rpath et al.

I think you're suggesting multiple compilers share a single directory containing dependencies that work with all of them. I see the cost of that setup - why would it be preferred?

I"d like to understand that use case. Someone compiles libomp from trunk, pastes it over the version from repos, but leaves the rest of the toolchain unchanged. Breakage seems a fair result.

I've always shipped toolchains as a cohesive package. Various binaries, libraries, header files install to some directory and reference each other. Rpath et al.

I think you're suggesting multiple compilers share a single directory containing dependencies that work with all of them. I see the cost of that setup - why would it be preferred?

I quite literally only stated how things are as of right now.
It is possible to install multiple clang versions sumilteniously:

$ dpkg -l | grep clang
ii  clang-10                                                    1:10~+20191227100609+c3dbd782f1e-1~exp1~20191227211208.2957 amd64        C, C++ and Objective-C compiler
ii  clang-8                                                     1:8.0.1-4                                                   amd64        C, C++ and Objective-C compiler
ii  clang-9                                                     1:9.0.1-2                                                   amd64        C, C++ and Objective-C compiler

But not so much libomp:

$ dpkg -l | grep libomp
ii  libomp-10-dev                                               1:10~+20191227100609+c3dbd782f1e-1~exp1~20191227211208.2957 amd64        LLVM OpenMP runtime - dev package
ii  libomp-10-dev-dbgsym                                        1:10~+20191227100609+c3dbd782f1e-1~exp1~20191227211208.2957 amd64        debug symbols for libomp-10-dev
ii  libomp5-10:amd64                                            1:10~+20191227100609+c3dbd782f1e-1~exp1~20191227211208.2957 amd64        LLVM OpenMP runtime
ii  libomp5-10-dbgsym:amd64                                     1:10~+20191227100609+c3dbd782f1e-1~exp1~20191227211208.2957 amd64        debug symbols for libomp5-10
$ sudo apt install libomp5-9
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages will be REMOVED:
  libomp-10-dev libomp-10-dev-dbgsym libomp5-10 libomp5-10-dbgsym
The following NEW packages will be installed:
  libomp5-9
0 upgraded, 1 newly installed and 39 not upgraded.
Need to get 333 kB of archives.
After this operation, 752 kB disk space will be freed.
Do you want to continue? [Y/n] ^C

Which is consistent with how libomp is installed (no major version bumps between releases!), and how clang links to it:

$ ldd <some binary built with -fopenmp> | grep omp
        libomp.so.5 => /lib/x86_64-linux-gnu/libomp.so.5 (0x00007f9b6da18000)

So it may be best to first invest into soversion bumps,etc to avoid silent earth-shattering ABI breakages.

Ah, right. I follow. Some groundwork to be done first then, thanks.

jdoerfert added a comment.EditedDec 30 2019, 9:32 AM

I would expect that some versioning is needed..

Versioning is fine with me. Versioning the runtime would be even preferable.

If this really brings a benefit, we need new entry functions and update the compiler(s).

We go to great lengths to align to cache lines in some structs, e.g., kmp_depnode, but we are literally wasting space and alignment for others. Even if we do not adopt this, we should start to adopt some best practices.

FWIW. This was never intended to be committed like this, thus no reviewers.

I would expect that some versioning is needed..

Versioning is fine with me. Versioning the runtime would be even preferable.

When would you expect bumps of the version? As an example libc++ and libstdc++ have versions, but they're not frequently bumped (once in a decade or less?) because rebuilding all your applications can be very much work. That is why I think stable interfaces for runtime libraries are very convenient whenever possible.

If this really brings a benefit, we need new entry functions and update the compiler(s).

We go to great lengths to align to cache lines in some structs, e.g., kmp_depnode, but we are literally wasting space and alignment for others. Even if we do not adopt this, we should start to adopt some best practices.

There's a difference though: kmp_depend_info_t is passed by the compiler and evaluated once by the runtime library. If I recall correctly, all information is stored in internal structures afterwards (the mentioned kmp_depnode being one of them). Thus I wonder if this change brings a measurable improvement when reading the data once is not on the critical path. Breaking the ABI on the benefit of saving a few bytes on the stack seems a bit overkill just for the sake of doing it.

If this really brings a benefit, we need new entry functions and update the compiler(s).

We go to great lengths to align to cache lines in some structs, e.g., kmp_depnode, but we are literally wasting space and alignment for others. Even if we do not adopt this, we should start to adopt some best practices.

There's a difference though: kmp_depend_info_t is passed by the compiler and evaluated once by the runtime library. If I recall correctly, all information is stored in internal structures afterwards (the mentioned kmp_depnode being one of them). Thus I wonder if this change brings a measurable improvement when reading the data once is not on the critical path. Breaking the ABI on the benefit of saving a few bytes on the stack seems a bit overkill just for the sake of doing it.

Why isn't this on the critical path? It has to be paid by the thread creating tasks, in many scenarios that is the critical path while the threads executing tasks can take a hit (though it can be the other way around). The cost is not a few bytes on the stack, never was. The cost is an additional store + load (for the i8) and an extra cache line accessed for every few kmp_depend_info_t objects.

I mentioned this before, I'll do it again: Even if we do not adopt this, we should start to adopt some best practices.

I would expect that some versioning is needed..

Versioning is fine with me. Versioning the runtime would be even preferable.

When would you expect bumps of the version? As an example libc++ and libstdc++ have versions, but they're not frequently bumped (once in a decade or less?) because rebuilding all your applications can be very much work. That is why I think stable interfaces for runtime libraries are very convenient whenever possible.

If we version both, the interface and the runtime, we don't need to rebuild stuff and we can phase old interfaces out. Right now we already version the interfaces.