Page MenuHomePhabricator

Enable thread specific cl::opt values for multi-threaded support
Needs ReviewPublic

Authored by yrouban on Oct 19 2018, 2:21 AM.

Details

Summary

When several threads compile different modules the compiler options (instances of cl::opt) cannot be set individually for each thread. That is because the options are visible to all threads. In other words all options are global.

It would be convenient if the options were specific to LLVMContext and they were accessed through an instance of LLVMContext. This kind of change would need changes in all source files where options are used.

This patch proposes a solution that needs minimal changes in LLVM source base.
It is proposed to have a thread local set of re-defined option values mapped by pointers to options.

Specifically, every time a program gets/sets a value for an option it is checked if the current thread local context is set for the current thread and the option has its local copy in this context. If so the local copy of the option is accessed, otherwise the global option is accessed. For all programs that existed so far the context is not set and they work with the global options.
For new multi-threaded compilers (where every thread compiles its own module) every thread can be linked to its own context (see ContextValues) where any option can have its thread specific value that do not affect the other threads' option values. See the thread_routine() in the test ContextSpecificValues2.

This feature allows a configuration flexibility for multi-threaded compilers that can compile every compilation unit in its own thread with different command line options.

Diff Detail

Event Timeline

yrouban created this revision.Oct 19 2018, 2:21 AM

Haven't looked too much at the implementation details yet, but I definitely like the idea.

include/llvm/Support/CommandLine.h
1184

This should be declared explicitly in a .cpp file to provide an "anchor" for the vtable. (Otherwise the destructor and the vtable etc. are emitted in every compilation unit that includes this file, which is a waste of compile time.)

MatzeB added a subscriber: MatzeB.EditedOct 21 2018, 2:04 PM

Can you explain what you would use per-thread command line options for?
Intuitively I would not expect actual commandline users wanting to set options per thread. If you need it to tweak compiler behavior then it might be better to find ways to encode the information in TargetOptions.h, function attributes or similar, so we have a streamlined way of setting them independently of programmatically modifying commandline options.

I also think we agree that ideally cl::opts would be part of the LLVMContext. I just suspect inertia to actually do this is somewhat slow as it naturally means making the use of options being more typing than before...It is a little known fact, but we do even have API for options in LLVMContext: D5389 / rL219854 define it and use it for a single option. Unfortunately this was never followed up by transitioning additional options.

Can you explain what you would use per-thread command line options for?
Intuitively I would not expect actual commandline users wanting to set options per thread. If you need it to tweak compiler behavior then it might be better to find ways to encode the information in TargetOptions.h, function attributes or similar, so we have a streamlined way of setting them independently of programmatically modifying commandline options.

It's about proper encapsulation of state.

There are use cases where there are multiple components using LLVM in the same process for different purposes, and we should really be able to properly isolate those purposes. The use case that I care about is Mesa, where one of the users is an OpenGL driver (using LLVM as a shader compiler backend) which may be loaded into an application that uses LLVM for something else as well.

jfb added a comment.Oct 22 2018, 10:09 AM

Can you explain what you would use per-thread command line options for?
Intuitively I would not expect actual commandline users wanting to set options per thread. If you need it to tweak compiler behavior then it might be better to find ways to encode the information in TargetOptions.h, function attributes or similar, so we have a streamlined way of setting them independently of programmatically modifying commandline options.

It's about proper encapsulation of state.

There are use cases where there are multiple components using LLVM in the same process for different purposes, and we should really be able to properly isolate those purposes. The use case that I care about is Mesa, where one of the users is an OpenGL driver (using LLVM as a shader compiler backend) which may be loaded into an application that uses LLVM for something else as well.

The outcome you wants seems highly desirable. The method (thread local storage) doesn't seem desirable when LLVMContext could be used for this. It seems you've collected a list of cl::opt values which really shouldn't be per-process when using LLVM as a library. How about making that happen instead?

In D53424#1270961, @jfb wrote:

The outcome you wants seems highly desirable. The method (thread local storage) doesn't seem desirable when LLVMContext could be used for this. It seems you've collected a list of cl::opt values which really shouldn't be per-process when using LLVM as a library. How about making that happen instead?

LLVMContext is not always created when cl::opts are used. In other words, I intentionally made the use of this new feature (the thread local option context) wider than the scope of LLVMContext. Users can bind they LLVMContexts to the thread local option context as they needed. It can be done by creating a ContextValue along with LLVMContext and setting this ContextValues as the thread local option context for all threads that work with the LLVMContext instance.
I did not understand your point about collection a list of cl::opts. I do not collect any lists. Instead, every thread local option context collects only those options that have been redefined for this context.

pni added a subscriber: pni.Oct 23 2018, 1:24 AM

@yrouban While I agree with what you're trying to achieve, I'm not a huge fan of the implementation.

  • LLVM’s global options are great for easily changing things during development but they are terrible from a library user perspective. If the default values of these options are not good for a user it really suggests they need to be moved into a proper API.
  • Given that cl::opt’s are usually global it feels extremely weird that reading and writing to these could actually be a thread local operation.

On the other hand what you've done seems like it could be reasonable stop-gap to allow you to make progress. However I really don't want to see what's in this patch become permanent unless there is a strong commitment from the community to start cleaning up our global options mess so that this patch can eventually be removed.

This is a very good summary. I would suggest that we land this patch and mark this feature as a temporary solution asking the users to consider moving the options they redefine to the new API D5389.

jfb added a comment.Oct 23 2018, 8:26 PM

This is a very good summary. I would suggest that we land this patch and mark this feature as a temporary solution asking the users to consider moving the options they redefine to the new API D5389.

It would be good to make sure this part of the comment is also addressed:

However I really don't want to see what's in this patch become permanent unless there is a strong commitment from the community to start cleaning up our global options mess so that this patch can eventually be removed.

Was there a commitment from the community, with some time horizon for moving away from this patch?

In D53424#1273728, @jfb wrote:

Was there a commitment from the community, with some time horizon for moving away from this patch?

Frankly speaking, I do not think that this kind of transition can be done easily. Every single option needs a big change. That is the biggest disadvantage of the feature D5389. It needs some improvements to allow enough flexibility. And we cannot know what improvements are needed until we implement enough options in the new way. I do not see any good way to force the transition to finish in a fixed time frame. If D5389 were easy to use and transit to, it would have been already adopted and we would not have to came up with another idea like this D53424.
In other words, I would give the new idea a try (at least under a preprocessor flag).

In D53424#1273728, @jfb wrote:

Was there a commitment from the community, with some time horizon for moving away from this patch?

Frankly speaking, I do not think that this kind of transition can be done easily. Every single option needs a big change. That is the biggest disadvantage of the feature D5389. It needs some improvements to allow enough flexibility. And we cannot know what improvements are needed until we implement enough options in the new way. I do not see any good way to force the transition to finish in a fixed time frame. If D5389 were easy to use and transit to, it would have been already adopted and we would not have to came up with another idea like this D53424.
In other words, I would give the new idea a try (at least under a preprocessor flag).

While I agree that a transition away from global cl::opts is going to be tough I think we need to get agreement from the community on some sort of plan before we land this. Otherwise we'll end up with this patch becoming a permanent part of LLVM's internal API which makes our technical debt even worse.
My suggestion would be to create a new RFC on llvm-dev that proposes a plan to move away from cl::opts in LLVM's codebase (and probably other sub-projects). Given that you're looking at this issue you're probably in good position to discuss the technical problems with the available replacement APIs in LLVM's codebase.
You could sketch a simple patch for an existing pass that shows where the existing APIs work well and then discuss an example where the existing API is insufficient.

On a related note I think it's worth mentioning how I handled the global cl::opt problem in my own project. The strategy I used might be applicable to some parts of LLVM's codebase too. The strategy is as follows.

  • For the class/struct of interest, a corresponding options struct is created (e.g. https://github.com/delcypher/jfs/blob/master/include/jfs/CXXFuzzingBackend/ClangOptions.h). The users of these options use a pointer to these options and never use cl::opts directly (apart from occasional internal debugging stuff which is not meant to be part of the API). Here's the user in my project https://github.com/delcypher/jfs/blob/master/include/jfs/CXXFuzzingBackend/CXXFuzzingSolverOptions.h .
  • The library containing the class/struct of interest (in this case JFSCXXFuzzingBackend) doesn't use cl::opt at all. Instead there is a corresponding library (in this case JFSCXXFuzzingBackendCmdLine) that declares the cl::opt for command line use and contains a helper function (buildClangOptionsFromCmdLine(...) ) that constructs the option struct from the command line options (see https://github.com/delcypher/jfs/blob/master/lib/CXXFuzzingBackend/CmdLine/ClangOptionsBuilder.cpp ). This separation means that if clients of my libraries don't want to use the command line to initialise ClangOptions they just link against JFSCXXFuzzingBackend and initialise an instance of CXXFuzzingSolverOptions directly using the API (no global cl::opts are pulled into clients code). If a client of the libraries does want to initialise ClangOptions from the command line they link against JFSCXXFuzzingBackend` and JFSCXXFuzzingBackendCmdLine. The latter library adds the global cl::opts and the helper function to initialise the options struct from the cl::opts.

This approach does have some disadvantages:

  • It still uses global cl::opts. This could be fixed by encapsulating these cl::opts into a class that is ensured to live long enough to do command line option parsing.
  • The separation of libraries only works for static libraries. If you build a dynamic library everything has to be included which means the global cl::opts get pulled in.
  • The options struct and the global cl::optss duplicate option default values and storage. This duplication then means that the values read on the command line need to be copied over to the options struct which is wasteful and maintaining two copies of the default values is a pain.
unittests/Support/CommandLineTest.cpp
27

This unit test needs to work on Windows. I see no good reason to use pthread here. Can't you just use std::thread instead?
Also guarding chrono and thread headers on pthread being available seems wrong given that those headers will be available on platforms without pthreads.

May I ask why you think it's important to move away from cl::opt in the first place? What purpose does it actually solve?

I think having global cl::opt variables as descriptions of available options is a perfectly pragmatic thing to do. The storage of the chosen option values can be decoupled of the description of the options, which is something that this patch essentially does.

I mean, I get the idea that there are certain kinds of options which are associated to passes that you may run multiple times in a compilation flow, and you may want to be able to specify different option values for the different invocations of those passes; and yeah, those options are not served well by cl::opt. But many, perhaps most, options are not like this. Having them described as global variables is perfectly fine as far as I can tell.

Conversely, forcing such kinds of options into a new option struct has a clear disadvantage, which is that adding a new option (or possibly even modifying an existing one) will change a header file that will end up being included by essentially the whole world, so that adding a single option will force a recompile of large parts of the project, which is always a pain.

So from a pragmatic point of view, I think there is actually a strong reason for keeping cl::opts around as a means of describing available options, even while decoupling storage for option values. I'm sure there's a fancy name to describe this kind of pattern, if that would help convince you of it. It's a really bad idea to start churn all over the code base unless you can actually name a clear benefit for that churn. The two items you've brought up so far are:

  • LLVM’s global options are great for easily changing things during development but they are terrible from a library user perspective. If the default values of these options are not good for a user it really suggests they need to be moved into a proper API.

I agree that LLVMParseCommandLine is not a great interface, but there's really nothing stopping us from adding a function LLVMSetCommandLineOptionBool/Int/String(option_name, value). That's a perfectly fine API.

  • Given that cl::opt’s are usually global it feels extremely weird that reading and writing to these could actually be a thread local operation.

That's purely aesthetic opinion and not an argument at all.

  • Given that cl::opt’s are usually global it feels extremely weird that reading and writing to these could actually be a thread local operation.

That's purely aesthetic opinion and not an argument at all.

I should add that there are even ways to address these aestethics if people really get hung up over it. You could make it so querying a cl::opt value ends up looking something like cl::getContextValue(NameOfClOptVariable).

yrouban added inline comments.Oct 24 2018, 4:53 AM
include/llvm/Support/CommandLine.h
1184

ok

unittests/Support/CommandLineTest.cpp
27

ok

jfb added a comment.Oct 24 2018, 9:01 AM

May I ask why you think it's important to move away from cl::opt in the first place? What purpose does it actually solve?

By now it's pretty clear that you need an RFC to answer this question, not a phab thread.

In D53424#1274387, @jfb wrote:

May I ask why you think it's important to move away from cl::opt in the first place? What purpose does it actually solve?

By now it's pretty clear that you need an RFC to answer this question, not a phab thread.

The RFC can be found here http://lists.llvm.org/pipermail/llvm-dev/2018-October/127039.html

yrouban updated this revision to Diff 171430.Oct 28 2018, 8:13 AM

Addressed comments:

  • ~Destructible() moved to cpp file
  • changed pthread to std::thread in the test
  • added GetThreadOptionContext() and ThreadOptionContext is made protected

May I ask why you think it's important to move away from cl::opt in the first place? What purpose does it actually solve?

I'm not saying that we should move away from cl::opt everywhere. My opinion is that these really should only be used for developer only debugging. For other options that we wish to expose to the user, these really should be part of a proper API.
cl::opt are typically declared in source files with no corresponding declaration in a header file which means getting access to them to read/write to them is pain because you have to write your own declaration to get at them. Aside from this API annoyance cl::opts are usually global which means that

  • They are not safe to use in a multi-threaded environment if one of the threads wishes to write to the cl::opt.
  • A whole bunch of global constructors (one for every cl::opt definition) run at library load time. Clients of LLVM might not actually want this.
  • Given that cl::opt’s are usually global it feels extremely weird that reading and writing to these could actually be a thread local operation.

That's purely aesthetic opinion and not an argument at all.

It absolutely is an argument. If the patch changes the behaviour in a non-intuitive way then we need to consider this and the impact it will have on its users. Globals that don't act like globals is not I what I would call intuitive.
Pretending like this doesn't matter seems overly dismissive.

Anyway this discussion should move to the RFC.

Personally, I'd like to see something where developer-only debug options show up in -help-hidden or similar for the correct executable: opt vs llc. Right now I'm pretty sure they don't. I don't have any opinion on how to make that happen, though.

Oh, and +1 to anything that makes startup as a JIT compiler faster.