Page MenuHomePhabricator

Add support for symbolic large constant entries inside stackmaps
Needs RevisionPublic

Authored by undingen on Apr 21 2015, 3:42 PM.

Details

Summary

This patch implements support for embedding symbols in the array of large constants inside a stackmap/patchpoint.
We (=Pyston project) embed in the LLVM IR we generate a lot of pointers to external constants. I would like to cache the JITed functions and therefore I need to make the embedded constants relocatable -> change the IR to refer to module variables instead which I can replace when loading the cached object.
Currently when a patchpoint has a constant variable as live argument it will materialize it in a register. Which we don't want because the live variables are rarely used (also the regalloc runs out of registers when having to many const variables as argument to the pp?!?... I filled PR23306).
This patch emits a symbol reference inside the array of large constants in the stackmap. The stack map format is untouched by this patch.

Together with http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20150330/269160.html this should make it possible to cache more JITed functions containing patchpoints.

Diff Detail

Repository
rL LLVM

Event Timeline

undingen updated this revision to Diff 24174.Apr 21 2015, 3:42 PM
undingen retitled this revision from to Add support for symbolic large constant entries inside stackmaps.
undingen updated this object.
undingen edited the test plan for this revision. (Show Details)
undingen added reviewers: reames, lhames, ributzka.
undingen set the repository for this revision to rL LLVM.
undingen added a subscriber: Unknown Object (MLST).
reames edited edge metadata.Apr 26 2015, 12:36 PM

Comments inline, but you are still missing the most important part of the patch. Where is the documentation which describes the output format? I have inferred some of it from the code, but it will save you time if we can settle on the output before hashing through every bit of code. We may need to change the code based on the result of the format discussion.

One question to consider: As a consumer of the generated stackmap, how do I tell a given entry is a actual constant vs a symbol that needs resolved? Maybe I'm missing the obvious, but I didn't see this in the code. Or are you assuming that the consumer is only parsing the finalized/full relocated version of the binary section?

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
6920

I don't understand this check. What case is this for? Should this possibly be an assert?

Or should you possibly be using the getValueType in place of the MVT::i64 as an argument to getTargetGlobalAddress?

lib/CodeGen/StackMaps.cpp
309

The need for this check makes me thing that you should introduce a new Location type.

329

I really think you want a Location::GlobalValue or something similar.

333

This seems unnecessarily complicated. Just emit the offset from the beginning of the global value section. (This comment is dependent on the one below w.r.t. reuse of constant section.)

420

Mixing global values with constants doesn't seem like the right approach here. Why not introduce new section specifically for global values?

(I'm open to being convinced this is the right answer, but you need to make the argument.)

Thanks for the comments and sorry for not having written any documentation because I thought the default behavior of "RTDyldMemoryManager::getSymbolAddress" which will resolve the symbols makes documenting the change unnecessary but I can see that this may be unexpected and needs documentation.
I reused the existing stackmap format in order to not break existing clients. That's why I choose to put them in the constant value table but this means it's impossible for clients to retrieve the number of large constants which don't need symbol resolving and which do. (They will just be at the end of the table but there is no way to know how many there are at the end)
If changing the stackmap format is not a big problem than maybe it's best to make a new section, which would make the resulting stackmap and the code to generate it inside llvm more obvious.

The MVT::i64 check is my attempt at making sure that the symbol addresses are exactly 64bit long like the entries in the stackmap constant table. But this may should get changed to allow smaller values.

Thanks again for the comments, I will wait with changing the patch until we all agreed on how to proceed.

Comments inline, but you are still missing the most important part of the patch. Where is the documentation which describes the output format? I have inferred some of it from the code, but it will save you time if we can settle on the output before hashing through every bit of code. We may need to change the code based on the result of the format discussion.

One question to consider: As a consumer of the generated stackmap, how do I tell a given entry is a actual constant vs a symbol that needs resolved? Maybe I'm missing the obvious, but I didn't see this in the code. Or are you assuming that the consumer is only parsing the finalized/full relocated version of the binary section?

I reused the existing stackmap format in order to not break existing clients. That's why I choose to put them in the constant value table but this means it's impossible for clients to retrieve the number of large constants which don't need symbol resolving and which do. (They will just be at the end of the table but there is no way to know how many there are at the end)

I would strongly prefer you extended the format. Reusing the existing sections for mixed purposes without updating the documentation would be a really bad idea.

If changing the stackmap format is not a big problem than maybe it's best to make a new section, which would make the resulting stackmap and the code to generate it inside llvm more obvious.

The section format is versioned. Adding a new section is not a big deal.

The MVT::i64 check is my attempt at making sure that the symbol addresses are exactly 64bit long like the entries in the stackmap constant table. But this may should get changed to allow smaller values.

This is an incompatible assumption with non-x86 bit architectures right? I'd prefer not to see this baked in.

Thanks again for the comments, I will wait with changing the patch until we all agreed on how to proceed.

I would suggest that you update *just the docs* for the next couple of iterations. Until we stablize on what the feature will look like, getting the code exactly right will be a waste of time.

I say this specifically because I don't feel like I understand exactly what you're trying to propose. Until that part is clear, I can't really offer anything in the way of useful review.

reames requested changes to this revision.Jun 16 2015, 3:23 PM
reames edited edge metadata.
This revision now requires changes to proceed.Jun 16 2015, 3:23 PM

I say this specifically because I don't feel like I understand exactly what you're trying to propose. Until that part is clear, I can't really offer anything in the way of useful review.

Sorry I overlooked your reply.
Maybe I can make clear what I have in mind by describing how I'm currently using this patch:

We (Pyston project) use patchpoints to implement inline caches and for deoptimization when using the LLVM tier.
For the deoptimization use case we add all variables to the patchpoint live args which we need too continue the execution in a lower generic tier (e.g. interpreter). A lot of our generated IR values were direct inttoptr casts because we often generate instances of our objects outside of LLVM. For example we may generate instances of a python objects when we setup the internal representation of a python function which we then share between the interpreter and LLVM tier. That's why we had a lot of inttoptr casts in our generated IR, there are also additional args like pointers to the AST nodes which we will need for deopt.

Deopimizations should happen only very rarely that means that we don't want to actually load all the constants we specified as live values inside the patchpoint into registers/stack slots. Currently LLVM will put all arguments which are constants inside the stackmap constant table in order to not have to generate code in front of the patchpoint to put all this constant values into register/stack slots. This is exactly how I would expect the behavior to be and how I need it.

But then I added a new feature: in order to speedup JITing time if we encounter the same function on the next application start I implemented an object cache for the LLVM generated code. This means I need to be able to relocate all this embedded pointers because the memory layout will not be the same. I choose to solve this by emitting special unique symbol name for all cases where I previously embedded the direct pointer value. This symbol names are deterministic, on the next startup when encountering the same function I can directly load it from the object cache and just have to return the real pointer values inside the RTDyldMemoryManager::getSymbolAddress() overloaded function.

The problem I encountered and this patch tries to solve is that LLVM will currently emit code which will load all this symbolic constants into registers before the patchpoint. With this patch we will stop emitting this machine instructions and instead emit constant table entries inside the stackmap.

Hope this helps understanding what I have done (even if my english isn't good), I successfully use this solution now since several weeks and it gave us a huge speedup.

frej added a subscriber: frej.Jul 13 2015, 12:36 AM