Index: docs/Statepoints.rst
===================================================================
--- /dev/null
+++ docs/Statepoints.rst
@@ -0,0 +1,250 @@
+=====================================
+Garbage Collection Safepoints in LLVM
+=====================================
+
+.. contents::
+   :local:
+   :depth: 2
+
+Status/Warning
+==============
+
+THIS DOCUMENT IS STILL IN EARLY DRAFT FORM.
+
+This document describes a set of experimental extensions to LLVM. Use
+with caution. Because the intrinsics have experimental status,
+compatibility across LLVM releases is not guaranteed.
+
+LLVM currently supports an alternate mechanism for conservative garbage
+collection support using the gc_root intrinsic. The mechanism described
+here shares little in common with that alternate implementation, and it
+is hoped that this mechanism will eventually replace the gc_root
+mechanism.
+
+Overview
+========
+
+.. TODO: allocation vs object? allocation may contain several objects?
+
+To collect dead objects, garbage collectors must be able to identify
+any references to objects contained within executing code and, depending
+on the collector, potentially update them. The collector does not need
+this information at all points in code - that would make the problem
+much harder - but only at well defined points in the execution known as
+'safepoints'. For most collectors, it is sufficient to track at least
+one copy of each unique pointer value. However, for a collector which
+wishes to relocate objects directly reachable from running code, a
+higher standard is required.
+
+One additional challenge is that the compiler may compute intermediate
+results ("derived pointers") which point outside of the allocation or
+even into the middle of another allocation. The eventual use of this
+intermediate value must yield an address within the bounds of the
+allocation, but such "exterior derived pointers" may be visible to the
+collector. Given this, a garbage collector cannot safely rely on the
+runtime address stored in a particular value to indicate the object it
+is associated with. If the garbage collector wishes to move any object,
+the compiler must provide a mapping from each pointer to an indication
+of its allocation.
+
+To simplify the interaction between a collector and the compiled code,
+most garbage collectors are organized in terms of two key abstractions:
+load or store barriers, and safepoints.
+
+A load barrier is a bit of code executed immediately after a machine
+load instruction, but before any use of the value loaded. Depending on
+the collector, such a barrier may be needed for all loads, merely loads
+of a particular type (in the original source language), or none at all.
+
+Analogously, a store barrier is a code fragment that runs immediately
+before the machine store instruction, but after the computation of the
+value stored. The most common use of a store barrier is to update a
+'card table' in a generational garbage collector.
+
+A safepoint is a location at which pointers visible to the compiled
+code (i.e. currently in registers or on the stack) are allowed to
+change. After the safepoint completes, the actual pointer value may
+differ, but the 'object' (as seen by the source language) pointed to
+will not.
+
+  Note that the term 'safepoint' is somewhat overloaded. It refers to
+  both the location at which the machine state is parsable and the
+  coordination protocol involved in bringing application threads to a
+  point at which the collector can safely use that information. The
+  term "statepoint" as used in this document refers exclusively to the
+  former.
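+
+As an aside, here is a minimal sketch of the card-marking store barrier
+mentioned above. It is not part of the statepoint design itself; the
+@card_table global, the 512 byte card size, and the function name are
+assumptions made for this illustration only::
+
+  @card_table = external global i8
+
+  define void @store_with_barrier(i64* %slot, i64 %value) {
+  entry:
+    ; the store itself
+    store i64 %value, i64* %slot
+    ; store barrier: mark the card covering %slot as dirty
+    %addr = ptrtoint i64* %slot to i64
+    %card = lshr i64 %addr, 9            ; 512 byte cards (assumed)
+    %base = ptrtoint i8* @card_table to i64
+    %card.addr = add i64 %base, %card
+    %card.ptr = inttoptr i64 %card.addr to i8*
+    store i8 1, i8* %card.ptr
+    ret void
+  }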
The term "statepoint" as used in this document refers exclusively to the former. + +To support relocation of objects directly reachable from values in compiled code, the collector must be able to: + +#. identify every copy of a pointer (including copies introduced by the compiler itself) at the statepoint, +#. identify which object each pointer relates to, and +#. potentially update each of those copies. + +This document describes the mechanism by which an LLVM based compiler can provide this information to a language runtime/collector and ensure that all pointers can be read and updated if desired. + +The heart of the approach is to construct (or rewrite) the IR in a manner where the possible updates performed by the garbage collector are explicitly visible in the IR. Doing so requires that we: + +#. create a new SSA value for each potentially relocated pointer, and ensure that no uses of the original (non relocated) value is reachable after the safepoint, +#. specify the relocation in a way which is opaque to the compiler to ensure that the optimizer can not introduce new uses of an unrelocated value after a statepoint. This prevents the optimizer from performing unsound optimizations. +#. recording a mapping of live pointers (and the allocation they're associated with) for each statepoint. + +..At the most abstract level, inserting a safepoint can be thought of as replacing a call instruction with a call to a multiple return value function which both calls the original target of the call, returns it's result, and returns updated values for any live pointers to garbage collected objects. + + Note that the task of identifying all live pointers to garbage collected values, transforming the IR to expose a pointer giving the base object for every such live pointer, and inserting all the intrinsics correctly is explicitly out of scope for this document. The recommended approach is described in the section of Late Safepoint Placement below. + + +..TODO: parsepoint? AT CALLS, defer to later discussion + + +An Example Safepoint Sequence +======== + +Let's consider a simple call in LLVM IR: + todo + +Depending on our language we may need to allow a safepoint during the execution of the function called from this site. If so, we need to let the collector update local values in the current frame. + +Let's say we need to relocate SSA values 'a', 'b', and 'c' at this safepoint. To represent this, we would generate the statepoint sequence:: + put an example sequence here + +Ideally, this sequence would have been represented as a M argument, N return value function (where M is the number of values being relocated + the original call arguments and N is the original return value + each relocated value), but LLVM does not easily support such a representation. + +Instead, the statepoint intrinsic marks the actual site of the safepoint or statepoint. The statepoint returns a token value (which exists only at compile time). To get back the original return value of the call, we use the 'gc_result' intrinsic. To get the relocation of each pointer in turn, we use the 'gc_relocate' intrinsic with the appropriate index. Note that both the gc_relocate and gc_result are tied to the statepoint. The combination forms a "statepoint sequence" and represents the entitety of a parseable call or 'statepoint'. 
+
+When lowered, this example would generate the following x86 assembly::
+
+  put assembly here
+
+Each of the potentially relocated values has been spilled to the stack,
+and a record of that location has been recorded in the StackMap
+section. If the garbage collector needs to update any of these pointers
+during the call, it knows exactly what to change.
+
+Intrinsics
+==========
+
+``gc_statepoint`` Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+  declare i32
+    @gc_statepoint(func_type <target>, i64 <#call args>,
+                   i64 <unused>, ... (call parameters),
+                   i64 <# deopt args>, ... (deopt parameters),
+                   ... (gc parameters))
+
+Overview:
+"""""""""
+
+The statepoint intrinsic represents a call which is parseable by the
+runtime.
+
+Operands:
+"""""""""
+
+The 'target' operand is the function actually being called. The target
+can be specified as either a symbolic LLVM function, or as an arbitrary
+Value of appropriate function type. Note that the function type must
+match the signature of the callee and the types of the 'call
+parameters' arguments.
+
+The '#call args' operand is the number of arguments to the actual
+call. It must exactly match the number of arguments passed in the
+'call parameters' variable length section.
+
+The 'unused' operand is unused and likely to be removed. Please do not
+use.
+
+The 'call parameters' arguments are simply the arguments which need to
+be passed to the call target. They will be lowered according to the
+specified calling convention and otherwise handled like a normal call
+instruction. The number of arguments must exactly match what is
+specified in '# call args'. The types must match the signature of
+'target'.
+
+The 'deopt parameters' arguments contain an arbitrary list of Values
+which is meaningful to the runtime. The runtime may read any of these
+values, but is assumed not to modify them. If the garbage collector
+might need to modify one of these values, it must also be listed in
+the 'gc pointer' argument list. The '# deopt args' field indicates how
+many operands are to be interpreted as 'deopt parameters'.
+
+The 'gc parameters' arguments contain every pointer to a garbage
+collector object which potentially needs to be updated by the garbage
+collector. Note that the argument list must explicitly contain a base
+pointer for every derived pointer listed. The order of arguments is
+unimportant. Unlike the other variable length parameter sets, this
+list is not length prefixed.
+
+Semantics:
+""""""""""
+
+A statepoint is assumed to read and write all memory. As a result,
+memory operations cannot be reordered past a statepoint. It is illegal
+to mark a statepoint as being either 'readonly' or 'readnone'.
+
+Note that legal IR cannot perform any memory operation on a 'gc
+pointer' argument of the statepoint in a location statically reachable
+from the statepoint. Instead, the explicitly relocated value (from a
+``gc_relocate``) must be used.
+
+``gc_result`` Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+  declare type*
+    @gc_result_ptr(i32 %statepoint_token)
+
+  declare fX
+    @gc_result_float(i32 %statepoint_token)
+
+  declare iX
+    @gc_result_int(i32 %statepoint_token)
+
+Overview:
+"""""""""
+
+``gc_result`` extracts the result of the original call instruction
+which was replaced by the ``gc_statepoint``. The ``gc_result``
+intrinsic is actually a family of three intrinsics due to an
+implementation limitation. Other than the type of the return value,
+the semantics are the same.
+
+Operands:
+"""""""""
+
+The first and only argument is the ``gc_statepoint`` which starts the
+safepoint sequence of which this ``gc_result`` is a part. Despite the
+typing of this as a generic i32, *only* the value defined by a
+``gc_statepoint`` is legal here.
+
+Semantics:
+""""""""""
+
+The ``gc_result`` represents the return value of the call target of
+the statepoint. The type of the ``gc_result`` must exactly match the
+return type of the call target. If the call target returns void, there
+will be no ``gc_result``.
+
+A ``gc_result`` is modeled as a 'readnone' pure function. It has no
+side effects since it is just a projection of the return value of the
+previous call represented by the ``gc_statepoint``.
+
+``gc_relocate`` Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+::
+
+  declare <type> addrspace(1)*
+    @gc_relocate(i32 %token, i32 %base_offset, i32 %pointer_offset)
+
+Overview:
+"""""""""
+
+A ``gc_relocate`` returns the potentially relocated value of a pointer
+at the safepoint.
+
+Operands:
+"""""""""
+
+The first argument is the ``gc_statepoint`` which starts the safepoint
+sequence of which this ``gc_relocate`` is a part. Despite the typing
+of this as a generic i32, *only* the value defined by a
+``gc_statepoint`` is legal here.
+
+The second argument is an index into the statepoint's list of
+arguments which specifies the base pointer for the pointer being
+relocated. This index must land within the 'gc parameter' section of
+the statepoint's argument list.
+
+The third argument is an index into the statepoint's list of arguments
+which specifies the (potentially) derived pointer being relocated. It
+is legal for this index to be the same as the second argument
+if-and-only-if a base pointer is being relocated. This index must land
+within the 'gc parameter' section of the statepoint's argument list.
+
+Semantics:
+""""""""""
+
+The return value of ``gc_relocate`` is the potentially relocated value
+of the pointer specified by its arguments. It is unspecified how the
+value of the returned pointer relates to the argument to the
+``gc_statepoint`` other than that a) they point to the same object
+with the same offset, and b) the 'based-on' relationship of the newly
+relocated pointers is a projection of the unrelocated pointers. In
+particular, the integer value of the pointer returned is unspecified.
+
+A ``gc_relocate`` is modeled as a 'readnone' pure function. It has no
+side effects since it is just a way to extract information about work
+done during the actual call modeled by the ``gc_statepoint``.
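+
+To make the base/derived distinction concrete, here is a hedged sketch
+using the unmangled names from the syntax sections above; the callee
+@do_safepoint, the 16 byte offset, and the literal indices are
+illustrative assumptions only::
+
+  ; %derived points 16 bytes into the object %base points at
+  %derived = getelementptr i8 addrspace(1)* %base, i64 16
+
+  %token = call i32 (void ()*, i64, i64, ...)*
+      @gc_statepoint(void ()* @do_safepoint, i64 0, i64 0, i64 0,
+                     i8 addrspace(1)* %base,     ; operand 4: base
+                     i8 addrspace(1)* %derived)  ; operand 5: derived
+
+  ; relocating the base uses the same index twice; the derived pointer
+  ; names its base via the second argument
+  %base.new    = call i8 addrspace(1)* @gc_relocate(i32 %token, i32 4, i32 4)
+  %derived.new = call i8 addrspace(1)* @gc_relocate(i32 %token, i32 4, i32 5)
+
+After the safepoint, %derived.new is still 16 bytes into the object,
+whatever its new address.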
+
+StackMap Format
+===============
+
+Locations for each pointer value which may need to be read and/or
+updated by the runtime or collector are provided via the StackMap
+format specified in the PatchPoint documentation (LINK).
+
+Each statepoint generates the following Locations:
+
+* Constant which describes the number of following deopt *Locations*
+  (not operands)
+* Variable number of Locations, one for each deopt parameter listed in
+  the IR statepoint (same number as described by the previous Constant)
+* Variable number of Location pairs, one pair for each unique pointer
+  which needs to be relocated. The first Location in each pair
+  describes the base pointer for the object. The second is the derived
+  pointer actually being relocated. The base pointer is guaranteed to
+  also appear explicitly as a relocation pair if it is used after the
+  statepoint. There may be fewer pairs than gc parameters in the IR
+  statepoint. Each *unique* pair will occur at least once; duplicates
+  are possible.
+
+Note that the Locations used in each section may describe the same
+physical location. e.g. A stack slot may appear as a deopt location, a
+gc base pointer, and a gc derived pointer.
+
+The ID field of the 'StkMapRecord' for a statepoint is meaningless and
+its value is explicitly unspecified.
+
+The LiveOut section of the StkMapRecord will be empty for a statepoint
+record.
+
+Safepoint Semantics
+===================
+
+The fundamental correctness property of compiled code w.r.t. the
+garbage collector is a dynamic one. It must be the case that there is
+no dynamic trace such that an operation involving a potentially
+relocated pointer is observably-after a safepoint which could relocate
+it. 'observably-after' in this usage means that an outside observer
+could observe this sequence of events in a way which precludes the
+operation being performed before the safepoint.
+
+To understand why this 'observably-after' property is required,
+consider a null comparison performed on the original copy of a
+relocated pointer. Assuming that control flow follows the safepoint,
+there is no way to observe externally whether the null comparison is
+performed before or after the safepoint. (Remember, the original Value
+is unmodified by the safepoint.) The compiler is free to make either
+scheduling choice.
+
+The actual correctness property implemented is slightly stronger than
+this. We require that there be no *static path* on which a potentially
+relocated pointer is 'observably-after' it may have been relocated.
+This is slightly stronger than is strictly necessary (and thus may
+disallow some otherwise valid programs), but greatly simplifies
+reasoning about the correctness of the compiled code.
+
+By construction, this property will be upheld by the optimizer if
+correctly established in the source IR. This is a key invariant of the
+design.
+
+.. Limitations on Optimization
+.. ============================
+
+.. The semantics of the statepoint intrinsic sequence impose certain
+   necessary restrictions on the optimizations performed by LLVM.
+   These restrictions are required to uphold the correctness of the
+   code w.r.t. concurrent relocation by the garbage collector.
+
+.. TODO: re-mention read/write semantics on call, motivation for such,
+   recommend LSP
+
+Safepoint Verification
+======================
+
+The existing IR Verifier pass has been extended to check most of the
+local restrictions on the intrinsics mentioned above. The current
+implementation in LLVM does not check the key relocation invariant
+described above.
+
+There is ongoing work on constructing an IR verifier which is able to
+statically check this property. The current implementation (available
+at this external LINK) is sound, but may reject some well formed
+programs. The key challenge remaining is identifying cases where a
+naive analysis would identify a use of an unrelocated value as being
+reachable from a safepoint, but where the order of the two is not
+observable.
+
+The current verifier is useful for checking the correctness of IR
+generated by hand or by most reasonable frontends. However, LLVM IR
+optimization passes can transform code in ways which cause the
+verifier to spuriously report errors due to insufficient reasoning
+about observability. In particular, optimizations performed by
+CodeGenPrepare may generate IR which spuriously fails this verifier
+for some examples. A sketch of the core pattern the verifier looks for
+follows.
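+
+For instance, the following hedged sketch (same illustrative names and
+indices as the earlier examples) shows IR that violates the static
+property, together with its legal counterpart::
+
+  %token = call i32 (void ()*, i64, i64, ...)*
+      @gc_statepoint(void ()* @do_safepoint, i64 0, i64 0, i64 0,
+                     i8 addrspace(1)* %obj)
+  %obj.new = call i8 addrspace(1)* @gc_relocate(i32 %token, i32 4, i32 4)
+
+  ; ILLEGAL: %obj may have been relocated by the collector; this load
+  ; is statically reachable from the statepoint
+  %bad = load i8 addrspace(1)* %obj
+
+  ; LEGAL: all uses after the safepoint go through the relocated copy
+  %ok = load i8 addrspace(1)* %obj.new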
+
+Polling for a Safepoint
+=======================
+
+There are generally two approaches to generating safepoint polls:
+
+- First, you can explicitly emit control flow that checks some
+  property of the runtime environment and branches to an explicit call
+  into the runtime if desired. This mechanism is easy to implement
+  using the statepoint functionality. The slowpath call into the
+  runtime is simply another parseable call site (statepoint sequence).
+- Second, you can emit a memory access which is known to fault
+  if-and-only-if the safepoint should be taken. This approach is not
+  implemented today. The existing statepoint mechanism would have to
+  be extended to represent a "parseable load" in addition to today's
+  "parseable call". One possible approach would be to add a special
+  "parseable load" intrinsic, call it via a statepoint, and then
+  adjust the lowering in SelectionDAG.
+
+Call Safepoint Insertion
+========================
+
+This document has focused on describing how to correctly represent
+safepoints in LLVM IR. The task of constructing safepoints (i.e.
+emitting the statepoint, gc_relocate, and gc_result sequences) has
+been left as an exercise for the reader.
+
+There is ongoing work on creating an IR transformation pass which can
+insert safepoints into plain IR. There's a fundamental performance
+advantage to inserting safepoints late (i.e. after most optimization
+passes have run), but this mechanism can also be used to insert
+safepoints into unoptimized IR. For the moment, information on this
+SafepointPlacementPass can be found here
+(LINK: https://github.com/AzulSystems/llvm-late-safepoint-placement)
+(WARNING: External Project). Eventually, it is hoped that this
+functionality will be available in trunk.
+
+Note: It is *strongly recommended* that frontend authors make use of
+the static verification pass mentioned above to ensure that their IR
+upholds the required properties.
+
+
+.. TODO: we could probably use a section here on the implementation
+   and todo items. Maybe that should be a separate page?
Index: include/llvm/CodeGen/FunctionLoweringInfo.h
===================================================================
--- include/llvm/CodeGen/FunctionLoweringInfo.h
+++ include/llvm/CodeGen/FunctionLoweringInfo.h
@@ -88,6 +88,12 @@
   /// RegFixups - Registers which need to be replaced after isel is done.
   DenseMap<unsigned, unsigned> RegFixups;
 
+  /// StatepointStackSlots - A list of temporary stack slots (frame indices)
+  /// used to spill values at a statepoint. We store them here to enable
+  /// reuse of the same stack slots across different statepoints in different
+  /// basic blocks.
+  SmallVector<int, 1> StatepointStackSlots;
+
   /// MBB - The current block.
   MachineBasicBlock *MBB;
Index: include/llvm/CodeGen/MachineInstr.h
===================================================================
--- include/llvm/CodeGen/MachineInstr.h
+++ include/llvm/CodeGen/MachineInstr.h
@@ -88,7 +88,7 @@
                                         // anything other than to convey comment
                                         // information to AsmPrinter.
 
-  uint8_t NumMemRefs;                   // Information on memory references.
+  uint8_t NumMemRefs;                   // Information on memory references.
   mmo_iterator MemRefs;
 
   DebugLoc debugLoc;                    // Source line information.
Index: include/llvm/CodeGen/StackMaps.h
===================================================================
--- include/llvm/CodeGen/StackMaps.h
+++ include/llvm/CodeGen/StackMaps.h
@@ -81,6 +81,45 @@
   unsigned getNextScratchIdx(unsigned StartIdx = 0) const;
 };
 
+// Statepoint operands:
+//   <num call args>, <call target>, [call arguments],
+//   <StackMaps::ConstantOp>, <flags>,
+//   <StackMaps::ConstantOp>, <num deopt args>, [deopt args],
+//   [gc values]
+class StatepointOpers {
+private:
+  enum {
+    NCallArgsPos = 0,
+    CallTargetPos = 1
+  };
+
+public:
+  explicit StatepointOpers(const MachineInstr *MI):
+    MI(MI) { }
+
+  // Get starting index of non call related arguments
+  // (statepoint flags, vm state and gc state)
+  unsigned getVarIdx() const {
+    return MI->getOperand(NCallArgsPos).getImm() + 2;
+  }
+
+  unsigned getNumVMSArgsIdx() const {
+    // <StackMaps::ConstantOp> for flags, <flags>,
+    // <StackMaps::ConstantOp> for numVMSArgs
+    return getVarIdx() + 3;
+  }
+
+  unsigned getNumVMSArgs() const {
+    return MI->getOperand(getNumVMSArgsIdx()).getImm();
+  }
+
+  const MachineOperand &getCallTarget() const {
+    return MI->getOperand(CallTargetPos);
+  }
+
+private:
+  const MachineInstr *MI;
+};
+
 class StackMaps {
 public:
   struct Location {
@@ -132,6 +171,9 @@
   /// \brief Generate a stackmap record for a patchpoint instruction.
   void recordPatchPoint(const MachineInstr &MI);
 
+  /// \brief Generate a stackmap record for a statepoint instruction.
+  void recordStatepoint(const MachineInstr &MI);
+
   /// If there is any stack map data, create a stack map section and serialize
   /// the map info into it. This clears the stack map data structures
   /// afterwards.
@@ -139,7 +181,6 @@
 
 private:
   static const char *WSMP;
-
   typedef SmallVector<Location, 8> LocationVec;
   typedef SmallVector<LiveOutReg, 8> LiveOutVec;
   typedef MapVector<uint64_t, uint64_t> ConstantPool;
Index: include/llvm/IR/Intrinsics.td
===================================================================
--- include/llvm/IR/Intrinsics.td
+++ include/llvm/IR/Intrinsics.td
@@ -489,6 +489,22 @@
                                        llvm_ptr_ty, llvm_i32_ty,
                                        llvm_vararg_ty]>;
 
+
+//===------------------------ Garbage Collection Intrinsics ---------------===//
+// These are documented in docs/Statepoints.rst
+
+def int_experimental_gc_statepoint : Intrinsic<[llvm_i32_ty],
+                                               [llvm_anyptr_ty, llvm_i32_ty,
+                                                llvm_i32_ty,
+                                                llvm_vararg_ty]>;
+
+def int_experimental_gc_result_int : Intrinsic<[llvm_anyint_ty], [llvm_i32_ty]>;
+def int_experimental_gc_result_float : Intrinsic<[llvm_anyfloat_ty], [llvm_i32_ty]>;
+def int_experimental_gc_result_ptr : Intrinsic<[llvm_anyptr_ty], [llvm_i32_ty]>;
+
+def int_experimental_gc_relocate : Intrinsic<[llvm_anyptr_ty],
+                                             [llvm_i32_ty, llvm_i32_ty, llvm_i32_ty]>;
+
 //===-------------------------- Other Intrinsics --------------------------===//
 //
 def int_flt_rounds : Intrinsic<[llvm_i32_ty]>,
Index: include/llvm/IR/Statepoint.h
===================================================================
--- /dev/null
+++ include/llvm/IR/Statepoint.h
@@ -0,0 +1,172 @@
+
+#ifndef __LLVM_IR_STATEPOINT_H__
+#define __LLVM_IR_STATEPOINT_H__
+
+#include "llvm/ADT/iterator_range.h"
+#include "llvm/IR/Instructions.h"
+#include "llvm/IR/Intrinsics.h"
+#include "llvm/IR/CallSite.h"
+#include "llvm/Support/Compiler.h"
+
+namespace llvm {
+
+bool isStatepoint(const ImmutableCallSite &CS);
+bool isStatepoint(const Instruction *inst);
+bool isStatepoint(const Instruction &inst);
+
+bool isGCRelocate(const Instruction *inst);
+bool isGCRelocate(const ImmutableCallSite &CS);
+
+bool isGCResult(const Instruction *inst);
+bool isGCResult(const ImmutableCallSite &CS);
+
+template <typename InstructionTy, typename ValueTy, typename CallSiteTy>
+class StatepointBase {
+  CallSiteTy callSite;
+  void *operator new(size_t, unsigned) LLVM_DELETED_FUNCTION;
+  void *operator new(size_t s) LLVM_DELETED_FUNCTION;
+
+protected:
+  explicit StatepointBase(InstructionTy *I) : callSite(I) {
+    assert(isStatepoint(I));
+  }
+  explicit StatepointBase(CallSiteTy CS) : callSite(CS) {
+    assert(isStatepoint(CS));
+  }
+
+ public:
+  typedef typename CallSiteTy::arg_iterator arg_iterator;
+
+  CallSiteTy getCallSite() {
+    return callSite;
+  }
+
+  ValueTy *actualCallee() {
+    return callSite.getArgument(0);
+  }
+  int numCallArgs() {
+    return cast<ConstantInt>(callSite.getArgument(1))->getZExtValue();
+  }
+  int numTotalVMSArgs() {
+    return cast<ConstantInt>(callSite.getArgument(3 + numCallArgs()))->getZExtValue();
+  }
+
+  typename CallSiteTy::arg_iterator call_args_begin() {
+    // 3 = callTarget, #callArgs, flag
+    int offset = 3;
+    assert(offset <= (int)callSite.arg_size());
+    return callSite.arg_begin() + offset;
+  }
+  typename CallSiteTy::arg_iterator call_args_end() {
+    int offset = 3 + numCallArgs();
+    assert(offset <= (int)callSite.arg_size());
+    return callSite.arg_begin() + offset;
+  }
+
+  /// range adapter for call arguments
+  iterator_range<arg_iterator> call_args() {
+    return iterator_range<arg_iterator>(call_args_begin(), call_args_end());
+  }
+
+  typename CallSiteTy::arg_iterator vm_state_begin() {
+    return call_args_end();
+  }
+  typename CallSiteTy::arg_iterator vm_state_end() {
+    int offset = 3 + numCallArgs() + 1 + numTotalVMSArgs();
+    assert(offset <= (int)callSite.arg_size());
+    return callSite.arg_begin() + offset;
+  }
+
+  /// range adapter for vm state arguments
+  iterator_range<arg_iterator> vm_state_args() {
+    return iterator_range<arg_iterator>(vm_state_begin(), vm_state_end());
+  }
+
+  typename CallSiteTy::arg_iterator first_vm_state_stack_begin() {
+    // 6 = numTotalVMSArgs, 1st_objectID, 1st_bci,
+    //     1st_#stack, 1st_#local, 1st_#monitor
+    return vm_state_begin() + 6;
+  }
+
+  typename CallSiteTy::arg_iterator gc_args_begin() {
+    return vm_state_end();
+  }
+  typename CallSiteTy::arg_iterator gc_args_end() {
+    return callSite.arg_end();
+  }
+
+  /// range adapter for gc arguments
+  iterator_range<arg_iterator> gc_args() {
+    return iterator_range<arg_iterator>(gc_args_begin(), gc_args_end());
+  }
+
+
+#ifndef NDEBUG
+  void verify() {
+    // The internal asserts in the iterator accessors do the rest.
+    (void)call_args_begin();
+    (void)call_args_end();
+    (void)vm_state_begin();
+    (void)vm_state_end();
+    (void)gc_args_begin();
+    (void)gc_args_end();
+  }
+#endif
+};
+
+class ImmutableStatepoint
+    : public StatepointBase<const Instruction, const Value,
+                            ImmutableCallSite> {
+  typedef StatepointBase<const Instruction, const Value, ImmutableCallSite>
+      Base;
+
+public:
+  explicit ImmutableStatepoint(const Instruction *I) : Base(I) {
+  }
+  explicit ImmutableStatepoint(ImmutableCallSite CS) : Base(CS) {
+  }
+};
+
+class Statepoint : public StatepointBase<Instruction, Value, CallSite> {
+  typedef StatepointBase<Instruction, Value, CallSite> Base;
+
+public:
+  explicit Statepoint(Instruction *I) : Base(I) {
+  }
+  explicit Statepoint(CallSite CS) : Base(CS) {
+  }
+};
+
+class GCRelocateOperands {
+  ImmutableCallSite _relocate;
+
+ public:
+  GCRelocateOperands(const User* U)
+    : GCRelocateOperands(cast<Instruction>(U)) {}
+  GCRelocateOperands(const Instruction *inst) : _relocate(inst) {
+    assert(isGCRelocate(inst));
+  }
+  GCRelocateOperands(CallSite CS) : _relocate(CS) {
+    assert(isGCRelocate(CS));
+  }
+
+  const Instruction *statepoint() {
+    return cast<Instruction>(_relocate.getArgument(0));
+  }
+  int basePtrIndex() {
+    return cast<ConstantInt>(_relocate.getArgument(1))->getZExtValue();
+  }
+  int derivedPtrIndex() {
+    return cast<ConstantInt>(_relocate.getArgument(2))->getZExtValue();
+  }
+  const Value *basePtr() {
+    ImmutableCallSite CS(statepoint());
+    return *(CS.arg_begin() + basePtrIndex());
+  }
+  const Value *derivedPtr() {
+    ImmutableCallSite CS(statepoint());
+    return *(CS.arg_begin() + derivedPtrIndex());
+  }
+};
+}
+#endif
Index: include/llvm/Target/Target.td
===================================================================
--- include/llvm/Target/Target.td
+++ include/llvm/Target/Target.td
@@ -850,6 +850,15 @@
   let mayLoad = 1;
   let usesCustomInserter = 1;
 }
+def STATEPOINT : Instruction {
+  let OutOperandList = (outs);
+  let InOperandList = (ins variable_ops);
+  let usesCustomInserter = 1;
+  let mayLoad = 1;
+  let mayStore = 1;
+  let hasSideEffects = 1;
+  let isCall = 1;
+}
 def LOAD_STACK_GUARD : Instruction {
   let OutOperandList = (outs ptr_rc:$dst);
   let InOperandList = (ins);
Index: include/llvm/Target/TargetFrameLowering.h
===================================================================
--- include/llvm/Target/TargetFrameLowering.h
+++ include/llvm/Target/TargetFrameLowering.h
@@ -199,6 +199,14 @@
   virtual int getFrameIndexReference(const MachineFunction &MF, int FI,
                                      unsigned &FrameReg) const;
 
+  /// Same as above, except that the 'base register' will always be RSP, not
+  /// RBP on x86. This should really be a parameterizable choice.
+  virtual int getFrameIndexReferenceForGC(const MachineFunction &MF, int FI,
+                                          unsigned &FrameReg) const {
+    // default to calling normal version, we override this on x86 only
+    return getFrameIndexReference(MF, FI, FrameReg);
+  }
+
   /// processFunctionBeforeCalleeSavedScan - This method is called immediately
   /// before PrologEpilogInserter scans the physical registers used to determine
   /// what callee saved registers should be spilled. This method is optional.
Index: include/llvm/Target/TargetOpcodes.h
===================================================================
--- include/llvm/Target/TargetOpcodes.h
+++ include/llvm/Target/TargetOpcodes.h
@@ -110,7 +110,13 @@
   /// to prevent the stack guard value or address from being spilled to the
   /// stack should override TargetLowering::emitLoadStackGuardNode and
   /// additionally expand this pseudo after register allocation.
-  LOAD_STACK_GUARD = 19
+  LOAD_STACK_GUARD = 19,
+
+  /// Call instruction with associated vm state for deoptimization and list
+  /// of live pointers for relocation by the garbage collector.
+  /// It is intended to support garbage collection with fully precise
+  /// relocating collectors and deoptimizations in either the callee or
+  /// caller.
+  STATEPOINT = 20
 };
 } // end namespace TargetOpcode
 } // end namespace llvm
Index: lib/Analysis/TargetTransformInfo.cpp
===================================================================
--- lib/Analysis/TargetTransformInfo.cpp
+++ lib/Analysis/TargetTransformInfo.cpp
@@ -403,6 +403,10 @@
   case Intrinsic::objectsize:
   case Intrinsic::ptr_annotation:
   case Intrinsic::var_annotation:
+  case Intrinsic::experimental_gc_result_int:
+  case Intrinsic::experimental_gc_result_float:
+  case Intrinsic::experimental_gc_result_ptr:
+  case Intrinsic::experimental_gc_relocate:
     // These intrinsics don't actually represent code after lowering.
     return TCC_Free;
   }
Index: lib/CodeGen/InlineSpiller.cpp
===================================================================
--- lib/CodeGen/InlineSpiller.cpp
+++ lib/CodeGen/InlineSpiller.cpp
@@ -1084,7 +1084,8 @@
   bool WasCopy = MI->isCopy();
   unsigned ImpReg = 0;
 
-  bool SpillSubRegs = (MI->getOpcode() == TargetOpcode::PATCHPOINT ||
+  bool SpillSubRegs = (MI->getOpcode() == TargetOpcode::STATEPOINT ||
+                       MI->getOpcode() == TargetOpcode::PATCHPOINT ||
                        MI->getOpcode() == TargetOpcode::STACKMAP);
 
   // TargetInstrInfo::foldMemoryOperand only expects explicit, non-tied
Index: lib/CodeGen/LocalStackSlotAllocation.cpp
===================================================================
--- lib/CodeGen/LocalStackSlotAllocation.cpp
+++ lib/CodeGen/LocalStackSlotAllocation.cpp
@@ -291,6 +291,7 @@
       // Debug value, stackmap and patchpoint instructions can't be out of
       // range, so they don't need any updates.
       if (MI->isDebugValue() ||
+          MI->getOpcode() == TargetOpcode::STATEPOINT ||
           MI->getOpcode() == TargetOpcode::STACKMAP ||
           MI->getOpcode() == TargetOpcode::PATCHPOINT)
         continue;
Index: lib/CodeGen/PrologEpilogInserter.cpp
===================================================================
--- lib/CodeGen/PrologEpilogInserter.cpp
+++ lib/CodeGen/PrologEpilogInserter.cpp
@@ -798,6 +798,25 @@
         continue;
       }
 
+      // Note: The code for this used to be shared with PATCHPOINT. In order
+      // to simplify merges, I separated it for the moment.
+      // S: Frame indices in debug values are encoded in a target independent
+      // S: way with simply the frame index and offset rather than any
+      // S: target-specific addressing mode.
+      if (MI->getOpcode() == TargetOpcode::STATEPOINT) {
+        assert((!MI->isDebugValue() || i == 0) &&
+               "Frame indices can only appear as the first operand of a "
+               "DBG_VALUE machine instruction");
+        unsigned Reg;
+        MachineOperand &Offset = MI->getOperand(i + 1);
+        const unsigned refOffset =
+          TFI->getFrameIndexReferenceForGC(Fn, MI->getOperand(i).getIndex(),
+                                           Reg);
+
+        Offset.setImm(Offset.getImm() + refOffset);
+        MI->getOperand(i).ChangeToRegister(Reg, false /*isDef*/);
+        continue;
+      }
+
       // Some instructions (e.g. inline asm instructions) can have
       // multiple frame indices and/or cause eliminateFrameIndex
      // to insert more than one instruction. We need the register
Index: lib/CodeGen/SelectionDAG/CMakeLists.txt
===================================================================
--- lib/CodeGen/SelectionDAG/CMakeLists.txt
+++ lib/CodeGen/SelectionDAG/CMakeLists.txt
@@ -19,6 +19,7 @@
   SelectionDAGDumper.cpp
   SelectionDAGISel.cpp
   SelectionDAGPrinter.cpp
+  StatepointLowering.cpp
   ScheduleDAGVLIW.cpp
   TargetLowering.cpp
   TargetSelectionDAGInfo.cpp
Index: lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
===================================================================
--- lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
+++ lib/CodeGen/SelectionDAG/FunctionLoweringInfo.cpp
@@ -275,6 +275,7 @@
   ArgDbgValues.clear();
   ByValArgFrameIndexMap.clear();
   RegFixups.clear();
+  StatepointStackSlots.clear();
   PreferredExtendType.clear();
 }
Index: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
===================================================================
--- lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
+++ lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h
@@ -21,6 +21,8 @@
 #include "llvm/IR/CallSite.h"
 #include "llvm/IR/Constants.h"
 #include "llvm/Support/ErrorHandling.h"
+#include "llvm/Target/TargetLowering.h"
+#include "StatepointLowering.h"
 #include <vector>
 
 namespace llvm {
@@ -114,6 +116,10 @@
   /// get simple disambiguation between loads without worrying about alias
   /// analysis.
   SmallVector<SDValue, 8> PendingLoads;
+
+  /// State used while lowering a statepoint sequence (gc_statepoint,
+  /// gc_relocate, and gc_result). See StatepointLowering.hpp/cpp for details.
+  StatepointLoweringState StatepointLowering;
 private:
 
   /// PendingExports - CopyToReg nodes that copy values to virtual registers
@@ -611,6 +617,13 @@
     N = NewN;
   }
 
+  void removeValue(const Value *V) {
+    // This is to support the hack in lowerCallFromStatepoint. It should be
+    // removed once that hack is resolved.
+    if (NodeMap.count(V))
+      NodeMap.erase(V);
+  }
+
   void setUnusedArgValue(const Value *V, SDValue NewN) {
     SDValue &N = UnusedArgNodeMap[V];
     assert(!N.getNode() && "Already set a value for this node!");
@@ -776,6 +789,11 @@
   void visitStackmap(const CallInst &I);
   void visitPatchpoint(const CallInst &I);
 
+  // These three are implemented in StatepointLowering.cpp
+  void visitStatepoint(const CallInst &I);
+  void visitGCRelocate(const CallInst &I);
+  void visitGCResult(const CallInst &I);
+
   void visitUserOp1(const Instruction &I) {
     llvm_unreachable("UserOp1 should not exist at instruction selection time!");
   }
Index: lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
===================================================================
--- lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
+++ lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
@@ -16,6 +16,7 @@
 #include "llvm/ADT/BitVector.h"
 #include "llvm/ADT/Optional.h"
 #include "llvm/ADT/SmallSet.h"
+#include "llvm/ADT/Statistic.h"
 #include "llvm/Analysis/AliasAnalysis.h"
 #include "llvm/Analysis/BranchProbabilityInfo.h"
 #include "llvm/Analysis/ConstantFolding.h"
@@ -46,6 +47,7 @@
 #include "llvm/IR/Intrinsics.h"
 #include "llvm/IR/LLVMContext.h"
 #include "llvm/IR/Module.h"
+#include "llvm/IR/Statepoint.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/ErrorHandling.h"
@@ -884,6 +886,7 @@
   CurInst = nullptr;
   HasTailCall = false;
   SDNodeOrder = LowestSDNodeOrder;
+  StatepointLowering.clear();
 }
 
 /// clearDanglingDebugInfo - Clear the dangling debug information
@@ -5483,6 +5486,20 @@
     visitPatchpoint(I);
     return nullptr;
   }
+  case Intrinsic::experimental_gc_statepoint: {
+    visitStatepoint(I);
+    return nullptr;
+  }
+  case Intrinsic::experimental_gc_result_int:
+  case Intrinsic::experimental_gc_result_float:
+  case Intrinsic::experimental_gc_result_ptr: {
+    visitGCResult(I);
+    return nullptr;
+  }
+  case Intrinsic::experimental_gc_relocate: {
+    visitGCRelocate(I);
+    return nullptr;
+  }
   }
 }
Index: lib/CodeGen/SelectionDAG/StatepointLowering.h
===================================================================
--- /dev/null
+++ lib/CodeGen/SelectionDAG/StatepointLowering.h
@@ -0,0 +1,138 @@
+//===-- StatepointLowering.h - SDAGBuilder's statepoint code -*- C++ -*---===//
+//
+//                     The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+//
+// This file includes support code used by SelectionDAGBuilder when lowering a
+// statepoint sequence in SelectionDAG IR.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIB_CODEGEN_SELECTIONDAG_STATEPOINTLOWERING_H
+#define LLVM_LIB_CODEGEN_SELECTIONDAG_STATEPOINTLOWERING_H
+
+#include "llvm/ADT/DenseMap.h"
+#include "llvm/CodeGen/SelectionDAG.h"
+#include "llvm/CodeGen/SelectionDAGNodes.h"
+#include <vector>
+
+namespace llvm {
+class SelectionDAGBuilder;
+
+/// This class tracks both per-statepoint and per-selectiondag information.
+/// For each statepoint it tracks locations of its gc values (incoming and
+/// relocated) and the list of gc_relocate calls scheduled for visiting (this
+/// is used for a debug mode consistency check only). The spill slot tracking
+/// works in concert with information in FunctionLoweringInfo.
+class StatepointLoweringState {
+public:
+  StatepointLoweringState() : NextSlotToAllocate(0) {
+  }
+
+  /// Reset all state tracking for a newly encountered safepoint. Also
+  /// performs some consistency checking.
+  void startNewStatepoint(SelectionDAGBuilder &Builder);
+
+  /// Clear the memory usage of this object. This is called from
+  /// SelectionDAGBuilder::clear. We require this is never called in the
+  /// midst of processing a statepoint sequence.
+  void clear();
+
+  /// Returns the spill location of a value incoming to the current
+  /// statepoint. Will return SDValue() if this value hasn't been
+  /// spilled. Otherwise, the value has already been spilled and no
+  /// further action is required by the caller.
+  SDValue getLocation(SDValue val) {
+    if (!Locations.count(val))
+      return SDValue();
+    return Locations[val];
+  }
+  void setLocation(SDValue val, SDValue Location) {
+    assert(!Locations.count(val) &&
+           "Trying to allocate already allocated location");
+    Locations[val] = Location;
+  }
+
+  /// Returns the relocated value for a given input pointer. Will
+  /// return SDValue() if this value hasn't yet been reloaded from
+  /// its stack slot after the statepoint. Otherwise, the value
+  /// has already been reloaded and the SDValue of that reload will
+  /// be returned.
+  /// Note that VMState values are spilled but not reloaded (since they
+  /// don't change at the safepoint unless also listed in the GC pointer
+  /// section) and will thus never be in this map.
+  SDValue getRelocLocation(SDValue val) {
+    if (!RelocLocations.count(val))
+      return SDValue();
+    return RelocLocations[val];
+  }
+  void setRelocLocation(SDValue val, SDValue Location) {
+    assert(!RelocLocations.count(val) &&
+           "Trying to allocate already allocated location");
+    RelocLocations[val] = Location;
+  }
+
+  /// Record the fact that we expect to encounter a given gc_relocate
+  /// before the next statepoint. If we don't see it, we'll report
+  /// an assertion.
+  void scheduleRelocCall(const CallInst &RelocCall) {
+    PendingGCRelocateCalls.push_back(&RelocCall);
+  }
+  /// Remove this gc_relocate from the list we're expecting to see
+  /// before the next statepoint. If we weren't expecting to see
+  /// it, we'll report an assertion.
+  void relocCallVisited(const CallInst &RelocCall) {
+    SmallVectorImpl<const CallInst *>::iterator itr =
+      std::find(PendingGCRelocateCalls.begin(), PendingGCRelocateCalls.end(),
+                &RelocCall);
+    assert(itr != PendingGCRelocateCalls.end() &&
+           "Visited unexpected gcrelocate call");
+    PendingGCRelocateCalls.erase(itr);
+  }
+
+  // TODO: Should add consistency tracking to ensure we encounter
+  // expected gc_result calls too.
+
+  /// Get a stack slot we can use to store a value of type ValueType. This
+  /// will hopefully be a recycled slot from another statepoint.
+  SDValue allocateStackSlot(EVT ValueType, SelectionDAGBuilder &Builder);
+
+  void reserveStackSlot(int Offset) {
+    assert(Offset >= 0 && Offset < (int)AllocatedStackSlots.size() &&
+           "out of bounds");
+    assert(!AllocatedStackSlots[Offset] && "already reserved!");
+    assert(NextSlotToAllocate <= (unsigned)Offset && "consistency!");
+    AllocatedStackSlots[Offset] = true;
+  }
+  bool isStackSlotAllocated(int Offset) {
+    assert(Offset >= 0 && Offset < (int)AllocatedStackSlots.size() &&
+           "out of bounds");
+    return AllocatedStackSlots[Offset];
+  }
+
+private:
+  /// Maps pre-relocation value (gc pointer directly incoming into statepoint)
+  /// into its location (currently only stack slots)
+  DenseMap<SDValue, SDValue> Locations;
+  /// Map pre-relocated value into its new relocated location
+  DenseMap<SDValue, SDValue> RelocLocations;
+
+  /// A boolean indicator for each slot listed in the FunctionInfo as to
+  /// whether it has been used in the current statepoint. Since we try to
+  /// preserve stack slots across safepoints, there can be gaps in which
+  /// slots have been allocated.
+  SmallVector<bool, 50> AllocatedStackSlots;
+
+  /// Points just beyond the last slot known to have been allocated
+  unsigned NextSlotToAllocate;
+
+  /// Keep track of pending gcrelocate calls for consistency check
+  SmallVector<const CallInst *, 10> PendingGCRelocateCalls;
+};
+} // end namespace llvm
+
+#endif // LLVM_LIB_CODEGEN_SELECTIONDAG_STATEPOINTLOWERING_H
Index: lib/CodeGen/SelectionDAG/StatepointLowering.cpp
===================================================================
--- /dev/null
+++ lib/CodeGen/SelectionDAG/StatepointLowering.cpp
@@ -0,0 +1,638 @@
+//===-- StatepointLowering.cpp - SDAGBuilder's statepoint code -----------===//
+//
+//                     The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+//
+// This file includes support code used by SelectionDAGBuilder when lowering a
+// statepoint sequence in SelectionDAG IR.
+//
+//===----------------------------------------------------------------------===//
+
+#include "StatepointLowering.h"
+#include "SelectionDAGBuilder.h"
+#include "llvm/ADT/SmallSet.h"
+#include "llvm/ADT/Statistic.h"
+#include "llvm/CodeGen/FunctionLoweringInfo.h"
+#include "llvm/CodeGen/SelectionDAG.h"
+#include "llvm/CodeGen/StackMaps.h"
+#include "llvm/IR/CallingConv.h"
+#include "llvm/IR/Instructions.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Intrinsics.h"
+#include "llvm/IR/Statepoint.h"
+#include "llvm/Target/TargetLowering.h"
+#include <algorithm>
+using namespace llvm;
+
+#define DEBUG_TYPE "statepoint-lowering"
+
+STATISTIC(NumSlotsAllocatedForStatepoints,
+          "Number of stack slots allocated for statepoints");
+STATISTIC(NumOfStatepoints, "Number of statepoint nodes encountered");
+STATISTIC(StatepointMaxSlotsRequired,
+          "Maximum number of stack slots required for a single statepoint");
+
+void
+StatepointLoweringState::startNewStatepoint(SelectionDAGBuilder &Builder) {
+  // Consistency check
+  assert(PendingGCRelocateCalls.empty() &&
+         "Trying to visit statepoint before finished processing previous one");
+  Locations.clear();
+  RelocLocations.clear();
+  NextSlotToAllocate = 0;
+  // Need to resize this on each safepoint - we need the two to stay in
+  // sync and the clear patterns of a SelectionDAGBuilder have no relation
+  // to FunctionLoweringInfo.
+  AllocatedStackSlots.resize(Builder.FuncInfo.StatepointStackSlots.size());
+  for (size_t i = 0; i < AllocatedStackSlots.size(); i++) {
+    AllocatedStackSlots[i] = false;
+  }
+}
+void StatepointLoweringState::clear() {
+  Locations.clear();
+  RelocLocations.clear();
+  AllocatedStackSlots.clear();
+  assert(PendingGCRelocateCalls.empty() &&
+         "cleared before statepoint sequence completed");
+}
+
+SDValue
+StatepointLoweringState::allocateStackSlot(EVT ValueType,
+                                           SelectionDAGBuilder &Builder) {
+
+  NumSlotsAllocatedForStatepoints++;
+
+  // The basic scheme here is to first look for a previously created stack slot
+  // which is not in use (accounting for the fact arbitrary slots may already
+  // be reserved), or to create a new stack slot and use it.
+
+  // If this doesn't succeed in 40000 iterations, something is seriously wrong
+  for (int i = 0; i < 40000; i++) {
+    assert(Builder.FuncInfo.StatepointStackSlots.size() ==
+           AllocatedStackSlots.size() &&
+           "broken invariant");
+    const size_t NumSlots = AllocatedStackSlots.size();
+    assert(NextSlotToAllocate <= NumSlots && "broken invariant");
+
+    if (NextSlotToAllocate >= NumSlots) {
+      assert(NextSlotToAllocate == NumSlots);
+      // record stats
+      if (NumSlots + 1 > StatepointMaxSlotsRequired) {
+        StatepointMaxSlotsRequired = NumSlots + 1;
+      }
+
+      SDValue SpillSlot = Builder.DAG.CreateStackTemporary(ValueType);
+      const unsigned FI = cast<FrameIndexSDNode>(SpillSlot)->getIndex();
+      Builder.FuncInfo.StatepointStackSlots.push_back(FI);
+      AllocatedStackSlots.push_back(true);
+      return SpillSlot;
+    }
+    if (!AllocatedStackSlots[NextSlotToAllocate]) {
+      const int FI = Builder.FuncInfo.StatepointStackSlots[NextSlotToAllocate];
+      AllocatedStackSlots[NextSlotToAllocate] = true;
+      return Builder.DAG.getFrameIndex(FI, ValueType);
+    }
+    // Note: We deliberately choose to advance this only on the failing path.
+    // Doing so on the succeeding path involves a bit of complexity that
+    // caused a minor bug previously. Unless performance shows this matters,
+    // please keep this code as simple as possible.
+    NextSlotToAllocate++;
+  }
+  llvm_unreachable("infinite loop?");
+}
+
+/// Try to find existing copies of the incoming values in stack slots used for
+/// statepoint spilling. If we can find a spill slot for the incoming value,
+/// mark that slot as allocated, and reuse the same slot for this safepoint.
+/// This helps to avoid series of loads and stores that only serve to
+/// reshuffle values on the stack between calls.
+static void reservePreviousStackSlotForValue(SDValue Incoming,
+                                             SelectionDAGBuilder &Builder) {
+
+  if (isa<ConstantSDNode>(Incoming) || isa<FrameIndexSDNode>(Incoming)) {
+    // We won't need to spill this, so no need to check for previously
+    // allocated stack slots
+    return;
+  }
+
+  SDValue loc = Builder.StatepointLowering.getLocation(Incoming);
+  if (loc.getNode()) {
+    // duplicates in input
+    return;
+  }
+
+  // Search back for the load from a stack slot pattern to find the original
+  // slot we allocated for this value. We could extend this to deal with
+  // simple modification patterns, but simply dealing with trivial load/store
+  // sequences helps a lot already.
+  if (LoadSDNode *Load = dyn_cast<LoadSDNode>(Incoming)) {
+    if (auto *FI = dyn_cast<FrameIndexSDNode>(Load->getBasePtr())) {
+      const int index = FI->getIndex();
+      auto itr = std::find(Builder.FuncInfo.StatepointStackSlots.begin(),
+                           Builder.FuncInfo.StatepointStackSlots.end(), index);
+      if (itr == Builder.FuncInfo.StatepointStackSlots.end()) {
+        // not one of the lowering stack slots, can't reuse!
+        // TODO: Actually, we probably could reuse the stack slot if the value
+        // hasn't changed at all, but we'd need to look for intervening writes
+        return;
+      } else {
+        // This is one of our dedicated lowering slots
+        const int Offset =
+          std::distance(Builder.FuncInfo.StatepointStackSlots.begin(), itr);
+        if (Builder.StatepointLowering.isStackSlotAllocated(Offset)) {
+          // stack slot already assigned to someone else, can't use it!
+          // TODO: currently we reserve space for gc arguments after doing
+          // normal allocation for deopt arguments. We should reserve for
+          // _all_ deopt and gc arguments, then start allocating. This
+          // will prevent some moves being inserted when vm state changes,
+          // but gc state doesn't between two calls.
+          return;
+        }
+        // Reserve this stack slot
+        Builder.StatepointLowering.reserveStackSlot(Offset);
+      }
+
+      // Cache this slot so we find it when going through the normal
+      // assignment loop.
+      SDValue loc =
+        Builder.DAG.getTargetFrameIndex(index, Incoming.getValueType());
+
+      Builder.StatepointLowering.setLocation(Incoming, loc);
+    }
+  }
+
+  // TODO: handle case where a reloaded value flows through a phi to
+  // another safepoint. e.g.
+  // bb1:
+  //  a' = relocated...
+  // bb2: % pred: bb1, bb3, bb4, etc.
+  //  a_phi = phi(a', ...)
+  // statepoint ... a_phi
+  // NOTE: This will require reasoning about cross basic block values. This is
+  // decidedly non trivial and this might not be the right place to do it. We
+  // don't really have the information we need here...
+
+  // TODO: handle simple updates. If a value is modified and the original
+  // value is no longer live, it would be nice to put the modified value in
+  // the same slot. This allows folding of the memory accesses for some
+  // instructions types (like an increment).
+  // statepoint (i)
+  // i1 = i+1
+  // statepoint (i1)
+}
+
+/// Remove any duplicates (as SDValues) from the derived pointer pairs. This
+/// is not required for correctness. Its purpose is to reduce the size of
+/// the StackMap section. It has no effect on the number of spill slots
+/// required or the actual lowering.
+static void removeDuplicatesGCPtrs(SmallVectorImpl<const Value *> &bases,
+                                   SmallVectorImpl<const Value *> &ptrs,
+                                   SmallVectorImpl<const Value *> &relocs,
+                                   SelectionDAGBuilder &Builder) {
+
+  // This is horribly inefficient, but I don't care right now
+  SmallSet<SDValue, 16> seen;
+
+  SmallVector<const Value *, 16> newbases, newptrs, newrelocs;
+  for (size_t i = 0; i < ptrs.size(); i++) {
+    SDValue sd = Builder.getValue(ptrs[i]);
+    // Only add non-duplicates
+    if (seen.count(sd) == 0) {
+      newbases.push_back(bases[i]);
+      newptrs.push_back(ptrs[i]);
+      newrelocs.push_back(relocs[i]);
+    }
+    seen.insert(sd);
+  }
+  assert(bases.size() >= newbases.size());
+  assert(ptrs.size() >= newptrs.size());
+  assert(relocs.size() >= newrelocs.size());
+  bases = newbases;
+  ptrs = newptrs;
+  relocs = newrelocs;
+  assert(ptrs.size() == bases.size());
+  assert(ptrs.size() == relocs.size());
+}
+
+/// Extract the call from the statepoint, lower it and return a pointer to
+/// the call node. Also update NodeMap so that getValue(statepoint) will
+/// reference the lowered call result.
+static SDNode *lowerCallFromStatepoint(const CallInst &CI,
+                                       SelectionDAGBuilder &Builder) {
+
+  assert(Intrinsic::experimental_gc_statepoint ==
+         dyn_cast<IntrinsicInst>(&CI)->getIntrinsicID() &&
+         "function called must be the statepoint function");
+
+  int NumCallArgs = dyn_cast<ConstantInt>(CI.getArgOperand(1))->getZExtValue();
+  assert(NumCallArgs >= 0 && "non-negative");
+
+  ImmutableStatepoint statepointOpers(&CI);
+
+  // Lower the actual call itself - This is a bit of a hack, but we want to
+  // avoid modifying the actual lowering code. This is similar in intent to
+  // the LowerCallOperands mechanism used by PATCHPOINT, but is structured
+  // differently. Hopefully, this is slightly more robust w.r.t. calling
+  // convention, return values, and other function attributes.
+  Value *ActualCallee = const_cast<Value *>(statepointOpers.actualCallee());
+#ifndef NDEBUG
+  statepointOpers.verify();
+#endif
+
+  std::vector<Value *> args;
+  CallInst::const_op_iterator arg_begin = statepointOpers.call_args_begin();
+  CallInst::const_op_iterator arg_end = statepointOpers.call_args_end();
+  args.insert(args.end(), arg_begin, arg_end);
+  CallInst *tmp = CallInst::Create(ActualCallee, args);
+  tmp->setTailCall(CI.isTailCall());
+  tmp->setCallingConv(CI.getCallingConv());
+  tmp->setAttributes(CI.getAttributes());
+  Builder.LowerCallTo(tmp, Builder.getValue(ActualCallee), false);
+
+  // Handle the return value of the call iff any.
+  const bool hasDef = !tmp->getType()->isVoidTy();
+  if (hasDef) {
+    // The value of the statepoint itself will be the value of call itself.
+    // We'll replace the actually call node shortly. gc_result will grab
+    // this value.
+    Builder.setValue(&CI, Builder.getValue(tmp));
+  } else {
+    // The token value is never used from here on, just generate a poison value
+    Builder.setValue(&CI, Builder.DAG.getIntPtrConstant(-1));
+  }
+  // Remove the fake entry we created so we don't have a hanging reference
+  // after we delete this node.
+  Builder.removeValue(tmp);
+  delete tmp;
+  tmp = nullptr;
+
+  // Search for the call node. The following code is essentially reverse
+  // engineering X86's LowerCallTo.
+  SDNode *CallNode = nullptr;
+
+  // We just emitted a call, so it should be last thing generated
+  SDValue Chain = Builder.DAG.getRoot();
+
+  // Find closest CALLSEQ_END walking back through lowered nodes if needed
+  SDNode *CallEnd = Chain.getNode();
+  int sanity = 0;
+  while (CallEnd->getOpcode() != ISD::CALLSEQ_END) {
+    CallEnd = CallEnd->getGluedNode();
+    assert(CallEnd && "Can not find call node");
+    assert(sanity < 20 && "should have found call end already");
+    sanity++;
+  }
+  assert(CallEnd->getOpcode() == ISD::CALLSEQ_END &&
+         "Expected a callseq node.");
+  assert(CallEnd->getGluedNode());
+
+  // Step back inside the CALLSEQ
+  CallNode = CallEnd->getGluedNode();
+  return CallNode;
+}
+
+/// Collect all gc pointers coming into the statepoint intrinsic, clean them
+/// up, and return three arrays:
+///   Bases - base pointers incoming to this statepoint
+///   Ptrs - derived pointers incoming to this statepoint
+///   Relocs - the gc_relocate corresponding to each base/ptr pair
+/// Elements of these arrays should be in one-to-one correspondence with each
+/// other, i.e. Bases[i] and Ptrs[i] are from the same gc_relocate call.
+static void
+getIncomingStatepointGCValues(SmallVectorImpl<const Value *> &Bases,
+                              SmallVectorImpl<const Value *> &Ptrs,
+                              SmallVectorImpl<const Value *> &Relocs,
+                              ImmutableCallSite Statepoint,
+                              SelectionDAGBuilder &Builder) {
+  // Search for relocated pointers. Note that working backwards from the
+  // gc_relocates ensures that we only get pairs which are actually relocated
+  // and used after the statepoint.
+  // TODO: This logic should probably become a utility function in Statepoint.h
+  for (const User *U : cast<CallInst>(Statepoint.getInstruction())->users()) {
+    if (!isGCRelocate(U)) {
+      continue;
+    }
+    GCRelocateOperands relocateOpers(U);
+    Relocs.push_back(cast<Value>(U));
+    Bases.push_back(relocateOpers.basePtr());
+    Ptrs.push_back(relocateOpers.derivedPtr());
+  }
+
+  // Remove any redundant llvm::Values which map to the same SDValue as
+  // another input. Also has the effect of removing duplicates in the
+  // original llvm::Value input list as well. This is a useful optimization
+  // for reducing the size of the StackMap section. It has no other impact.
+  removeDuplicatesGCPtrs(Bases, Ptrs, Relocs, Builder);
+
+  assert(Bases.size() == Ptrs.size() && Ptrs.size() == Relocs.size());
+}
+
+/// Spill a value incoming to the statepoint. It might be either part of the
+/// vmstate or the gcstate. In both cases we unconditionally spill it on the
+/// stack unless it is a null constant.
+/// Returns a pair whose first element is the frame index containing the
+/// saved value and whose second element is the outgoing chain from the
+/// emitted store.
+static std::pair<SDValue, SDValue>
+spillIncomingStatepointValue(SDValue Incoming, SDValue Chain,
+                             SelectionDAGBuilder &Builder) {
+  SDValue loc = Builder.StatepointLowering.getLocation(Incoming);
+
+  // Emit new store if we didn't do it for this ptr before
+  if (!loc.getNode()) {
+    loc = Builder.StatepointLowering.allocateStackSlot(Incoming.getValueType(),
+                                                       Builder);
+    assert(isa<FrameIndexSDNode>(loc));
+    int index = cast<FrameIndexSDNode>(loc)->getIndex();
+    // We use TargetFrameIndex so that isel will not select it into LEA
+    loc = Builder.DAG.getTargetFrameIndex(index, Incoming.getValueType());
+
+    // TODO: We can create TokenFactor node instead of
+    //       chaining stores one after another, this may allow
+    //       a bit more optimal scheduling for them
+    Chain = Builder.DAG.getStore(Chain, Builder.getCurSDLoc(), Incoming, loc,
+                                 MachinePointerInfo::getFixedStack(index),
+                                 false, false, 0);
+
+    Builder.StatepointLowering.setLocation(Incoming, loc);
+  }
+
+  assert(loc.getNode());
+  return std::make_pair(loc, Chain);
+}
+
+/// Lower a single value incoming to a statepoint node. This value can be
+/// either a deopt value or a gc value, the handling is the same. We special
+/// case constants and allocas, then fall back to spilling if required.
+static void lowerIncomingStatepointValue(SDValue Incoming,
+                                         SmallVectorImpl<SDValue> &Ops,
+                                         SelectionDAGBuilder &Builder) {
+  SDValue chain = Builder.getRoot();
+
+  if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(Incoming)) {
+    // If the original value was a constant, make sure it gets recorded as
+    // such in the stackmap. This is required so that the consumer can
+    // parse any internal format to the deopt state. It also handles null
+    // pointers and other constant pointers in GC states
+    Ops.push_back(
+      Builder.DAG.getTargetConstant(StackMaps::ConstantOp, MVT::i64));
+    Ops.push_back(Builder.DAG.getTargetConstant(C->getSExtValue(), MVT::i64));
+  } else if (FrameIndexSDNode *FI = dyn_cast<FrameIndexSDNode>(Incoming)) {
+    // This handles allocas as arguments to the statepoint
+    const TargetLowering &TLI = Builder.DAG.getTargetLoweringInfo();
+    Ops.push_back(
+      Builder.DAG.getTargetFrameIndex(FI->getIndex(), TLI.getPointerTy()));
+  } else {
+    // Otherwise, locate a spill slot and explicitly spill it so it
+    // can be found by the runtime later. We currently do not support
+    // tracking values through callee saved registers to their eventual
+    // spill location. This would be a useful optimization, but would
+    // need to be optional since it requires a lot of complexity on the
+    // runtime side which not all would support.
+    std::pair<SDValue, SDValue> res =
+      spillIncomingStatepointValue(Incoming, chain, Builder);
+    Ops.push_back(res.first);
+    chain = res.second;
+  }
+
+  Builder.DAG.setRoot(chain);
+}
+
+/// Lower deopt state and gc pointer arguments of the statepoint. The actual
+/// lowering is described in lowerIncomingStatepointValue. This function is
+/// responsible for lowering everything in the right position and playing some
+/// tricks to avoid redundant stack manipulation where possible. On
+/// completion, 'Ops' will contain ready to use operands for machine code
+/// statepoint. The chain nodes will have already been created and the DAG
+/// root will be set to the last value spilled (if any were).
+static void lowerStatepointMetaArgs(SmallVectorImpl<SDValue> &Ops,
+                                    ImmutableStatepoint Statepoint,
+                                    SelectionDAGBuilder &Builder) {
+
+  // Lower the deopt and gc arguments for this statepoint.
+  // Layout will be: deopt argument length, deopt arguments..., gc arguments...
+
+  SmallVector<const Value *, 8> bases, ptrs, relocations;
+  getIncomingStatepointGCValues(bases, ptrs, relocations,
+                                Statepoint.getCallSite(), Builder);
+
+  // Before we actually start lowering (and allocating spill slots for
+  // values), reserve any stack slots which we judge to be profitable to reuse
+  // for a particular value.  This is purely an optimization over the code
+  // below and doesn't change semantics at all.  It is important for
+  // performance that we reserve slots for both deopt and gc values before
+  // lowering either.
+  for (auto I = Statepoint.vm_state_begin() + 1, E = Statepoint.vm_state_end();
+       I != E; ++I) {
+    Value *V = *I;
+    SDValue Incoming = Builder.getValue(V);
+    reservePreviousStackSlotForValue(Incoming, Builder);
+  }
+  for (unsigned i = 0; i < bases.size() * 2; ++i) {
+    // Even elements will contain base, odd elements - derived ptr
+    const Value *V = i % 2 ? ptrs[i / 2] : bases[i / 2];
+    SDValue Incoming = Builder.getValue(V);
+    reservePreviousStackSlotForValue(Incoming, Builder);
+  }
+
+  // First, prefix the list with the number of unique values to be
+  // lowered.  Note that this is the number of *Values*, not the
+  // number of SDValues required to lower them.
+  const int numVMSArgs = Statepoint.numTotalVMSArgs();
+  Ops.push_back(
+      Builder.DAG.getTargetConstant(StackMaps::ConstantOp, MVT::i64));
+  Ops.push_back(Builder.DAG.getTargetConstant(numVMSArgs, MVT::i64));
+
+  assert(numVMSArgs + 1 == std::distance(Statepoint.vm_state_begin(),
+                                         Statepoint.vm_state_end()));
+
+  // The vm state arguments are lowered in an opaque manner.  We do
+  // not know what type of values are contained within.  We skip the
+  // first one since that happens to be the total number we lowered
+  // explicitly just above.  We could have left it in the loop and
+  // not done it explicitly, but it's far easier to understand this
+  // way.
+  for (auto I = Statepoint.vm_state_begin() + 1, E = Statepoint.vm_state_end();
+       I != E; ++I) {
+    const Value *V = *I;
+    SDValue Incoming = Builder.getValue(V);
+    lowerIncomingStatepointValue(Incoming, Ops, Builder);
+  }
+
+  // Finally, go ahead and lower all the gc arguments.  There's no prefixed
+  // length for this one.  After lowering, we'll have the base and pointer
+  // arrays interwoven with each (lowered) base pointer immediately followed
+  // by its (lowered) derived pointer, i.e.
+  // (base[0], ptr[0], base[1], ptr[1], ...)
+  for (unsigned i = 0; i < bases.size() * 2; ++i) {
+    // Even elements will contain base, odd elements - derived ptr
+    const Value *V = i % 2 ? ptrs[i / 2] : bases[i / 2];
+    SDValue Incoming = Builder.getValue(V);
+    lowerIncomingStatepointValue(Incoming, Ops, Builder);
+  }
+}
+
+void SelectionDAGBuilder::visitStatepoint(const CallInst &CI) {
+  // The basic scheme here is that information about both the original call
+  // and the safepoint is encoded in the CallInst.  We create a temporary
+  // call and lower it, then reverse engineer the calling sequence.
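+  //
+  // For orientation, the IR sequence being lowered here looks roughly like
+  // the sketch below.  The callee @foo and the argument counts are
+  // illustrative only; see docs/Statepoints.rst for the actual operand
+  // layout:
+  //
+  //   %token = call i32 (void ()*, i32, i32, ...)*
+  //       @llvm.experimental.gc.statepoint.p0f_isVoidf(void ()* @foo,
+  //                                                    i32 0, i32 0, i32 0)
+  //   %p.new = call coldcc i8 addrspace(1)*
+  //       @llvm.experimental.gc.relocate.p1i8(i32 %token, i32 4, i32 4)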
+
+  // Check some preconditions for sanity
+  assert(isStatepoint(&CI) &&
+         "function called must be the statepoint function");
+  NumOfStatepoints++;
+  // Clear state
+  StatepointLowering.startNewStatepoint(*this);
+
+#ifndef NDEBUG
+  // Consistency check
+  for (const User *U : CI.users()) {
+    const CallInst *Call = cast<CallInst>(U);
+    if (isGCRelocate(Call))
+      StatepointLowering.scheduleRelocCall(*Call);
+  }
+#endif
+
+  ImmutableStatepoint statepoint(&CI);
+
+  // Lower statepoint vmstate and gcstate arguments
+  SmallVector<SDValue, 10> lowered_args;
+  lowerStatepointMetaArgs(lowered_args, statepoint, *this);
+
+  // Get call node; we will replace it later with a statepoint
+  SDNode *CallNode = lowerCallFromStatepoint(CI, *this);
+
+  // Construct the actual STATEPOINT node with all the appropriate arguments
+  // and return values.
+
+  // TODO: Currently, all of these operands are being marked as read/write in
+  // PrologEpilogInserter.cpp; we should special case the VMState arguments
+  // and flags to be read-only.
+  SmallVector<SDValue, 40> Ops;
+
+  // Calculate and push starting position of vmstate arguments
+  // Call Node: Chain, Target, {Args}, RegMask, [Glue]
+  SDValue Glue;
+  if (CallNode->getGluedNode()) {
+    // Glue is always the last operand
+    Glue = CallNode->getOperand(CallNode->getNumOperands() - 1);
+  }
+  // Get number of arguments incoming directly into the call node
+  unsigned NumCallRegArgs =
+      CallNode->getNumOperands() - (Glue.getNode() ? 4 : 3);
+  Ops.push_back(DAG.getTargetConstant(NumCallRegArgs, MVT::i32));
+
+  // Add call target
+  SDValue call_target = SDValue(CallNode->getOperand(1).getNode(), 0);
+  Ops.push_back(call_target);
+
+  // Add call arguments
+  // Get position of register mask in the call
+  SDNode::op_iterator regMaskIt;
+  if (Glue.getNode())
+    regMaskIt = CallNode->op_end() - 2;
+  else
+    regMaskIt = CallNode->op_end() - 1;
+  Ops.insert(Ops.end(), CallNode->op_begin() + 2, regMaskIt);
+
+  // Add a leading constant argument with the Flags and the calling
+  // convention masked together
+  CallingConv::ID CallConv = CI.getCallingConv();
+  int Flags = cast<ConstantInt>(CI.getArgOperand(2))->getZExtValue();
+  assert(Flags == 0 && "not expected to be used");
+  Ops.push_back(DAG.getTargetConstant(StackMaps::ConstantOp, MVT::i64));
+  Ops.push_back(
+      DAG.getTargetConstant(Flags | ((unsigned)CallConv << 1), MVT::i64));
+
+  // Insert all vmstate and gcstate arguments
+  Ops.insert(Ops.end(), lowered_args.begin(), lowered_args.end());
+
+  // Add register mask from call node
+  Ops.push_back(*regMaskIt);
+
+  // Add chain
+  Ops.push_back(CallNode->getOperand(0));
+
+  // Same for the glue, but we add it only if the original call had it
+  if (Glue.getNode())
+    Ops.push_back(Glue);
+
+  // Compute return values
+  SmallVector<EVT, 8> ValueVTs;
+  ValueVTs.push_back(MVT::Other);
+  ValueVTs.push_back(MVT::Glue); // provide a glue output since we consume one
+                                 // as input.  This allows someone else to
+                                 // chain off us as needed.
+  SDVTList NodeTys = DAG.getVTList(ValueVTs);
+
+  SDNode *StatepointMCNode = DAG.getMachineNode(TargetOpcode::STATEPOINT,
+                                                getCurSDLoc(), NodeTys, Ops);
+
+  // Replace the original call
+  DAG.ReplaceAllUsesWith(CallNode, StatepointMCNode); // This may update Root
+  // Remove the original call node
+  DAG.DeleteNode(CallNode);
+
+  // DON'T set the root - under the assumption that it's already set past the
+  // inserted node we created.
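+  //
+  // To summarize, the operand list of the STATEPOINT node built above is,
+  // roughly (this mirrors the code above; it is not a documented stable
+  // format):
+  //   <num call args>, <call target>, [call args],
+  //   ConstantOp, <flags | (callconv << 1)>,
+  //   ConstantOp, <num vm state args>, [vm state args],
+  //   [base[0], ptr[0], base[1], ptr[1], ...],
+  //   <register mask>, <chain>, [glue]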
+
+  // TODO: A better future implementation would be to emit a single variable
+  // argument, variable return value STATEPOINT node here and then hookup the
+  // return value of each gc.relocate to the respective output of the
+  // previously emitted STATEPOINT value.  Unfortunately, this doesn't appear
+  // to actually be possible today.
+}
+
+void SelectionDAGBuilder::visitGCResult(const CallInst &CI) {
+  // The result value of the gc_result is simply the result of the actual
+  // call.  We've already emitted this, so just grab the value.
+  Instruction *statepoint = cast<Instruction>(CI.getArgOperand(0));
+  assert(isStatepoint(statepoint) &&
+         "first argument must be a statepoint token");
+
+  setValue(&CI, getValue(statepoint));
+}
+
+void SelectionDAGBuilder::visitGCRelocate(const CallInst &CI) {
+#ifndef NDEBUG
+  // Consistency check
+  StatepointLowering.relocCallVisited(CI);
+#endif
+
+  GCRelocateOperands relocateOpers(&CI);
+  SDValue sd = getValue(relocateOpers.derivedPtr());
+
+  if (isa<ConstantSDNode>(sd) || isa<FrameIndexSDNode>(sd)) {
+    // We didn't need to spill these special cases (constants and allocas).
+    // See the handling in spillIncomingStatepointValue for detail.
+    setValue(&CI, sd);
+    return;
+  }
+
+  SDValue loc = StatepointLowering.getRelocLocation(sd);
+  // Emit new load if we did not emit it before
+  if (!loc.getNode()) {
+    SDValue SpillSlot = StatepointLowering.getLocation(sd);
+    int FI = cast<FrameIndexSDNode>(SpillSlot)->getIndex();
+
+    // Be conservative: flush all pending loads
+    // TODO: Probably we can be less restrictive on this,
+    // it may allow more scheduling opportunities
+    SDValue Chain = getRoot();
+
+    loc = DAG.getLoad(SpillSlot.getValueType(), getCurSDLoc(), Chain,
+                      SpillSlot, MachinePointerInfo::getFixedStack(FI), false,
+                      false, false, 0);
+
+    StatepointLowering.setRelocLocation(sd, loc);
+
+    // Again, be conservative, don't emit pending loads
+    DAG.setRoot(loc.getValue(1));
+  }
+
+  assert(loc.getNode());
+  setValue(&CI, loc);
+}
Index: lib/CodeGen/StackMaps.cpp
===================================================================
--- lib/CodeGen/StackMaps.cpp
+++ lib/CodeGen/StackMaps.cpp
@@ -277,6 +277,18 @@
   }
 #endif
 }
+void StackMaps::recordStatepoint(const MachineInstr &MI) {
+  assert(MI.getOpcode() == TargetOpcode::STATEPOINT &&
+         "expected statepoint");
+
+  StatepointOpers opers(&MI);
+  // Record all the deopt and gc operands (they're contiguous and run from
+  // the initial index to the end of the operand list)
+  const unsigned StartIdx = opers.getVarIdx();
+  recordStackMapOpers(MI, 0xABCDEF00,
+                      MI.operands_begin() + StartIdx, MI.operands_end(),
+                      false);
+}
 
 /// Emit the stackmap header.
 ///
Index: lib/CodeGen/TargetLoweringBase.cpp
===================================================================
--- lib/CodeGen/TargetLoweringBase.cpp
+++ lib/CodeGen/TargetLoweringBase.cpp
@@ -977,8 +977,14 @@
   // Add a new memory operand for this FI.
   const MachineFrameInfo &MFI = *MF.getFrameInfo();
   assert(MFI.getObjectOffset(FI) != -1);
+
+  unsigned Flags = MachineMemOperand::MOLoad;
+  if (MI->getOpcode() == TargetOpcode::STATEPOINT) {
+    Flags |= MachineMemOperand::MOStore;
+    Flags |= MachineMemOperand::MOVolatile;
+  }
   MachineMemOperand *MMO = MF.getMachineMemOperand(
-      MachinePointerInfo::getFixedStack(FI), MachineMemOperand::MOLoad,
+      MachinePointerInfo::getFixedStack(FI), Flags,
       TM.getSubtargetImpl()->getDataLayout()->getPointerSize(),
       MFI.getObjectAlignment(FI));
   MIB->addMemOperand(MF, MMO);
Index: lib/IR/CMakeLists.txt
===================================================================
--- lib/IR/CMakeLists.txt
+++ lib/IR/CMakeLists.txt
@@ -36,6 +36,7 @@
   Pass.cpp
   PassManager.cpp
   PassRegistry.cpp
+  Statepoint.cpp
   Type.cpp
   TypeFinder.cpp
   Use.cpp
Index: lib/IR/Function.cpp
===================================================================
--- lib/IR/Function.cpp
+++ lib/IR/Function.cpp
@@ -446,6 +446,37 @@
   return 0;
 }
 
+/// Returns a stable mangling for the type specified for use in the name
+/// mangling scheme used by 'any' types in intrinsic signatures.
+static std::string getMangledTypeStr(Type *Ty) {
+  std::string Result;
+  if (PointerType *PTyp = dyn_cast<PointerType>(Ty)) {
+    Result += "p" + llvm::utostr(PTyp->getAddressSpace()) +
+              getMangledTypeStr(PTyp->getElementType());
+  } else if (ArrayType *ATyp = dyn_cast<ArrayType>(Ty)) {
+    Result += "a" + llvm::utostr(ATyp->getNumElements()) +
+              getMangledTypeStr(ATyp->getElementType());
+  } else if (StructType *STyp = dyn_cast<StructType>(Ty)) {
+    if (!STyp->isLiteral()) {
+      Result += STyp->getName();
+    } else {
+      llvm_unreachable("TODO: implement literal types");
+    }
+  } else if (FunctionType *FT = dyn_cast<FunctionType>(Ty)) {
+    Result += "f_" + getMangledTypeStr(FT->getReturnType());
+    for (size_t i = 0; i < FT->getNumParams(); i++) {
+      Result += getMangledTypeStr(FT->getParamType(i));
+    }
+    if (FT->isVarArg()) {
+      Result += "vararg";
+    }
+    Result += "f"; // ensure distinguishable
+  } else if (Ty) {
+    Result += EVT::getEVT(Ty).getEVTString();
+  }
+  return Result;
+}
+
 std::string Intrinsic::getName(ID id, ArrayRef<Type *> Tys) {
   assert(id < num_intrinsics && "Invalid intrinsic ID!");
   static const char * const Table[] = {
@@ -458,12 +489,7 @@
     return Table[id];
   std::string Result(Table[id]);
   for (unsigned i = 0; i < Tys.size(); ++i) {
-    if (PointerType *PTyp = dyn_cast<PointerType>(Tys[i])) {
-      Result += ".p" + llvm::utostr(PTyp->getAddressSpace()) +
-                EVT::getEVT(PTyp->getElementType()).getEVTString();
-    }
-    else if (Tys[i])
-      Result += "." + EVT::getEVT(Tys[i]).getEVTString();
+    Result += "." + getMangledTypeStr(Tys[i]);
   }
   return Result;
 }
@@ -739,6 +765,12 @@
   while (!TableRef.empty())
     ArgTys.push_back(DecodeFixedType(TableRef, Tys, Context));
 
+  // DecodeFixedType returns Void for both IITDescriptor::Void and
+  // IITDescriptor::VarArg.  If we see a void type as the type of the last
+  // argument, it is a vararg intrinsic.
+  if (!ArgTys.empty() && ArgTys.back()->isVoidTy()) {
+    ArgTys.pop_back();
+    return FunctionType::get(ResultTy, ArgTys, true);
+  }
   return FunctionType::get(ResultTy, ArgTys, false);
 }
Index: lib/IR/Statepoint.cpp
===================================================================
--- /dev/null
+++ lib/IR/Statepoint.cpp
@@ -0,0 +1,52 @@
+
+
+#include "llvm/IR/Function.h"
+#include "llvm/IR/Constant.h"
+#include "llvm/IR/Constants.h"
+#include "llvm/Support/CommandLine.h"
+
+#include "llvm/IR/Statepoint.h"
+
+using namespace std;
+using namespace llvm;
+
+bool llvm::isStatepoint(const ImmutableCallSite &CS) {
+  const Function *F = CS.getCalledFunction();
+  return (F && F->getIntrinsicID() == Intrinsic::experimental_gc_statepoint);
+}
+bool llvm::isStatepoint(const Instruction *inst) {
+  if (isa<CallInst>(inst) || isa<InvokeInst>(inst)) {
+    ImmutableCallSite CS(inst);
+    return isStatepoint(CS);
+  }
+  return false;
+}
+bool llvm::isStatepoint(const Instruction &inst) {
+  return isStatepoint(&inst);
+}
+
+bool llvm::isGCRelocate(const ImmutableCallSite &CS) {
+  return isGCRelocate(CS.getInstruction());
+}
+bool llvm::isGCRelocate(const Instruction *inst) {
+  if (const CallInst *call = dyn_cast<CallInst>(inst)) {
+    if (const Function *F = call->getCalledFunction()) {
+      return F->getIntrinsicID() == Intrinsic::experimental_gc_relocate;
+    }
+  }
+  return false;
+}
+
+bool llvm::isGCResult(const ImmutableCallSite &CS) {
+  return isGCResult(CS.getInstruction());
+}
+bool llvm::isGCResult(const Instruction *inst) {
+  if (const CallInst *call = dyn_cast<CallInst>(inst)) {
+    if (const Function *F = call->getCalledFunction()) {
+      return (F->getIntrinsicID() == Intrinsic::experimental_gc_result_int ||
+              F->getIntrinsicID() == Intrinsic::experimental_gc_result_float ||
+              F->getIntrinsicID() == Intrinsic::experimental_gc_result_ptr);
+    }
+  }
+  return false;
+}
Index: lib/IR/Verifier.cpp
===================================================================
--- lib/IR/Verifier.cpp
+++ lib/IR/Verifier.cpp
@@ -68,6 +68,7 @@
 #include "llvm/IR/Metadata.h"
 #include "llvm/IR/Module.h"
 #include "llvm/IR/PassManager.h"
+#include "llvm/IR/Statepoint.h"
 #include "llvm/Pass.h"
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Debug.h"
@@ -1113,8 +1114,15 @@
   // direct call/invokes, never having its "address taken".
   if (F.getIntrinsicID()) {
     const User *U;
-    if (F.hasAddressTaken(&U))
-      Assert1(0, "Invalid user of intrinsic instruction!", U);
+    if (F.hasAddressTaken(&U)) {
+      bool ok = false;
+      if (const Instruction *I = dyn_cast<Instruction>(U)) {
+        if (isStatepoint(I)) {
+          ok = true;
+        }
+      }
+      Assert1(ok, "Invalid user of intrinsic instruction!", U);
+    }
   }
 
   Assert1(!F.hasDLLImportStorageClass() ||
@@ -2212,11 +2220,14 @@
     if (Function *F = dyn_cast<Function>(I.getOperand(i))) {
       // Check to make sure that the "address of" an intrinsic function is
       // never taken.
-      Assert1(!F->isIntrinsic() || i == (isa<CallInst>(I) ? e-1 : 0),
-              "Cannot take the address of an intrinsic!", &I);
-      Assert1(!F->isIntrinsic() || isa<CallInst>(I) ||
-              F->getIntrinsicID() == Intrinsic::donothing,
-              "Cannot invoke an intrinsinc other than donothing", &I);
+      if (!isStatepoint(&I)) {
+        // TODO: this is a hack at the moment, generalize
+        Assert1(!F->isIntrinsic() || i == (isa<CallInst>(I) ? e-1 : 0),
+                "Cannot take the address of an intrinsic!", &I);
+        Assert1(!F->isIntrinsic() || isa<CallInst>(I) ||
+                F->getIntrinsicID() == Intrinsic::donothing,
+                "Cannot invoke an intrinsic other than donothing", &I);
+      }
       Assert1(F->getParent() == M, "Referencing function in another module!",
               &I);
     } else if (BasicBlock *OpBB = dyn_cast<BasicBlock>(I.getOperand(i))) {
@@ -2534,7 +2545,88 @@
     Assert1(isa<ConstantInt>(CI.getArgOperand(1)),
             "llvm.invariant.end parameter #2 must be a constant integer", &CI);
     break;
+
+  case Intrinsic::experimental_gc_statepoint: {
+    // target, # call args = 0, # deopt args = 0, # gc args = 0 -> 4 args
+    assert(CI.getNumArgOperands() >= 4 &&
+           "not enough arguments to statepoint");
+    for (User *U : CI.users()) {
+      const CallInst *gcRelocCall = cast<CallInst>(U);
+      const Function *gcRelocFn = gcRelocCall->getCalledFunction();
+      Assert1(gcRelocFn && gcRelocFn->isDeclaration() &&
+              (gcRelocFn->getIntrinsicID() ==
+                   Intrinsic::experimental_gc_result_int ||
+               gcRelocFn->getIntrinsicID() ==
+                   Intrinsic::experimental_gc_result_float ||
+               gcRelocFn->getIntrinsicID() ==
+                   Intrinsic::experimental_gc_result_ptr ||
+               gcRelocFn->getIntrinsicID() ==
+                   Intrinsic::experimental_gc_relocate),
+              "gc.result or gc.relocate are the only value uses of statepoint",
+              &CI);
+      if (gcRelocFn->getIntrinsicID() ==
+              Intrinsic::experimental_gc_result_int ||
+          gcRelocFn->getIntrinsicID() ==
+              Intrinsic::experimental_gc_result_float ||
+          gcRelocFn->getIntrinsicID() ==
+              Intrinsic::experimental_gc_result_ptr) {
+        Assert1(gcRelocCall->getNumArgOperands() == 1,
+                "wrong number of arguments", &CI);
+        Assert2(gcRelocCall->getArgOperand(0) == &CI,
+                "connected to wrong statepoint", &CI, gcRelocCall);
+      } else if (gcRelocFn->getIntrinsicID() ==
+                 Intrinsic::experimental_gc_relocate) {
+        Assert1(gcRelocCall->getNumArgOperands() == 3,
+                "wrong number of arguments", &CI);
+        Assert2(gcRelocCall->getArgOperand(0) == &CI,
+                "connected to wrong statepoint", &CI, gcRelocCall);
+
+        // Note: It is legal for a single derived pointer to be listed
+        // multiple times.  It's non-optimal, but it is legal.  It can also
+        // happen after insertion if we strip a bitcast away.
+      } else {
+        llvm_unreachable("unsupported use type - how'd we get past the assert?");
+      }
+    }
+
+    // Note: It is really tempting to check that each base is relocated and
+    // that a derived pointer is never reused as a base pointer.  This turns
+    // out to be problematic since optimizations run after safepoint insertion
+    // can recognize equality properties that the insertion logic doesn't know
+    // about.  See example statepoint.ll in the verifier subdirectory.
+    break;
+  }
+  case Intrinsic::experimental_gc_result_int:
+  case Intrinsic::experimental_gc_result_float:
+  case Intrinsic::experimental_gc_result_ptr: {
+    Assert1(CI.getNumArgOperands() == 1, "wrong number of arguments", &CI);
+
+    // Are we tied to a statepoint properly?
+    CallSite StatepointCS(CI.getArgOperand(0));
+    const Function *spFn = StatepointCS.getCalledFunction();
+    Assert2(spFn && spFn->isDeclaration() &&
+            spFn->getIntrinsicID() == Intrinsic::experimental_gc_statepoint,
+            "token must be from a statepoint", &CI, CI.getArgOperand(0));
+    break;
+  }
+  case Intrinsic::experimental_gc_relocate: {
+    // Some checks to ensure gc.relocate has the correct set of
+    // parameters.  TODO: we can make these tests much stricter.
+    Assert1(CI.getNumArgOperands() == 3, "wrong number of arguments", &CI);
+
+    // Are we tied to a statepoint properly?
+    CallSite StatepointCS(CI.getArgOperand(0));
+    const Function *spFn =
+        StatepointCS.getInstruction() ? StatepointCS.getCalledFunction()
+                                      : NULL;
+    Assert2(spFn && spFn->isDeclaration() &&
+            spFn->getIntrinsicID() == Intrinsic::experimental_gc_statepoint,
+            "token must be from a statepoint", &CI, CI.getArgOperand(0));
+
+    // Both the base and derived must be piped through the safepoint
+    Value *base = CI.getArgOperand(1);
+    Assert1(isa<ConstantInt>(base), "must be integer offset", &CI);
+
+    Value *derived = CI.getArgOperand(2);
+    Assert1(isa<ConstantInt>(derived), "must be integer offset", &CI);
+
+    GCRelocateOperands ops(&CI);
+    // Check the bounds
+    Assert1(0 <= ops.basePtrIndex() &&
+            ops.basePtrIndex() < (int)StatepointCS.arg_size(),
+            "index out of bounds", &CI);
+    Assert1(0 <= ops.derivedPtrIndex() &&
+            ops.derivedPtrIndex() < (int)StatepointCS.arg_size(),
+            "index out of bounds", &CI);
+    break;
+  }
   };
 }
 
 void DebugInfoVerifier::verifyDebugInfo() {
Index: lib/Target/X86/X86FrameLowering.h
===================================================================
--- lib/Target/X86/X86FrameLowering.h
+++ lib/Target/X86/X86FrameLowering.h
@@ -69,6 +69,10 @@
   int getFrameIndexReference(const MachineFunction &MF, int FI,
                              unsigned &FrameReg) const override;
 
+  int getFrameIndexOffsetForGC(const MachineFunction &MF, int FI) const;
+  int getFrameIndexReferenceForGC(const MachineFunction &MF, int FI,
+                                  unsigned &FrameReg) const override;
+
   void eliminateCallFramePseudoInstr(MachineFunction &MF,
                                      MachineBasicBlock &MBB,
                                      MachineBasicBlock::iterator MI) const override;
Index: lib/Target/X86/X86FrameLowering.cpp
===================================================================
--- lib/Target/X86/X86FrameLowering.cpp
+++ lib/Target/X86/X86FrameLowering.cpp
@@ -1123,6 +1123,83 @@
   return getFrameIndexOffset(MF, FI);
 }
 
+// Simplified from getFrameIndexOffset keeping only StackPointer cases
+int X86FrameLowering::getFrameIndexOffsetForGC(const MachineFunction &MF,
+                                               int FI) const {
+  const X86RegisterInfo *RegInfo =
+      static_cast<const X86RegisterInfo *>(MF.getSubtarget().getRegisterInfo());
+  const MachineFrameInfo *MFI = MF.getFrameInfo();
+  const uint64_t StackSize =
+      MFI->getStackSize(); // not including dynamic realign
+
+  {
+#ifndef NDEBUG
+    // Note: LLVM arranges the stack as:
+    //   Args > Saved RetPC (<--FP) > CSRs > dynamic alignment (<--BP)
+    //     > "Stack Slots" (<--SP)
+    // We can always address StackSlots from RSP.  We can usually (unless
+    // needsStackRealignment) address CSRs from RSP, but sometimes need to
+    // address them from RBP.  FixedObjects can be placed anywhere in the
+    // stack frame depending on their specific requirements (i.e. we can
+    // actually refer to arguments to the function which are stored in the
+    // *callers* frame).  As a result, THE RESULT OF THIS CALL IS
+    // MEANINGLESS FOR CSRs AND FixedObjects IFF needsStackRealignment or
+    // hasVarSizedObject.
+
+    assert(!RegInfo->hasBasePointer(MF) && "we don't handle this case");
+
+    // We don't handle tail calls, and shouldn't be seeing them
+    // either.
+    int TailCallReturnAddrDelta =
+        MF.getInfo<X86MachineFunctionInfo>()->getTCReturnAddrDelta();
+    assert(!(TailCallReturnAddrDelta < 0) && "we don't handle this case!");
+#endif
+  }
+
+  // This is how the math works out:
+  //
+  //  %rsp grows (i.e. gets lower) left to right.  Each box below is
+  //  one word (eight bytes).  Obj0 is the stack slot we're trying to
+  //  get to.
+  //
+  //    ----------------------------------
+  //    | BP | Obj0 | Obj1 | ... | ObjN |
+  //    ----------------------------------
+  //    ^    ^      ^                   ^
+  //    A    B      C                   E
+  //
+  // A is the incoming stack pointer.
+  // (B - A) is the local area offset (-8 for x86-64) [1]
+  // (C - A) is the Offset returned by MFI->getObjectOffset for Obj0 [2]
+  //
+  // |(E - B)| is the StackSize (absolute value, positive).  For a
+  // stack that grows down, this works out to be (B - E). [3]
+  //
+  // E is also the value of %rsp after the stack has been set up, and we
+  // want (C - E) -- the value we can add to %rsp to get to Obj0.  Now
+  //    (C - E) == (C - A) - (B - A) + (B - E)
+  //    { Using [1], [2] and [3] above }
+  //            == getObjectOffset - LocalAreaOffset + StackSize
+  //
+
+  // Get the Offset from the StackPointer
+  int Offset = MFI->getObjectOffset(FI) - getOffsetOfLocalArea();
+
+  // Note: Adding the FI < 0 check here you'd expect from the non-GC version
+  // of this function doesn't get through run-tests.sh --run-slow-tests.
+  // It's not clear why not.
+  // assert((-(Offset + StackSize)) % MFI->getObjectAlignment(FI) == 0);
+  return Offset + StackSize;
+}
+
+// Simplified from getFrameIndexReference keeping only StackPointer cases
+int X86FrameLowering::getFrameIndexReferenceForGC(const MachineFunction &MF,
+                                                  int FI,
+                                                  unsigned &FrameReg) const {
+  const X86RegisterInfo *RegInfo =
+      static_cast<const X86RegisterInfo *>(MF.getSubtarget().getRegisterInfo());
+
+  assert(!RegInfo->hasBasePointer(MF) && "we don't handle this case");
+
+  FrameReg = RegInfo->getStackRegister();
+  return getFrameIndexOffsetForGC(MF, FI);
+}
+
 bool X86FrameLowering::assignCalleeSavedSpillSlots(
     MachineFunction &MF, const TargetRegisterInfo *TRI,
     std::vector<CalleeSavedInfo> &CSI) const {
Index: lib/Target/X86/X86ISelLowering.cpp
===================================================================
--- lib/Target/X86/X86ISelLowering.cpp
+++ lib/Target/X86/X86ISelLowering.cpp
@@ -20616,6 +20616,11 @@
   case X86::EH_SjLj_LongJmp64:
     return emitEHSjLjLongJmp(MI, BB);
 
+  case TargetOpcode::STATEPOINT:
+    // As an implementation detail, STATEPOINT shares the STACKMAP format at
+    // this point in the process.  We diverge later.
+    return emitPatchPoint(MI, BB);
+
   case TargetOpcode::STACKMAP:
   case TargetOpcode::PATCHPOINT:
     return emitPatchPoint(MI, BB);
Index: lib/Target/X86/X86MCInstLower.cpp
===================================================================
--- lib/Target/X86/X86MCInstLower.cpp
+++ lib/Target/X86/X86MCInstLower.cpp
@@ -19,9 +19,11 @@
 #include "Utils/X86ShuffleDecode.h"
 #include "llvm/ADT/SmallString.h"
 #include "llvm/CodeGen/MachineFunction.h"
+#include "llvm/CodeGen/MachineFrameInfo.h"
 #include "llvm/CodeGen/MachineConstantPool.h"
 #include "llvm/CodeGen/MachineOperand.h"
 #include "llvm/CodeGen/MachineModuleInfoImpls.h"
+#include "llvm/CodeGen/MachineRegisterInfo.h"
 #include "llvm/CodeGen/StackMaps.h"
 #include "llvm/IR/DataLayout.h"
 #include "llvm/IR/GlobalValue.h"
@@ -34,6 +36,10 @@
 #include "llvm/MC/MCInstBuilder.h"
 #include "llvm/MC/MCStreamer.h"
 #include "llvm/MC/MCSymbol.h"
+#include "llvm/Support/CommandLine.h"
+#include "llvm/Support/FormattedStream.h"
+#include "llvm/Support/raw_ostream.h"
+#include "llvm/Target/TargetFrameLowering.h"
 #include "llvm/Support/TargetRegistry.h"
 using namespace llvm;
@@ -807,6 +813,66 @@
   } // while (NumBytes)
 }
 
+static void LowerSTATEPOINT(MCStreamer &OS, StackMaps &SM,
+                            const MachineInstr &MI, bool Is64Bit,
+                            const TargetMachine &TM,
+                            const MCSubtargetInfo &STI,
+                            X86MCInstLower &MCInstLowering) {
+  assert(Is64Bit && "Statepoint currently only supports X86-64");
+
+  // We need to record the frame size for stack walking
+  const MachineFunction *MF = MI.getParent()->getParent();
+  assert(MF && "can't find machine function?");
+
+  //
+  // Emit call instruction
+  //
+
+  // Lower call target and choose correct opcode
+  const MachineOperand &call_target = StatepointOpers(&MI).getCallTarget();
+  MCOperand call_target_mcop;
+  unsigned call_opcode;
+  switch (call_target.getType()) {
+  case MachineOperand::MO_GlobalAddress:
+  case MachineOperand::MO_ExternalSymbol:
+    call_target_mcop = MCInstLowering.LowerSymbolOperand(
+        call_target,
+        MCInstLowering.GetSymbolFromOperand(call_target));
+    call_opcode = X86::CALL64pcrel32;
+    // Currently, we only support relative addressing with statepoints.
+    // Otherwise, we'll need a scratch register to hold the target
+    // address.  You'll fail asserts during load & relocation if this
+    // symbol is too far away.  (TODO: support non-relative addressing)
+    break;
+  case MachineOperand::MO_Immediate:
+    call_target_mcop = MCOperand::CreateImm(call_target.getImm());
+    call_opcode = X86::CALL64pcrel32;
+    // Currently, we only support relative addressing with statepoints.
+    // Otherwise, we'll need a scratch register to hold the target
+    // immediate.  You'll fail asserts during load & relocation if this
+    // address is too far away.  (TODO: support non-relative addressing)
+    break;
+  case MachineOperand::MO_Register:
+    call_target_mcop = MCOperand::CreateReg(call_target.getReg());
+    call_opcode = X86::CALL64r;
+    break;
+  default:
+    llvm_unreachable("Unsupported operand type in statepoint call target");
+    break;
+  }
+
+  // Emit call
+  MCInst call_inst;
+  call_inst.setOpcode(call_opcode);
+  call_inst.addOperand(call_target_mcop);
+  OS.EmitInstruction(call_inst, STI);
+
+  // Record our statepoint node in the same section used by STACKMAP
+  // and PATCHPOINT
+  SM.recordStatepoint(MI);
+}
+
 // Lower a stackmap of the form:
 // <id>, <shadowBytes>, ...
 void X86AsmPrinter::LowerSTACKMAP(const MachineInstr &MI) {
@@ -1029,7 +1095,9 @@
         .addExpr(DotExpr));
     return;
   }
-
+  case TargetOpcode::STATEPOINT:
+    return LowerSTATEPOINT(OutStreamer, SM, *MI, Subtarget->is64Bit(), TM,
+                           getSubtargetInfo(), MCInstLowering);
   case TargetOpcode::STACKMAP:
     return LowerSTACKMAP(*MI);
Index: lib/Transforms/InstCombine/InstCombineCalls.cpp
===================================================================
--- lib/Transforms/InstCombine/InstCombineCalls.cpp
+++ lib/Transforms/InstCombine/InstCombineCalls.cpp
@@ -1047,6 +1047,15 @@
   if (!CI->isLosslessCast())
     return false;
 
+  // If this is an intrinsic, avoid munging types.  We need types for
+  // statepoint reconstruction in SelectionDAG.  This is probably something
+  // which could be upstreamed since the entire point of intrinsics is that
+  // they are understandable by the optimizer.  :)
+  if (CS.getCalledFunction() &&
+      0 != CS.getCalledFunction()->getIntrinsicID()) {
+    return false;
+  }
+
   // The size of ByVal or InAlloca arguments is derived from the type, so we
   // can't change to a type with a different size.  If the size were
   // passed explicitly we could avoid this check.
Index: test/CodeGen/X86/statepoint-call-lowering.ll
===================================================================
--- /dev/null
+++ test/CodeGen/X86/statepoint-call-lowering.ll
@@ -0,0 +1,71 @@
+; RUN: llc < %s | FileCheck %s
+; This file contains a collection of basic tests to ensure we didn't
+; screw up normal call lowering when there are no deopt or gc arguments.
+
+declare zeroext i1 @return_i1()
+declare zeroext i32 @return_i32()
+declare i32* @return_i32ptr()
+declare float @return_float()
+
+define i1 @test_i1_return() {
+; CHECK-LABEL: test_i1_return
+; This is just checking that an i1 gets lowered normally when there are no
+; extra state arguments to the statepoint
+; CHECK: pushq %rax
+; CHECK: callq return_i1
+; CHECK: popq %rdx
+; CHECK: retq
+entry:
+  %safepoint_token = tail call i32 (i1 ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_i1f(i1 ()* @return_i1, i32 0, i32 0, i32 0)
+  %call1 = call zeroext i1 @llvm.experimental.gc.result.int.i1(i32 %safepoint_token)
+  ret i1 %call1
+}
+
+define i32 @test_i32_return() {
+; CHECK-LABEL: test_i32_return
+; CHECK: pushq %rax
+; CHECK: callq return_i32
+; CHECK: popq %rdx
+; CHECK: retq
+entry:
+  %safepoint_token = tail call i32 (i32 ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_i32f(i32 ()* @return_i32, i32 0, i32 0, i32 0)
+  %call1 = call zeroext i32 @llvm.experimental.gc.result.int.i32(i32 %safepoint_token)
+  ret i32 %call1
+}
+
+define i32* @test_i32ptr_return() {
+; CHECK-LABEL: test_i32ptr_return
+; CHECK: pushq %rax
+; CHECK: callq return_i32ptr
+; CHECK: popq %rdx
+; CHECK: retq
+entry:
+  %safepoint_token = tail call i32 (i32* ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_p0i32f(i32* ()* @return_i32ptr, i32 0, i32 0, i32 0)
+  %call1 = call i32* @llvm.experimental.gc.result.ptr.p0i32(i32 %safepoint_token)
+  ret i32* %call1
+}
+
+define float @test_float_return() {
+; CHECK-LABEL: test_float_return
+; CHECK: pushq %rax
+; CHECK: callq return_float
+; CHECK: popq %rax
+; CHECK: retq
+entry:
+  %safepoint_token = tail call i32 (float ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_f32f(float ()* @return_float, i32 0, i32 0, i32 0)
+  %call1 = call float @llvm.experimental.gc.result.float.f32(i32 %safepoint_token)
+  ret float %call1
+}
+
+declare i32 @llvm.experimental.gc.statepoint.p0f_i1f(i1 ()*, i32, i32, ...)
+declare i1 @llvm.experimental.gc.result.int.i1(i32)
+
+declare i32 @llvm.experimental.gc.statepoint.p0f_i32f(i32 ()*, i32, i32, ...)
+declare i32 @llvm.experimental.gc.result.int.i32(i32)
+
+declare i32 @llvm.experimental.gc.statepoint.p0f_p0i32f(i32* ()*, i32, i32, ...)
+declare i32* @llvm.experimental.gc.result.ptr.p0i32(i32)
+
+declare i32 @llvm.experimental.gc.statepoint.p0f_f32f(float ()*, i32, i32, ...)
+declare float @llvm.experimental.gc.result.float.f32(i32)
Index: test/CodeGen/X86/statepoint-stack-usage.ll
===================================================================
--- /dev/null
+++ test/CodeGen/X86/statepoint-stack-usage.ll
@@ -0,0 +1,60 @@
+; RUN: llc < %s | FileCheck %s
+
+target datalayout = "e-i64:64-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-pc-linux-gnu"
+
+; This test is checking to make sure that we reuse the same stack slots
+; for GC values spilled over two different call sites.  Since the order
+; of GC arguments differs, naive lowering code would insert loads and
+; stores to rearrange items on the stack.  We need to make sure (for
+; performance) that this doesn't happen.
+define i32 @back_to_back_calls(i32* %a, i32* %b, i32* %c) #1 {
+; CHECK-LABEL: back_to_back_calls
+; The exact stores don't matter, but there need to be three stack slots created
+; CHECK: movq %rdx, 16(%rsp)
+; CHECK: movq %rdi, 8(%rsp)
+; CHECK: movq %rsi, (%rsp)
+  %safepoint_token = tail call i32 (void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(void ()* undef, i32 0, i32 0, i32 5, i32 0, i32 -1, i32 0, i32 0, i32 0, i32* %a, i32* %b, i32* %c)
+  %a1 = tail call coldcc i32* @llvm.experimental.gc.relocate.p0i32(i32 %safepoint_token, i32 9, i32 9)
+  %b1 = tail call coldcc i32* @llvm.experimental.gc.relocate.p0i32(i32 %safepoint_token, i32 9, i32 10)
+  %c1 = tail call coldcc i32* @llvm.experimental.gc.relocate.p0i32(i32 %safepoint_token, i32 9, i32 11)
+; CHECK: callq
+; This is the key check.  There should NOT be any memory moves here
+; CHECK-NOT: movq
+  %safepoint_token2 = tail call i32 (void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(void ()* undef, i32 0, i32 0, i32 5, i32 0, i32 -1, i32 0, i32 0, i32 0, i32* %c1, i32* %b1, i32* %a1)
+  %a2 = tail call coldcc i32* @llvm.experimental.gc.relocate.p0i32(i32 %safepoint_token2, i32 9, i32 11)
+  %b2 = tail call coldcc i32* @llvm.experimental.gc.relocate.p0i32(i32 %safepoint_token2, i32 9, i32 10)
+  %c2 = tail call coldcc i32* @llvm.experimental.gc.relocate.p0i32(i32 %safepoint_token2, i32 9, i32 9)
+; CHECK: callq
+  ret i32 1
+}
+
+; This test simply checks that minor changes in vm state don't prevent slots
+; being reused for gc values.
+define i32 @reserve_first(i32* %a, i32* %b, i32* %c) #1 {
+; CHECK-LABEL: reserve_first
+; The exact stores don't matter, but there need to be three stack slots created
+; CHECK: movq %rdx, 16(%rsp)
+; CHECK: movq %rdi, 8(%rsp)
+; CHECK: movq %rsi, (%rsp)
+  %safepoint_token = tail call i32 (void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(void ()* undef, i32 0, i32 0, i32 5, i32 0, i32 -1, i32 0, i32 0, i32 0, i32* %a, i32* %b, i32* %c)
+  %a1 = tail call coldcc i32* @llvm.experimental.gc.relocate.p0i32(i32 %safepoint_token, i32 9, i32 9)
+  %b1 = tail call coldcc i32* @llvm.experimental.gc.relocate.p0i32(i32 %safepoint_token, i32 9, i32 10)
+  %c1 = tail call coldcc i32* @llvm.experimental.gc.relocate.p0i32(i32 %safepoint_token, i32 9, i32 11)
+; CHECK: callq
+; This is the key check.  There should NOT be any memory moves here
+; CHECK-NOT: movq
+  %safepoint_token2 = tail call i32 (void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(void ()* undef, i32 0, i32 0, i32 5, i32* %a1, i32 0, i32* %c1, i32 0, i32 0, i32* %c1, i32* %b1, i32* %a1)
+  %a2 = tail call coldcc i32* @llvm.experimental.gc.relocate.p0i32(i32 %safepoint_token2, i32 9, i32 11)
+  %b2 = tail call coldcc i32* @llvm.experimental.gc.relocate.p0i32(i32 %safepoint_token2, i32 9, i32 10)
+  %c2 = tail call coldcc i32* @llvm.experimental.gc.relocate.p0i32(i32 %safepoint_token2, i32 9, i32 9)
+; CHECK: callq
+  ret i32 1
+}
+
+; Function Attrs: nounwind
+declare i32* @llvm.experimental.gc.relocate.p0i32(i32, i32, i32) #3
+
+declare i32 @llvm.experimental.gc.statepoint.p0f_isVoidf(void ()*, i32, i32, ...)
+
+attributes #1 = { uwtable }
+attributes #3 = { nounwind }
Index: test/CodeGen/X86/statepoint-stackmap-format.ll
===================================================================
--- /dev/null
+++ test/CodeGen/X86/statepoint-stackmap-format.ll
@@ -0,0 +1,108 @@
+; RUN: llc < %s | FileCheck %s
+; This test is a sanity check to ensure statepoints are generating StackMap
+; sections correctly.  This is not intended to be a rigorous test of the
+; StackMap format (see the stackmap tests for that).
+
+declare zeroext i1 @return_i1()
+
+define i1 @test(i32 addrspace(1)* %ptr) {
+; CHECK-LABEL: test
+; Do we see one spill for the local value and the store to the
+; alloca?
+; CHECK: subq $24, %rsp
+; CHECK: movq $0, 8(%rsp)
+; CHECK: movq %rdi, (%rsp)
+; CHECK: callq return_i1
+; CHECK: addq $24, %rsp
+; CHECK: retq
+entry:
+  %metadata1 = alloca i32 addrspace(1)*, i32 2, align 8
+  store i32 addrspace(1)* null, i32 addrspace(1)** %metadata1
+; NOTE: Currently NOT testing alloca lowering in the StackMap format.  It's
+; known to be broken.
+  %safepoint_token = tail call i32 (i1 ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_i1f(i1 ()* @return_i1, i32 0, i32 0, i32 2, i32 addrspace(1)* %ptr, i32 addrspace(1)* null)
+  %call1 = call zeroext i1 @llvm.experimental.gc.result.int.i1(i32 %safepoint_token)
+  %a = call i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(i32 %safepoint_token, i32 4, i32 4)
+  %b = call i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(i32 %safepoint_token, i32 5, i32 5)
+;
+  ret i1 %call1
+}
+
+declare i32 @llvm.experimental.gc.statepoint.p0f_i1f(i1 ()*, i32, i32, ...)
+declare i1 @llvm.experimental.gc.result.int.i1(i32)
+declare i32 addrspace(1)* @llvm.experimental.gc.relocate.p1i32(i32, i32, i32)
+
+
+; CHECK-LABEL: .section .llvm_stackmaps
+; CHECK-NEXT: __LLVM_StackMaps:
+; Header
+; CHECK-NEXT: .byte 1
+; CHECK-NEXT: .byte 0
+; CHECK-NEXT: .short 0
+; Num Functions
+; CHECK-NEXT: .long 1
+; Num LargeConstants
+; CHECK-NEXT: .long 0
+; Num Callsites
+; CHECK-NEXT: .long 1
+
+; Functions and stack size
+; CHECK-NEXT: .quad test
+; CHECK-NEXT: .quad 24
+
+; Large Constants
+; Statepoint ID only
+; CHECK: .quad 2882400000
+
+; Callsites
+; Constant arguments
+; CHECK: .long .Ltmp1-test
+; CHECK: .short 0
+; CHECK: .short 8
+; SmallConstant (0)
+; CHECK: .byte 4
+; CHECK: .byte 8
+; CHECK: .short 0
+; CHECK: .long 0
+; SmallConstant (2)
+; CHECK: .byte 4
+; CHECK: .byte 8
+; CHECK: .short 0
+; CHECK: .long 2
+; Direct Spill Slot [RSP+0]
+; CHECK: .byte 2
+; CHECK: .byte 8
+; CHECK: .short 7
+; CHECK: .long 0
+; SmallConstant (0)
+; CHECK: .byte 4
+; CHECK: .byte 8
+; CHECK: .short 0
+; CHECK: .long 0
+; SmallConstant (0)
+; CHECK: .byte 4
+; CHECK: .byte 8
+; CHECK: .short 0
+; CHECK: .long 0
+; SmallConstant (0)
+; CHECK: .byte 4
+; CHECK: .byte 8
+; CHECK: .short 0
+; CHECK: .long 0
+; Direct Spill Slot [RSP+0]
+; CHECK: .byte 2
+; CHECK: .byte 8
+; CHECK: .short 7
+; CHECK: .long 0
+; Direct Spill Slot [RSP+0]
+; CHECK: .byte 2
+; CHECK: .byte 8
+; CHECK: .short 7
+; CHECK: .long 0
+
+; No Padding or LiveOuts
+; CHECK: .short 0
+; CHECK: .short 0
+; CHECK: .align 8
+
+
Index: test/Verifier/statepoint.ll
===================================================================
--- /dev/null
+++ test/Verifier/statepoint.ll
@@ -0,0 +1,29 @@
+; RUN: opt -S %s -O3 | FileCheck %s
+; This test catches two cases where the verifier was too strict:
+; 1) A base doesn't need to be relocated if it's never used again
+; 2) A value can be replaced by one which is known equal.  This
+;    means a potentially derived pointer can be a known base, and that
+;    we can't check that derived pointers are never bases.
+
+declare void @use(...)
+declare i64 addrspace(1)* @llvm.experimental.gc.relocate.p1i64(i32, i32, i32)
+declare i32 @llvm.experimental.gc.statepoint.p0f_isVoidf(void ()*, i32, i32, ...)
+
+define void @example(i8 addrspace(1)* %arg, i64 addrspace(1)* %arg2) {
+entry:
+  %cast = bitcast i8 addrspace(1)* %arg to i64 addrspace(1)*
+  %c = icmp eq i64 addrspace(1)* %cast, %arg2
+  br i1 %c, label %equal, label %notequal
+
+notequal:
+  ret void
+
+equal:
+  %safepoint_token = call i32 (void ()*, i32, i32, ...)* @llvm.experimental.gc.statepoint.p0f_isVoidf(void ()* undef, i32 0, i32 0, i32 5, i32 0, i32 0, i32 0, i32 10, i32 0, i8 addrspace(1)* %arg, i64 addrspace(1)* %cast, i8 addrspace(1)* %arg, i8 addrspace(1)* %arg)
+; CHECK-LABEL: equal
+; CHECK: statepoint
+; CHECK-NOT: cast
+  %reloc = call coldcc i64 addrspace(1)* @llvm.experimental.gc.relocate.p1i64(i32 %safepoint_token, i32 9, i32 10)
+  call void (...)* @use(i64 addrspace(1)* %reloc)
+  ret void
+}
Index: utils/TableGen/CodeGenTarget.cpp
===================================================================
--- utils/TableGen/CodeGenTarget.cpp
+++ utils/TableGen/CodeGenTarget.cpp
@@ -302,6 +302,7 @@
       "IMPLICIT_DEF", "SUBREG_TO_REG", "COPY_TO_REGCLASS", "DBG_VALUE",
       "REG_SEQUENCE", "COPY", "BUNDLE", "LIFETIME_START", "LIFETIME_END",
       "STACKMAP", "PATCHPOINT", "LOAD_STACK_GUARD",
+      "STATEPOINT",
       nullptr};
   const DenseMap<const Record *, CodeGenInstruction *> &Insts =
       getInstructions();
   for (const char *const *p = FixedInstrs; *p; ++p) {