Index: docs/LangRef.rst
===================================================================
--- docs/LangRef.rst
+++ docs/LangRef.rst
@@ -2160,10 +2160,24 @@
 
 .. _singlethread:
 
-If an atomic operation is marked ``singlethread``, it only *synchronizes
-with* or participates in modification and seq\_cst total orderings with
-other operations running in the same thread (for example, in signal
-handlers).
+If an atomic operation is marked ``singlethread``, it only *synchronizes with*,
+and only participates in the seq\_cst total orderings of, other operations
+running in the same thread (for example, in signal handlers).
+
+.. _synchscope:
+
+If an atomic operation is marked ``synchscope(<n>)``, where ``<n>`` specifies a
+target-specific synchronization scope ``A``, it only *synchronizes with*, and
+only participates in the seq\_cst total orderings of, other operations that
+specify a synchronization scope ``B`` and run in threads that are members of the
+same dynamic instance of a synchronization scope of which both ``A`` and ``B``
+are members (for example, in languages such as OpenCL that support separate
+memory scopes for device, work-group and sub-group).
+
+Otherwise, an atomic operation that is not marked ``singlethread`` or
+``synchscope(<n>)`` *synchronizes with*, and participates in the global seq\_cst
+total ordering of, other operations that do not specify ``singlethread`` or
+``synchscope(<n>)`` running in any thread.
 
 .. _fastmath:
 
@@ -7014,7 +7028,7 @@
 ::
 
       <result> = load [volatile] <ty>, <ty>* <pointer>[, align <alignment>][, !nontemporal !<index>][, !invariant.load !<index>][, !invariant.group !<index>][, !nonnull !<index>][, !dereferenceable !<deref_bytes_node>][, !dereferenceable_or_null !<deref_bytes_node>][, !align !<align_node>]
-      <result> = load atomic [volatile] <ty>, <ty>* <pointer> [singlethread] <ordering>, align <alignment> [, !invariant.group !<index>]
+      <result> = load atomic [volatile] <ty>, <ty>* <pointer> [singlethread|synchscope(<n>)] <ordering>, align <alignment> [, !invariant.group !<index>]
       !<index> = !{ i32 1 }
       !<deref_bytes_node> = !{i64 <dereferenceable_bytes>}
       !<align_node> = !{ i64 <value_alignment> }
@@ -7035,14 +7049,14 @@
 :ref:`volatile operations <volatile>`.
 
 If the ``load`` is marked as ``atomic``, it takes an extra :ref:`ordering
-<ordering>` and optional ``singlethread`` argument. The ``release`` and
-``acq_rel`` orderings are not valid on ``load`` instructions. Atomic loads
-produce :ref:`defined <memmodel>` results when they may see multiple atomic
-stores. The type of the pointee must be an integer, pointer, or floating-point
-type whose bit width is a power of two greater than or equal to eight and less
-than or equal to a target-specific size limit.  ``align`` must be explicitly
-specified on atomic loads, and the load has undefined behavior if the alignment
-is not set to a value which is at least the size in bytes of the
+<ordering>` and optional ``singlethread`` or ``synchscope(<n>)`` argument. The
+``release`` and ``acq_rel`` orderings are not valid on ``load`` instructions.
+Atomic loads produce :ref:`defined <memmodel>` results when they may see
+multiple atomic stores. The type of the pointee must be an integer, pointer, or
+floating-point type whose bit width is a power of two greater than or equal to
+eight and less than or equal to a target-specific size limit.  ``align`` must be
+explicitly specified on atomic loads, and the load has undefined behavior if the
+alignment is not set to a value which is at least the size in bytes of the
 pointee. ``!nontemporal`` does not have any defined semantics for atomic loads.
 
 The optional constant ``align`` argument specifies the alignment of the
@@ -7145,7 +7159,7 @@
 ::
 
       store [volatile] <ty> <value>, <ty>* <pointer>[, align <alignment>][, !nontemporal !<index>][, !invariant.group !<index>]        ; yields void
-      store atomic [volatile] <ty> <value>, <ty>* <pointer> [singlethread] <ordering>, align <alignment> [, !invariant.group !<index>] ; yields void
+      store atomic [volatile] <ty> <value>, <ty>* <pointer> [singlethread|synchscope(<n>)] <ordering>, align <alignment> [, !invariant.group !<index>] ; yields void
 
 Overview:
 """""""""
@@ -7165,14 +7179,14 @@
 structural type <t_opaque>`) can be stored.
 
 If the ``store`` is marked as ``atomic``, it takes an extra :ref:`ordering
-<ordering>` and optional ``singlethread`` argument. The ``acquire`` and
-``acq_rel`` orderings aren't valid on ``store`` instructions. Atomic loads
-produce :ref:`defined <memmodel>` results when they may see multiple atomic
-stores. The type of the pointee must be an integer, pointer, or floating-point
-type whose bit width is a power of two greater than or equal to eight and less
-than or equal to a target-specific size limit.  ``align`` must be explicitly
-specified on atomic stores, and the store has undefined behavior if the
-alignment is not set to a value which is at least the size in bytes of the
+<ordering>` and optional ``singlethread`` or ``synchscope(<n>)`` argument. The
+``acquire`` and ``acq_rel`` orderings aren't valid on ``store`` instructions.
+Atomic loads produce :ref:`defined <memmodel>` results when they may see
+multiple atomic stores. The type of the pointee must be an integer, pointer, or
+floating-point type whose bit width is a power of two greater than or equal to
+eight and less than or equal to a target-specific size limit.  ``align`` must be
+explicitly specified on atomic stores, and the store has undefined behavior if
+the alignment is not set to a value which is at least the size in bytes of the
 pointee. ``!nontemporal`` does not have any defined semantics for atomic stores.
 
 The optional constant ``align`` argument specifies the alignment of the
@@ -7233,7 +7247,7 @@
 
 ::
 
-      fence [singlethread] <ordering>                   ; yields void
+      fence [singlethread|synchscope(<n>)] <ordering>            ; yields void
 
 Overview:
 """""""""
@@ -7271,6 +7285,16 @@
 that the fence only synchronizes with other fences in the same thread.
 (This is useful for interacting with signal handlers.)
 
+The optional ``synchscope(<n>)`` argument, where ``<n>`` specifies a
+target-specific synchronization scope ``A``, specifies that the fence only
+synchronizes with other fences that specify a synchronization scope ``B`` and
+run in threads that are members of the same dynamic instance of a
+synchronization scope of which both ``A`` and ``B`` are members.
+
+Otherwise, a fence that specifies neither the ``singlethread`` nor the
+``synchscope(<n>)`` argument synchronizes with other fences in any thread that
+likewise specify neither argument.
+
 Example:
 """"""""
 
@@ -7278,6 +7302,7 @@
 
       fence acquire                          ; yields void
       fence singlethread seq_cst             ; yields void
+      fence synchscope(2) seq_cst            ; yields void
 
 .. _i_cmpxchg:
 
@@ -7289,7 +7314,7 @@
 
 ::
 
-      cmpxchg [weak] [volatile] <ty>* <pointer>, <ty> <cmp>, <ty> <new> [singlethread] <success ordering> <failure ordering> ; yields  { ty, i1 }
+      cmpxchg [weak] [volatile] <ty>* <pointer>, <ty> <cmp>, <ty> <new> [singlethread|synchscope(<n>)] <success ordering> <failure ordering> ; yields  { ty, i1 }
 
 Overview:
 """""""""
@@ -7320,8 +7345,17 @@
 
 The optional "``singlethread``" argument declares that the ``cmpxchg``
 is only atomic with respect to code (usually signal handlers) running in
-the same thread as the ``cmpxchg``. Otherwise the cmpxchg is atomic with
-respect to all other code in the system.
+the same thread as the ``cmpxchg``.
+
+The optional ``synchscope(<n>)`` argument, where ``<n>`` specifies a
+target-specific synchronization scope ``A``, declares that the ``cmpxchg`` is
+only atomic with respect to code that specifies a synchronization scope ``B``
+and runs in threads that are members of the same dynamic instance of a
+synchronization scope of which both ``A`` and ``B`` are members.
+
+Otherwise, a ``cmpxchg`` that specifies neither the ``singlethread`` nor the
+``synchscope(<n>)`` argument is only atomic with respect to code in any thread
+that likewise specifies neither argument.
 
 The pointer passed into cmpxchg must have alignment greater than or
 equal to the size in memory of the operand.
@@ -7375,7 +7409,7 @@
 
 ::
 
-      atomicrmw [volatile] <operation> <ty>* <pointer>, <ty> <value> [singlethread] <ordering>                   ; yields ty
+      atomicrmw [volatile] <operation> <ty>* <pointer>, <ty> <value> [singlethread|synchscope(<n>)] <ordering>                   ; yields ty
 
 Overview:
 """""""""
@@ -7409,6 +7443,20 @@
 order of execution of this ``atomicrmw`` with other :ref:`volatile
 operations <volatile>`.
 
+The optional "``singlethread``" argument declares that the ``atomicrmw``
+is only atomic with respect to code (usually signal handlers) running in
+the same thread as the ``atomicrmw``.
+
+The optional ``synchscope(<n>)`` argument, where ``<n>`` specifies a
+target-specific synchronization scope ``A``, declares that the ``atomicrmw`` is
+only atomic with respect to code that specifies a synchronization scope ``B``
+and runs in threads that are members of the same dynamic instance of a
+synchronization scope of which both ``A`` and ``B`` are members.
+
+Otherwise, an ``atomicrmw`` that specifies neither the ``singlethread`` nor the
+``synchscope(<n>)`` argument is only atomic with respect to code in any thread
+that likewise specifies neither argument.
+
 Semantics:
 """"""""""
 
Index: include/llvm/Bitcode/LLVMBitCodes.h
===================================================================
--- include/llvm/Bitcode/LLVMBitCodes.h
+++ include/llvm/Bitcode/LLVMBitCodes.h
@@ -370,9 +370,15 @@
 };
 
 /// Encoded SynchronizationScope values.
-enum AtomicSynchScopeCodes {
+enum AtomicSynchScopeCodes : unsigned {
+  /// Encoded value for SingleThread synchronization scope.
   SYNCHSCOPE_SINGLETHREAD = 0,
-  SYNCHSCOPE_CROSSTHREAD = 1
+
+  /// Encoded value for CrossThread synchronization scope.
+  SYNCHSCOPE_CROSSTHREAD = 1,
+
+  /// First encoded value for target-specific synchronization scopes.
+  SYNCHSCOPE_FIRSTTARGETSPECIFIC = 2
 };
 
 /// Markers and flags for call instruction.
Index: include/llvm/CodeGen/SelectionDAGNodes.h
===================================================================
--- include/llvm/CodeGen/SelectionDAGNodes.h
+++ include/llvm/CodeGen/SelectionDAGNodes.h
@@ -432,7 +432,6 @@
     uint16_t IsVolatile : 1;
     uint16_t IsNonTemporal : 1;
     uint16_t IsInvariant : 1;
-    uint16_t SynchScope : 1; // enum SynchronizationScope
     uint16_t Ordering : 4;   // enum AtomicOrdering
   };
   enum { NumMemSDNodeBits = NumSDNodeBits + 8 };
@@ -1077,6 +1076,10 @@
   /// Memory reference information.
   MachineMemOperand *MMO;
 
+  /// The synchronization scope of this memory operation. Not quite enough room
+  /// in SubclassData for everything, so synch scope gets its own field.
+  SynchronizationScope SynchScope;
+
 public:
   MemSDNode(unsigned Opc, unsigned Order, const DebugLoc &dl, SDVTList VTs,
             EVT MemoryVT, MachineMemOperand *MMO);
@@ -1109,11 +1112,14 @@
   bool isNonTemporal() const { return MemSDNodeBits.IsNonTemporal; }
   bool isInvariant() const { return MemSDNodeBits.IsInvariant; }
 
+  /// Returns the atomic ordering requirements for this memory operation.
   AtomicOrdering getOrdering() const {
     return static_cast<AtomicOrdering>(MemSDNodeBits.Ordering);
   }
+
+  /// Returns the synchronization scope for this memory operation.
   SynchronizationScope getSynchScope() const {
-    return static_cast<SynchronizationScope>(MemSDNodeBits.SynchScope);
+    return SynchScope;
   }
 
   // Returns the offset from the location of the access.
@@ -1187,8 +1193,9 @@
 
 /// This is an SDNode representing atomic operations.
 class AtomicSDNode : public MemSDNode {
-  /// For cmpxchg instructions, the ordering requirements when a store does not
-  /// occur.
+  /// For cmpxchg atomic operations, the atomic ordering requirements when a
+  /// store does not occur. Not quite enough room in SubclassData for
+  /// everything, so the failure ordering gets its own field.
   AtomicOrdering FailureOrdering;
 
   void InitAtomic(AtomicOrdering SuccessOrdering,
@@ -1196,9 +1203,8 @@
                   SynchronizationScope SynchScope) {
     MemSDNodeBits.Ordering = static_cast<uint16_t>(SuccessOrdering);
     assert(getOrdering() == SuccessOrdering && "Value truncated");
-    MemSDNodeBits.SynchScope = static_cast<uint16_t>(SynchScope);
-    assert(getSynchScope() == SynchScope && "Value truncated");
     this->FailureOrdering = FailureOrdering;
+    this->SynchScope = SynchScope;
   }
 
 public:
@@ -1213,12 +1219,14 @@
   const SDValue &getBasePtr() const { return getOperand(1); }
   const SDValue &getVal() const { return getOperand(2); }
 
+  /// For cmpxchg atomic operations, returns the atomic ordering requirements
+  /// when a store occurs.
   AtomicOrdering getSuccessOrdering() const {
     return getOrdering();
   }
 
-  // Not quite enough room in SubclassData for everything, so failure gets its
-  // own field.
+  /// For cmpxchg atomic operations, returns the atomic ordering requirements
+  /// when a store does not occur.
   AtomicOrdering getFailureOrdering() const {
     return FailureOrdering;
   }
Index: include/llvm/IR/Instructions.h
===================================================================
--- include/llvm/IR/Instructions.h
+++ include/llvm/IR/Instructions.h
@@ -37,9 +37,16 @@
 class DataLayout;
 class LLVMContext;
 
-enum SynchronizationScope {
+/// Predefined synchronization scopes.
+enum SynchronizationScope : unsigned {
+  /// Synchronized with respect to signal handlers executing in the same thread.
   SingleThread = 0,
-  CrossThread = 1
+
+  /// Synchronized with respect to all concurrently executing threads.
+  CrossThread = 1,
+
+  /// First target-specific synchronization scope.
+  SynchronizationScopeFirstTargetSpecific = 2
 };
 
 //===----------------------------------------------------------------------===//
@@ -219,30 +226,30 @@
 
   void setAlignment(unsigned Align);
 
-  /// Returns the ordering effect of this fence.
+  /// Returns the ordering constraint of this load instruction.
   AtomicOrdering getOrdering() const {
     return AtomicOrdering((getSubclassDataFromInstruction() >> 7) & 7);
   }
 
-  /// Set the ordering constraint on this load. May not be Release or
-  /// AcquireRelease.
+  /// Sets the ordering constraint on this load instruction. May not be Release
+  /// or AcquireRelease.
   void setOrdering(AtomicOrdering Ordering) {
     setInstructionSubclassData((getSubclassDataFromInstruction() & ~(7 << 7)) |
                                ((unsigned)Ordering << 7));
   }
 
+  /// Returns the synchronization scope of this load instruction.
   SynchronizationScope getSynchScope() const {
-    return SynchronizationScope((getSubclassDataFromInstruction() >> 6) & 1);
+    return SynchScope;
   }
 
-  /// Specify whether this load is ordered with respect to all
-  /// concurrently executing threads, or only with respect to signal handlers
-  /// executing in the same thread.
-  void setSynchScope(SynchronizationScope xthread) {
-    setInstructionSubclassData((getSubclassDataFromInstruction() & ~(1 << 6)) |
-                               (xthread << 6));
+  /// Sets the synchronization scope on this load instruction.
+  void setSynchScope(SynchronizationScope SynchScope) {
+    this->SynchScope = SynchScope;
   }
 
+  /// Sets the ordering constraint and synchronization scope on this load
+  /// instruction.
   void setAtomic(AtomicOrdering Ordering,
                  SynchronizationScope SynchScope = CrossThread) {
     setOrdering(Ordering);
@@ -279,6 +286,11 @@
   void setInstructionSubclassData(unsigned short D) {
     Instruction::setInstructionSubclassData(D);
   }
+
+  /// The synchronization scope of this load instruction. Not quite enough room
+  /// in SubclassData for everything, so synchronization scope gets its own
+  /// field.
+  SynchronizationScope SynchScope;
 };
 
 //===----------------------------------------------------------------------===//
@@ -337,30 +349,30 @@
 
   void setAlignment(unsigned Align);
 
-  /// Returns the ordering effect of this store.
+  /// Returns the ordering constraint of this store instruction.
   AtomicOrdering getOrdering() const {
     return AtomicOrdering((getSubclassDataFromInstruction() >> 7) & 7);
   }
 
-  /// Set the ordering constraint on this store.  May not be Acquire or
-  /// AcquireRelease.
+  /// Sets the ordering constraint on this store instruction. May not be Acquire
+  /// or AcquireRelease.
   void setOrdering(AtomicOrdering Ordering) {
     setInstructionSubclassData((getSubclassDataFromInstruction() & ~(7 << 7)) |
                                ((unsigned)Ordering << 7));
   }
 
+  /// Returns the synchronization scope of this store instruction.
   SynchronizationScope getSynchScope() const {
-    return SynchronizationScope((getSubclassDataFromInstruction() >> 6) & 1);
+    return SynchScope;
   }
 
-  /// Specify whether this store instruction is ordered with respect to all
-  /// concurrently executing threads, or only with respect to signal handlers
-  /// executing in the same thread.
-  void setSynchScope(SynchronizationScope xthread) {
-    setInstructionSubclassData((getSubclassDataFromInstruction() & ~(1 << 6)) |
-                               (xthread << 6));
+  /// Sets the synchronization scope on this store instruction.
+  void setSynchScope(SynchronizationScope SynchScope) {
+    this->SynchScope = SynchScope;
   }
 
+  /// Sets the ordering constraint and synchronization scope on this store
+  /// instruction.
   void setAtomic(AtomicOrdering Ordering,
                  SynchronizationScope SynchScope = CrossThread) {
     setOrdering(Ordering);
@@ -400,6 +412,11 @@
   void setInstructionSubclassData(unsigned short D) {
     Instruction::setInstructionSubclassData(D);
   }
+
+  /// The synchronization scope of this store instruction. Not quite enough room
+  /// in SubclassData for everything, so synchronization scope gets its own
+  /// field.
+  SynchronizationScope SynchScope;
 };
 
 template <>
@@ -437,28 +454,26 @@
             SynchronizationScope SynchScope,
             BasicBlock *InsertAtEnd);
 
-  /// Returns the ordering effect of this fence.
+  /// Returns the ordering constraint of this fence instruction.
   AtomicOrdering getOrdering() const {
     return AtomicOrdering(getSubclassDataFromInstruction() >> 1);
   }
 
-  /// Set the ordering constraint on this fence.  May only be Acquire, Release,
-  /// AcquireRelease, or SequentiallyConsistent.
+  /// Sets the ordering constraint on this fence instruction. May only be
+  /// Acquire, Release, AcquireRelease, or SequentiallyConsistent.
   void setOrdering(AtomicOrdering Ordering) {
     setInstructionSubclassData((getSubclassDataFromInstruction() & 1) |
                                ((unsigned)Ordering << 1));
   }
 
+  /// Returns the synchronization scope of this fence instruction.
   SynchronizationScope getSynchScope() const {
-    return SynchronizationScope(getSubclassDataFromInstruction() & 1);
+    return SynchScope;
   }
 
-  /// Specify whether this fence orders other operations with respect to all
-  /// concurrently executing threads, or only with respect to signal handlers
-  /// executing in the same thread.
-  void setSynchScope(SynchronizationScope xthread) {
-    setInstructionSubclassData((getSubclassDataFromInstruction() & ~1) |
-                               xthread);
+  /// Sets the synchronization scope on this fence instruction.
+  void setSynchScope(SynchronizationScope SynchScope) {
+    this->SynchScope = SynchScope;
   }
 
   // Methods for support type inquiry through isa, cast, and dyn_cast:
@@ -475,6 +490,11 @@
   void setInstructionSubclassData(unsigned short D) {
     Instruction::setInstructionSubclassData(D);
   }
+
+  /// The synchronization scope of this fence instruction. Not quite enough room
+  /// in SubclassData for everything, so synchronization scope gets its own
+  /// field.
+  SynchronizationScope SynchScope;
 };
 
 //===----------------------------------------------------------------------===//
@@ -539,7 +559,14 @@
   /// Transparently provide more efficient getOperand methods.
   DECLARE_TRANSPARENT_OPERAND_ACCESSORS(Value);
 
-  /// Set the ordering constraint on this cmpxchg.
+  /// Returns the ordering constraint of this cmpxchg instruction when a store
+  /// occurs.
+  AtomicOrdering getSuccessOrdering() const {
+    return AtomicOrdering((getSubclassDataFromInstruction() >> 2) & 7);
+  }
+
+  /// Sets the ordering constraint on this cmpxchg instruction when a store
+  /// occurs.
   void setSuccessOrdering(AtomicOrdering Ordering) {
     assert(Ordering != AtomicOrdering::NotAtomic &&
            "CmpXchg instructions can only be atomic.");
@@ -547,6 +574,14 @@
                                ((unsigned)Ordering << 2));
   }
 
+  /// Returns the ordering constraint of this cmpxchg instruction when a store
+  /// does not occur.
+  AtomicOrdering getFailureOrdering() const {
+    return AtomicOrdering((getSubclassDataFromInstruction() >> 5) & 7);
+  }
+
+  /// Sets the ordering constraint on this cmpxchg instruction when a store
+  /// does not occur.
   void setFailureOrdering(AtomicOrdering Ordering) {
     assert(Ordering != AtomicOrdering::NotAtomic &&
            "CmpXchg instructions can only be atomic.");
@@ -554,28 +589,14 @@
                                ((unsigned)Ordering << 5));
   }
 
-  /// Specify whether this cmpxchg is atomic and orders other operations with
-  /// respect to all concurrently executing threads, or only with respect to
-  /// signal handlers executing in the same thread.
-  void setSynchScope(SynchronizationScope SynchScope) {
-    setInstructionSubclassData((getSubclassDataFromInstruction() & ~2) |
-                               (SynchScope << 1));
-  }
-
-  /// Returns the ordering constraint on this cmpxchg.
-  AtomicOrdering getSuccessOrdering() const {
-    return AtomicOrdering((getSubclassDataFromInstruction() >> 2) & 7);
-  }
-
-  /// Returns the ordering constraint on this cmpxchg.
-  AtomicOrdering getFailureOrdering() const {
-    return AtomicOrdering((getSubclassDataFromInstruction() >> 5) & 7);
+  /// Returns the synchronization scope of this cmpxchg instruction.
+  SynchronizationScope getSynchScope() const {
+    return SynchScope;
   }
 
-  /// Returns whether this cmpxchg is atomic between threads or only within a
-  /// single thread.
-  SynchronizationScope getSynchScope() const {
-    return SynchronizationScope((getSubclassDataFromInstruction() & 2) >> 1);
+  /// Sets the synchronization scope on this cmpxchg instruction.
+  void setSynchScope(SynchronizationScope SynchScope) {
+    this->SynchScope = SynchScope;
   }
 
   Value *getPointerOperand() { return getOperand(0); }
@@ -630,6 +651,11 @@
   void setInstructionSubclassData(unsigned short D) {
     Instruction::setInstructionSubclassData(D);
   }
+
+  /// The synchronization scope of this cmpxchg instruction. Not quite enough
+  /// room in SubClassData for everything, so synchronization scope gets its own
+  /// field.
+  SynchronizationScope SynchScope;
 };
 
 template <>
@@ -726,7 +752,12 @@
   /// Transparently provide more efficient getOperand methods.
   DECLARE_TRANSPARENT_OPERAND_ACCESSORS(Value);
 
-  /// Set the ordering constraint on this RMW.
+  /// Returns the ordering constraint of this RMW instruction.
+  AtomicOrdering getOrdering() const {
+    return AtomicOrdering((getSubclassDataFromInstruction() >> 2) & 7);
+  }
+
+  /// Sets the ordering constraint on this RMW instruction.
   void setOrdering(AtomicOrdering Ordering) {
     assert(Ordering != AtomicOrdering::NotAtomic &&
            "atomicrmw instructions can only be atomic.");
@@ -734,25 +765,16 @@
                                ((unsigned)Ordering << 2));
   }
 
-  /// Specify whether this RMW orders other operations with respect to all
-  /// concurrently executing threads, or only with respect to signal handlers
-  /// executing in the same thread.
-  void setSynchScope(SynchronizationScope SynchScope) {
-    setInstructionSubclassData((getSubclassDataFromInstruction() & ~2) |
-                               (SynchScope << 1));
-  }
-
-  /// Returns the ordering constraint on this RMW.
-  AtomicOrdering getOrdering() const {
-    return AtomicOrdering((getSubclassDataFromInstruction() >> 2) & 7);
-  }
-
-  /// Returns whether this RMW is atomic between threads or only within a
-  /// single thread.
+  /// Returns the synchronization scope of this RMW instruction.
   SynchronizationScope getSynchScope() const {
-    return SynchronizationScope((getSubclassDataFromInstruction() & 2) >> 1);
+    return SynchScope;
   }
 
+  /// Sets the synchronization scope on this RMW instruction.
+  void setSynchScope(SynchronizationScope SynchScope) {
+    this->SynchScope = SynchScope;
+  }
+
   Value *getPointerOperand() { return getOperand(0); }
   const Value *getPointerOperand() const { return getOperand(0); }
   static unsigned getPointerOperandIndex() { return 0U; }
@@ -781,6 +803,11 @@
   void setInstructionSubclassData(unsigned short D) {
     Instruction::setInstructionSubclassData(D);
   }
+
+  /// The synchronization scope of this RMW instruction. Not quite enough room
+  /// in SubclassData for everything, so synchronization scope gets its own
+  /// field.
+  SynchronizationScope SynchScope;
 };
 
 template <>
Index: lib/AsmParser/LLLexer.cpp
===================================================================
--- lib/AsmParser/LLLexer.cpp
+++ lib/AsmParser/LLLexer.cpp
@@ -541,6 +541,7 @@
   KEYWORD(acq_rel);
   KEYWORD(seq_cst);
   KEYWORD(singlethread);
+  KEYWORD(synchscope);
 
   KEYWORD(nnan);
   KEYWORD(ninf);
Index: lib/AsmParser/LLParser.h
===================================================================
--- lib/AsmParser/LLParser.h
+++ lib/AsmParser/LLParser.h
@@ -239,6 +239,7 @@
     bool ParseOptionalDerefAttrBytes(lltok::Kind AttrKind, uint64_t &Bytes);
     bool ParseScopeAndOrdering(bool isAtomic, SynchronizationScope &Scope,
                                AtomicOrdering &Ordering);
+    bool ParseScope(SynchronizationScope &Scope);
     bool ParseOrdering(AtomicOrdering &Ordering);
     bool ParseOptionalStackAlignment(unsigned &Alignment);
     bool ParseOptionalCommaAlign(unsigned &Alignment, bool &AteExtraComma);
Index: lib/AsmParser/LLParser.cpp
===================================================================
--- lib/AsmParser/LLParser.cpp
+++ lib/AsmParser/LLParser.cpp
@@ -1880,8 +1880,10 @@
 }
 
 /// ParseScopeAndOrdering
-///   if isAtomic: ::= 'singlethread'? AtomicOrdering
-///   else: ::=
+///   if isAtomic:
+///     ::= ('singlethread' | 'synchscope' '(' uint32 ')')? AtomicOrdering
+///   else
+///     ::=
 ///
 /// This sets Scope and Ordering to the parsed values.
 bool LLParser::ParseScopeAndOrdering(bool isAtomic, SynchronizationScope &Scope,
@@ -1889,11 +1891,41 @@
   if (!isAtomic)
     return false;
 
+  return ParseScope(Scope) || ParseOrdering(Ordering);
+}
+
+/// ParseScope
+///   ::= /* empty */
+///   ::= 'singlethread'
+///   ::= 'synchscope' '(' uint32 ')'
+///
+/// This sets Scope to the parsed value.
+bool LLParser::ParseScope(SynchronizationScope &Scope) {
+  if (EatIfPresent(lltok::kw_synchscope)) {
+    auto StartParen = Lex.getLoc();
+    if (!EatIfPresent(lltok::lparen))
+      return Error(StartParen, "expected '(' in synchscope");
+
+    unsigned ScopeU32 = 0;
+    auto ScopeU32At = Lex.getLoc();
+    if (ParseUInt32(ScopeU32))
+      return true;
+    if (ScopeU32 < SynchronizationScopeFirstTargetSpecific)
+      return Error(ScopeU32At, "invalid target specific synchronization scope");
+
+    auto EndParen = Lex.getLoc();
+    if (!EatIfPresent(lltok::rparen))
+      return Error(EndParen, "expected ')' in synchscope");
+
+    Scope = SynchronizationScope(ScopeU32);
+    return false;
+  }
+
   Scope = CrossThread;
   if (EatIfPresent(lltok::kw_singlethread))
     Scope = SingleThread;
 
-  return ParseOrdering(Ordering);
+  return false;
 }
 
 /// ParseOrdering
Index: lib/AsmParser/LLToken.h
===================================================================
--- lib/AsmParser/LLToken.h
+++ lib/AsmParser/LLToken.h
@@ -94,6 +94,8 @@
   kw_acq_rel,
   kw_seq_cst,
   kw_singlethread,
+  kw_synchscope,
+
   kw_nnan,
   kw_ninf,
   kw_nsz,
Index: lib/Bitcode/Reader/BitcodeReader.cpp
===================================================================
--- lib/Bitcode/Reader/BitcodeReader.cpp
+++ lib/Bitcode/Reader/BitcodeReader.cpp
@@ -913,9 +913,13 @@
 }
 
 static SynchronizationScope getDecodedSynchScope(unsigned Val) {
+  if (Val >= bitc::SYNCHSCOPE_FIRSTTARGETSPECIFIC)
+    return SynchronizationScope(SynchronizationScopeFirstTargetSpecific +
+      (Val - bitc::SYNCHSCOPE_FIRSTTARGETSPECIFIC));
+
   switch (Val) {
+  default: llvm_unreachable("Invalid synch scope");
   case bitc::SYNCHSCOPE_SINGLETHREAD: return SingleThread;
-  default: // Map unknown scopes to cross-thread.
   case bitc::SYNCHSCOPE_CROSSTHREAD: return CrossThread;
   }
 }
Index: lib/Bitcode/Writer/BitcodeWriter.cpp
===================================================================
--- lib/Bitcode/Writer/BitcodeWriter.cpp
+++ lib/Bitcode/Writer/BitcodeWriter.cpp
@@ -590,11 +590,15 @@
 }
 
 static unsigned getEncodedSynchScope(SynchronizationScope SynchScope) {
+  if (SynchScope >= SynchronizationScopeFirstTargetSpecific)
+    return unsigned(bitc::SYNCHSCOPE_FIRSTTARGETSPECIFIC +
+      (SynchScope - SynchronizationScopeFirstTargetSpecific));
+
   switch (SynchScope) {
+  default: llvm_unreachable("Invalid synch scope");
   case SingleThread: return bitc::SYNCHSCOPE_SINGLETHREAD;
   case CrossThread: return bitc::SYNCHSCOPE_CROSSTHREAD;
   }
-  llvm_unreachable("Invalid synch scope");
 }
 
 void ModuleBitcodeWriter::writeStringRecord(unsigned Code, StringRef Str,
Index: lib/CodeGen/SelectionDAG/SelectionDAG.cpp
===================================================================
--- lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -6686,7 +6686,8 @@
 
 MemSDNode::MemSDNode(unsigned Opc, unsigned Order, const DebugLoc &dl,
                      SDVTList VTs, EVT memvt, MachineMemOperand *mmo)
-    : SDNode(Opc, Order, dl, VTs), MemoryVT(memvt), MMO(mmo) {
+    : SDNode(Opc, Order, dl, VTs), MemoryVT(memvt), MMO(mmo),
+      SynchScope(CrossThread) {
   MemSDNodeBits.IsVolatile = MMO->isVolatile();
   MemSDNodeBits.IsNonTemporal = MMO->isNonTemporal();
   MemSDNodeBits.IsInvariant = MMO->isInvariant();
Index: lib/IR/AsmWriter.cpp
===================================================================
--- lib/IR/AsmWriter.cpp
+++ lib/IR/AsmWriter.cpp
@@ -2122,9 +2122,18 @@
   if (Ordering == AtomicOrdering::NotAtomic)
     return;
 
-  switch (SynchScope) {
-  case SingleThread: Out << " singlethread"; break;
-  case CrossThread: break;
+  if (SynchScope >= SynchronizationScopeFirstTargetSpecific) {
+    Out << " synchscope(" << unsigned(SynchScope) << ')';
+  } else {
+    switch (SynchScope) {
+    case SingleThread:
+      Out << " singlethread";
+      break;
+    case CrossThread:
+      break;
+    default:
+      llvm_unreachable("Unknown SynchScope");
+    }
   }
 
   Out << " " << toIRString(Ordering);
@@ -2136,9 +2145,18 @@
   assert(SuccessOrdering != AtomicOrdering::NotAtomic &&
          FailureOrdering != AtomicOrdering::NotAtomic);
 
-  switch (SynchScope) {
-  case SingleThread: Out << " singlethread"; break;
-  case CrossThread: break;
+  if (SynchScope >= SynchronizationScopeFirstTargetSpecific) {
+    Out << " synchscope(" << unsigned(SynchScope) << ')';
+  } else {
+    switch (SynchScope) {
+    case SingleThread:
+      Out << " singlethread";
+      break;
+    case CrossThread:
+      break;
+    default:
+      llvm_unreachable("Unknown SynchScope");
+    }
   }
 
   Out << " " << toIRString(SuccessOrdering);
Index: lib/Target/AMDGPU/AMDGPU.h
===================================================================
--- lib/Target/AMDGPU/AMDGPU.h
+++ lib/Target/AMDGPU/AMDGPU.h
@@ -11,6 +11,8 @@
 #ifndef LLVM_LIB_TARGET_AMDGPU_AMDGPU_H
 #define LLVM_LIB_TARGET_AMDGPU_AMDGPU_H
 
+#include "llvm/IR/Instructions.h"
+
 namespace llvm {
 
 class AMDGPUTargetMachine;
@@ -162,4 +164,35 @@
 
 } // namespace AMDGPUAS
 
+/// AMDGPU-specific synchronization scopes.
+enum class AMDGPUSynchronizationScope : unsigned {
+  /// Synchronized with respect to the entire system, which includes all
+  /// work-items on all agents executing kernel dispatches for the same
+  /// application process, together with all agents executing the same
+  /// application process as the executing work-item. Only supported for the
+  /// global segment.
+  System = llvm::CrossThread,
+
+  /// Synchronized with respect to signal handlers executing in the same
+  /// work-item.
+  SignalHandler = llvm::SingleThread,
+
+  /// Synchronized with respect to the agent, which includes all work-items on
+  /// the same agent executing kernel dispatches for the same application
+  /// process as the executing work-item. Only supported for the global segment.
+  Agent = llvm::SynchronizationScopeFirstTargetSpecific,
+
+  /// Synchronized with respect to the work-group, which includes all work-items
+  /// in the same work-group as the executing work-item.
+  WorkGroup,
+
+  /// Synchronized with respect to the wavefront, which includes all work-items
+  /// in the same wavefront as the executing work-item.
+  Wavefront,
+
+  /// Synchronized with respect to image fence instructions executing in the
+  /// same work-item.
+  Image
+};
+
 #endif
Index: lib/Target/SystemZ/SystemZISelLowering.cpp
===================================================================
--- lib/Target/SystemZ/SystemZISelLowering.cpp
+++ lib/Target/SystemZ/SystemZISelLowering.cpp
@@ -3179,7 +3179,7 @@
   // The only fence that needs an instruction is a sequentially-consistent
   // cross-thread fence.
   if (FenceOrdering == AtomicOrdering::SequentiallyConsistent &&
-      FenceScope == CrossThread) {
+      FenceScope != SingleThread) {
     return SDValue(DAG.getMachineNode(SystemZ::Serialize, DL, MVT::Other,
                                       Op.getOperand(0)),
                    0);
Index: lib/Target/X86/X86ISelLowering.cpp
===================================================================
--- lib/Target/X86/X86ISelLowering.cpp
+++ lib/Target/X86/X86ISelLowering.cpp
@@ -21052,7 +21052,7 @@
   // The only fence that needs an instruction is a sequentially-consistent
   // cross-thread fence.
   if (FenceOrdering == AtomicOrdering::SequentiallyConsistent &&
-      FenceScope == CrossThread) {
+      FenceScope != SingleThread) {
     if (Subtarget.hasMFence())
       return DAG.getNode(X86ISD::MFENCE, dl, MVT::Other, Op.getOperand(0));
 
Index: lib/Transforms/Instrumentation/ThreadSanitizer.cpp
===================================================================
--- lib/Transforms/Instrumentation/ThreadSanitizer.cpp
+++ lib/Transforms/Instrumentation/ThreadSanitizer.cpp
@@ -366,9 +366,9 @@
 
 static bool isAtomic(Instruction *I) {
   if (LoadInst *LI = dyn_cast<LoadInst>(I))
-    return LI->isAtomic() && LI->getSynchScope() == CrossThread;
+    return LI->isAtomic() && LI->getSynchScope() != SingleThread;
   if (StoreInst *SI = dyn_cast<StoreInst>(I))
-    return SI->isAtomic() && SI->getSynchScope() == CrossThread;
+    return SI->isAtomic() && SI->getSynchScope() != SingleThread;
   if (isa<AtomicRMWInst>(I))
     return true;
   if (isa<AtomicCmpXchgInst>(I))
Index: test/Assembler/atomic.ll
===================================================================
--- test/Assembler/atomic.ll
+++ test/Assembler/atomic.ll
@@ -7,10 +7,14 @@
   load atomic i32, i32* %x unordered, align 4
   ; CHECK: load atomic volatile i32, i32* %x singlethread acquire, align 4
   load atomic volatile i32, i32* %x singlethread acquire, align 4
+  ; CHECK: load atomic volatile i32, i32* %x synchscope(2) acquire, align 4
+  load atomic volatile i32, i32* %x synchscope(2) acquire, align 4
   ; CHECK: store atomic i32 3, i32* %x release, align 4
   store atomic i32 3, i32* %x release, align 4
   ; CHECK: store atomic volatile i32 3, i32* %x singlethread monotonic, align 4
   store atomic volatile i32 3, i32* %x singlethread monotonic, align 4
+  ; CHECK: store atomic volatile i32 3, i32* %x synchscope(3) monotonic, align 4
+  store atomic volatile i32 3, i32* %x synchscope(3) monotonic, align 4
   ; CHECK: cmpxchg i32* %x, i32 1, i32 0 singlethread monotonic monotonic
   cmpxchg i32* %x, i32 1, i32 0 singlethread monotonic monotonic
   ; CHECK: cmpxchg volatile i32* %x, i32 0, i32 1 acq_rel acquire
@@ -19,13 +23,19 @@
   cmpxchg i32* %x, i32 42, i32 0 acq_rel monotonic
   ; CHECK: cmpxchg weak i32* %x, i32 13, i32 0 seq_cst monotonic
   cmpxchg weak i32* %x, i32 13, i32 0 seq_cst monotonic
+  ; CHECK: cmpxchg weak i32* %x, i32 13, i32 0 synchscope(4) seq_cst monotonic
+  cmpxchg weak i32* %x, i32 13, i32 0 synchscope(4) seq_cst monotonic
   ; CHECK: atomicrmw add i32* %x, i32 10 seq_cst
   atomicrmw add i32* %x, i32 10 seq_cst
   ; CHECK: atomicrmw volatile xchg  i32* %x, i32 10 monotonic
   atomicrmw volatile xchg i32* %x, i32 10 monotonic
+  ; CHECK: atomicrmw volatile xchg i32* %x, i32 10 synchscope(5) monotonic
+  atomicrmw volatile xchg i32* %x, i32 10 synchscope(5) monotonic
   ; CHECK: fence singlethread release
   fence singlethread release
   ; CHECK: fence seq_cst
   fence seq_cst
+  ; CHECK: fence synchscope(6) seq_cst
+  fence synchscope(6) seq_cst
   ret void
 }