Index: docs/LangRef.rst
===================================================================
--- docs/LangRef.rst
+++ docs/LangRef.rst
@@ -2160,10 +2160,23 @@
 
 .. _singlethread:
 
-If an atomic operation is marked ``singlethread``, it only *synchronizes
-with* or participates in modification and seq\_cst total orderings with
-other operations running in the same thread (for example, in signal
-handlers).
+If an atomic operation is marked ``singlethread``, it only *synchronizes with*,
+and only participates in the seq\_cst total orderings of, other operations
+running in the same thread (for example, in signal handlers).
+
+.. _syncscope:
+
+If an atomic operation is marked ``syncscope([ss0])``, then it
+*synchronizes with*, and participates in the seq\_cst total orderings of, other
+atomic operations marked ``syncscope([ss0])``. It is target defined how it
+interacts with atomic operations marked ``singlethread``, marked
+``syncscope([ss1])`` where ``[ss0] != [ss1]``, or not marked ``singlethread`` or
+``syncscope([ss0])``.
+
+Otherwise, an atomic operation that is not marked ``singlethread`` or
+``syncscope([ss0])`` *synchronizes with*, and participates in the global
+seq\_cst total orderings of, other operations that are not marked
+``singlethread`` or ``syncscope([ss0])``.
 
 .. _fastmath:
 
@@ -7251,7 +7264,7 @@
 ::
 
       <result> = load [volatile] <ty>, <ty>* <pointer>[, align <alignment>][, !nontemporal !<index>][, !invariant.load !<index>][, !invariant.group !<index>][, !nonnull !<index>][, !dereferenceable !<deref_bytes_node>][, !dereferenceable_or_null !<deref_bytes_node>][, !align !<align_node>]
-      <result> = load atomic [volatile] <ty>, <ty>* <pointer> [singlethread] <ordering>, align <alignment> [, !invariant.group !<index>]
+      <result> = load atomic [volatile] <ty>, <ty>* <pointer> [singlethread|syncscope([ss])] <ordering>, align <alignment> [, !invariant.group !<index>]
      !<index> = !{ i32 1 }
      !<deref_bytes_node> = !{i64 <dereferenceable_bytes>}
      !<align_node> = !{ i64 <value_alignment> }
@@ -7272,14 +7285,14 @@
 :ref:`volatile operations <volatile>`.
 
 If the ``load`` is marked as ``atomic``, it takes an extra :ref:`ordering
-<ordering>` and optional ``singlethread`` argument. The ``release`` and
-``acq_rel`` orderings are not valid on ``load`` instructions. Atomic loads
-produce :ref:`defined <memmodel>` results when they may see multiple atomic
-stores. The type of the pointee must be an integer, pointer, or floating-point
-type whose bit width is a power of two greater than or equal to eight and less
-than or equal to a target-specific size limit. ``align`` must be explicitly
-specified on atomic loads, and the load has undefined behavior if the alignment
-is not set to a value which is at least the size in bytes of the
+<ordering>` and optional ``singlethread`` or ``syncscope([ss])`` argument. The
+``release`` and ``acq_rel`` orderings are not valid on ``load`` instructions.
+Atomic loads produce :ref:`defined <memmodel>` results when they may see
+multiple atomic stores. The type of the pointee must be an integer, pointer, or
+floating-point type whose bit width is a power of two greater than or equal to
+eight and less than or equal to a target-specific size limit. ``align`` must be
+explicitly specified on atomic loads, and the load has undefined behavior if the
+alignment is not set to a value which is at least the size in bytes of the
 pointee. ``!nontemporal`` does not have any defined semantics for atomic loads.
 
 The optional constant ``align`` argument specifies the alignment of the
@@ -7380,7 +7393,7 @@
 ::
 
       store [volatile] <ty> <value>, <ty>* <pointer>[, align <alignment>][, !nontemporal !<index>][, !invariant.group !<index>]        ; yields void
-      store atomic [volatile] <ty> <value>, <ty>* <pointer> [singlethread] <ordering>, align <alignment> [, !invariant.group !<index>] ; yields void
+      store atomic [volatile] <ty> <value>, <ty>* <pointer> [singlethread|syncscope([ss])] <ordering>, align <alignment> [, !invariant.group !<index>] ; yields void
 
 Overview:
 """""""""
@@ -7400,14 +7413,14 @@
 structural type <t_opaque>`) can be stored.
 
 If the ``store`` is marked as ``atomic``, it takes an extra :ref:`ordering
-<ordering>` and optional ``singlethread`` argument. The ``acquire`` and
-``acq_rel`` orderings aren't valid on ``store`` instructions. Atomic loads
-produce :ref:`defined <memmodel>` results when they may see multiple atomic
-stores. The type of the pointee must be an integer, pointer, or floating-point
-type whose bit width is a power of two greater than or equal to eight and less
-than or equal to a target-specific size limit. ``align`` must be explicitly
-specified on atomic stores, and the store has undefined behavior if the
-alignment is not set to a value which is at least the size in bytes of the
+<ordering>` and optional ``singlethread`` or ``syncscope([ss])`` argument. The
+``acquire`` and ``acq_rel`` orderings aren't valid on ``store`` instructions.
+Atomic loads produce :ref:`defined <memmodel>` results when they may see
+multiple atomic stores. The type of the pointee must be an integer, pointer, or
+floating-point type whose bit width is a power of two greater than or equal to
+eight and less than or equal to a target-specific size limit. ``align`` must be
+explicitly specified on atomic stores, and the store has undefined behavior if
+the alignment is not set to a value which is at least the size in bytes of the
 pointee. ``!nontemporal`` does not have any defined semantics for atomic stores.
 
 The optional constant ``align`` argument specifies the alignment of the
@@ -7468,7 +7481,7 @@
 
 ::
 
-      fence [singlethread] <ordering>                        ; yields void
+      fence [singlethread|syncscope([ss])] <ordering>        ; yields void
 
 Overview:
 """""""""
@@ -7502,17 +7515,17 @@
 ``acquire`` and ``release`` semantics specified above, participates in the
 global program order of other ``seq_cst`` operations and/or fences.
 
-The optional ":ref:`singlethread <singlethread>`" argument specifies
-that the fence only synchronizes with other fences in the same thread.
-(This is useful for interacting with signal handlers.)
+A ``fence`` instruction can also take an optional
+":ref:`singlethread <singlethread>`" or ":ref:`syncscope <syncscope>`" argument.
 
 Example:
 """"""""
 
 .. code-block:: llvm
 
-      fence acquire                             ; yields void
-      fence singlethread seq_cst                ; yields void
+      fence acquire                             ; yields void
+      fence singlethread seq_cst                ; yields void
+      fence syncscope(amdgpu_agent) seq_cst     ; yields void
 
 .. _i_cmpxchg:
 
@@ -7524,7 +7537,7 @@
 
 ::
 
-      cmpxchg [weak] [volatile] <ty>* <pointer>, <ty> <cmp>, <ty> <new> [singlethread] <success ordering> <failure ordering> ; yields  { ty, i1 }
+      cmpxchg [weak] [volatile] <ty>* <pointer>, <ty> <cmp>, <ty> <new> [singlethread|syncscope([ss])] <success ordering> <failure ordering> ; yields  { ty, i1 }
 
 Overview:
 """""""""
@@ -7553,10 +7566,8 @@
 stronger than that on success, and the failure ordering cannot be either
 ``release`` or ``acq_rel``.
 
-The optional "``singlethread``" argument declares that the ``cmpxchg``
-is only atomic with respect to code (usually signal handlers) running in
-the same thread as the ``cmpxchg``. Otherwise the cmpxchg is atomic with
-respect to all other code in the system.
+A ``cmpxchg`` instruction can also take an optional
+":ref:`singlethread <singlethread>`" or ":ref:`syncscope <syncscope>`" argument.
 
 The pointer passed into cmpxchg must have alignment greater than or equal to
 the size in memory of the operand.
@@ -7610,7 +7621,7 @@
 
 ::
 
-      atomicrmw [volatile] <operation> <ty>* <pointer>, <ty> <value> [singlethread] <ordering>                        ; yields ty
+      atomicrmw [volatile] <operation> <ty>* <pointer>, <ty> <value> [singlethread|syncscope([ss])] <ordering>        ; yields ty
 
 Overview:
 """""""""
@@ -7644,6 +7655,9 @@
 order of execution of this ``atomicrmw`` with other :ref:`volatile
 operations <volatile>`.
 
+An ``atomicrmw`` instruction can also take an optional
+":ref:`singlethread <singlethread>`" or ":ref:`syncscope <syncscope>`" argument.
+
 Semantics:
 """"""""""
 
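For illustration only (not part of the patch): a minimal C++ sketch of how a
frontend or pass could emit IR matching the documented syntax once this change
lands. The AMDGPU_AGENT value and the SynchronizationScope parameters are the
ones introduced in the headers below; everything else is existing IRBuilder
API, and the function name is hypothetical.

  // Sketch only -- assumes this patch is applied (SynchronizationScope lives in
  // llvm/IR/SyncScope.h and includes the AMDGPU_* scopes).
  #include "llvm/IR/IRBuilder.h"
  #include "llvm/IR/SyncScope.h"

  using namespace llvm;

  // Emits:
  //   fence syncscope(amdgpu_agent) seq_cst
  //   %v = load atomic i32, i32* %p syncscope(amdgpu_agent) acquire, align 4
  static Value *emitAgentScopedAcquire(IRBuilder<> &Builder, Value *Ptr) {
    Builder.CreateFence(AtomicOrdering::SequentiallyConsistent, AMDGPU_AGENT);
    LoadInst *LI = Builder.CreateLoad(Ptr, "v");
    LI->setAlignment(4);
    LI->setAtomic(AtomicOrdering::Acquire, AMDGPU_AGENT);
    return LI;
  }
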
Index: include/llvm/Bitcode/LLVMBitCodes.h
===================================================================
--- include/llvm/Bitcode/LLVMBitCodes.h
+++ include/llvm/Bitcode/LLVMBitCodes.h
@@ -387,9 +387,15 @@
 };
 
 /// Encoded SynchronizationScope values.
-enum AtomicSynchScopeCodes {
+enum AtomicSynchScopeCodes : uint8_t {
+  /// Encoded value for SingleThread synchronization scope.
   SYNCHSCOPE_SINGLETHREAD = 0,
-  SYNCHSCOPE_CROSSTHREAD = 1
+
+  /// Encoded value for CrossThread synchronization scope.
+  SYNCHSCOPE_CROSSTHREAD = 1,
+
+  /// First encoded value for target-specific synchronization scope.
+  SYNCHSCOPE_FIRSTTARGET = 2
 };
 
 /// Markers and flags for call instruction.
Index: include/llvm/CodeGen/SelectionDAGNodes.h
===================================================================
--- include/llvm/CodeGen/SelectionDAGNodes.h
+++ include/llvm/CodeGen/SelectionDAGNodes.h
@@ -1124,6 +1124,10 @@
   /// Memory reference information.
   MachineMemOperand *MMO;
 
+  /// The synchronization scope of this memory operation. Not quite enough room
+  /// in SubclassData for everything, so synch scope gets its own field.
+  SynchronizationScope SynchScope;
+
 public:
   MemSDNode(unsigned Opc, unsigned Order, const DebugLoc &dl, SDVTList VTs,
             EVT MemoryVT, MachineMemOperand *MMO);
Index: include/llvm/IR/Instructions.h
===================================================================
--- include/llvm/IR/Instructions.h
+++ include/llvm/IR/Instructions.h
@@ -30,6 +30,7 @@
 #include "llvm/IR/Function.h"
 #include "llvm/IR/InstrTypes.h"
 #include "llvm/IR/OperandTraits.h"
+#include "llvm/IR/SyncScope.h"
 #include "llvm/IR/Type.h"
 #include "llvm/IR/Use.h"
 #include "llvm/IR/User.h"
@@ -47,11 +48,6 @@
 class DataLayout;
 class LLVMContext;
 
-enum SynchronizationScope {
-  SingleThread = 0,
-  CrossThread = 1
-};
-
 //===----------------------------------------------------------------------===//
 //                                AllocaInst Class
 //===----------------------------------------------------------------------===//
@@ -230,30 +226,30 @@
 
   void setAlignment(unsigned Align);
 
-  /// Returns the ordering effect of this fence.
+  /// Returns the ordering constraint of this load instruction.
   AtomicOrdering getOrdering() const {
     return AtomicOrdering((getSubclassDataFromInstruction() >> 7) & 7);
   }
 
-  /// Set the ordering constraint on this load. May not be Release or
-  /// AcquireRelease.
+  /// Sets the ordering constraint on this load instruction. May not be Release
+  /// or AcquireRelease.
   void setOrdering(AtomicOrdering Ordering) {
     setInstructionSubclassData((getSubclassDataFromInstruction() & ~(7 << 7)) |
                                ((unsigned)Ordering << 7));
   }
 
+  /// Returns the synchronization scope of this load instruction.
   SynchronizationScope getSynchScope() const {
-    return SynchronizationScope((getSubclassDataFromInstruction() >> 6) & 1);
+    return SynchScope;
   }
 
-  /// Specify whether this load is ordered with respect to all
-  /// concurrently executing threads, or only with respect to signal handlers
-  /// executing in the same thread.
-  void setSynchScope(SynchronizationScope xthread) {
-    setInstructionSubclassData((getSubclassDataFromInstruction() & ~(1 << 6)) |
-                               (xthread << 6));
+  /// Sets the synchronization scope on this load instruction.
+  void setSynchScope(SynchronizationScope SynchScope) {
+    this->SynchScope = SynchScope;
   }
 
+  /// Sets the ordering constraint and synchronization scope on this load
+  /// instruction.
   void setAtomic(AtomicOrdering Ordering,
                  SynchronizationScope SynchScope = CrossThread) {
     setOrdering(Ordering);
@@ -290,6 +286,11 @@
   void setInstructionSubclassData(unsigned short D) {
     Instruction::setInstructionSubclassData(D);
   }
+
+  /// The synchronization scope of this load instruction. Not quite enough room
+  /// in SubClassData for everything, so synchronization scope gets its own
+  /// field.
+  SynchronizationScope SynchScope;
 };
 
 //===----------------------------------------------------------------------===//
@@ -351,30 +352,30 @@
 
   void setAlignment(unsigned Align);
 
-  /// Returns the ordering effect of this store.
+  /// Returns the ordering constraint of this store instruction.
   AtomicOrdering getOrdering() const {
     return AtomicOrdering((getSubclassDataFromInstruction() >> 7) & 7);
   }
 
-  /// Set the ordering constraint on this store. May not be Acquire or
-  /// AcquireRelease.
+  /// Sets the ordering constraint on this store instruction. May not be Acquire
+  /// or AcquireRelease.
   void setOrdering(AtomicOrdering Ordering) {
     setInstructionSubclassData((getSubclassDataFromInstruction() & ~(7 << 7)) |
                                ((unsigned)Ordering << 7));
   }
 
+  /// Returns the synchronization scope of this store instruction.
   SynchronizationScope getSynchScope() const {
-    return SynchronizationScope((getSubclassDataFromInstruction() >> 6) & 1);
+    return SynchScope;
   }
 
-  /// Specify whether this store instruction is ordered with respect to all
-  /// concurrently executing threads, or only with respect to signal handlers
-  /// executing in the same thread.
-  void setSynchScope(SynchronizationScope xthread) {
-    setInstructionSubclassData((getSubclassDataFromInstruction() & ~(1 << 6)) |
-                               (xthread << 6));
+  /// Sets the synchronization scope on this store instruction.
+  void setSynchScope(SynchronizationScope SynchScope) {
+    this->SynchScope = SynchScope;
   }
 
+  /// Sets the ordering constraint and synchronization scope on this store
+  /// instruction.
   void setAtomic(AtomicOrdering Ordering,
                  SynchronizationScope SynchScope = CrossThread) {
     setOrdering(Ordering);
@@ -414,6 +415,11 @@
   void setInstructionSubclassData(unsigned short D) {
     Instruction::setInstructionSubclassData(D);
   }
+
+  /// The synchronization scope of this store instruction. Not quite enough room
+  /// in SubClassData for everything, so synchronization scope gets its own
+  /// field.
+  SynchronizationScope SynchScope;
 };
 
 template <>
@@ -453,28 +459,26 @@
 
   void *operator new(size_t, unsigned) = delete;
 
-  /// Returns the ordering effect of this fence.
+  /// Returns the ordering constraint of this fence instruction.
   AtomicOrdering getOrdering() const {
     return AtomicOrdering(getSubclassDataFromInstruction() >> 1);
   }
 
-  /// Set the ordering constraint on this fence. May only be Acquire, Release,
-  /// AcquireRelease, or SequentiallyConsistent.
+  /// Sets the ordering constraint on this fence instruction. May only be
+  /// Acquire, Release, AcquireRelease, or SequentiallyConsistent.
   void setOrdering(AtomicOrdering Ordering) {
     setInstructionSubclassData((getSubclassDataFromInstruction() & 1) |
                                ((unsigned)Ordering << 1));
   }
 
+  /// Returns the synchronization scope of this fence instruction.
   SynchronizationScope getSynchScope() const {
-    return SynchronizationScope(getSubclassDataFromInstruction() & 1);
+    return SynchScope;
   }
 
-  /// Specify whether this fence orders other operations with respect to all
-  /// concurrently executing threads, or only with respect to signal handlers
-  /// executing in the same thread.
-  void setSynchScope(SynchronizationScope xthread) {
-    setInstructionSubclassData((getSubclassDataFromInstruction() & ~1) |
-                               xthread);
+  /// Sets the synchronization scope on this fence instruction.
+  void setSynchScope(SynchronizationScope SynchScope) {
+    this->SynchScope = SynchScope;
  }
 
   // Methods for support type inquiry through isa, cast, and dyn_cast:
@@ -491,6 +495,11 @@
   void setInstructionSubclassData(unsigned short D) {
     Instruction::setInstructionSubclassData(D);
   }
+
+  /// The synchronization scope of this fence instruction. Not quite enough room
+  /// in SubClassData for everything, so synchronization scope gets its own
+  /// field.
+  SynchronizationScope SynchScope;
 };
 
 //===----------------------------------------------------------------------===//
@@ -558,7 +567,14 @@
   /// Transparently provide more efficient getOperand methods.
   DECLARE_TRANSPARENT_OPERAND_ACCESSORS(Value);
 
-  /// Set the ordering constraint on this cmpxchg.
+  /// Returns the ordering constraint of this cmpxchg instruction when store
+  /// occurs.
+  AtomicOrdering getSuccessOrdering() const {
+    return AtomicOrdering((getSubclassDataFromInstruction() >> 2) & 7);
+  }
+
+  /// Sets the ordering constraint on this cmpxchg instruction when store
+  /// occurs.
   void setSuccessOrdering(AtomicOrdering Ordering) {
     assert(Ordering != AtomicOrdering::NotAtomic &&
            "CmpXchg instructions can only be atomic.");
@@ -566,6 +582,14 @@
                                ((unsigned)Ordering << 2));
   }
 
+  /// Returns the ordering constraint of this cmpxchg instruction when store
+  /// does not occur.
+  AtomicOrdering getFailureOrdering() const {
+    return AtomicOrdering((getSubclassDataFromInstruction() >> 5) & 7);
+  }
+
+  /// Sets the ordering constraint on this cmpxchg instruction when store
+  /// does not occur.
   void setFailureOrdering(AtomicOrdering Ordering) {
     assert(Ordering != AtomicOrdering::NotAtomic &&
            "CmpXchg instructions can only be atomic.");
@@ -573,28 +597,14 @@
                                ((unsigned)Ordering << 5));
   }
 
-  /// Specify whether this cmpxchg is atomic and orders other operations with
-  /// respect to all concurrently executing threads, or only with respect to
-  /// signal handlers executing in the same thread.
-  void setSynchScope(SynchronizationScope SynchScope) {
-    setInstructionSubclassData((getSubclassDataFromInstruction() & ~2) |
-                               (SynchScope << 1));
-  }
-
-  /// Returns the ordering constraint on this cmpxchg.
-  AtomicOrdering getSuccessOrdering() const {
-    return AtomicOrdering((getSubclassDataFromInstruction() >> 2) & 7);
-  }
-
-  /// Returns the ordering constraint on this cmpxchg.
-  AtomicOrdering getFailureOrdering() const {
-    return AtomicOrdering((getSubclassDataFromInstruction() >> 5) & 7);
+  /// Returns the synchronization scope of this cmpxchg instruction.
+  SynchronizationScope getSynchScope() const {
+    return SynchScope;
   }
 
-  /// Returns whether this cmpxchg is atomic between threads or only within a
-  /// single thread.
-  SynchronizationScope getSynchScope() const {
-    return SynchronizationScope((getSubclassDataFromInstruction() & 2) >> 1);
+  /// Sets the synchronization scope on this cmpxchg instruction.
+  void setSynchScope(SynchronizationScope SynchScope) {
+    this->SynchScope = SynchScope;
   }
 
   Value *getPointerOperand() { return getOperand(0); }
@@ -649,6 +659,11 @@
   void setInstructionSubclassData(unsigned short D) {
     Instruction::setInstructionSubclassData(D);
   }
+
+  /// The synchronization scope of this cmpxchg instruction. Not quite enough
+  /// room in SubClassData for everything, so synchronization scope gets its own
+  /// field.
+  SynchronizationScope SynchScope;
 };
 
 template <>
@@ -747,7 +762,12 @@
   /// Transparently provide more efficient getOperand methods.
   DECLARE_TRANSPARENT_OPERAND_ACCESSORS(Value);
 
-  /// Set the ordering constraint on this RMW.
+  /// Returns the ordering constraint of this RMW instruction.
+  AtomicOrdering getOrdering() const {
+    return AtomicOrdering((getSubclassDataFromInstruction() >> 2) & 7);
+  }
+
+  /// Sets the ordering constraint on this RMW instruction.
   void setOrdering(AtomicOrdering Ordering) {
     assert(Ordering != AtomicOrdering::NotAtomic &&
            "atomicrmw instructions can only be atomic.");
@@ -755,25 +775,16 @@
                                ((unsigned)Ordering << 2));
   }
 
-  /// Specify whether this RMW orders other operations with respect to all
-  /// concurrently executing threads, or only with respect to signal handlers
-  /// executing in the same thread.
-  void setSynchScope(SynchronizationScope SynchScope) {
-    setInstructionSubclassData((getSubclassDataFromInstruction() & ~2) |
-                               (SynchScope << 1));
-  }
-
-  /// Returns the ordering constraint on this RMW.
-  AtomicOrdering getOrdering() const {
-    return AtomicOrdering((getSubclassDataFromInstruction() >> 2) & 7);
-  }
-
-  /// Returns whether this RMW is atomic between threads or only within a
-  /// single thread.
+  /// Returns the synchronization scope of this RMW instruction.
   SynchronizationScope getSynchScope() const {
-    return SynchronizationScope((getSubclassDataFromInstruction() & 2) >> 1);
+    return SynchScope;
   }
 
+  /// Sets the synchronization scope on this RMW instruction.
+  void setSynchScope(SynchronizationScope SynchScope) {
+    this->SynchScope = SynchScope;
+  }
+
   Value *getPointerOperand() { return getOperand(0); }
   const Value *getPointerOperand() const { return getOperand(0); }
   static unsigned getPointerOperandIndex() { return 0U; }
@@ -803,6 +814,11 @@
   void setInstructionSubclassData(unsigned short D) {
     Instruction::setInstructionSubclassData(D);
   }
+
+  /// The synchronization scope of this RMW instruction. Not quite enough room
+  /// in SubClassData for everything, so synchronization scope gets its own
+  /// field.
+  SynchronizationScope SynchScope;
 };
 
 template <>
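For illustration only (not part of the patch): with the accessors above the
scope checks in existing passes stay source-compatible; only the storage moved
out of SubclassData into a dedicated field. A hedged sketch of a small helper a
pass might write against the new enum (the helper name is hypothetical):

  // Sketch only -- assumes this patch; isTargetScoped is not an API added by
  // the change, just an example consumer of the new accessors.
  #include "llvm/IR/Instructions.h"
  #include "llvm/IR/SyncScope.h"

  using namespace llvm;

  // Returns true if I is an atomic load or store whose synchronization scope is
  // one of the target-specific scopes (FirstTargetSS and above).
  static bool isTargetScoped(const Instruction &I) {
    if (auto *LI = dyn_cast<LoadInst>(&I))
      return LI->isAtomic() && LI->getSynchScope() >= FirstTargetSS;
    if (auto *SI = dyn_cast<StoreInst>(&I))
      return SI->isAtomic() && SI->getSynchScope() >= FirstTargetSS;
    return false;
  }
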
Index: include/llvm/IR/SyncScope.h
===================================================================
--- /dev/null
+++ include/llvm/IR/SyncScope.h
@@ -0,0 +1,56 @@
+//===-- llvm/SyncScope.h - LLVM Synchronization Scopes ----------*- C++ -*-===//
+//
+//                     The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+//
+// This file defines LLVM's set of synchronization scopes.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_IR_SYNCSCOPE_H
+#define LLVM_IR_SYNCSCOPE_H
+
+#include <cstdint>
+
+namespace llvm {
+
+/// Predefined synchronization scopes.
+enum SynchronizationScope : uint8_t {
+  /// Synchronized with respect to signal handlers executing in the same thread.
+  SingleThread = 0,
+
+  /// Synchronized with respect to all concurrently executing threads.
+  CrossThread = 1,
+
+  /// First target-specific synchronization scope.
+  FirstTargetSS = 2,
+
+  /// AMDGPU_AGENT - Synchronized with respect to the agent, which includes all
+  /// work-items on the same agent executing kernel dispatches for the same
+  /// application process as the executing work-item. Only supported for the
+  /// global segment.
+  AMDGPU_AGENT = 2,
+
+  /// AMDGPU_WORKGROUP - Synchronized with respect to the work-group, which
+  /// includes all work-items in the same work-group as the executing work-item.
+  AMDGPU_WORKGROUP = 3,
+
+  /// AMDGPU_WAVEFRONT - Synchronized with respect to the wavefront, which
+  /// includes all work-items in the same wavefront as the executing work-item.
+  AMDGPU_WAVEFRONT = 4,
+
+  /// AMDGPU_IMAGE - Synchronized with respect to image fence instructions
+  /// executing in the same work-item.
+  AMDGPU_IMAGE = 5,
+
+  /// The highest possible synchronization scope ID.
+  MaxSS = 0xFF
+};
+
+} // end namespace llvm
+
+#endif // LLVM_IR_SYNCSCOPE_H
Index: lib/AsmParser/LLLexer.cpp
===================================================================
--- lib/AsmParser/LLLexer.cpp
+++ lib/AsmParser/LLLexer.cpp
@@ -542,7 +542,14 @@
   KEYWORD(release);
   KEYWORD(acq_rel);
   KEYWORD(seq_cst);
+
+  // Synchronization scopes:
   KEYWORD(singlethread);
+  KEYWORD(syncscope);
+  KEYWORD(amdgpu_agent);
+  KEYWORD(amdgpu_workgroup);
+  KEYWORD(amdgpu_wavefront);
+  KEYWORD(amdgpu_image);
 
   KEYWORD(nnan);
   KEYWORD(ninf);
Index: lib/AsmParser/LLParser.h
===================================================================
--- lib/AsmParser/LLParser.h
+++ lib/AsmParser/LLParser.h
@@ -243,6 +243,8 @@
     bool ParseOptionalDerefAttrBytes(lltok::Kind AttrKind, uint64_t &Bytes);
     bool ParseScopeAndOrdering(bool isAtomic, SynchronizationScope &Scope,
                                AtomicOrdering &Ordering);
+    bool ParseScope(SynchronizationScope &Scope);
+    bool ParseTargetScope(SynchronizationScope &Scope);
     bool ParseOrdering(AtomicOrdering &Ordering);
     bool ParseOptionalStackAlignment(unsigned &Alignment);
     bool ParseOptionalCommaAlign(unsigned &Alignment, bool &AteExtraComma);
Index: lib/AsmParser/LLParser.cpp
===================================================================
--- lib/AsmParser/LLParser.cpp
+++ lib/AsmParser/LLParser.cpp
@@ -17,6 +17,7 @@
 #include "llvm/ADT/Optional.h"
 #include "llvm/ADT/SmallPtrSet.h"
 #include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/StringSwitch.h"
 #include "llvm/AsmParser/SlotMapping.h"
 #include "llvm/IR/Argument.h"
 #include "llvm/IR/AutoUpgrade.h"
@@ -1882,8 +1883,11 @@
 }
 
 /// ParseScopeAndOrdering
-///   if isAtomic: ::= 'singlethread'? AtomicOrdering
-///   else: ::=
+///   if isAtomic:
+///     ::= 'singlethread'? AtomicOrdering
+///     ::= 'syncscope' '(' ')'? AtomicOrdering
+///   else
+///     ::=
 ///
 /// This sets Scope and Ordering to the parsed values.
 bool LLParser::ParseScopeAndOrdering(bool isAtomic, SynchronizationScope &Scope,
@@ -1891,11 +1895,62 @@
   if (!isAtomic)
     return false;
 
+  return ParseScope(Scope) || ParseOrdering(Ordering);
+}
+
+/// ParseScope
+///   ::= /* empty */
+///   ::= 'singlethread'
+///   ::= 'syncscope' '(' ')'
+///
+/// This sets Scope to the parsed value.
+bool LLParser::ParseScope(SynchronizationScope &Scope) {
+  if (EatIfPresent(lltok::kw_syncscope))
+    return ParseTargetScope(Scope);
+
   Scope = CrossThread;
   if (EatIfPresent(lltok::kw_singlethread))
     Scope = SingleThread;
 
-  return ParseOrdering(Ordering);
+  return false;
+}
+
+/// ParseTargetScope
+///   ::= 'amdgpu_agent'
+///   ::= 'amdgpu_workgroup'
+///   ::= 'amdgpu_wavefront'
+///   ::= 'amdgpu_image'
+///
+/// This sets Scope to the target-specific synchronization scope.
+bool LLParser::ParseTargetScope(SynchronizationScope &Scope) {
+  auto StartParenAt = Lex.getLoc();
+  if (!EatIfPresent(lltok::lparen))
+    return Error(StartParenAt, "Expected '(' in syncscope");
+
+  auto TargetScopeAt = Lex.getLoc();
+  switch (Lex.getKind()) {
+  case lltok::kw_amdgpu_agent:
+    Scope = AMDGPU_AGENT;
+    break;
+  case lltok::kw_amdgpu_workgroup:
+    Scope = AMDGPU_WORKGROUP;
+    break;
+  case lltok::kw_amdgpu_wavefront:
+    Scope = AMDGPU_WAVEFRONT;
+    break;
+  case lltok::kw_amdgpu_image:
+    Scope = AMDGPU_IMAGE;
+    break;
+  default:
+    return Error(TargetScopeAt, "Invalid target syncscope");
+  }
+  Lex.Lex();
+
+  auto EndParenAt = Lex.getLoc();
+  if (!EatIfPresent(lltok::rparen))
+    return Error(EndParenAt, "Expected ')' in syncscope");
+
+  return false;
 }
 
 /// ParseOrdering
Index: lib/AsmParser/LLToken.h
===================================================================
--- lib/AsmParser/LLToken.h
+++ lib/AsmParser/LLToken.h
@@ -93,7 +93,15 @@
   kw_release,
   kw_acq_rel,
   kw_seq_cst,
+
+  // Synchronization scopes:
   kw_singlethread,
+  kw_syncscope,
+  kw_amdgpu_agent,
+  kw_amdgpu_workgroup,
+  kw_amdgpu_wavefront,
+  kw_amdgpu_image,
+
   kw_nnan,
   kw_ninf,
   kw_nsz,
Index: lib/Bitcode/Reader/BitcodeReader.cpp
===================================================================
--- lib/Bitcode/Reader/BitcodeReader.cpp
+++ lib/Bitcode/Reader/BitcodeReader.cpp
@@ -936,9 +936,15 @@
 }
 
 static SynchronizationScope getDecodedSynchScope(unsigned Val) {
+  if (Val >= bitc::SYNCHSCOPE_FIRSTTARGET) {
+    assert(Val == uint8_t(Val) && "expected 8-bit integer (too large)");
+    return SynchronizationScope(
+        FirstTargetSS + (Val - bitc::SYNCHSCOPE_FIRSTTARGET));
+  }
+
   switch (Val) {
+  default: llvm_unreachable("Invalid syncscope");
   case bitc::SYNCHSCOPE_SINGLETHREAD: return SingleThread;
-  default: // Map unknown scopes to cross-thread.
   case bitc::SYNCHSCOPE_CROSSTHREAD: return CrossThread;
   }
 }
Index: lib/Bitcode/Writer/BitcodeWriter.cpp
===================================================================
--- lib/Bitcode/Writer/BitcodeWriter.cpp
+++ lib/Bitcode/Writer/BitcodeWriter.cpp
@@ -580,11 +580,16 @@
 }
 
 static unsigned getEncodedSynchScope(SynchronizationScope SynchScope) {
+  if (SynchScope >= FirstTargetSS) {
+    return unsigned(
+        bitc::SYNCHSCOPE_FIRSTTARGET + (SynchScope - FirstTargetSS));
+  }
+
   switch (SynchScope) {
+  default: llvm_unreachable("Invalid syncscope");
   case SingleThread: return bitc::SYNCHSCOPE_SINGLETHREAD;
   case CrossThread: return bitc::SYNCHSCOPE_CROSSTHREAD;
   }
-  llvm_unreachable("Invalid synch scope");
 }
 
 static void writeStringRecord(BitstreamWriter &Stream, unsigned Code,
Index: lib/CodeGen/SelectionDAG/SelectionDAG.cpp
===================================================================
--- lib/CodeGen/SelectionDAG/SelectionDAG.cpp
+++ lib/CodeGen/SelectionDAG/SelectionDAG.cpp
@@ -7241,7 +7241,8 @@
 
 MemSDNode::MemSDNode(unsigned Opc, unsigned Order, const DebugLoc &dl,
                      SDVTList VTs, EVT memvt, MachineMemOperand *mmo)
-    : SDNode(Opc, Order, dl, VTs), MemoryVT(memvt), MMO(mmo) {
+    : SDNode(Opc, Order, dl, VTs), MemoryVT(memvt), MMO(mmo),
+      SynchScope(CrossThread) {
   MemSDNodeBits.IsVolatile = MMO->isVolatile();
   MemSDNodeBits.IsNonTemporal = MMO->isNonTemporal();
   MemSDNodeBits.IsDereferenceable = MMO->isDereferenceable();
Index: lib/IR/AsmWriter.cpp
===================================================================
--- lib/IR/AsmWriter.cpp
+++ lib/IR/AsmWriter.cpp
@@ -2097,6 +2097,8 @@
   void writeAtomicCmpXchg(AtomicOrdering SuccessOrdering,
                           AtomicOrdering FailureOrdering,
                           SynchronizationScope SynchScope);
+  void writeSynchScope(SynchronizationScope SynchScope);
+  void writeTargetSynchScope(SynchronizationScope SynchScope);
 
   void writeAllMDNodes();
   void writeMDNode(unsigned Slot, const MDNode *Node);
@@ -2162,11 +2164,7 @@
   if (Ordering == AtomicOrdering::NotAtomic)
     return;
 
-  switch (SynchScope) {
-  case SingleThread: Out << " singlethread"; break;
-  case CrossThread: break;
-  }
-
+  writeSynchScope(SynchScope);
   Out << " " << toIRString(Ordering);
 }
 
@@ -2176,15 +2174,50 @@
   assert(SuccessOrdering != AtomicOrdering::NotAtomic &&
          FailureOrdering != AtomicOrdering::NotAtomic);
 
-  switch (SynchScope) {
-  case SingleThread: Out << " singlethread"; break;
-  case CrossThread: break;
-  }
-
+  writeSynchScope(SynchScope);
   Out << " " << toIRString(SuccessOrdering);
   Out << " " << toIRString(FailureOrdering);
 }
 
+void AssemblyWriter::writeSynchScope(SynchronizationScope SynchScope) {
+  if (SynchScope >= FirstTargetSS) {
+    writeTargetSynchScope(SynchScope);
+  } else {
+    switch (SynchScope) {
+    case SingleThread:
+      Out << " singlethread";
+      break;
+    case CrossThread:
+      break;
+    default:
+      llvm_unreachable("Invalid syncscope");
+    }
+  }
+}
+
+void AssemblyWriter::writeTargetSynchScope(SynchronizationScope SynchScope) {
+  assert(SynchScope >= FirstTargetSS);
+
+  Out << " syncscope(";
+  switch (SynchScope) {
+  case AMDGPU_AGENT:
+    Out << "amdgpu_agent";
+    break;
+  case AMDGPU_WORKGROUP:
+    Out << "amdgpu_workgroup";
+    break;
+  case AMDGPU_WAVEFRONT:
+    Out << "amdgpu_wavefront";
+    break;
+  case AMDGPU_IMAGE:
+    Out << "amdgpu_image";
+    break;
+  default:
+    llvm_unreachable("Invalid target syncscope");
+  }
+  Out << ")";
+}
+
 void AssemblyWriter::writeParamOperand(const Value *Operand,
                                        AttributeList Attrs, unsigned Idx) {
   if (!Operand) {
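For illustration only (not part of the patch): one way to sanity-check the
lexer/parser/printer changes above outside of lit is a small round trip through
parseAssemblyString. The module text and program name here are assumptions made
for the example.

  // Sketch only -- assumes the LLLexer/LLParser/AsmWriter changes in this patch.
  #include "llvm/AsmParser/Parser.h"
  #include "llvm/IR/LLVMContext.h"
  #include "llvm/IR/Module.h"
  #include "llvm/Support/SourceMgr.h"
  #include "llvm/Support/raw_ostream.h"
  #include <memory>

  using namespace llvm;

  int main() {
    LLVMContext Ctx;
    SMDiagnostic Err;
    // Hypothetical input exercising the new scope syntax.
    std::unique_ptr<Module> M = parseAssemblyString(
        "define void @f(i32* %p) {\n"
        "  fence syncscope(amdgpu_workgroup) seq_cst\n"
        "  store atomic i32 0, i32* %p syncscope(amdgpu_agent) release, align 4\n"
        "  ret void\n"
        "}\n",
        Err, Ctx);
    if (!M) {
      Err.print("syncscope-roundtrip", errs());
      return 1;
    }
    // The printed module should carry the same syncscope(...) annotations.
    M->print(outs(), nullptr);
    return 0;
  }
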
Index: lib/Target/SystemZ/SystemZISelLowering.cpp
===================================================================
--- lib/Target/SystemZ/SystemZISelLowering.cpp
+++ lib/Target/SystemZ/SystemZISelLowering.cpp
@@ -3199,7 +3199,7 @@
   // The only fence that needs an instruction is a sequentially-consistent
   // cross-thread fence.
   if (FenceOrdering == AtomicOrdering::SequentiallyConsistent &&
-      FenceScope == CrossThread) {
+      FenceScope != SingleThread) {
     return SDValue(DAG.getMachineNode(SystemZ::Serialize, DL, MVT::Other,
                                       Op.getOperand(0)),
                    0);
Index: lib/Target/X86/X86ISelLowering.cpp
===================================================================
--- lib/Target/X86/X86ISelLowering.cpp
+++ lib/Target/X86/X86ISelLowering.cpp
@@ -22761,7 +22761,7 @@
   // The only fence that needs an instruction is a sequentially-consistent
   // cross-thread fence.
   if (FenceOrdering == AtomicOrdering::SequentiallyConsistent &&
-      FenceScope == CrossThread) {
+      FenceScope != SingleThread) {
     if (Subtarget.hasMFence())
       return DAG.getNode(X86ISD::MFENCE, dl, MVT::Other, Op.getOperand(0));
 
Index: lib/Transforms/Instrumentation/ThreadSanitizer.cpp
===================================================================
--- lib/Transforms/Instrumentation/ThreadSanitizer.cpp
+++ lib/Transforms/Instrumentation/ThreadSanitizer.cpp
@@ -379,9 +379,9 @@
 
 static bool isAtomic(Instruction *I) {
   if (LoadInst *LI = dyn_cast<LoadInst>(I))
-    return LI->isAtomic() && LI->getSynchScope() == CrossThread;
+    return LI->isAtomic() && LI->getSynchScope() != SingleThread;
   if (StoreInst *SI = dyn_cast<StoreInst>(I))
-    return SI->isAtomic() && SI->getSynchScope() == CrossThread;
+    return SI->isAtomic() && SI->getSynchScope() != SingleThread;
   if (isa<AtomicRMWInst>(I))
     return true;
   if (isa<AtomicCmpXchgInst>(I))
Index: test/Assembler/atomic.ll
===================================================================
--- test/Assembler/atomic.ll
+++ test/Assembler/atomic.ll
@@ -7,10 +7,14 @@
   load atomic i32, i32* %x unordered, align 4
   ; CHECK: load atomic volatile i32, i32* %x singlethread acquire, align 4
   load atomic volatile i32, i32* %x singlethread acquire, align 4
+  ; CHECK: load atomic volatile i32, i32* %x syncscope(amdgpu_agent) acquire, align 4
+  load atomic volatile i32, i32* %x syncscope(amdgpu_agent) acquire, align 4
   ; CHECK: store atomic i32 3, i32* %x release, align 4
   store atomic i32 3, i32* %x release, align 4
   ; CHECK: store atomic volatile i32 3, i32* %x singlethread monotonic, align 4
   store atomic volatile i32 3, i32* %x singlethread monotonic, align 4
+  ; CHECK: store atomic volatile i32 3, i32* %x syncscope(amdgpu_workgroup) monotonic, align 4
+  store atomic volatile i32 3, i32* %x syncscope(amdgpu_workgroup) monotonic, align 4
   ; CHECK: cmpxchg i32* %x, i32 1, i32 0 singlethread monotonic monotonic
   cmpxchg i32* %x, i32 1, i32 0 singlethread monotonic monotonic
   ; CHECK: cmpxchg volatile i32* %x, i32 0, i32 1 acq_rel acquire
@@ -19,13 +23,19 @@
   cmpxchg i32* %x, i32 42, i32 0 acq_rel monotonic
   ; CHECK: cmpxchg weak i32* %x, i32 13, i32 0 seq_cst monotonic
   cmpxchg weak i32* %x, i32 13, i32 0 seq_cst monotonic
+  ; CHECK: cmpxchg weak i32* %x, i32 13, i32 0 syncscope(amdgpu_wavefront) seq_cst monotonic
+  cmpxchg weak i32* %x, i32 13, i32 0 syncscope(amdgpu_wavefront) seq_cst monotonic
   ; CHECK: atomicrmw add i32* %x, i32 10 seq_cst
   atomicrmw add i32* %x, i32 10 seq_cst
   ; CHECK: atomicrmw volatile xchg i32* %x, i32 10 monotonic
   atomicrmw volatile xchg i32* %x, i32 10 monotonic
+  ; CHECK: atomicrmw volatile xchg i32* %x, i32 10 syncscope(amdgpu_image) monotonic
+  atomicrmw volatile xchg i32* %x, i32 10 syncscope(amdgpu_image) monotonic
   ; CHECK: fence singlethread release
   fence singlethread release
   ; CHECK: fence seq_cst
   fence seq_cst
+  ; CHECK: fence syncscope(amdgpu_image) seq_cst
+  fence syncscope(amdgpu_image) seq_cst
   ret void
 }
Index: test/Bitcode/atomic-no-syncscope.ll
===================================================================
--- /dev/null
+++ test/Bitcode/atomic-no-syncscope.ll
@@ -0,0 +1,14 @@
+; RUN: llvm-dis -o - %s.bc | FileCheck %s
+
+; CHECK: load atomic i32, i32* %x unordered, align 4
+; CHECK: load atomic volatile i32, i32* %x singlethread acquire, align 4
+; CHECK: store atomic i32 3, i32* %x release, align 4
+; CHECK: store atomic volatile i32 3, i32* %x singlethread monotonic, align 4
+; CHECK: cmpxchg i32* %x, i32 1, i32 0 singlethread monotonic monotonic
+; CHECK: cmpxchg volatile i32* %x, i32 0, i32 1 acq_rel acquire
+; CHECK: cmpxchg i32* %x, i32 42, i32 0 acq_rel monotonic
+; CHECK: cmpxchg weak i32* %x, i32 13, i32 0 seq_cst monotonic
+; CHECK: atomicrmw add i32* %x, i32 10 seq_cst
+; CHECK: atomicrmw volatile xchg i32* %x, i32 10 monotonic
+; CHECK: fence singlethread release
+; CHECK: fence seq_cst
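For illustration only (not part of the patch): the SystemZ and X86 changes
above switch the lowering test from ``FenceScope == CrossThread`` to
``FenceScope != SingleThread``, so any target-specific scope conservatively
still gets a full hardware fence. A hedged sketch of that decision as a
standalone predicate (the helper name is hypothetical):

  // Sketch only -- mirrors the lowering checks changed in this patch; the
  // helper itself is hypothetical and not part of the change.
  #include "llvm/IR/SyncScope.h"
  #include "llvm/Support/AtomicOrdering.h"

  using namespace llvm;

  // A seq_cst fence needs a hardware fence unless it is provably confined to a
  // single thread; unknown or target-specific scopes are treated conservatively.
  static bool needsHardwareFence(AtomicOrdering Ordering,
                                 SynchronizationScope Scope) {
    return Ordering == AtomicOrdering::SequentiallyConsistent &&
           Scope != SingleThread;
  }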