Index: docs/LangRef.rst
===================================================================
--- docs/LangRef.rst
+++ docs/LangRef.rst
@@ -5118,7 +5118,7 @@
 conjunction with ``llvm.loop`` loop identification metadata. The
 ``llvm.loop.vectorize`` and ``llvm.loop.interleave`` metadata are only
 optimization hints and the optimizer will only interleave and vectorize loops if
-it believes it is safe to do so. The ``llvm.mem.parallel_loop_access`` metadata
+it believes it is safe to do so. The ``llvm.loop.parallel_accesses`` metadata
 which contains information about loop-carried memory dependencies can be helpful
 in determining the safety of these transformations.
 
@@ -5320,89 +5320,97 @@
 This metadata should be used in conjunction with ``llvm.loop`` loop
 identification metadata.
 
-'``llvm.mem``'
-^^^^^^^^^^^^^^^
-
-Metadata types used to annotate memory accesses with information helpful
-for optimizations are prefixed with ``llvm.mem``.
-
-'``llvm.mem.parallel_loop_access``' Metadata
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+'``llvm.loop.parallel_accesses``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-The ``llvm.mem.parallel_loop_access`` metadata refers to a loop identifier,
-or metadata containing a list of loop identifiers for nested loops.
-The metadata is attached to memory accessing instructions and denotes that
-no loop carried memory dependence exist between it and other instructions denoted
-with the same loop identifier. The metadata on memory reads also implies that
+The ``llvm.loop.parallel_accesses`` metadata refers to one or more
+access group metadata nodes (see ``llvm.access.group``). It denotes that
+no loop carried memory dependence exists between the instructions in the
+loop that belong to these access groups. For memory reads, it also implies that
 if conversion (i.e. speculative execution within a loop iteration) is safe.
 
 Precisely, given two instructions ``m1`` and ``m2`` that both have the
-``llvm.mem.parallel_loop_access`` metadata, with ``L1`` and ``L2`` being the
-set of loops associated with that metadata, respectively, then there is no loop
-carried dependence between ``m1`` and ``m2`` for loops in both ``L1`` and
-``L2``.
+``llvm.access.group`` metadata, referring to the access groups ``g1`` and
+``g2`` respectively (which might be identical), if a loop has both access
+groups in its ``llvm.loop.parallel_accesses`` metadata, then the compiler
+can assume that there is no dependency between ``m1`` and ``m2`` carried
+by this loop.
 
 As a special case, if all memory accessing instructions in a loop have
-``llvm.mem.parallel_loop_access`` metadata that refers to that loop, then the
-loop has no loop carried memory dependences and is considered to be a parallel
-loop.
+``llvm.access.group`` metadata referring to access groups listed in the
+loop's ``llvm.loop.parallel_accesses`` metadata, then the loop has no loop
+carried memory dependences and is considered to be a parallel loop.
 
-Note that if not all memory access instructions have such metadata referring to
-the loop, then the loop is considered not being trivially parallel. Additional
+Note that if not all memory access instructions belong to an access
+group referred to by ``llvm.loop.parallel_accesses``, then the loop must
+not be considered trivially parallel. Additional
 memory dependence analysis is required to make that determination. As a fail
 safe mechanism, this causes loops that were originally parallel to be considered
 sequential (if optimization passes that are unaware of the parallel semantics
 insert new memory instructions into the loop body).
 
 Example of a loop that is considered parallel due to its correct use of
-both ``llvm.loop`` and ``llvm.mem.parallel_loop_access``
-metadata types that refer to the same loop identifier metadata.
+both ``llvm.access.group`` and ``llvm.loop.parallel_accesses``
+metadata types.
 
 .. code-block:: llvm
 
    for.body:
      ...
-     %val0 = load i32, i32* %arrayidx, !llvm.mem.parallel_loop_access !0
+     %val0 = load i32, i32* %arrayidx, !llvm.access.group !1
      ...
-     store i32 %val0, i32* %arrayidx1, !llvm.mem.parallel_loop_access !0
+     store i32 %val0, i32* %arrayidx1, !llvm.access.group !1
      ...
      br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0
 
    for.end:
    ...
-   !0 = !{!0}
+   !0 = !{!0, !{!"llvm.loop.parallel_accesses", !1}}
+   !1 = distinct !{}
 
-It is also possible to have nested parallel loops. In that case the
-memory accesses refer to a list of loop identifier metadata nodes instead of
-the loop identifier metadata node directly:
+It is also possible to have nested parallel loops:
 
 .. code-block:: llvm
 
    outer.for.body:
      ...
-     %val1 = load i32, i32* %arrayidx3, !llvm.mem.parallel_loop_access !2
+     %val1 = load i32, i32* %arrayidx3, !llvm.access.group !4
      ...
      br label %inner.for.body
 
    inner.for.body:
      ...
-     %val0 = load i32, i32* %arrayidx1, !llvm.mem.parallel_loop_access !0
+     %val0 = load i32, i32* %arrayidx1, !llvm.access.group !3
      ...
-     store i32 %val0, i32* %arrayidx2, !llvm.mem.parallel_loop_access !0
+     store i32 %val0, i32* %arrayidx2, !llvm.access.group !3
      ...
      br i1 %exitcond, label %inner.for.end, label %inner.for.body, !llvm.loop !1
 
    inner.for.end:
      ...
-     store i32 %val1, i32* %arrayidx4, !llvm.mem.parallel_loop_access !2
+     store i32 %val1, i32* %arrayidx4, !llvm.access.group !4
      ...
      br i1 %exitcond, label %outer.for.end, label %outer.for.body, !llvm.loop !2
 
    outer.for.end:                                          ; preds = %for.body
    ...
-   !0 = !{!1, !2} ; a list of loop identifiers
-   !1 = !{!1} ; an identifier for the inner loop
-   !2 = !{!2} ; an identifier for the outer loop
+   !1 = !{!1, !{!"llvm.loop.parallel_accesses", !3}}     ; an identifier for the inner loop
+   !2 = !{!2, !{!"llvm.loop.parallel_accesses", !3, !4}} ; an identifier for the outer loop
+   !3 = distinct !{} ; access group for instructions in the inner loop
+   !4 = distinct !{} ; access group for instructions in the outer, but not the inner loop
+
+'``llvm.access.group``' Metadata
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``llvm.access.group`` metadata can be attached to any instruction that
+potentially accesses memory. It points to a single distinct metadata
+node without operands, which we call an access group. This node represents
+all memory access instructions referring to it via ``llvm.access.group``.
+
+The access group can be used to refer to a memory access instruction
+without pointing to it directly (which is not possible in global
+metadata). Currently, the only metadata making use of it is
+``llvm.loop.parallel_accesses``.
 
 '``irr_loop``' Metadata
 ^^^^^^^^^^^^^^^^^^^^^^^
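The pairwise rule in the new ``llvm.loop.parallel_accesses`` text can be sketched in C++ as below. This is a minimal, hypothetical model, not LLVM code. It assumes that access groups are plain integers rather than distinct `MDNode`s and that a loop is reduced to the set of groups its `llvm.loop.parallel_accesses` metadata lists. The names `ParallelLoop`, `MemAccess` and `noCarriedDependence` are invented for illustration.

```cpp
#include <cassert>
#include <optional>
#include <set>

// Hypothetical model, not LLVM code: an access group is an integer ID
// instead of a distinct MDNode without operands.
using AccessGroup = int;

// Hypothetical stand-in for a loop: only the access groups listed in its
// llvm.loop.parallel_accesses metadata are kept.
struct ParallelLoop {
  std::set<AccessGroup> ParallelAccesses;
};

// Hypothetical stand-in for a memory access instruction.
struct MemAccess {
  std::optional<AccessGroup> Group; // its !llvm.access.group, if any
};

// Returns true if the compiler may assume that no dependency between M1 and
// M2 is carried by L: both must belong to groups that L lists.
bool noCarriedDependence(const ParallelLoop &L, const MemAccess &M1,
                         const MemAccess &M2) {
  if (!M1.Group || !M2.Group)
    return false; // an access without a group gives no guarantee
  return L.ParallelAccesses.count(*M1.Group) != 0 &&
         L.ParallelAccesses.count(*M2.Group) != 0;
}
```

Consider the nested LangRef example, with group `!3` for the inner loop body and `!4` for the rest of the outer body. The inner loop lists only group 3, so it makes no claim about pairs that involve group 4. The outer loop lists both groups, so it covers every pair.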
Index: include/llvm/IR/LLVMContext.h
===================================================================
--- include/llvm/IR/LLVMContext.h
+++ include/llvm/IR/LLVMContext.h
@@ -102,6 +102,7 @@
     MD_associated = 22,               // "associated"
     MD_callees = 23,                  // "callees"
     MD_irr_loop = 24,                 // "irr_loop"
+    MD_access_group = 25,             // "llvm.access.group"
   };
 
   /// Known operand bundle tag IDs, which always have the same value.  All
Index: include/llvm/Transforms/Utils/LoopUtils.h
===================================================================
--- include/llvm/Transforms/Utils/LoopUtils.h
+++ include/llvm/Transforms/Utils/LoopUtils.h
@@ -162,12 +162,19 @@
 /// Returns the instructions that use values defined in the loop.
 SmallVector<Instruction *, 8> findDefsUsedOutsideOfLoop(Loop *L);
 
+/// Find string metadata for a loop.
+///
+/// Returns the MDNode where the first operand is the metadata's name. The
+/// following operands are the metadata's values. If no metadata with @p Name is
+/// found, return nullptr.
+MDNode *findOptionMDForLoop(const Loop *TheLoop, StringRef Name);
+
 /// Find string metadata for loop
 ///
 /// If it has a value (e.g. {"llvm.distribute", 1} return the value as an
 /// operand or null otherwise.  If the string metadata is not found return
 /// Optional's not-a-value.
-Optional<const MDOperand *> findStringMetadataForLoop(Loop *TheLoop,
+Optional<const MDOperand *> findStringMetadataForLoop(const Loop *TheLoop,
                                                       StringRef Name);
 
 /// Set input string into loop metadata by keeping other values intact.
Index: lib/Analysis/LoopInfo.cpp
===================================================================
--- lib/Analysis/LoopInfo.cpp
+++ lib/Analysis/LoopInfo.cpp
@@ -34,6 +34,7 @@
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/raw_ostream.h"
+#include "llvm/Transforms/Utils/LoopUtils.h"
 #include <algorithm>
 using namespace llvm;
 
@@ -291,12 +292,52 @@
   setLoopID(NewLoopID);
 }
 
+static MDNode *findOptionMDForLoopID(MDNode *LoopID, StringRef Name) {
+  // Return nullptr if the loop has no loop ID.
+  if (!LoopID)
+    return nullptr;
+
+  // First operand should refer to the loop id itself.
+  assert(LoopID->getNumOperands() > 0 && "requires at least one operand");
+  assert(LoopID->getOperand(0) == LoopID && "invalid loop id");
+
+  // Iterate over LoopID operands and look for MDString Metadata
+  for (unsigned i = 1, e = LoopID->getNumOperands(); i < e; ++i) {
+    MDNode *MD = dyn_cast<MDNode>(LoopID->getOperand(i));
+    if (!MD)
+      continue;
+    MDString *S = dyn_cast<MDString>(MD->getOperand(0));
+    if (!S)
+      continue;
+    // Return the option node if its name matches the requested one.
+    if (Name.equals(S->getString()))
+      return MD;
+  }
+  return nullptr;
+}
+
+MDNode *llvm::findOptionMDForLoop(const Loop *TheLoop, StringRef Name) {
+  return findOptionMDForLoopID(TheLoop->getLoopID(), Name);
+}
+
 bool Loop::isAnnotatedParallel() const {
   MDNode *DesiredLoopIdMetadata = getLoopID();
 
   if (!DesiredLoopIdMetadata)
     return false;
 
+  MDNode *ParallelAccesses =
+      findOptionMDForLoop(this, "llvm.loop.parallel_accesses");
+  SmallPtrSet<MDNode *, 4> ParallelAccessGroups;
+  if (ParallelAccesses) {
+    for (auto &MD : drop_begin(ParallelAccesses->operands(), 1)) {
+      MDNode *AccGroup = cast<MDNode>(MD.get());
+      assert(AccGroup->isDistinct() && AccGroup->getNumOperands() == 0 &&
+             "Access groups must be a distinct node without operands.");
+      ParallelAccessGroups.insert(AccGroup);
+    }
+  }
+
   // The loop branch contains the parallel loop metadata. In order to ensure
   // that any parallel-loop-unaware optimization pass hasn't added loop-carried
   // dependencies (thus converted the loop back to a sequential loop), check
@@ -307,6 +348,11 @@
       if (!I.mayReadOrWriteMemory())
         continue;
 
+      if (auto *AccessGroup = I.getMetadata(LLVMContext::MD_access_group)) {
+        if (ParallelAccessGroups.count(AccessGroup))
+          continue;
+      }
+
       // The memory instruction can refer to the loop identifier metadata
       // directly or indirectly through another list metadata (in case of
       // nested parallel loops). The loop identifier metadata refers to
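The check added to `Loop::isAnnotatedParallel` can be modeled roughly as follows. A memory-accessing instruction is accepted if the loop lists its `!llvm.access.group`. Otherwise the check falls back to the legacy `llvm.mem.parallel_loop_access` test. This is a simplified sketch under stated assumptions: groups and loop IDs are integers, and the legacy metadata is flattened to a list of loop IDs. `Inst`, `AnnotatedLoop` and this free `isAnnotatedParallel` are hypothetical stand-ins for the LLVM classes.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <set>
#include <vector>

// Hypothetical stand-in for an instruction in the loop body.
struct Inst {
  bool MayReadOrWriteMemory;
  std::optional<int> AccessGroup;       // !llvm.access.group, if any
  std::vector<int> LegacyParallelLoops; // loop IDs from the legacy
                                        // !llvm.mem.parallel_loop_access
};

// Hypothetical stand-in for a loop that has a loop ID.
struct AnnotatedLoop {
  int LoopID;
  std::set<int> ParallelAccessGroups; // from llvm.loop.parallel_accesses
  std::vector<Inst> Body;
};

// Every memory access must be vouched for, either by an access group the
// loop lists or by legacy metadata naming the loop.
bool isAnnotatedParallel(const AnnotatedLoop &L) {
  for (const Inst &I : L.Body) {
    if (!I.MayReadOrWriteMemory)
      continue;
    // New scheme: the loop lists the instruction's access group.
    if (I.AccessGroup && L.ParallelAccessGroups.count(*I.AccessGroup))
      continue;
    // Legacy scheme: the instruction names this loop's ID (directly or via
    // a list in the real IR, flattened into one vector here).
    const std::vector<int> &IDs = I.LegacyParallelLoops;
    if (std::find(IDs.begin(), IDs.end(), L.LoopID) == IDs.end())
      return false;
  }
  return true;
}
```

A pass that does not know about access groups might add a memory instruction to the loop. That instruction carries neither annotation, so the whole loop stops being reported as parallel. This is the fail-safe behavior described in the LangRef text.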
Index: lib/IR/LLVMContext.cpp
===================================================================
--- lib/IR/LLVMContext.cpp
+++ lib/IR/LLVMContext.cpp
@@ -61,6 +61,7 @@
     {MD_associated, "associated"},
     {MD_callees, "callees"},
     {MD_irr_loop, "irr_loop"},
+    {MD_access_group, "llvm.access.group"},
   };
 
   for (auto &MDKind : MDKinds) {
Index: lib/Transforms/InstCombine/InstCombineCalls.cpp
===================================================================
--- lib/Transforms/InstCombine/InstCombineCalls.cpp
+++ lib/Transforms/InstCombine/InstCombineCalls.cpp
@@ -174,6 +174,9 @@
     MI->getMetadata(LLVMContext::MD_mem_parallel_loop_access);
   if (LoopMemParallelMD)
     L->setMetadata(LLVMContext::MD_mem_parallel_loop_access, LoopMemParallelMD);
+  MDNode *AccessGroupMD = MI->getMetadata(LLVMContext::MD_access_group);
+  if (AccessGroupMD)
+    L->setMetadata(LLVMContext::MD_access_group, AccessGroupMD);
 
   StoreInst *S = Builder.CreateStore(L, Dest);
   // Alignment from the mem intrinsic will be better, so use it.
@@ -182,6 +185,8 @@
     S->setMetadata(LLVMContext::MD_tbaa, CopyMD);
   if (LoopMemParallelMD)
     S->setMetadata(LLVMContext::MD_mem_parallel_loop_access, LoopMemParallelMD);
+  if (AccessGroupMD)
+    S->setMetadata(LLVMContext::MD_access_group, AccessGroupMD);
 
   if (auto *MT = dyn_cast<MemTransferInst>(MI)) {
     // non-atomics can be volatile
Index: lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
===================================================================
--- lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
+++ lib/Transforms/InstCombine/InstCombineLoadStoreAlloca.cpp
@@ -492,6 +492,7 @@
     case LLVMContext::MD_noalias:
     case LLVMContext::MD_nontemporal:
     case LLVMContext::MD_mem_parallel_loop_access:
+    case LLVMContext::MD_access_group:
       // All of these directly apply.
       NewLoad->setMetadata(ID, N);
       break;
Index: lib/Transforms/Scalar/LoopVersioningLICM.cpp
===================================================================
--- lib/Transforms/Scalar/LoopVersioningLICM.cpp
+++ lib/Transforms/Scalar/LoopVersioningLICM.cpp
@@ -628,6 +628,7 @@
     // Set Loop Versioning metaData for version loop.
     addStringMetadataToLoop(LVer.getVersionedLoop(), LICMVersioningMetaData);
     // Set "llvm.mem.parallel_loop_access" metaData to versioned loop.
+    // FIXME: "llvm.mem.parallel_loop_access" annotates memory access instructions, not loops.
     addStringMetadataToLoop(LVer.getVersionedLoop(),
                             "llvm.mem.parallel_loop_access");
     // Update version loop with aggressive aliasing assumption.
Index: lib/Transforms/Scalar/SROA.cpp
===================================================================
--- lib/Transforms/Scalar/SROA.cpp
+++ lib/Transforms/Scalar/SROA.cpp
@@ -2594,6 +2594,7 @@
     V = convertValue(DL, IRB, V, NewAllocaTy);
     StoreInst *Store = IRB.CreateAlignedStore(V, &NewAI, NewAI.getAlignment());
     Store->copyMetadata(SI, LLVMContext::MD_mem_parallel_loop_access);
+    Store->copyMetadata(SI, LLVMContext::MD_access_group);
     if (AATags)
       Store->setAAMetadata(AATags);
     Pass.DeadInsts.insert(&SI);
@@ -2663,6 +2664,7 @@
                                      SI.isVolatile());
     }
     NewSI->copyMetadata(SI, LLVMContext::MD_mem_parallel_loop_access);
+    NewSI->copyMetadata(SI, LLVMContext::MD_access_group);
     if (AATags)
       NewSI->setAAMetadata(AATags);
     if (SI.isVolatile())
@@ -3773,6 +3775,7 @@
           getAdjustedAlignment(LI, PartOffset, DL), /*IsVolatile*/ false,
           LI->getName());
       PLoad->copyMetadata(*LI, LLVMContext::MD_mem_parallel_loop_access);
+      PLoad->copyMetadata(*LI, LLVMContext::MD_access_group);
 
       // Append this load onto the list of split loads so we can find it later
       // to rewrite the stores.
@@ -3829,6 +3832,7 @@
                            PartPtrTy, StoreBasePtr->getName() + "."),
             getAdjustedAlignment(SI, PartOffset, DL), /*IsVolatile*/ false);
         PStore->copyMetadata(*LI, LLVMContext::MD_mem_parallel_loop_access);
+        PStore->copyMetadata(*LI, LLVMContext::MD_access_group);
         LLVM_DEBUG(dbgs() << "      +" << PartOffset << ":" << *PStore << "\n");
       }
 
Index: lib/Transforms/Scalar/Scalarizer.cpp
===================================================================
--- lib/Transforms/Scalar/Scalarizer.cpp
+++ lib/Transforms/Scalar/Scalarizer.cpp
@@ -363,7 +363,8 @@
           || Tag == LLVMContext::MD_invariant_load
           || Tag == LLVMContext::MD_alias_scope
           || Tag == LLVMContext::MD_noalias
-          || Tag == ParallelLoopAccessMDKind);
+          || Tag == ParallelLoopAccessMDKind
+          || Tag == LLVMContext::MD_access_group);
 }
 
 // Transfer metadata from Op to the instructions in CV if it is known
Index: lib/Transforms/Utils/InlineFunction.cpp
===================================================================
--- lib/Transforms/Utils/InlineFunction.cpp
+++ lib/Transforms/Utils/InlineFunction.cpp
@@ -770,14 +770,16 @@
   UnwindDest->removePredecessor(InvokeBB);
 }
 
-/// When inlining a call site that has !llvm.mem.parallel_loop_access metadata,
-/// that metadata should be propagated to all memory-accessing cloned
-/// instructions.
+/// When inlining a call site that has !llvm.mem.parallel_loop_access or
+/// !llvm.access.group metadata, that metadata should be propagated to all
+/// memory-accessing cloned instructions.
 static void PropagateParallelLoopAccessMetadata(CallSite CS,
                                                 ValueToValueMapTy &VMap) {
   MDNode *M =
     CS.getInstruction()->getMetadata(LLVMContext::MD_mem_parallel_loop_access);
-  if (!M)
+  MDNode *AccessGroup =
+      CS.getInstruction()->getMetadata(LLVMContext::MD_access_group);
+  if (!M && !AccessGroup)
     return;
 
   for (ValueToValueMapTy::iterator VMI = VMap.begin(), VMIE = VMap.end();
@@ -789,11 +791,28 @@
     if (!NI)
       continue;
 
-    if (MDNode *PM = NI->getMetadata(LLVMContext::MD_mem_parallel_loop_access)) {
+    if (M) {
+      if (MDNode *PM =
+              NI->getMetadata(LLVMContext::MD_mem_parallel_loop_access)) {
         M = MDNode::concatenate(PM, M);
       NI->setMetadata(LLVMContext::MD_mem_parallel_loop_access, M);
-    } else if (NI->mayReadOrWriteMemory()) {
-      NI->setMetadata(LLVMContext::MD_mem_parallel_loop_access, M);
+      } else if (NI->mayReadOrWriteMemory()) {
+        NI->setMetadata(LLVMContext::MD_mem_parallel_loop_access, M);
+      }
+    }
+
+    if (NI->mayReadOrWriteMemory() && AccessGroup) {
+      if (NI->getMetadata(LLVMContext::MD_access_group)) {
+        // FIXME: We should combine the access groups of the CallInst and the
+        // memory access instruction, but this would require updating all uses
+        // of one of the access groups in the function. Alternatively, we could
+        // create a new access group inheriting from the two others, but this
+        // is not supported at the moment. Currently, we keep the access
+        // instruction's access group, which will belong to inner loops and is
+        // therefore more useful to the vectorizer.
+      } else {
+        NI->setMetadata(LLVMContext::MD_access_group, AccessGroup);
+      }
     }
   }
 }
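The access-group half of `PropagateParallelLoopAccessMetadata` reduces to a small rule. A cloned instruction that touches memory and has no group of its own inherits the call site's group. An instruction that already has a group keeps it, as the FIXME in that function explains. The sketch below uses a hypothetical `ClonedInst` struct and integer group IDs in place of LLVM instructions and metadata nodes. `propagateAccessGroup` is likewise an invented name.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for an instruction cloned from the callee.
struct ClonedInst {
  bool MayReadOrWriteMemory;
  std::optional<int> AccessGroup; // !llvm.access.group, if any
};

// Applies the call site's access group (if any) to one cloned instruction.
void propagateAccessGroup(std::optional<int> CallGroup, ClonedInst &NI) {
  if (!CallGroup || !NI.MayReadOrWriteMemory)
    return;
  // Keep the callee's own group: it belongs to loops that become inner
  // loops after inlining. Otherwise inherit the call site's group.
  if (!NI.AccessGroup)
    NI.AccessGroup = CallGroup;
}
```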
Index: lib/Transforms/Utils/Local.cpp
===================================================================
--- lib/Transforms/Utils/Local.cpp
+++ lib/Transforms/Utils/Local.cpp
@@ -2314,6 +2314,12 @@
       case LLVMContext::MD_mem_parallel_loop_access:
         K->setMetadata(Kind, MDNode::intersect(JMD, KMD));
         break;
+      case LLVMContext::MD_access_group:
+        // Drop access group if not equal.
+        // FIXME: Combine access groups.
+        if (JMD != KMD)
+          K->setMetadata(Kind, nullptr);
+        break;
       case LLVMContext::MD_range:
         K->setMetadata(Kind, MDNode::getMostGenericRange(JMD, KMD));
         break;
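When two instructions are merged, the `MD_access_group` case is meant to follow its comment: drop the access group if the two are not equal. A minimal sketch of that rule follows. It assumes `std::optional<int>` as a stand-in for the possibly-null `MDNode *`, and `combineAccessGroups` is a hypothetical name.

```cpp
#include <cassert>
#include <optional>

// Hypothetical: std::optional<int> stands in for a possibly-null MDNode *
// holding an access group. J is the instruction being merged into K.
std::optional<int> combineAccessGroups(std::optional<int> J,
                                       std::optional<int> K) {
  // Keep the group only if both instructions agree; otherwise drop it.
  if (J == K)
    return K;
  return std::nullopt;
}
```

Dropping is the conservative choice. The merged instruction simply stops counting as a member of any parallel access group, so it can make a loop look less parallel but never more.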
Index: lib/Transforms/Utils/LoopUtils.cpp
===================================================================
--- lib/Transforms/Utils/LoopUtils.cpp
+++ lib/Transforms/Utils/LoopUtils.cpp
@@ -188,37 +188,19 @@
 /// If it has a value (e.g. {"llvm.distribute", 1} return the value as an
 /// operand or null otherwise.  If the string metadata is not found return
 /// Optional's not-a-value.
-Optional<const MDOperand *> llvm::findStringMetadataForLoop(Loop *TheLoop,
+Optional<const MDOperand *> llvm::findStringMetadataForLoop(const Loop *TheLoop,
                                                             StringRef Name) {
-  MDNode *LoopID = TheLoop->getLoopID();
-  // Return none if LoopID is false.
-  if (!LoopID)
+  MDNode *MD = findOptionMDForLoop(TheLoop, Name);
+  if (!MD)
     return None;
-
-  // First operand should refer to the loop id itself.
-  assert(LoopID->getNumOperands() > 0 && "requires at least one operand");
-  assert(LoopID->getOperand(0) == LoopID && "invalid loop id");
-
-  // Iterate over LoopID operands and look for MDString Metadata
-  for (unsigned i = 1, e = LoopID->getNumOperands(); i < e; ++i) {
-    MDNode *MD = dyn_cast<MDNode>(LoopID->getOperand(i));
-    if (!MD)
-      continue;
-    MDString *S = dyn_cast<MDString>(MD->getOperand(0));
-    if (!S)
-      continue;
-    // Return true if MDString holds expected MetaData.
-    if (Name.equals(S->getString()))
-      switch (MD->getNumOperands()) {
-      case 1:
-        return nullptr;
-      case 2:
-        return &MD->getOperand(1);
-      default:
-        llvm_unreachable("loop metadata has 0 or 1 operand");
-      }
+  switch (MD->getNumOperands()) {
+  case 1:
+    return nullptr;
+  case 2:
+    return &MD->getOperand(1);
+  default:
+    llvm_unreachable("loop metadata has 0 or 1 operand");
   }
-  return None;
 }
 
 /// Does a BFS from a given node to all of its children inside a given loop.
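After the refactoring, `findStringMetadataForLoop` is a thin wrapper over `findOptionMDForLoop` that maps the option node to three outcomes. The sketch below models that split with hypothetical types: an `OptionNode` with integer values, with the loop ID's self-reference omitted. It is not the LLVM API. It only illustrates why `llvm.loop.parallel_accesses`, which can list several groups, needs the node-returning lookup.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for one option node of a loop ID, such as
// !{!"llvm.loop.distribute.enable", i1 true}: a name plus its values.
// The loop ID's self-referencing first operand is not modeled.
struct OptionNode {
  std::string Name;
  std::vector<int> Values;
};

// Counterpart of findOptionMDForLoop: the first option with this name.
const OptionNode *findOption(const std::vector<OptionNode> &LoopID,
                             const std::string &Name) {
  for (const OptionNode &Opt : LoopID)
    if (Opt.Name == Name)
      return &Opt;
  return nullptr;
}

// Counterpart of findStringMetadataForLoop:
//   std::nullopt      -> the option is absent
//   engaged nullptr   -> the option is present without a value
//   engaged pointer   -> the option's single value
std::optional<const int *>
findStringOption(const std::vector<OptionNode> &LoopID,
                 const std::string &Name) {
  const OptionNode *Opt = findOption(LoopID, Name);
  if (!Opt)
    return std::nullopt;
  assert(Opt->Values.size() <= 1 && "loop metadata has 0 or 1 operand");
  if (Opt->Values.empty())
    return static_cast<const int *>(nullptr);
  return &Opt->Values[0];
}
```

Calling `findStringOption` on an `llvm.loop.parallel_accesses` entry with two groups would trip the assertion, just as the real code would hit its `llvm_unreachable`. That is why `isAnnotatedParallel` uses the node-returning lookup instead.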
Index: lib/Transforms/Utils/SimplifyCFG.cpp
===================================================================
--- lib/Transforms/Utils/SimplifyCFG.cpp
+++ lib/Transforms/Utils/SimplifyCFG.cpp
@@ -1315,7 +1315,8 @@
                              LLVMContext::MD_align,
                              LLVMContext::MD_dereferenceable,
                              LLVMContext::MD_dereferenceable_or_null,
-                             LLVMContext::MD_mem_parallel_loop_access};
+                             LLVMContext::MD_mem_parallel_loop_access,
+                             LLVMContext::MD_access_group};
       combineMetadata(I1, I2, KnownIDs, true);
 
       // I1 and I2 are being combined into a single instruction.  Its debug
Index: test/ThinLTO/X86/lazyload_metadata.ll
===================================================================
--- test/ThinLTO/X86/lazyload_metadata.ll
+++ test/ThinLTO/X86/lazyload_metadata.ll
@@ -10,13 +10,13 @@
 ; RUN: llvm-lto -thinlto-action=import %t2.bc -thinlto-index=%t3.bc \
 ; RUN:          -o /dev/null -stats \
 ; RUN:  2>&1 | FileCheck %s -check-prefix=LAZY
-; LAZY: 55 bitcode-reader  - Number of Metadata records loaded
+; LAZY: 57 bitcode-reader  - Number of Metadata records loaded
 ; LAZY: 2 bitcode-reader  - Number of MDStrings loaded
 
 ; RUN: llvm-lto -thinlto-action=import %t2.bc -thinlto-index=%t3.bc \
 ; RUN:          -o /dev/null -disable-ondemand-mds-loading -stats \
 ; RUN:  2>&1 | FileCheck %s -check-prefix=NOTLAZY
-; NOTLAZY: 64 bitcode-reader  - Number of Metadata records loaded
+; NOTLAZY: 66 bitcode-reader  - Number of Metadata records loaded
 ; NOTLAZY: 7 bitcode-reader  - Number of MDStrings loaded
 
 
Index: test/Transforms/Inline/parallel-loop-md-callee.ll
===================================================================
--- test/Transforms/Inline/parallel-loop-md-callee.ll
+++ test/Transforms/Inline/parallel-loop-md-callee.ll
@@ -1,33 +1,32 @@
 ; RUN: opt -S -inline < %s | FileCheck %s
-; RUN: opt -S -passes='cgscc(inline)' < %s | FileCheck %s
+;
+; Check that the callee's !llvm.access.group metadata is still present after inlining.
+;
 target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
-target triple = "x86_64-unknown-linux-gnu"
 
-; Function Attrs: norecurse nounwind uwtable
-define void @Body(i32* nocapture %res, i32* nocapture readnone %c, i32* nocapture readonly %d, i32* nocapture readonly %p, i32 %i) #0 {
+define void @Body(i32* nocapture %res, i32* nocapture readnone %c, i32* nocapture readonly %d, i32* nocapture readonly %p, i32 %i) {
 entry:
   %idxprom = sext i32 %i to i64
   %arrayidx = getelementptr inbounds i32, i32* %p, i64 %idxprom
-  %0 = load i32, i32* %arrayidx, align 4
+  %0 = load i32, i32* %arrayidx, align 4, !llvm.access.group !0
   %cmp = icmp eq i32 %0, 0
   %arrayidx2 = getelementptr inbounds i32, i32* %res, i64 %idxprom
-  %1 = load i32, i32* %arrayidx2, align 4
+  %1 = load i32, i32* %arrayidx2, align 4, !llvm.access.group !0
   br i1 %cmp, label %cond.end, label %cond.false
 
 cond.false:                                       ; preds = %entry
   %arrayidx6 = getelementptr inbounds i32, i32* %d, i64 %idxprom
-  %2 = load i32, i32* %arrayidx6, align 4
+  %2 = load i32, i32* %arrayidx6, align 4, !llvm.access.group !0
   %add = add nsw i32 %2, %1
   br label %cond.end
 
 cond.end:                                         ; preds = %entry, %cond.false
   %cond = phi i32 [ %add, %cond.false ], [ %1, %entry ]
-  store i32 %cond, i32* %arrayidx2, align 4
+  store i32 %cond, i32* %arrayidx2, align 4, !llvm.access.group !0
   ret void
 }
 
-; Function Attrs: nounwind uwtable
-define void @Test(i32* %res, i32* %c, i32* %d, i32* %p, i32 %n) #1 {
+define void @Test(i32* %res, i32* %c, i32* %d, i32* %p, i32 %n) {
 entry:
   br label %for.cond
 
@@ -37,22 +36,20 @@
   br i1 %cmp, label %for.body, label %for.end
 
 for.body:                                         ; preds = %for.cond
-  call void @Body(i32* %res, i32* undef, i32* %d, i32* %p, i32 %i.0), !llvm.mem.parallel_loop_access !0
+  call void @Body(i32* %res, i32* undef, i32* %d, i32* %p, i32 %i.0)
   %inc = add nsw i32 %i.0, 1
-  br label %for.cond, !llvm.loop !0
+  br label %for.cond, !llvm.loop !1
 
 for.end:                                          ; preds = %for.cond
   ret void
 }
 
 ; CHECK-LABEL: @Test
-; CHECK: load i32,{{.*}}, !llvm.mem.parallel_loop_access !0
-; CHECK: load i32,{{.*}}, !llvm.mem.parallel_loop_access !0
-; CHECK: load i32,{{.*}}, !llvm.mem.parallel_loop_access !0
-; CHECK: store i32{{.*}}, !llvm.mem.parallel_loop_access !0
-; CHECK: br label %for.cond, !llvm.loop !0
-
-attributes #0 = { norecurse nounwind uwtable }
-
-!0 = distinct !{!0}
-
+; CHECK: load i32,{{.*}}, !llvm.access.group !0
+; CHECK: load i32,{{.*}}, !llvm.access.group !0
+; CHECK: load i32,{{.*}}, !llvm.access.group !0
+; CHECK: store i32{{.*}}, !llvm.access.group !0
+; CHECK: br label %for.cond, !llvm.loop !1
+
+!0 = distinct !{}
+!1 = distinct !{!1, !{!"llvm.loop.parallel_accesses", !0}}
Index: test/Transforms/Inline/parallel-loop-md.ll
===================================================================
--- test/Transforms/Inline/parallel-loop-md.ll
+++ test/Transforms/Inline/parallel-loop-md.ll
@@ -37,22 +37,24 @@
   br i1 %cmp, label %for.body, label %for.end
 
 for.body:                                         ; preds = %for.cond
-  call void @Body(i32* %res, i32* undef, i32* %d, i32* %p, i32 %i.0), !llvm.mem.parallel_loop_access !0
+  call void @Body(i32* %res, i32* undef, i32* %d, i32* %p, i32 %i.0), !llvm.access.group !0
   %inc = add nsw i32 %i.0, 1
-  br label %for.cond, !llvm.loop !0
+  br label %for.cond, !llvm.loop !1
 
 for.end:                                          ; preds = %for.cond
   ret void
 }
 
 ; CHECK-LABEL: @Test
-; CHECK: load i32,{{.*}}, !llvm.mem.parallel_loop_access !0
-; CHECK: load i32,{{.*}}, !llvm.mem.parallel_loop_access !0
-; CHECK: load i32,{{.*}}, !llvm.mem.parallel_loop_access !0
-; CHECK: store i32{{.*}}, !llvm.mem.parallel_loop_access !0
-; CHECK: br label %for.cond, !llvm.loop !0
+; CHECK: load i32,{{.*}}, !llvm.access.group !0
+; CHECK: load i32,{{.*}}, !llvm.access.group !0
+; CHECK: load i32,{{.*}}, !llvm.access.group !0
+; CHECK: store i32{{.*}}, !llvm.access.group !0
+; CHECK: br label %for.cond, !llvm.loop !1
 
 attributes #0 = { norecurse nounwind uwtable }
 
-!0 = distinct !{!0}
+!0 = distinct !{}
+!1 = distinct !{!1, !{!"llvm.loop.parallel_accesses", !0}}
+
 
Index: test/Transforms/InstCombine/loadstore-metadata.ll
===================================================================
--- test/Transforms/InstCombine/loadstore-metadata.ll
+++ test/Transforms/InstCombine/loadstore-metadata.ll
@@ -39,7 +39,7 @@
 define i32 @test_load_cast_combine_invariant(float* %ptr) {
 ; Ensure (cast (load (...))) -> (load (cast (...))) preserves invariant metadata.
 ; CHECK-LABEL: @test_load_cast_combine_invariant(
-; CHECK: load i32, i32* %{{.*}}, !invariant.load !5
+; CHECK: load i32, i32* %{{.*}}, !invariant.load !7
 entry:
   %l = load float, float* %ptr, !invariant.load !6
   %c = bitcast float %l to i32
@@ -50,7 +50,7 @@
 ; Ensure (cast (load (...))) -> (load (cast (...))) preserves nontemporal
 ; metadata.
 ; CHECK-LABEL: @test_load_cast_combine_nontemporal(
-; CHECK: load i32, i32* %{{.*}}, !nontemporal !6
+; CHECK: load i32, i32* %{{.*}}, !nontemporal !8
 entry:
   %l = load float, float* %ptr, !nontemporal !7
   %c = bitcast float %l to i32
@@ -61,7 +61,7 @@
 ; Ensure (cast (load (...))) -> (load (cast (...))) preserves align
 ; metadata.
 ; CHECK-LABEL: @test_load_cast_combine_align(
-; CHECK: load i8*, i8** %{{.*}}, !align !7
+; CHECK: load i8*, i8** %{{.*}}, !align !9
 entry:
   %l = load i32*, i32** %ptr, !align !8
   %c = bitcast i32* %l to i8*
@@ -72,7 +72,7 @@
 ; Ensure (cast (load (...))) -> (load (cast (...))) preserves dereferenceable
 ; metadata.
 ; CHECK-LABEL: @test_load_cast_combine_deref(
-; CHECK: load i8*, i8** %{{.*}}, !dereferenceable !7
+; CHECK: load i8*, i8** %{{.*}}, !dereferenceable !9
 entry:
   %l = load i32*, i32** %ptr, !dereferenceable !8
   %c = bitcast i32* %l to i8*
@@ -83,7 +83,7 @@
 ; Ensure (cast (load (...))) -> (load (cast (...))) preserves
 ; dereferenceable_or_null metadata.
 ; CHECK-LABEL: @test_load_cast_combine_deref_or_null(
-; CHECK: load i8*, i8** %{{.*}}, !dereferenceable_or_null !7
+; CHECK: load i8*, i8** %{{.*}}, !dereferenceable_or_null !9
 entry:
   %l = load i32*, i32** %ptr, !dereferenceable_or_null !8
   %c = bitcast i32* %l to i8*
@@ -94,7 +94,7 @@
 ; Ensure (cast (load (...))) -> (load (cast (...))) preserves loop access
 ; metadata.
 ; CHECK-LABEL: @test_load_cast_combine_loop(
-; CHECK: load i32, i32* %{{.*}}, !llvm.mem.parallel_loop_access !4
+; CHECK: load i32, i32* %{{.*}}, !llvm.access.group !6
 entry:
   br label %loop
 
@@ -102,7 +102,7 @@
   %i = phi i32 [ 0, %entry ], [ %i.next, %loop ]
   %src.gep = getelementptr inbounds float, float* %src, i32 %i
   %dst.gep = getelementptr inbounds i32, i32* %dst, i32 %i
-  %l = load float, float* %src.gep, !llvm.mem.parallel_loop_access !4
+  %l = load float, float* %src.gep, !llvm.access.group !9
   %c = bitcast float %l to i32
   store i32 %c, i32* %dst.gep
   %i.next = add i32 %i, 1
@@ -142,8 +142,9 @@
 !1 = !{!"scalar type", !2}
 !2 = !{!"root"}
 !3 = distinct !{!3, !4}
-!4 = distinct !{!4}
+!4 = distinct !{!4, !{!"llvm.loop.parallel_accesses", !9}}
 !5 = !{i32 0, i32 42}
 !6 = !{}
 !7 = !{i32 1}
 !8 = !{i64 8}
+!9 = distinct !{}
Index: test/Transforms/InstCombine/mem-par-metadata-memcpy.ll
===================================================================
--- test/Transforms/InstCombine/mem-par-metadata-memcpy.ll
+++ test/Transforms/InstCombine/mem-par-metadata-memcpy.ll
@@ -1,6 +1,6 @@
 ; RUN: opt < %s -instcombine -S | FileCheck %s
 ;
-; Make sure the llvm.mem.parallel_loop_access meta-data is preserved
+; Make sure the llvm.access.group meta-data is preserved
 ; when a memcpy is replaced with a load+store by instcombine
 ;
 ; #include <string.h>
@@ -13,8 +13,8 @@
 ; }
 
 ; CHECK: for.body:
-; CHECK:  %{{.*}} = load i16, i16* %{{.*}}, align 1, !llvm.mem.parallel_loop_access !1
-; CHECK:  store i16 %{{.*}}, i16* %{{.*}}, align 1, !llvm.mem.parallel_loop_access !1
+; CHECK:  %{{.*}} = load i16, i16* %{{.*}}, align 1, !llvm.access.group !1
+; CHECK:  store i16 %{{.*}}, i16* %{{.*}}, align 1, !llvm.access.group !1
 
 
 ; ModuleID = '<stdin>'
@@ -36,7 +36,7 @@
   %arrayidx = getelementptr inbounds i8, i8* %out, i64 %i.0
   %add = add nsw i64 %i.0, %size
   %arrayidx1 = getelementptr inbounds i8, i8* %out, i64 %add
-  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %arrayidx, i8* %arrayidx1, i64 2, i1 false), !llvm.mem.parallel_loop_access !1
+  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %arrayidx, i8* %arrayidx1, i64 2, i1 false), !llvm.access.group !4
   br label %for.inc
 
 for.inc:                                          ; preds = %for.body
@@ -56,6 +56,7 @@
 !llvm.ident = !{!0}
 
 !0 = !{!"clang version 4.0.0 (cfe/trunk 277751)"}
-!1 = distinct !{!1, !2, !3}
+!1 = distinct !{!1, !2, !3, !{!"llvm.loop.parallel_accesses", !4}}
 !2 = distinct !{!2, !3}
 !3 = !{!"llvm.loop.vectorize.enable", i1 true}
+!4 = distinct !{}
Index: test/Transforms/LoopVectorize/X86/force-ifcvt.ll
===================================================================
--- test/Transforms/LoopVectorize/X86/force-ifcvt.ll
+++ test/Transforms/LoopVectorize/X86/force-ifcvt.ll
@@ -13,21 +13,21 @@
 for.body:                                         ; preds = %cond.end, %entry
   %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %cond.end ]
   %arrayidx = getelementptr inbounds i32, i32* %p, i64 %indvars.iv
-  %0 = load i32, i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0
+  %0 = load i32, i32* %arrayidx, align 4, !llvm.access.group !1
   %cmp1 = icmp eq i32 %0, 0
   %arrayidx3 = getelementptr inbounds i32, i32* %res, i64 %indvars.iv
-  %1 = load i32, i32* %arrayidx3, align 4, !llvm.mem.parallel_loop_access !0
+  %1 = load i32, i32* %arrayidx3, align 4, !llvm.access.group !1
   br i1 %cmp1, label %cond.end, label %cond.false
 
 cond.false:                                       ; preds = %for.body
   %arrayidx7 = getelementptr inbounds i32, i32* %d, i64 %indvars.iv
-  %2 = load i32, i32* %arrayidx7, align 4, !llvm.mem.parallel_loop_access !0
+  %2 = load i32, i32* %arrayidx7, align 4, !llvm.access.group !1
   %add = add nsw i32 %2, %1
   br label %cond.end
 
 cond.end:                                         ; preds = %for.body, %cond.false
   %cond = phi i32 [ %add, %cond.false ], [ %1, %for.body ]
-  store i32 %cond, i32* %arrayidx3, align 4, !llvm.mem.parallel_loop_access !0
+  store i32 %cond, i32* %arrayidx3, align 4, !llvm.access.group !1
   %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
   %exitcond = icmp eq i64 %indvars.iv.next, 16
   br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0
@@ -38,4 +38,5 @@
 
 attributes #0 = { norecurse nounwind uwtable "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" }
 
-!0 = distinct !{!0}
+!0 = distinct !{!0, !{!"llvm.loop.parallel_accesses", !1}}
+!1 = distinct !{}
Index: test/Transforms/LoopVectorize/X86/illegal-parallel-loop-uniform-write.ll
===================================================================
--- test/Transforms/LoopVectorize/X86/illegal-parallel-loop-uniform-write.ll
+++ test/Transforms/LoopVectorize/X86/illegal-parallel-loop-uniform-write.ll
@@ -15,9 +15,9 @@
 
 for.end.us:                                       ; preds = %for.body3.us
   %arrayidx9.us = getelementptr inbounds i32, i32* %b, i64 %indvars.iv33
-  %0 = load i32, i32* %arrayidx9.us, align 4, !llvm.mem.parallel_loop_access !3
+  %0 = load i32, i32* %arrayidx9.us, align 4, !llvm.access.group !6
   %add10.us = add nsw i32 %0, 3
-  store i32 %add10.us, i32* %arrayidx9.us, align 4, !llvm.mem.parallel_loop_access !3
+  store i32 %add10.us, i32* %arrayidx9.us, align 4, !llvm.access.group !6
   %indvars.iv.next34 = add i64 %indvars.iv33, 1
   %lftr.wideiv35 = trunc i64 %indvars.iv.next34 to i32
   %exitcond36 = icmp eq i32 %lftr.wideiv35, %m
@@ -29,9 +29,9 @@
   %add4.us = add i32 %add.us, %1
   %idxprom.us = sext i32 %add4.us to i64
   %arrayidx.us = getelementptr inbounds i32, i32* %a, i64 %idxprom.us
-  %2 = load i32, i32* %arrayidx.us, align 4, !llvm.mem.parallel_loop_access !3
+  %2 = load i32, i32* %arrayidx.us, align 4, !llvm.access.group !6
   %add5.us = add nsw i32 %2, 1
-  store i32 %add5.us, i32* %arrayidx7.us, align 4, !llvm.mem.parallel_loop_access !3
+  store i32 %add5.us, i32* %arrayidx7.us, align 4, !llvm.access.group !6
   %indvars.iv.next30 = add i64 %indvars.iv29, 1
   %lftr.wideiv31 = trunc i64 %indvars.iv.next30 to i32
   %exitcond32 = icmp eq i32 %lftr.wideiv31, %m
@@ -50,7 +50,7 @@
 
 attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-frame-pointer-elim-non-leaf"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" }
 
-!3 = !{!4, !5}
+!3 = !{!4, !5, !{!"llvm.loop.parallel_accesses", !6}}
 !4 = !{!4}
 !5 = !{!5}
-
+!6 = distinct !{}
Index: test/Transforms/LoopVectorize/X86/parallel-loops-after-reg2mem.ll
===================================================================
--- test/Transforms/LoopVectorize/X86/parallel-loops-after-reg2mem.ll
+++ test/Transforms/LoopVectorize/X86/parallel-loops-after-reg2mem.ll
@@ -19,19 +19,19 @@
 for.body:                                         ; preds = %for.body.for.body_crit_edge, %entry
   %indvars.iv.reload = load i64, i64* %indvars.iv.reg2mem
   %arrayidx = getelementptr inbounds i32, i32* %b, i64 %indvars.iv.reload
-  %0 = load i32, i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !3
+  %0 = load i32, i32* %arrayidx, align 4, !llvm.access.group !4
   %arrayidx2 = getelementptr inbounds i32, i32* %a, i64 %indvars.iv.reload
-  %1 = load i32, i32* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3
+  %1 = load i32, i32* %arrayidx2, align 4, !llvm.access.group !4
   %idxprom3 = sext i32 %1 to i64
   %arrayidx4 = getelementptr inbounds i32, i32* %a, i64 %idxprom3
-  store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !3
+  store i32 %0, i32* %arrayidx4, align 4, !llvm.access.group !4
   %indvars.iv.next = add i64 %indvars.iv.reload, 1
   ; A new store without the parallel metadata here:
   store i64 %indvars.iv.next, i64* %indvars.iv.next.reg2mem
   %indvars.iv.next.reload1 = load i64, i64* %indvars.iv.next.reg2mem
   %arrayidx6 = getelementptr inbounds i32, i32* %b, i64 %indvars.iv.next.reload1
-  %2 = load i32, i32* %arrayidx6, align 4, !llvm.mem.parallel_loop_access !3
-  store i32 %2, i32* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3
+  %2 = load i32, i32* %arrayidx6, align 4, !llvm.access.group !4
+  store i32 %2, i32* %arrayidx2, align 4, !llvm.access.group !4
   %indvars.iv.next.reload = load i64, i64* %indvars.iv.next.reg2mem
   %lftr.wideiv = trunc i64 %indvars.iv.next.reload to i32
   %exitcond = icmp eq i32 %lftr.wideiv, 512
@@ -46,4 +46,5 @@
   ret void
 }
 
-!3 = !{!3}
+!3 = !{!3, !{!"llvm.loop.parallel_accesses", !4}}
+!4 = distinct !{}
Index: test/Transforms/LoopVectorize/X86/parallel-loops.ll
===================================================================
--- test/Transforms/LoopVectorize/X86/parallel-loops.ll
+++ test/Transforms/LoopVectorize/X86/parallel-loops.ll
@@ -51,18 +51,18 @@
 for.body:                                         ; preds = %for.body, %entry
   %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
   %arrayidx = getelementptr inbounds i32, i32* %b, i64 %indvars.iv
-  %0 = load i32, i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !3
+  %0 = load i32, i32* %arrayidx, align 4, !llvm.access.group !13
   %arrayidx2 = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
-  %1 = load i32, i32* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3
+  %1 = load i32, i32* %arrayidx2, align 4, !llvm.access.group !13
   %idxprom3 = sext i32 %1 to i64
   %arrayidx4 = getelementptr inbounds i32, i32* %a, i64 %idxprom3
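-  ; This store might have originated from inlining a function with a parallel
-  ; loop. Refers to a list with the "original loop reference" (!4) also included.
+  ; This store might have originated from inlining a function with a parallel
+  ; loop. Its access group (!15) is parallel in both loop !3 and the original loop !4.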
-  store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !5
+  store i32 %0, i32* %arrayidx4, align 4, !llvm.access.group !15
   %indvars.iv.next = add i64 %indvars.iv, 1
   %arrayidx6 = getelementptr inbounds i32, i32* %b, i64 %indvars.iv.next
-  %2 = load i32, i32* %arrayidx6, align 4, !llvm.mem.parallel_loop_access !3
-  store i32 %2, i32* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3
+  %2 = load i32, i32* %arrayidx6, align 4, !llvm.access.group !13
+  store i32 %2, i32* %arrayidx2, align 4, !llvm.access.group !13
   %lftr.wideiv = trunc i64 %indvars.iv.next to i32
   %exitcond = icmp eq i32 %lftr.wideiv, 512
   br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !3
@@ -84,18 +84,18 @@
 for.body:                                         ; preds = %for.body, %entry
   %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
   %arrayidx = getelementptr inbounds i32, i32* %b, i64 %indvars.iv
-  %0 = load i32, i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !6
+  %0 = load i32, i32* %arrayidx, align 4, !llvm.access.group !16
   %arrayidx2 = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
-  %1 = load i32, i32* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !6
+  %1 = load i32, i32* %arrayidx2, align 4, !llvm.access.group !16
   %idxprom3 = sext i32 %1 to i64
   %arrayidx4 = getelementptr inbounds i32, i32* %a, i64 %idxprom3
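-  ; This refers to the loop marked with !7 which we are not in at the moment.
-  ; It should prevent detecting as a parallel loop.
+  ; Its access group (!17) is only parallel in loop !7, which we are not in at
+  ; the moment. It should prevent this loop from being detected as parallel.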
-  store i32 %0, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !7
+  store i32 %0, i32* %arrayidx4, align 4, !llvm.access.group !17
   %indvars.iv.next = add i64 %indvars.iv, 1
   %arrayidx6 = getelementptr inbounds i32, i32* %b, i64 %indvars.iv.next
-  %2 = load i32, i32* %arrayidx6, align 4, !llvm.mem.parallel_loop_access !6
-  store i32 %2, i32* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !6
+  %2 = load i32, i32* %arrayidx6, align 4, !llvm.access.group !16
+  store i32 %2, i32* %arrayidx2, align 4, !llvm.access.group !16
   %lftr.wideiv = trunc i64 %indvars.iv.next to i32
   %exitcond = icmp eq i32 %lftr.wideiv, 512
   br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !6
@@ -104,8 +104,12 @@
   ret void
 }
 
-!3 = !{!3}
-!4 = !{!4}
-!5 = !{!3, !4}
-!6 = !{!6}
-!7 = !{!7}
+!3 = !{!3, !{!"llvm.loop.parallel_accesses", !13, !15}}
+!4 = !{!4, !{!"llvm.loop.parallel_accesses", !14, !15}}
+!6 = !{!6, !{!"llvm.loop.parallel_accesses", !16}}
+!7 = !{!7, !{!"llvm.loop.parallel_accesses", !17}}
+!13 = distinct !{}
+!14 = distinct !{}
+!15 = distinct !{}
+!16 = distinct !{}
+!17 = distinct !{}
Index: test/Transforms/LoopVectorize/X86/pr34438.ll
===================================================================
--- test/Transforms/LoopVectorize/X86/pr34438.ll
+++ test/Transforms/LoopVectorize/X86/pr34438.ll
@@ -18,11 +18,11 @@
 for.body:
   %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
   %arrayidx = getelementptr inbounds float, float* %B, i64 %indvars.iv
-  %0 = load float, float* %arrayidx, align 4, !llvm.mem.parallel_loop_access !3
+  %0 = load float, float* %arrayidx, align 4, !llvm.access.group !5
   %arrayidx2 = getelementptr inbounds float, float* %A, i64 %indvars.iv
-  %1 = load float, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3
+  %1 = load float, float* %arrayidx2, align 4, !llvm.access.group !5
   %add = fadd fast float %0, %1
-  store float %add, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3
+  store float %add, float* %arrayidx2, align 4, !llvm.access.group !5
   %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
   %exitcond = icmp eq i64 %indvars.iv.next, 8
   br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !4
@@ -31,5 +31,6 @@
   ret void
 }
 
-!3 = !{!3}
+!3 = !{!3, !{!"llvm.loop.parallel_accesses", !5}}
 !4 = !{!4}
+!5 = distinct !{}
Index: test/Transforms/LoopVectorize/X86/vect.omp.force.ll
===================================================================
--- test/Transforms/LoopVectorize/X86/vect.omp.force.ll
+++ test/Transforms/LoopVectorize/X86/vect.omp.force.ll
@@ -32,10 +32,10 @@
 for.body:
   %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
   %arrayidx = getelementptr inbounds float, float* %B, i64 %indvars.iv
-  %0 = load float, float* %arrayidx, align 4, !llvm.mem.parallel_loop_access !1
+  %0 = load float, float* %arrayidx, align 4, !llvm.access.group !11
   %call = tail call float @llvm.sin.f32(float %0)
   %arrayidx2 = getelementptr inbounds float, float* %A, i64 %indvars.iv
-  store float %call, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !1
+  store float %call, float* %arrayidx2, align 4, !llvm.access.group !11
   %indvars.iv.next = add nuw nsw i64 %indvars.iv, 2
   %lftr.wideiv = trunc i64 %indvars.iv.next to i32
   %exitcond = icmp eq i32 %lftr.wideiv, 1000
@@ -48,8 +48,9 @@
   ret void
 }
 
-!1 = !{!1, !2}
+!1 = !{!1, !2, !{!"llvm.loop.parallel_accesses", !11}}
 !2 = !{!"llvm.loop.vectorize.enable", i1 true}
+!11 = distinct !{}
 
 ;
 ; This method will not be vectorized, as scalar cost is lower than any of vector costs.
@@ -62,10 +63,10 @@
 for.body:
   %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
   %arrayidx = getelementptr inbounds float, float* %B, i64 %indvars.iv
-  %0 = load float, float* %arrayidx, align 4, !llvm.mem.parallel_loop_access !3
+  %0 = load float, float* %arrayidx, align 4, !llvm.access.group !13
   %call = tail call float @llvm.sin.f32(float %0)
   %arrayidx2 = getelementptr inbounds float, float* %A, i64 %indvars.iv
-  store float %call, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3
+  store float %call, float* %arrayidx2, align 4, !llvm.access.group !13
   %indvars.iv.next = add nuw nsw i64 %indvars.iv, 2
   %lftr.wideiv = trunc i64 %indvars.iv.next to i32
   %exitcond = icmp eq i32 %lftr.wideiv, 1000
@@ -81,5 +82,6 @@
 declare float @llvm.sin.f32(float) nounwind readnone
 
 ; Dummy metadata
-!3 = !{!3}
+!3 = !{!3, !{!"llvm.loop.parallel_accesses", !13}}
+!13 = distinct !{}
 
Index: test/Transforms/LoopVectorize/X86/vect.omp.force.small-tc.ll
===================================================================
--- test/Transforms/LoopVectorize/X86/vect.omp.force.small-tc.ll
+++ test/Transforms/LoopVectorize/X86/vect.omp.force.small-tc.ll
@@ -31,11 +31,11 @@
 for.body:
   %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
   %arrayidx = getelementptr inbounds float, float* %B, i64 %indvars.iv
-  %0 = load float, float* %arrayidx, align 4, !llvm.mem.parallel_loop_access !1
+  %0 = load float, float* %arrayidx, align 4, !llvm.access.group !11
   %arrayidx2 = getelementptr inbounds float, float* %A, i64 %indvars.iv
-  %1 = load float, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !1
+  %1 = load float, float* %arrayidx2, align 4, !llvm.access.group !11
   %add = fadd fast float %0, %1
-  store float %add, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !1
+  store float %add, float* %arrayidx2, align 4, !llvm.access.group !11
   %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
   %exitcond = icmp eq i64 %indvars.iv.next, 20
   br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !1
@@ -44,8 +44,9 @@
   ret void
 }
 
-!1 = !{!1, !2}
+!1 = !{!1, !2, !{!"llvm.loop.parallel_accesses", !11}}
 !2 = !{!"llvm.loop.vectorize.enable", i1 true}
+!11 = distinct !{}
 
 ;
 ; This loop will not be vectorized as the trip count is below the threshold.
@@ -57,11 +58,11 @@
 for.body:
   %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
   %arrayidx = getelementptr inbounds float, float* %B, i64 %indvars.iv
-  %0 = load float, float* %arrayidx, align 4, !llvm.mem.parallel_loop_access !3
+  %0 = load float, float* %arrayidx, align 4, !llvm.access.group !13
   %arrayidx2 = getelementptr inbounds float, float* %A, i64 %indvars.iv
-  %1 = load float, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3
+  %1 = load float, float* %arrayidx2, align 4, !llvm.access.group !13
   %add = fadd fast float %0, %1
-  store float %add, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3
+  store float %add, float* %arrayidx2, align 4, !llvm.access.group !13
   %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
   %exitcond = icmp eq i64 %indvars.iv.next, 20
   br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !3
@@ -70,7 +71,8 @@
   ret void
 }
 
-!3 = !{!3}
+!3 = !{!3, !{!"llvm.loop.parallel_accesses", !13}}
+!13 = distinct !{}
 
 ;
 ; This loop will be vectorized as the trip count is below the threshold but no
@@ -83,11 +85,11 @@
 for.body:
   %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
   %arrayidx = getelementptr inbounds float, float* %B, i64 %indvars.iv
-  %0 = load float, float* %arrayidx, align 4, !llvm.mem.parallel_loop_access !3
+  %0 = load float, float* %arrayidx, align 4, !llvm.access.group !13
   %arrayidx2 = getelementptr inbounds float, float* %A, i64 %indvars.iv
-  %1 = load float, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3
+  %1 = load float, float* %arrayidx2, align 4, !llvm.access.group !13
   %add = fadd fast float %0, %1
-  store float %add, float* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3
+  store float %add, float* %arrayidx2, align 4, !llvm.access.group !13
   %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
   %exitcond = icmp eq i64 %indvars.iv.next, 16
   br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !4
Index: test/Transforms/LoopVectorize/X86/vector_max_bandwidth.ll
===================================================================
--- test/Transforms/LoopVectorize/X86/vector_max_bandwidth.ll
+++ test/Transforms/LoopVectorize/X86/vector_max_bandwidth.ll
@@ -58,11 +58,11 @@
 for.body:
   %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
   %arrayidx = getelementptr inbounds i8, i8* %B, i64 %indvars.iv
-  %l1 = load i8, i8* %arrayidx, align 4, !llvm.mem.parallel_loop_access !3
+  %l1 = load i8, i8* %arrayidx, align 4, !llvm.access.group !13
   %arrayidx2 = getelementptr inbounds i8, i8* %A, i64 %indvars.iv
-  %l2 = load i8, i8* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3
+  %l2 = load i8, i8* %arrayidx2, align 4, !llvm.access.group !13
   %add = add i8 %l1, %l2
-  store i8 %add, i8* %arrayidx2, align 4, !llvm.mem.parallel_loop_access !3
+  store i8 %add, i8* %arrayidx2, align 4, !llvm.access.group !13
   %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
   %exitcond = icmp eq i64 %indvars.iv.next, 16
   br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !4
@@ -70,5 +70,6 @@
 for.end:
   ret void
 }
-!3 = !{!3}
+!3 = !{!3, !{!"llvm.loop.parallel_accesses", !13}}
 !4 = !{!4}
+!13 = distinct !{}
Index: test/Transforms/SROA/mem-par-metadata-sroa.ll
===================================================================
--- test/Transforms/SROA/mem-par-metadata-sroa.ll
+++ test/Transforms/SROA/mem-par-metadata-sroa.ll
@@ -1,6 +1,6 @@
 ; RUN: opt < %s -sroa -S | FileCheck %s
 ;
-; Make sure the llvm.mem.parallel_loop_access meta-data is preserved
+; Make sure the llvm.access.group meta-data is preserved
 ; when a load/store is replaced with another load/store by sroa
 ;
 ; class Complex {
@@ -33,9 +33,9 @@
 
 ; CHECK: for.body:
 ; CHECK-NOT:  store i32 %{{.*}}, i32* %{{.*}}, align 4
-; CHECK: store i32 %{{.*}}, i32* %{{.*}}, align 4, !llvm.mem.parallel_loop_access !1
+; CHECK: store i32 %{{.*}}, i32* %{{.*}}, align 4, !llvm.access.group !1
 ; CHECK-NOT:  store i32 %{{.*}}, i32* %{{.*}}, align 4
-; CHECK: store i32 %{{.*}}, i32* %{{.*}}, align 4, !llvm.mem.parallel_loop_access !1
+; CHECK: store i32 %{{.*}}, i32* %{{.*}}, align 4, !llvm.access.group !1
 ; CHECK-NOT:  store i32 %{{.*}}, i32* %{{.*}}, align 4
 ; CHECK: br label
 
@@ -63,30 +63,30 @@
   %arrayidx = getelementptr inbounds %class.Complex, %class.Complex* %out, i64 %offset.0
   %real_.i = getelementptr inbounds %class.Complex, %class.Complex* %t0, i64 0, i32 0
   %real_.i.i = getelementptr inbounds %class.Complex, %class.Complex* %arrayidx, i64 0, i32 0
-  %0 = load float, float* %real_.i.i, align 4, !llvm.mem.parallel_loop_access !1
-  store float %0, float* %real_.i, align 4, !llvm.mem.parallel_loop_access !1
+  %0 = load float, float* %real_.i.i, align 4, !llvm.access.group !11
+  store float %0, float* %real_.i, align 4, !llvm.access.group !11
   %imaginary_.i = getelementptr inbounds %class.Complex, %class.Complex* %t0, i64 0, i32 1
   %imaginary_.i.i = getelementptr inbounds %class.Complex, %class.Complex* %arrayidx, i64 0, i32 1
-  %1 = load float, float* %imaginary_.i.i, align 4, !llvm.mem.parallel_loop_access !1
-  store float %1, float* %imaginary_.i, align 4, !llvm.mem.parallel_loop_access !1
+  %1 = load float, float* %imaginary_.i.i, align 4, !llvm.access.group !11
+  store float %1, float* %imaginary_.i, align 4, !llvm.access.group !11
   %arrayidx1 = getelementptr inbounds %class.Complex, %class.Complex* %out, i64 %offset.0
   %real_.i1 = getelementptr inbounds %class.Complex, %class.Complex* %t0, i64 0, i32 0
-  %2 = load float, float* %real_.i1, align 4, !noalias !3, !llvm.mem.parallel_loop_access !1
+  %2 = load float, float* %real_.i1, align 4, !noalias !3, !llvm.access.group !11
   %real_2.i = getelementptr inbounds %class.Complex, %class.Complex* %t0, i64 0, i32 0
-  %3 = load float, float* %real_2.i, align 4, !noalias !3, !llvm.mem.parallel_loop_access !1
+  %3 = load float, float* %real_2.i, align 4, !noalias !3, !llvm.access.group !11
   %add.i = fadd float %2, %3
   %imaginary_.i2 = getelementptr inbounds %class.Complex, %class.Complex* %t0, i64 0, i32 1
-  %4 = load float, float* %imaginary_.i2, align 4, !noalias !3, !llvm.mem.parallel_loop_access !1
+  %4 = load float, float* %imaginary_.i2, align 4, !noalias !3, !llvm.access.group !11
   %imaginary_3.i = getelementptr inbounds %class.Complex, %class.Complex* %t0, i64 0, i32 1
-  %5 = load float, float* %imaginary_3.i, align 4, !noalias !3, !llvm.mem.parallel_loop_access !1
+  %5 = load float, float* %imaginary_3.i, align 4, !noalias !3, !llvm.access.group !11
   %add4.i = fadd float %4, %5
   %real_.i.i3 = getelementptr inbounds %class.Complex, %class.Complex* %tmpcast, i64 0, i32 0
-  store float %add.i, float* %real_.i.i3, align 4, !alias.scope !3, !llvm.mem.parallel_loop_access !1
+  store float %add.i, float* %real_.i.i3, align 4, !alias.scope !3, !llvm.access.group !11
   %imaginary_.i.i4 = getelementptr inbounds %class.Complex, %class.Complex* %tmpcast, i64 0, i32 1
-  store float %add4.i, float* %imaginary_.i.i4, align 4, !alias.scope !3, !llvm.mem.parallel_loop_access !1
+  store float %add4.i, float* %imaginary_.i.i4, align 4, !alias.scope !3, !llvm.access.group !11
   %6 = bitcast %class.Complex* %arrayidx1 to i64*
-  %7 = load i64, i64* %ref.tmp, align 8, !llvm.mem.parallel_loop_access !1
-  store i64 %7, i64* %6, align 4, !llvm.mem.parallel_loop_access !1
+  %7 = load i64, i64* %ref.tmp, align 8, !llvm.access.group !11
+  store i64 %7, i64* %6, align 4, !llvm.access.group !11
   %inc = add nsw i64 %offset.0, 1
   br label %for.cond, !llvm.loop !1
 
@@ -103,8 +103,9 @@
 !llvm.ident = !{!0}
 
 !0 = !{!"clang version 4.0.0 (cfe/trunk 277751)"}
-!1 = distinct !{!1, !2}
+!1 = distinct !{!1, !2, !{!"llvm.loop.parallel_accesses", !11}}
 !2 = !{!"llvm.loop.vectorize.enable", i1 true}
 !3 = !{!4}
 !4 = distinct !{!4, !5, !"_ZNK7ComplexplERKS_: %agg.result"}
 !5 = distinct !{!5, !"_ZNK7ComplexplERKS_"}
+!11 = distinct !{}
Index: test/Transforms/Scalarizer/basic.ll
===================================================================
--- test/Transforms/Scalarizer/basic.ll
+++ test/Transforms/Scalarizer/basic.ll
@@ -205,17 +205,17 @@
   ret void
 }
 
-; Check that llvm.mem.parallel_loop_access information is preserved.
+; Check that llvm.access.group information is preserved.
 define void @f5(i32 %count, <4 x i32> *%src, <4 x i32> *%dst) {
 ; CHECK-LABEL: @f5(
-; CHECK: %val.i0 = load i32, i32* %this_src.i0, align 16, !llvm.mem.parallel_loop_access ![[TAG:[0-9]*]]
-; CHECK: %val.i1 = load i32, i32* %this_src.i1, align 4, !llvm.mem.parallel_loop_access ![[TAG]]
-; CHECK: %val.i2 = load i32, i32* %this_src.i2, align 8, !llvm.mem.parallel_loop_access ![[TAG]]
-; CHECK: %val.i3 = load i32, i32* %this_src.i3, align 4, !llvm.mem.parallel_loop_access ![[TAG]]
-; CHECK: store i32 %add.i0, i32* %this_dst.i0, align 16, !llvm.mem.parallel_loop_access ![[TAG]]
-; CHECK: store i32 %add.i1, i32* %this_dst.i1, align 4, !llvm.mem.parallel_loop_access ![[TAG]]
-; CHECK: store i32 %add.i2, i32* %this_dst.i2, align 8, !llvm.mem.parallel_loop_access ![[TAG]]
-; CHECK: store i32 %add.i3, i32* %this_dst.i3, align 4, !llvm.mem.parallel_loop_access ![[TAG]]
+; CHECK: %val.i0 = load i32, i32* %this_src.i0, align 16, !llvm.access.group ![[TAG:[0-9]*]]
+; CHECK: %val.i1 = load i32, i32* %this_src.i1, align 4, !llvm.access.group ![[TAG]]
+; CHECK: %val.i2 = load i32, i32* %this_src.i2, align 8, !llvm.access.group ![[TAG]]
+; CHECK: %val.i3 = load i32, i32* %this_src.i3, align 4, !llvm.access.group ![[TAG]]
+; CHECK: store i32 %add.i0, i32* %this_dst.i0, align 16, !llvm.access.group ![[TAG]]
+; CHECK: store i32 %add.i1, i32* %this_dst.i1, align 4, !llvm.access.group ![[TAG]]
+; CHECK: store i32 %add.i2, i32* %this_dst.i2, align 8, !llvm.access.group ![[TAG]]
+; CHECK: store i32 %add.i3, i32* %this_dst.i3, align 4, !llvm.access.group ![[TAG]]
 ; CHECK: ret void
 entry:
   br label %loop
@@ -224,9 +224,9 @@
   %index = phi i32 [ 0, %entry ], [ %next_index, %loop ]
   %this_src = getelementptr <4 x i32>, <4 x i32> *%src, i32 %index
   %this_dst = getelementptr <4 x i32>, <4 x i32> *%dst, i32 %index
-  %val = load <4 x i32> , <4 x i32> *%this_src, !llvm.mem.parallel_loop_access !3
+  %val = load <4 x i32> , <4 x i32> *%this_src, !llvm.access.group !13
   %add = add <4 x i32> %val, %val
-  store <4 x i32> %add, <4 x i32> *%this_dst, !llvm.mem.parallel_loop_access !3
+  store <4 x i32> %add, <4 x i32> *%this_dst, !llvm.access.group !13
   %next_index = add i32 %index, -1
   %continue = icmp ne i32 %next_index, %count
   br i1 %continue, label %loop, label %end, !llvm.loop !3
@@ -446,6 +446,7 @@
 !0 = !{ !"root" }
 !1 = !{ !"set1", !0 }
 !2 = !{ !"set2", !0 }
-!3 = !{ !3 }
+!3 = !{ !3, !{!"llvm.loop.parallel_accesses", !13} }
 !4 = !{ float 4.0 }
 !5 = !{ i64 0, i64 8, null }
+!13 = distinct !{}
Index: test/Transforms/SimplifyCFG/combine-parallel-mem-md.ll
===================================================================
--- test/Transforms/SimplifyCFG/combine-parallel-mem-md.ll
+++ test/Transforms/SimplifyCFG/combine-parallel-mem-md.ll
@@ -8,39 +8,39 @@
   br label %for.body
 
 ; CHECK-LABEL: @Test
-; CHECK: load i32, i32* {{.*}}, align 4, !llvm.mem.parallel_loop_access !0
-; CHECK: load i32, i32* {{.*}}, align 4, !llvm.mem.parallel_loop_access !0
-; CHECK: store i32 {{.*}}, align 4, !llvm.mem.parallel_loop_access !0
+; CHECK: load i32, i32* {{.*}}, align 4, !llvm.access.group !0
+; CHECK: load i32, i32* {{.*}}, align 4, !llvm.access.group !0
+; CHECK: store i32 {{.*}}, align 4, !llvm.access.group !0
 ; CHECK-NOT: load
 ; CHECK-NOT: store
 
 for.body:                                         ; preds = %cond.end, %entry
   %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %cond.end ]
   %arrayidx = getelementptr inbounds i32, i32* %p, i64 %indvars.iv
-  %0 = load i32, i32* %arrayidx, align 4, !llvm.mem.parallel_loop_access !0
+  %0 = load i32, i32* %arrayidx, align 4, !llvm.access.group !10
   %cmp1 = icmp eq i32 %0, 0
   br i1 %cmp1, label %cond.true, label %cond.false
 
 cond.false:                                       ; preds = %for.body
   %arrayidx3 = getelementptr inbounds i32, i32* %res, i64 %indvars.iv
-  %v = load i32, i32* %arrayidx3, align 4, !llvm.mem.parallel_loop_access !0
+  %v = load i32, i32* %arrayidx3, align 4, !llvm.access.group !10
   %arrayidx7 = getelementptr inbounds i32, i32* %d, i64 %indvars.iv
-  %1 = load i32, i32* %arrayidx7, align 4, !llvm.mem.parallel_loop_access !0
+  %1 = load i32, i32* %arrayidx7, align 4, !llvm.access.group !10
   %add = add nsw i32 %1, %v
   br label %cond.end
 
 cond.true:                                       ; preds = %for.body
   %arrayidx4 = getelementptr inbounds i32, i32* %res, i64 %indvars.iv
-  %w = load i32, i32* %arrayidx4, align 4, !llvm.mem.parallel_loop_access !0
+  %w = load i32, i32* %arrayidx4, align 4, !llvm.access.group !10
   %arrayidx8 = getelementptr inbounds i32, i32* %d, i64 %indvars.iv
-  %2 = load i32, i32* %arrayidx8, align 4, !llvm.mem.parallel_loop_access !0
+  %2 = load i32, i32* %arrayidx8, align 4, !llvm.access.group !10
   %add2 = add nsw i32 %2, %w
   br label %cond.end
 
 cond.end:                                         ; preds = %for.body, %cond.false
   %cond = phi i32 [ %add, %cond.false ], [ %add2, %cond.true ]
   %arrayidx9 = getelementptr inbounds i32, i32* %res, i64 %indvars.iv
-  store i32 %cond, i32* %arrayidx9, align 4, !llvm.mem.parallel_loop_access !0
+  store i32 %cond, i32* %arrayidx9, align 4, !llvm.access.group !10
   %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
   %exitcond = icmp eq i64 %indvars.iv.next, 16
   br i1 %exitcond, label %for.end, label %for.body, !llvm.loop !0
@@ -51,5 +51,6 @@
 
 attributes #0 = { norecurse nounwind uwtable }
 
-!0 = distinct !{!0, !1}
+!0 = distinct !{!0, !1, !{!"llvm.loop.parallel_accesses", !10}}
 !1 = !{!"llvm.loop.vectorize.enable", i1 true}
+!10 = distinct !{}