This is an archive of the discontinued LLVM Phabricator instance.

[Polly][WIP] Use SCEV information for the second level aliasing
ClosedPublic

Authored by gareevroman on Jul 22 2017, 12:25 AM.

Details

Reviewers

grosser
Meinersbur
jdoerfert
bollu

Commits

rG1563f039f504: Use SCEV information for the second level aliasing
rPLO310380: Use SCEV information for the second level aliasing
rL310380: Use SCEV information for the second level aliasing

Summary

We introduce another level of alias metadata to distinguish individual non-aliasing accesses whose base pointers are inter-iteration alias-free, that is, marked with "Inter iteration alias-free" mark nodes. Two accesses are distinguished by comparing the raw pointers that represent their base pointers.

In some cases, for example ublas's prod function, which implements GEMM, and DeLiCM, accesses to the same location can be represented by different raw pointers. Consequently, we create different alias sets, which can prevent accesses from being, for example, sunk or hoisted.

To avoid the issue, we compare the corresponding SCEV information instead of the corresponding raw pointers.
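
As a rough illustration of the idea (not Polly's actual code), the sketch below models each access as a hypothetical `PtrValue` object and contrasts keying alias scopes by object identity with keying them by a canonical description of the address, which is the role SCEV plays in the patch. `PtrValue`, `CanonicalKey`, `canonicalize`, and both counting functions are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an IR pointer value. Each object is a distinct
// "raw pointer", even when two objects describe the same address.
struct PtrValue {
  std::string Base; // name of the underlying array, e.g. "C"
  long Offset;      // element offset from the start of that array
};

// Hypothetical stand-in for a SCEV: equal addresses give equal keys,
// regardless of which instruction sequence produced the pointer.
using CanonicalKey = std::pair<std::string, long>;

CanonicalKey canonicalize(const PtrValue &P) { return {P.Base, P.Offset}; }

// Number of scopes created when scopes are keyed by object identity.
std::size_t countScopesByRawPointer(const std::vector<const PtrValue *> &Accesses) {
  std::map<const PtrValue *, std::size_t> Scopes;
  for (const PtrValue *P : Accesses)
    Scopes.emplace(P, Scopes.size()); // no-op if P already has a scope
  return Scopes.size();
}

// Number of scopes created when scopes are keyed by the canonical address.
std::size_t countScopesByCanonicalKey(const std::vector<const PtrValue *> &Accesses) {
  std::map<CanonicalKey, std::size_t> Scopes;
  for (const PtrValue *P : Accesses)
    Scopes.emplace(canonicalize(*P), Scopes.size());
  return Scopes.size();
}
```

With a load and a store that both address element 3 * 1024 + 7 of `C` but are separate objects, the first function reports two scopes and the second reports one.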

Event Timeline

gareevroman created this revision. Jul 22 2017, 12:25 AM

Herald added a reviewer: bollu. Jul 22 2017, 12:25 AM

Hi Roman,

thanks for the patch. I looked into it a couple of times, but need to have another deeper look. Two high-level questions already:

Why do we need to use SCEV? Should we not be able to tell from our information which base pointers are expected to be identical?
What exactly must be checked in the new test case here? What was generated before, why was this wrong, and what does the new code generate?

Also, I may have missed this earlier: why is the number of elements in the check lines growing so much? Is this expected?

CHECK-NEXT: !43 = !{!3, !4, !0, !6, !7, !11, !14, !16, !18, !20, !22, !24, !26, !28, !30, !32, !34, !36, !38, !40}

Marking this as requesting changes, to move this out of the review queue for now.

Roman, it would be great if you could answer my questions before I look deeper.

This revision now requires changes to proceed. Aug 3 2017, 11:54 PM

Why do we need to use SCEV? Should we not be able to tell from our information which base pointers are expected to be identical?

I think we should be able to tell that. However, it seems we don't do it.

The test case swaps the following access relations (the original JSCoP can be found in the new version of the patch):

{
  "kind" : "read",
  "relation" : "{ Stmt_for_body6[i0, i1, i2] -> MemRef_C1[0] }"
}

{
  "kind" : "read",
  "relation" : "{ Stmt_for_body6[i0, i1, i2] -> MemRef_C[i0, i1] }"
}

Subsequently, for some reason, Polly generates different accesses to elements of the matrix C. For example, in the case of C[3][7], Polly generates the following code:

  %459 = mul nsw i64 96, %polly.indvar98
  %460 = mul nsw i64 4, %polly.indvar134
  %461 = add nsw i64 %459, %460
  %462 = add nsw i64 %461, 3
  %463 = mul nsw i64 8, %polly.indvar128
  %464 = add nsw i64 %463, 7
  %465 = mul nsw i64 256, %polly.indvar
  %466 = add nsw i64 %465, %polly.indvar140
  br label %polly.stmt.for.body6940

polly.stmt.for.body6940:                          ; preds = %polly.stmt.for.body6914
  %polly.access.cast.C941 = bitcast [1024 x double]* %C to double*
  %467 = mul nsw i64 96, %polly.indvar98
  %468 = mul nsw i64 4, %polly.indvar134
  %469 = add nsw i64 %467, %468
  %470 = add nsw i64 %469, 3
  %polly.access.mul.C942 = mul nsw i64 %470, 1024
  %471 = mul nsw i64 8, %polly.indvar128
  %472 = add nsw i64 %471, 7
  %polly.access.add.C943 = add nsw i64 %polly.access.mul.C942, %472
  %polly.access.C944 = getelementptr double, double* %polly.access.cast.C941, i64 %polly.access.add.C943
  %tmp_p_scalar_945 = load double, double* %polly.access.C944, align 8, !alias.scope !136, !noalias !137
  %polly.access.cast.Packed_A946 = bitcast [24 x [256 x [4 x double]]]* %Packed_A to double*
  %polly.access.mul.Packed_A947 = mul nsw i64 %polly.indvar134, 256
  %polly.access.add.Packed_A948 = add nsw i64 %polly.access.mul.Packed_A947, %polly.indvar140
  %polly.access.mul.Packed_A949 = mul nsw i64 %polly.access.add.Packed_A948, 4
  %polly.access.add.Packed_A950 = add nsw i64 %polly.access.mul.Packed_A949, 3
  %polly.access.Packed_A951 = getelementptr double, double* %polly.access.cast.Packed_A946, i64 %polly.access.add.Packed_A950
  %tmp1_p_scalar_952 = load double, double* %polly.access.Packed_A951, align 8, !alias.scope !7, !noalias !10
  %polly.access.cast.Packed_B953 = bitcast [256 x [256 x [8 x double]]]* %Packed_B to double*
  %polly.access.mul.Packed_B954 = mul nsw i64 %polly.indvar128, 256
  %polly.access.add.Packed_B955 = add nsw i64 %polly.access.mul.Packed_B954, %polly.indvar140
  %polly.access.mul.Packed_B956 = mul nsw i64 %polly.access.add.Packed_B955, 8
  %polly.access.add.Packed_B957 = add nsw i64 %polly.access.mul.Packed_B956, 7
  %polly.access.Packed_B958 = getelementptr double, double* %polly.access.cast.Packed_B953, i64 %polly.access.add.Packed_B957
  %tmp2_p_scalar_959 = load double, double* %polly.access.Packed_B958, align 8, !alias.scope !6, !noalias !8
  %p_mul960 = fmul double %tmp1_p_scalar_952, %tmp2_p_scalar_959
  %p_add961 = fadd double %tmp_p_scalar_945, %p_mul960
  %polly.access.C1962 = getelementptr double, double* %C1, i64 0
  %tmp3_p_scalar_963 = load double, double* %polly.access.C1962, align 8, !alias.scope !3, !noalias !13
  %p_add18964 = fadd double %tmp3_p_scalar_963, %p_add961
  %scevgep965 = getelementptr [1024 x double], [1024 x double]* %C, i64 %462, i64 %464
  store double %p_add18964, double* %scevgep965, align 8, !alias.scope !138, !noalias !139
  %polly.indvar_next141 = add nsw i64 %polly.indvar140, 1
  %polly.loop_cond142 = icmp sle i64 %polly.indvar_next141, 255
  br i1 %polly.loop_cond142, label %polly.loop_header137, label %polly.loop_exit139

Let's consider how accesses to the matrix C are formed. Indices of the first dimension are equal:

…

%459 = mul nsw i64 96, %polly.indvar98
%460 = mul nsw i64 4, %polly.indvar134
%461 = add nsw i64 %459, %460
%462 = add nsw i64 %461, 3

…

%467 = mul nsw i64 96, %polly.indvar98
%468 = mul nsw i64 4, %polly.indvar134
%469 = add nsw i64 %467, %468
%470 = add nsw i64 %469, 3

…

Indices of the second dimension are equal too:

…

%463 = mul nsw i64 8, %polly.indvar128
%464 = add nsw i64 %463, 7

…

%471 = mul nsw i64 8, %polly.indvar128
%472 = add nsw i64 %471, 7

…

However, to store the value, we access the two-dimensional array:

%scevgep965 = getelementptr [1024 x double], [1024 x double]* %C, i64 %462, i64 %464

store double %p_add18964, double* %scevgep965, align 8

To read the value, we access the one-dimensional array:

%polly.access.cast.C941 = bitcast [1024 x double]* %C to double*

…

%polly.access.mul.C942 = mul nsw i64 %470, 1024

…

%polly.access.add.C943 = add nsw i64 %polly.access.mul.C942, %472
  %polly.access.C944 = getelementptr double, double* %polly.access.cast.C941, i64 %polly.access.add.C943
  %tmp_p_scalar_945 = load double, double* %polly.access.C944, align 8

Consequently, as far as I understand, we have two different base pointers that point to the same location. Since we compare raw pointers to determine a second level alias set, Polly generates different second level alias sets for these read and write accesses to the matrix C. In the case of DeLiCM, we have a similar situation. However, I haven't managed to get a reduced test case.

It would probably be good to fix Polly's redundant code generation itself. Unfortunately, I'm busy at the moment and can't do it. In any case, I think it makes sense for the second level aliasing to use the SCEV information, since that makes it more robust.
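
To make the equivalence concrete, here is a small sketch of the address arithmetic behind the two forms, assuming rows of 1024 doubles as in the IR above. The two-index `getelementptr` on `[1024 x double]*` and the flattened `double*` access with index `i * 1024 + j` yield the same byte offset. The function names `offset2D` and `offset1D` are made up for this example.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t Cols = 1024; // row length of [1024 x double]

// Byte offset as the two-index GEP computes it: I whole rows, then J elements.
constexpr std::size_t offset2D(std::size_t I, std::size_t J) {
  return I * (Cols * sizeof(double)) + J * sizeof(double);
}

// Byte offset as the flattened access computes it: one linear element index.
constexpr std::size_t offset1D(std::size_t I, std::size_t J) {
  return (I * Cols + J) * sizeof(double);
}

// Both forms agree for the C[3][7] element discussed above.
static_assert(offset2D(3, 7) == offset1D(3, 7), "same location");
```

The pointers differ only in how the address is spelled in the IR, which is why comparing them as raw values treats them as unrelated.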

What exactly must be checked in the new test case here?

Check that we don't create different alias sets for locations represented by different raw pointers.

What was generated before, why was this wrong,

Previously, we generated 64 second level alias sets instead of 32, since we compared raw pointers to determine second level alias sets.

and what does the new code generate?

32 second level alias sets.

Also, I may have missed this earlier: why is the number of elements in the check lines growing so much? Is this expected?

CHECK-NEXT: !43 = !{!3, !4, !0, !6, !7, !11, !14, !16, !18, !20, !22, !24, !26, !28, !30, !32, !34, !36, !38, !40}

I think this is expected. The innermost loop computes 32 different elements of the matrix C. Consequently, 32 different second level alias sets are generated.
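
The growth of the noalias lists appears to follow from how `annotateSecondLevel` in this patch builds them: the list of earlier second level scopes for the base pointer is looked up before the new scope is appended, so each new scope's noalias list holds the scopes of the other arrays plus every second level scope created before it. A minimal sketch of that count, assuming five other array scopes as in the check lines (the function name `noaliasListSizes` is made up for this example):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// For NumAccesses distinct second level scopes created in order, return the
// length of the noalias list attached to each one.
std::vector<std::size_t> noaliasListSizes(std::size_t NumOtherBaseScopes,
                                          std::size_t NumAccesses) {
  std::vector<std::size_t> Sizes;
  std::size_t SecondLevelScopesSoFar = 0;
  for (std::size_t I = 0; I < NumAccesses; ++I) {
    // The list is formed from the scopes that existed before this access.
    Sizes.push_back(NumOtherBaseScopes + SecondLevelScopesSoFar);
    // Only afterwards is the new scope recorded for later accesses.
    ++SecondLevelScopesSoFar;
  }
  return Sizes;
}
```

Under these assumptions the first list has 5 entries and the sixteenth has 20, which matches the lengths of `!12` and `!43` in the check lines.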

P.S.: Sorry, I've found out that we check the wrong output. I've updated the test case.

OK, that's fine for me then. Can you possibly add check lines for the load and store to the test case, to illustrate which alias data they refer to?

This revision is now accepted and ready to land. Aug 5 2017, 2:11 PM

Closed by commit rL310380: Use SCEV information for the second level aliasing (authored by romangareev). Aug 8 2017, 9:52 AM

This revision was automatically updated to reflect the committed changes.

In D35761#833110, @grosser wrote:

OK, that's fine for me then. Can you possibly add check lines for the load and store to the test case, to illustrate which alias data they refer to?

AFAIU, it would require adding IR and making the test dependent on the generated code. I'm not sure that it's a good approach.

I am slightly confused. Don't we add IR in all our test cases?

This revision is now accepted and ready to land. Aug 8 2017, 9:59 AM

Closed by commit rG1563f039f504: Use SCEV information for the second level aliasing (authored by gareevroman). Oct 7 2019, 5:41 AM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. Oct 7 2019, 5:41 AM

Herald added a subscriber: javed.absar.

Meinersbur added a reverting change: rGcad9f98a2ad9: [Polly] Don't generate inter-iteration noalias metadata. Sep 20 2021, 8:20 PM

Revision Contents

Path                                                                            Size
include/polly/CodeGen/IRBuilder.h                                               6 lines
lib/CodeGen/IRBuilder.cpp                                                       18 lines
test/ScheduleOptimizer/kernel_gemm___%for.body---%for.end24.jscop.transformed   55 lines
test/ScheduleOptimizer/pattern-matching-based-opts_12.ll                        107 lines

Diff 107776

include/polly/CodeGen/IRBuilder.h

[11 lines not shown]
 //
 //===----------------------------------------------------------------------===//

 #ifndef POLLY_CODEGEN_IRBUILDER_H
 #define POLLY_CODEGEN_IRBUILDER_H

 #include "llvm/ADT/MapVector.h"
 #include "llvm/Analysis/LoopInfo.h"
+#include "llvm/Analysis/ScalarEvolution.h"
 #include "llvm/IR/IRBuilder.h"
 #include "llvm/IR/ValueMap.h"

 namespace llvm {
 class ScalarEvolution;
 } // namespace llvm

 namespace polly {
[82 lines not shown; enclosing context: private:]
   /// A map from base pointers to its alias scope.
   llvm::MapVector<llvm::AssertingVH<llvm::Value>, llvm::MDNode *> AliasScopeMap;

   /// A map from base pointers to an alias scope list of other pointers.
   llvm::DenseMap<llvm::AssertingVH<llvm::Value>, llvm::MDNode *>
       OtherAliasScopeListMap;

   /// A map from pointers to second level alias scopes.
-  llvm::DenseMap<llvm::AssertingVH<llvm::Value>, llvm::MDNode *>
-      SecondLevelAliasScopeMap;
+  llvm::DenseMap<const llvm::SCEV *, llvm::MDNode *> SecondLevelAliasScopeMap;

   /// A map from pointers to second level alias scope list of other pointers.
-  llvm::DenseMap<llvm::AssertingVH<llvm::Value>, llvm::MDNode *>
+  llvm::DenseMap<const llvm::SCEV *, llvm::MDNode *>
       SecondLevelOtherAliasScopeListMap;

   /// Inter iteration alias-free base pointers.
   llvm::SmallPtrSet<llvm::Value *, 4> InterIterationAliasFreeBasePtrs;

   llvm::DenseMap<llvm::AssertingVH<llvm::Value>, llvm::AssertingVH<llvm::Value>>
       AlternativeAliasBases;
 };
[39 lines not shown]

lib/CodeGen/IRBuilder.cpp

[134 lines not shown; enclosing context: static llvm::Value *getMemAccInstPointerOperand(Instruction *Inst) {]
   if (!MemInst)
     return nullptr;

   return MemInst.getPointerOperand();
 }

 void ScopAnnotator::annotateSecondLevel(llvm::Instruction *Inst,
                                         llvm::Value *BasePtr) {
-  auto *Ptr = getMemAccInstPointerOperand(Inst);
-  if (!Ptr)
+  auto *PtrSCEV = SE->getSCEV(getMemAccInstPointerOperand(Inst));
+  auto *BasePtrSCEV = SE->getPointerBase(PtrSCEV);
+
+  if (!PtrSCEV)
     return;
-  auto SecondLevelAliasScope = SecondLevelAliasScopeMap.lookup(Ptr);
+  auto SecondLevelAliasScope = SecondLevelAliasScopeMap.lookup(PtrSCEV);
   auto SecondLevelOtherAliasScopeList =
-      SecondLevelOtherAliasScopeListMap.lookup(Ptr);
+      SecondLevelOtherAliasScopeListMap.lookup(PtrSCEV);
   if (!SecondLevelAliasScope) {
     auto AliasScope = AliasScopeMap.lookup(BasePtr);
     if (!AliasScope)
       return;
     LLVMContext &Ctx = SE->getContext();
     SecondLevelAliasScope = getID(
         Ctx, AliasScope, MDString::get(Ctx, "second level alias metadata"));
-    SecondLevelAliasScopeMap[Ptr] = SecondLevelAliasScope;
+    SecondLevelAliasScopeMap[PtrSCEV] = SecondLevelAliasScope;
     Metadata *Args = {SecondLevelAliasScope};
     auto SecondLevelBasePtrAliasScopeList =
-        SecondLevelAliasScopeMap.lookup(BasePtr);
-    SecondLevelAliasScopeMap[BasePtr] = MDNode::concatenate(
+        SecondLevelAliasScopeMap.lookup(BasePtrSCEV);
+    SecondLevelAliasScopeMap[BasePtrSCEV] = MDNode::concatenate(
         SecondLevelBasePtrAliasScopeList, MDNode::get(Ctx, Args));
     auto OtherAliasScopeList = OtherAliasScopeListMap.lookup(BasePtr);
     SecondLevelOtherAliasScopeList = MDNode::concatenate(
         OtherAliasScopeList, SecondLevelBasePtrAliasScopeList);
-    SecondLevelOtherAliasScopeListMap[Ptr] = SecondLevelOtherAliasScopeList;
+    SecondLevelOtherAliasScopeListMap[PtrSCEV] = SecondLevelOtherAliasScopeList;
   }
   Inst->setMetadata("alias.scope", SecondLevelAliasScope);
   Inst->setMetadata("noalias", SecondLevelOtherAliasScopeList);
 }

 void ScopAnnotator::annotate(Instruction *Inst) {
   if (!Inst->mayReadOrWriteMemory())
     return;
[55 lines not shown]

test/ScheduleOptimizer/kernel_gemm___%for.body---%for.end24.jscop.transformed

This file was added.

				{
				"arrays" : [
				{
				"name" : "MemRef_C1",
				"sizes" : [ "*" ],
				"type" : "double"
				},
				{
				"name" : "MemRef_A",
				"sizes" : [ "*", "1024" ],
				"type" : "double"
				},
				{
				"name" : "MemRef_B",
				"sizes" : [ "*", "1024" ],
				"type" : "double"
				},
				{
				"name" : "MemRef_C",
				"sizes" : [ "*", "1024" ],
				"type" : "double"
				}
				],
				"context" : "{ : }",
				"name" : "%for.body---%for.end24",
				"statements" : [
				{
				"accesses" : [
				{
				"kind" : "read",
				"relation" : "{ Stmt_for_body6[i0, i1, i2] -> MemRef_C[i0, i1] }"
				},
				{
				"kind" : "read",
				"relation" : "{ Stmt_for_body6[i0, i1, i2] -> MemRef_A[i0, i2] }"
				},
				{
				"kind" : "read",
				"relation" : "{ Stmt_for_body6[i0, i1, i2] -> MemRef_B[i2, i1] }"
				},
				{
				"kind" : "read",
				"relation" : "{ Stmt_for_body6[i0, i1, i2] -> MemRef_C1[0] }"
				},
				{
				"kind" : "write",
				"relation" : "{ Stmt_for_body6[i0, i1, i2] -> MemRef_C[i0, i1] }"
				}
				],
				"domain" : "{ Stmt_for_body6[i0, i1, i2] : 0 <= i0 <= 1023 and 0 <= i1 <= 1023 and 0 <= i2 <= 1023 }",
				"name" : "Stmt_for_body6",
				"schedule" : "{ Stmt_for_body6[i0, i1, i2] -> [i0, i1, i2] }"
				}
				]
				}

test/ScheduleOptimizer/pattern-matching-based-opts_12.ll

This file was added.

				; RUN: opt %loadPolly -polly-import-jscop -polly-opt-isl \
				; RUN: -polly-target-throughput-vector-fma=1 \
				; RUN: -polly-target-latency-vector-fma=8 \
				; RUN: -polly-target-1st-cache-level-associativity=8 \
				; RUN: -polly-target-2nd-cache-level-associativity=8 \
				; RUN: -polly-target-1st-cache-level-size=32768 \
				; RUN: -polly-target-vector-register-bitwidth=256 \
				; RUN: -polly-target-2nd-cache-level-size=262144 \
				; RUN: -polly-import-jscop-postfix=transformed -polly-codegen -S < %s \
				; RUN: | FileCheck %s
				;
				; Check that we do not create different alias sets for locations represented by
				; different raw pointers.
				;
				; CHECK: !0 = distinct !{!0, !1, !"polly.alias.scope.MemRef_B"}
				; CHECK-NEXT: !1 = distinct !{!1, !"polly.alias.scope.domain"}
				; CHECK-NEXT: !2 = !{!3, !4, !5, !6, !7}
				; CHECK-NEXT: !3 = distinct !{!3, !1, !"polly.alias.scope.MemRef_C1"}
				; CHECK-NEXT: !4 = distinct !{!4, !1, !"polly.alias.scope.MemRef_A"}
				; CHECK-NEXT: !5 = distinct !{!5, !1, !"polly.alias.scope.MemRef_C"}
				; CHECK-NEXT: !6 = distinct !{!6, !1, !"polly.alias.scope.Packed_B"}
				; CHECK-NEXT: !7 = distinct !{!7, !1, !"polly.alias.scope.Packed_A"}
				; CHECK-NEXT: !8 = !{!3, !4, !0, !5, !7}
				; CHECK-NEXT: !9 = !{!3, !0, !5, !6, !7}
				; CHECK-NEXT: !10 = !{!3, !4, !0, !5, !6}
				; CHECK-NEXT: !11 = distinct !{!11, !5, !"second level alias metadata"}
				; CHECK-NEXT: !12 = !{!3, !4, !0, !6, !7}
				; CHECK-NEXT: !13 = !{!4, !0, !5, !6, !7}
				; CHECK-NEXT: !14 = distinct !{!14, !5, !"second level alias metadata"}
				; CHECK-NEXT: !15 = !{!3, !4, !0, !6, !7, !11}
				; CHECK-NEXT: !16 = distinct !{!16, !5, !"second level alias metadata"}
				; CHECK-NEXT: !17 = !{!3, !4, !0, !6, !7, !11, !14}
				; CHECK-NEXT: !18 = distinct !{!18, !5, !"second level alias metadata"}
				; CHECK-NEXT: !19 = !{!3, !4, !0, !6, !7, !11, !14, !16}
				; CHECK-NEXT: !20 = distinct !{!20, !5, !"second level alias metadata"}
				; CHECK-NEXT: !21 = !{!3, !4, !0, !6, !7, !11, !14, !16, !18}
				; CHECK-NEXT: !22 = distinct !{!22, !5, !"second level alias metadata"}
				; CHECK-NEXT: !23 = !{!3, !4, !0, !6, !7, !11, !14, !16, !18, !20}
				; CHECK-NEXT: !24 = distinct !{!24, !5, !"second level alias metadata"}
				; CHECK-NEXT: !25 = !{!3, !4, !0, !6, !7, !11, !14, !16, !18, !20, !22}
				; CHECK-NEXT: !26 = distinct !{!26, !5, !"second level alias metadata"}
				; CHECK-NEXT: !27 = !{!3, !4, !0, !6, !7, !11, !14, !16, !18, !20, !22, !24}
				; CHECK-NEXT: !28 = distinct !{!28, !5, !"second level alias metadata"}
				; CHECK-NEXT: !29 = !{!3, !4, !0, !6, !7, !11, !14, !16, !18, !20, !22, !24, !26}
				; CHECK-NEXT: !30 = distinct !{!30, !5, !"second level alias metadata"}
				; CHECK-NEXT: !31 = !{!3, !4, !0, !6, !7, !11, !14, !16, !18, !20, !22, !24, !26, !28}
				; CHECK-NEXT: !32 = distinct !{!32, !5, !"second level alias metadata"}
				; CHECK-NEXT: !33 = !{!3, !4, !0, !6, !7, !11, !14, !16, !18, !20, !22, !24, !26, !28, !30}
				; CHECK-NEXT: !34 = distinct !{!34, !5, !"second level alias metadata"}
				; CHECK-NEXT: !35 = !{!3, !4, !0, !6, !7, !11, !14, !16, !18, !20, !22, !24, !26, !28, !30, !32}
				; CHECK-NEXT: !36 = distinct !{!36, !5, !"second level alias metadata"}
				; CHECK-NEXT: !37 = !{!3, !4, !0, !6, !7, !11, !14, !16, !18, !20, !22, !24, !26, !28, !30, !32, !34}
				; CHECK-NEXT: !38 = distinct !{!38, !5, !"second level alias metadata"}
				; CHECK-NEXT: !39 = !{!3, !4, !0, !6, !7, !11, !14, !16, !18, !20, !22, !24, !26, !28, !30, !32, !34, !36}
				; CHECK-NEXT: !40 = distinct !{!40, !5, !"second level alias metadata"}
				; CHECK-NEXT: !41 = !{!3, !4, !0, !6, !7, !11, !14, !16, !18, !20, !22, !24, !26, !28, !30, !32, !34, !36, !38}
				; CHECK-NEXT: !42 = distinct !{!42, !5, !"second level alias metadata"}
				; CHECK-NEXT: !43 = !{!3, !4, !0, !6, !7, !11, !14, !16, !18, !20, !22, !24, !26, !28, !30, !32, !34, !36, !38, !40}
				;
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-unknown"

				define void @kernel_gemm(i32 %ni, i32 %nj, i32 %nk, [1024 x double]* %A, [1024 x double]* %B, [1024 x double]* %C, double* %C1) {
				entry:
				br label %entry.split

				entry.split: ; preds = %entry
				br label %for.body

				for.body: ; preds = %for.inc22, %entry.split
				%indvars.iv43 = phi i64 [ 0, %entry.split ], [ %indvars.iv.next44, %for.inc22 ]
				br label %for.body3

				for.body3: ; preds = %for.inc19, %for.body
				%indvars.iv40 = phi i64 [ 0, %for.body ], [ %indvars.iv.next41, %for.inc19 ]
				br label %for.body6

				for.body6: ; preds = %for.body6, %for.body3
				%indvars.iv = phi i64 [ 0, %for.body3 ], [ %indvars.iv.next, %for.body6 ]
				%tmp = load double, double* %C1, align 8
				%arrayidx9 = getelementptr inbounds [1024 x double], [1024 x double]* %A, i64 %indvars.iv43, i64 %indvars.iv
				%tmp1 = load double, double* %arrayidx9, align 8
				%arrayidx13 = getelementptr inbounds [1024 x double], [1024 x double]* %B, i64 %indvars.iv, i64 %indvars.iv40
				%tmp2 = load double, double* %arrayidx13, align 8
				%mul = fmul double %tmp1, %tmp2
				%add = fadd double %tmp, %mul
				%arrayidx17 = getelementptr inbounds [1024 x double], [1024 x double]* %C, i64 %indvars.iv43, i64 %indvars.iv40
				%tmp3 = load double, double* %arrayidx17, align 8
				%add18 = fadd double %tmp3, %add
				store double %add18, double* %arrayidx17, align 8
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp ne i64 %indvars.iv.next, 1024
				br i1 %exitcond, label %for.body6, label %for.inc19

				for.inc19: ; preds = %for.body6
				%indvars.iv.next41 = add nuw nsw i64 %indvars.iv40, 1
				%exitcond42 = icmp ne i64 %indvars.iv.next41, 1024
				br i1 %exitcond42, label %for.body3, label %for.inc22

				for.inc22: ; preds = %for.inc19
				%indvars.iv.next44 = add nuw nsw i64 %indvars.iv43, 1
				%exitcond45 = icmp ne i64 %indvars.iv.next44, 1024
				br i1 %exitcond45, label %for.body, label %for.end24

				for.end24: ; preds = %for.inc22
				ret void
				}