This is an archive of the discontinued LLVM Phabricator instance.

Fix conversion of loops to GPU with no block/thread dimensions.
Closed Public

Authored by herhut on Jan 30 2020, 2:38 AM.

Details

Reviewers

ftynse
nicolasvasilache

Commits

rG84695dd4d788: Fix conversion of loops to GPU with no block/thread dimensions.

Summary

The current code assumes that one always maps at least one loop to block
dimensions and at least one loop to thread dimensions. If either is not
the case, a loop would get mapped twice.
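
To make the double mapping concrete, here is a minimal C++ sketch of the X-dimension selection as it reads before this revision (`dims[0]` for the grid and `dims[numBlockDims]` for the block). Strings stand in for `mlir::Value`, and `XSizes`, `selectXBeforeFix`, and `demonstrateDoubleMapping` are hypothetical names used only for illustration; they are not part of LoopsToGPU.cpp.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the two X sizes; the real code stores mlir::Value.
struct XSizes {
  std::string gridX;
  std::string blockX;
};

// X-dimension selection as written before this revision: both reads from
// `dims` are unconditional.
XSizes selectXBeforeFix(const std::vector<std::string> &dims,
                        unsigned numBlockDims) {
  return {dims[0], dims[numBlockDims]};
}

// With zero block dimensions and one thread dimension, dims[numBlockDims]
// is dims[0], so the single loop supplies both the grid and the block size.
void demonstrateDoubleMapping() {
  const std::vector<std::string> dims = {"loop0"};
  const XSizes sizes = selectXBeforeFix(dims, /*numBlockDims=*/0);
  assert(sizes.gridX == "loop0");
  assert(sizes.blockX == "loop0");
}
```

The sketch only exercises the `numBlockDims == 0` case, which is the one that can be read directly off the two unconditional index expressions.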

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

herhut created this revision. Jan 30 2020, 2:38 AM

Herald added a reviewer: nicolasvasilache. Jan 30 2020, 2:38 AM

Herald added a project: Restricted Project.

Herald added subscribers: llvm-commits, liufengdb, aartbik and 10 others.

herhut edited reviewers, added: ftynse; removed: nicolasvasilache. Jan 30 2020, 2:38 AM

Herald added a reviewer: nicolasvasilache. Jan 30 2020, 2:38 AM

Good catch, thanks!

ftynse added inline comments.

mlir/test/Conversion/LoopsToGPU/no_blocks_no_threads.mlir
11	Nit: do we really need to know there are two "constant 1" emitted? More nit: does the absence of ops in between the given ops (-NEXT) matter?

This revision is now accepted and ready to land. Jan 30 2020, 2:44 AM

Unit tests: pass. 62328 tests passed, 0 failed and 838 were skipped.

clang-tidy: pass.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Harbormaster completed remote builds in B45329: Diff 241381. Jan 30 2020, 2:51 AM

herhut marked an inline comment as done. Jan 30 2020, 3:38 AM

herhut added inline comments.

mlir/test/Conversion/LoopsToGPU/no_blocks_no_threads.mlir
11	I took the NEXT from the other loop lowering test. The check for two constant 1 is there as we have two constants and I need to match the second. I want to make sure that unmapped grids/blocks actually are constant one. Is there a better way?

ftynse added inline comments. Jan 30 2020, 3:41 AM

mlir/test/Conversion/LoopsToGPU/no_blocks_no_threads.mlir
11	I see. It's fine this way.

Closed by commit rG84695dd4d788: Fix conversion of loops to GPU with no block/thread dimensions. (authored by herhut). Jan 31 2020, 2:16 AM

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: Joonsoo. Jan 31 2020, 2:16 AM

Revision Contents

Path                                                         Size
mlir/lib/Conversion/LoopsToGPU/LoopsToGPU.cpp                4 lines
mlir/test/Conversion/LoopsToGPU/no_blocks_no_threads.mlir    34 lines

Diff 241381

mlir/lib/Conversion/LoopsToGPU/LoopsToGPU.cpp

 void LoopToGpuConverter::createLaunch(OpTy rootForOp, OpTy innermostForOp,
                                       unsigned numBlockDims,
                                       unsigned numThreadDims) {
   OpBuilder builder(rootForOp.getOperation());
   // Prepare the grid and block sizes for the launch operation. If there is
   // no loop mapped to a specific dimension, use constant "1" as its size.
   Value constOne = (numBlockDims < 3 || numThreadDims < 3)
                        ? builder.create<ConstantIndexOp>(rootForOp.getLoc(), 1)
                        : nullptr;
-  Value gridSizeX = dims[0];
+  Value gridSizeX = numBlockDims > 0 ? dims[0] : constOne;
   Value gridSizeY = numBlockDims > 1 ? dims[1] : constOne;
   Value gridSizeZ = numBlockDims > 2 ? dims[2] : constOne;
-  Value blockSizeX = dims[numBlockDims];
+  Value blockSizeX = numThreadDims > 0 ? dims[numBlockDims] : constOne;
   Value blockSizeY = numThreadDims > 1 ? dims[numBlockDims + 1] : constOne;
   Value blockSizeZ = numThreadDims > 2 ? dims[numBlockDims + 2] : constOne;
 
   // Create a launch op and move the body region of the innermost loop to the
   // launch op. Pass the values defined outside the outermost loop and used
   // inside the innermost loop and loop lower bounds as kernel data arguments.
   // Still assuming perfect nesting so there are no values other than induction
   // variables that are defined in one loop and used in deeper loops.

mlir/test/Conversion/LoopsToGPU/no_blocks_no_threads.mlir

This file was added.

// RUN: mlir-opt -convert-loops-to-gpu -gpu-block-dims=0 -gpu-thread-dims=1 %s | FileCheck --check-prefix=CHECK-THREADS %s --dump-input-on-failure
// RUN: mlir-opt -convert-loops-to-gpu -gpu-block-dims=1 -gpu-thread-dims=0 %s | FileCheck --check-prefix=CHECK-BLOCKS %s --dump-input-on-failure

// CHECK-THREADS-LABEL: @one_d_loop
// CHECK-BLOCKS-LABEL: @one_d_loop
func @one_d_loop(%A : memref<?xf32>, %B : memref<?xf32>) {
  // Bounds of the loop, its range and step.
  // CHECK-THREADS-NEXT: %{{.*}} = constant 0 : index
  // CHECK-THREADS-NEXT: %{{.*}} = constant 42 : index
  // CHECK-THREADS-NEXT: %[[BOUND:.*]] = subi %{{.*}}, %{{.*}} : index
  // CHECK-THREADS-NEXT: %{{.*}} = constant 1 : index
  // CHECK-THREADS-NEXT: %[[ONE:.*]] = constant 1 : index
  //
  // CHECK-BLOCKS-NEXT: %{{.*}} = constant 0 : index
  // CHECK-BLOCKS-NEXT: %{{.*}} = constant 42 : index
  // CHECK-BLOCKS-NEXT: %[[BOUND:.*]] = subi %{{.*}}, %{{.*}} : index
  // CHECK-BLOCKS-NEXT: %{{.*}} = constant 1 : index
  // CHECK-BLOCKS-NEXT: %[[ONE:.*]] = constant 1 : index

  // CHECK-THREADS-NEXT: gpu.launch blocks(%[[B0:.*]], %[[B1:.*]], %[[B2:.*]]) in (%{{.*}} = %[[ONE]], %{{.*}} = %[[ONE]], %{{.*}}0 = %[[ONE]]) threads(%[[T0:.*]], %[[T1:.*]], %[[T2:.*]]) in (%{{.*}} = %[[BOUND]], %{{.*}} = %[[ONE]], %{{.*}} = %[[ONE]])
  // CHECK-BLOCKS-NEXT: gpu.launch blocks(%[[B0:.*]], %[[B1:.*]], %[[B2:.*]]) in (%{{.*}} = %[[BOUND]], %{{.*}} = %[[ONE]], %{{.*}}0 = %[[ONE]]) threads(%[[T0:.*]], %[[T1:.*]], %[[T2:.*]]) in (%{{.*}} = %[[ONE]], %{{.*}} = %[[ONE]], %{{.*}} = %[[ONE]])
  affine.for %i = 0 to 42 {
    // CHECK-THREADS-NEXT: %[[INDEX:.*]] = addi %{{.*}}, %[[T0]]
    // CHECK-THREADS-NEXT: load %{{.*}}[%[[INDEX]]]
    // CHECK-BLOCKS-NEXT: %[[INDEX:.*]] = addi %{{.*}}, %[[B0]]
    // CHECK-BLOCKS-NEXT: load %{{.*}}[%[[INDEX]]]
    %0 = load %A[%i] : memref<?xf32>
    store %0, %B[%i] : memref<?xf32>
    // CHECK-THREADS: gpu.return
    // CHECK-BLOCKS: gpu.return
  }
  return
}
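
The launch sizes this test expects can be reproduced with a small standalone C++ model of the patched selection logic. This is only a sketch: strings replace `mlir::Value`, `"ONE"` and `"BOUND"` mirror the FileCheck variable names above, and `LaunchSizes`, `Triple`, `selectSizesAfterFix`, and `checkAgainstTestExpectations` are hypothetical names. It also assumes `dims` holds one entry per mapped loop, block loops first.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

using Triple = std::array<std::string, 3>;

// Hypothetical stand-in for the six launch sizes.
struct LaunchSizes {
  Triple grid;
  Triple block;
};

// Mirrors the patched selection: every read from `dims` is guarded, and an
// unmapped dimension falls back to the shared constant one.
LaunchSizes selectSizesAfterFix(const std::vector<std::string> &dims,
                                unsigned numBlockDims,
                                unsigned numThreadDims) {
  const std::string one = "ONE";
  LaunchSizes s;
  s.grid[0] = numBlockDims > 0 ? dims[0] : one;
  s.grid[1] = numBlockDims > 1 ? dims[1] : one;
  s.grid[2] = numBlockDims > 2 ? dims[2] : one;
  s.block[0] = numThreadDims > 0 ? dims[numBlockDims] : one;
  s.block[1] = numThreadDims > 1 ? dims[numBlockDims + 1] : one;
  s.block[2] = numThreadDims > 2 ? dims[numBlockDims + 2] : one;
  return s;
}

// Checks the two configurations exercised by the RUN lines.
void checkAgainstTestExpectations() {
  const std::vector<std::string> dims = {"BOUND"};
  const Triple allOnes = {"ONE", "ONE", "ONE"};
  const Triple boundFirst = {"BOUND", "ONE", "ONE"};

  // -gpu-block-dims=0 -gpu-thread-dims=1 (CHECK-THREADS)
  const LaunchSizes threadsOnly = selectSizesAfterFix(dims, 0, 1);
  assert(threadsOnly.grid == allOnes);
  assert(threadsOnly.block == boundFirst);

  // -gpu-block-dims=1 -gpu-thread-dims=0 (CHECK-BLOCKS)
  const LaunchSizes blocksOnly = selectSizesAfterFix(dims, 1, 0);
  assert(blocksOnly.grid == boundFirst);
  assert(blocksOnly.block == allOnes);
}
```

In both configurations the single loop bound appears exactly once, either as the first grid size or as the first block size, which is what the two `gpu.launch` check lines verify.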