
[RFC] IR extensions for annotating directive region entry/exit with a set of OpBundle name definitions for OpenMP
Needs Review, Public

Authored by xtian on Apr 4 2019, 10:19 AM.

Details

Summary

This is the first patch, intended as a starting point for the two RFCs listed below.

[llvm-dev] [RFC] IR-level Region Annotations
[llvm-dev] [RFC] An Extension Mechanism for Parallel Compilers Based on LLVM

The updated LLVM IR proposal is summarized below.

-------LLVM Intrinsic Functions-------

Essentially, the LLVM OperandBundles, the LLVM token type, and three new
LLVM directive intrinsics form the foundation of the proposed extension
mechanism.

The three newly introduced LLVM intrinsic functions are the following:

token @llvm.directive.region.entry()[]
void @llvm.directive.region.exit(token)[]
i1 @llvm.directive.marker()[]

More concretely, these intrinsics are defined using the following
declarations:

// Directive and Qualifier Intrinsic Functions
def int_directive_region_entry : Intrinsic<[llvm_token_ty],[], []>;
def int_directive_region_exit : Intrinsic<[], [llvm_token_ty], []>;
def int_directive_marker : Intrinsic <[llvm_i1_ty], [], []>;

As described in Section SOUNDNESS, several correctness properties are
maintained using OperandBundles on calls to these intrinsics. In
LLVM, an OperandBundle has a tag name (a string to identify the
bundle) and an operand list consisting of zero or more operands. For
example, here are two OperandBundles:

"TagName01"(i32* %x, float* %y, i32 7)
"AnotherTagName"()

The tag name of the first bundle is "TagName01", and it has an operand list
consisting of three operands, %x, %y, and 7. The second bundle has a tag
name "AnotherTagName" but no operands (it has an empty operand list).
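
As a minimal sketch of how such bundles attach to a call, the fragment below places the two example bundles on a call to the proposed marker intrinsic. It assumes an LLVM build that already contains the intrinsics from this patch and uses the typed-pointer syntax found elsewhere on this page; the function name @bundle_example is hypothetical and exists only for illustration.

```llvm
; Sketch only: @llvm.directive.marker is the intrinsic proposed in this RFC.
; @bundle_example is a hypothetical function used purely for illustration.
declare i1 @llvm.directive.marker()

define void @bundle_example(i32* %x, float* %y) {
entry:
  ; The bundle set follows the argument list: each bundle is a quoted tag
  ; followed by a parenthesized, possibly empty, list of typed operands.
  %m = call i1 @llvm.directive.marker() [
      "TagName01"(i32* %x, float* %y, i32 7),
      "AnotherTagName"() ]
  ret void
}
```

The bundles do not change the callee's signature; they travel with the call site, which is what lets a single intrinsic carry different annotations at different calls.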

The above new intrinsics allow:

  • Annotating a code region marked with directives / pragmas / explicit parallel function calls.
  • Annotating values associated with the region (or loops), that is, those values associated with directives / pragmas.
  • Providing information on LLVM IR transformations needed for the annotated code regions (or loops).
  • Introducing parallel IR constructs for (one of) a variety of different parallel IRs, e.g., Tapir or HPVM.
  • Most LLVM scalar and vector analyses and optimizations to be applied to parallel code without modifications to the passes, and without requiring parallel "tasks" to be outlined into separate, isolated functions.

These intrinsics can be used both by frontends and also by transformation
passes (e.g. automated parallelization).
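
To make the first two bullets concrete, here is a hedged sketch of an OpenMP-style parallel region expressed with the entry/exit intrinsics. The tag names "OMP_parallel", "OMP_shared", and "OMP_private" are hypothetical placeholders, not the names defined by this patch, and @omp_parallel_sketch is an invented function. The fragment assumes an LLVM build that includes the proposed intrinsics.

```llvm
; Sketch only: the bundle tag names below are hypothetical placeholders.
declare token @llvm.directive.region.entry()
declare void @llvm.directive.region.exit(token)

define void @omp_parallel_sketch(i32* %a) {
entry:
  %tmp = alloca i32, align 4
  ; Region entry: the returned token identifies this region, and the
  ; bundles list the values associated with the directive.
  %region = call token @llvm.directive.region.entry() [
      "OMP_parallel"(),
      "OMP_shared"(i32* %a),
      "OMP_private"(i32* %tmp) ]
  ; Region body: ordinary IR, not outlined into a separate function.
  %v = load i32, i32* %a, align 4
  store i32 %v, i32* %tmp, align 4
  ; Region exit takes the token, pairing it with the entry call above.
  call void @llvm.directive.region.exit(token %region) [ "OMP_parallel"() ]
  ret void
}
```

Because the exit call takes the token produced by the entry call, the pairing of the two is explicit in the SSA graph rather than implied by their position in the code.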

The names used here are open to discussion.

--------Three Example Uses---------

Below, we show three very brief examples using three IRs: OpenMP [5],
Tapir [4] and HPVM [6]. Somewhat larger code examples are shown in the
Appendix of the accompanying Google Doc.

----Tapir IR----

; This simple Tapir loop uniformly scales each element of a vector of
; integers in parallel.
pfor.detach.lr.ph:
%wide.trip.count = zext i32 %n to i64
br label %pfor.detach

pfor.detach: ; preds = %pfor.inc, %pfor.detach.lr.ph
%indvars.iv = phi i64 [ 0, %pfor.detach.lr.ph ], [ %indvars.iv.next, %pfor.inc ]
detach label %pfor.body, label %pfor.inc

pfor.body: ; preds = %pfor.detach
%arrayidx = getelementptr inbounds i32, i32* %x, i64 %indvars.iv
%0 = load i32, i32* %arrayidx, align 4
%mul3 = mul nsw i32 %0, %a
store i32 %mul3, i32* %arrayidx, align 4
reattach label %pfor.inc

pfor.inc: ; preds = %pfor.body, %pfor.detach
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond = icmp eq i64 %indvars.iv.next, %wide.trip.count
br i1 %exitcond, label %pfor.cond.cleanup, label %pfor.detach

pfor.cond.cleanup: ; preds = %pfor.inc
sync label %sync.continue

---Tapir using LLVMPar intrinsics-----

; This simple parallel loop uniformly scales each element of a vector of
; integers.
pfor.detach.lr.ph: ; preds = %entry
%wide.trip.count = zext i32 %n to i64
br label %pfor.detach

pfor.detach: ; preds = %pfor.inc, %pfor.detach.lr.ph
%indvars.iv = phi i64 [ 0, %pfor.detach.lr.ph ], [ %indvars.iv.next, %pfor.inc ]
%c = call i1 @llvm.directive.marker()[ "detach_task"() ]
br i1 %c, label %pfor.body, label %pfor.inc

pfor.body: ; preds = %pfor.detach
%arrayidx = getelementptr inbounds i32, i32* %x, i64 %indvars.iv
%0 = load i32, i32* %arrayidx, align 4
%mul3 = mul nsw i32 %0, %a
store i32 %mul3, i32* %arrayidx, align 4
call i1 @llvm.directive.marker()[ "reattach_task"() ]
br label %pfor.inc

pfor.inc: ; preds = %pfor.body, %pfor.detach
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond = icmp eq i64 %indvars.iv.next, %wide.trip.count
br i1 %exitcond, label %pfor.cond.cleanup, label %pfor.detach

pfor.cond.cleanup: ; preds = %pfor.inc, %entry
call i1 @llvm.directive.marker()[ "local_barrier"() ]
br label %sync.continue

Comment: If necessary, one can prevent hoisting of the getelementptr
instruction %arrayidx or the load instruction %0 in the above example
by using the @llvm.directive.region.entry,
@llvm.directive.region.exit, and @llvm.launder.invariant.group
intrinsics appropriately within pfor.body.

----HPVM----

; The function vector_add() performs point to point addition of incoming
; arguments, A and B, replicated at run-time across N parallel instances.
; We omit dataflow edges showing incoming/outgoing values.
;

%node = call i8* @llvm.hpvm.createNode1D(
 i8* bitcast (%retStruct (i32*, i32, i32*, i32, i32*, i32)* @vector_add
     to i8*),
 i32 %N)

----HPVM using LLVMPar intrinsics----

...           ; code using A, B, C, N
; The HPVM node function @vector_add is now inlined
%region = call token @llvm.directive.region.entry()[
   "HPVM_create_node"(i32 %N),
   "dataflow_values"(i32* %A, i32 %bytesA, i32* %B, i32 %bytesB,
   i32* %C, i32 %bytesC),
   "attributes"(i32 0, i32 -1, i32 0, i32 -1, i32 1, i32 -1) ]
 ; 0 = 'in', 1 = 'out', 2 = 'inout', -1 for non-pointer arguments

; Loop structure corresponding to %N instances of vector_add()
header: ...

; parallel loop with trip count %N, index variable %loop_index
%loop_index = phi i32 [ 0, %preheader ], [ %loop_index.next, %latch ]

%c = call i1 @llvm.directive.marker()[ "detach_task"() ]
br i1 %c, label %body, label %latch

body:
; Loop index, instead of HPVM intrinsic calls to generate index
%ptrA = getelementptr i32, i32* %A, i32 %loop_index
%ptrB = getelementptr i32, i32* %B, i32 %loop_index
%ptrC = getelementptr i32, i32* %C, i32 %loop_index

%a = load i32, i32* %ptrA
%b = load i32, i32* %ptrB
; %sum rather than %c, which already names the marker result above
%sum = add i32 %a, %b
store i32 %sum, i32* %ptrC

%ignore = call i1 @llvm.directive.marker()[ "reattach_task"() ]
br label %latch

latch:
%loop_index.next = add nuw nsw i32 %loop_index, 1
%exitcond = icmp eq i32 %loop_index.next, %N
br i1 %exitcond, label %loop.end, label %header

loop.end:
call void @llvm.directive.region.exit(token %region)[
"HPVM_create_node"(), "dataflow_values"() ]

...        ; code using A, B, C, N

Diff Detail

Repository
rL LLVM

Event Timeline

xtian created this revision. Apr 4 2019, 10:19 AM
xtian retitled this revision from "[RFC] R extensions for annotating directive region entry and exit with a set of OpBundle name definitions for OpenMP" to "[RFC] IR extensions for annotating directive region entry/exit with a set of OpBundle name definitions for OpenMP". Apr 4 2019, 10:21 AM

This looks partial

  • no langref changes
  • INTEL_COLLAB is not a CMake option and is not set in config.h. How is it supposed to function? Is it supposed to stay?
llvm/include/llvm/IR/CMakeLists.txt
9–11

tabulate

llvm/include/llvm/IR/Intel_Directives.td
1 ↗(On Diff #193739)

license blurb is not in line with other files.

llvm/include/llvm/IR/Intrinsics.td
943

newline

1074

I don't think anything in LLVM uses these?
I wouldn't think new code that is already deprecated from the get-go should be added?

xtian added a comment. Apr 4 2019, 2:29 PM

Note: We will do more clean-up to remove the "INTEL" markers and fix the file header license information. Also, we will have a fully open-source GitHub repo by the end of April, so people can build and test OpenMP 4.5 and some 5.0 code. The small patches are for getting feedback and upstreaming to the LLVM main trunk. Thanks.

jdenny added a subscriber: jdenny. Apr 5 2019, 7:39 AM
smateo added a subscriber: smateo. Apr 9 2019, 5:42 AM
xtian updated this revision to Diff 194438. Apr 9 2019, 7:27 PM
xtian updated this revision to Diff 194440. Apr 9 2019, 7:47 PM
xtian added inline comments.
llvm/include/llvm/IR/CMakeLists.txt
9–11

The top-level CMakeLists.txt is not included in this patch. In the top-level CMakeLists.txt, we have the following:

SET(INTEL_COLLAB ON)
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DINTEL_COLLAB=1")

aheejin removed a subscriber: aheejin. Apr 10 2019, 10:30 AM
xtian updated this revision to Diff 194587. Apr 10 2019, 2:10 PM

Code clean-up done: changes include the license header and markers.