Skip to content

Commit 4c4baf5

Browse files
committedMay 29, 2017
[Docs] Add VectorizationPlan to docs/Proposals.
Following the request made in https://reviews.llvm.org/D32871, the general documentation of the Vectorization Plan is hereby placed under docs/Proposals. llvm-svn: 304161
1 parent 8fa639e commit 4c4baf5

File tree

3 files changed

+196
-0
lines changed

3 files changed

+196
-0
lines changed
 
+182
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,182 @@
1+
==================
2+
Vectorization Plan
3+
==================
4+
5+
.. contents::
6+
:local:
7+
8+
Abstract
9+
========
10+
The vectorization transformation can be rather complicated, involving several
11+
potential alternatives, especially for outer-loops [1]_ but also possibly for
12+
innermost loops. These alternatives may have significant performance impact,
13+
both positive and negative. A cost model is therefore employed to identify the
14+
best alternative, including the alternative of avoiding any transformation
15+
altogether.
16+
17+
The Vectorization Plan is an explicit model for describing vectorization
18+
candidates. It serves for both optimizing candidates including estimating their
19+
cost reliably, and for performing their final translation into IR. This
20+
facilitates dealing with multiple vectorization candidates.
21+
22+
High-level Design
23+
=================
24+
25+
Vectorization Workflow
26+
----------------------
27+
VPlan-based vectorization involves three major steps, taking a "scenario-based
28+
approach" to vectorization planning:
29+
30+
1. Legal Step: check if a loop can be legally vectorized; encode contraints and
31+
artifacts if so.
32+
2. Plan Step:
33+
34+
a. Build initial VPlans following the constraints and decisions taken by
35+
Legal Step 1, and compute their cost.
36+
b. Apply optimizations to the VPlans, possibly forking additional VPlans.
37+
Prune sub-optimal VPlans having relatively high cost.
38+
3. Execute Step: materialize the best VPlan. Note that this is the only step
39+
that modifies the IR.
40+
41+
Design Guidelines
42+
-----------------
43+
In what follows, the term "input IR" refers to code that is fed into the
44+
vectorizer whereas the term "output IR" refers to code that is generated by the
45+
vectorizer. The output IR contains code that has been vectorized or "widened"
46+
according to a loop Vectorization Factor (VF), and/or loop unroll-and-jammed
47+
according to an Unroll Factor (UF).
48+
The design of VPlan follows several high-level guidelines:
49+
50+
1. Analysis-like: building and manipulating VPlans must not modify the input IR.
51+
In particular, if the best option is not to vectorize at all, the
52+
vectorization process terminates before reaching Step 3, and compilation
53+
should proceed as if VPlans had not been built.
54+
55+
2. Align Cost & Execute: each VPlan must support both estimating the cost and
56+
generating the output IR code, such that the cost estimation evaluates the
57+
to-be-generated code reliably.
58+
59+
3. Support vectorizing additional constructs:
60+
61+
a. Outer-loop vectorization. In particular, VPlan must be able to model the
62+
control-flow of the output IR which may include multiple basic-blocks and
63+
nested loops.
64+
b. SLP vectorization.
65+
c. Combinations of the above, including nested vectorization: vectorizing
66+
both an inner loop and an outer-loop at the same time (each with its own
67+
VF and UF), mixed vectorization: vectorizing a loop with SLP patterns
68+
inside [4]_, (re)vectorizing input IR containing vector code.
69+
d. Function vectorization [2]_.
70+
71+
4. Support multiple candidates efficiently. In particular, similar candidates
72+
related to a range of possible VF's and UF's must be represented efficiently.
73+
Potential versioning needs to be supported efficiently.
74+
75+
5. Support vectorizing idioms, such as interleaved groups of strided loads or
76+
stores. This is achieved by modeling a sequence of output instructions using
77+
a "Recipe", which is responsible for computing its cost and generating its
78+
code.
79+
80+
6. Encapsulate Single-Entry Single-Exit regions (SESE). During vectorization
81+
such regions may need to be, for example, predicated and linearized, or
82+
replicated VF*UF times to handle scalarized and predicated instructions.
83+
Innerloops are also modelled as SESE regions.
84+
85+
Low-level Design
86+
================
87+
The low-level design of VPlan comprises of the following classes.
88+
89+
:LoopVectorizationPlanner:
90+
A LoopVectorizationPlanner is designed to handle the vectorization of a loop
91+
or a loop nest. It can construct, optimize and discard one or more VPlans,
92+
each VPlan modelling a distinct way to vectorize the loop or the loop nest.
93+
Once the best VPlan is determined, including the best VF and UF, this VPlan
94+
drives the generation of output IR.
95+
96+
:VPlan:
97+
A model of a vectorized candidate for a given input IR loop or loop nest. This
98+
candidate is represented using a Hierarchical CFG. VPlan supports estimating
99+
the cost and driving the generation of the output IR code it represents.
100+
101+
:Hierarchical CFG:
102+
A control-flow graph whose nodes are basic-blocks or Hierarchical CFG's. The
103+
Hierarchical CFG data structure is similar to the Tile Tree [5]_, where
104+
cross-Tile edges are lifted to connect Tiles instead of the original
105+
basic-blocks as in Sharir [6]_, promoting the Tile encapsulation. The terms
106+
Region and Block are used rather than Tile [5]_ to avoid confusion with loop
107+
tiling.
108+
109+
:VPBlockBase:
110+
The building block of the Hierarchical CFG. A pure-virtual base-class of
111+
VPBasicBlock and VPRegionBlock, see below. VPBlockBase models the hierarchical
112+
control-flow relations with other VPBlocks. Note that in contrast to the IR
113+
BasicBlock, a VPBlockBase models its control-flow successors and predecessors
114+
directly, rather than through a Terminator branch or through predecessor
115+
branches that "use" the VPBlockBase.
116+
117+
:VPBasicBlock:
118+
VPBasicBlock is a subclass of VPBlockBase, and serves as the leaves of the
119+
Hierarchical CFG. It represents a sequence of output IR instructions that will
120+
appear consecutively in an output IR basic-block. The instructions of this
121+
basic-block originate from one or more VPBasicBlocks. VPBasicBlock holds a
122+
sequence of zero or more VPRecipes that model the cost and generation of the
123+
output IR instructions.
124+
125+
:VPRegionBlock:
126+
VPRegionBlock is a subclass of VPBlockBase. It models a collection of
127+
VPBasicBlocks and VPRegionBlocks which form a SESE subgraph of the output IR
128+
CFG. A VPRegionBlock may indicate that its contents are to be replicated a
129+
constant number of times when output IR is generated, effectively representing
130+
a loop with constant trip-count that will be completely unrolled. This is used
131+
to support scalarized and predicated instructions with a single model for
132+
multiple candidate VF's and UF's.
133+
134+
:VPRecipeBase:
135+
A pure-virtual base class modeling a sequence of one or more output IR
136+
instructions, possibly based on one or more input IR instructions. These
137+
input IR instructions are referred to as "Ingredients" of the Recipe. A Recipe
138+
may specify how its ingredients are to be transformed to produce the output IR
139+
instructions; e.g., cloned once, replicated multiple times or widened
140+
according to selected VF.
141+
142+
:VPTransformState:
143+
Stores information used for generating output IR, passed from
144+
LoopVectorizationPlanner to its selected VPlan for execution, and used to pass
145+
additional information down to VPBlocks and VPRecipes.
146+
147+
Related LLVM components
148+
-----------------------
149+
1. SLP Vectorizer: one can compare the VPlan model with LLVM's existing SLP
150+
tree, where TSLP [3]_ adds Plan Step 2.b.
151+
152+
2. RegionInfo: one can compare VPlan's H-CFG with the Region Analysis as used by
153+
Polly [7]_.
154+
155+
References
156+
----------
157+
.. [1] "Outer-loop vectorization: revisited for short SIMD architectures", Dorit
158+
Nuzman and Ayal Zaks, PACT 2008.
159+
160+
.. [2] "Proposal for function vectorization and loop vectorization with function
161+
calls", Xinmin Tian, [`cfe-dev
162+
<http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html>`_].,
163+
March 2, 2016.
164+
See also `review <https://reviews.llvm.org/D22792>`_.
165+
166+
.. [3] "Throttling Automatic Vectorization: When Less is More", Vasileios
167+
Porpodas and Tim Jones, PACT 2015 and LLVM Developers' Meeting 2015.
168+
169+
.. [4] "Exploiting mixed SIMD parallelism by reducing data reorganization
170+
overhead", Hao Zhou and Jingling Xue, CGO 2016.
171+
172+
.. [5] "Register Allocation via Hierarchical Graph Coloring", David Callahan and
173+
Brian Koblenz, PLDI 1991
174+
175+
.. [6] "Structural analysis: A new approach to flow analysis in optimizing
176+
compilers", M. Sharir, Journal of Computer Languages, Jan. 1980
177+
178+
.. [7] "Enabling Polyhedral Optimizations in LLVM", Tobias Grosser, Diploma
179+
thesis, 2011.
180+
181+
.. [8] "Introducing VPlan to the Loop Vectorizer", Gil Rapaport and Ayal Zaks,
182+
European LLVM Developers' Meeting 2017.

‎llvm/docs/Vectorizers.rst

+11
Original file line numberDiff line numberDiff line change
@@ -382,6 +382,17 @@ And Linpack-pc with the same configuration. Result is Mflops, higher is better.
382382

383383
.. image:: linpack-pc.png
384384

385+
Ongoing Development Directions
386+
------------------------------
387+
388+
.. toctree::
389+
:hidden:
390+
391+
Proposals/VectorizationPlan
392+
393+
:doc:`Proposals/VectorizationPlan`
394+
Modeling the process and upgrading the infrastructure of LLVM's Loop Vectorizer.
395+
385396
.. _slp-vectorizer:
386397

387398
The SLP Vectorizer

‎llvm/docs/index.rst

+3
Original file line numberDiff line numberDiff line change
@@ -528,6 +528,7 @@ can be better.
528528

529529
CodeOfConduct
530530
Proposals/GitHubMove
531+
Proposals/VectorizationPlan
531532

532533
:doc:`CodeOfConduct`
533534
Proposal to adopt a code of conduct on the LLVM social spaces (lists, events,
@@ -536,6 +537,8 @@ can be better.
536537
:doc:`Proposals/GitHubMove`
537538
Proposal to move from SVN/Git to GitHub.
538539

540+
:doc:`Proposals/VectorizationPlan`
541+
Proposal to model the process and upgrade the infrastructure of LLVM's Loop Vectorizer.
539542

540543
Indices and tables
541544
==================

0 commit comments

Comments
 (0)
Please sign in to comment.