This is an archive of the discontinued LLVM Phabricator instance.

Differential D44338

[LV][VPlan] Build plain CFG with simple VPInstructions for outer loops.
ClosedPublic

Authored by dcaballe on Mar 9 2018, 4:42 PM.

Details

Reviewers

rengolin
fhahn
mkuper
mssimpso
a.elovikov
hfinkel
aprantl

Commits

rG168d04d54442: [VPlan] Reland r332654 and silence unused func warning
rGf58ad3129c87: [LV][VPlan] Build plain CFG with simple VPInstructions for outer loops.
rL332860: [VPlan] Reland r332654 and silence unused func warning
rL332654: [LV][VPlan] Build plain CFG with simple VPInstructions for outer loops.

Summary

Context

This is patch #3 of Patch Series #1 to introduce outer loop vectorization support in LV using the VPlan infrastructure.

RFC: http://lists.llvm.org/pipermail/llvm-dev/2017-December/119523.html
Patch #1: D40874
Patch #2: D42447

Patch Series #1. Sub-patch #3.

This patch is expected to be NFC for the current inner loop vectorization path. It introduces the basic algorithm to build the VPlan plain CFG (single-level CFG, no hierarchical CFG (H-CFG), yet) in the VPlan-native vectorization path. It includes:

VPlanHCFGBuilder: Main class to build the VPlan H-CFG (plain CFG without nested regions, for now).
VPlanVerifier: Main class with utilities to check the consistency of a H-CFG.
VPlanBlockUtils: Main class with utilities to manipulate VPBlockBases in VPlan.

The VPlan H-CFG is the basic infrastructure on which future VPlan-to-VPlan transformations will be implemented. VPInstruction will be the main instruction-level representation in the VPlan-native vectorization path. In this patch, we use VPInstruction to represent instructions within a VPBasicBlock, and VPValues to represent other entities that are not a VPInstruction but currently don't have a specific representation in VPlan (e.g., constants, definitions that are external to the VPlan CFG, etc.). This representation will be refined in the future by introducing more VPValue subclasses.

Testing

We don't have code generation capabilities to generate vector code in the VPlan-native path yet. Therefore, we cannot introduce LIT tests to check the correctness of the generated vector code. However, we introduce the flag -vplan-build-stress-test, which enables a stress testing mode that builds the VPlan H-CFG for any supported loop nest from the outermost loop. We also introduce the flag -vplan-verify-hcfg, which checks the consistency of a VPlan H-CFG. Both flags can be used together to thoroughly check the stability of the H-CFG construction algorithm and the consistency of the H-CFGs built. We used this approach on a wide variety of benchmark suites. We plan to introduce LIT tests once code generation for the VPlan-native path is in place.

Files in 'Vectorize' Dir:

In this patch, we introduce new files for the VPlanHCFGBuilder class (VPlanHCFGBuilder.h/.cpp) and the VPlanVerifier class (VPlanVerifier.h/.cpp). Please note that the header files are private.

We would appreciate feedback in this regard. We understand that having too many small files is not a good idea, but the opposite extreme is not good either. We expect VPlanHCFGBuilder to grow when we introduce the simplifications/transformations necessary to build an H-CFG and support a wider range of outer loops (see RFC). VPlanVerifier will grow when we introduce verification utilities for regions and VPInstructions. Having separate files also allows independent debug filters (DEBUG_TYPE) per component, which we think is very convenient.

Thanks,
Diego

Event Timeline

dcaballe created this revision. Mar 9 2018, 4:42 PM

Herald added subscribers: llvm-commits, bollu, mgorny. Mar 9 2018, 4:42 PM

dcaballe added inline comments. Mar 9 2018, 5:10 PM

lib/Transforms/Vectorize/VPlan.h
457–458	This is the rationale behind the changes below: I wouldn't like VPBlockBase to end up with a large list of interfaces to set/insert successors/predecessors, all of them doing almost the same thing with very subtle differences. For that reason, I introduced the class VPBlockUtility to hold all the VPBlockBase manipulation interfaces and keep the VPBlockBase class cleaner. Of course, that doesn't mean that we want to add unnecessary utilities to the VPBlockUtility class. In the VPBlockBase class, I decided to keep only the most basic interfaces, which can be used directly or as building blocks of a more complex utility in VPBlockUtility. In this regard, I'm simplifying the logic of setOneSuccessor and setTwoSuccessors and replacing their existing calls with a more generic utility function in VPBlockUtility. The previous implementation was very ad hoc for the patch where they were introduced and I couldn't reuse them as is. Of course, I'm open to other suggestions.
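The split described in this comment can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `Block`, `BlockUtils`, `connectBlocks`, `disconnectBlocks`, and `insertBlockAfter` are hypothetical names chosen for illustration and are not taken from the patch. The block class exposes only basic edge mutators, and the composite operations are built on top of them in a separate utility class.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy stand-in for a CFG block; NOT the VPBlockBase class from the patch.
class Block {
  std::vector<Block *> Successors;
  std::vector<Block *> Predecessors;

public:
  // Only the most basic edge mutators live on the block itself.
  void appendSuccessor(Block *S) { Successors.push_back(S); }
  void appendPredecessor(Block *P) { Predecessors.push_back(P); }
  void removeSuccessor(Block *S) {
    Successors.erase(std::remove(Successors.begin(), Successors.end(), S),
                     Successors.end());
  }
  void removePredecessor(Block *P) {
    Predecessors.erase(
        std::remove(Predecessors.begin(), Predecessors.end(), P),
        Predecessors.end());
  }
  const std::vector<Block *> &successors() const { return Successors; }
  const std::vector<Block *> &predecessors() const { return Predecessors; }
};

// Hypothetical utility class: composite operations built from the basic ones.
struct BlockUtils {
  // Create the edge From -> To and keep both directions in sync.
  static void connectBlocks(Block *From, Block *To) {
    assert(From && To && "expected non-null blocks");
    From->appendSuccessor(To);
    To->appendPredecessor(From);
  }

  // Remove the edge From -> To in both directions.
  static void disconnectBlocks(Block *From, Block *To) {
    assert(From && To && "expected non-null blocks");
    From->removeSuccessor(To);
    To->removePredecessor(From);
  }

  // Place New between Pred and all of Pred's current successors.
  static void insertBlockAfter(Block *New, Block *Pred) {
    std::vector<Block *> OldSuccs = Pred->successors(); // copy: list mutates
    for (Block *S : OldSuccs) {
      disconnectBlocks(Pred, S);
      connectBlocks(New, S);
    }
    connectBlocks(Pred, New);
  }
};
```

With this layout, callers that need a composite operation go through the utility class, so the block class does not accumulate near-duplicate setters.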

dcaballe mentioned this in D42447: [LV][VPlan] Detect outer loops for explicit vectorization. Mar 9 2018, 5:27 PM

tschuett added a subscriber: tschuett. Mar 12 2018, 12:55 AM

sguggill added a subscriber: sguggill. Mar 14 2018, 11:34 AM

a.elovikov added inline comments. Mar 16 2018, 5:45 AM

lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
101	For outer loop vectorization in

    int s = 0;
    for (int i = 0; i < N; ++i) {
      for (int j = 0; j < M; ++j) {
        s += x[i] * y[j];
      }
    }

we need a broadcast y[j] -> {y[j], y[j], y[j], y[j]}, but this will generate a WIDEN recipe for the load. Is that OK? If so, can we document it somewhere?

hsaito added a subscriber: hsaito. Mar 16 2018, 1:01 PM

hsaito added inline comments.

lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
101	Reference: LoopVectorizationPlanner::tryToWidenMemory(). VPWidenMemoryRecipe can handle CM_GatherScatter, and a uniform access can be thought of as a special form of gather/scatter. From that perspective, it is okay. A vector load/store is deemed a gather/scatter until analysis improves it to a better access type. From that perspective, using a "generic gather/scatter" during the initial VPlan construction phase makes perfect sense. If we are building a single VPlan CFG for inner and/or outer loop vectorization (and that's something we should be doing if the HCFGs look identical), we can't encode "memory access kind" information within the HCFG. So, keeping it as a "generic gather/scatter" at the HCFG level is the right thing to do for the long term as well. In other words, we need storage outside of the HCFG to house the "uniform/unit-stride/interleave/..." information for the load/store.
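The point that a uniform access is a special form of gather can be shown with a scalar model. This is a minimal sketch, not code from the patch: the fixed `VF` of 4 and the helper names `gather`, `uniformLoadAsGather`, and `unitStrideLoadAsGather` are assumptions made for illustration. When the outer loop over `i` is vectorized, `y[j]` uses the same index in every lane, while `x[i]` uses consecutive indices.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t VF = 4; // vectorization factor, chosen for illustration

// Generic gather: each lane loads from its own index into Base.
std::array<int, VF> gather(const int *Base,
                           const std::array<std::size_t, VF> &Idx) {
  assert(Base && "expected a valid base pointer");
  std::array<int, VF> Out{};
  for (std::size_t L = 0; L < VF; ++L)
    Out[L] = Base[Idx[L]];
  return Out;
}

// Uniform load of y[j]: a gather in which every lane uses the same index,
// which yields the broadcast {y[j], y[j], y[j], y[j]}.
std::array<int, VF> uniformLoadAsGather(const int *Y, std::size_t J) {
  std::array<std::size_t, VF> Idx;
  Idx.fill(J);
  return gather(Y, Idx);
}

// Unit-stride load of x[i..i+VF-1]: a gather with consecutive indices.
std::array<int, VF> unitStrideLoadAsGather(const int *X, std::size_t I) {
  std::array<std::size_t, VF> Idx;
  for (std::size_t L = 0; L < VF; ++L)
    Idx[L] = I + L;
  return gather(X, Idx);
}
```

Both access kinds produce correct values through the generic path. Classifying them as uniform or unit-stride only affects how efficiently they are implemented, which fits the argument for keeping that information outside the HCFG.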

fhahn added inline comments. Mar 19 2018, 3:17 AM

lib/Transforms/Vectorize/VPlanVerifier.h
13	Is there a place where those invariants are mentioned? It may be worth briefly stating here what checks are done by the verifier. ATM it looks like it checks the links between the blocks and regions of the VPlan.

I agree with Hideki. In addition, please, note that in this patch we are trying to use the minimal number of recipes to make the transition to VPInstructions easier. Modeling a broadcast as a gather with VPWidenMemoryRecipe is a good trade-off in this regard, which will be improved in the future.

dcaballe added inline comments. Mar 19 2018, 9:51 AM

lib/Transforms/Vectorize/VPlanVerifier.h

The invariants that we are currently checking are described in the documentation of 'verifyHierarchicalCFG' (just below). I think we could move them here so that we can reference them from different utility functions. For example:

/// This file declares the class VPlanVerifier, which contains utility functions
/// to check the consistency of a VPlan. This includes the following kinds of
/// invariants:
///
/// 1. Region/Block invariants:
///   - Region's entry/exit block must have no predecessors/successors,
///     respectively.
///   - Block's parent must be the region immediately containing the block.
///   - Linked blocks must have a bi-directional link (successor/predecessor).
///   - All predecessors/successors of a block must belong to the same region.
///   - Blocks must have no duplicated successor/predecessor.

What do you think?
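Two of the invariants listed above (bi-directional links and no duplicated successor/predecessor) can be checked on a toy graph as follows. This is a sketch only: `Node` and `verifyLinks` are hypothetical names and do not reflect the VPlanVerifier interface, and the region-related invariants are not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Toy CFG node; NOT the VPBlockBase class from the patch.
struct Node {
  std::vector<Node *> Succs;
  std::vector<Node *> Preds;
};

// True if the same node appears more than once in the list.
static bool hasDuplicates(const std::vector<Node *> &V) {
  std::set<Node *> Seen(V.begin(), V.end());
  return Seen.size() != V.size();
}

// True if N appears in the list.
static bool contains(const std::vector<Node *> &V, const Node *N) {
  return std::find(V.begin(), V.end(), N) != V.end();
}

// Returns true if every node satisfies the two invariants modeled here:
//   - no duplicated successor/predecessor;
//   - every successor/predecessor link has a matching reverse link.
bool verifyLinks(const std::vector<Node *> &Nodes) {
  for (const Node *N : Nodes) {
    assert(N && "expected non-null node");
    if (hasDuplicates(N->Succs) || hasDuplicates(N->Preds))
      return false;
    for (const Node *S : N->Succs)
      if (!contains(S->Preds, N))
        return false;
    for (const Node *P : N->Preds)
      if (!contains(P->Succs, N))
        return false;
  }
  return true;
}
```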

fhahn added inline comments. Mar 19 2018, 10:52 AM

lib/Transforms/Vectorize/VPlanVerifier.h
13	Thanks, sounds good to me.

fhahn added inline comments. Mar 19 2018, 10:52 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
7725	Could we here just return NoVectorization and get rid of the additional check in LoopVectorizePass::processLoop? Also, could we move the setting of UserVF in `plan` too? Otherwise it seems harder to keep track of what's going on and we set UserVF even for inner loops. Also, I think ideally we would only bail out if the outer loop is not supported, but achieving that seems more trouble than it's worth.

dcaballe added inline comments. Mar 20 2018, 8:48 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
7725	> Could we here just return NoVectorization and get rid of the additional check in LoopVectorizePass::processLoop?

Ok. It sounds good to me, at least for now. Hopefully we won't introduce anything after `plan` that we must skip in stress testing mode. Thanks!

> Also, could we move the setting of UserVF in plan too? Otherwise it seems harder to keep track of what's going on and we set UserVF even for inner loops.

Could you please elaborate a bit more? I'm not sure I understand what you mean.

> Also, I think ideally we would only bail out if the outer loop is not supported, but achieving that seems more trouble than it's worth.

We can think about it in the future. We would need legality analysis for outer loops if we want to generate code.

fhahn added inline comments. Mar 20 2018, 11:01 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
7725	> Ok. It sounds good to me, at least for now. Hopefully we won't introduce anything after plan that we must skip in stress testing mode. Thanks!

If we return NoVectorization, I would assume we would not use the generated plans after `plan`, as we decided to skip vectorization?

> Could you please elaborate a bit more? I'm not sure I understand what you mean.

I meant moving the code to set UserVF = 4 if VPlanBuildStressTest into this function. That would reduce the number of functions where we have to handle VPlanBuildStressTest, which IMO makes it easier to see what's going on.

dcaballe added inline comments. Mar 20 2018, 11:43 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
7725	> If we return NoVectorization, I would assume we would not use the generated plans after plan, as we decided to skip vectorization?

Yes, it was just a thought. For example, stress testing will be skipping some checks (only `isExplicitVecOuterLoop`, for now). If we have code after `plan` expecting a loop compliant with those checks, we could have problems. But, again, I agree on the change. We can deal with that if it happens.

> I meant moving the code to set UserVF = 4 if VPlanBuildStressTest into this function. That would reduce the number of functions where we have to handle VPlanBuildStressTest, which IMO makes it easier to see what's going on.

Thanks, got it! I wonder if this would be problematic. If we moved this change into this function and we used UserVF after `plan` (likely to happen, look at uses of line 8794), we would be using inconsistent UserVF values. Maybe it's better to keep it here?

fhahn added inline comments. Mar 20 2018, 3:59 PM

lib/Transforms/Vectorize/LoopVectorize.cpp
7725	But as it is currently, the UserVF set in the outer loop code path is limited to that block. And I cannot think of any case where using the UserVF set for VPlanBuildStressTest will be useful anywhere else for now, especially in the inner loop code path. My understanding was that we have to bail out after planning anyway for VPlanBuildStressTest, at least as everything is now.

dcaballe added inline comments. Mar 21 2018, 11:51 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
7725	Ok, fair enough. I'll move it into the function, at least, for now. Thanks!
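The control flow agreed on in this exchange can be modeled roughly as follows. This is a toy sketch and not the code in LoopVectorize.cpp: `ToyPlanner`, its `plan` method, the `PlansBuilt` counter, and the `Width`/`Cost` fields are assumptions made for illustration. Only the ideas come from the discussion: the stress-test VF of 4 is set inside the planning function, and NoVectorization is returned so that the caller needs no extra check.

```cpp
#include <cassert>

// Hypothetical result type; the field names are assumptions.
struct VectorizationFactor {
  unsigned Width;
  unsigned Cost;
};

// Toy planner; NOT the LoopVectorizationPlanner from the patch.
struct ToyPlanner {
  bool StressTest;         // stands in for the stress-testing flag
  unsigned PlansBuilt = 0; // counts how often a plan was built

  explicit ToyPlanner(bool StressTest) : StressTest(StressTest) {}

  VectorizationFactor plan(bool IsOuterLoop, unsigned UserVF) {
    const VectorizationFactor NoVectorization = {1, 0};
    if (!IsOuterLoop)
      return NoVectorization;

    // The stress-test override stays local to this function, so no other
    // code path observes the forced value.
    if (StressTest)
      UserVF = 4;
    if (UserVF < 2)
      return NoVectorization;

    ++PlansBuilt; // stands in for building the plain CFG

    // In stress-test mode the plan is built only to exercise construction;
    // returning NoVectorization makes the caller skip everything after it.
    if (StressTest)
      return NoVectorization;
    return {UserVF, 0};
  }
};
```

Keeping both the override and the bail-out in one function means the stress-test flag is handled in a single place, which is the readability benefit argued for in the thread.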

Addressing Florian's comments.
Thanks!

fhahn added inline comments. Mar 27 2018, 9:42 AM

lib/Transforms/Vectorize/LoopVectorize.cpp
275	Please mention after what stage we bail out
2369–2370	Please update the comment to mention VPlan stress testing.
lib/Transforms/Vectorize/VPlan.h
974	nit: I think we should match the indent with VPBlockBase?
lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
28	Do we need SE here? I could not find any uses.
lib/Transforms/Vectorize/VPlanHCFGBuilder.h
31	Could be moved to .cpp?
35	Do we need SE here? I could not find any uses.
lib/Transforms/Vectorize/VPlanVerifier.cpp
45	Iterating over the blocks in a region seems a generic thing and it would probably be worth adding it to VPRegionBlock. At least VPRegionBlock::dumpRegion seems to be using a similar logic.
92	I think some compilers will complain about Entry and Exit being unused when building without assertions
118	no brackets needed
lib/Transforms/Vectorize/VPlanVerifier.h
37	I think this TODO does not really add much info.
test/Transforms/LoopVectorize/vplan_hcfg_stress_test.ll
2	nit: we are not checking the generated code, so I think we could drop `-S` here and below.

Addressing Florian's comments.

lib/Transforms/Vectorize/VPlan.h
974	Sorry, thanks!
lib/Transforms/Vectorize/VPlanHCFGBuilder.h
31	I can remove this for now.
lib/Transforms/Vectorize/VPlanVerifier.cpp
45	Good point. However, I don't think this operation is that generic. Sometimes we will need RPO, some others DFS, some others RPO or DFS but filtering some blocks... But I agree, at least we could start with a blocksDFS() range. Since this spans beyond this patch, could we address it in a separate patch?
92	Good catch! I saw warnings in other similar cases but not here. Thanks!
lib/Transforms/Vectorize/VPlanVerifier.h
37	OK. Let me remove it. I was just trying to justify why the 'verifyHierarchicalCFG' is not static. We will have class members with analysis information that will be used by this interface.
test/Transforms/LoopVectorize/vplan_hcfg_stress_test.ll
2	Right, thanks!

rengolin added inline comments. Apr 3 2018, 12:58 PM

lib/Transforms/Vectorize/VPlan.h
457–458	I like this idea and I like the methods, as they give a better impression as to what is truly happening behind the scenes. I agree we shouldn't bloat this class, but it's a good buffer class to have during the prototype phase of the outer loop implementation, so we can gauge the VPBasicBlock usability.

dcaballe added inline comments. Apr 5 2018, 11:14 AM

lib/Transforms/Vectorize/VPlan.h
457–458	Thanks, Renato!

> I agree we shouldn't bloat this class, but it's a good buffer class to have during the prototype phase of the outer loop implementation, so we can gauge the VPBasicBlock usability.

Agreed. Any particular suggestion on moving some of the current utilities in VPBlockUtils back to VPBlockBase? I think the current ones are OK there but let me know if you have any comments in that regard.

egarcia added a subscriber: egarcia. Apr 6 2018, 9:25 AM

rengolin added inline comments. Apr 9 2018, 7:31 AM

lib/Transforms/Vectorize/VPlan.h
457–458	I think we can start with these, and move up and down as needed later.

After discussion with reviewers, working on replacing initial simple recipes with VPInstructions to avoid unnecessary steps towards the final instruction level representation.

Sorry for the delay!

In the new diff I'm replacing recipes with VPInstructions. I tried to keep this patch small and functional. I'm using VPInstruction class to represent instructions within a VPBasicBlock, and VPValue class to represent other entities that are not a VPInstruction but currently don't have a specific representation in VPlan (e.g., constants, definitions that are external to the VPlan CFG, etc.). The representation will be refined in subsequent patches by introducing more VPValue subclasses.

Thanks Diego! I left some small comments inline, I think overall it looks quite good now. One thing worth discussing briefly before this goes in may be what the plan for dealing with debug info will be with VPlan. Adding @aprantl in case he has some thoughts.

lib/Transforms/Vectorize/LoopVectorizationPlanner.h
73	There's been a few commits removing \brief from the codebase (e.g. D46290) recently, could you remove \brief from this patch, it should not be needed as we use AUTOBRIEF.
150	What does this comment mean?
lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
134	Could we use a similar (simpler) logic to what @hsaito used in D46302 here? Like Instr->getParent() strictly dominates the pre header?
164	nit: IRVal as it is a value?

rogfer01 added a subscriber: rogfer01. May 9 2018, 9:26 PM

Thanks, Florian! Some comments below.

One thing worth discussing briefly before this goes in may be what the plan for dealing with debug info will be with VPlan. Adding @aprantl in case he has some thoughts.

I'm not aware of any particular proposal for debug info in VPlan at this point but I will check with my team. Currently, DbgInfoIntrinsic would be represented as a regular VPInstruction. We could think about if a specific representation for this is necessary in VPlan.

lib/Transforms/Vectorize/LoopVectorizationPlanner.h
73	Thanks! I had no idea. I don't usually use \brief but most of this documentation is coming from IRBuilder.
150	`AssertingVH` is used in IRBuilder but we are not using it here. We may want to think about using something similar, but I haven't looked at it at all. We could do it as a separate patch if we are interested. I don't think we need this TODO. I'll remove it.
lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
134	Good point! I think the scenario is a bit different and there would be some corner cases that wouldn't work if we do `DT->properlyDominates(Instr->getParent(), PH)`. For example, any definition in the loop exit with a use within the HCFG wouldn't work. Imagine something like:

    ph.inner:
      %0 = phi %1, %t
    loop.body:
      ...
    loop.exit:
      %t = %a = ... = %a ...

If I'm not mistaken, uses of '%t' and '%a' would be classified as external definitions and they are not. Make sense? Any other idea?
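One way to avoid the corner case described here is to classify definitions by block membership instead of dominance. The sketch below is an assumption, since the thread does not show how the patch implements the check. `ToyLoop` and `isExternalDef` are hypothetical names, and blocks are identified by label strings. A definition is treated as internal when its block is the preheader, the exit, or part of the loop body.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy description of the loop being planned; blocks are named by label.
struct ToyLoop {
  std::string Preheader;
  std::string Exit;
  std::set<std::string> Body;
};

// Membership-based classification (an assumption, not the patch's code):
// a definition is external only if its block is outside the preheader,
// the loop body, and the exit block.
bool isExternalDef(const ToyLoop &L, const std::string &DefBlock) {
  assert(!L.Preheader.empty() && !L.Exit.empty() &&
         "expected a loop with a preheader and a single exit");
  if (DefBlock == L.Preheader || DefBlock == L.Exit)
    return false;
  return L.Body.count(DefBlock) == 0;
}
```

Under this rule, a value defined in `loop.exit` is not reported as external, which matches the expectation stated in the comment for '%t' and '%a'.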

Addressing Florian's comments.

Thanks!

Sorry. Now, addressing Florian's comments.

In D44338#1093996, @dcaballe wrote:

Thanks, Florian! Some comments below.

One thing worth discussing briefly before this goes in may be what the plan for dealing with debug info will be with VPlan. Adding @aprantl in case he has some thoughts.

I'm not aware of any particular proposal for debug info in VPlan at this point but I will check with my team. Currently, DbgInfoIntrinsic would be represented as a regular VPInstruction. We could think about if a specific representation for this is necessary in VPlan.

Great thanks. Besides the DbgInfoIntrinsics, do we need some way to attach the debug metadata from the original instructions to the VPInstructions? I suppose initially we could get them from the underlying values, but IIUC some VPlan transformations could introduce new VPInstructions without underlying values.

In D44338#1094030, @fhahn wrote:

In D44338#1093996, @dcaballe wrote:

Thanks, Florian! Some comments below.

One thing worth discussing briefly before this goes in may be what the plan for dealing with debug info will be with VPlan. Adding @aprantl in case he has some thoughts.

I'm not aware of any particular proposal for debug info in VPlan at this point but I will check with my team. Currently, DbgInfoIntrinsic would be represented as a regular VPInstruction. We could think about if a specific representation for this is necessary in VPlan.

Can you outline what would make updating dbg.value intrinsics to point to vector instructions special, such that it can't be handled immediately?

Great thanks. Besides the DbgInfoIntrinsics, do we need some way to attach the debug metadata from the original instructions to the VPInstructions? I suppose initially we could get them from the underlying values, but IIUC some VPlan transformations could introduce new VPInstructions without underlying values.

Similarly, when you expand/rewrite an instruction with a DILocation metadata attachment into a new instruction, preserving the metadata is crucial for accurate crash logs, profiling, and debugging in general. Speaking from personal experience here, it is usually much easier to think about this in the beginning rather than having to bolt it on later when the original transformation pass authors have moved on :-)

In D44338#1094601, @aprantl wrote:

Can you outline what would make updating dbg.value intrinsics to point to vector instructions special, such that it can't be handled immediately?

I keep pushing the implementers (Diego, Satish, etc.) very hard to maintain good correspondence between input IR and output IR. Assuming that dbg.value can handle widened values from scalar values, there shouldn't be anything special.

Great thanks. Besides the DbgInfoIntrinsics, do we need some way to attach the debug metadata from the original instructions to the VPInstructions? I suppose initially we could get them from the underlying values, but IIUC some VPlan transformations could introduce new VPInstructions without underlying values.

Similarly, when you expand/rewrite an instruction with a DILocation metadata attachment into a new instruction, preserving the metadata is crucial for accurate crash logs, profiling, and debugging in general. Speaking from personal experience here, it is usually much easier to think about this in the beginning rather than having to bolt it on later when the original transformation pass authors have moved on :-)

Aside from widening, what vectorizer does is not much different from expression tree rewriting (in LLVM, 3-address code equivalent of that). We need to educate ourselves about what scalar optimizers are doing for those circumstances and do the same. What could be tricky is handling of interleaved memory access optimization, e.g., where multiple strided vector memrefs are converted into unit-stride vector memrefs and shuffles. Even then, I'd hope there are some existing memory access optimizations doing something comparable enough and learn from it.

In D44338#1094746, @hsaito wrote:

In D44338#1094601, @aprantl wrote:

Can you outline what would make updating dbg.value intrinsics to point to vector instructions special, such that it can't be handled immediately?

I keep pushing the implementers (Diego, Satish, etc.) very hard to maintain good correspondence between input IR and output IR. Assuming that dbg.value can handle widened values from scalar values, there shouldn't be anything special.

Assuming widening means putting a smaller value into a larger register at offset 0, then it is safe to just point the dbg.value to the new larger register. If the offset is nonzero, you'll need to generate a DW_OP_shr DIExpression to shift the value into place in the debugger.

Great thanks. Besides the DbgInfoIntrinsics, do we need some way to attach the debug metadata from the original instructions to the VPInstructions? I suppose initially we could get them from the underlying values, but IIUC some VPlan transformations could introduce new VPInstructions without underlying values.

Similarly, when you expand/rewrite an instruction with a DILocation metadata attachment into a new instruction, preserving the metadata is crucial for accurate crash logs, profiling, and debugging in general. Speaking from personal experience here, it is usually much easier to think about this in the beginning rather than having to bolt it on later when the original transformation pass authors have moved on :-)

Aside from widening, what vectorizer does is not much different from expression tree rewriting (in LLVM, 3-address code equivalent of that). We need to educate ourselves about what scalar optimizers are doing for those circumstances and do the same. What could be tricky is handling of interleaved memory access optimization, e.g., where multiple strided vector memrefs are converted into unit-stride vector memrefs and shuffles. Even then, I'd hope there are some existing memory access optimizations doing something comparable enough and learn from it.

Great! Let me know if you come across any concrete questions.

In D44338#1096041, @aprantl wrote:

In D44338#1094746, @hsaito wrote:

In D44338#1094601, @aprantl wrote:

Can you outline what would make updating dbg.value intrinsics to point to vector instructions special, such that it can't be handled immediately?

I keep pushing the implementers (Diego, Satish, etc.) very hard to maintain good correspondence between input IR and output IR. Assuming that dbg.value can handle widened values from scalar values, there shouldn't be anything special.

Assuming widening means putting a smaller value into a larger register at offset 0, then it is safe to just point the dbg.value to the new larger register. If the offset is nonzero, you'll need to generate a DW_OP_shr DIExpression to shift the value into place in the debugger.

Vectorized loop by nature executes multiple iterations of the sequential loop at the same time. So, the same variable X (say, i32 type) is widened to widenedX (4 x i32 type) to represent 4 different values of X, say, for iterations i, i+1, i+2, and i+3 executing together. In terms of debugging vectorized code, when the programmer points to X during vector execution, debugger needs to show 4 different values of X in this scenario. It's different from placing one value in a larger sized register.

Having said that, this issue exists from day 1 of vectorization. Nothing new to VPlan based vectorization. If there is a solution, we should use it. Else, we just start from letting debugger show the lowest element only, and try improving from there.

Vectorized loop by nature executes multiple iterations of the sequential loop at the same time. So, the same variable X (say, i32 type) is widened to widenedX (4 x i32 type) to represent 4 different values of X, say, for iterations i, i+1, i+2, and i+3 executing together. In terms of debugging vectorized code, when the programmer points to X during vector execution, debugger needs to show 4 different values of X in this scenario. It's different from placing one value in a larger sized register.

I see. This is not directly representable in debug info. You could either introduce a new artificial $X_vectorized variable that shows the entire vector, or you could represent only the first element in the vector and hide the other unrolled iterations. DWARF cannot currently represent more than one value per variable per pc address.

In D44338#1096518, @aprantl wrote:

Vectorized loop by nature executes multiple iterations of the sequential loop at the same time. So, the same variable X (say, i32 type) is widened to widenedX (4 x i32 type) to represent 4 different values of X, say, for iterations i, i+1, i+2, and i+3 executing together. In terms of debugging vectorized code, when the programmer points to X during vector execution, debugger needs to show 4 different values of X in this scenario. It's different from placing one value in a larger sized register.

I see. This is not directly representable in debug info. You could either introduce a new artificial $X_vectorized variable that shows the entire vector, or you could represent only the first element in the vector and hide the other unrolled iterations. DWARF cannot currently represent more than one value per variable per pc address.

That matches my understanding. My preference is artificial $X_vectorized approach, but I'm also fine using "only the first element" as a stepping stone to eventually get to $X_vectorized approach.
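The two options under discussion can be contrasted with a small scalar model. This is an illustration only and says nothing about how DWARF or a debugger would encode either option. The value function `2 * i`, the fixed `VF` of 4, and the names `widenX`, `firstLaneView`, and `vectorizedView` are all assumptions.

```cpp
#include <array>
#include <cassert>
#include <string>

constexpr int VF = 4; // vectorization factor, chosen for illustration
using Widened = std::array<int, VF>;

// Widened X for iterations I, I+1, ..., I+VF-1, assuming X = 2 * i.
Widened widenX(int I) {
  Widened W{};
  for (int L = 0; L < VF; ++L)
    W[L] = 2 * (I + L);
  return W;
}

// Option A: the debugger shows only the first element under the name X.
int firstLaneView(const Widened &W) { return W[0]; }

// Option B: an artificial variable exposes the whole vector.
std::string vectorizedView(const Widened &W) {
  std::string S = "<";
  for (int L = 0; L < VF; ++L) {
    if (L != 0)
      S += ", ";
    S += std::to_string(W[L]);
  }
  return S + ">";
}
```

Option A hides the values for iterations I+1 through I+3, while Option B shows all of them but under a name that does not exist in the source program.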

It goes without saying ---- but if we can work together to extend DWARF to represent multiple values, that would be the best long term outcome.

aprantl added a subscriber: debug-info. May 11 2018, 2:37 PM

fhahn added a child revision: D46827: [VPlan] Add VPInstruction to VPRecipe transformation. May 14 2018, 7:27 AM

Together with D46827, I get a lot of leaks with the address sanitizer from plans constructed by VPlanHCFGBuilder. I'll take a look and come back once I know more.

Thanks a lot for the feedback regarding debug info! This is an interesting topic.

Speaking from personal experience here, it is usually much easier to think about this in the beginning rather than having to bolt it on later when the original transformation pass authors have moved on :-)

We need to educate ourselves about what scalar optimizers are doing for those circumstances and do the same.

Agreed. We have to start thinking about it but we need to have a better understanding of what is needed and where we want to go. In this regard, I would have the following comment/questions:

1. Currently, a DbgInfoIntrinsic is represented as a standard VPInstruction in VPlan where its metadata arguments are external definitions. We'd need to decide if: a) the current representation for debug intrinsics in VPlan is enough or we want a more native representation, such as specific VPInstruction opcodes for them; b) we need a more proper representation for metadata, particularly if we have to create *new* metadata for new VPInstructions, as Florian mentioned. However, the answer will depend on #2.
2. Do we really have to model accurate debug information on the VPlan representation? Debug information won't impact the vectorization decisions so maybe we shouldn't pay the cost of modeling it for all the VPlan candidates. I wonder if it would be possible to generate the expected output debug information IR "on the fly" during VPlan code generation, once the best VPlan has been chosen. WDYT?

I see. This is not directly representable in debug info. You could either introduce a new artificial $X_vectorized variable that shows the entire vector, or you could represent only the first element in the vector and hide the other unrolled iterations. DWARF cannot currently represent more than one value per variable per pc address.

What are the existing inner loop vectorizer and SLP vectorizer doing in this regard? That could be our starting point.

In any case, maybe we should move this discussion outside of this review to llvm-dev so that we get feedback from a broader audience.

Thanks,
Diego

In D44338#1097949, @fhahn wrote:

Together with D46827, I get a lot of leaks with the address sanitizer from plans constructed by VPlanHCFGBuilder. I'll take a look and come back once I know more.

Thanks, Florian. Please, let me know what you find. I'm running some experiments with this patch alone but I haven't found anything so far.

In D44338#1098137, @dcaballe wrote:

In D44338#1097949, @fhahn wrote:

Together with D46827, I get a lot of leaks with the address sanitizer from plans constructed by VPlanHCFGBuilder. I'll take a look and come back once I know more.

Thanks, Florian. Please, let me know what you find. I'm running some experiments with this patch alone but I haven't found anything so far.

Looks like the VPValues for external definitions were not deleted properly. With D46833, the sanitizers are happy.

Looks like the VPValues for external definitions were not deleted properly. With D46833, the sanitizers are happy.

Oh, sorry. I missed that when stripping out the code for this patch. Do you want me to include it in this patch?

In D44338#1098152, @dcaballe wrote:

Looks like the VPValues for external definitions were not deleted properly. With D46833, the sanitizers are happy.

Oh, sorry. I missed that when stripping out the code for this patch. Do you want me to include it in this patch?

Yep I think it would be good to include this in the patch.

dcaballe mentioned this in D46826: [VPlan] Add VPlan based sinkInstructions utility. May 14 2018, 10:03 PM

Added missing delete for external definitions.

Thanks Diego, LGTM. I've left a few minor nits and I am not entirely sure about the DEBUG_TYPE. Please wait with committing a bit, in case anybody else has additional comments/thoughts.

lib/Transforms/Vectorize/LoopVectorizationPlanner.h
81	nit: no braces around returned expression.
150	nit: VPBasicBlock *Block?
lib/Transforms/Vectorize/LoopVectorize.cpp
7736	"means no vectorization" ?
lib/Transforms/Vectorize/VPlan.h
1042	nit: 32 seems quite large
lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
102	nit: auto *VPPhi
134	Ah yes, we would have to account for instructions in the preheader and exit block separately. Then it would probably not simplify things much.
238	nit: space after .
lib/Transforms/Vectorize/VPlanValue.h
48	Value *UnderlyingVal?
lib/Transforms/Vectorize/VPlanVerifier.cpp
20	I am not entirely sure how this will interact with `-debug-only`. IIUC if we do not use loop-vectorize here, those messages will be excluded from `-debug-only=loop-vectorize`. IMO it is convenient to get the complete picture with `-debug-only=loop-vectorize`.
test/Transforms/LoopVectorize/vplan_hcfg_stress_test.ll
53	nit: those attributes are unnecessary I think

This revision is now accepted and ready to land. · May 15 2018, 8:04 AM

Thanks for spending time on this review, Florian! I addressed all your comments. I'll wait for one day or so before committing.
Regarding the DEBUG_TYPE, I think your suggestion makes more sense. I think it's better to have the complete picture.

Thanks,
Diego

lib/Transforms/Vectorize/LoopVectorizationPlanner.h
150	Thanks! I'm applying formatting to all this code.
lib/Transforms/Vectorize/VPlan.h
1042	Ok, let's use 16. It would be easy to go over 8 for a double loop nest using a few memory references.
lib/Transforms/Vectorize/VPlanVerifier.cpp
20	Ok, it makes sense. I guess when you know exactly what you are looking for, having independent debug types helps you but we definitely lose the complete picture. Let me change it to loop-vectorize. Thanks!
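
To illustrate the DEBUG_TYPE trade-off discussed above, here is a toy model of filtering debug output by type. It is a sketch under stated assumptions, not LLVM's implementation: `ToyDebugFilter` and `countPrinted` are hypothetical, and "vplan-verifier" is a made-up name for a separate debug type.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy model of "-debug-only=<type>" filtering: a message is printed only if
// the debug type of the file that emits it has been enabled.
class ToyDebugFilter {
  std::set<std::string> Enabled;

public:
  void enableOnly(const std::string &Type) { Enabled.insert(Type); }
  bool shouldPrint(const std::string &DebugType) const {
    return Enabled.count(DebugType) != 0;
  }
};

// Counts how many of two messages are printed when only "loop-vectorize" is
// enabled: one from the vectorizer itself and one from the verifier, whose
// debug type is passed in.
int countPrinted(const std::string &VerifierDebugType) {
  ToyDebugFilter Filter;
  Filter.enableOnly("loop-vectorize");
  int Printed = 0;
  if (Filter.shouldPrint("loop-vectorize")) // message from the vectorizer
    ++Printed;
  if (Filter.shouldPrint(VerifierDebugType)) // message from the verifier
    ++Printed;
  return Printed;
}
```

Sharing the type gives the complete picture under a single filter; a separate type hides the verifier messages unless it is enabled as well.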

Closed by commit rL332654: [LV][VPlan] Build plain CFG with simple VPInstructions for outer loops. (authored by dcaballe). · May 17 2018, 12:28 PM

This revision was automatically updated to reflect the committed changes.

dcaballe marked 3 inline comments as done.

Herald added a subscriber: rkruppe. · May 17 2018, 12:28 PM

fhahn removed a child revision: D46827: [VPlan] Add VPInstruction to VPRecipe transformation. · Jun 12 2018, 8:44 AM

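Revision Contents

Path	Size
lib/Transforms/Vectorize/CMakeLists.txt	2 lines
lib/Transforms/Vectorize/LoopVectorizationPlanner.h	102 lines
lib/Transforms/Vectorize/LoopVectorize.cpp	50 lines
lib/Transforms/Vectorize/VPlan.h	142 lines
lib/Transforms/Vectorize/VPlanHCFGBuilder.h	55 lines
lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp	320 lines
lib/Transforms/Vectorize/VPlanValue.h	35 lines
lib/Transforms/Vectorize/VPlanVerifier.h	44 lines
lib/Transforms/Vectorize/VPlanVerifier.cpp	125 lines
test/Transforms/LoopVectorize/vplan_hcfg_stress_test.ll	53 lines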

Diff 146067

lib/Transforms/Vectorize/CMakeLists.txt

	add_llvm_library(LLVMVectorize			add_llvm_library(LLVMVectorize
	LoadStoreVectorizer.cpp			LoadStoreVectorizer.cpp
	LoopVectorize.cpp			LoopVectorize.cpp
	SLPVectorizer.cpp			SLPVectorizer.cpp
	Vectorize.cpp			Vectorize.cpp
	VPlan.cpp			VPlan.cpp
				VPlanHCFGBuilder.cpp
				VPlanVerifier.cpp

	ADDITIONAL_HEADER_DIRS			ADDITIONAL_HEADER_DIRS
	${LLVM_MAIN_INCLUDE_DIR}/llvm/Transforms			${LLVM_MAIN_INCLUDE_DIR}/llvm/Transforms

	DEPENDS			DEPENDS
	intrinsics_gen			intrinsics_gen
	)			)

lib/Transforms/Vectorize/LoopVectorizationPlanner.h

	[33 lines hidden]

	/// VPlan-based builder utility analogous to IRBuilder.			/// VPlan-based builder utility analogous to IRBuilder.
	class VPBuilder {			class VPBuilder {
	private:			private:
	VPBasicBlock *BB = nullptr;			VPBasicBlock *BB = nullptr;
	VPBasicBlock::iterator InsertPt = VPBasicBlock::iterator();			VPBasicBlock::iterator InsertPt = VPBasicBlock::iterator();

	VPInstruction *createInstruction(unsigned Opcode,			VPInstruction *createInstruction(unsigned Opcode,
	std::initializer_list<VPValue *> Operands) {			ArrayRef<VPValue *> Operands) {
	VPInstruction *Instr = new VPInstruction(Opcode, Operands);			VPInstruction *Instr = new VPInstruction(Opcode, Operands);
				if (BB)
	BB->insert(Instr, InsertPt);			BB->insert(Instr, InsertPt);
	return Instr;			return Instr;
	}			}

				VPInstruction *createInstruction(unsigned Opcode,
				std::initializer_list<VPValue *> Operands) {
				return createInstruction(Opcode, ArrayRef<VPValue *>(Operands));
				}

	public:			public:
	VPBuilder() {}			VPBuilder() {}

	/// \brief This specifies that created VPInstructions should be appended to			/// Clear the insertion point: created instructions will not be inserted into
	/// the end of the specified block.			/// a block.
				void clearInsertionPoint() {
				BB = nullptr;
				InsertPt = VPBasicBlock::iterator();
				}

				VPBasicBlock *getInsertBlock() const { return BB; }
				VPBasicBlock::iterator getInsertPoint() const { return InsertPt; }

				/// InsertPoint - A saved insertion point.
				class VPInsertPoint {
				VPBasicBlock *Block = nullptr;
				VPBasicBlock::iterator Point;

				public:
				/// Creates a new insertion point which doesn't point to anything.
				[Inline comment, fhahn, Done] There's been a few commits removing \brief from the codebase (e.g. D46290) recently, could you remove \brief from this patch, it should not be needed as we use AUTOBRIEF.
				[Inline comment, dcaballe, Not Done] Thanks! I had no idea. I don't usually use \brief but most of this documentation is coming from IRBuilder.
				VPInsertPoint() = default;

				/// Creates a new insertion point at the given location.
				VPInsertPoint(VPBasicBlock *InsertBlock, VPBasicBlock::iterator InsertPoint)
				: Block(InsertBlock), Point(InsertPoint) {}

				/// Returns true if this insert point is set.
				bool isSet() const { return (Block != nullptr); }
				[Inline comment, fhahn, Done] nit: no braces around returned expression.

				VPBasicBlock *getBlock() const { return Block; }
				VPBasicBlock::iterator getPoint() const { return Point; }
				};

				/// Sets the current insert point to a previously-saved location.
				void restoreIP(VPInsertPoint IP) {
				if (IP.isSet())
				setInsertPoint(IP.getBlock(), IP.getPoint());
				else
				clearInsertionPoint();
				}

				/// This specifies that created VPInstructions should be appended to the end
				/// of the specified block.
	void setInsertPoint(VPBasicBlock *TheBB) {			void setInsertPoint(VPBasicBlock *TheBB) {
	assert(TheBB && "Attempting to set a null insert point");			assert(TheBB && "Attempting to set a null insert point");
	BB = TheBB;			BB = TheBB;
	InsertPt = BB->end();			InsertPt = BB->end();
	}			}

				/// This specifies that created instructions should be inserted at the
				/// specified point.
				void setInsertPoint(VPBasicBlock *TheBB, VPBasicBlock::iterator IP) {
				BB = TheBB;
				InsertPt = IP;
				}

				/// Insert and return the specified instruction.
				VPInstruction *insert(VPInstruction *I) const {
				BB->insert(I, InsertPt);
				return I;
				}

				/// Create an N-ary operation with \p Opcode, \p Operands and set \p Inst as
				/// its underlying Instruction.
				VPValue *createNaryOp(unsigned Opcode, ArrayRef<VPValue *> Operands,
				Instruction *Inst = nullptr) {
				VPInstruction *NewVPInst = createInstruction(Opcode, Operands);
				NewVPInst->setUnderlyingValue(Inst);
				return NewVPInst;
				}
				VPValue *createNaryOp(unsigned Opcode,
				std::initializer_list<VPValue *> Operands,
				Instruction *Inst = nullptr) {
				return createNaryOp(Opcode, ArrayRef<VPValue *>(Operands), Inst);
				}

	VPValue *createNot(VPValue *Operand) {			VPValue *createNot(VPValue *Operand) {
	return createInstruction(VPInstruction::Not, {Operand});			return createInstruction(VPInstruction::Not, {Operand});
	}			}

	VPValue *createAnd(VPValue *LHS, VPValue *RHS) {			VPValue *createAnd(VPValue *LHS, VPValue *RHS) {
	return createInstruction(Instruction::BinaryOps::And, {LHS, RHS});			return createInstruction(Instruction::BinaryOps::And, {LHS, RHS});
	}			}

	VPValue *createOr(VPValue *LHS, VPValue *RHS) {			VPValue *createOr(VPValue *LHS, VPValue *RHS) {
	return createInstruction(Instruction::BinaryOps::Or, {LHS, RHS});			return createInstruction(Instruction::BinaryOps::Or, {LHS, RHS});
	}			}

				//===--------------------------------------------------------------------===//
				// RAII helpers.
				//===--------------------------------------------------------------------===//

				/// RAII object that stores the current insertion point and restores it when
				/// the object is destroyed.
				class InsertPointGuard {
				VPBuilder &Builder;
				VPBasicBlock* Block;
				[Inline comment, fhahn, Not Done] What does this comment mean?
				[Inline comment, dcaballe, Not Done] `AssertingVH` is used in IRBuilder but we are not using it here. However, we may want to think about using something similar but I haven't looked at it at all. We could do it as a separate patch if we are interested. I don't think we need this TODO. I'll remove it.
				[Inline comment, fhahn, Done] nit: VPBasicBlock *Block?
				[Inline comment, dcaballe, Not Done] Thanks! I'm applying formatting to all this code.
				VPBasicBlock::iterator Point;

				public:
				InsertPointGuard(VPBuilder &B)
				: Builder(B), Block(B.getInsertBlock()), Point(B.getInsertPoint()) {}

				InsertPointGuard(const InsertPointGuard &) = delete;
				InsertPointGuard &operator=(const InsertPointGuard &) = delete;

				~InsertPointGuard() {
				Builder.restoreIP(VPInsertPoint(Block, Point));
				}
				};
	};			};


	/// TODO: The following VectorizationFactor was pulled out of			/// TODO: The following VectorizationFactor was pulled out of
	/// LoopVectorizationCostModel class. LV also deals with			/// LoopVectorizationCostModel class. LV also deals with
	/// VectorizerParams::VectorizationFactor and VectorizationCostTy.			/// VectorizerParams::VectorizationFactor and VectorizationCostTy.
	/// We need to streamline them.			/// We need to streamline them.

	[186 lines hidden]
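
The RAII helper added to VPBuilder above can be illustrated with a small self-contained model. This is a sketch, not the patch's code: `ToyBuilder` is hypothetical and reduces the insertion point to a block id, where 0 is assumed to mean "no insertion point".

```cpp
#include <cassert>

// Hypothetical miniature of a builder whose insertion point is just a block
// id; 0 stands for "no insertion point set".
class ToyBuilder {
  int Block = 0;

public:
  int getInsertBlock() const { return Block; }
  void setInsertPoint(int NewBlock) { Block = NewBlock; }
  void clearInsertionPoint() { Block = 0; }

  // Saves the insertion point on construction and restores it on
  // destruction, mirroring the save/restore idea of InsertPointGuard.
  class InsertPointGuard {
    ToyBuilder &Builder;
    int SavedBlock;

  public:
    explicit InsertPointGuard(ToyBuilder &B)
        : Builder(B), SavedBlock(B.getInsertBlock()) {}

    InsertPointGuard(const InsertPointGuard &) = delete;
    InsertPointGuard &operator=(const InsertPointGuard &) = delete;

    ~InsertPointGuard() {
      if (SavedBlock != 0)
        Builder.setInsertPoint(SavedBlock);
      else
        Builder.clearInsertionPoint();
    }
  };
};

// Changes the insertion point inside a guarded scope and returns the
// insertion block seen after the scope ends.
int insertBlockAfterScopedChange() {
  ToyBuilder Builder;
  Builder.setInsertPoint(1);
  {
    ToyBuilder::InsertPointGuard Guard(Builder);
    Builder.setInsertPoint(2);
    assert(Builder.getInsertBlock() == 2);
  }
  return Builder.getInsertBlock();
}
```

The guard is non-copyable for the same reason as in the diff: a copy would restore the saved insertion point twice.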

lib/Transforms/Vectorize/LoopVectorize.cpp

[50 lines hidden]
//		//
// S. Maleki, Y. Gao, M. Garzaran, T. Wong and D. Padua. An Evaluation of		// S. Maleki, Y. Gao, M. Garzaran, T. Wong and D. Padua. An Evaluation of
// Vectorizing Compilers.		// Vectorizing Compilers.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/Transforms/Vectorize/LoopVectorize.h"		#include "llvm/Transforms/Vectorize/LoopVectorize.h"
#include "LoopVectorizationPlanner.h"		#include "LoopVectorizationPlanner.h"
		#include "VPlanHCFGBuilder.h"
#include "llvm/ADT/APInt.h"		#include "llvm/ADT/APInt.h"
#include "llvm/ADT/ArrayRef.h"		#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/DenseMap.h"		#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/DenseMapInfo.h"		#include "llvm/ADT/DenseMapInfo.h"
#include "llvm/ADT/Hashing.h"		#include "llvm/ADT/Hashing.h"
#include "llvm/ADT/MapVector.h"		#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/None.h"		#include "llvm/ADT/None.h"
#include "llvm/ADT/Optional.h"		#include "llvm/ADT/Optional.h"
[192 lines hidden]	static cl::opt<unsigned> PragmaVectorizeSCEVCheckThreshold(
cl::desc("The maximum number of SCEV checks allowed with a "		cl::desc("The maximum number of SCEV checks allowed with a "
"vectorize(enable) pragma"));		"vectorize(enable) pragma"));

static cl::opt<bool> EnableVPlanNativePath(		static cl::opt<bool> EnableVPlanNativePath(
"enable-vplan-native-path", cl::init(false), cl::Hidden,		"enable-vplan-native-path", cl::init(false), cl::Hidden,
cl::desc("Enable VPlan-native vectorization path with "		cl::desc("Enable VPlan-native vectorization path with "
"support for outer loop vectorization."));		"support for outer loop vectorization."));

		// This flag enables the stress testing of the VPlan H-CFG construction in the
		// VPlan-native vectorization path. It must be used in conjunction with
		// -enable-vplan-native-path. -vplan-verify-hcfg can also be used to enable the
		// verification of the H-CFGs built.
		static cl::opt<bool> VPlanBuildStressTest(
		"vplan-build-stress-test", cl::init(false), cl::Hidden,
		cl::desc(
		"Build VPlan for every supported loop nest in the function and bail "
		[Inline comment, fhahn, Done] Please mention after what stage we bail out
		"out right after the build (stress test the VPlan H-CFG construction "
		"in the VPlan-native vectorization path)."));

/// Create an analysis remark that explains why vectorization failed		/// Create an analysis remark that explains why vectorization failed
///		///
/// \p PassName is the name of the pass (e.g. can be AlwaysPrint). \p		/// \p PassName is the name of the pass (e.g. can be AlwaysPrint). \p
/// RemarkName is the identifier for the remark. If \p I is passed it is an		/// RemarkName is the identifier for the remark. If \p I is passed it is an
/// instruction that prevents vectorization. Otherwise \p TheLoop is used for		/// instruction that prevents vectorization. Otherwise \p TheLoop is used for
/// the location of the remark. \return the remark object that can be		/// the location of the remark. \return the remark object that can be
/// streamed to.		/// streamed to.
static OptimizationRemarkAnalysis		static OptimizationRemarkAnalysis
[2,074 lines hidden]	static bool isExplicitVecOuterLoop(Loop *OuterLp,
}		}

return true;		return true;
}		}

static void collectSupportedLoops(Loop &L, LoopInfo *LI,		static void collectSupportedLoops(Loop &L, LoopInfo *LI,
OptimizationRemarkEmitter *ORE,		OptimizationRemarkEmitter *ORE,
SmallVectorImpl<Loop *> &V) {		SmallVectorImpl<Loop *> &V) {
// Collect inner loops and outer loops without irreducible control flow. For		// Collect inner loops and outer loops without irreducible control flow. For
// now, only collect outer loops that have explicit vectorization hints.		// now, only collect outer loops that have explicit vectorization hints. If we
		[Inline comment, fhahn, Not Done] Please update the comment to mention VPlan stress testing.
if (L.empty() || (EnableVPlanNativePath && isExplicitVecOuterLoop(&L, ORE))) {		// are stress testing the VPlan H-CFG construction, we collect the outermost
		// loop of every loop nest.
		if (L.empty() || VPlanBuildStressTest ||
		(EnableVPlanNativePath && isExplicitVecOuterLoop(&L, ORE))) {
LoopBlocksRPO RPOT(&L);		LoopBlocksRPO RPOT(&L);
RPOT.perform(LI);		RPOT.perform(LI);
if (!containsIrreducibleCFG<const BasicBlock *>(RPOT, LI)) {		if (!containsIrreducibleCFG<const BasicBlock *>(RPOT, LI)) {
V.push_back(&L);		V.push_back(&L);
// TODO: Collect inner loops inside marked outer loops in case		// TODO: Collect inner loops inside marked outer loops in case
// vectorization fails for the outer loop. Do not invoke		// vectorization fails for the outer loop. Do not invoke
// 'containsIrreducibleCFG' again for inner loops when the outer loop is		// 'containsIrreducibleCFG' again for inner loops when the outer loop is
// already known to be reducible. We can use an inherited attribute for		// already known to be reducible. We can use an inherited attribute for
[5,318 lines hidden]	LoopVectorizationPlanner::planInVPlanNativePath(bool OptForSize,
// Width 1 means no vectorize, cost 0 means uncomputed cost.		// Width 1 means no vectorize, cost 0 means uncomputed cost.
const VectorizationFactor NoVectorization = {1U, 0U};		const VectorizationFactor NoVectorization = {1U, 0U};

// Outer loop handling: They may require CFG and instruction level		// Outer loop handling: They may require CFG and instruction level
// transformations before even evaluating whether vectorization is profitable.		// transformations before even evaluating whether vectorization is profitable.
// Since we cannot modify the incoming IR, we need to build VPlan upfront in		// Since we cannot modify the incoming IR, we need to build VPlan upfront in
// the vectorization pipeline.		// the vectorization pipeline.
if (!OrigLoop->empty()) {		if (!OrigLoop->empty()) {
		// TODO: If UserVF is not provided, we set UserVF to 4 for stress testing.
		// This won't be necessary when UserVF is not required in the VPlan-native
		// path.
		if (VPlanBuildStressTest && !UserVF)
		UserVF = 4;

assert(EnableVPlanNativePath && "VPlan-native path is not enabled.");		assert(EnableVPlanNativePath && "VPlan-native path is not enabled.");
assert(UserVF && "Expected UserVF for outer loop vectorization.");		assert(UserVF && "Expected UserVF for outer loop vectorization.");
assert(isPowerOf2_32(UserVF) && "VF needs to be a power of two");		assert(isPowerOf2_32(UserVF) && "VF needs to be a power of two");
DEBUG(dbgs() << "LV: Using user VF " << UserVF << ".\n");		DEBUG(dbgs() << "LV: Using user VF " << UserVF << ".\n");
buildVPlans(UserVF, UserVF);		buildVPlans(UserVF, UserVF);

		// For VPlan build stress testing, we bail out after VPlan construction.
		if (VPlanBuildStressTest)
		return NoVectorization;

return {UserVF, 0};		return {UserVF, 0};
		[Inline comment, fhahn, Not Done] Could we here just return NoVectorization and get rid of the additional check in LoopVectorizePass::processLoop? Also, could we move the setting of UserVF in `plan` too? Otherwise it seems harder to keep track of what's going on and we set UserVF even for inner loops. Also, I think ideally we would only bail out if the outer loop is not supported, but achieving that seems more trouble than it's worth.
		[Inline comment, dcaballe, Not Done]
		> Could we here just return NoVectorization and get rid of the additional check in LoopVectorizePass::processLoop?
		Ok. It sounds good to me, at least for now. Hopefully we won't introduce anything after `plan` that we must skip in stress testing mode. Thanks!
		> Also, could we move the setting of UserVF in plan too? Otherwise it seems harder to keep track of what's going on and we set UserVF even for inner loops.
		Could you please elaborate a bit more? I'm not sure I understand what you mean.
		> Also, I think ideally we would only bail out if the outer loop is not supported, but achieving that seems more trouble than it's worth.
		We can think about it in the future. We would need legality analysis for outer loops if we want to generate code.
		[Inline comment, fhahn, Not Done]
		> Ok. It sounds good to me, at least for now. Hopefully we won't introduce anything after plan that we must skip in stress testing mode. Thanks!
		If we return NoVectorization, I would assume we would not use the generated plans after `plan`, as we decided to skip vectorization?
		> Could you please elaborate a bit more? I'm not sure I understand what you mean.
		I meant moving the code to set UserVF = 4 if VPlanBuildStressTest into this function. That would reduce the number of functions where we have to handle VPlanBuildStressTest, which IMO makes it easier to see what's going on.
		[Inline comment, dcaballe, Not Done]
		> If we return NoVectorization, I would assume we would not use the generated plans after plan, as we decided to skip vectorization?
		Yes, it was just a thought. For example, stress testing will be skipping some checks (only `isExplicitVecOuterLoop`, for now). If we have code after `plan` expecting a loop compliant with those checks, we could have problems. But, again, I agree on the change. We can deal with that if it happens.
		> I meant moving the code to set UserVF = 4 if VPlanBuildStressTest into this function. That would reduce the number of functions where we have to handle VPlanBuildStressTest, which IMO makes it easier to see what's going on.
		Thanks, got it! I wonder if this would be problematic. If we moved this change into this function and we used UserVF after `plan` (likely to happen, look at uses of line 8794), we would be using inconsistent UserVF values. Maybe it's better to keep it here?
		[Inline comment, fhahn, Not Done] But as it is currently, the UserVF set in the outer loop code path is limited to that block. And I cannot think of any case where using the UserVF set for VPlanBuildStressTest will be useful anywhere else for now, especially in the inner loop code path. My understanding was that we have to bail after planning anyways for VPlanBuildStressTest. At least as everything is now.
		[Inline comment, dcaballe, Not Done] Ok, fair enough. I'll move it into the function, at least, for now. Thanks!
}		}

DEBUG(dbgs() << "LV: Not vectorizing. Inner loops aren't supported in the "		DEBUG(dbgs() << "LV: Not vectorizing. Inner loops aren't supported in the "
"VPlan-native path.\n");		"VPlan-native path.\n");
return NoVectorization;		return NoVectorization;
}		}

VectorizationFactor		VectorizationFactor
LoopVectorizationPlanner::plan(bool OptForSize, unsigned UserVF) {		LoopVectorizationPlanner::plan(bool OptForSize, unsigned UserVF) {
assert(OrigLoop->empty() && "Inner loop expected.");		assert(OrigLoop->empty() && "Inner loop expected.");
// Width 1 means no vectorize, cost 0 means uncomputed cost.		// Width 1 means no vectorize, cost 0 means uncomputed cost.
		[Inline comment, fhahn, Done] "means no vectorization" ?
const VectorizationFactor NoVectorization = {1U, 0U};		const VectorizationFactor NoVectorization = {1U, 0U};
Optional<unsigned> MaybeMaxVF = CM.computeMaxVF(OptForSize);		Optional<unsigned> MaybeMaxVF = CM.computeMaxVF(OptForSize);
if (!MaybeMaxVF.hasValue()) // Cases considered too costly to vectorize.		if (!MaybeMaxVF.hasValue()) // Cases considered too costly to vectorize.
return NoVectorization;		return NoVectorization;

if (UserVF) {		if (UserVF) {
DEBUG(dbgs() << "LV: Using user VF " << UserVF << ".\n");		DEBUG(dbgs() << "LV: Using user VF " << UserVF << ".\n");
assert(isPowerOf2_32(UserVF) && "VF needs to be a power of two");		assert(isPowerOf2_32(UserVF) && "VF needs to be a power of two");
[508 lines hidden]	if (!IsPredicated) {
VPBB->appendRecipe(Recipe);		VPBB->appendRecipe(Recipe);
return VPBB;		return VPBB;
}		}
DEBUG(dbgs() << "LV: Scalarizing and predicating:" << *I << "\n");		DEBUG(dbgs() << "LV: Scalarizing and predicating:" << *I << "\n");
assert(VPBB->getSuccessors().empty() &&		assert(VPBB->getSuccessors().empty() &&
"VPBB has successors when handling predicated replication.");		"VPBB has successors when handling predicated replication.");
// Record predicated instructions for above packing optimizations.		// Record predicated instructions for above packing optimizations.
PredInst2Recipe[I] = Recipe;		PredInst2Recipe[I] = Recipe;
VPBlockBase *Region =		VPBlockBase *Region = createReplicateRegion(I, Recipe, Plan);
VPBB->setOneSuccessor(createReplicateRegion(I, Recipe, Plan));		VPBlockUtils::insertBlockAfter(Region, VPBB);
return cast<VPBasicBlock>(Region->setOneSuccessor(new VPBasicBlock()));		auto *RegSucc = new VPBasicBlock();
		VPBlockUtils::insertBlockAfter(RegSucc, Region);
		return RegSucc;
}		}

VPRegionBlock *		VPRegionBlock *
LoopVectorizationPlanner::createReplicateRegion(Instruction *Instr,		LoopVectorizationPlanner::createReplicateRegion(Instruction *Instr,
VPRecipeBase *PredRecipe,		VPRecipeBase *PredRecipe,
VPlanPtr &Plan) {		VPlanPtr &Plan) {
// Instructions marked for predication are replicated and placed under an		// Instructions marked for predication are replicated and placed under an
// if-then construct to prevent side-effects.		// if-then construct to prevent side-effects.
[9 lines hidden]	LoopVectorizationPlanner::createReplicateRegion(Instruction *Instr,
auto *PHIRecipe =		auto *PHIRecipe =
Instr->getType()->isVoidTy() ? nullptr : new VPPredInstPHIRecipe(Instr);		Instr->getType()->isVoidTy() ? nullptr : new VPPredInstPHIRecipe(Instr);
auto *Exit = new VPBasicBlock(Twine(RegionName) + ".continue", PHIRecipe);		auto *Exit = new VPBasicBlock(Twine(RegionName) + ".continue", PHIRecipe);
auto *Pred = new VPBasicBlock(Twine(RegionName) + ".if", PredRecipe);		auto *Pred = new VPBasicBlock(Twine(RegionName) + ".if", PredRecipe);
VPRegionBlock *Region = new VPRegionBlock(Entry, Exit, RegionName, true);		VPRegionBlock *Region = new VPRegionBlock(Entry, Exit, RegionName, true);

// Note: first set Entry as region entry and then connect successors starting		// Note: first set Entry as region entry and then connect successors starting
// from it in order, to propagate the "parent" of each VPBasicBlock.		// from it in order, to propagate the "parent" of each VPBasicBlock.
Entry->setTwoSuccessors(Pred, Exit);		VPBlockUtils::insertTwoBlocksAfter(Pred, Exit, Entry);
Pred->setOneSuccessor(Exit);		VPBlockUtils::connectBlocks(Pred, Exit);

return Region;		return Region;
}		}

LoopVectorizationPlanner::VPlanPtr		LoopVectorizationPlanner::VPlanPtr
LoopVectorizationPlanner::buildVPlan(VFRange &Range,		LoopVectorizationPlanner::buildVPlan(VFRange &Range,
const SmallPtrSetImpl<Value *> &NeedDef) {		const SmallPtrSetImpl<Value *> &NeedDef) {
// Outer loop handling: They may require CFG and instruction level		// Outer loop handling: They may require CFG and instruction level
// transformations before even evaluating whether vectorization is profitable.		// transformations before even evaluating whether vectorization is profitable.
// Since we cannot modify the incoming IR, we need to build VPlan upfront in		// Since we cannot modify the incoming IR, we need to build VPlan upfront in
// the vectorization pipeline.		// the vectorization pipeline.
if (!OrigLoop->empty()) {		if (!OrigLoop->empty()) {
assert(EnableVPlanNativePath && "VPlan-native path is not enabled.");		assert(EnableVPlanNativePath && "VPlan-native path is not enabled.");

// Create new empty VPlan		// Create new empty VPlan
auto Plan = llvm::make_unique<VPlan>();		auto Plan = llvm::make_unique<VPlan>();

		// Build hierarchical CFG
		VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI);
		HCFGBuilder.buildHierarchicalCFG(*Plan.get());

return Plan;		return Plan;
}		}

assert(OrigLoop->empty() && "Inner loop expected.");		assert(OrigLoop->empty() && "Inner loop expected.");
EdgeMaskCache.clear();		EdgeMaskCache.clear();
BlockMaskCache.clear();		BlockMaskCache.clear();
DenseMap<Instruction *, Instruction *> &SinkAfter = Legal->getSinkAfter();		DenseMap<Instruction *, Instruction *> &SinkAfter = Legal->getSinkAfter();
DenseMap<Instruction *, Instruction *> SinkAfterInverse;		DenseMap<Instruction *, Instruction *> SinkAfterInverse;
[25 lines hidden]	LoopVectorizationPlanner::buildVPlan(VFRange &Range,
LoopBlocksDFS DFS(OrigLoop);		LoopBlocksDFS DFS(OrigLoop);
DFS.perform(LI);		DFS.perform(LI);

for (BasicBlock *BB : make_range(DFS.beginRPO(), DFS.endRPO())) {		for (BasicBlock *BB : make_range(DFS.beginRPO(), DFS.endRPO())) {
// Relevant instructions from basic block BB will be grouped into VPRecipe		// Relevant instructions from basic block BB will be grouped into VPRecipe
// ingredients and fill a new VPBasicBlock.		// ingredients and fill a new VPBasicBlock.
unsigned VPBBsForBB = 0;		unsigned VPBBsForBB = 0;
auto *FirstVPBBForBB = new VPBasicBlock(BB->getName());		auto *FirstVPBBForBB = new VPBasicBlock(BB->getName());
VPBB->setOneSuccessor(FirstVPBBForBB);		VPBlockUtils::insertBlockAfter(FirstVPBBForBB, VPBB);
VPBB = FirstVPBBForBB;		VPBB = FirstVPBBForBB;
Builder.setInsertPoint(VPBB);		Builder.setInsertPoint(VPBB);

std::vector<Instruction *> Ingredients;		std::vector<Instruction *> Ingredients;

// Organize the ingredients to vectorize from current basic block in the		// Organize the ingredients to vectorize from current basic block in the
// right order.		// right order.
for (Instruction &I : *BB) {		for (Instruction &I : *BB) {
[87 lines hidden]	LoopVectorizationPlanner::buildVPlan(VFRange &Range,
}		}

// Discard empty dummy pre-entry VPBasicBlock. Note that other VPBasicBlocks		// Discard empty dummy pre-entry VPBasicBlock. Note that other VPBasicBlocks
// may also be empty, such as the last one VPBB, reflecting original		// may also be empty, such as the last one VPBB, reflecting original
// basic-blocks with no recipes.		// basic-blocks with no recipes.
VPBasicBlock *PreEntry = cast<VPBasicBlock>(Plan->getEntry());		VPBasicBlock *PreEntry = cast<VPBasicBlock>(Plan->getEntry());
assert(PreEntry->empty() && "Expecting empty pre-entry block.");		assert(PreEntry->empty() && "Expecting empty pre-entry block.");
VPBlockBase *Entry = Plan->setEntry(PreEntry->getSingleSuccessor());		VPBlockBase *Entry = Plan->setEntry(PreEntry->getSingleSuccessor());
PreEntry->disconnectSuccessor(Entry);		VPBlockUtils::disconnectBlocks(PreEntry, Entry);
delete PreEntry;		delete PreEntry;

std::string PlanName;		std::string PlanName;
raw_string_ostream RSO(PlanName);		raw_string_ostream RSO(PlanName);
unsigned VF = Range.Start;		unsigned VF = Range.Start;
Plan->addVF(VF);		Plan->addVF(VF);
RSO << "Initial VPlan for VF={" << VF;		RSO << "Initial VPlan for VF={" << VF;
for (VF *= 2; VF < Range.End; VF *= 2) {		for (VF *= 2; VF < Range.End; VF *= 2) {
[607 lines hidden]
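
The outer-loop branch of planInVPlanNativePath in this file's diff can be summarized with a simplified, self-contained model. This is only a sketch of the control flow visible in the diff: `ToyVF` and `planOuterLoop` are hypothetical names, and the `PlansBuilt` counter stands in for the call to buildVPlans.

```cpp
#include <cassert>

// Hypothetical stand-in for VectorizationFactor: {Width, Cost}.
struct ToyVF {
  unsigned Width;
  unsigned Cost;
};

// Simplified model of the outer-loop branch: default the VF when stress
// testing, build the plan, then bail out without vectorizing if stress
// testing is enabled.
ToyVF planOuterLoop(bool StressTest, unsigned UserVF, unsigned &PlansBuilt) {
  // Width 1 means no vectorization, cost 0 means uncomputed cost.
  const ToyVF NoVectorization = {1U, 0U};

  // Stress testing does not require a user-provided VF; 4 is used instead.
  if (StressTest && !UserVF)
    UserVF = 4;

  assert(UserVF && "Expected UserVF for outer loop vectorization.");
  assert((UserVF & (UserVF - 1)) == 0 && "VF needs to be a power of two");

  ++PlansBuilt; // stands in for buildVPlans(UserVF, UserVF)

  // For stress testing, bail out right after plan construction.
  if (StressTest)
    return NoVectorization;

  return {UserVF, 0};
}
```

Keeping the default VF and the early return in the same function means stress-test handling stays in one place, which is the point made in the inline discussion.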

lib/Transforms/Vectorize/VPlan.h

[300 lines hidden]	struct VPTransformState {
InnerLoopVectorizer *ILV;		InnerLoopVectorizer *ILV;

VPCallback &Callback;		VPCallback &Callback;
};		};

/// VPBlockBase is the building block of the Hierarchical Control-Flow Graph.		/// VPBlockBase is the building block of the Hierarchical Control-Flow Graph.
/// A VPBlockBase can be either a VPBasicBlock or a VPRegionBlock.		/// A VPBlockBase can be either a VPBasicBlock or a VPRegionBlock.
class VPBlockBase {		class VPBlockBase {
		friend class VPBlockUtils;

private:		private:
const unsigned char SubclassID; ///< Subclass identifier (for isa/dyn_cast).		const unsigned char SubclassID; ///< Subclass identifier (for isa/dyn_cast).

/// An optional name for the block.		/// An optional name for the block.
std::string Name;		std::string Name;

/// The immediate VPRegionBlock which this VPBlockBase belongs to, or null if		/// The immediate VPRegionBlock which this VPBlockBase belongs to, or null if
/// it is a topmost VPBlockBase.		/// it is a topmost VPBlockBase.
[50 lines hidden]	public:

void setName(const Twine &newName) { Name = newName.str(); }		void setName(const Twine &newName) { Name = newName.str(); }

/// \return an ID for the concrete type of this object.		/// \return an ID for the concrete type of this object.
/// This is used to implement the classof checks. This should not be used		/// This is used to implement the classof checks. This should not be used
/// for any other purpose, as the values may change as LLVM evolves.		/// for any other purpose, as the values may change as LLVM evolves.
unsigned getVPBlockID() const { return SubclassID; }		unsigned getVPBlockID() const { return SubclassID; }

		VPRegionBlock *getParent() { return Parent; }
const VPRegionBlock *getParent() const { return Parent; }		const VPRegionBlock *getParent() const { return Parent; }

void setParent(VPRegionBlock *P) { Parent = P; }		void setParent(VPRegionBlock *P) { Parent = P; }

/// \return the VPBasicBlock that is the entry of this VPBlockBase,		/// \return the VPBasicBlock that is the entry of this VPBlockBase,
/// recursively, if the latter is a VPRegionBlock. Otherwise, if this		/// recursively, if the latter is a VPRegionBlock. Otherwise, if this
/// VPBlockBase is a VPBasicBlock, it is returned.		/// VPBlockBase is a VPBasicBlock, it is returned.
const VPBasicBlock *getEntryBasicBlock() const;		const VPBasicBlock *getEntryBasicBlock() const;
Show All 18 Lines	public:
}		}

/// \return the predecessor of this VPBlockBase if it has a single		/// \return the predecessor of this VPBlockBase if it has a single
/// predecessor. Otherwise return a null pointer.		/// predecessor. Otherwise return a null pointer.
VPBlockBase *getSinglePredecessor() const {		VPBlockBase *getSinglePredecessor() const {
return (Predecessors.size() == 1 ? *Predecessors.begin() : nullptr);		return (Predecessors.size() == 1 ? *Predecessors.begin() : nullptr);
}		}

		size_t getNumSuccessors() const { return Successors.size(); }
		size_t getNumPredecessors() const { return Predecessors.size(); }

/// An Enclosing Block of a block B is any block containing B, including B		/// An Enclosing Block of a block B is any block containing B, including B
/// itself. \return the closest enclosing block starting from "this", which		/// itself. \return the closest enclosing block starting from "this", which
/// has successors. \return the root enclosing block if all enclosing blocks		/// has successors. \return the root enclosing block if all enclosing blocks
/// have no successors.		/// have no successors.
VPBlockBase *getEnclosingBlockWithSuccessors();		VPBlockBase *getEnclosingBlockWithSuccessors();

/// \return the closest enclosing block starting from "this", which has		/// \return the closest enclosing block starting from "this", which has
/// predecessors. \return the root enclosing block if all enclosing blocks		/// predecessors. \return the root enclosing block if all enclosing blocks
Show All 26 Lines	const VPBlocksTy &getHierarchicalPredecessors() {
return getEnclosingBlockWithPredecessors()->getPredecessors();		return getEnclosingBlockWithPredecessors()->getPredecessors();
}		}

/// \return the hierarchical predecessor of this VPBlockBase if it has a		/// \return the hierarchical predecessor of this VPBlockBase if it has a
/// single hierarchical predecessor. Otherwise return a null pointer.		/// single hierarchical predecessor. Otherwise return a null pointer.
VPBlockBase *getSingleHierarchicalPredecessor() {		VPBlockBase *getSingleHierarchicalPredecessor() {
return getEnclosingBlockWithPredecessors()->getSinglePredecessor();		return getEnclosingBlockWithPredecessors()->getSinglePredecessor();
}		}

/// Sets a given VPBlockBase \p Successor as the single successor and \return		/// Set a given VPBlockBase \p Successor as the single successor of this
		dcaballe (Author): This is the rationale behind the changes below: I wouldn't like VPBlockBase to end up with a large list of interfaces to set/insert successors/predecessors, all of them doing almost the same with very subtle differences. For that reason, I introduced the class VPBlockUtility to keep there all the VPBlockBase manipulation interfaces and keep VPBlockBase class cleaner. Of course, that doesn't mean that we want to add unnecessary utilities to VPBlockUtility class. In VPBlockBase class, I decided to just keep the very basic interfaces, which can be used directly or can be a building block of a more complex utility in VPBlockUtility. In this regard, I'm simplifying the logic of setOneSuccessor and setTwoSuccessors and replacing their existing calls with a more generic utility function in VPBlockUtility. The previous implementation was very ad-hoc for the patch where they were introduced and I couldn't reuse them as is. Of course, I'm open to other suggestions.
		rengolin: I like this idea and I like the methods, as they give a better impression as to what is truly happening behind the scenes. I agree we shouldn't bloat this class, but it's a good buffer class to have during the prototype phase of the outer loop implementation, so we can gauge the VPBasicBlock usability.
		dcaballe (Author): Thanks, Renato! "I agree we shouldn't bloat this class, but it's a good buffer class to have during the prototype phase of the outer loop implementation, so we can gauge the VPBasicBlock usability." Agreed. Any particular suggestion on moving some of the current utilities in VPBlockUtils back to VPBlockBase? I think the current ones are OK there but let me know if you have any comments in that regard.
		rengolin: I think we can start with these, and move up and down as needed later.
/// \p Successor. The parent of this Block is copied to be the parent of		/// VPBlockBase. This VPBlockBase is not added as predecessor of \p Successor.
/// \p Successor.		/// This VPBlockBase must have no successors.
VPBlockBase *setOneSuccessor(VPBlockBase *Successor) {		void setOneSuccessor(VPBlockBase *Successor) {
assert(Successors.empty() && "Setting one successor when others exist.");		assert(Successors.empty() && "Setting one successor when others exist.");
appendSuccessor(Successor);		appendSuccessor(Successor);
Successor->appendPredecessor(this);
Successor->Parent = Parent;
return Successor;
}		}

/// Sets two given VPBlockBases \p IfTrue and \p IfFalse to be the two		/// Set two given VPBlockBases \p IfTrue and \p IfFalse to be the two
/// successors. The parent of this Block is copied to be the parent of both		/// successors of this VPBlockBase. This VPBlockBase is not added as
/// \p IfTrue and \p IfFalse.		/// predecessor of \p IfTrue or \p IfFalse. This VPBlockBase must have no
		/// successors.
void setTwoSuccessors(VPBlockBase *IfTrue, VPBlockBase *IfFalse) {		void setTwoSuccessors(VPBlockBase *IfTrue, VPBlockBase *IfFalse) {
assert(Successors.empty() && "Setting two successors when others exist.");		assert(Successors.empty() && "Setting two successors when others exist.");
appendSuccessor(IfTrue);		appendSuccessor(IfTrue);
appendSuccessor(IfFalse);		appendSuccessor(IfFalse);
IfTrue->appendPredecessor(this);
IfFalse->appendPredecessor(this);
IfTrue->Parent = Parent;
IfFalse->Parent = Parent;
}		}

void disconnectSuccessor(VPBlockBase *Successor) {		/// Set each VPBasicBlock in \p NewPreds as predecessor of this VPBlockBase.
assert(Successor && "Successor to disconnect is null.");		/// This VPBlockBase must have no predecessors. This VPBlockBase is not added
removeSuccessor(Successor);		/// as successor of any VPBasicBlock in \p NewPreds.
Successor->removePredecessor(this);		void setPredecessors(ArrayRef<VPBlockBase *> NewPreds) {
		assert(Predecessors.empty() && "Block predecessors already set.");
		for (auto *Pred : NewPreds)
		appendPredecessor(Pred);
}		}

/// The method which generates the output IR that corresponds to this		/// The method which generates the output IR that corresponds to this
/// VPBlockBase, thereby "executing" the VPlan.		/// VPBlockBase, thereby "executing" the VPlan.
virtual void execute(struct VPTransformState *State) = 0;		virtual void execute(struct VPTransformState *State) = 0;

/// Delete all blocks reachable from a given VPBlockBase, inclusive.		/// Delete all blocks reachable from a given VPBlockBase, inclusive.
static void deleteCFG(VPBlockBase *Entry);		static void deleteCFG(VPBlockBase *Entry);
▲ Show 20 Lines • Show All 61 Lines • ▼ Show 20 Lines	private:
typedef unsigned char OpcodeTy;		typedef unsigned char OpcodeTy;
OpcodeTy Opcode;		OpcodeTy Opcode;

/// Utility method serving execute(): generates a single instance of the		/// Utility method serving execute(): generates a single instance of the
/// modeled instruction.		/// modeled instruction.
void generateInstruction(VPTransformState &State, unsigned Part);		void generateInstruction(VPTransformState &State, unsigned Part);

public:		public:
VPInstruction(unsigned Opcode, std::initializer_list<VPValue *> Operands)		VPInstruction(unsigned Opcode, ArrayRef<VPValue *> Operands)
: VPUser(VPValue::VPInstructionSC, Operands),		: VPUser(VPValue::VPInstructionSC, Operands),
VPRecipeBase(VPRecipeBase::VPInstructionSC), Opcode(Opcode) {}		VPRecipeBase(VPRecipeBase::VPInstructionSC), Opcode(Opcode) {}

		VPInstruction(unsigned Opcode, std::initializer_list<VPValue *> Operands)
		: VPInstruction(Opcode, ArrayRef<VPValue *>(Operands)) {}

/// Method to support type inquiry through isa, cast, and dyn_cast.		/// Method to support type inquiry through isa, cast, and dyn_cast.
static inline bool classof(const VPValue *V) {		static inline bool classof(const VPValue *V) {
return V->getVPValueID() == VPValue::VPInstructionSC;		return V->getVPValueID() == VPValue::VPInstructionSC;
}		}

/// Method to support type inquiry through isa, cast, and dyn_cast.		/// Method to support type inquiry through isa, cast, and dyn_cast.
static inline bool classof(const VPRecipeBase *R) {		static inline bool classof(const VPRecipeBase *R) {
return R->getVPRecipeID() == VPRecipeBase::VPInstructionSC;		return R->getVPRecipeID() == VPRecipeBase::VPInstructionSC;
▲ Show 20 Lines • Show All 389 Lines • ▼ Show 20 Lines	VPRegionBlock(VPBlockBase *Entry, VPBlockBase *Exit,
const std::string &Name = "", bool IsReplicator = false)		const std::string &Name = "", bool IsReplicator = false)
: VPBlockBase(VPRegionBlockSC, Name), Entry(Entry), Exit(Exit),		: VPBlockBase(VPRegionBlockSC, Name), Entry(Entry), Exit(Exit),
IsReplicator(IsReplicator) {		IsReplicator(IsReplicator) {
assert(Entry->getPredecessors().empty() && "Entry block has predecessors.");		assert(Entry->getPredecessors().empty() && "Entry block has predecessors.");
assert(Exit->getSuccessors().empty() && "Exit block has successors.");		assert(Exit->getSuccessors().empty() && "Exit block has successors.");
Entry->setParent(this);		Entry->setParent(this);
Exit->setParent(this);		Exit->setParent(this);
}		}
		VPRegionBlock(const std::string &Name = "", bool IsReplicator = false)
		: VPBlockBase(VPRegionBlockSC, Name), Entry(nullptr), Exit(nullptr),
		IsReplicator(IsReplicator) {}
		fhahn: nit: I think we should match the indent with VPBlockBase?
		dcaballe (Author): Sorry, thanks!

~VPRegionBlock() override {		~VPRegionBlock() override {
if (Entry)		if (Entry)
deleteCFG(Entry);		deleteCFG(Entry);
}		}

/// Method to support type inquiry through isa, cast, and dyn_cast.		/// Method to support type inquiry through isa, cast, and dyn_cast.
static inline bool classof(const VPBlockBase *V) {		static inline bool classof(const VPBlockBase *V) {
return V->getVPBlockID() == VPBlockBase::VPRegionBlockSC;		return V->getVPBlockID() == VPBlockBase::VPRegionBlockSC;
}		}

const VPBlockBase *getEntry() const { return Entry; }		const VPBlockBase *getEntry() const { return Entry; }
VPBlockBase *getEntry() { return Entry; }		VPBlockBase *getEntry() { return Entry; }

		/// Set \p EntryBlock as the entry VPBlockBase of this VPRegionBlock. \p
		/// EntryBlock must have no predecessors.
		void setEntry(VPBlockBase *EntryBlock) {
		assert(EntryBlock->getPredecessors().empty() &&
		"Entry block cannot have predecessors.");
		Entry = EntryBlock;
		EntryBlock->setParent(this);
		}

const VPBlockBase *getExit() const { return Exit; }		const VPBlockBase *getExit() const { return Exit; }
VPBlockBase *getExit() { return Exit; }		VPBlockBase *getExit() { return Exit; }

		/// Set \p ExitBlock as the exit VPBlockBase of this VPRegionBlock. \p
		/// ExitBlock must have no successors.
		void setExit(VPBlockBase *ExitBlock) {
		assert(ExitBlock->getSuccessors().empty() &&
		"Exit block cannot have successors.");
		Exit = ExitBlock;
		ExitBlock->setParent(this);
		}

/// An indicator whether this region is to generate multiple replicated		/// An indicator whether this region is to generate multiple replicated
/// instances of output IR corresponding to its VPBlockBases.		/// instances of output IR corresponding to its VPBlockBases.
bool isReplicator() const { return IsReplicator; }		bool isReplicator() const { return IsReplicator; }

/// The method which generates the output IR instructions that correspond to		/// The method which generates the output IR instructions that correspond to
/// this VPRegionBlock, thereby "executing" the VPlan.		/// this VPRegionBlock, thereby "executing" the VPlan.
void execute(struct VPTransformState *State) override;		void execute(struct VPTransformState *State) override;
};		};
Show All 11 Lines	private:
VPBlockBase *Entry;		VPBlockBase *Entry;

/// Holds the VFs applicable to this VPlan.		/// Holds the VFs applicable to this VPlan.
SmallSet<unsigned, 2> VFs;		SmallSet<unsigned, 2> VFs;

/// Holds the name of the VPlan, for printing.		/// Holds the name of the VPlan, for printing.
std::string Name;		std::string Name;

		/// Holds all the external definitions created for this VPlan.
		// TODO: Introduce a specific representation for external definitions in
		// VPlan. External definitions must be immutable and hold a pointer to its
		// underlying IR that will be used to implement its structural comparison
		// (operators '==' and '<').
		SmallSet<VPValue *, 32> VPExternalDefs;
		fhahn: nit: 32 seems quite large
		dcaballe (Author): Ok, let's use 16. It would be easy to go over 8 for a double loop nest using a few memory references.

/// Holds a mapping between Values and their corresponding VPValue inside		/// Holds a mapping between Values and their corresponding VPValue inside
/// VPlan.		/// VPlan.
Value2VPValueTy Value2VPValue;		Value2VPValueTy Value2VPValue;

public:		public:
VPlan(VPBlockBase *Entry = nullptr) : Entry(Entry) {}		VPlan(VPBlockBase *Entry = nullptr) : Entry(Entry) {}

~VPlan() {		~VPlan() {
Show All 14 Lines	public:
void addVF(unsigned VF) { VFs.insert(VF); }		void addVF(unsigned VF) { VFs.insert(VF); }

bool hasVF(unsigned VF) { return VFs.count(VF); }		bool hasVF(unsigned VF) { return VFs.count(VF); }

const std::string &getName() const { return Name; }		const std::string &getName() const { return Name; }

void setName(const Twine &newName) { Name = newName.str(); }		void setName(const Twine &newName) { Name = newName.str(); }

		/// Add \p VPVal to the pool of external definitions if it's not already
		/// in the pool.
		void addExternalDef(VPValue *VPVal) {
		VPExternalDefs.insert(VPVal);
		}

void addVPValue(Value *V) {		void addVPValue(Value *V) {
assert(V && "Trying to add a null Value to VPlan");		assert(V && "Trying to add a null Value to VPlan");
assert(!Value2VPValue.count(V) && "Value already exists in VPlan");		assert(!Value2VPValue.count(V) && "Value already exists in VPlan");
Value2VPValue[V] = new VPValue();		Value2VPValue[V] = new VPValue();
}		}

VPValue *getVPValue(Value *V) {		VPValue *getVPValue(Value *V) {
assert(V && "Trying to get the VPValue of a null Value");		assert(V && "Trying to get the VPValue of a null Value");
▲ Show 20 Lines • Show All 131 Lines • ▼ Show 20 Lines	static inline ChildIteratorType child_begin(NodeRef N) {
return N->getPredecessors().begin();		return N->getPredecessors().begin();
}		}

static inline ChildIteratorType child_end(NodeRef N) {		static inline ChildIteratorType child_end(NodeRef N) {
return N->getPredecessors().end();		return N->getPredecessors().end();
}		}
};		};

		//===----------------------------------------------------------------------===//
		// VPlan Utilities
		//===----------------------------------------------------------------------===//

		/// Class that provides utilities for VPBlockBases in VPlan.
		class VPBlockUtils {
		public:
		VPBlockUtils() = delete;

		/// Insert disconnected VPBlockBase \p NewBlock after \p BlockPtr. Add \p
		/// NewBlock as successor of \p BlockPtr and \p BlockPtr as predecessor of \p
		/// NewBlock, and propagate \p BlockPtr parent to \p NewBlock. \p NewBlock
		/// must have neither successors nor predecessors.
		static void insertBlockAfter(VPBlockBase *NewBlock, VPBlockBase *BlockPtr) {
		assert(NewBlock->getSuccessors().empty() &&
		"Can't insert new block with successors.");
		// TODO: move successors from BlockPtr to NewBlock when this functionality
		// is necessary. For now, setOneSuccessor will assert if BlockPtr
		// already has successors.
		BlockPtr->setOneSuccessor(NewBlock);
		NewBlock->setPredecessors({BlockPtr});
		NewBlock->setParent(BlockPtr->getParent());
		}

		/// Insert disconnected VPBlockBases \p IfTrue and \p IfFalse after \p
		/// BlockPtr. Add \p IfTrue and \p IfFalse as successors of \p BlockPtr and \p
		/// BlockPtr as predecessor of \p IfTrue and \p IfFalse. Propagate \p BlockPtr
		/// parent to \p IfTrue and \p IfFalse. \p BlockPtr must have no successors
		/// and \p IfTrue and \p IfFalse must have neither successors nor
		/// predecessors.
		static void insertTwoBlocksAfter(VPBlockBase *IfTrue, VPBlockBase *IfFalse,
		VPBlockBase *BlockPtr) {
		assert(IfTrue->getSuccessors().empty() &&
		"Can't insert IfTrue with successors.");
		assert(IfFalse->getSuccessors().empty() &&
		"Can't insert IfFalse with successors.");
		BlockPtr->setTwoSuccessors(IfTrue, IfFalse);
		IfTrue->setPredecessors({BlockPtr});
		IfFalse->setPredecessors({BlockPtr});
		IfTrue->setParent(BlockPtr->getParent());
		IfFalse->setParent(BlockPtr->getParent());
		}

		/// Connect VPBlockBases \p From and \p To bi-directionally. Append \p To to
		/// the successors of \p From and \p From to the predecessors of \p To. Both
		/// VPBlockBases must have the same parent, which can be null. Both
		/// VPBlockBases can be already connected to other VPBlockBases.
		static void connectBlocks(VPBlockBase *From, VPBlockBase *To) {
		assert((From->getParent() == To->getParent()) &&
		"Can't connect two blocks with different parents");
		assert(From->getNumSuccessors() < 2 &&
		"Blocks can't have more than two successors.");
		From->appendSuccessor(To);
		To->appendPredecessor(From);
		}

		/// Disconnect VPBlockBases \p From and \p To bi-directionally. Remove \p To
		/// from the successors of \p From and \p From from the predecessors of \p To.
		static void disconnectBlocks(VPBlockBase *From, VPBlockBase *To) {
		assert(To && "Successor to disconnect is null.");
		From->removeSuccessor(To);
		To->removePredecessor(From);
		}
		};
} // end namespace llvm		} // end namespace llvm

#endif // LLVM_TRANSFORMS_VECTORIZE_VPLAN_H		#endif // LLVM_TRANSFORMS_VECTORIZE_VPLAN_H
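
The split discussed in the review above (one-directional setters on the block class, bi-directional wiring in a separate utility) can be illustrated with a small standalone sketch. This is not the LLVM code: `ToyBlock`, `ToyRegion` and the free functions below are hypothetical stand-ins that only mimic the semantics documented for `setOneSuccessor`, `setPredecessors`, `insertBlockAfter`, `connectBlocks` and `disconnectBlocks`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-ins for VPRegionBlock and VPBlockBase.
struct ToyRegion {};

struct ToyBlock {
  ToyRegion *Parent = nullptr;
  std::vector<ToyBlock *> Successors;
  std::vector<ToyBlock *> Predecessors;
};

// One-directional: only Block's successor list changes.
inline void setOneSuccessor(ToyBlock *Block, ToyBlock *Succ) {
  assert(Block->Successors.empty() && "Setting one successor when others exist.");
  Block->Successors.push_back(Succ);
}

// One-directional: only Block's predecessor list changes.
inline void setPredecessors(ToyBlock *Block, const std::vector<ToyBlock *> &Preds) {
  assert(Block->Predecessors.empty() && "Block predecessors already set.");
  for (ToyBlock *Pred : Preds)
    Block->Predecessors.push_back(Pred);
}

// Bi-directional utility built from the two basic setters; it also
// propagates the parent region to the newly inserted block.
inline void insertBlockAfter(ToyBlock *NewBlock, ToyBlock *BlockPtr) {
  assert(NewBlock->Successors.empty() && "Can't insert new block with successors.");
  setOneSuccessor(BlockPtr, NewBlock);
  setPredecessors(NewBlock, {BlockPtr});
  NewBlock->Parent = BlockPtr->Parent;
}

// Append an edge From -> To; both blocks may already have other edges.
inline void connectBlocks(ToyBlock *From, ToyBlock *To) {
  assert(From->Parent == To->Parent && "Can't connect two blocks with different parents");
  assert(From->Successors.size() < 2 && "Blocks can't have more than two successors.");
  From->Successors.push_back(To);
  To->Predecessors.push_back(From);
}

// Remove the edge From -> To from both adjacency lists.
inline void disconnectBlocks(ToyBlock *From, ToyBlock *To) {
  auto &S = From->Successors;
  S.erase(std::remove(S.begin(), S.end(), To), S.end());
  auto &P = To->Predecessors;
  P.erase(std::remove(P.begin(), P.end(), From), P.end());
}
```

With the setters kept one-directional, a caller that needs an unusual wiring (for example, a back edge added with `connectBlocks`) can compose it without the block class growing a new method for each case.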

lib/Transforms/Vectorize/VPlanHCFGBuilder.h

This file was added.

				//===-- VPlanHCFGBuilder.h --------------------------------------*- C++ -*-===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// This file defines the VPlanHCFGBuilder class which contains the public
				/// interface (buildHierarchicalCFG) to build a VPlan-based Hierarchical CFG
				/// (H-CFG) for an incoming IR.
				///
				/// A H-CFG in VPlan is a control-flow graph whose nodes are VPBasicBlocks
				/// and/or VPRegionBlocks (i.e., other H-CFGs). The outermost H-CFG of a VPlan
				/// consists of a VPRegionBlock, denoted Top Region, which encloses any other
				/// VPBlockBase in the H-CFG. This guarantees that any VPBlockBase in the H-CFG
				/// other than the Top Region will have a parent VPRegionBlock and allows us
				/// to easily add more nodes before/after the main vector loop (such as the
				/// reduction epilogue).
				///
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_TRANSFORMS_VECTORIZE_VPLAN_VPLANHCFGBUILDER_H
				#define LLVM_TRANSFORMS_VECTORIZE_VPLAN_VPLANHCFGBUILDER_H

				#include "VPlan.h"
				#include "VPlanVerifier.h"

				namespace llvm {
				fhahn: Could be moved to .cpp?
				dcaballe (Author): I can remove this for now.

				class Loop;

				/// Main class to build the VPlan H-CFG for an incoming IR.
				fhahn: Do we need SE here? I could not find any uses.
				class VPlanHCFGBuilder {
				private:
				// The outermost loop of the input loop nest considered for vectorization.
				Loop *TheLoop;

				// Loop Info analysis.
				LoopInfo *LI;

				// VPlan verifier utility.
				VPlanVerifier Verifier;

				public:
				VPlanHCFGBuilder(Loop *Lp, LoopInfo *LI) : TheLoop(Lp), LI(LI) {}

				/// Build H-CFG for TheLoop and update \p Plan accordingly.
				void buildHierarchicalCFG(VPlan &Plan);
				};
				} // namespace llvm

				#endif // LLVM_TRANSFORMS_VECTORIZE_VPLAN_VPLANHCFGBUILDER_H
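
The Top Region invariant described in the file comment above (every block other than the Top Region has a parent region) can be sketched in isolation. The types and the `getTopRegion` helper below are hypothetical illustrations, not part of the patch; the nested region is included only to show the parent chain, since this patch builds a plain CFG without nested regions.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

struct ToyRegion;

// Hypothetical stand-in for VPBlockBase: a name plus an enclosing region.
struct ToyNode {
  std::string Name;
  ToyRegion *Parent = nullptr;
  explicit ToyNode(std::string N) : Name(std::move(N)) {}
};

// Hypothetical stand-in for VPRegionBlock: a node that encloses other nodes.
struct ToyRegion : ToyNode {
  std::vector<ToyNode *> Children;
  explicit ToyRegion(std::string N) : ToyNode(std::move(N)) {}

  // Record N as a child and make this region its parent.
  void enclose(ToyNode *N) {
    N->Parent = this;
    Children.push_back(N);
  }
};

// Walk the parent chain and return the outermost enclosing region,
// or null if N has no parent (i.e., N is itself the top of the hierarchy).
inline const ToyRegion *getTopRegion(const ToyNode *N) {
  const ToyRegion *Top = nullptr;
  for (const ToyRegion *R = N->Parent; R; R = R->Parent)
    Top = R;
  return Top;
}
```

Because only the Top Region lacks a parent, code that inserts a block before or after the main loop can attach it to the Top Region without special-casing a parentless block.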

lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp

This file was added.

				//===-- VPlanHCFGBuilder.cpp ----------------------------------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// This file implements the construction of a VPlan-based Hierarchical CFG
				/// (H-CFG) for an incoming IR. This construction comprises the following
				/// components and steps:
				//
				/// 1. PlainCFGBuilder class: builds a plain VPBasicBlock-based CFG that
				/// faithfully represents the CFG in the incoming IR. A VPRegionBlock (Top
				/// Region) is created to enclose and serve as parent of all the VPBasicBlocks
				/// in the plain CFG.
				/// NOTE: At this point, there is a direct correspondence between all the
				/// VPBasicBlocks created for the initial plain CFG and the incoming
				/// BasicBlocks. However, this might change in the future.
				///
				//===----------------------------------------------------------------------===//

				#include "VPlanHCFGBuilder.h"
				#include "LoopVectorizationPlanner.h"
				#include "llvm/Analysis/LoopIterator.h"

				fhahn: Do we need SE here? I could not find any uses.
				#define DEBUG_TYPE "vplan-hcfg-builder"

				using namespace llvm;

				// Class that is used to build the plain CFG for the incoming IR.
				class PlainCFGBuilder {
				private:
				// The outermost loop of the input loop nest considered for vectorization.
				Loop *TheLoop;

				// Loop Info analysis.
				LoopInfo *LI;

				// Vectorization plan that we are working on.
				VPlan &Plan;

				// Output Top Region.
				VPRegionBlock *TopRegion = nullptr;

				// Builder of the VPlan instruction-level representation.
				VPBuilder VPIRBuilder;

				// NOTE: The following maps are intentionally destroyed after the plain CFG
				// construction because subsequent VPlan-to-VPlan transformation may
				// invalidate them.
				// Map incoming BasicBlocks to their newly-created VPBasicBlocks.
				DenseMap<BasicBlock *, VPBasicBlock *> BB2VPBB;
				// Map incoming Value definitions to their newly-created VPValues.
				DenseMap<Value *, VPValue *> IRDef2VPValue;

				// Hold phi nodes that need to be fixed once the plain CFG has been built.
				SmallVector<PHINode *, 8> PhisToFix;

				// Utility functions.
				void setVPBBPredsFromBB(VPBasicBlock *VPBB, BasicBlock *BB);
				void fixPhiNodes();
				VPBasicBlock *getOrCreateVPBB(BasicBlock *BB);
				bool isExternalDef(Value *Val);
				VPValue *getOrCreateVPOperand(Value *IRVal);
				void createVPInstructionsForVPBB(VPBasicBlock *VPBB, BasicBlock *BB);

				public:
				PlainCFGBuilder(Loop *Lp, LoopInfo *LI, VPlan &P)
				: TheLoop(Lp), LI(LI), Plan(P) {}

				// Build the plain CFG and return its Top Region.
				VPRegionBlock *buildPlainCFG();
				};

				// Return true if \p Inst is an incoming Instruction to be ignored in the VPlan
				// representation.
				static bool isInstructionToIgnore(Instruction *Inst) {
				return isa<BranchInst>(Inst);
				}

				// Set predecessors of \p VPBB in the same order as they are in \p BB. \p VPBB
				// must have no predecessors.
				void PlainCFGBuilder::setVPBBPredsFromBB(VPBasicBlock *VPBB, BasicBlock *BB) {
				SmallVector<VPBlockBase *, 8> VPBBPreds;
				// Collect VPBB predecessors.
				for (BasicBlock *Pred : predecessors(BB))
				VPBBPreds.push_back(getOrCreateVPBB(Pred));

				VPBB->setPredecessors(VPBBPreds);
				}

				// Add operands to VPInstructions representing phi nodes from the input IR.
				void PlainCFGBuilder::fixPhiNodes() {
				for (auto *Phi : PhisToFix) {
				assert(IRDef2VPValue.count(Phi) && "Missing VPInstruction for PHINode.");
				VPValue *VPVal = IRDef2VPValue[Phi];
				assert(isa<VPInstruction>(VPVal) && "Expected VPInstruction for phi node.");
				auto *VPPhi = cast<VPInstruction>(VPVal);
				a.elovikov: For outer loop vectorization in int s = 0; for (int i = 0; i < N; ++i) { for (int j = 0; j < M; ++j) { s += x[i] * y[j]; } } We need a broadcast y[j] -> {y[j], y[j], y[j], y[j]} but this will generate a WIDEN recipe for the load. Is that OK? If so, can we document it somewhere?
				hsaito: Reference: LoopVectorizationPlanner::tryToWidenMemory(). VPWidenMemoryRecipe can handle CM_GatherScatter and uniform can be thought of as a special form of gather/scatter. From that perspective, it is okay. A vector load/store is deemed gather/scatter until analysis improves it to a better access type. From that perspective, using "generic gather/scatter" during the initial VPlan construction phase makes perfect sense. If we are building a single VPlan CFG for inner and/or outer loop vectorization (and that's something we should be doing if HCFG look identical), we can't encode "memory access kind" information within HCFG. So, keeping it in "generic gather/scatter" at HCFG level is the right thing to do for the long term also. In other words, we need a storage outside of HCFG to house "uniform/unit-stride/interleave/..." information for the load/store.
				assert(VPPhi->getNumOperands() == 0 &&
				fhahn: nit: auto *VPPhi
				"Expected VPInstruction with no operands.");

				for (Value *Op : Phi->operands())
				VPPhi->addOperand(getOrCreateVPOperand(Op));
				}
				}

				// Create a new empty VPBasicBlock for an incoming BasicBlock or retrieve an
				// existing one if it was already created.
				VPBasicBlock *PlainCFGBuilder::getOrCreateVPBB(BasicBlock *BB) {
				auto BlockIt = BB2VPBB.find(BB);
				if (BlockIt != BB2VPBB.end())
				// Retrieve existing VPBB.
				return BlockIt->second;

				// Create new VPBB.
				DEBUG(dbgs() << "Creating VPBasicBlock for " << BB->getName() << "\n");
				VPBasicBlock *VPBB = new VPBasicBlock(BB->getName());
				BB2VPBB[BB] = VPBB;
				VPBB->setParent(TopRegion);
				return VPBB;
				}

				// Return true if \p Val is considered an external definition. An external
				// definition is either:
				// 1. A Value that is not an Instruction. This will be refined in the future.
				// 2. An Instruction that is outside of the CFG snippet represented in VPlan,
				// i.e., is not part of: a) the loop nest, b) outermost loop PH and, c)
				// outermost loop exits.
				bool PlainCFGBuilder::isExternalDef(Value *Val) {
				// All the Values that are not Instructions are considered external
				// definitions for now.
				fhahn: Could we use a similar (simpler) logic to what @hsaito used in D46302 here? Like Instr->getParent() strictly dominates the pre header?
				dcaballe (Author): Good point! I think the scenario is a bit different and there would be some corner cases that wouldn't work if we do `DT->properlyDominates(Instr->getParent(), PH)`. For example, any definition in the loop exit with a use within the HCFG wouldn't work. Imagine something like: ph.inner: %0 = phi %1, %t loop.body: ... loop.exit: %t = %a = ... = %a ... If I'm not mistaken, uses of '%t' and %a would be classified as external definitions and they are not. Make sense? Any other idea?
				fhahn: Ah yes, we would have to account for instructions in the preheader and exit block separately. Then it would probably not simplify things much.
				Instruction *Inst = dyn_cast<Instruction>(Val);
				if (!Inst)
				return true;

				BasicBlock *InstParent = Inst->getParent();
				assert(InstParent && "Expected instruction parent.");

				// Check whether Instruction definition is in loop PH.
				BasicBlock *PH = TheLoop->getLoopPreheader();
				assert(PH && "Expected loop pre-header.");

				if (InstParent == PH)
				// Instruction definition is in outermost loop PH.
				return false;

				// Check whether Instruction definition is in the loop exit.
				BasicBlock *Exit = TheLoop->getUniqueExitBlock();
				assert(Exit && "Expected loop with single exit.");
				if (InstParent == Exit) {
				// Instruction definition is in outermost loop exit.
				return false;
				}

				// Check whether Instruction definition is in loop body.
				return !TheLoop->contains(Inst);
				}

				// Create a new VPValue or retrieve an existing one for the Instruction's
				// operand \p IRVal. This function must only be used to create/retrieve VPValues
				// for Instruction's operands and not to create regular VPInstructions. For
				fhahn: nit: IRVal as it is a value?
				// the latter, please, look at 'createVPInstructionsForVPBB'.
				VPValue *PlainCFGBuilder::getOrCreateVPOperand(Value *IRVal) {
				auto VPValIt = IRDef2VPValue.find(IRVal);
				if (VPValIt != IRDef2VPValue.end())
				// Operand has an associated VPInstruction or VPValue that was previously
				// created.
				return VPValIt->second;

				// Operand doesn't have a previously created VPInstruction/VPValue. This
				// means that operand is:
				// A) a definition external to VPlan,
				// B) any other Value without specific representation in VPlan.
				// For now, we use VPValue to represent A and B and classify both as external
				// definitions. We may introduce specific VPValue subclasses for them in the
				// future.
				assert(isExternalDef(IRVal) && "Expected external definition as operand.");

				// A and B: Create VPValue and add it to the pool of external definitions and
				// to the Value->VPValue map.
				VPValue *NewVPVal = new VPValue(IRVal);
				Plan.addExternalDef(NewVPVal);
				IRDef2VPValue[IRVal] = NewVPVal;
				return NewVPVal;
				}
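The get-or-create lookup above can be sketched in isolation. This is a minimal illustration only: `ToyValue`, `ToyVPValue` and `ToyOperandMap` are hypothetical stand-ins and not the LLVM classes, and ownership is modelled with `std::unique_ptr` rather than whatever `Plan.addExternalDef` does internally.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for llvm::Value and VPValue (not the real classes).
struct ToyValue {
  int Id;
};
struct ToyVPValue {
  const ToyValue *Underlying;
};

class ToyOperandMap {
  // Pool that owns every wrapper created for an external definition.
  std::vector<std::unique_ptr<ToyVPValue>> ExternalDefs;
  // Maps each IR value to the wrapper that was already created for it.
  std::unordered_map<const ToyValue *, ToyVPValue *> IRDef2VPValue;

public:
  // Return the existing wrapper for IRVal, or create and register a new one.
  ToyVPValue *getOrCreate(const ToyValue *IRVal) {
    auto It = IRDef2VPValue.find(IRVal);
    if (It != IRDef2VPValue.end())
      return It->second;

    ExternalDefs.push_back(std::make_unique<ToyVPValue>(ToyVPValue{IRVal}));
    ToyVPValue *NewVPVal = ExternalDefs.back().get();
    IRDef2VPValue[IRVal] = NewVPVal;
    return NewVPVal;
  }

  std::size_t numExternalDefs() const { return ExternalDefs.size(); }
};
```

Asking twice for the same `ToyValue` returns the same wrapper, so each external definition is recorded once in the pool.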

				// Create new VPInstructions in a VPBasicBlock, given its BasicBlock
				// counterpart. This function must be invoked in RPO so that the operands of a
				// VPInstruction in \p BB have been visited before (except for Phi nodes).
				void PlainCFGBuilder::createVPInstructionsForVPBB(VPBasicBlock *VPBB,
				BasicBlock *BB) {
				VPIRBuilder.setInsertPoint(VPBB);
				for (Instruction &InstRef : *BB) {
				Instruction *Inst = &InstRef;
				if (isInstructionToIgnore(Inst))
				continue;

				// There shouldn't be any VPValue for Inst at this point. Otherwise, we
				// visited Inst when we shouldn't, breaking the RPO traversal order.
				assert(!IRDef2VPValue.count(Inst) &&
				"Instruction shouldn't have been visited.");

				VPInstruction *NewVPInst;
				if (PHINode *Phi = dyn_cast<PHINode>(Inst)) {
				// Phi node's operands may not have been visited at this point. We create
				// an empty VPInstruction that we will fix once the whole plain CFG has
				// been built.
				NewVPInst = cast<VPInstruction>(VPIRBuilder.createNaryOp(
				Inst->getOpcode(), {} /*No operands*/, Inst));
				PhisToFix.push_back(Phi);
				} else {
				// Translate LLVM-IR operands into VPValue operands and set them in the
				// new VPInstruction.
				SmallVector<VPValue *, 4> VPOperands;
				for (Value *Op : Inst->operands())
				VPOperands.push_back(getOrCreateVPOperand(Op));

				// Build VPInstruction for any arbitrary Instruction without specific
				// representation in VPlan.
				NewVPInst = cast<VPInstruction>(
				VPIRBuilder.createNaryOp(Inst->getOpcode(), VPOperands, Inst));
				}

				IRDef2VPValue[Inst] = NewVPInst;
				}
				}
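The two-phase handling of phi nodes (create them empty, record them, fill in operands once every definition exists) can be sketched on a toy instruction list. All names here (`ToyInst`, `ToyVPInst`, `ToyPlan`, `buildWithPhiFixup`) are hypothetical, and the sketch assumes that non-phi operands are always defined earlier in the list, which is what the RPO requirement guarantees in the patch.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical input instruction: a name, a phi flag and operand names.
struct ToyInst {
  std::string Name;
  bool IsPhi;
  std::vector<std::string> Operands;
};

// Hypothetical output instruction with resolved operand pointers.
struct ToyVPInst {
  std::string Name;
  std::vector<ToyVPInst *> Operands;
};

struct ToyPlan {
  std::vector<std::unique_ptr<ToyVPInst>> Insts;
  std::map<std::string, ToyVPInst *> Def2VP;
};

ToyPlan buildWithPhiFixup(const std::vector<ToyInst> &Program) {
  ToyPlan Plan;
  std::vector<const ToyInst *> PhisToFix;

  // Phase 1: translate in order. Phis are created without operands because
  // some of them (e.g. loop-carried values) are defined later.
  for (const ToyInst &I : Program) {
    auto VPI = std::make_unique<ToyVPInst>();
    VPI->Name = I.Name;
    if (I.IsPhi) {
      PhisToFix.push_back(&I);
    } else {
      for (const std::string &Op : I.Operands)
        VPI->Operands.push_back(Plan.Def2VP.at(Op));
    }
    Plan.Def2VP[I.Name] = VPI.get();
    Plan.Insts.push_back(std::move(VPI));
  }

  // Phase 2: every definition now exists, so phi operands can be resolved.
  for (const ToyInst *Phi : PhisToFix)
    for (const std::string &Op : Phi->Operands)
      Plan.Def2VP.at(Phi->Name)->Operands.push_back(Plan.Def2VP.at(Op));

  return Plan;
}
```

With an induction variable `iv = phi(init, iv.next)` followed by `iv.next = add(iv)`, phase 1 can translate `iv.next` because `iv` already exists, and phase 2 closes the cycle.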

				// Main interface to build the plain CFG.
				VPRegionBlock *PlainCFGBuilder::buildPlainCFG() {
				// 1. Create the Top Region. It will be the parent of all VPBBs.
				TopRegion = new VPRegionBlock("TopRegion", false /*isReplicator*/);

				// 2. Scan the body of the loop in a topological order to visit each basic
				// block after having visited its predecessor basic blocks. Create a VPBB for
				// each BB and link it to its successor and predecessor VPBBs. Note that
				fhahn (Done): nit: space after .
				// predecessors must be set in the same order as they are in the incoming IR.
				// Otherwise, there might be problems with existing phi nodes and algorithms
				// based on predecessor traversal.

				// Loop PH needs to be explicitly visited since it's not taken into account by
				// LoopBlocksDFS.
				BasicBlock *PreheaderBB = TheLoop->getLoopPreheader();
				assert((PreheaderBB->getTerminator()->getNumSuccessors() == 1) &&
				"Unexpected loop preheader");
				VPBasicBlock *PreheaderVPBB = getOrCreateVPBB(PreheaderBB);
				createVPInstructionsForVPBB(PreheaderVPBB, PreheaderBB);
				// Create empty VPBB for Loop H so that we can link PH->H.
				VPBlockBase *HeaderVPBB = getOrCreateVPBB(TheLoop->getHeader());
				// Preheader's predecessors will be set during the loop RPO traversal below.
				PreheaderVPBB->setOneSuccessor(HeaderVPBB);

				LoopBlocksRPO RPO(TheLoop);
				RPO.perform(LI);

				for (BasicBlock *BB : RPO) {
				// Create or retrieve the VPBasicBlock for this BB and create its
				// VPInstructions.
				VPBasicBlock *VPBB = getOrCreateVPBB(BB);
				createVPInstructionsForVPBB(VPBB, BB);

				// Set VPBB successors. We create empty VPBBs for successors if they don't
				// exist already. Recipes will be created when the successor is visited
				// during the RPO traversal.
				TerminatorInst *TI = BB->getTerminator();
				assert(TI && "Terminator expected.");
				unsigned NumSuccs = TI->getNumSuccessors();

				if (NumSuccs == 1) {
				VPBasicBlock *SuccVPBB = getOrCreateVPBB(TI->getSuccessor(0));
				assert(SuccVPBB && "VPBB Successor not found.");
				VPBB->setOneSuccessor(SuccVPBB);
				} else if (NumSuccs == 2) {
				VPBasicBlock *SuccVPBB0 = getOrCreateVPBB(TI->getSuccessor(0));
				assert(SuccVPBB0 && "Successor 0 not found.");
				VPBasicBlock *SuccVPBB1 = getOrCreateVPBB(TI->getSuccessor(1));
				assert(SuccVPBB1 && "Successor 1 not found.");
				VPBB->setTwoSuccessors(SuccVPBB0, SuccVPBB1);
				} else
				llvm_unreachable("Number of successors not supported.");

				// Set VPBB predecessors in the same order as they are in the incoming BB.
				setVPBBPredsFromBB(VPBB, BB);
				}

				// 3. Process outermost loop exit. We created an empty VPBB for the loop
				// single exit BB during the RPO traversal of the loop body but Instructions
				// weren't visited because it's not part of the loop.
				BasicBlock *LoopExitBB = TheLoop->getUniqueExitBlock();
				assert(LoopExitBB && "Loops with multiple exits are not supported.");
				VPBasicBlock *LoopExitVPBB = BB2VPBB[LoopExitBB];
				createVPInstructionsForVPBB(LoopExitVPBB, LoopExitBB);
				// Loop exit was already set as successor of the loop exiting BB.
				// We only set its predecessor VPBB now.
				setVPBBPredsFromBB(LoopExitVPBB, LoopExitBB);

				// 4. The whole CFG has been built at this point so all the input Values must
				// have a VPlan counterpart. Fix VPlan phi nodes by adding their corresponding
				// VPlan operands.
				fixPhiNodes();

				// 5. Final Top Region setup. Set outermost loop pre-header and single exit as
				// Top Region entry and exit.
				TopRegion->setEntry(PreheaderVPBB);
				TopRegion->setExit(LoopExitVPBB);
				return TopRegion;
				}

				// Public interface to build a H-CFG.
				void VPlanHCFGBuilder::buildHierarchicalCFG(VPlan &Plan) {
				// Build Top Region enclosing the plain CFG and set it as VPlan entry.
				PlainCFGBuilder PCFGBuilder(TheLoop, LI, Plan);
				VPRegionBlock *TopRegion = PCFGBuilder.buildPlainCFG();
				Plan.setEntry(TopRegion);
				DEBUG(Plan.setName("HCFGBuilder: Plain CFG\n"); dbgs() << Plan);

				Verifier.verifyHierarchicalCFG(TopRegion);
				}
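The builder relies on a reverse post-order (RPO) walk so that a block is visited after its forward predecessors. A minimal standalone sketch of RPO over an adjacency list follows; `ToyCFG`, `postOrder` and `reversePostOrder` are hypothetical helpers and do not reproduce `LoopBlocksRPO`.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// CFG[B] lists the successor block indices of block B (hypothetical encoding).
using ToyCFG = std::vector<std::vector<int>>;

// Depth-first walk that appends a block once all its unvisited successors
// have been appended (post-order). Back edges hit an already visited block.
static void postOrder(const ToyCFG &CFG, int Block, std::vector<bool> &Visited,
                      std::vector<int> &Order) {
  Visited[Block] = true;
  for (int Succ : CFG[Block])
    if (!Visited[Succ])
      postOrder(CFG, Succ, Visited, Order);
  Order.push_back(Block);
}

// Reverse post-order starting at Entry.
std::vector<int> reversePostOrder(const ToyCFG &CFG, int Entry) {
  std::vector<bool> Visited(CFG.size(), false);
  std::vector<int> Order;
  postOrder(CFG, Entry, Visited, Order);
  std::reverse(Order.begin(), Order.end());
  return Order;
}
```

As a usage example, encode the loop body of the stress test as 0 = outer.body, 1 = inner.ph, 2 = inner.body, 3 = outer.inc, with edges 0→{1,3}, 1→{2}, 2→{3,2}, 3→{0}. Starting from block 0, the walk yields 0, 1, 2, 3, so each block follows its forward predecessors.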

lib/Transforms/Vectorize/VPlanValue.h

	(31 lines not shown)
	 // Forward declarations.
	 class VPUser;

	 // This is the base class of the VPlan Def/Use graph, used for modeling the data
	 // flow into, within and out of the VPlan. VPValues can stand for live-ins
	 // coming from the input IR, instructions which VPlan will generate if executed
	 // and live-outs which the VPlan will need to fix accordingly.
	 class VPValue {
	+  friend class VPBuilder;
	 private:
	   const unsigned char SubclassID; ///< Subclass identifier (for isa/dyn_cast).

	   SmallVector<VPUser *, 1> Users;

	 protected:
	-  VPValue(const unsigned char SC) : SubclassID(SC) {}
	+  // Hold the underlying Value, if any, attached to this VPValue.
	+  Value * UnderlyingVal;
				fhahn (Done): Value *UnderlyingVal?
	+
	+  VPValue(const unsigned char SC, Value *UV = nullptr)
	+      : SubclassID(SC), UnderlyingVal(UV) {}
	+
	+  // DESIGN PRINCIPLE: Access to the underlying IR must be strictly limited to
	+  // the front-end and back-end of VPlan so that the middle-end is as
	+  // independent as possible of the underlying IR. We grant access to the
	+  // underlying IR using friendship. In that way, we should be able to use VPlan
	+  // for multiple underlying IRs (Polly?) by providing a new VPlan front-end,
	+  // back-end and analysis information for the new IR.
	+
	+  /// Return the underlying Value attached to this VPValue.
	+  Value *getUnderlyingValue() { return UnderlyingVal; }
	+
	+  // Set \p Val as the underlying Value of this VPValue.
	+  void setUnderlyingValue(Value *Val) {
	+    assert(!UnderlyingVal && "Underlying Value is already set.");
	+    UnderlyingVal = Val;
	+  }
	+
	 public:
	   /// An enumeration for keeping track of the concrete subclass of VPValue that
	   /// are actually instantiated. Values of this enumeration are kept in the
	   /// SubclassID field of the VPValue objects. They are used for concrete
	   /// type identification.
	   enum { VPValueSC, VPUserSC, VPInstructionSC };

	-  VPValue() : SubclassID(VPValueSC) {}
	+  VPValue(Value *UV = nullptr) : VPValue(VPValueSC, UV) {}
	   VPValue(const VPValue &) = delete;
	   VPValue &operator=(const VPValue &) = delete;

	   /// \return an ID for the concrete type of this object.
	   /// This is used to implement the classof checks. This should not be used
	   /// for any other purpose, as the values may change as LLVM evolves.
	   unsigned getVPValueID() const { return SubclassID; }

	(25 lines not shown)
	 raw_ostream &operator<<(raw_ostream &OS, const VPValue &V);

	 /// This class augments VPValue with operands which provide the inverse def-use
	 /// edges from VPValue's users to their defs.
	 class VPUser : public VPValue {
	 private:
	   SmallVector<VPValue *, 2> Operands;

	-  void addOperand(VPValue *Operand) {
	-    Operands.push_back(Operand);
	-    Operand->addUser(*this);
	-  }
	-
	 protected:
	   VPUser(const unsigned char SC) : VPValue(SC) {}
	   VPUser(const unsigned char SC, ArrayRef<VPValue *> Operands) : VPValue(SC) {
	     for (VPValue *Operand : Operands)
	       addOperand(Operand);
	   }

	 public:
	   VPUser() : VPValue(VPValue::VPUserSC) {}
	   VPUser(ArrayRef<VPValue *> Operands) : VPUser(VPValue::VPUserSC, Operands) {}
	   VPUser(std::initializer_list<VPValue *> Operands)
	       : VPUser(ArrayRef<VPValue *>(Operands)) {}
	   VPUser(const VPUser &) = delete;
	   VPUser &operator=(const VPUser &) = delete;

	   /// Method to support type inquiry through isa, cast, and dyn_cast.
	   static inline bool classof(const VPValue *V) {
	     return V->getVPValueID() >= VPUserSC &&
	            V->getVPValueID() <= VPInstructionSC;
	   }

	+  void addOperand(VPValue *Operand) {
	+    Operands.push_back(Operand);
	+    Operand->addUser(*this);
	+  }
	+
	   unsigned getNumOperands() const { return Operands.size(); }
	   inline VPValue *getOperand(unsigned N) const {
	     assert(N < Operands.size() && "Operand index out of bounds");
	     return Operands[N];
	   }

	   typedef SmallVectorImpl<VPValue *>::iterator operand_iterator;
	   typedef SmallVectorImpl<VPValue *>::const_iterator const_operand_iterator;
	(16 lines not shown)
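The design principle stated in the VPValue comment (only the VPlan front-end and back-end may reach the underlying IR, and that access is granted through friendship) can be illustrated with a small standalone sketch. `ToyIRValue`, `ToyVPValue` and `ToyFrontEnd` are hypothetical names. The accessors are private here for brevity, whereas the patch declares them protected.

```cpp
#include <cassert>

// Hypothetical stand-in for an IR value.
struct ToyIRValue {
  int Id;
};

class ToyVPValue {
  // Only the front-end is allowed to see or set the underlying IR value.
  friend class ToyFrontEnd;

  const ToyIRValue *UnderlyingVal;

  const ToyIRValue *getUnderlyingValue() const { return UnderlyingVal; }

  void setUnderlyingValue(const ToyIRValue *Val) {
    assert(!UnderlyingVal && "Underlying value is already set.");
    UnderlyingVal = Val;
  }

public:
  explicit ToyVPValue(const ToyIRValue *UV = nullptr) : UnderlyingVal(UV) {}
};

// The front-end reaches the private accessors through friendship. Code that
// is not a friend (the "middle-end") cannot call them at all.
class ToyFrontEnd {
public:
  static void attach(ToyVPValue &V, const ToyIRValue *IR) {
    V.setUnderlyingValue(IR);
  }
  static const ToyIRValue *underlying(const ToyVPValue &V) {
    return V.getUnderlyingValue();
  }
};
```

Any attempt to call `getUnderlyingValue()` from a class that is not a friend fails at compile time, which is how the restriction is enforced without a runtime check.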

lib/Transforms/Vectorize/VPlanVerifier.h

This file was added.

				//===-- VPlanVerifier.h ------------------------------------------*- C++ -*-===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// This file declares the class VPlanVerifier, which contains utility functions
				/// to check the consistency of a VPlan. This includes the following kinds of
				/// invariants:
				fhahn (Not Done): Is there a place where those invariants are mentioned? It may be worth briefly stating here what checks are done by the verifier. ATM it looks like it checks the links between the blocks and regions of the VPlan.
				dcaballe (Author, Not Done): The invariants that we are currently checking are described in the documentation of 'verifyHierarchicalCFG' (just below). I think we could move them here so that we can reference them from different utility functions. For example:
					/// This file declares the class VPlanVerifier, which contains utility functions
					/// to check the consistency of a VPlan. This includes the following kinds of
					/// invariants:
					///
					/// 1. Region/Block invariants:
					/// - Region's entry/exit block must have no predecessors/successors,
					/// respectively.
					/// - Block's parent must be the region immediately containing the block.
					/// - Linked blocks must have a bi-directional link (successor/predecessor).
					/// - All predecessors/successors of a block must belong to the same region.
					/// - Blocks must have no duplicated successor/predecessor.
				What do you think?
				fhahn (Not Done): Thanks, sounds good to me.
				///
				/// 1. Region/Block invariants:
				/// - Region's entry/exit block must have no predecessors/successors,
				/// respectively.
				/// - Block's parent must be the region immediately containing the block.
				/// - Linked blocks must have a bi-directional link (successor/predecessor).
				/// - All predecessors/successors of a block must belong to the same region.
				/// - Blocks must have no duplicated successor/predecessor.
				///
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_TRANSFORMS_VECTORIZE_VPLANVERIFIER_H
				#define LLVM_TRANSFORMS_VECTORIZE_VPLANVERIFIER_H

				#include "VPlan.h"

				namespace llvm {

				/// Class with utility functions that can be used to check the consistency and
				/// invariants of a VPlan, including the components of its H-CFG.
				class VPlanVerifier {
				public:
				/// Verify the invariants of the H-CFG starting from \p TopRegion. The
				/// verification process comprises the following steps:
				fhahn (Not Done): I think this TODO does not really add much info.
				dcaballe (Author, Not Done): OK. Let me remove it. I was just trying to justify why the 'verifyHierarchicalCFG' is not static. We will have class members with analysis information that will be used by this interface.
				/// 1. Region/Block verification: Check the Region/Block verification
				/// invariants for every region in the H-CFG.
				void verifyHierarchicalCFG(const VPRegionBlock *TopRegion) const;
				};
				} // namespace llvm

				#endif //LLVM_TRANSFORMS_VECTORIZE_VPLANVERIFIER_H

lib/Transforms/Vectorize/VPlanVerifier.cpp

This file was added.

				//===-- VPlanVerifier.cpp -------------------------------------------------===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				///
				/// \file
				/// This file defines the class VPlanVerifier, which contains utility functions
				/// to check the consistency and invariants of a VPlan.
				///
				//===----------------------------------------------------------------------===//

				#include "VPlanVerifier.h"
				#include "llvm/ADT/DepthFirstIterator.h"

				#define DEBUG_TYPE "vplan-verifier"

				fhahn (Done): I am not entirely sure how this will interact with `-debug-only`. IIUC if we do not use loop-vectorize here, those messages will be excluded from `-debug-only=loop-vectorize`. IMO it is convenient to get the complete picture with `-debug-only=loop-vectorize`.
				dcaballe (Author, Not Done): Ok, it makes sense. I guess when you know exactly what you are looking for, having independent debug types helps you but we definitely lose the complete picture. Let me change it to loop-vectorize. Thanks!
				using namespace llvm;

				static cl::opt<bool> EnableHCFGVerifier("vplan-verify-hcfg", cl::init(false),
				cl::Hidden,
				cl::desc("Verify VPlan H-CFG."));

				/// Utility function that checks whether \p VPBlockVec has duplicate
				/// VPBlockBases.
				static bool hasDuplicates(const SmallVectorImpl<VPBlockBase *> &VPBlockVec) {
				SmallDenseSet<const VPBlockBase *, 8> VPBlockSet;
				for (const auto *Block : VPBlockVec) {
				if (VPBlockSet.count(Block))
				return true;
				VPBlockSet.insert(Block);
				}
				return false;
				}

				/// Helper function that verifies the CFG invariants of the VPBlockBases within
				/// \p Region. Checks in this function are generic for VPBlockBases. They are
				/// not specific for VPBasicBlocks or VPRegionBlocks.
				static void verifyBlocksInRegion(const VPRegionBlock *Region) {
				for (const VPBlockBase *VPB :
				make_range(df_iterator<const VPBlockBase *>::begin(Region->getEntry()),
				df_iterator<const VPBlockBase *>::end(Region->getExit()))) {
				fhahn (Not Done): Iterating over the blocks in a region seems a generic thing and it would probably be worth adding it to VPRegionBlock. At least VPRegionBlock::dumpRegion seems to be using a similar logic.
				dcaballe (Author, Not Done): Good point. However, I don't think this operation is that generic. Sometimes we will need RPO, some others DFS, some others RPO or DFS but filtering some blocks... But I agree, at least we could start with a blocksDFS() range. Since this spans beyond this patch, could we address it in a separate patch?
				// Check block's parent.
				assert(VPB->getParent() == Region && "VPBlockBase has wrong parent");

				// Check block's successors.
				const auto &Successors = VPB->getSuccessors();
				// There must be only one instance of a successor in block's successor list.
				// TODO: This won't work for switch statements.
				assert(!hasDuplicates(Successors) &&
				"Multiple instances of the same successor.");

				for (const VPBlockBase *Succ : Successors) {
				// There must be a bi-directional link between block and successor.
				const auto &SuccPreds = Succ->getPredecessors();
				assert(std::find(SuccPreds.begin(), SuccPreds.end(), VPB) !=
				SuccPreds.end() &&
				"Missing predecessor link.");
				(void)SuccPreds;
				}

				// Check block's predecessors.
				const auto &Predecessors = VPB->getPredecessors();
				// There must be only one instance of a predecessor in block's predecessor
				// list.
				// TODO: This won't work for switch statements.
				assert(!hasDuplicates(Predecessors) &&
				"Multiple instances of the same predecessor.");

				for (const VPBlockBase *Pred : Predecessors) {
				// Block and predecessor must be inside the same region.
				assert(Pred->getParent() == VPB->getParent() &&
				"Predecessor is not in the same region.");

				// There must be a bi-directional link between block and predecessor.
				const auto &PredSuccs = Pred->getSuccessors();
				assert(std::find(PredSuccs.begin(), PredSuccs.end(), VPB) !=
				PredSuccs.end() &&
				"Missing successor link.");
				(void)PredSuccs;
				}
				}
				}

				/// Verify the CFG invariants of VPRegionBlock \p Region and its nested
				/// VPBlockBases. Do not recurse inside nested VPRegionBlocks.
				static void verifyRegion(const VPRegionBlock *Region) {
				const VPBlockBase *Entry = Region->getEntry();
				const VPBlockBase *Exit = Region->getExit();
				fhahn (Done): I think some compilers will complain about Entry and Exit being unused when building without assertions
				dcaballe (Author, Not Done): Good catch! I showed some warning with other similar cases but not here. Thanks!

				// Entry and Exit shouldn't have any predecessor/successor, respectively.
				assert(!Entry->getNumPredecessors() && "Region entry has predecessors.");
				assert(!Exit->getNumSuccessors() && "Region exit has successors.");
				(void)Entry;
				(void)Exit;

				verifyBlocksInRegion(Region);
				}

				/// Verify the CFG invariants of VPRegionBlock \p Region and its nested
				/// VPBlockBases. Recurse inside nested VPRegionBlocks.
				static void verifyRegionRec(const VPRegionBlock *Region) {
				verifyRegion(Region);

				// Recurse inside nested regions.
				for (const VPBlockBase *VPB :
				make_range(df_iterator<const VPBlockBase *>::begin(Region->getEntry()),
				df_iterator<const VPBlockBase *>::end(Region->getExit()))) {
				if (const auto *SubRegion = dyn_cast<VPRegionBlock>(VPB))
				verifyRegionRec(SubRegion);
				}
				}

				void VPlanVerifier::verifyHierarchicalCFG(
				const VPRegionBlock *TopRegion) const {
				fhahn (Done): no brackets needed
				if (!EnableHCFGVerifier)
				return;

				DEBUG(dbgs() << "Verifying VPlan H-CFG.\n");
				assert(!TopRegion->getParent() && "VPlan Top Region should have no parent.");
				verifyRegionRec(TopRegion);
				}
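The block-level invariants checked above (no duplicated successor or predecessor, and every link being bi-directional) can be sketched as a standalone predicate. `ToyBlock` and `linksAreConsistent` are hypothetical. Unlike the patch, which asserts, this sketch returns a `bool` so it can be exercised in any build mode, and it leaves out the parent/region checks.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Hypothetical block with explicit successor and predecessor lists.
struct ToyBlock {
  std::vector<ToyBlock *> Succs;
  std::vector<ToyBlock *> Preds;
};

// True if the same block appears more than once in Blocks.
static bool hasDuplicates(const std::vector<ToyBlock *> &Blocks) {
  std::set<const ToyBlock *> Seen;
  for (const ToyBlock *B : Blocks)
    if (!Seen.insert(B).second)
      return true;
  return false;
}

static bool contains(const std::vector<ToyBlock *> &Blocks, const ToyBlock *B) {
  return std::find(Blocks.begin(), Blocks.end(), B) != Blocks.end();
}

// Check, for every block, that its lists have no duplicates and that each
// successor/predecessor link is mirrored on the other side.
bool linksAreConsistent(const std::vector<ToyBlock *> &Blocks) {
  for (const ToyBlock *B : Blocks) {
    if (hasDuplicates(B->Succs) || hasDuplicates(B->Preds))
      return false;
    for (const ToyBlock *Succ : B->Succs)
      if (!contains(Succ->Preds, B))
        return false;
    for (const ToyBlock *Pred : B->Preds)
      if (!contains(Pred->Succs, B))
        return false;
  }
  return true;
}
```

A one-directional edge (successor set but predecessor missing) is rejected, as is a block that lists the same successor twice. The latter is the case the TODO about switch statements refers to.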

test/Transforms/LoopVectorize/vplan_hcfg_stress_test.ll

This file was added.

				; RUN: opt < %s -loop-vectorize -enable-vplan-native-path -vplan-build-stress-test -vplan-verify-hcfg -debug-only=vplan-verifier -disable-output 2>&1 | FileCheck %s -check-prefix=VERIFIER
				; RUN: opt < %s -loop-vectorize -enable-vplan-native-path -vplan-build-stress-test -debug-only=vplan-verifier -disable-output 2>&1 | FileCheck %s -check-prefix=NO-VERIFIER -allow-empty
				fhahn (Done): nit: we are not checking the generated code, so I think we could drop `-S` here and below.
				dcaballe (Author, Not Done): Right, thanks!
				; REQUIRES: asserts

				; Verify that the stress testing flag for the VPlan H-CFG builder works as
				; expected with and without enabling the VPlan H-CFG Verifier.

				; VERIFIER: Verifying VPlan H-CFG.
				; NO-VERIFIER-NOT: Verifying VPlan H-CFG.

				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				define void @foo(i32* nocapture %a, i32* nocapture readonly %b, i32 %N, i32 %M) local_unnamed_addr #0 {
				entry:
				%cmp32 = icmp sgt i32 %N, 0
				br i1 %cmp32, label %outer.ph, label %for.end15

				outer.ph:
				%cmp230 = icmp sgt i32 %M, 0
				%0 = sext i32 %M to i64
				%wide.trip.count = zext i32 %M to i64
				%wide.trip.count38 = zext i32 %N to i64
				br label %outer.body

				outer.body:
				%indvars.iv35 = phi i64 [ 0, %outer.ph ], [ %indvars.iv.next36, %outer.inc ]
				br i1 %cmp230, label %inner.ph, label %outer.inc

				inner.ph:
				%1 = mul nsw i64 %indvars.iv35, %0
				br label %inner.body

				inner.body:
				%indvars.iv = phi i64 [ 0, %inner.ph ], [ %indvars.iv.next, %inner.body ]
				%2 = add nsw i64 %indvars.iv, %1
				%arrayidx = getelementptr inbounds i32, i32* %b, i64 %2
				%3 = load i32, i32* %arrayidx, align 4
				%arrayidx12 = getelementptr inbounds i32, i32* %a, i64 %2
				store i32 %3, i32* %arrayidx12, align 4
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %wide.trip.count
				br i1 %exitcond, label %outer.inc, label %inner.body

				outer.inc:
				%indvars.iv.next36 = add nuw nsw i64 %indvars.iv35, 1
				%exitcond39 = icmp eq i64 %indvars.iv.next36, %wide.trip.count38
				br i1 %exitcond39, label %for.end15, label %outer.body

				for.end15:
				ret void
				}

				attributes #0 = { norecurse nounwind uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
				fhahn (Done): nit: those attributes are unnecessary I think
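For orientation, the IR in this test looks like what a simple two-level copy loop nest would lower to. The function below is a reconstruction inferred from the IR (index `i * M + j`, guards `N > 0` and `M > 0`), not source taken from the patch, and the name `copyNest` is hypothetical.

```cpp
#include <cassert>

// Plausible source-level shape of @foo: copy an N x M row-major array.
// The index is computed in 64 bits, mirroring the sext/mul/add in the IR.
void copyNest(int *a, const int *b, int N, int M) {
  assert((N <= 0 || M <= 0 || (a && b)) && "Expected valid arrays.");
  for (int i = 0; i < N; ++i) {   // outer.body / outer.inc
    for (int j = 0; j < M; ++j) { // inner.body
      const long long Idx = static_cast<long long>(i) * M + j;
      a[Idx] = b[Idx];
    }
  }
}
```

The outer loop is the one the VPlan-native path targets. The inner loop stays as a nested loop inside the plain CFG built by this patch.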
