This is an archive of the discontinued LLVM Phabricator instance.

[LV][VPlan] Build plain CFG with simple VPInstructions for outer loops.
ClosedPublic

Authored by dcaballe on Mar 9 2018, 4:42 PM.

Details

Summary

Context

This is the patch #3 of the Patch Series #1 to introduce outer loop vectorization support in LV using the VPlan infrastructure.

Patch Series #1. Sub-patch #3.

This patch is expected to be NFC for the current inner loop vectorization path. It introduces the basic algorithm to build the VPlan plain CFG (single-level CFG, no hierarchical CFG (H-CFG), yet) in the VPlan-native vectorization path. It includes:

  • VPlanHCFGBuilder: Main class to build the VPlan H-CFG (plain CFG without nested regions, for now).
  • VPlanVerifier: Main class with utilities to check the consistency of a H-CFG.
  • VPlanBlockUtils: Main class with utilities to manipulate VPBlockBases in VPlan.

The VPlan H-CFG is the basic infrastructure on which future VPlan-to-VPlan transformations will be implemented. VPInstruction will be the main instruction-level representation in the VPlan-native vectorization path. In this patch, we use VPInstruction to represent instructions within a VPBasicBlock, and VPValues to represent other entities that are not a VPInstruction but currently don't have a specific representation in VPlan (e.g., constants, definitions that are external to the VPlan CFG, etc.). This representation will be refined in the future by introducing more VPValue subclasses.

Testing

We don't have code generation capabilities to generate vector code in the VPlan-native path yet. Therefore, we cannot introduce LIT tests to check the correctness of the vector code generated. However, we introduce the flag -vplan-build-stress-test, which enables a stress testing mode that builds the VPlan H-CFG for any supported loop nest from the outermost loop. We also introduce the flag -vplan-verify-hcfg, which checks the consistency of a VPlan H-CFG. Both flags can be used together to thoroughly check the stability of the H-CFG construction algorithm and the consistency of the H-CFGs built. We used this approach on a wide variety of benchmarks suites. We plan to introduce LIT tests once code generation for the VPlan-native path is in place.

Files in 'Vectorize' Dir:

In this patch, we introduce new files for the VPlanHCFGBuilder class (VPlanHCFGBuilder.h/.cpp) and the VPlanVerifier class (VPlanVerifier.h/.cpp). Please, note that header files are private.

We would appreciate feedback on this regard. We understand that having too many small files is not a good idea, but the opposite extreme is not good either. We expect VPlanHCFGBuilder to grow when we introduce simplification/transformations necessary to build a H-CFG and support a wider range of outer loops (see RFC). VPlanVerifier will grow when we introduce verification utilities for regions and VPInstructions. Having separate files also allows having independent debug filters (DEBUG_TYPE) per individual component, which we think it's very convenient.

Thanks,
Diego

Diff Detail

Event Timeline

dcaballe created this revision.Mar 9 2018, 4:42 PM
dcaballe added inline comments.Mar 9 2018, 5:10 PM
lib/Transforms/Vectorize/VPlan.h
457–458

This is the rationale behind the changes below: I wouldn't like VPBlockBase to end up with a large list of interfaces to set/insert successors/predecessors, all of them doing almost the same with very subtle differences. For that reason, I introduced the class VPBlockUtility to keep there all the VPBlockBase manipulation interfaces and keep VPBlockBase class cleaner. Of course, that doesn't mean that we want to add unnecessary utilities to VPBlockUtility class.

In VPBlockBase class, I decided to just keep the very very basic interfaces, which can be used directly or can be a building block of a more complex utility in VPBlockUtility. In this regard, I'm simplifying the logic of setOneSuccessor and setTwoSuccessors and replacing their existing calls with a more generic utility function in VPBlockUtility. The previous implementation was very ad-hoc for the patch where they were introduced and I couldn't reuse them as is.

Of course, I'm open to other suggestions.

a.elovikov added inline comments.Mar 16 2018, 5:45 AM
lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
101

For outer loop vectorization in

int s = 0;
for (int i = 0; i < N; ++i) {
  for (int j = 0; j < M; ++j) {
    s += x[i] * y[j];
  }
}

We need a broadcast y[j] -> {y[j], y[j], y[j], y[j]} but this will generate a WIDEN recipe for the load. Is that OK? If so, can we document it somewhere?

hsaito added a subscriber: hsaito.Mar 16 2018, 1:01 PM
hsaito added inline comments.
lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
101

Reference: LoopVectorizationPlanner::tryToWidenMemory().

VPWidenMemoryRecipe can handle CM_GatherScatter and uniform can be thought of as a special form of gather/scatter. From that perspective, it is okay.

A vector load/store is deemed gather/scatter until analysis improves it to a better access type. From that perspective, using "generic gather/scatter" during the initial VPlan construction phase makes perfect sense.

If we are building a single VPlan CFG for inner and/or outer loop vectorization (and that's something we should be doing if HCFG look identical), we can't encode "memory access kind" information within HCFG. So, keeping it in "generic gather/scatter" at HCFG level is the right thing to do for the long term also.

In other words, we need a storage outside of HCFG to house "uniform/unit-stride/interleave/..." information for the load/store.

fhahn added inline comments.Mar 19 2018, 3:17 AM
lib/Transforms/Vectorize/VPlanVerifier.h
13

Is there a place where those invariants are mentioned? It may be worth briefly stating here what checks are done by the verifier. ATM it looks like it checks the links between the blocks and regions of the VPlan.

I agree with Hideki. In addition, please, note that in this patch we are trying to use the minimal number of recipes to make the transition to VPInstructions easier. Modeling a broadcast as a gather with VPWidenMemoryRecipe is a good trade-off in this regard, which will be improved in the future.

dcaballe added inline comments.Mar 19 2018, 9:51 AM
lib/Transforms/Vectorize/VPlanVerifier.h
13

The invariants that we are currently checking are described in the documentation of 'verifyHierarchicalCFG' (just below). I think we could move them here so that we can reference them from different utility functions. For example:

/// This file declares the class VPlanVerifier, which contains utility functions
/// to check the consistency of a VPlan. This includes the following kinds of
/// invariants:
///
/// 1. Region/Block invariants:
///   - Region's entry/exit block must have no predecessors/successors,
///     respectively.
///   - Block's parent must be the region immediately containing the block.
///   - Linked blocks must have a bi-directional link (successor/predecessor).
///   - All predecessors/successors of a block must belong to the same region.
///   - Blocks must have no duplicated successor/predecessor.

What do you think?

fhahn added inline comments.Mar 19 2018, 10:52 AM
lib/Transforms/Vectorize/VPlanVerifier.h
13

Thanks, sounds good to me.

fhahn added inline comments.Mar 19 2018, 10:52 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
7725

Could we here just return NoVectorization and get rid of the additional check in LoopVectorizePass::processLoop? Also, could we move the setting of UserVF in plan too? Otherwise it seems harder to keep track of what's going on and we set UserVF even for inner loops.

Also, I think ideally we would only bail out if the outer loop is not supported, but achieving that seems more trouble than it's worth.

dcaballe added inline comments.Mar 20 2018, 8:48 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
7725

Could we here just return NoVectorization and get rid of the additional check in LoopVectorizePass::processLoop?

Ok. It sounds good to me, at least for now. Hopefully we won't introduce anything after plan that we must skip in stress testing mode. Thanks!

Also, could we move the setting of UserVF in plan too? Otherwise it seems harder to keep track of what's going on and we set UserVF even for inner loops.

Could you please elaborate a bit more? I'm not sure I understand what you mean.

Also, I think ideally we would only bail out if the outer loop is not supported, but achieving that seems more trouble than it's worth.

We can think about it in the future. We would need legality analysis for outer loops if we want to generate code.

fhahn added inline comments.Mar 20 2018, 11:01 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
7725

Ok. It sounds good to me, at least for now. Hopefully we won't introduce anything after plan that we must skip in stress testing mode. Thanks!

If we return NoVectorization, I would assume we would not use the generated plans after plan, as we decided to skip vectorization?

Could you please elaborate a bit more? I'm not sure I understand what you mean.

I meant moving the code to set UserVF = 4 if VPlanBuildStressTest into this function. That would reduce the number of functions where we have to handle VPlanBuildStressTest, which IMO makes it easier to see what's going on.

dcaballe added inline comments.Mar 20 2018, 11:43 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
7725

If we return NoVectorization, I would assume we would not use the generated plans after plan, as we decided to skip vectorization?

Yes, it was just a thought. For example, stress testing will be skipping some checks (only isExplicitVecOuterLoop, for now). If we have code after plan expecting a loop compliant with those checks, we could have problems. But, again, I agree on the change. We can deal with that if it happens.

I meant moving the code to set UserVF = 4 if VPlanBuildStressTest into this function. That would reduce the number of functions where we have to handle VPlanBuildStressTest, which IMO makes it easier to see what's going on.

Thanks, got it! I wonder if this would be problematic. If we moved this change into this function and we used UserVF after plan (likely to happen, look at uses of line 8794), we would be using inconsistent UserVF values. Maybe it's better to keep it here?

fhahn added inline comments.Mar 20 2018, 3:59 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
7725

But as it is currently, the UserVF set in the outer loop code path limited to that block. And I cannot think of any case where using UserVF set for VPlanBuildStressTest will useful anywhere else for now, especially the inner loop code path. My understanding was that we have to bail after planning anyways for VPlanBuildStressTest. At least as everything is now.

dcaballe added inline comments.Mar 21 2018, 11:51 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
7725

Ok, fair enough. I'll move it into the function, at least, for now.
Thanks!

dcaballe updated this revision to Diff 139339.Mar 21 2018, 12:01 PM

Addressing Florian's comments.
Thanks!

fhahn added inline comments.Mar 27 2018, 9:42 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
275

Please mention after what stage we bail out

2369–2370

Please update the comment to mention VPlan stress testing.

lib/Transforms/Vectorize/VPlan.h
974

nit: I think we should match the indent with VPBlockBase?

lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
28

Do we need SE here? I could not find any uses.

lib/Transforms/Vectorize/VPlanHCFGBuilder.h
31

Could be moved to .cpp?

35

Do we need SE here? I could not find any uses.

lib/Transforms/Vectorize/VPlanVerifier.cpp
45

Iterating over the blocks in a region seems a generic thing and it would probably be worth adding it to VPRegionBlock. At least VPRegionBlock::dumpRegion seems to be using a similar logic.

92

I think some compilers will complain about Entry and Exit being unused when building without assertions

118

no brackets needed

lib/Transforms/Vectorize/VPlanVerifier.h
37

I think this TODO does not really add much info.

test/Transforms/LoopVectorize/vplan_hcfg_stress_test.ll
2

nit: we are not checking the generated code, so I think we could drop -S here and below.

dcaballe updated this revision to Diff 140081.Mar 28 2018, 8:44 AM
dcaballe marked 8 inline comments as done.

Addressing Florian's comments.

lib/Transforms/Vectorize/VPlan.h
974

Sorry, thanks!

lib/Transforms/Vectorize/VPlanHCFGBuilder.h
31

I can remove this for now.

lib/Transforms/Vectorize/VPlanVerifier.cpp
45

Good point. However, I don't think this operation is that generic. Sometimes we will need RPO, some others DFS, some others RPO or DFS but filtering some blocks... But I agree, at least we could start with a blocksDFS() range. Since this spans beyond this patch, could we address it in a separate patch?

92

Good catch! I showed some warning with other similar cases but not here. Thanks!

lib/Transforms/Vectorize/VPlanVerifier.h
37

OK. Let me remove it. I was just trying to justify why the 'verifyHierarchicalCFG' is not static. We will have class members with analysis information that will be used by this interface.

test/Transforms/LoopVectorize/vplan_hcfg_stress_test.ll
2

Right, thanks!

rengolin added inline comments.Apr 3 2018, 12:58 PM
lib/Transforms/Vectorize/VPlan.h
457–458

I like this idea and I like the methods, as they give a better impression as to what is truly happening behind the scenes.

I agree we shouldn't bloat this class, but it's a good buffer class to have during the prototype phase of the outer loop implementation, so we can gauge the BPBasicBlock usability.

dcaballe added inline comments.Apr 5 2018, 11:14 AM
lib/Transforms/Vectorize/VPlan.h
457–458

Thanks, Renato!

I agree we shouldn't bloat this class, but it's a good buffer class to have during the prototype phase of the outer loop implementation, so we can gauge the BPBasicBlock usability.

Agreed. Any particular suggestion on moving some of the current utilities in VPBlockUtils back to VPBlockBase? I think the current ones are OK there but let me know if you have any comments in that regard.

egarcia added a subscriber: egarcia.Apr 6 2018, 9:25 AM
rengolin added inline comments.Apr 9 2018, 7:31 AM
lib/Transforms/Vectorize/VPlan.h
457–458

I think we can start with these, and move up and down as needed later.

dcaballe planned changes to this revision.Apr 18 2018, 3:10 PM

After discussion with reviewers, working on replacing initial simple recipes with VPInstructions to avoid unnecessary steps towards the final instruction level representation.

dcaballe updated this revision to Diff 145728.May 8 2018, 11:07 AM
dcaballe retitled this revision from [LV][VPlan] Build plain CFG with simple recipes for outer loops. to [LV][VPlan] Build plain CFG with simple VPInstructions for outer loops..
dcaballe edited the summary of this revision. (Show Details)

Sorry for the delay!

In the new diff I'm replacing recipes with VPInstructions. I tried to keep this patch small and functional. I'm using VPInstruction class to represent instructions within a VPBasicBlock, and VPValue class to represent other entities that are not a VPInstruction but currently don't have a specific representation in VPlan (e.g., constants, definitions that are external to the VPlan CFG, etc.). The representation will be refined in subsequent patches by introducing more VPValue subclasses.

fhahn added a subscriber: aprantl.

Thanks Diego! I left some small comments inline, I think overall it looks quite good now. One thing worth discussing briefly before this goes in may be what the plan for dealing with debug info will be with VPlan. Adding @aprantl in case he has some thoughts.

lib/Transforms/Vectorize/LoopVectorizationPlanner.h
73

There's been a few commits removing \brief from the codebase (e.g. D46290) recently, could you remove \brief from this patch, it should not be needed as we use AUTOBRIEF.

150

What does this comment mean?

lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
134

Could we use a similar (simpler) logic to what @hsaito used in D46302 here? Like Instr->getParent() strictly dominates the pre header?

164

nit: IRVal as it is a value?

dcaballe marked 2 inline comments as done.May 9 2018, 9:47 PM

Thanks, Florian! Some comments below.

One thing worth discussing briefly before this goes in may be what the plan for dealing with debug info will be with VPlan. Adding @aprantl in case he has some thoughts.

I'm not aware of any particular proposal for debug info in VPlan at this point but I will check with my team. Currently, DbgInfoIntrinsic would be represented as a regular VPInstruction. We could think about if a specific representation for this is necessary in VPlan.

lib/Transforms/Vectorize/LoopVectorizationPlanner.h
73

Thanks! I had no idea. I don't usually use \brief but most of this documentation is coming from IRBuilder.

150

AssertingVH is used in IRBuilder but we are not using it here. However, we may want to think about using something similar but I haven't look at it at all. We could do it as a separate patch if we are interested. I don't think we need this TODO. I'll remove it.

lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
134

Good point! I think the scenario is a bit different and there would be some corner cases that wouldn't work if we do DT->properlyDominates(Instr->getParent(), PH). For example, any definition in the loop exit with a use within the HCFG wouldn't work. Imagine something like:

ph.inner:
   %0 = phi %1, %t

loop.body:
   ...

loop.exit:
   %t =
   %a = ...
         = %a ...

If I'm not mistaken, uses of '%t' and %a would be classified as external definitions and they are not.

Make sense? Any other idea?

dcaballe updated this revision to Diff 146061.May 9 2018, 9:48 PM
dcaballe marked an inline comment as done.

Addressing Florian's comments.

Thanks!

dcaballe updated this revision to Diff 146067.May 9 2018, 10:07 PM

Sorry. Now, addressing Florian's comments.

fhahn added a comment.May 9 2018, 10:31 PM

Thanks, Florian! Some comments below.

One thing worth discussing briefly before this goes in may be what the plan for dealing with debug info will be with VPlan. Adding @aprantl in case he has some thoughts.

I'm not aware of any particular proposal for debug info in VPlan at this point but I will check with my team. Currently, DbgInfoIntrinsic would be represented as a regular VPInstruction. We could think about if a specific representation for this is necessary in VPlan.

Great thanks. Besides the DbgInfoIntrinsics, do we need some way to attach the debug metadata from the original instructions to the VPInstructions? I suppose initially we could get them from the underlying values, but IIUC some VPlan transformations could introduce new VPInstructions without underlying values.

Thanks, Florian! Some comments below.

One thing worth discussing briefly before this goes in may be what the plan for dealing with debug info will be with VPlan. Adding @aprantl in case he has some thoughts.

I'm not aware of any particular proposal for debug info in VPlan at this point but I will check with my team. Currently, DbgInfoIntrinsic would be represented as a regular VPInstruction. We could think about if a specific representation for this is necessary in VPlan.

Can you outline what would make updating dbg.value intrinsics to point to vector instructions special, such that it can't be handled immediately?

Great thanks. Besides the DbgInfoIntrinsics, do we need some way to attach the debug metadata from the original instructions to the VPInstructions? I suppose initially we could get them from the underlying values, but IIUC some VPlan transformations could introduce new VPInstructions without underlying values.

Similarly, when you expand/rewrite an instruction with a DILocation metadata attachment into a new instruction, preserving the metadata is crucial for accurate crash logs, profiling, and debugging in general. Speaking from personal experience here, it is usually much easier to think about this in the beginning rather than having to bolt it on later when the original transformation pass authors have moved on :-)

Can you outline what would make updating dbg.value intrinsics to point to vector instructions special, such that it can't be handled immediately?

I keep pushing the implementers (Diego, Satish, etc.) very hard to maintain good correspondence between input IR and output IR. Assuming that dbg.value can handle widened values from scalar values, there shouldn't be anything special.

Great thanks. Besides the DbgInfoIntrinsics, do we need some way to attach the debug metadata from the original instructions to the VPInstructions? I suppose initially we could get them from the underlying values, but IIUC some VPlan transformations could introduce new VPInstructions without underlying values.

Similarly, when you expand/rewrite an instruction with a DILocation metadata attachment into a new instruction, preserving the metadata is crucial for accurate crash logs, profiling, and debugging in general. Speaking from personal experience here, it is usually much easier to think about this in the beginning rather than having to bolt it on later when the original transformation pass authors have moved on :-)

Aside from widening, what vectorizer does is not much different from expression tree rewriting (in LLVM, 3-address code equivalent of that). We need to educate ourselves about what scalar optimizers are doing for those circumstances and do the same. What could be tricky is handling of interleaved memory access optimization, e.g., where multiple strided vector memrefs are converted into unit-stride vector memrefs and shuffles. Even then, I'd hope there are some existing memory access optimizations doing something comparable enough and learn from it.

Can you outline what would make updating dbg.value intrinsics to point to vector instructions special, such that it can't be handled immediately?

I keep pushing the implementers (Diego, Satish, etc.) very hard to maintain good correspondence between input IR and output IR. Assuming that dbg.value can handle widened values from scalar values, there shouldn't be anything special.

Assuming widening means putting a smaller value into a larger register at offset 0, then it is safe to just point the dbg.value to the new larger register. If the offset is nonzero, you'll need to generate a DW_OP_shr DIExpression to shift the value into place in the debugger.

Great thanks. Besides the DbgInfoIntrinsics, do we need some way to attach the debug metadata from the original instructions to the VPInstructions? I suppose initially we could get them from the underlying values, but IIUC some VPlan transformations could introduce new VPInstructions without underlying values.

Similarly, when you expand/rewrite an instruction with a DILocation metadata attachment into a new instruction, preserving the metadata is crucial for accurate crash logs, profiling, and debugging in general. Speaking from personal experience here, it is usually much easier to think about this in the beginning rather than having to bolt it on later when the original transformation pass authors have moved on :-)

Aside from widening, what vectorizer does is not much different from expression tree rewriting (in LLVM, 3-address code equivalent of that). We need to educate ourselves about what scalar optimizers are doing for those circumstances and do the same. What could be tricky is handling of interleaved memory access optimization, e.g., where multiple strided vector memrefs are converted into unit-stride vector memrefs and shuffles. Even then, I'd hope there are some existing memory access optimizations doing something comparable enough and learn from it.

Great! Let me know if you come across any concrete questions.

hsaito added a comment.EditedMay 11 2018, 1:08 PM

Can you outline what would make updating dbg.value intrinsics to point to vector instructions special, such that it can't be handled immediately?

I keep pushing the implementers (Diego, Satish, etc.) very hard to maintain good correspondence between input IR and output IR. Assuming that dbg.value can handle widened values from scalar values, there shouldn't be anything special.

Assuming widening means putting a smaller value into a larger register at offset 0, then it is safe to just point the dbg.value to the new larger register. If the offset is nonzero, you'll need to generate a DW_OP_shr DIExpression to shift the value into place in the debugger.

Vectorized loop by nature executes multiple iterations of the sequential loop at the same time. So, the same variable X (say, i32 type) is widened to widenedX (4 x i32 type) to represent 4 different values of X, say, for iterations i, i+1, i+2, and i+3 executing together. In terms of debugging vectorized code, when the programmer points to X during vector execution, debugger needs to show 4 different values of X in this scenario. It's different from placing one value in a larger sized register.

Having said that, this issue exists from day 1 of vectorization. Nothing new to VPlan based vectorization. If there is a solution, we should use it. Else, we just start from letting debugger show the lowest element only, and try improving from there.

Vectorized loop by nature executes multiple iterations of the sequential loop at the same time. So, the same variable X (say, i32 type) is widened to widenedX (4 x i32 type) to represent 4 different values of X, say, for iterations i, i+1, i+2, and i+3 executing together. In terms of debugging vectorized code, when the programmer points to X during vector execution, debugger needs to show 4 different values of X in this scenario. It's different from placing one value in a larger sized register.

I see. This is not directly representable in debug info. You could either introduce a new artificial $X_vectorized variable that shows the entire vector, or you could represent only the first element in the vector and hide the other unrolled iterations. DWARF cannot currently represent more than one value per variable per pc address.

hsaito added a comment.EditedMay 11 2018, 2:06 PM

Vectorized loop by nature executes multiple iterations of the sequential loop at the same time. So, the same variable X (say, i32 type) is widened to widenedX (4 x i32 type) to represent 4 different values of X, say, for iterations i, i+1, i+2, and i+3 executing together. In terms of debugging vectorized code, when the programmer points to X during vector execution, debugger needs to show 4 different values of X in this scenario. It's different from placing one value in a larger sized register.

I see. This is not directly representable in debug info. You could either introduce a new artificial $X_vectorized variable that shows the entire vector, or you could represent only the first element in the vector and hide the other unrolled iterations. DWARF cannot currently represent more than one value per variable per pc address.

That matches my understanding. My preference is artificial $X_vectorized approach, but I'm also fine using "only the first element" as a stepping stone to eventually get to $X_vectorized approach.

It goes without saying ---- but if we can work together to extend DWARF to represent multiple values, that would be the best long term outcome.

fhahn added a comment.May 14 2018, 7:43 AM

Together with D46827, I get a lot of leaks with the address sanitizer from plans constructed by VPlanHCFGBuilder. I'll take a look and come back once I know more.

Thanks a lot for the feedback regarding debug info! This is an interesting topic.

Speaking from personal experience here, it is usually much easier to think about this in the beginning rather than having to bolt it on later when the original transformation pass authors have moved on :-)

We need to educate ourselves about what scalar optimizers are doing for those circumstances and do the same.

Agreed. We have to start thinking about it but we need to have a better understanding of what is needed and where we want to go. In this regard, I would have the following comment/questions:

  1. Currently, a DbgInfoIntrinsic is represented as a standard VPInstruction in VPlan where its metadata arguments are external definitions. We'd need to decide if: a) the current representation for debug intrinsics in VPlan is enough or we want a more native representation, such as specific VPInstruction opcodes for them; b) we need a more proper representation for metadata, particularly if we have to create *new* metadata for new VPInstructions, as Florian mentioned. However, the answer will depend on #2.
  2. Do we really have to model accurate debug information on the VPlan representation? Debug information won't impact the vectorization decisions so maybe we shouldn't pay the cost of modeling it for all the VPlan candidates. I wonder if it would be possible to generate the expected output debug information IR "on the fly" during VPlan code generation, once the best VPlan has been chosen. WDYT?

I see. This is not directly representable in debug info. You could either introduce a new artificial $X_vectorized variable that shows the entire vector, or you could represent only the first element in the vector and hide the other unrolled iterations. DWARF cannot currently represent more than one value per variable per pc address.

What is existing inner loop vectorizer and SLP vectorizer doing in this regard? That could be our starting point.

In any case, maybe we should move this discussion outside of this review to llvm-dev so that we get feedback from a broader audience.

Thanks,
Diego

Together with D46827, I get a lot of leaks with the address sanitizer from plans constructed by VPlanHCFGBuilder. I'll take a look and come back once I know more.

Thanks, Florian. Please, let me know what you find. I'm running some experiments with this patch alone but I haven't found anything so far.

fhahn added a comment.May 14 2018, 9:57 AM

Together with D46827, I get a lot of leaks with the address sanitizer from plans constructed by VPlanHCFGBuilder. I'll take a look and come back once I know more.

Thanks, Florian. Please, let me know what you find. I'm running some experiments with this patch alone but I haven't found anything so far.

Looks like the VPValues for external definitions were not deleted properly. With D46833, the sanitizers are happy.

Looks like the VPValues for external definitions were not deleted properly. With D46833, the sanitizers are happy.

Oh, sorry. I missed that when stripping out the code for this patch. Do you want me to include it in this patch?

Looks like the VPValues for external definitions were not deleted properly. With D46833, the sanitizers are happy.

Oh, sorry. I missed that when stripping out the code for this patch. Do you want me to include it in this patch?

Yep I think it would be good to include this in the patch.

dcaballe updated this revision to Diff 146745.May 14 2018, 10:24 PM

Added missing delete for external definitions.

fhahn accepted this revision.May 15 2018, 8:04 AM

Thanks Diego, LGTM. I've left a few minor nits and I am not entirely sure about the DEBUG_TYPE. Please wait with committing a bit, in case anybody else has additional comments/thoughts.

lib/Transforms/Vectorize/LoopVectorizationPlanner.h
81

nit: no braces around returned expression.

150

nit: VPBasicBlock *Block?

lib/Transforms/Vectorize/LoopVectorize.cpp
7736

"means no vectorization" ?

lib/Transforms/Vectorize/VPlan.h
1042

nit: 32 seems quite large

lib/Transforms/Vectorize/VPlanHCFGBuilder.cpp
102

nit: auto *VPPhi

134

Ah yes, we would have to account for instructions in the preheader and exit block separately. Then it would probably not simplify things much.

238

nit: space after .

lib/Transforms/Vectorize/VPlanValue.h
48

Value *UnderlyingVal?

lib/Transforms/Vectorize/VPlanVerifier.cpp
20

I am not entirely sure how this will interact with -debug-only. IIUC if we do not use loop-vectorize here, those messages will be excluded from -debug-only=loop-vectorize. IMO it is convenient to get the complete picture with -debug-only=loop-vectorize.

test/Transforms/LoopVectorize/vplan_hcfg_stress_test.ll
53

nit: those attributes are unnecessary I think

This revision is now accepted and ready to land.May 15 2018, 8:04 AM
dcaballe marked 9 inline comments as done.May 15 2018, 10:09 AM

Thanks for spending time on this review, Florian! I addressed all your comments. I'll wait for one day or so before committing.
Regarding the DEBUG_TYPE, I think your suggestion makes more sense. I think it's better to have the complete picture.

Thanks,
Diego

lib/Transforms/Vectorize/LoopVectorizationPlanner.h
150

Thanks! I'm applying formatting to all this code.

lib/Transforms/Vectorize/VPlan.h
1042

Ok, let's use 16. It would be easy to go over 8 for a double loop nest using a few memory references.

lib/Transforms/Vectorize/VPlanVerifier.cpp
20

Ok, it makes sense. I guess when you know exactly what your are looking for, having independent debug types helps you but we definitely lose the complete picture. Let me change it to loop-vectorize.

Thanks!

This revision was automatically updated to reflect the committed changes.
dcaballe marked 3 inline comments as done.