Page MenuHomePhabricator

[LV] Unify vector and scalar maps
ClosedPublic

Authored by mssimpso on Aug 4 2016, 9:52 AM.

Details

Summary

This patch unifies the data structures we use for mapping instructions from the original loop to their corresponding instructions in the new loop. Previously, we maintained two distinct maps for this purpose: WidenMap and ScalarIVMap. WidenMap maintained the vector values each instruction from the old loop was represented with, and ScalarIVMap maintained the scalar values each scalarized induction variable was represented with. With this patch, all values created for the new loop (vector and scalar) are maintained in WidenMap (renamed to VectorLoopValueMap).

The change allows for several simplifications. Previously, when an instruction was scalarized, we had to insert the scalar values into vectors in order to maintain the mapping in WidenMap. Then, if a user of the scalarized value was also scalar, we had to extract the scalar values from the temporary vector we created. This caused unnecessary scalar-to-vector-to-scalar conversions, resulting in some cases in significant code bloat pre-InstCombine.

This patch avoids these unnecessary conversions by maintaining the scalar values directly. This can improve compile-time since it reduces the number of instructions in each block that InstCombine needs to simplify. If a scalarized value is used by a scalar instruction, the scalar value can be used directly. However, if the scalarized value is needed by a vector instruction, we will now generate the needed insertelement instructions on-demand.

A common idiom in several locations in the code (including the scalarization code), is to first get the vector values an instruction from the original loop maps to, and then extract a particular scalar value. This patch adds getScalarValue for this purpose along side getVectorValue as an interface into WidenMap. getScalarValue reuses a scalar value if available, or creates an extractelement instruction if a scalar value is not yet available. Similarly, getVectorValue has been modified to generate insertelement instructions on-demand if a vector version of a value is not yet available.

There's no real functional change with this patch (post-InstCombine); only compile-time savings and refactoring. However, in some cases we will generate different code. Because we now might generate an insertelement sequence on-demand, the IR post-InstCombine might be reordered somewhat. For example, instead of the insertelement sequence following the definition of an instruction, it will now precede the first use of that instruction. This can be seen in the test case changes.

Diff Detail

Repository
rL LLVM

Event Timeline

mssimpso updated this revision to Diff 66820.Aug 4 2016, 9:52 AM
mssimpso retitled this revision from to [LV] Unify vector and scalar maps.
mssimpso updated this object.
mssimpso added reviewers: nadav, anemet, mkuper.
mssimpso added subscribers: gilr, mcrosier, llvm-commits.
wmi added a subscriber: wmi.Aug 4 2016, 11:04 AM
mssimpso updated this revision to Diff 66850.Aug 4 2016, 1:16 PM

Ensure the new patch handles scalarized values with void types as we previously did. We make a vector entry for the value in WidenMap mapped to nullptr's.

spatel added a subscriber: spatel.Aug 4 2016, 3:35 PM
mkuper edited edge metadata.Aug 4 2016, 4:44 PM

+1000 on the design and the cleanup, some nits about the implementation.

lib/Transforms/Vectorize/LoopVectorize.cpp
549 ↗(On Diff #66850)

Perhaps use a 2D vector for this ("ScalarParts"), instead of the 1D VectorParts?
I don't think we actually gain anything from using VectorParts for both - even in the one place where we do move a VectorParts from one map to the other (line 2300), you don't actually break the abstraction.

630 ↗(On Diff #66850)

Bike-shedding - maybe change the name of the variable? "WidenMap" made sense when it mapped scalar values to the corresponding "wide" vector values.

2175 ↗(On Diff #66850)

I -> Lane ?

2310 ↗(On Diff #66850)

For clarity, I'd prefer a temp value here, and an assignment to Parts[Part] in the end.

2343 ↗(On Diff #66850)

Is this for VF==1, for uniform values, or both?

anemet edited edge metadata.Aug 4 2016, 10:09 PM

As we're piling on more complexity on the top of WidenMap, I feel like that the completely permissive ValueParts &get() interface makes it really hard to reason about this code. The new eraseVector API is certainly a warning sign.

I think the new logic is correct because we first widen definitions before getting to the uses. So when widening the definition we either create the vector variant or the scalar variant or in some cases of int induction variables both. And then as we encounter uses, we derive the missing scalar or vector variants from each unless we already have a value for it. Please correct me if if I am misinterpreting this. It would also be good to have a high-level overview of these interactions commented somewhere before the class.

Anyhow, this all seems like the property of WidenMap, so I am wondering how hard it would be to pull getScalarValue and getVectorValue under that class. (I am also not sure that hiding the class inside InnerLoopVectorizer in a .cpp file is really adding anything but certainly hurts readability as the class is growing.)

Anyhow, sorry about the somewhat random comments, I was just wondering what you guys thought about this.

As we're piling on more complexity on the top of WidenMap, I feel like that the completely permissive ValueParts &get() interface makes it really hard to reason about this code. The new eraseVector API is certainly a warning sign.

Right, that is confusing. We use "get" (now getScalar/getVector) to return an existing entry in the map or initialize an empty entry it if it's not already there. We use "getVectorValue" to return an existing entry or construct a non-empty entry (by broadcasting or, now, by inserting scalar elements into a vector). getVector/getScalar should probably just be initializers. For example, I think we should be able to replace all but one instance of getVector with getVectorValue. Then, getVector/getScalar could be renamed to initVector/initScalar, which would assert or noop if we try to initialize a key more than once. What do you think?

I think the new logic is correct because we first widen definitions before getting to the uses. So when widening the definition we either create the vector variant or the scalar variant or in some cases of int induction variables both. And then as we encounter uses, we derive the missing scalar or vector variants from each unless we already have a value for it. Please correct me if if I am misinterpreting this. It would also be good to have a high-level overview of these interactions commented somewhere before the class.

That's right. Sure I'll add some high-level comments.

Anyhow, this all seems like the property of WidenMap, so I am wondering how hard it would be to pull getScalarValue and getVectorValue under that class. (I am also not sure that hiding the class inside InnerLoopVectorizer in a .cpp file is really adding anything but certainly hurts readability as the class is growing.)

I think moving getVectorValue and getScalarValue into WidenMap might be tricky. The issue is that getVectorValue uses getBroadcastInstrs, which is a member of InnerLoopVectorizer, and hasStride from LoopVectorizationLegality. For getBroadcastInstrs, would we pass an InnerLoopVectorizer* to WidenMap? Alternatively, we could probably make getBroadcastInstrs into a utility (maybe in VectorUtils?). What about hasStride? What do you think?

lib/Transforms/Vectorize/LoopVectorize.cpp
549 ↗(On Diff #66850)

Sounds good to me.

630 ↗(On Diff #66850)

I actually thought about doing this, but couldn't really come up with anything better. Do you have any ideas? What about VectorLoopMap?

2175 ↗(On Diff #66850)

Makes sense.

2310 ↗(On Diff #66850)

Sure.

2343 ↗(On Diff #66850)

Nice question! This is for the VF == 1 case, and it prevents us from trying to extract an element from a non-vector type. I moved this check from the existing scalarization code (line 2827). But we could replace the if condition here with VF == 1 if that makes more sense.

To be clear, if a value is not an instruction in the loop, it's handled by the first check. For the instructions in Uniforms, I believe we will vectorize/scalarize them, and then the unused pieces will be later deleted.

As we're piling on more complexity on the top of WidenMap, I feel like that the completely permissive ValueParts &get() interface makes it really hard to reason about this code. The new eraseVector API is certainly a warning sign.

Right, that is confusing. We use "get" (now getScalar/getVector) to return an existing entry in the map or initialize an empty entry it if it's not already there. We use "getVectorValue" to return an existing entry or construct a non-empty entry (by broadcasting or, now, by inserting scalar elements into a vector). getVector/getScalar should probably just be initializers. For example, I think we should be able to replace all but one instance of getVector with getVectorValue. Then, getVector/getScalar could be renamed to initVector/initScalar, which would assert or noop if we try to initialize a key more than once. What do you think?

That would be very nice.

I think the new logic is correct because we first widen definitions before getting to the uses. So when widening the definition we either create the vector variant or the scalar variant or in some cases of int induction variables both. And then as we encounter uses, we derive the missing scalar or vector variants from each unless we already have a value for it. Please correct me if if I am misinterpreting this. It would also be good to have a high-level overview of these interactions commented somewhere before the class.

That's right. Sure I'll add some high-level comments.

Thanks.

Anyhow, this all seems like the property of WidenMap, so I am wondering how hard it would be to pull getScalarValue and getVectorValue under that class. (I am also not sure that hiding the class inside InnerLoopVectorizer in a .cpp file is really adding anything but certainly hurts readability as the class is growing.)

I think moving getVectorValue and getScalarValue into WidenMap might be tricky. The issue is that getVectorValue uses getBroadcastInstrs, which is a member of InnerLoopVectorizer, and hasStride from LoopVectorizationLegality. For getBroadcastInstrs, would we pass an InnerLoopVectorizer* to WidenMap? Alternatively, we could probably make getBroadcastInstrs into a utility (maybe in VectorUtils?). What about hasStride? What do you think?

Yeah, hasStride does *not* seem to belong to WM. We could have a wrapper of WM's getVectorValue in ILV that just swaps in the speculated stride value before calling WM::getVectorValue.

getBroadcastInstrs on the other hand probably does belong to WM because it's part of the whole picture how values are mapped. So probably making it stand-alone utility is the way to go.

Anyhow this all is certainly for another patch. I just wanted to see what you guys thought about moving even more of the mapping responsibility out of ILV to WM.

mssimpso updated this revision to Diff 67216.Aug 8 2016, 12:24 PM
mssimpso updated this object.
mssimpso edited edge metadata.

Addressed comments from Michael and Adam.

  • Added "ScalarParts" type as a 2D vector.
  • Removed from WidenMap the "VectorParts &get" interface and replaced it with "VectorParts &initVector" and "ScalarParts &initScalar", as previously discussed. These initializers assert if a key has already been mapped, which should prevent data from being overwritten unintentionally. Because of this change, I had to slightly modify interleaved access vectorization since we vectorize all instructions in a group at once.
  • I also removed the "VectorParts &splat" interface since this is now just a special case of initVector.
  • The data in WidenMap can now only be accessed via getVectorValue and getScalarValue. Until we can move these functions inside WidenMap, we have to declare them as friends to get access to the data.
  • Renamed WidenMap to VectorLoopValueMap (but I'm still open to name suggestions).
  • Added high-level comments about the interaction between getVectorValue and getScalarValue.
  • Addressed other minor comments.
mssimpso marked 4 inline comments as done.Aug 8 2016, 12:25 PM
mssimpso updated this revision to Diff 67219.Aug 8 2016, 1:39 PM

Replaced unneeded interleaved access check in vectorizeMemoryInstruction with an assert.

Michael/Adam,

Do you have anymore comments on this change? Thanks!

Matt, sorry about the delay.

lib/Transforms/Vectorize/LoopVectorize.cpp
545–553 ↗(On Diff #67219)

If we can't change all users to use init*, it would be better to keep both functionalities separate rather than hiding it behind a default parameter. Just keep the ugly get interface as well, so that the users are easy to find. Hopefully we can migrate all users to init* in the future.

2587–2589 ↗(On Diff #67219)

This is weird in terms of its interface use. getVectorValue computes values and then this one overrides it?! I think we should just stick with the &get interface here and clean in it up in the future.

2679 ↗(On Diff #67219)

Is this call necessary?

2859–2862 ↗(On Diff #67219)

It would be good to explain the scenario under which we have already created a vector value for this value.

Matt, I also had some comments from earlier that I've never sent out. This was about the future direction of get{Scalar,Vector}Value:

Anyhow, this all seems like the property of WidenMap, so I am wondering how hard it would be to pull getScalarValue and getVectorValue under that class. (I am also not sure that hiding the class inside InnerLoopVectorizer in a .cpp file is really adding anything but certainly hurts readability as the class is growing.)

I think moving getVectorValue and getScalarValue into WidenMap might be tricky. The issue is that getVectorValue uses getBroadcastInstrs, which is a member of InnerLoopVectorizer, and hasStride from LoopVectorizationLegality. For getBroadcastInstrs, would we pass an InnerLoopVectorizer* to WidenMap? Alternatively, we could probably make getBroadcastInstrs into a utility (maybe in VectorUtils?). What about hasStride? What do you think?

Yeah, hasStride does *not* seem to belong to WM. We could have a wrapper of WM's getVectorValue in ILV that just swaps in the speculated stride value before calling WM::getVectorValue.

getBroadcastInstrs on the other hand probably does belong to WM because it's part of the whole picture how values are mapped. So probably making it stand-alone utility is the way to go.

Anyhow this all is certainly for another patch. I just wanted to see what you guys thought about moving even more of the mapping responsibility out of ILV to WM.

Sorry for the delay, thanks for nudging me!

lib/Transforms/Vectorize/LoopVectorize.cpp
558 ↗(On Diff #67219)

Do we need the Val parameter? It doesn't look like we call this with a non-default Val anywhere.

2322 ↗(On Diff #67219)

This looks a bit odd.

When does this happen? I don't think we call getVectorValue() for stores - and if we do, I'm not sure what the expected result is.
If it doesn't happen, perhaps this ought to be an assert?

2368 ↗(On Diff #67219)

I'd still appreciate a comment (and maybe an assert) that this is for VF==1.

2371 ↗(On Diff #67219)

The comment is a bit odd, because the "Otherwise" looks like it refers to the previous comment ("If the value has not been scalarized...") while in fact it refers to the one before ("If the value from the original loop has not been vectorized").

2861 ↗(On Diff #67219)

Just to make sure I'm not missing anything - the reason this is needed is because we eagerly created the vector entry in the beginning of vectorizeBlockInLoop(), right? We don't *actually* have a vector entry at this point, it's just an empty placeholder - but we need to delete it, so that if we need a real vector entry, it will get constructed from scratch from the scalars?

Michael/Adam,

Thanks very much for the comments! No problem about the delay - I'm happy to have the feedback. I've replied to your comments inline, and in one case asked a question about what we want the ValueMap interface change here to be.

Yeah, hasStride does *not* seem to belong to WM. We could have a wrapper of WM's getVectorValue in ILV that just swaps in the speculated stride value before calling WM::getVectorValue.

getBroadcastInstrs on the other hand probably does belong to WM because it's part of the whole picture how values are mapped. So probably making it stand-alone utility is the way to go.

Anyhow this all is certainly for another patch. I just wanted to see what you guys thought about moving even more of the mapping responsibility out of ILV to WM.

Sorry to miss replying to these comments earlier. Yes, I think it makes sense to move the mapping related tasks out of ILV and into ValueMap. I think getBroadcastIstrs, since it's currently used by both ValueMap and ILV, could actually live in VectorUtils. I think it just needs to know where to insert the code.

lib/Transforms/Vectorize/LoopVectorize.cpp
545–553 ↗(On Diff #67219)

I actually did change all users to either init(Vector|Scalar) or get(Vector|Scalar)Value. The default parameter here was added as a replacement for the "splat" interface, which I removed. I can add this back though.

But looking over your other comments, I think you're wondering why getVectorValue returns a reference to the map entries that are then able to be overridden and set by the caller? This is basically the same as the original WidenMap.get I agree that's, strange.

So to be clear, you're suggesting that for now we keep the original "get" and "splat" interfaces along side the new init* interfaces? Or are you suggesting that we abandon the init* interfaces for now as well?

558 ↗(On Diff #67219)

No, I'll remove it. Thanks!

2322 ↗(On Diff #67219)

I agree, I don't think this should happen. This was taken from the scalarization code, and I kept it this way to make the patch as NFC as possible. The only possible, but unlikely, issue I can think of is if we use presence in WidenMap as an indicator for "already vectorized". But again, I don't think we do this. I think it's perfectly sensible to add an assert. I'll update the patch.

2368 ↗(On Diff #67219)

Sure, sounds good!

2371 ↗(On Diff #67219)

The comments are confusing; I'll fix that. The intent was the following. Case 2: "value has been scalarized." Case 3: "value hasn't been scalarized, but also hasn't been vectorized (VF=1)". Case 4: "value has been vectorized".

2587–2589 ↗(On Diff #67219)

Sure. To be clear, getVectorValue returns a reference to an entry (which can be overridden) if it exists in the map, and only computes values if it's not already there.

2679 ↗(On Diff #67219)

Yes, the Entry values are set later on in the function. (The diff here is a bit misleading. Here, I had replaced WidenMap.get with getVectorValue).

2859–2862 ↗(On Diff #67219)

I think Michael covered this in his comment. I'll update the comment here to explain why the erase is needed.

2861 ↗(On Diff #67219)

That's right! I'll update the comment to better clarify this.

anemet added inline comments.Aug 15 2016, 7:59 PM
lib/Transforms/Vectorize/LoopVectorize.cpp
545–553 ↗(On Diff #67219)

Yes, the former. The init stuff is great, my hope is that it will slowly become the main interface along with get{Vector,Scalar}Value. I think we want to use 'get' when all we do is allocate the entries. 'init' when we actually set the value and get*Value when we let the variants be derived from each other.

I think I still don't understand initVector with Val=nullptr. Isn't that just &get? If yes than I think that's what we should use for it since we're not setting up the entries.

How much of this you want to put in this patch or leave it for later improvements (by you or others) I leave it to you. My goal is to use the interfaces that return mutable references as little as possible and restrain the current ones to return const & as much as possible.

mssimpso updated this revision to Diff 68578.Aug 18 2016, 11:30 AM
mssimpso marked 13 inline comments as done.

Addressed comments from Adam and Michael.

This revision changes the ValueMap interface to more closely align with the spirit of Adam's comments. In particular, the init{Vector, Scalar} interface has been changed to accept a reference to {Vector, Scalar}Parts, which are then moved over to the actual map entries. Also, getVectorValue in InnerLoopVectorizer has been modified to return a constant reference.

Taking VectorizeBlockInLoop as an example, instead of getting an empty map entry with WidenMap.get() at the top of the function, we now just allocate a temporary VectorParts. Once we assign all the VectorParts entries, we then move the VectorParts to the map with initVector. So whenever we actually map a value, we use init{Vector, Scalar}. Also, because we no longer eagerly create empty mappings, eraseVector is no longer needed.

Most of the changes required to use initVector and the constant reference version of getVectorValue are fairly mechanical. One exception was interleaved accesses, where I had to reorder the code such that we didn't require mutable entries. In a few other cases, we still actually do need to change the mapped entries, and I've left getVector in ValueMap for this, which returns a non-constant reference. These cases include the "fix-up" operations that happen after the first phase of widening (i.e., type truncation and the second widening phase for recurrences).

All the other minor comments have been addressed as well. Thanks!

Also, because we no longer eagerly create empty mappings, eraseVector is no longer needed.

Hurray!

This is all looking pretty nice now!

lib/Transforms/Vectorize/LoopVectorize.cpp
655 ↗(On Diff #68578)

I don't think this is what you want. This will still do a copy because Entry is a const &. I think that this function should take Entry by value (or by r-value but that would make it harder to use).

659–661 ↗(On Diff #68578)

Please add a comment discouraging the use of this function in favor of initVector.

2309 ↗(On Diff #68578)

This should std::move Entry.

3830 ↗(On Diff #68578)

I am not a big fan of auto in cases where the type is not obvious. At least we should have const. Same later.

4389 ↗(On Diff #68578)

Would it be safer to do this inside the blocks? I want to make sure you didn't leave a Entry[] = ... without a corresponding initVector.

mssimpso added inline comments.Aug 22 2016, 10:25 AM
lib/Transforms/Vectorize/LoopVectorize.cpp
655 ↗(On Diff #68578)

Ah, you're right. Thanks for pointing this out. I'll update this to be by value.

659–661 ↗(On Diff #68578)

Sure.

2309 ↗(On Diff #68578)

Right.

3830 ↗(On Diff #68578)

Sure, instead of auto, I'll just make these "const VectorParts &" to make things clear.

4389 ↗(On Diff #68578)

I agree. There's a few cases that don't actually use Entry. I'll update the patch.

mssimpso updated this revision to Diff 68890.Aug 22 2016, 11:42 AM
mssimpso marked 3 inline comments as done.

Addressed Adam's comments.

Regarding the moves, I think this may be a case of unneeded complexity. If the SmallVectors are sized appropriately, a move shouldn't be much better (if better at all) than a copy. So I've reversed my previous comment and left int{Vector, Scalar} taking a constant reference to {Vector, Scalar}Parts and just removed the std::move's, which were wrong as Adam pointed out. The next best thing, I think, would be if they accepted rvalues, but again, that would probably be overly complex for very little gain.

All remaining comments are addressed. Thanks!

anemet accepted this revision.Aug 23 2016, 9:39 AM
anemet edited edge metadata.

Looks great to me! Thanks for your work.

This revision is now accepted and ready to land.Aug 23 2016, 9:39 AM
This revision was automatically updated to reflect the committed changes.