This is an archive of the discontinued LLVM Phabricator instance.

[LV] move useEmulatedMaskMemRefHack() functionality to TTI.
Needs ReviewPublic

Authored by hsaito on Mar 8 2019, 12:14 PM.

Details

Summary

This is a long overdue response to the review comments in https://reviews.llvm.org/D43208#inline-382175.

D43208 created a generic "hack" to essentially disable vectorization if the masked memref requires emulation, to "match" the preexisting behavior.
This patch moves it to TTI so that this can be moved towards the better cost modeling, one target at a time.

Thanks, @markus for reminding me.

Diff Detail

Event Timeline

hsaito created this revision.Mar 8 2019, 12:14 PM
hsaito marked an inline comment as done.Mar 8 2019, 12:16 PM
hsaito added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5230

Please comment how much of this documentation you'd like to see moved to TTI.

hsaito marked an inline comment as not done.Mar 8 2019, 12:17 PM

Any improvement is an improvement so I am happy with that but it is still mentioned that this solution is a hack and I guess the

Cost model for emulated masked load/store is completely broken.

comment is still valid. What would it take to address this properly?

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
270

This value seems a bit arbitrary. Could it not be e.g. INT_MAX instead if it is supposed to represent something infinitely expensive?

The concept of "hacked" is lost when you move up to TTI. I'd change the logic to reflect that this is making it "prohibitively expensive" instead of "hacked value".

It doesn't need to change much, just use the "override value" pattern:

Cost = someComputation();
// Target can override mask cost (ex. when it's prohibitively expensive)
EmulatedCost = TTI.getEmulatedMaskMemRefCost(...);
if (EmulatedCost)
  Cost = EmulatedCost;

Also, that's perhaps not the right place to add actual costs. Moving up the boolean function as is would make more sense, but the "hacked cost" would still remain.

Can't you move this to the cost model?

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
270

Costs will be added to others, INT_MAX would wrap.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5488

This if doesn't need the Cost above, so you can avoid the division by moving it up to the beginning of the block.

Nit: you can just return HackedCost instead of assigning, but that will depend on the new pattern.

hsaito marked 2 inline comments as done.Mar 13 2019, 12:39 PM

Any improvement is an improvement so I am happy with that but it is still mentioned that this solution is a hack and I guess the

Cost model for emulated masked load/store is completely broken.

comment is still valid. What would it take to address this properly?

Each target to run many applications/benchmarks to come up with the "right" adjustment to the cost model. No way around that.

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
270

I can certainly add a comment saying that this is an arbitrarily chosen "big enough value" --- just need to be sufficiently bigger than a typical instruction cost, which is a single digit or a low double digit, so as to disable most cases of emulated masked memory references.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5488

This is actually a good question.

Maybe, we want to structure this TTI interface as the adjustment to the base cost computed above.
In which case, the code would look like

Cost += TTI.getEmulatedMaskMemRefAdjustment(...)

Does this make more sense?

Even then, I still agree that we can return early like
if (!isPredicaredInst(I))

return Cost;

Cost /= getReciprocalPredBlockProb();
return Cost + TTI.getEmulatedMaskMemRefAdjustment(...)

Does this sound better?

hsaito marked 2 inline comments as not done.Mar 13 2019, 12:39 PM

Cost model for emulated masked load/store is completely broken.

comment is still valid. What would it take to address this properly?

Each target to run many applications/benchmarks to come up with the "right" adjustment to the cost model. No way around that.

Indeed, not something that can be done overnight.

But if the intention of this patch is to start preparing terrain for that work to start (at least on Intel), then I guess the current strategy needs to start changing from "hacked big enough values overriding the cost" to "adding up an estimate of the number of cycles or something".

As I said on my comment, this doesn't mean the patch needs to do everything, just the difference between (Cost = BigEnough) versus (Cost += Adjustment) in the current case.

cheers,
--renato

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h
270

This is not the only case, and I think it would be interesting to look at all the other "big enough values" and perhaps create a global const int PROHIBITIVE_COST and set to a "big enough value" that would suit all cases.

But this is not necessarily for this patch, though. Just an idea.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5488

If the prohibitive cost is large enough, then adding up will make no difference, and you can early return on either Cost or Adjusted.

If the cost isn't meant as just "big enough" but to (later) actually emulate the masking costs, then it makes sense to add like you propose on the comment above.

Right now, the case looks like the former, but if you're planning it to be more like the latter, than the new proposal makes sense.

I guess what we want to achieve here (and please correct me if I am wrong) is

  • For this first patch to have zero impact on generated code
  • Allow targets to to start overriding getEmulatedMaskMemRefCost and return some sensible cost for the single operation being queried (and not just 3000000 or 0)
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5251

Would this still work if we allow getEmulatedMaskMemRefCost to return an actual cost and not just 3000000 or 0?

5488

I agree that this should be changed to

Cost += TTI.getEmulatedMaskMemRefAdjustment(...)

(or equivalent with the early return).

rengolin added inline comments.Mar 19 2019, 2:30 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5488

IFF the adjustment has any meaning other than "too large a number", I agree, too.

hsaito added inline comments.Mar 19 2019, 10:06 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5488

The reason I changed here to this style is that the person starting to work on the real adjustment doesn't have to change the structure here. If you think this suggested "stylistic" change requires at least one target doing a "reasonable" adjustment (or committed to start such a study), I could do that for AVX2 target ---- come up with a smaller number that doesn't degrade some (undisclosed) set of benchmarks that I can run quickly enough. Does this sound reasonable?

markus added inline comments.Mar 20 2019, 1:54 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5488

That sounds excellent to me!

rengolin added inline comments.Mar 20 2019, 2:07 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5488

+1

hsaito added inline comments.Mar 22 2019, 2:25 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5488

Started working on it. Need to run a lot of perf experiments. Will take a bit of time. Just FYI.

In the mean time, I'll update the patch along this direction.