This is an archive of the discontinued LLVM Phabricator instance.

[SLP] Double UserCost compensation for vector store of aggregate
AbandonedPublic

Authored by anton-afanasyev on Feb 16 2021, 8:32 AM.

Details

Summary

For the case of vector storing of aggregate built before by findBuildAggregate() we adjust total cost to consider that vector is already prepared.

This patch fixes llvm.org/PR40522

Diff Detail

Event Timeline

anton-afanasyev requested review of this revision.Feb 16 2021, 8:32 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 16 2021, 8:32 AM

I think the better approach would be to pass the list of InsertUses as a buildTree function UserIgnoreLst argument. You just need to correctly generate ExtractElement instructions for these InsertElements because currently the compiler just crashes trying to remove instructions used as operands for the original InsertElements. Thoughts?

I think the better approach would be to pass the list of InsertUses as a buildTree function UserIgnoreLst argument. You just need to correctly generate ExtractElement instructions for these InsertElements because currently the compiler just crashes trying to remove instructions used as operands for the original InsertElements. Thoughts?

Thanks, I'm to try this approach.

Please can you cleanup pr40522.ll (remove the metatdata etc.)? Also, PR40522 notes missing fp2int vectorization - please can you add those?

typedef int int4 __attribute__((__vector_size__(16)));

void fp2int_vec1(float a, float b, float c, float d, int *p) {
  int4 result = (int4) { (int) a, (int) b, (int) c, (int) d };
  *p++ = result[0];
  *p++ = result[1];
  *p++ = result[2];
  *p++ = result[3];
}

void fp2int_vec2(float a, float b, float c, float d, int4 *p) {
  int4 result = (int4) { (int) a, (int) b, (int) c, (int) d };
  *p = result;
}

Cleaned pr40522.ll

I can add fp2int sample, but it is not touched by this patch. Actually, its tree built is not vectorized by marking as "tiny" (R.isTreeTinyAndNotFullyVectorizable() function), its size is 2 (since fp2int is unary operation, whereas add is binary one). Though it could be fixed by checking that stores are using its result and therefore tree size could be increased. But this looks too hacky, isn't it?

Please can you cleanup pr40522.ll (remove the metatdata etc.)? Also, PR40522 notes missing fp2int vectorization - please can you add those?

OK - adding the fp2int tests to the same test file makes sense to me as they call came from the same bug, even if we don't yet have a fix

anton-afanasyev updated this revision to Diff 324601.EditedFeb 18 2021, 5:10 AM

Added fp2int scalar and vector test cases

@ABataev Any comments?

llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
6298

Can we use an llvm::any_of pattern instead here?

I'm trying another approach here, so abandoned this for a while.

anton-afanasyev abandoned this revision.Mar 16 2021, 8:23 AM

Here is another fix approach: https://reviews.llvm.org/D98714