This is an archive of the discontinued LLVM Phabricator instance.

[LV] Delay predication of stores until near the end of vector code generation
ClosedPublic

Authored by jmolloy on Aug 30 2015, 4:33 AM.

Details

Summary

Predicating stores requires creating extra blocks. It's much cleaner if we do this in one pass instead of mutating the CFG while writing vector instructions.

Besides which we can make use of helper functions to update domtree for us, reducing the work we need to do.

This may seem like a trivial cleanup but it reduces the amount of work that gets done in one pass in the loop vectorizer, which will be very important when it gets extracted into a utility.

Diff Detail

Repository
rL LLVM

Event Timeline

jmolloy updated this revision to Diff 33534.Aug 30 2015, 4:33 AM
jmolloy retitled this revision from to [LV] Delay predication of stores until near the end of vector code generation.
jmolloy updated this object.
jmolloy added reviewers: anemet, mzolotukhin, hfinkel.
jmolloy set the repository for this revision to rL LLVM.
jmolloy added a subscriber: llvm-commits.
mzolotukhin edited edge metadata.Sep 1 2015, 12:30 PM

Hi James,

While the general idea looks appealing to me, I'm worried on the changes that we have in the test. It looks like we increased loop's critical path length by this change by moving extractelement up from branches. What would it take to change it back?

Thanks,
Michael

jmolloy updated this revision to Diff 33799.Sep 2 2015, 5:04 AM
jmolloy edited edge metadata.

Hi Michael,

This updated diff contains a testcase with -instcombine to ensure that the generated code looks no worse than before.

Cheers,

James

Hi James,

Are you sure you updated the patch? I'm still seeing extractelements outside the branches (and btw, there is no context in this patch).

Michael

jmolloy updated this revision to Diff 33907.Sep 3 2015, 12:56 AM

Hi Michael,

[reuploaded with full context]

Yes I did, although it was just a test change - there were no code changes. The major test change is the test lines starting "CHECK-VEC-IC:" - these check that when adding -instcombine (which we do in the normal pipeline), the extracts are moved into the predicate blocks.

And this does in fact happen. There are a couple of extracts outside the blocks, but they're used outside too (for example, [[v3]] is used in the branch).

mzolotukhin accepted this revision.Sep 3 2015, 9:10 AM
mzolotukhin edited edge metadata.

Hi James,

Oh, I thought you were going to just add '-inst-combine' to all command-lines, that's why I though you forgot to update the patch. That looks fine then.

Thank you for the work!

Michael

This revision is now accepted and ready to land.Sep 3 2015, 9:10 AM
jmolloy closed this revision.Oct 12 2015, 6:05 AM