Page MenuHomePhabricator

[Passes] Run vector-combine early with -fenable-matrix.
AcceptedPublic

Authored by fhahn on May 14 2021, 6:50 AM.

Details

Summary

IR with matrix intrinsics is likely to also contain large vector
operations, which can benefit from early simplifications.

This is the last step in a series of changes to improve code-gen for
code using matrix subscript operators with the C/C++ matrix extension in
CLang, like

using matrix_t = double __attribute__((matrix_type(15, 15)));

void foo(unsigned i, matrix_t &A, matrix_t &B) {
  for (unsigned j = 0; j < 4; ++j)
    for (unsigned k = 0; k < i; k++)
      B[k][j] -= A[k][j] * B[i][j];
}

https://clang.godbolt.org/z/6dKxK1Ed7

Diff Detail

Event Timeline

fhahn created this revision.May 14 2021, 6:50 AM
fhahn requested review of this revision.May 14 2021, 6:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 14 2021, 6:50 AM

Any testcase to ensure that passes together now produce what you want?

Some test coverage (phaseordering?) would be useful.

fhahn updated this revision to Diff 345704.May 16 2021, 5:52 AM

Thanks for taking a look! I added a phase ordering test and updated the pipeline tests as well.

LGTM - anyone else have any comments?

Over in D102002, I am looking at divergence between the regular and LTO pipelines...
If I'm seeing this correctly, we will not alter the LTO pipeline in this patch. Is that intentional?

fhahn added a comment.May 21 2021, 8:48 AM

Over in D102002, I am looking at divergence between the regular and LTO pipelines...
If I'm seeing this correctly, we will not alter the LTO pipeline in this patch. Is that intentional?

Yes that's intentional. The motivation to run vector-combine early here is to catch combine & scalarization opportunities before operations are moved too much by GVN, unrolling & co. At the LTO stage, those should already be covered by the pre-LTO steps, so there's no need to do another run during the LTO stage I think.

spatel accepted this revision.May 21 2021, 9:28 AM

Over in D102002, I am looking at divergence between the regular and LTO pipelines...
If I'm seeing this correctly, we will not alter the LTO pipeline in this patch. Is that intentional?

Yes that's intentional. The motivation to run vector-combine early here is to catch combine & scalarization opportunities before operations are moved too much by GVN, unrolling & co. At the LTO stage, those should already be covered by the pre-LTO steps, so there's no need to do another run during the LTO stage I think.

Ah, I still haven't made sense of all the pipeline stages for LTO.
LGTM.

This revision is now accepted and ready to land.May 21 2021, 9:28 AM