This is an archive of the discontinued LLVM Phabricator instance.

[mlir] Update SVE integration tests to use mlir-cpu-runner
AbandonedPublic

Authored by awarzynski on Apr 11 2023, 3:22 AM.

Details

Summary

With the recent addition of "-mattr" and "-march" to the list of options
supported by mlir-cpu-runner, we can update the SVE integration tests
to better to use mlir-cpu-runner instead of lli. This way we are
making sure that these tests better align with the other integration
tests in MLIR.

This patch replaces all the logic related to lli from the test setup
for SVE and replaces that with mlir-cpu-runner (e.g. CMake and LIT
variables). It also reduces the duplication of RUN lines in
"mlir/test/Integration/Dialect/SparseTensor/CPU/". More specifically,
at the moment there are usually 2 RUN lines to test for vectorisation:

  • one for VLS vectorisation,
  • one for VLA vectorisation whenever that's available and which reduces to VLS vectorisation when VLA is not supported.

When VLA is not available, VLS vectorisation is verified twice. This
should be avoinded - integration test are relatively expansive to run.
Instead, this patch makes srue that there's only one RUN line for
vectorisation:

  • "use VLA when requested/supported, otherwise use VLS".

A dedicated input file that implements main for lli is no longer
needed and hence removed.

[1] https://reviews.llvm.org/D146917

Diff Detail

Event Timeline

awarzynski created this revision.Apr 11 2023, 3:22 AM
Herald added a reviewer: aartbik. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
awarzynski requested review of this revision.Apr 11 2023, 3:22 AM

NOTE: I've only updated few tests so far - I want to make sure that the logic is OK before I proceed. Once that's approved, I'll refactor the remaining tests and update this patch. Thanks!

Matt added a subscriber: Matt.Apr 11 2023, 11:44 AM
c-rhodes added inline comments.Apr 12 2023, 4:08 AM
mlir/test/Integration/Dialect/SparseTensor/CPU/concatenate_dim_1.mlir
2

Unrelated changes such as this make reviewing more difficult, please could you commit (or post) them separately as NFC and base your actual changes on top.

19

I think -vla can be dropped to make this generic, and perhaps use underscores to be consistent with other binary substitutions? https://llvm.org/docs/TestingGuide.html#substitutions

mlir/test/Integration/Dialect/SparseTensor/CPU/lit.local.cfg
32

With the above suggestion this could be moved to a higher Lit config (mlir/test/lit.cfg.py?) to remove duplication.

Revert unrelated changes, replace %mlir-cpu-runner-vla with %mlir_cpu_runner_vla

Cheers for taking a look @c-rhodes! I was meant to submit my replies before updating the patch, apologies if this feels out-of-sync.

mlir/test/Integration/Dialect/SparseTensor/CPU/concatenate_dim_1.mlir
2

Sorry about that, I was meant to revert that.

19

Using underscores is a good suggestion, but I'm still strongly in favor of some suffix (could be -vla, doesn't have to be). Otherwise, we will end up with %mlir-cpu-runner and %mlir_cpu_runner. That would be too easy to confuse. Perhaps %mlir_cpu_runner_vec? Or %mlir_cpu_runner_emu?

mlir/test/Integration/Dialect/SparseTensor/CPU/lit.local.cfg
32

Good idea, though that will have impact on all testing in MLIR and I would feel more confident doing that in a separate patch :) (once reviewers approve the logic, I'll update the remaining tests in mlir/test/Integration/Dialect/SparseTensor/CPU, which will increase the size of this patch quite significantly).

But yes, the overall goal is to reduce duplication and unify how things get configured.

c-rhodes added inline comments.Apr 13 2023, 1:16 AM
mlir/test/Integration/Dialect/SparseTensor/CPU/concatenate_dim_1.mlir
19

Using underscores is a good suggestion, but I'm still strongly in favor of some suffix (could be -vla, doesn't have to be). Otherwise, we will end up with %mlir-cpu-runner and %mlir_cpu_runner. That would be too easy to confuse. Perhaps %mlir_cpu_runner_vec? Or %mlir_cpu_runner_emu?

It's not clear to me why 2 substitutions (vla and non-vla?) for mlir-cpu-runner binary are required?

awarzynski added inline comments.Apr 13 2023, 1:39 AM
mlir/test/Integration/Dialect/SparseTensor/CPU/concatenate_dim_1.mlir
19

Let me illustrate with an example (_before_ and _after_):

BEFORE

// RUN: %{compile} | %{run_config_1}
// RUN: %{compile} | %{run_config_2}
// RUN: %{compile} | %{run_config_3}
// RUN: %{compile} | %{run_config_3_with_or_without_emulation}

The final RUN line depends on e.g. ENABLE_VLA and ARM_EMULATOR_EXECUTABLE

AFTER

// RUN: %{compile} | %{run_config_1}
// RUN: %{compile} | %{run_config_2}
// RUN: %{compile} | %{run_config_3_with_or_without_emulation}

AFTER v2
If there's only one %mlir_cpu_runner substitution, then we will have this:

// RUN: %{compile} | %{run_config_1_with_or_without_emulation}
// RUN: %{compile} | %{run_config_2_with_or_without_emulation}
// RUN: %{compile} | %{run_config_3_with_or_without_emulation}

My goal in this patch is to preserve the original config as much as I can. I'm not against running everything in an emulator, but I'd rather do it in incremental steps (so, BEFORE --> AFTER --> AFTER v2 ).

c-rhodes added inline comments.Apr 13 2023, 2:24 AM
mlir/test/Integration/Dialect/SparseTensor/CPU/concatenate_dim_1.mlir
19

If there's only one %mlir_cpu_runner substitution, then we will have this:

// RUN: %{compile} | %{run_config_1_with_or_without_emulation}
// RUN: %{compile} | %{run_config_2_with_or_without_emulation}
// RUN: %{compile} | %{run_config_3_with_or_without_emulation}

But %mlir_cpu_runner substitution is only used in the SVE run line? The other run lines use the binary name mlir-cpu-runner.

awarzynski added inline comments.Apr 14 2023, 1:00 AM
mlir/test/Integration/Dialect/SparseTensor/CPU/concatenate_dim_1.mlir
19

I feel that we might be referring to different things.

But %mlir_cpu_runner substitution is only used in the SVE run line? The other run lines use the binary name mlir-cpu-runner

"the binary name mlir-cpu-runner" is also effectively a substitution (created here). Because of these "default" substitutions, things like %mlir-cpu-runner are best avoided [1].

Now, I want to keep two substitutions to preserve the original logic as much as I can:

  1. mlir-cpu-runner for regular RUN lines (config_1 and config_2 above),
  2. %mlir_cpu_runner_vec (or something else) for SVE RUN lines (config_3 above).

This is not possible with one substitution.

I'm against introducing %mlir_cpu_runner (or %mlir-cpu-runner), because it is too close to mlir-cpu-runner (and I want to make sure that the spelling makes it obvious that the two are completely different). That's why I am suggesting an alternative. As the 3rd configuration is meant to run vectorised code, which _may_ or _may not_ be run via an emulator, I see 3 potential suffixes: _vec|_emu| _vla. To me, %mlir_cpu_runner_vec seems like the closes match for config_3 and that's what I suggest that we use here.

WDYT?

[1] It would get expanded too %/<absolute-path>/mlir-cpu-runner, because mlir-cpu-runner substitution would precede the one for %mlir-cpu-runner. Unless I misunderstood how LIT expansion works.

c-rhodes added inline comments.Apr 14 2023, 2:18 AM
mlir/test/Integration/Dialect/SparseTensor/CPU/concatenate_dim_1.mlir
19

I feel that we might be referring to different things.

But %mlir_cpu_runner substitution is only used in the SVE run line? The other run lines use the binary name mlir-cpu-runner

"the binary name mlir-cpu-runner" is also effectively a substitution (created here). Because of these "default" substitutions, things like %mlir-cpu-runner are best avoided [1].

Perhaps that's why underscores are used elsewhere?

Now, I want to keep two substitutions to preserve the original logic as much as I can:

  1. mlir-cpu-runner for regular RUN lines (config_1 and config_2 above),
  2. %mlir_cpu_runner_vec (or something else) for SVE RUN lines (config_3 above).

This is not possible with one substitution.

I'm against introducing %mlir_cpu_runner (or %mlir-cpu-runner), because it is too close to mlir-cpu-runner (and I want to make sure that the spelling makes it obvious that the two are completely different). That's why I am suggesting an alternative. As the 3rd configuration is meant to run vectorised code, which _may_ or _may not_ be run via an emulator, I see 3 potential suffixes: _vec|_emu| _vla. To me, %mlir_cpu_runner_vec seems like the closes match for config_3 and that's what I suggest that we use here.

WDYT?

[1] It would get expanded too %/<absolute-path>/mlir-cpu-runner, because mlir-cpu-runner substitution would precede the one for %mlir-cpu-runner. Unless I misunderstood how LIT expansion works.

I thought this substitution is to support cross-compiling so the path to the target lli (rather than host) binary can be specified and run on an emulator. I think this should be as simple as specifying %mlir_cpu_runner (much like %clang, %clang_cc1, etc) for RUN lines where this is the case and mlir-cpu-runner (defaulting to llvm-tools-dir) everywhere else?

jsetoain accepted this revision.EditedJun 11 2023, 3:46 AM

Other than the typo, all looks good to me.

One comment. In your patch you mention the fact that, when VLA is not enabled, we run VLS twice. Your solution is to run either/or, but that's also not a good testing approach. Ideally, it will run one or both. Given that these tests are, at the moment, really inexpensive in VLS, running 2xVLS or VLS+VLA looks to me like the least worst solution (under the principle that redundant testing is not as bad as incomplete testing).

Fwiw, when I first wrote these tests, I considered two other approaches. In one of them, the most straightforward, I simply duplicated the test under a different directory. The second, I (mostly) avoided code duplication by including the original tests from a subdirectory. I pushed that patch for review, but it was deemed too ugly, which is how we ended up with this approach. You are definitely more proficient with LIT than I am, tests look much better nowadays, so you may be interested in reconsidering that approach (check: Diff1@D12304).

mlir/test/CMakeLists.txt
23
This revision is now accepted and ready to land.Jun 11 2023, 3:46 AM

Other than the typo, all looks good to me.

Thanks for taking a look! I actually need to refactor this a bit to match the changes from https://reviews.llvm.org/D148929. I just haven't had the time recently, but will do this week.

One comment. In your patch you mention the fact that, when VLA is not enabled, we run VLS twice. Your solution is to run either/or, but that's also not a good testing approach. Ideally, it will run one or both.

I agree. The approach proposed here is a compromise to avoid running VLS twice. That's the only low hanging fruit that we have found so far to reduce the execution time of these tests.

Given that these tests are, at the moment, really inexpensive in VLS, running 2xVLS or VLS+VLA looks to me like the least worst solution

Based on this comment, I am guessing that you assume that VLS is cheap and VLA is expensive :) That would be the case if:

  • VLS was always run natively,
  • VLA was always run in an emulator.

That's not the case though. We always run VLA natively (emulator is an option for folks without access to hardware). And we also made sure that these tests are enabled in the public SVE buildbots (https://reviews.llvm.org/D136460). Also, I assume that most folks have VLA tests disabled anyway. So when people say that these integration tests are slow, they are most likely referring to the VLS code-path.

Now, I am making a lot of assumptions - please let me know if I am completely wrong or misinterpreted your comment :)

(under the principle that redundant testing is not as bad as incomplete testing).

Good point! That's why before proposing this change, I made sure that the SVE integration tests are enabled in upstream SVE bots: (https://reviews.llvm.org/D136460. Ultimately, we want to make sure that the VLA code-patch is well defended against any breakage and I am hoping that these bots are a strong enough defence lines :)

Again, this is a compromise. But this way we make sure that:

  • these tests are regularly run (by upstream buildbots),
  • folks that don't/can't run them, don't have to pay the extra price for an additional RUN line.

Fwiw, when I first wrote these tests, I considered two other approaches. In one of them, the most straightforward, I simply duplicated the test under a different directory. The second, I (mostly) avoided code duplication by including the original tests from a subdirectory. I pushed that patch for review, but it was deemed too ugly, which is how we ended up with this approach. You are definitely more proficient with LIT than I am, tests look much better nowadays, so you may be interested in reconsidering that approach (check: Diff1@D12304).

Thanks for the context, this is super helpful! In fact, we have been discussing how to reduce the number of these RUN lines and what you experimented there with is one approach that we have considered. But we would like do that globally, i.e.:

  • move the test input/expected output to dedicated files,
  • move RUN lines to separate files.

And then, use CMake flags to control which RUN lines to enable/disable. AFAIK, that's currently not supported by LIT. You achieved that by duplicating test files, but that would mean ... more files to maintain. Usually that's unpopular :(

I think that people do agree that these tests (and the LIT config) could benefit from some more refactor/re-design. That is bound to require some compromise, which, IMHO, is fine as long as we make sure that all tests are regularly run in buildbots.

-Andrzej

Given that these tests are, at the moment, really inexpensive in VLS, running 2xVLS or VLS+VLA looks to me like the least worst solution

Based on this comment, I am guessing that you assume that VLS is cheap and VLA is expensive :) That would be the case if:

  • VLS was always run natively,
  • VLA was always run in an emulator.

Actually, from what remember from both, neither of them took a particularly long time to finish. I might be wrong (it's been a while), but I half-remember native VLS/VLA being instantaneous, and emulated VLA almost instantaneous.

(under the principle that redundant testing is not as bad as incomplete testing).

Good point! That's why before proposing this change, I made sure that the SVE integration tests are enabled in upstream SVE bots: (https://reviews.llvm.org/D136460. Ultimately, we want to make sure that the VLA code-patch is well defended against any breakage and I am hoping that these bots are a strong enough defence lines :)

Just to be clear, my concern here is that, with this approach, if I modify something that breaks VLS but I'm compiling with VLA support, the integration tests won't detect the integration errors locally because the test won't run in VLS mode. That said, it's true that this is not a huge problem because those bugs will eventually be caught by bots testing VLS, and if test performance has become an issue, then I agree that VLA ^ VLS + build bot tests for each is a reasonable compromise.

Thanks for the context, this is super helpful! In fact, we have been discussing how to reduce the number of these RUN lines and what you experimented there with is one approach that we have considered. But we would like do that globally, i.e.:

  • move the test input/expected output to dedicated files,
  • move RUN lines to separate files.

And then, use CMake flags to control which RUN lines to enable/disable. AFAIK, that's currently not supported by LIT. You achieved that by duplicating test files, but that would mean ... more files to maintain. Usually that's unpopular :(

Not sure I understand this. I only duplicated the tests that could not run in VLA (because of unsupported reduction operations on Arm; at the time I asked to the internal compiler team, and those were supposed to be eventually fixed. In all the others, the only thing the tests do is to include the other source and run with VLA options. If you mean that by "duplicating test files", then yes, indeed :-)

It does feel like this is something lit should support one way or another, back then I spent quite a lot of time (way too much) trying to figure it out, and this is the best I could come up with. Do you know of anybody interested in adding a "RUN-IF" directive to lit? 😬

Actually, from what remember from both, neither of them took a particularly long time to finish. I might be wrong (it's been a while), but I half-remember native VLS/VLA being instantaneous, and emulated VLA almost instantaneous.

Thanks for clarifying. I will just add that quite a few tests have been added in the last 6-9 months, so it's possible that these would no longer be "instantaneous" for you :) (but I might be wrong)

Just to be clear, my concern here is that, with this approach, if I modify something that breaks VLS but I'm compiling with VLA support, the integration tests won't detect the integration errors locally because the test won't run in VLS mode. That said, it's true that this is not a huge problem because those bugs will eventually be caught by bots testing VLS, and if test performance has become an issue, then I agree that VLA ^ VLS + build bot tests for each is a reasonable compromise.

This is a valid concern. However:

  • I suspect that only folks with access to SVE hardware enable/test the VLA code-path anyway. So it won't be exercised as frequently as VLS anyway.
  • Any change in "core" MLIR could break something in Flang (which depends on MLIR)? Should MLIR devs run Flang tests on regular basis? Given Flang build times, that's an unrealistic expectation (caveat: I worked on Flang for quite a while). In practice, buildbots are used to capture such breakages (that was one of the reasons to set them up - there's ~8 that run on AArch64).

I'm definitely not against the flexibility that you are suggesting and agree with pretty much everything that you have said. I am just thinking that we should also be a bit more open to rely on our CI infrastructure for certain things. Especially, given these rather tricky limitations in LIT.

Thanks for the context, this is super helpful! In fact, we have been discussing how to reduce the number of these RUN lines and what you experimented there with is one approach that we have considered. But we would like do that globally, i.e.:

  • move the test input/expected output to dedicated files,
  • move RUN lines to separate files.

And then, use CMake flags to control which RUN lines to enable/disable. AFAIK, that's currently not supported by LIT. You achieved that by duplicating test files, but that would mean ... more files to maintain. Usually that's unpopular :(

Not sure I understand this. I only duplicated the tests that could not run in VLA (because of unsupported reduction operations on Arm; at the time I asked to the internal compiler team, and those were supposed to be eventually fixed. In all the others, the only thing the tests do is to include the other source and run with VLA options. If you mean that by "duplicating test files", then yes, indeed :-)

Yeah, I meant the latter. Sorry for the confusion. In general, we should work towards some basic model where: 1 test == 1 file (or, 1 file for input/output, 1 file with RUN lines).

Btw, I am aware of the limitations that you were hitting and my comment wasn't meant as criticism. Setting these tests up was a non-trivial task and we are super lucky that you paved the way for testing SVE in MLIR 🙏 😄 .

Do you know of anybody interested in adding a "RUN-IF" directive to lit? 😬

Haha, I was just about to say - this is clearly a huge a gap in LIT infrastructure and we just need some brave volunteer(s) to fill it with some magic 😂 .

Sorry about the delay!

It does feel like this is something lit should support one way or another, back then I spent quite a lot of time (way too much) trying to figure it out, and this is the best I could come up with. Do you know of anybody interested in adding a "RUN-IF" directive to lit? 😬

@jsetoain Looks like this is actually already available 😂 In https://reviews.llvm.org/D155403 I used LIT's "conditional substitution" to achieve that. I intend to rebase this change ("switch from lli to mlir-cpu-runner for SVE tests") on top of D155403 ("make sure that there are no duplicated RUN lines"). My ultimate goal is to update all integration tests to make sure that:

  • there's no duplication, i.e. that the VLS vectorisation is always run once (enabled in D155403),
  • all RUN lines use mlir-cpu-runner so that most of the logic is shared (enabled here).

WDYT?

mlir/test/CMakeLists.txt