This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Add end-to-end codegen tests for wasm_simd128.h
Closed, Public

Authored by tlively on Apr 30 2021, 10:37 PM.

Details

Reviewers
aheejin
dschuff
Summary

Add tests checking that each SIMD intrinsic produces the expected instruction.
Assembly tests are generally discouraged in clang, but in this case we actually
care about the specific instruction being generated from the intrinsics. There
are nine problems with the current intrinsic codegen, and they are marked in the
tests with FIXMEs.
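
For illustration, a minimal sketch of the kind of test being added (the RUN flags, function name, and CHECK lines here are representative, not copied from the patch):

  // RUN: %clang %s -O2 -S -target wasm32-unknown-unknown -msimd128 -o - | FileCheck %s

  #include <wasm_simd128.h>

  // CHECK-LABEL: test_i32x4_add:
  // CHECK: i32x4.add
  v128_t test_i32x4_add(v128_t a, v128_t b) {
    return wasm_i32x4_add(a, b);
  }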

Also fix the names of a few instructions to match the spec, fix the ordering of
*_const intrinsics, and add the missing wasm_i64x2_make intrinsic.

Diff Detail

Event Timeline

tlively created this revision.Apr 30 2021, 10:37 PM
tlively requested review of this revision.Apr 30 2021, 10:37 PM
Herald added projects: Restricted Project, Restricted Project.Apr 30 2021, 10:37 PM
tlively updated this revision to Diff 342121.Apr 30 2021, 10:39 PM
  • squash to include all changes
aheejin accepted this revision.May 1 2021, 7:30 PM

Wow, these are really a lot of instructions!

clang/test/Headers/wasm.c
2

Now that we have CHECK lines, we don't need this

1060–1069

Why can't this be done in a single instruction, and what is the FIXME for? Maybe a bit more explanation would help.

This revision is now accepted and ready to land.May 1 2021, 7:30 PM

Assembly tests are generally discouraged in clang, but in this case we actually care about the specific instruction being generated from the intrinsics.

I don't think this is a sound reason to add an end-to-end test in clang. The same is true of all clang tests, right? We ultimately care that accessing a parameter lowers to a certain register (because we're trying to implement a certain ABI) but we don't test that in clang - we test that we lower to certain IR which is guaranteed to lower to a certain register use - and then in LLVM we test that that IR does lower to that register.

I think the same holds true here - and a clang test should verify the IR and an LLVM test should verify the assembly.
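
For concreteness, the clang half of such a split might look like the following sketch (it assumes wasm_i32x4_add is implemented as a plain vector add; the flags and names are illustrative, not an actual test in the tree):

  // RUN: %clang %s -O2 -S -emit-llvm -target wasm32-unknown-unknown -msimd128 -o - | FileCheck %s

  #include <wasm_simd128.h>

  // CHECK-LABEL: @test_i32x4_add
  // CHECK: add <4 x i32>
  v128_t test_i32x4_add(v128_t a, v128_t b) {
    return wasm_i32x4_add(a, b);
  }

A separate LLVM test would then feed the equivalent IR to llc and check that the add lowers to i32x4.add.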

Assembly tests are generally discouraged in clang, but in this case we actually care about the specific instruction being generated from the intrinsics.

I don't think this is a sound reason to add an end-to-end test in clang. The same is true of all clang tests, right? We ultimately care that accessing a parameter lowers to a certain register (because we're trying to implement a certain ABI) but we don't test that in clang - we test that we lower to certain IR which is guaranteed to lower to a certain register use - and then in LLVM we test that that IR does lower to that register.

I think the same holds true here - and a clang test should verify the IR and an LLVM test should verify the assembly.

In order to get the benefit of this end-to-end test from split tests like that, the LLVM test would have to be automatically generated from the clang test. This wouldn't be so bad to do as long as the LLVM test also used autogenerated checks, but overall that would be extra complexity, verbosity, and indirection for no additional testing benefit, especially given that clang and LLVM are now in the same monorepo and can trivially be kept in sync. Do you think that extra complexity is worth it?

Assembly tests are generally discouraged in clang, but in this case we actually care about the specific instruction being generated from the intrinsics.

I don't think this is a sound reason to add an end-to-end test in clang. The same is true of all clang tests, right? We ultimately care that accessing a parameter lowers to a certain register (because we're trying to implement a certain ABI) but we don't test that in clang - we test that we lower to certain IR which is guaranteed to lower to a certain register use - and then in LLVM we test that that IR does lower to that register.

I think the same holds true here - and a clang test should verify the IR and an LLVM test should verify the assembly.

In order to get the benefit of this end-to-end test from split tests like that, the LLVM test would have to be automatically generated from the clang test.

Why is that? We don't do that for other test surface area between clang and LLVM.

In order to get the benefit of this end-to-end test from split tests like that, the LLVM test would have to be automatically generated from the clang test.

Why is that? We don't do that for other test surface area between clang and LLVM.

The question this test answers is "Do the intrinsic functions generate the proper WebAssembly instructions?" (Notably, the test reveals that in multiple cases, they don't). If we had separate C->IR and IR->Wasm tests, they would be able to answer this question only if we were sure that the output of the C test matched the source of the IR test, and generating the IR test from the C test would be the best way to ensure that.

I understand your point that clang tests typically do not try to answer this kind of question, but this is an important question to be able to answer for the folks working on WebAssembly SIMD. So the options I see are:

  1. Have this abnormal end-to-end test in clang.
  2. Autogenerate an IR test from the C test so the composition of tests tells us what we want to know.
  3. Host the test in some other repository.

Among those, the first is both the easiest to maintain and the most useful.

In order to get the benefit of this end-to-end test from split tests like that, the LLVM test would have to be automatically generated from the clang test.

Why is that? We don't do that for other test surface area between clang and LLVM.

The question this test answers is "Do the intrinsic functions generate the proper WebAssembly instructions?" (Notably, the test reveals that in multiple cases, they don't). If we had separate C->IR and IR->Wasm tests, they would be able to answer this question only if we were sure that the output of the C test matched the source of the IR test, and generating the IR test from the C test would be the best way to ensure that.

I understand your point that clang tests typically do not try to answer this kind of question, but this is an important question to be able to answer for the folks working on WebAssembly SIMD.

Is it fundamentally different from, or more important than, all the other questions, like ABI compatibility? In what way?

So the options I see are:

  1. Have this abnormal end-to-end test in clang.
  2. Autogenerate an IR test from the C test so the composition of tests tells us what we want to know.
  3. Host the test in some other repository.

Among those, the first is both the easiest to maintain and the most useful.

My main objection is to the testing approach itself compared to the rest of the clang/llvm test philosophy - mostly in the hopes that splitting such testing, the same as nearly all other testing, would be sufficient here. If not, it might be nice to understand what kinds of tests/properties make this approach suitable here despite it generally not being suitable for what seem like similar issues elsewhere in the compiler.

Also, other platforms seem to be OK with this sort of split testing - there's lots of testing of intrinsics (mostly in clang/test/CodeGen, rather than clang/test/Headers, by the looks of it) which, at a cursory glance, seems to generally use emit-llvm+FileCheck, not going all the way to assembly. I don't know that I've seen much discussion that this has been a problematic gap in testing for LLVM targets so far.

(All that said, if it's really needed, if there's something that makes it fundamentally different from what LLVM's done historically here, or if there's evidence that the historical approach has been problematic/costly in terms of allowing regressions, I don't mind them being in the clang test directory - though there's also the currently-being-refactored debuginfo-tests directory, which will also be for higher-level, more end-to-end tests, and these tests might be suitable there... though again, it'd be good to understand why the current testing for other targets has been inadequate or what makes WebAssembly different here.)

I think there's a clear upside to keeping this within clang/.

  1. As @tlively said, there are a large number of instructions to test, and keeping the "C function - LLVM intrinsic" and "LLVM intrinsic - Wasm instruction" tests in sync without autogeneration will be hard and error-prone.
  2. Also, it is not always the case that we have a 1-1-1 relationship between C intrinsic function, LLVM intrinsic, and Wasm instruction. For some of these we don't have our own intrinsics but rather use LLVM's common intrinsics, and they don't always boil down to a single LLVM intrinsic. There are cases where a single C function call is lowered to multiple instructions in the LLVM IR, which we try to pattern match and lower to a single Wasm instruction (see the sketch below). This kind of relationship can't be tested easily across split tests, so it would take significantly longer to check that a single C function call results in a single Wasm instruction.
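
For instance, here is a hypothetical sketch of that one-to-many-to-one case, written with clang's generic vector builtins rather than a real wasm_simd128.h function (the names and the exact pattern the backend matches are assumptions):

  #include <stdint.h>
  #include <wasm_simd128.h>

  typedef int32_t i32x4_vec __attribute__((vector_size(16)));
  typedef int32_t i32x2_vec __attribute__((vector_size(8)));
  typedef int64_t i64x2_vec __attribute__((vector_size(16)));

  v128_t extend_low(v128_t a) {
    // Selecting the two low lanes becomes a shufflevector in IR...
    i32x2_vec lo = __builtin_shufflevector((i32x4_vec)a, (i32x4_vec)a, 0, 1);
    // ...and the widening conversion becomes a sext; the Wasm backend can
    // pattern-match the pair into a single i64x2.extend_low_i32x4_s.
    return (v128_t)__builtin_convertvector(lo, i64x2_vec);
  }

A split IR test would check the shufflevector and the sext separately, but only an end-to-end test confirms that the pair actually collapses into one instruction.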

It might be good to move this to clang/test/CodeGen though, if that's more suitable.

I think there's a clear upside to keeping this within clang/.

  1. As @tlively said, there are a large number of instructions to test, and keeping the "C function - LLVM intrinsic" and "LLVM intrinsic - Wasm instruction" tests in sync without autogeneration will be hard and error-prone.

I don't necessarily see that they should be kept in sync - in the same way that we don't do this for other IR targets and other features, like ABIs - it's important that they are lowered to specific instructions, but we don't generally validate that through end-to-end tests.

  2. Also, it is not always the case that we have a 1-1-1 relationship between C intrinsic function, LLVM intrinsic, and Wasm instruction. For some of these we don't have our own intrinsics but rather use LLVM's common intrinsics, and they don't always boil down to a single LLVM intrinsic. There are cases where a single C function call is lowered to multiple instructions in the LLVM IR, which we try to pattern match and lower to a single Wasm instruction. This kind of relationship can't be tested easily across split tests, so it would take significantly longer to check that a single C function call results in a single Wasm instruction.

The lack of 1-to-1 mappings is part of the reason I prefer these tests to be separate. If one C-level intrinsic lowers to multiple IR operations, then it's good to test each of those IR operations separately from each other, so that all their uses can be validated. Then, knowing they are validated, the Clang C-to-IR test can check that suitable operations are generated, knowing that they're tested in LLVM to work as specified.

It might be good to move this to clang/test/CodeGen though, if that's more suitable.

I'm still not following why this is different from existing testing - many targets already exist in LLVM and already test their intrinsics in various ways, generally, so far as I know (though I haven't looked comprehensively - it might be worth you taking a look at existing test strategies for intrinsics to compare/contrast?). Why is WebAssembly different here?

I think there's a clear upside to keeping this within clang/.

  1. As @tlively said, there are a large number of instructions to test, and keeping the "C function - LLVM intrinsic" and "LLVM intrinsic - Wasm instruction" tests in sync without autogeneration will be hard and error-prone.

I don't necessarily see that they should be kept in sync - in the same way that we don't do this for other IR targets and other features, like ABIs - it's important that they are lowered to specific instructions, but we don't generally validate that through end-to-end tests.

What I meant by keeping in sync was: because there are many instructions, and some of them are added and deleted as we progress, making sure the two tests cover the same set of instructions without missing anything is difficult without resorting to some kind of autogeneration tool.

  2. Also, it is not always the case that we have a 1-1-1 relationship between C intrinsic function, LLVM intrinsic, and Wasm instruction. For some of these we don't have our own intrinsics but rather use LLVM's common intrinsics, and they don't always boil down to a single LLVM intrinsic. There are cases where a single C function call is lowered to multiple instructions in the LLVM IR, which we try to pattern match and lower to a single Wasm instruction. This kind of relationship can't be tested easily across split tests, so it would take significantly longer to check that a single C function call results in a single Wasm instruction.

The lack of 1-to-1 mappings is part of the reason I prefer these tests to be separate. If one C-level intrinsic lowers to multiple IR operations, then it's good to test each of those IR operations separately from each other, so that all their uses can be validated. Then, knowing they are validated, the Clang C-to-IR test can check that suitable operations are generated, knowing that they're tested in LLVM to work as specified.

We have tests in LLVM too, but they can't check whether a single C function call boils down to a single Wasm instruction. And it is not always easy to look at those patterns and come up with the reverse mapping that created them. We match and optimize a lot of patterns, some of them generated from a single C function call, but others not.

I thought maybe /some/ of the other targets used end-to-end clang tests to test intrinsics, but I can't seem to find any (they seem to be a small minority, if there are any):

grep -r -l intrin.h clang/test/ | xargs grep -L emit-llvm.*FileCheck | xargs grep RUN | less

(then looking for any RUN lines that use FileCheck but don't use emit-llvm)

Unless there's something really different about WebAssembly here, or some evidence that the existing test strategy has been problematic - I think it'd be good to stick with this general approach to testing WebAssembly's intrinsics too.

I chatted with @dblaikie offline about this just now, and we both think it makes sense to turn this particular test into a C->IR test, then later potentially add a C->Wasm end-to-end test to the cross-project-tests directory created in this WIP stack of diffs: https://reviews.llvm.org/D95339. I'll also bump the RFC thread about cross-project-tests with a pointer to this conversation to solicit more feedback about whether this kind of end-to-end intrinsic test should be in scope for cross-project-tests.

tlively closed this revision.May 3 2021, 2:55 PM
penzn added a subscriber: penzn.May 4 2021, 6:49 PM

I think there is another dimension to this aside from project composition - intrinsics have a tendency to "interact" with their surroundings, and it is better to capture the IR rather than the end result. Even if we can verify that simple calls produce the instructions we expect, this might not hold true if the arguments change or the call is in a different context. IR definitely gives more opportunities to test things thoroughly.

I think there is another dimension to this aside from project composition - intrinsics have a tendency to "interact" with their surroundings, and it is better to capture the IR rather than the end result. Even if we can verify that simple calls produce the instructions we expect, this might not hold true if the arguments change or the call is in a different context. IR definitely gives more opportunities to test things thoroughly.

Yeah, the contract that specific instructions are generated really only holds in trivial cases by design. I'm not sure how to best formalize that, though.
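
For example, an illustrative sketch (not taken from the test) of how context can change the output:

  #include <wasm_simd128.h>

  // With fully constant inputs, the optimizer is likely to fold the whole
  // expression away, so no i32x4.add appears in the output at all; the
  // "intrinsic maps to instruction" check only holds for opaque inputs.
  v128_t folded_add(void) {
    v128_t a = wasm_i32x4_make(1, 2, 3, 4);
    v128_t b = wasm_i32x4_make(5, 6, 7, 8);
    return wasm_i32x4_add(a, b); // likely emitted as a single v128.const
  }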

I think there is another dimension to this aside from project composition - intrinsics have a tendency to "interact" with their surroundings, and it is better to capture the IR rather than the end result. Even if we can verify that simple calls produce the instructions we expect, this might not hold true if the arguments change or the call is in a different context. IR definitely gives more opportunities to test things thoroughly.

Yeah, the contract that specific instructions are generated really only holds in trivial cases by design. I'm not sure how to best formalize that, though.

I don't know that much about intrinsics, but I'm happy to help with test suggestions - do you have any practical examples of this you could show, so I can see if I've got any ideas for good ways to test them?