This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Legalize vector types by widening
ClosedPublic

Authored by tlively on Aug 4 2021, 2:50 PM.

Details

Summary

The default legalization of unsupported vector types is to promote the integers
in each lane, which leads to extra sign or zero extending and masking when
moving data into and out of vectors. Switch our preferred type legalization from
the default to vector widening, which keeps the data in the low lanes of the
vector rather than in the low bits of each lane. The unused high lanes can be
ignored.

Half-wide vectors are now loaded from memory into the low 64 bits of the v128
rather than spread out among the lanes. As a result, v128.load64_splat is a much
more common operation, so add new patterns to support it.

Diff Detail

Event Timeline

tlively created this revision.Aug 4 2021, 2:50 PM
tlively requested review of this revision.Aug 4 2021, 2:50 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 4 2021, 2:50 PM

Looks very nice! It looks it will have a significant impact on the performance. Are there possibly code patterns that will suffer from this change? If not, I wonder why was the default option in TargetLowering set up that way.. Maybe other architectures have more benefits when doing integer promotion? Anyway I feel I don't fully understand things yet so I added some questions.

And why were the two tests deleted?

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.h
122–124

Why is the exception?

llvm/lib/Target/WebAssembly/WebAssemblyInstrSIMD.td
199–205
  • If what we want is to insert the element in the index 0 of the new vector, why is the splat necessary?
  • Why is only load64_splat necessary? What happens when we widen i8x4 into i32x4? In this case do we need load32_splat? Is widening only happens by a factor of 2?
llvm/test/CodeGen/WebAssembly/simd-load-store-alignment.ll
297

It looks this actually changes the result being returned compared to the previously generate code. Is that fine?

llvm/test/CodeGen/WebAssembly/simd-offset.ll
926

Now that we are doing this, are the narrowing store patterns added in D84377 necessary?

tlively added a subscriber: srj.Aug 17 2021, 8:19 PM
tlively updated this revision to Diff 367102.Aug 17 2021, 8:42 PM
  • Remove obsolete narrowing store support
  • Widen vector types more selectively
  • Move getPreferredVectorAction impl from .h to .cpp
tlively updated this revision to Diff 367104.Aug 17 2021, 8:53 PM
  • Switch from using load64_splat to load64_zero to load half-wide vectors
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.h
122–124

The target-independent code that figures out how to legalize types[1] tries the strategies in this order:

  1. PromoteInteger (i.e. make individual lanes types wider)
  2. WidenVector (i.e. keep the lane types but add more lanes)
  3. Split or Scalarize

The preferred vector action says which strategy to start with, but if that fails, subsequent strategies will still be tried. The important part here is that strategies prior to the preferred strategy will not be tried. So if we set the preferred action to be WidenVector for all vector types, whenever that fails the vector will be split or scalarized instead of integer-promoted. Since we don't have any legal i1 vector types, this means that all i1 vectors would be scalarized without this exception, leading to terrible codegen.

Thinking on this more, it seems that we should actually be more selective and opt into WidenVector as the preferred action only for i8, i16, i32, i64, f32, and f64 vectors, since those are the only vectors for which WidenVector can possibly succeed.

[1] https://github.com/llvm/llvm-project/blob/a7ebc4d145892fd22442832549cb12c4b6920dea/llvm/lib/CodeGen/TargetLoweringBase.cpp#L1403-L1499

llvm/lib/Target/WebAssembly/WebAssemblyInstrSIMD.td
199–205
  • If what we want is to insert the element in the index 0 of the new vector, why is the splat necessary?

Good point, I guess we could use v128.load64_zero here instead.

  • Why is only load64_splat necessary? What happens when we widen i8x4 into i32x4? In this case do we need load32_splat? Is widening only happens by a factor of 2?

We don't have any tests that cover those cases right now, but yes, I think we would need load32_splat or load32_zero in that case. Note that we could go even smaller and have e.g. v2i8. Unfortunately load16_zero does not exist, so there load16_splat would probably be the best option. We could also use load16_lane, but that would require an extra zero input vector. Since we don't have test cases that depend on these additional patterns right now, I would like to address them in a follow-up.

llvm/test/CodeGen/WebAssembly/simd-load-store-alignment.ll
297

This is expected because the way we are representing these vector types has changed. It is an ABI break for vector code, but I hope that will be ok in practice. Perhaps it's worth mentioning in the LLVM release notes?

@dschuff, do you have thoughts here?

@srj, will that cause problems for Halide?

llvm/test/CodeGen/WebAssembly/simd-offset.ll
926

Nice catch! It looks like that can all be removed now.

Looks very nice! It looks it will have a significant impact on the performance. Are there possibly code patterns that will suffer from this change? If not, I wonder why was the default option in TargetLowering set up that way.. Maybe other architectures have more benefits when doing integer promotion? Anyway I feel I don't fully understand things yet so I added some questions.

Thanks! I haven't seen any code patterns that looks worse after this change, but it's possible it exposes new missed opportunities where we don't have patterns in place yet, like with the scalar_to_vector stuff. I'm not sure why the defaults are set up this way, either.

And why were the two tests deleted?

I deleted the two regression tests because after this change the code paths that contained their relevant bugs are either no longer used at all (e.g. simd-nonconst-sext.ll) or have become well-tested by other tests (e.g. simd-scalar-to-vector.ll). In both cases, the regression tests no longer seemed useful to keep around.

dschuff added inline comments.Aug 18 2021, 2:40 PM
llvm/test/CodeGen/WebAssembly/simd-load-store-alignment.ll
297

There are sort of 2 issues here.
One is that our stable ABI is actually a C ABI, and not an LLVM ABI. I forget whether our C ABI (or other C ABIs) actually even defines the convention for vector types, but maybe it should (even if we declare it unstable). It's probably fair to say that we've not promised/stabilized a C vector ABI yet, so if this is a break of some part of the C ABI, I don't know that I'm too worried about it.
The other question is whether we want to define and/or stabilize an LLVM ABI, which is of course broader than just C due to the richer (and simultaneously less rich) type system. My sense would be probably not, at least for everything (I don't know of any other platforms that do this, but I haven't thought about it recently). It probably would still be good to inform/consult stakeholders such as Halide (and Rust?) though.

tlively added inline comments.Aug 18 2021, 2:51 PM
llvm/test/CodeGen/WebAssembly/simd-load-store-alignment.ll
297

This ABI change does affect the C ABI via the C vector extensions, but our documented C ABI does not say anything about those vector extensions at all, so it sounds like this is not a problem.

aheejin accepted this revision.Aug 18 2021, 5:26 PM

LGTM, thanks!

This revision is now accepted and ready to land.Aug 18 2021, 5:26 PM
dschuff added inline comments.Aug 18 2021, 5:51 PM
llvm/test/CodeGen/WebAssembly/simd-load-store-alignment.ll
297

Probably it should say something, even if that something is "vector extensions are not considered part of the stable ABI"

This revision was automatically updated to reflect the committed changes.
llvm/test/CodeGen/WebAssembly/simd-offset.ll