This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Support for WebAssembly globals in LLVM IR
ClosedPublic

Authored by wingo on Apr 30 2021, 2:34 AM.

Details

Summary

This patch adds support for WebAssembly globals in LLVM IR, representing
them as pointers to global values, in a non-default, non-integral
address space. Instruction selection legalizes loads and stores to
these pointers to new WebAssemblyISD nodes GLOBAL_GET and GLOBAL_SET.
Once the lowering creates the new nodes, tablegen pattern matches those
and converts them to Wasm global.get/set of the appropriate type.

Based on work by Paulo Matos in https://reviews.llvm.org/D95425.

Diff Detail

Event Timeline

wingo created this revision.Apr 30 2021, 2:34 AM
wingo requested review of this revision.Apr 30 2021, 2:34 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 30 2021, 2:34 AM
wingo added a comment.Apr 30 2021, 2:39 AM

Context here is that I realized that in D95425, that we could use a single address space for globals of any type, and that it might make sense to do so for "ordinary" globals like i32 etc. Then we can express adding support for funcref / externref on top of this. WDYT @pmatos ?

wingo updated this revision to Diff 341868.Apr 30 2021, 6:12 AM

Fix clang datalayout expectations

Herald added a project: Restricted Project. · View Herald TranscriptApr 30 2021, 6:12 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Nice! I think that splitting this out of the change that adds reference types is a good idea. Regarding the FIXME in the test, is it the case that the globals are also not emitted in the binary format, or is it just the assembly output that is broken?

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1324

I'm not sure it's necessary to create a MERGE_NODES node here, since the the GlobalGet already contains the chain. For a similar example, see how LOAD_SPLAT nodes are created and returned.

llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.td
268

In other places we have used the abbreviation rc as the name for WebAssemblyRegClass parameters, but I don't know if we do that everywhere.

328

Ahhhh, I did not know you could do this! Very cool.

pmatos accepted this revision.Apr 30 2021, 12:12 PM

This looks good to me - I will rebase D95425 on this.

This revision is now accepted and ready to land.Apr 30 2021, 12:12 PM

A quick note - apparently phab felt that since I accepted the revision, it's now ready to land. Probably obvious but I accepted expecting @tlively comments to be addressed first. :)

wingo added 1 blocking reviewer(s): tlively.May 4 2021, 4:14 AM
This revision now requires review to proceed.May 4 2021, 4:14 AM
wingo updated this revision to Diff 342719.May 4 2021, 7:02 AM

Rebase on main

wingo marked 3 inline comments as done.May 5 2021, 12:51 AM
wingo added inline comments.
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1324

I think it is

1324

Aaaah thanks for this tip! Took some finagling to find the right formula but I got it now.

llvm/lib/Target/WebAssembly/WebAssemblyInstrInfo.td
268

Good tip; apparently it's consistent except in WebAssemblyInstrRef.td, where I had initiailly looked. Will go ahead and fix that one too.

wingo updated this revision to Diff 342963.May 5 2021, 12:54 AM
wingo marked 2 inline comments as done.

Address feedback

wingo updated this revision to Diff 342970.May 5 2021, 2:04 AM

Pull in fixes from D101140

sbc100 added a comment.May 5 2021, 8:49 AM

Does this mean that the magic __stack_pointer global can be referenced at the IR level? I wonder if there are some hacks around handling of __stack_pointer that could be removed after this lands?

llvm/lib/Target/WebAssembly/Utils/WebAssemblyUtilities.h
40 ↗(On Diff #342970)

What does "object" mean here? Are we just talking about reference types? Or also wasm globals that hold integers (like __stack_pointer). If its just ref types that live in this address space should this be called WASM_ADDRESS_SPACE_ANYREF? If its the latter should this be called WASM_ADDRESS_SPACE_WASM_GLOBAL?

llvm/lib/Target/WebAssembly/WebAssemblyISD.def
48

Is this just for ref types or also for global that hold integers too (like __stack_pointer)

llvm/test/CodeGen/WebAssembly/global-get.ll
14

I don't suppose there is any way to give addrspace(1) a symbolic name is there?

sbc100 added inline comments.May 5 2021, 10:59 AM
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1378

Is it worth checking this is within WasmAddressSpace range?

LGTM with Sam's comments resolved!

llvm/lib/Target/WebAssembly/Utils/WebAssemblyUtilities.h
40 ↗(On Diff #342970)

I was also wondering about the best name here because OBJECT is somewhat vague. I think the idea is that this address space can be used for arbitrary Wasm globals of any type, but it could also be used later for things like additional tables and memories. It's unclear whether those would need separate address spaces for some reason, but if they don't, re-using this address space 1 would be best.

Maybe WASM_ADDRESS_SPACE_STATIC would be a better name because it will be used for things that are given static indices in the final module?

llvm/lib/Target/WebAssembly/WebAssemblyISD.def
48

Yep, it's for arbitrary types (see the tests), so this comment should be updated.

llvm/test/CodeGen/WebAssembly/global-get.ll
14

Not that I know of, unfortunately :(

sbc100 added inline comments.May 5 2021, 3:26 PM
llvm/lib/Target/WebAssembly/Utils/WebAssemblyUtilities.h
40 ↗(On Diff #342970)

I was also wondering about the best name here because OBJECT is somewhat vague. I think the idea is that this address space can be used for arbitrary Wasm globals of any type, but it could also be used later for things like additional tables and memories. It's unclear whether those would need separate address spaces for some reason, but if they don't, re-using this address space 1 would be best.

Maybe WASM_ADDRESS_SPACE_STATIC would be a better name because it will be used for things that are given static indices in the final module?

But that is also true for static data symbols that point memory addresess.

tlively added inline comments.May 5 2021, 3:28 PM
llvm/lib/Target/WebAssembly/Utils/WebAssemblyUtilities.h
40 ↗(On Diff #342970)

Hmm, maybe WASM_ADDRESS_SPACE_MODULE_ELEMENT?

sbc100 added inline comments.May 5 2021, 3:33 PM
llvm/lib/Target/WebAssembly/Utils/WebAssemblyUtilities.h
40 ↗(On Diff #342970)

The address space is the space of wasm globals right? So I feel like it should probably contain the word global... shouldn't it?

tlively added inline comments.May 5 2021, 3:45 PM
llvm/lib/Target/WebAssembly/Utils/WebAssemblyUtilities.h
40 ↗(On Diff #342970)

No, see my previous comment. It's more general than just globals. Tables and Memories may also reside in this address space in the future. That being said, I guess we could just name it GLOBAL for now and change the name once we get table and memory support.

Although now that I think about it, https://reviews.llvm.org/D101140 also uses this address space for locals, so it's even more general than I was thinking.

sbc100 added inline comments.May 5 2021, 4:06 PM
llvm/lib/Target/WebAssembly/Utils/WebAssemblyUtilities.h
40 ↗(On Diff #342970)

No, see my previous comment. It's more general than just globals. Tables and Memories may also reside in this address space in the future. That being said, I guess we could just name it GLOBAL for now and change the name once we get table and memory support.

Although now that I think about it, https://reviews.llvm.org/D101140 also uses this address space for locals, so it's even more general than I was thinking.

But are those locals backed by global.get and global.set?

OK well I don't feel so strongly about it then. I had assumed that address space corresponded to wasm globals...

tlively added inline comments.May 5 2021, 4:14 PM
llvm/lib/Target/WebAssembly/Utils/WebAssemblyUtilities.h
40 ↗(On Diff #342970)

No, those locals turn into normal locals backed by local.get and local.set. In that case, the address space is used in the type of alloca instructions.

I guess maybe OBJECT is an appropriately general label for such a general address space. Looking forward to seeing if @wingo has any better ideas!

sbc100 added a comment.May 5 2021, 4:35 PM

How about WASM_ADDRESS_SPACE_NON_INTEGRAL or WASM_ADDRESS_SPACE_OPAQUE ?

wingo marked an inline comment as done.May 6 2021, 3:22 AM

Thanks for the review!

Regarding the address space name -- the intention with "object" was to rhyme a bit with "object capabilities", somehow, but I see that it is confusing.

How about WASM_ADDRESS_SPACE_MANAGED ? I.e. this address space is for "managed" variables. Doesn't mean that the variables are under GC control (though they may be), but rather that the wasm implementation manages their storage itself, instead of having them be in linear memory. Will post a renamed patch, lmk what you think.

wingo added inline comments.May 6 2021, 3:24 AM
llvm/lib/Target/WebAssembly/WebAssemblyISD.def
48

FWIW I think you would be able to define __stack_pointer from IR via

@__stack_pointer = external addrspace(1) global i32

and then to use it:

%current_sp = load i32, i32 addrspace(1)* @__stack_pointer
wingo updated this revision to Diff 343344.May 6 2021, 3:27 AM

Rename "object" address space to "managed"; address feedback.

wingo updated this revision to Diff 343360.May 6 2021, 4:43 AM

yarr, shiver me timbers, fix me typos

tlively accepted this revision.May 6 2021, 10:22 AM

This LGTM! MANAGED sounds like a good address space name to me. @sbc100, do you have any final comments?

This revision is now accepted and ready to land.May 6 2021, 10:22 AM
sbc100 accepted this revision.May 6 2021, 10:30 AM

managed seems reasonable yes.

tlively added inline comments.May 6 2021, 10:31 AM
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1280

Actually, should we enforce that these LLVM IR globals be thread_local, since the resulting Wasm globals will be thread_local? I don't know if that will affect any optimizations, but it seems like a more accurate modeling. If we do that, CoalesceLocalsAndStripAtomics::stripThreadLocals in WebAssemblyTargetMachine.cpp will have to be updated to not strip the thread locals corresponding to Wasm globals.

1280

I'd be fine handling this in a follow-up if you want to get this landed.

sunfish added inline comments.May 6 2021, 10:43 AM
llvm/lib/Target/WebAssembly/Utils/WebAssemblyUtilities.h
39 ↗(On Diff #343360)

Sorry to throw more paint at the bikeshed here, but as someone who's only following along at a high-level here, I found it confusing whether this is talking about the wasm globals themselves, or the objects referred to by reference values in the wasm globals. I think the feature here is talking about the wasm globals themselves, but "managed" initially made me think it might be talking about the objects they reference, which in a browser context especially are "managed" in every sense of the word.

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1280

Note that this will be true of Worker-based threads, but not of the expected future "wasm native threads". I wonder if this means that clang/llvm will (eventually) need to be aware of this difference.

tlively added inline comments.May 6 2021, 10:54 AM
llvm/lib/Target/WebAssembly/Utils/WebAssemblyUtilities.h
39 ↗(On Diff #343360)

Fair point. What does everyone think about INDEXED, because it is used to represent objects given static indexes (in the WebAssembly sense) in the final binary.

sbc100 added inline comments.May 6 2021, 11:18 AM
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1280

Don't you think is very likely that even in "wasm native threads" we would want to opt into sharing like we do with wasm memories? That path would seem better for compatibility with existing worker-based threading.

Obviously once we do have shared wasm globals we will need to modify llvm's assumptions, but for now I think it is a reasonable assumption/assertion that wasm globals are TLS.

sunfish added inline comments.May 6 2021, 11:23 AM
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1280

Yeah, I agree we'll likely want some way to opt into sharing. So the difference here is, worker-based threads won't have the option of sharing, so maybe requiring thread_local makes sense. But native threads could opt into sharing, so they could be thread_local or not.

sbc100 added inline comments.May 6 2021, 2:41 PM
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1280

Yup. And who know .. maybe by the time shared globals and native threads are added, Workers might also add shared globals?

sunfish added inline comments.May 6 2021, 3:17 PM
llvm/lib/Target/WebAssembly/Utils/WebAssemblyUtilities.h
39 ↗(On Diff #343360)

Do I understand correctly that global variables and local variables are being assigned addresses within the same conceptual address space here?

How about WASM_VARIABLES or WASM_VARS? The wasm spec terms for these are global variables and local variables.

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp
1280

It's possible :-). In any case, I don't think it's anything we need to do anything about right now.

tlively added inline comments.May 6 2021, 5:22 PM
llvm/lib/Target/WebAssembly/Utils/WebAssemblyUtilities.h
39 ↗(On Diff #343360)

Sure, that works for me. We may want to rename it in the future if we end up using the same address space for tables and memories, but we can cross that bridge when we get there.

sunfish added inline comments.May 6 2021, 5:38 PM
llvm/lib/Target/WebAssembly/Utils/WebAssemblyUtilities.h
39 ↗(On Diff #343360)

Tables and memories have their own index spaces in wasm, so it seems like they'd want their own address spaces here, perhaps 2 and 3, respectively? I also agree that we can figure this out later.

wingo added inline comments.May 10 2021, 12:26 AM
llvm/lib/Target/WebAssembly/Utils/WebAssemblyUtilities.h
39 ↗(On Diff #343360)

WASM_VARS is fine with me, with one caveat -- in https://reviews.llvm.org/D101140 where we add support for locals, we add a new "core" TargetStackID enum which describes static alloca that is actually a wasm local. Really this enum describes objects allocated on a parallel stack, outside of linear memory. It was nice to use the same name to describe the address space and the stack id, but I don't think variable or wasm-var are good core enum names. It feels like there is a core concept here for systems that have two stacks -- one of named objects and one of linear-memory bytes. I thought object or managed could be a good name for the former, but evidently not :)

I will let this one sit overnight; if there are no other good ideas before tomorrow I will do WASM_ADDRESS_SPACE_WASM_VARS and either find a new name for TargetStackID or attempt to use a target-specific definition.

wingo updated this revision to Diff 344320.May 11 2021, 1:39 AM

Rename "managed" address space to "wasm vars"

This revision was landed with ongoing or failed builds.May 11 2021, 2:24 AM
This revision was automatically updated to reflect the committed changes.