This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Adding 64-bit version of R_WASM_MEMORY_ADDR_* relocs
ClosedPublic

Authored by aardappel on Jun 11 2020, 5:39 PM.

Details

Summary

This adds 4 new reloc types.

A lot of code that previously assumed any memory or offset values could be contained in a uint32_t (and often truncated results from functions returning 64-bit values) have been upgraded to uint64_t. This is not comprehensive: it is only the values that come in contact with the new relocation values and their dependents.

A new tablegen mapping was added to automatically upgrade loads/stores in the assembler, which otherwise has no way to select for these instructions (since they are indentical other than for the offset immediate).

Diff Detail

Event Timeline

aardappel created this revision.Jun 11 2020, 5:39 PM
Herald added a project: Restricted Project. · View Herald Transcript

I don't really grok the tablegen stuff but everything else looks pretty good to me!

lld/wasm/Driver.cpp
502 ↗(On Diff #270269)

This seems odd.

I wonder if this function should at least take an in32_t?

lld/wasm/InputChunks.cpp
89

I don't think I'm ready for these kind of autos just yet. I'd rather see what type it is.

117

ditto

lld/wasm/InputFiles.cpp
140

oops?

436

ditto

llvm/include/llvm/BinaryFormat/Wasm.h
79

How about calling this something else? Maybe ConstVal? Or just Int?

The fact that you have left all the other ones makes me less attracted to this part of the change. What would it looks like to instead leave this member of the union and instead check all accesses to ensure they match the Opcode?

llvm/lib/Object/RelocationResolver.cpp
496

But we don't support 64 bit relocs o wasm32 and vice versa, right? I mean I don't think we should.

llvm/lib/Object/WasmObjectFile.cpp
170

Removing this seems wrong.. we don't to support wider const values with this opcode.

570

ditto re: auto

1256

Again it doesn't sit right to assign to the Int64 member for I32_CONST opcode.

1353–1359

ditto.

llvm/lib/ObjectYAML/WasmYAML.cpp
423 ↗(On Diff #270269)

Maybe would make more sense if the union member had a different name.. but I'm leaning towards restoring the different members.

llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
849

Does the triple have something like is64() that might make this shorter? Or isWasm64?

aardappel marked 12 inline comments as done.Jun 12 2020, 10:06 AM
aardappel added inline comments.
lld/wasm/Driver.cpp
502 ↗(On Diff #270269)

As I mentioned below, I have tried instead your approach of explicitly checking the opcode. I hope I have caught them all. Please review :)

lld/wasm/InputChunks.cpp
89

The reason I replaced these with auto is we had a bunch of spots where we had unnecessary truncation because of a uint32_t var = exp that results in a 64-bit value. Using auto avoids these problems, now, and with future changes.

lld/wasm/InputFiles.cpp
140

how did I miss that..

llvm/include/llvm/BinaryFormat/Wasm.h
79

Ok, as you wish. This added a bunch of new if-thens to the code, please have a look if they are correct. Also see the TODO(wvo).

llvm/lib/Object/RelocationResolver.cpp
496

ok, I'll make 2 of these functions instead.

llvm/lib/Target/WebAssembly/AsmParser/WebAssemblyAsmParser.cpp
849

Using isArch64Bit now.

aardappel marked 2 inline comments as done.

Changed init expr to use all union types, and added code to check for opcode.
Other misc code review.
Fixed linker error due to tablegen adding both generated functions to same #ifdef :(
clang-format

Nice! lgtm % the extra used of auto.

I'm not sure how strictly to apply the rules here: https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable. But I feel like they confirm my hunch these are overuses. I'm not sure though. and I don't personally care too much either way. @dschuff what do you think about these uses of auto in this PR?

lld/wasm/SyntheticSections.cpp
290

I think the answer is "for functions symbols never and for data symbols alwasys".

GOT symbols represent offsets into either the table or the memory.

Happy to leave this as followup (PIC output is limited to emscripten today anyway and is likely to change in the official ABI).

as for auto, indeed its says "Use auto if and only if it makes the code more readable or easier to maintain". That was exactly my point when I mentioned "The reason I replaced these with auto is we had a bunch of spots where we had unnecessary truncation because of a uint32_t var = exp that results in a 64-bit value. Using auto avoids these problems, now, and with future changes."

I'm worried that if one interprets the "easier to maintain" argument too broadly then we end up with the "“almost always auto” situation that the guidelines specifically want to avoid.

Some argue that its good to use auto everywhere that you store the return value of another function since then you can then change the return type of an function without having the change all the callsites. I think that argument is one that llvm doesn't currently accept.

Am I misinterpreting the llvm guidelines? Do you think that auto var = SomeFunction() should generally be preferred based on your reading of the guide? Or maybe you only thinking about function that return types that have builtin lossy coercions?

@sbc100 Like I said, I think these silent truncations by using explicit types are a maintenance problem which the guidelines mention.

But it sounds like we're not going to agree on the use of auto here.. since you're working with this code more often than me, I'd be happy to change them to int64_t etc if you prefer that.

sbc100 accepted this revision.Jun 12 2020, 3:09 PM
This revision is now accepted and ready to land.Jun 12 2020, 3:09 PM
dschuff accepted this revision.Jun 12 2020, 5:22 PM
dschuff added inline comments.
llvm/lib/Target/WebAssembly/WebAssemblyInstrMemory.td
239–240

I guess this bit is just a holdover from the previous CL?

aardappel marked 2 inline comments as done.Jun 12 2020, 5:52 PM
aardappel added inline comments.
llvm/lib/Target/WebAssembly/WebAssemblyInstrMemory.td
239–240

Yes, I need to rebase this on the other CL once merged.

(Phabricator tip: You can set parent / child revisions, which can be useful in case multiple CLs are in flight and have not been landed yet :) Probably you already know that, but anyway just in case)

(Phabricator tip: You can set parent / child revisions, which can be useful in case multiple CLs are in flight and have not been landed yet :) Probably you already know that, but anyway just in case)

Better yet, Phabricator will do it for you, if you say something like "Depends on DXXXXX"!

aardappel marked an inline comment as done.Jun 15 2020, 9:46 AM

Thanks for the tips. Though frankly in this case I should have tried to make this commit independent, since significant changes in the other one made rebasing a mess.

aardappel updated this revision to Diff 270782.Jun 15 2020, 9:49 AM

Rebased against previous commit

Made dedicated supportsWasm64/resolveWasm64 rather than sharing the Wasm32 versions.

thakis added a subscriber: thakis.Jun 15 2020, 10:18 AM
thakis added inline comments.
llvm/lib/Target/WebAssembly/TargetInfo/WebAssemblyTargetInfo.cpp
40

This is a bit awkward. It makes WebAssembly the only target that has its TargetInfo dir depend on a llvm-tblgen generated file. It doesn't cause actual issues, but it's irregular.

Make of this what you will :)

This revision was automatically updated to reflect the committed changes.
aardappel marked an inline comment as done.
aardappel marked an inline comment as done.Jun 15 2020, 10:38 AM
aardappel added inline comments.
llvm/lib/Target/WebAssembly/TargetInfo/WebAssemblyTargetInfo.cpp
40

I'd be happy to move this something else.. we just don't have any other .cpp that is shared between CodeGen and MC. I could add a new lib that both depend on? @dschuff

aardappel marked an inline comment as done.Jun 15 2020, 10:59 AM
aardappel added inline comments.
llvm/lib/Target/WebAssembly/TargetInfo/WebAssemblyTargetInfo.cpp
40

Actually, the cleaner solution would be to have tablegen (or whatever creates the instruction mappings) not stick both generated functions in the same #ifdef, which is the root cause of this having to sit in a shared location in the first place. Or make the functions static. But that potentially affects other targets, etc.

thakis added inline comments.Jul 6 2020, 5:48 PM
llvm/lib/Target/WebAssembly/TargetInfo/WebAssemblyTargetInfo.cpp
40

This came back to bite me today (or, well, actually, a few weeks ago, but I didn't notice until today): In the GN build (which isn't supported and which you don't have to care about; it's just FYI, nothing to act on), the .inc files get generated in the Target sub directory they best fit into. Due to this change, I had moved WebAssemblyGenInstrInfo.inc from WebAssembly/MCTargetDesc to WebAssembly/TargetInfo. I didn't realize that this had the effect that I now had a correct WebAssemblyGenInstrInfo.inc and a stale WebAssemblyGenInstrInfo.inc in different directories in build dirs on the bots, and they happened to pick up the stale one. Since the .inc include is unqualified, a wasm .td change today broke my bots. The hack fix was to delete the stale .inc file.

The cmake build puts all target llvm-tblgen output in lib/Target/$targetname so it's not a problem there (the GN build should do that too), but I thought it's a nice example of how "this looks a bit funny" turned into an actual problem down the road.

Again, nothing for you to do here :)

aardappel marked an inline comment as done.Jul 6 2020, 5:58 PM
aardappel added inline comments.
llvm/lib/Target/WebAssembly/TargetInfo/WebAssemblyTargetInfo.cpp
40

Sorry about that.. I wouldn't mind fixing this, question is what is the best fix? Do you know of a way to have tablegen generate one file per instruction mapping?