This is an archive of the discontinued LLVM Phabricator instance.

[WebAssembly] Restore defaults for stores per memop
ClosedPublic

Authored by tlively on Sep 16 2019, 4:13 PM.

Details

Reviewers

aheejin
alexcrichton

Commits

rGdbcd7f560270: [WebAssembly] Restore defaults for stores per memop
rL372275: [WebAssembly] Restore defaults for stores per memop

Summary

Large slowdowns were observed in Rust due to many small,
constant-sized copies in conjunction with poorly-optimized memory.copy
implementations. Since memory.copy cannot be expected to be inlined
efficiently by engines at this time, stop using it for the smallest
copies. We continue to lower all memcpy intrinsics to memory.copy,
though.

Diff Detail

Repository

rG LLVM Github Monorepo

Build Status

Buildable 38184
Build 38183: arc lint + arc unit

Event Timeline

tlively created this revision. Sep 16 2019, 4:13 PM

Herald added a project: Restricted Project. Sep 16 2019, 4:13 PM

Herald added subscribers: llvm-commits, sunfish, JDevlieghere and 4 others.

Harbormaster completed remote builds in B38184: Diff 220401. Sep 16 2019, 4:15 PM

What are the default values for those if we unset them?

This revision is now accepted and ready to land. Sep 16 2019, 5:22 PM

I've tested this patch by compiling the motivating test case for the original performance issue for Firefox, and performance is back to what it was in LLVM 8. From my perspective, at least, this does the trick!

In D67639#1671921, @aheejin wrote:

What are the default values for those if we unset them?

8 stores normally or 4 stores when optimizing for size. If SIMD is enabled that will be 128 and 64 bytes, respectively, or else it will be 64 and 32 bytes.
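The byte figures in the reply above follow from multiplying the store limit by the widest available store. The following is only a sketch of that arithmetic: `maxInlineBytes` is a hypothetical helper, not an LLVM API, and the store widths (16 bytes for a v128 store with SIMD, 8 bytes for an i64 store without) are assumptions chosen to match the numbers quoted in the reply.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an LLVM API): the most bytes that `maxStores`
// stores of `storeWidthBytes` each can cover.
constexpr std::uint64_t maxInlineBytes(std::uint64_t maxStores,
                                       std::uint64_t storeWidthBytes) {
  return maxStores * storeWidthBytes;
}

// Assumed widest store widths: v128 with SIMD enabled, i64 otherwise.
constexpr std::uint64_t kSimdStoreBytes = 16;
constexpr std::uint64_t kScalarStoreBytes = 8;

// Re-checks the four figures quoted in the review reply.
inline void checkQuotedDefaults() {
  assert(maxInlineBytes(8, kSimdStoreBytes) == 128);   // SIMD, normal
  assert(maxInlineBytes(4, kSimdStoreBytes) == 64);    // SIMD, optimizing for size
  assert(maxInlineBytes(8, kScalarStoreBytes) == 64);  // no SIMD, normal
  assert(maxInlineBytes(4, kScalarStoreBytes) == 32);  // no SIMD, optimizing for size
}
```

Under these assumptions, the 100-byte copies in the updated bulk-memory.ll tests exceed the 64-byte non-SIMD limit, while the previous 10-byte copies would not. That would explain why the test sizes had to grow for the CHECK lines to keep expecting memory.copy and memory.fill.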

Closed by commit rL372275: [WebAssembly] Restore defaults for stores per memop (authored by tlively). Sep 18 2019, 4:17 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                                       Size
llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp    10 lines
llvm/test/CodeGen/WebAssembly/bulk-memory.ll               40 lines

Diff 220401

llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp

@@ WebAssemblyTargetLowering::WebAssemblyTargetLowering(
   setOperationAction(ISD::TRAP, MVT::Other, Legal);

   // Exception handling intrinsics
   setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::Other, Custom);
   setOperationAction(ISD::INTRINSIC_VOID, MVT::Other, Custom);

   setMaxAtomicSizeInBitsSupported(64);

-  if (Subtarget->hasBulkMemory()) {
-    // Use memory.copy and friends over multiple loads and stores
-    MaxStoresPerMemcpy = 1;
-    MaxStoresPerMemcpyOptSize = 1;
-    MaxStoresPerMemmove = 1;
-    MaxStoresPerMemmoveOptSize = 1;
-    MaxStoresPerMemset = 1;
-    MaxStoresPerMemsetOptSize = 1;
-  }
-
   // Override the __gnu_f2h_ieee/__gnu_h2f_ieee names so that the f32 name is
   // consistent with the f64 and f128 names.
   setLibcallName(RTLIB::FPEXT_F16_F32, "__extendhfsf2");
   setLibcallName(RTLIB::FPROUND_F32_F16, "__truncsfhf2");

   // Define the emscripten name for return address helper.
   // TODO: when implementing other WASM backends, make this generic or only do
   // this on emscripten depending on what they end up doing.

llvm/test/CodeGen/WebAssembly/bulk-memory.ll

 ; BULK-MEM-NEXT: memory.fill 0, $0, $1, $pop[[L0]]
 ; BULK-MEM-NEXT: return
 define void @memset_1024(i8* %dest, i8 %val) {
   call void @llvm.memset.p0i8.i32(i8* %dest, i8 %val, i32 1024, i1 0)
   ret void
 }

 ; The following tests check that frame index elimination works for
-; bulk memory instructions. The stack pointer is bumped by 16 instead
-; of 10 because the stack pointer in WebAssembly is currently always
+; bulk memory instructions. The stack pointer is bumped by 112 instead
+; of 100 because the stack pointer in WebAssembly is currently always
 ; 16-byte aligned, even in leaf functions, although it is not written
 ; back to the global in this case.

 ; TODO: Change TransientStackAlignment to 1 to avoid this extra
 ; arithmetic. This will require forcing the use of StackAlignment in
 ; PrologEpilogEmitter.cpp when
 ; WebAssemblyFrameLowering::needsSPWriteback would be true.

 ; CHECK-LABEL: memcpy_alloca_src:
 ; NO-BULK-MEM-NOT: memory.copy
 ; BULK-MEM-NEXT: .functype memcpy_alloca_src (i32) -> ()
 ; BULK-MEM-NEXT: global.get $push[[L0:[0-9]+]]=, __stack_pointer
-; BULK-MEM-NEXT: i32.const $push[[L1:[0-9]+]]=, 16
+; BULK-MEM-NEXT: i32.const $push[[L1:[0-9]+]]=, 112
 ; BULK-MEM-NEXT: i32.sub $push[[L2:[0-9]+]]=, $pop[[L0]], $pop[[L1]]
-; BULK-MEM-NEXT: i32.const $push[[L3:[0-9]+]]=, 6
+; BULK-MEM-NEXT: i32.const $push[[L3:[0-9]+]]=, 12
 ; BULK-MEM-NEXT: i32.add $push[[L4:[0-9]+]]=, $pop[[L2]], $pop[[L3]]
-; BULK-MEM-NEXT: i32.const $push[[L5:[0-9]+]]=, 10
+; BULK-MEM-NEXT: i32.const $push[[L5:[0-9]+]]=, 100
 ; BULK-MEM-NEXT: memory.copy 0, 0, $0, $pop[[L4]], $pop[[L5]]
 ; BULK-MEM-NEXT: return
 define void @memcpy_alloca_src(i8* %dst) {
-  %a = alloca [10 x i8]
-  %p = bitcast [10 x i8]* %a to i8*
-  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dst, i8* %p, i32 10, i1 false)
+  %a = alloca [100 x i8]
+  %p = bitcast [100 x i8]* %a to i8*
+  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dst, i8* %p, i32 100, i1 false)
   ret void
 }

 ; CHECK-LABEL: memcpy_alloca_dst:
 ; NO-BULK-MEM-NOT: memory.copy
 ; BULK-MEM-NEXT: .functype memcpy_alloca_dst (i32) -> ()
 ; BULK-MEM-NEXT: global.get $push[[L0:[0-9]+]]=, __stack_pointer
-; BULK-MEM-NEXT: i32.const $push[[L1:[0-9]+]]=, 16
+; BULK-MEM-NEXT: i32.const $push[[L1:[0-9]+]]=, 112
 ; BULK-MEM-NEXT: i32.sub $push[[L2:[0-9]+]]=, $pop[[L0]], $pop[[L1]]
-; BULK-MEM-NEXT: i32.const $push[[L3:[0-9]+]]=, 6
+; BULK-MEM-NEXT: i32.const $push[[L3:[0-9]+]]=, 12
 ; BULK-MEM-NEXT: i32.add $push[[L4:[0-9]+]]=, $pop[[L2]], $pop[[L3]]
-; BULK-MEM-NEXT: i32.const $push[[L5:[0-9]+]]=, 10
+; BULK-MEM-NEXT: i32.const $push[[L5:[0-9]+]]=, 100
 ; BULK-MEM-NEXT: memory.copy 0, 0, $pop[[L4]], $0, $pop[[L5]]
 ; BULK-MEM-NEXT: return
 define void @memcpy_alloca_dst(i8* %src) {
-  %a = alloca [10 x i8]
-  %p = bitcast [10 x i8]* %a to i8*
-  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %p, i8* %src, i32 10, i1 false)
+  %a = alloca [100 x i8]
+  %p = bitcast [100 x i8]* %a to i8*
+  call void @llvm.memcpy.p0i8.p0i8.i32(i8* %p, i8* %src, i32 100, i1 false)
   ret void
 }

 ; CHECK-LABEL: memset_alloca:
 ; NO-BULK-MEM-NOT: memory.fill
 ; BULK-MEM-NEXT: .functype memset_alloca (i32) -> ()
 ; BULK-MEM-NEXT: global.get $push[[L0:[0-9]+]]=, __stack_pointer
-; BULK-MEM-NEXT: i32.const $push[[L1:[0-9]+]]=, 16
+; BULK-MEM-NEXT: i32.const $push[[L1:[0-9]+]]=, 112
 ; BULK-MEM-NEXT: i32.sub $push[[L2:[0-9]+]]=, $pop[[L0]], $pop[[L1]]
-; BULK-MEM-NEXT: i32.const $push[[L3:[0-9]+]]=, 6
+; BULK-MEM-NEXT: i32.const $push[[L3:[0-9]+]]=, 12
 ; BULK-MEM-NEXT: i32.add $push[[L4:[0-9]+]]=, $pop[[L2]], $pop[[L3]]
-; BULK-MEM-NEXT: i32.const $push[[L5:[0-9]+]]=, 10
+; BULK-MEM-NEXT: i32.const $push[[L5:[0-9]+]]=, 100
 ; BULK-MEM-NEXT: memory.fill 0, $pop[[L4]], $0, $pop[[L5]]
 ; BULK-MEM-NEXT: return
 define void @memset_alloca(i8 %val) {
-  %a = alloca [10 x i8]
-  %p = bitcast [10 x i8]* %a to i8*
-  call void @llvm.memset.p0i8.i32(i8* %p, i8 %val, i32 10, i1 false)
+  %a = alloca [100 x i8]
+  %p = bitcast [100 x i8]* %a to i8*
+  call void @llvm.memset.p0i8.i32(i8* %p, i8 %val, i32 100, i1 false)
   ret void
 }
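
The constants in the CHECK lines of bulk-memory.ll (112 and 12 after this change, 16 and 6 before) are consistent with rounding the alloca size up to the 16-byte stack alignment mentioned in the test comment. The sketch below reproduces only that arithmetic. `alignUp`, `FrameLayout`, and `layoutSingleAlloca` are hypothetical names, and placing the object at offset "frame size minus object size" is an assumption inferred from the expected constants, not taken from the frame lowering code.

```cpp
#include <cassert>
#include <cstdint>

// Round `size` up to the next multiple of `align` (a power of two).
constexpr std::uint64_t alignUp(std::uint64_t size, std::uint64_t align) {
  return (size + align - 1) & ~(align - 1);
}

// Hypothetical model of a frame that holds a single alloca.
struct FrameLayout {
  std::uint64_t frameSize;     // amount subtracted from __stack_pointer
  std::uint64_t objectOffset;  // offset added to reach the alloca
};

// Assumption: 16-byte stack alignment, object placed at the top of the frame.
constexpr FrameLayout layoutSingleAlloca(std::uint64_t allocaBytes) {
  return FrameLayout{alignUp(allocaBytes, 16),
                     alignUp(allocaBytes, 16) - allocaBytes};
}
```

For a 100-byte alloca this model gives a frame size of 112 and an offset of 12. For the previous 10-byte alloca it gives 16 and 6. Both match the i32.const operands the tests expect.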