Page MenuHomePhabricator

Set hasSideEffects=0 for TargetOpcode::BUNDLE
AbandonedPublic

Authored by asb on Aug 28 2017, 1:28 PM.

Details

Summary

This patch builds on top of D37065.

The test changes definitely need someone else to look over them - I'm not familiar with AMDGPU assembly. I need to return to call-argument-types.ll, I just couldn't seem to get it to pass despite the fact the test patterns seemed fine in relation to the generated output (I must be missing something obvious, will look again).

Diff Detail

Event Timeline

asb created this revision.Aug 28 2017, 1:28 PM

As a side note, in case anybody is wondering: the differences in the generated code come from the post-RA scheduler, which can do more reordering of instructions relative to bundles.

asb added a comment.Oct 11 2017, 8:22 AM

I'm trying to update this patch to reflect the latest upstream changes in the test files, and am having a terrible time updating the AMDGPU tests. I'd really appreciate a second pair of eyes here.

Command line:
./bin/llc -mtriple=amdgcn-amd-amdhsa -mcpu=fiji -verify-machineinstrs < ../test/CodeGen/AMDGPU/byval-frame-setup.ll | ./bin/FileCheck -enable-var-scope -check-prefixes=GCN,VI ../test/CodeGen/AMDGPU/byval-frame-setup.ll

This complains:

../test/CodeGen/AMDGPU/byval-frame-setup.ll:83:8: error: expected string not found in input
; GCN: buffer_load_dword [[LOAD0:v[0-9]+]], off, s[0:3], s5 offset:8
       ^
<stdin>:111:2: note: scanning from here
 buffer_load_dword v2, off, s[0:3], s5 offset:16
 ^
<stdin>:124:2: note: possible intended match here
 buffer_load_dword v0, off, s[0:3], s5 offset:24
 ^

The relevant part of the test is:

; GCN-LABEL: {{^}}call_void_func_byval_struct_func:
; GCN: s_mov_b32 s5, s32
; GCN-DAG: s_add_u32 s32, s32, 0xc00{{$}}
; GCN-DAG: v_writelane_b32

; GCN-DAG: v_mov_b32_e32 [[NINE:v[0-9]+]], 9
; GCN-DAG: v_mov_b32_e32 [[THIRTEEN:v[0-9]+]], 13

; GCN-DAG: buffer_store_dword [[NINE]], off, s[0:3], s5 offset:8
; GCN-DAG: buffer_store_dword [[THIRTEEN]], off, s[0:3], s5 offset:24

; GCN: buffer_load_dword [[LOAD0:v[0-9]+]], off, s[0:3], s5 offset:8
; GCN: buffer_load_dword [[LOAD1:v[0-9]+]], off, s[0:3], s5 offset:12
; GCN: buffer_load_dword [[LOAD2:v[0-9]+]], off, s[0:3], s5 offset:16
; GCN: buffer_load_dword [[LOAD3:v[0-9]+]], off, s[0:3], s5 offset:20

; GCN-NOT: s_add_u32 s32, s32, 0x800

; GCN-DAG: buffer_store_dword [[LOAD0]], off, s[0:3], s32 offset:4{{$}}
; GCN-DAG: buffer_store_dword [[LOAD1]], off, s[0:3], s32 offset:8
; GCN-DAG: buffer_store_dword [[LOAD2]], off, s[0:3], s32 offset:12
; GCN-DAG: buffer_store_dword [[LOAD3]], off, s[0:3], s32 offset:16

The function in question:

call_void_func_byval_struct_func:       ; @call_void_func_byval_struct_func
; BB#0:                                 ; %entry
	s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	s_mov_b32 s5, s32
	buffer_store_dword v32, off, s[0:3], s5 offset:40
	s_waitcnt expcnt(0)
	v_writelane_b32 v32, s33, 0
	v_writelane_b32 v32, s34, 1
	v_mov_b32_e32 v0, 9
	v_mov_b32_e32 v1, 13
	v_writelane_b32 v32, s35, 2
	buffer_store_dword v0, off, s[0:3], s5 offset:8
	buffer_store_dword v1, off, s[0:3], s5 offset:24
	s_waitcnt expcnt(1)
	buffer_load_dword v0, off, s[0:3], s5 offset:8
	s_waitcnt expcnt(0)
	buffer_load_dword v1, off, s[0:3], s5 offset:12
	s_add_u32 s32, s32, 0xc00
	buffer_load_dword v2, off, s[0:3], s5 offset:16
	buffer_load_dword v3, off, s[0:3], s5 offset:20
	s_getpc_b64 s[6:7]
	s_add_u32 s6, s6, void_func_byval_struct@rel32@lo+4
	s_addc_u32 s7, s7, void_func_byval_struct@rel32@hi+4
	s_mov_b64 s[34:35], s[30:31]
	s_mov_b32 s33, s5
	s_waitcnt vmcnt(0)
	buffer_store_dword v3, off, s[0:3], s32 offset:16
	buffer_store_dword v2, off, s[0:3], s32 offset:12
	buffer_store_dword v1, off, s[0:3], s32 offset:8
	buffer_store_dword v0, off, s[0:3], s32 offset:4
	s_waitcnt expcnt(0)
	buffer_load_dword v0, off, s[0:3], s5 offset:24
	buffer_load_dword v1, off, s[0:3], s5 offset:28
	buffer_load_dword v2, off, s[0:3], s5 offset:32
	buffer_load_dword v3, off, s[0:3], s5 offset:36
	s_waitcnt vmcnt(0)
	buffer_store_dword v3, off, s[0:3], s32 offset:32
	buffer_store_dword v2, off, s[0:3], s32 offset:28
	buffer_store_dword v1, off, s[0:3], s32 offset:24
	buffer_store_dword v0, off, s[0:3], s32 offset:20
	s_swappc_b64 s[30:31], s[6:7]
	s_mov_b32 s5, s33
	s_mov_b64 s[30:31], s[34:35]
	v_readlane_b32 s35, v32, 2
	v_readlane_b32 s34, v32, 1
	v_readlane_b32 s33, v32, 0
	buffer_load_dword v32, off, s[0:3], s5 offset:40
	s_sub_u32 s32, s32, 0xc00
	s_waitcnt vmcnt(0) expcnt(0)
	s_setpc_b64 s[30:31]
.Lfunc_end2:
	.size	call_void_func_byval_struct_func, .Lfunc_end2-call_void_func_byval_struct_func
                                        ; -- End function

Now unless I'm going crazy here, I really can't see the problem matching buffer_load_dword v0, off, s[0:3], s5 offset:8. Am I missing something obvious, or misunderstanding -DAG in FileCheck?

tstellar edited edge metadata.Oct 11 2017, 9:19 AM

I think this is failing because the s_add_u32 s32, s32, 0xc00{{$}} is moved after the loads.

asb abandoned this revision.Nov 21 2017, 2:01 AM

As mentioned on the mailing list I don't have time to fix up and audit the test case changes. If anyone wants to see BUNDLE with hasSideEffects=0, please do commandeer the revision.

bjope added a subscriber: bjope.Nov 26 2017, 10:59 AM

A BUNDLE instruction is supposed to describe the combined properties of the bundled instructions (at least when it comes to machine operands). Properties like hasSideEffects/mayLoad/mayStore are static, so we can't update those properties for the BUNDLE instruction depending on the properties of the bundled instructions. So maybe it is a good idea to have the defensive approach of letting the BUNDLE instruction have hasSideEffects=1.

Afaik some machine intruction analyses/passes does not look inside the bundle (e.g. iterating using a MachineInstrBundleIterator).
If we set hasSideEffects=0 on the BUNDLE instruction, then that information can't be trusted anyway in case one of the bundled instructions has hasSideEffects=1. Or otherwise I think we need to avoid bundling of instructions with hasSideEffects=1.
If a pass/analysis is smart enough to look inside the BUNDLE, then I guess it can check if all of the bundled instructions has hasSideEffects=0 and ignore the hasSideEffects=1 on the BUNDLE instruction.

Maybe we need to update documentation related to the BUNDLE instruction to describe this better.