Details
- Reviewers
• tstellarAMD nhaehnle
Diff Detail
Event Timeline
lib/Target/AMDGPU/SIInstructions.td | ||
---|---|---|
1962–1963 | I just took a look, and for some reason the reloads tend to look like: `buffer_load_dword v3, off, s[72:75], s70 offset:1444 ; 16-byte Folded Reload ; encoding: [0xa4,0x05,0x30,0xe0,0x00,0x03,0x12,0x46]`, `s_waitcnt vmcnt(0) ; encoding: [0x70,0x0f,0x8c,0xbf]`, `buffer_load_dword v4, off, s[72:75], s70 offset:1448 ; 16-byte Folded Reload ; encoding: [0xa8,0x05,0x30,0xe0,0x00,0x04,0x12,0x46]`, `s_waitcnt vmcnt(0) ; encoding: [0x70,0x0f,0x8c,0xbf]`, etc., so you actually get 12 bytes per dword (an 8-byte load plus a 4-byte wait). Not sure if that's a problem, especially since those waits are really wrong anyway (perhaps the wait insertion gets confused by the register/subregister relationship?). |
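As a quick sanity check of the 12-bytes-per-dword figure, here is a small Python sketch that counts the encoding bytes quoted in the comment above. The assumption that a 16-byte reload expands to four such load/wait pairs is mine; the comment only shows two pairs followed by "etc.".

```python
# Encoding bytes copied from the disassembly quoted in the review comment.
buffer_load_dword = [0xA4, 0x05, 0x30, 0xE0, 0x00, 0x03, 0x12, 0x46]  # 64-bit MUBUF encoding
s_waitcnt = [0x70, 0x0F, 0x8C, 0xBF]                                  # 32-bit SOPP encoding

# Assumption (not stated in the comment): a "16-byte Folded Reload"
# expands to one load/wait pair per dword, i.e. four pairs.
DWORDS_PER_RELOAD = 16 // 4

# Each reloaded dword costs one load plus one wait.
bytes_per_dword = len(buffer_load_dword) + len(s_waitcnt)
total_bytes = bytes_per_dword * DWORDS_PER_RELOAD

print(f"bytes per dword: {bytes_per_dword}")
print(f"bytes per 16-byte reload: {total_bytes}")
```

If the waits were not emitted after every load, the per-dword cost would drop to the 8 bytes of the load itself.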
lib/Target/AMDGPU/SIInstructions.td | ||
---|---|---|
1962–1963 | I'm not really sure what to do about waitcnts. It doesn't really matter for correctness, since the branch relaxation pass currently runs after these should have been eliminated (they may be inserted during relaxation, but that isn't a concern yet). |