- User Since
- Jul 24 2018, 5:18 PM (77 w, 3 d)
Nov 4 2019
Oct 2 2019
Sep 9 2019
Aug 8 2019
I like the general approach,
Jul 17 2019
Thanks folks, publishing now
@grokos any comments here please?
hoping to get this patch in before the 9.0 branch.
Jul 16 2019
May 28 2019
thanks for test, LGTM
one follow on question, is there value in adding a testcase for this patch ?
May 24 2019
looks like a good change
also consistent with some recent OpenMP language examples committee discussions on this topic.
May 17 2019
Superseded by D61313
May 7 2019
May 3 2019
have you run all the lit tests on this patch ?
This sure does look like the same problem to me. https://reviews.llvm.org/D60459
Apr 18 2019
Added use of LivePhysRegs, happily lifted some code Krzy wrote for Hexagon to compute getLiveRegsAt.
Apr 17 2019
slightly generalized to some physical reg. only look at previous instruction.
The definition is either there, and were all good, or we will bail.
Apr 12 2019
Apr 11 2019
Apr 10 2019
added two MIR tests,and refined logic to properly bail.
Apr 9 2019
Added check for instr match missing, and bail on optimization if so.
I prefer the .ll test we have for the patch now over that of creating an MIR test for this issue.
i think bailing the optimization if not found within some reasonable distance (10 seems to be popular), is a good suggestion. Much better than aborting. thx
The current problem i am trying to resolve in somewhat analogous to hoisting 1/2 of the 64 bit add instruction pair. Although in this particular situation we are actually sinking 1/2 of the instruction pair into a later position within the same block. And yes, i can see how in the future a new machine instruction pass might choose to hoist one of the instructions into a pred BB. I realize i can write additional code to scan a previous block. However i think its better that passes not hoist part of an instruction pair, especially ones such as these. To that end i would rather see my patch assert so that we are forced to deal with such a situation should it arise.
Your example, btw, is a good one for why we should have an IR test for the current problem, rather than an MIR test. An MIR test that runs just before SILoadStoreOptimizer will not detect the affects of a new pass. Whereas the IR test attached to this patch stands a better chance of detecting the issue.
i agree it could happen. Not sure what to do about it here.
after looking at the suggestion of using computeRegisterLiveness, I noticed that it does not return the MI where the register in question is most recently defined.
Rather, it informs on liveness within a range. I dont really see how I would use this method effectively?
The problem I am trying to solve requires identifying a specific instruction that is needed by a subsequent instruction and then adding the identified instruction to a list constructed by SILoadStoreOptimizer.
correction: the original input test did NOT have the instructions separated by more than 1 or 2 instructions The resultant output showed the large separation.
The default neighborhood of 10 is probably more than enough.
test will convert to MIR form.
Patch will change to use computeRegisterLiveness. i will have to use a pretty large neighborhood , as the original code this error occurred in (before running bugpoint) , had the s_add_u32 instruction was separated by over 400 instructions from the s_addc_u32 instruction. We will assert fail if we cannot find the s_add_u32 instruction, so that will alert us to increase neighborhood size. This patch will also handle the corresponding sub instructions.
Dec 31 2018
LGTM, pending what you decide about adding another lit test.
Dec 30 2018
generally seems fine to me.
Would it be reasonable/useful to have a lit test that somewhat represents what we observed in the DeviceMemory test ?
if you think the fdiv32-to-rcp-folding.ll adequately covers it, then thats fine by me.
Dec 13 2018
Dec 11 2018
Dec 3 2018
Dec 2 2018
Nov 30 2018
Patch passed our internal jenkins testing: #890
Nov 29 2018
Added MIR test, and changes per review comments.
Nov 27 2018
latest approach: transform the pair of ADDs/SUBs into e32, and tighten up check on def/use from one ADD to the other.
Incorporated changes for Some of the Shrink suggestions.
Still need to do an MIR test.
Also, investigate if moving/adding Shrink pass results in test progressions or regressions
I tried adding Shrink pass before PeepholeSDWA and observed 88 lit test failures.
i tried moving Shrink pass before Peephole SDWA and observed 25 lit test failures
Nov 26 2018
New patch addressing many (but not all) of the review comments.
Will look into the shrink related comments soon ...
New patch arriving momentarily ...
Nov 25 2018
Nov 16 2018
rL347008: [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST
Nov 15 2018
Nov 14 2018
Per review comments, now using getRegBitWidth(),
which exposed a missing case in getRegBitWidth() for AMDGPU::SReg_64_XEXECRegClassID
Nov 13 2018
Added additional comments and some code related to subregs.
Rebased with latest llvm, pushed up to trigger some AMD internal Jenkins testing.
Will work on the subregs suggestions next.
Oct 10 2018
Cleaned up global-saddr.ll test, reducing number of functions, and use addrspace(1) i64
Consolidated 5 tests into 2.
A few minor cleanups on some spacing and indentation.
Oct 9 2018
is this patch worthy of a lit test ?
Cleaned up an if-else statement
Slightly reduced global-saddr-misc.ll
updated global-saddr-misc.ll with instnamer changes
ran instnamer on global-saddr-misc.ll
Oct 8 2018
Thanks Mark , the instnamer suggestion is a good idea. Completed locally, waiting on further review comments from Matt.