- User Since
- Jul 24 2018, 5:18 PM (112 w, 4 d)
Thu, Sep 17
accepted. not sure if there were any outstanding comments from Johannes. your call
Wed, Sep 16
The latest patch applied cleanly to our downstream port.
builds fine, tests very nicely as well. All the failing SOLLVE task wait depend tests now pass.
Tue, Sep 15
Tue, Sep 8
I applied this patch to our downstream build for amdgcn. It resolves the traps I was seeing on the task wait depend tests from SOLLVE.
perhaps rebase it to latest llvm.
Aug 16 2020
Jul 2 2020
LGTM, would like Saiyed and Johannes to review and approve
Jun 26 2020
LGTM, please commit
Jun 24 2020
thanks for the quick response, lgtm.
Jun 20 2020
If it's relevant, the LLVM was built with gcc, but the test is of course compiled with clang.
Thx for the quick fix, this seems to have resolved my issue. LGTM
Jun 2 2020
Johannes: here is a reduced source test case. Let me know what else you might need.
Apr 24 2020
Modified patch to assign to IsNew on entry to Device::getOrAllocTgtPtr
Apr 23 2020
Nov 4 2019
Oct 2 2019
Sep 9 2019
Aug 8 2019
I like the general approach,
Jul 17 2019
Thanks folks, publishing now
@grokos any comments here please?
hoping to get this patch in before the 9.0 branch.
Jul 16 2019
May 28 2019
thanks for test, LGTM
One follow-on question: is there value in adding a test case for this patch?
May 24 2019
looks like a good change
also consistent with some recent OpenMP language examples committee discussions on this topic.
May 17 2019
Superseded by D61313
May 7 2019
May 3 2019
Have you run all the lit tests on this patch?
This sure does look like the same problem to me. https://reviews.llvm.org/D60459
Apr 18 2019
Added use of LivePhysRegs, happily lifted some code Krzy wrote for Hexagon to compute getLiveRegsAt.
Apr 17 2019
Slightly generalized to some physical reg. Only look at the previous instruction.
The definition is either there, and we're all good, or we will bail.
Apr 12 2019
Apr 11 2019
Apr 10 2019
Added two MIR tests, and refined logic to properly bail.
Apr 9 2019
Added check for instr match missing, and bail on optimization if so.
I prefer the .ll test we have for the patch now over that of creating an MIR test for this issue.
I think bailing out of the optimization if the instruction is not found within some reasonable distance (10 seems to be popular) is a good suggestion. Much better than aborting. Thx.
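As a rough illustration of the bounded search discussed above, here is a minimal, self-contained sketch. It is not the code from the patch and does not use LLVM's MachineInstr API: the `Instr` struct, the `findCarryDef` name, and the `definesCarry` flag are all hypothetical stand-ins. Only the default neighborhood of 10 comes from the discussion. The idea is to walk backwards from the consuming instruction, give up after a fixed number of instructions, and let the caller skip the optimization instead of asserting.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a machine instruction: an opcode name and
// whether it defines the carry register consumed by a later instruction.
struct Instr {
  std::string opcode;
  bool definesCarry;
};

// Scan backwards from the instruction at useIdx, looking at no more than
// `neighborhood` earlier instructions, for the most recent carry definition.
// Returns its index, or std::nullopt so the caller can bail on the
// optimization rather than abort.
std::optional<std::size_t> findCarryDef(const std::vector<Instr> &block,
                                        std::size_t useIdx,
                                        std::size_t neighborhood = 10) {
  assert(useIdx < block.size() && "useIdx must index into the block");
  std::size_t scanned = 0;
  for (std::size_t i = useIdx; i > 0 && scanned < neighborhood; ++scanned) {
    --i;
    if (block[i].definesCarry)
      return i;
  }
  return std::nullopt;
}
```

A caller would treat an empty result as "leave this instruction pair alone" and move on, which is the bail-out behavior rather than the assert behavior.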
The current problem I am trying to resolve is somewhat analogous to hoisting one half of the 64-bit add instruction pair, although in this particular situation we are actually sinking one half of the instruction pair into a later position within the same block. And yes, I can see how in the future a new machine instruction pass might choose to hoist one of the instructions into a pred BB. I realize I can write additional code to scan a previous block. However, I think it's better that passes not hoist part of an instruction pair, especially ones such as these. To that end I would rather see my patch assert so that we are forced to deal with such a situation should it arise.
Your example, btw, is a good one for why we should have an IR test for the current problem, rather than an MIR test. An MIR test that runs just before SILoadStoreOptimizer will not detect the effects of a new pass, whereas the IR test attached to this patch stands a better chance of detecting the issue.
i agree it could happen. Not sure what to do about it here.
After looking at the suggestion of using computeRegisterLiveness, I noticed that it does not return the MI where the register in question is most recently defined.
Rather, it informs on liveness within a range. I don't really see how I would use this method effectively.
The problem I am trying to solve requires identifying a specific instruction that is needed by a subsequent instruction and then adding the identified instruction to a list constructed by SILoadStoreOptimizer.
Correction: the original input test did NOT have the instructions separated by more than 1 or 2 instructions. The resultant output showed the large separation.
The default neighborhood of 10 is probably more than enough.
Will convert the test to MIR form.
Patch will change to use computeRegisterLiveness. I will have to use a pretty large neighborhood, as in the original code where this error occurred (before running bugpoint), the s_add_u32 instruction was separated by over 400 instructions from the s_addc_u32 instruction. We will assert-fail if we cannot find the s_add_u32 instruction, so that will alert us to increase the neighborhood size. This patch will also handle the corresponding sub instructions.
Dec 31 2018
LGTM, pending what you decide about adding another lit test.
Dec 30 2018
generally seems fine to me.
Would it be reasonable/useful to have a lit test that somewhat represents what we observed in the DeviceMemory test?
If you think fdiv32-to-rcp-folding.ll adequately covers it, then that's fine by me.
Dec 13 2018
Dec 11 2018
Dec 3 2018
Dec 2 2018
Nov 30 2018
Patch passed our internal jenkins testing: #890
Nov 29 2018
Added MIR test, and changes per review comments.
Nov 27 2018
latest approach: transform the pair of ADDs/SUBs into e32, and tighten up check on def/use from one ADD to the other.
Incorporated changes for some of the Shrink suggestions.
Still need to do an MIR test.
Also, investigate whether moving/adding the Shrink pass results in test progressions or regressions.
I tried adding the Shrink pass before PeepholeSDWA and observed 88 lit test failures.
I tried moving the Shrink pass before PeepholeSDWA and observed 25 lit test failures.
Nov 26 2018
New patch addressing many (but not all) of the review comments.
Will look into the shrink related comments soon ...
New patch arriving momentarily ...
Nov 25 2018
Nov 16 2018
rL347008: [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST
Nov 15 2018