
Tue, May 28

ronlieb added a comment to D62431: Fix OMP_TARGET_OFFLOAD parsing.

thanks for test, LGTM

Tue, May 28, 1:41 PM · Unknown Object (Project)
ronlieb added a comment to D62431: Fix OMP_TARGET_OFFLOAD parsing.

one follow-on question: is there value in adding a testcase for this patch?

Tue, May 28, 7:47 AM · Unknown Object (Project)

Fri, May 24

ronlieb added a comment to D62431: Fix OMP_TARGET_OFFLOAD parsing.

looks like a good change
also consistent with some recent OpenMP language examples committee discussions on this topic.

Fri, May 24, 3:41 PM · Unknown Object (Project)

May 17 2019

ronlieb abandoned D60459: SILoadStoreOptimizer pass schedules s_add,s_addc with interfering s_lshl.

Superseded by D61313

May 17 2019, 11:45 AM · Restricted Project

May 7 2019

ronlieb added a comment to D61313: [AMDGPU] detect WaW hazards when moving/merging load/store instructions.

This sure does look like the same problem to me. https://reviews.llvm.org/D60459

D60459 doesn't fix the issue.

May 7 2019, 8:54 AM · Restricted Project

May 3 2019

ronlieb added a comment to D61313: [AMDGPU] detect WaW hazards when moving/merging load/store instructions.

have you run all the lit tests on this patch?

May 3 2019, 2:54 PM · Restricted Project
ronlieb added a comment to D61313: [AMDGPU] detect WaW hazards when moving/merging load/store instructions.

This sure does look like the same problem to me. https://reviews.llvm.org/D60459

May 3 2019, 5:10 AM · Restricted Project

Apr 18 2019

ronlieb retitled D60459: SILoadStoreOptimizer pass schedules s_add,s_addc with interfering s_lshl from SILoadStoreOptimizer pass mischedules s_add,s_addc with interfering s_lshl to SILoadStoreOptimizer pass schedules s_add,s_addc with interfering s_lshl.
Apr 18 2019, 8:30 AM · Restricted Project
ronlieb updated the diff for D60459: SILoadStoreOptimizer pass schedules s_add,s_addc with interfering s_lshl.

Added use of LivePhysRegs, happily lifted some code Krzy wrote for Hexagon to compute getLiveRegsAt.

Apr 18 2019, 8:29 AM · Restricted Project

Apr 17 2019

ronlieb updated the diff for D60459: SILoadStoreOptimizer pass schedules s_add,s_addc with interfering s_lshl.

slightly generalized to some physical reg. only look at previous instruction.
The definition is either there, and we're all good, or we will bail.

Apr 17 2019, 10:23 AM · Restricted Project

Apr 12 2019

ronlieb updated the diff for D60459: SILoadStoreOptimizer pass schedules s_add,s_addc with interfering s_lshl.
Apr 12 2019, 2:38 PM · Restricted Project
ronlieb added inline comments to D60459: SILoadStoreOptimizer pass schedules s_add,s_addc with interfering s_lshl.
Apr 12 2019, 2:38 PM · Restricted Project

Apr 11 2019

ronlieb added inline comments to D60459: SILoadStoreOptimizer pass schedules s_add,s_addc with interfering s_lshl.
Apr 11 2019, 12:51 PM · Restricted Project

Apr 10 2019

ronlieb updated the diff for D60459: SILoadStoreOptimizer pass schedules s_add,s_addc with interfering s_lshl.

added two MIR tests, and refined logic to properly bail.

Apr 10 2019, 8:02 AM · Restricted Project

Apr 9 2019

ronlieb added inline comments to D60459: SILoadStoreOptimizer pass schedules s_add,s_addc with interfering s_lshl.
Apr 9 2019, 6:01 PM · Restricted Project
ronlieb updated the diff for D60459: SILoadStoreOptimizer pass schedules s_add,s_addc with interfering s_lshl.

Added check for instr match missing, and bail on optimization if so.
I prefer the .ll test we have for the patch now over that of creating an MIR test for this issue.

Apr 9 2019, 4:58 PM · Restricted Project
ronlieb added a comment to D60459: SILoadStoreOptimizer pass schedules s_add,s_addc with interfering s_lshl.

i think bailing out of the optimization if the instruction is not found within some reasonable distance (10 seems to be popular) is a good suggestion. Much better than aborting. thx
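
A minimal sketch of the bounded search discussed here, written against a toy instruction model rather than LLVM's MachineInstr. The names ToyInstr, findDefBefore and kNotFound are hypothetical and do not come from D60459; the default distance of 10 is the value mentioned in the comment.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction: an opcode name and at most one
// register it defines. This is not llvm::MachineInstr.
struct ToyInstr {
  std::string opcode;
  std::string def; // empty when the instruction defines no register
};

const int kNotFound = -1;

// Walk backwards from useIdx looking for the closest earlier instruction
// that defines reg. Stop after `neighborhood` instructions so the caller
// can skip the optimization instead of asserting.
int findDefBefore(const std::vector<ToyInstr> &block, std::size_t useIdx,
                  const std::string &reg, std::size_t neighborhood = 10) {
  std::size_t i = useIdx;
  for (std::size_t scanned = 0; i > 0 && scanned < neighborhood; ++scanned) {
    --i;
    if (block[i].def == reg)
      return static_cast<int>(i);
  }
  return kNotFound; // caller should bail on the transformation
}
```

A caller would treat kNotFound as "leave this instruction pair alone", which matches the preference above for bailing over aborting.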

Apr 9 2019, 4:24 PM · Restricted Project
ronlieb added a comment to D60459: SILoadStoreOptimizer pass schedules s_add,s_addc with interfering s_lshl.

The current problem i am trying to resolve is somewhat analogous to hoisting 1/2 of the 64 bit add instruction pair. Although in this particular situation we are actually sinking 1/2 of the instruction pair into a later position within the same block. And yes, i can see how in the future a new machine instruction pass might choose to hoist one of the instructions into a pred BB. I realize i can write additional code to scan a previous block. However i think it's better that passes not hoist part of an instruction pair, especially ones such as these. To that end i would rather see my patch assert so that we are forced to deal with such a situation should it arise.
Your example, btw, is a good one for why we should have an IR test for the current problem, rather than an MIR test. An MIR test that runs just before SILoadStoreOptimizer will not detect the effects of a new pass. Whereas the IR test attached to this patch stands a better chance of detecting the issue.
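
To illustrate why the two halves of a 64-bit add have to stay paired, here is a small sketch in plain C++ that models the carry as a bool. It is only an arithmetic model, not the SILoadStoreOptimizer code, and the function names (addLo, addHi, add64, add64Clobbered) are made up for the example.

```cpp
#include <cassert>
#include <cstdint>

// Low half: 32-bit add that also produces a carry-out.
uint32_t addLo(uint32_t a, uint32_t b, bool &carry) {
  uint32_t sum = a + b;
  carry = sum < a; // unsigned wrap-around means a carry occurred
  return sum;
}

// High half: 32-bit add that consumes the carry from the low half.
uint32_t addHi(uint32_t a, uint32_t b, bool carry) {
  return a + b + (carry ? 1u : 0u);
}

// Correct ordering: nothing touches the carry between the two halves.
uint64_t add64(uint64_t x, uint64_t y) {
  bool carry = false;
  uint32_t lo = addLo(uint32_t(x), uint32_t(y), carry);
  uint32_t hi = addHi(uint32_t(x >> 32), uint32_t(y >> 32), carry);
  return (uint64_t(hi) << 32) | lo;
}

// Broken ordering: an unrelated operation overwrites the carry in between,
// modelled here by simply clearing it.
uint64_t add64Clobbered(uint64_t x, uint64_t y) {
  bool carry = false;
  uint32_t lo = addLo(uint32_t(x), uint32_t(y), carry);
  carry = false; // stand-in for an interfering instruction
  uint32_t hi = addHi(uint32_t(x >> 32), uint32_t(y >> 32), carry);
  return (uint64_t(hi) << 32) | lo;
}
```

With inputs 0xFFFFFFFF and 1, add64 carries into the high word while add64Clobbered silently drops the carry, which is the kind of wrong result a separated pair can produce.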

Apr 9 2019, 4:12 PM · Restricted Project
ronlieb added a comment to D60459: SILoadStoreOptimizer pass schedules s_add,s_addc with interfering s_lshl.

i agree it could happen. Not sure what to do about it here.

Apr 9 2019, 1:35 PM · Restricted Project
ronlieb added inline comments to D60459: SILoadStoreOptimizer pass schedules s_add,s_addc with interfering s_lshl.
Apr 9 2019, 1:27 PM · Restricted Project
ronlieb updated the diff for D60459: SILoadStoreOptimizer pass schedules s_add,s_addc with interfering s_lshl.
Apr 9 2019, 9:39 AM · Restricted Project
ronlieb added a comment to D60459: SILoadStoreOptimizer pass schedules s_add,s_addc with interfering s_lshl.

after looking at the suggestion of using computeRegisterLiveness, I noticed that it does not return the MI where the register in question is most recently defined.
Rather, it informs on liveness within a range. I don’t really see how I would use this method effectively?
The problem I am trying to solve requires identifying a specific instruction that is needed by a subsequent instruction and then adding the identified instruction to a list constructed by SILoadStoreOptimizer.

Apr 9 2019, 9:35 AM · Restricted Project
ronlieb added a comment to D60459: SILoadStoreOptimizer pass schedules s_add,s_addc with interfering s_lshl.

correction: the original input test did NOT have the instructions separated by more than 1 or 2 instructions. The resultant output showed the large separation.
The default neighborhood of 10 is probably more than enough.

Apr 9 2019, 7:14 AM · Restricted Project
ronlieb added a comment to D60459: SILoadStoreOptimizer pass schedules s_add,s_addc with interfering s_lshl.

test will convert to MIR form.
Patch will change to use computeRegisterLiveness. i will have to use a pretty large neighborhood, as the original code this error occurred in (before running bugpoint) had the s_add_u32 instruction separated by over 400 instructions from the s_addc_u32 instruction. We will assert fail if we cannot find the s_add_u32 instruction, so that will alert us to increase neighborhood size. This patch will also handle the corresponding sub instructions.

Apr 9 2019, 7:05 AM · Restricted Project
ronlieb created D60459: SILoadStoreOptimizer pass schedules s_add,s_addc with interfering s_lshl.
Apr 9 2019, 6:16 AM · Restricted Project

Dec 31 2018

ronlieb accepted D56161: [AMDGPU] Fix scalar operand folding bug that causes SHOC performance regression.

LGTM, pending what you decide about adding another lit test.

Dec 31 2018, 9:17 AM

Dec 30 2018

ronlieb added a comment to D56161: [AMDGPU] Fix scalar operand folding bug that causes SHOC performance regression.

generally seems fine to me.
Would it be reasonable/useful to have a lit test that somewhat represents what we observed in the DeviceMemory test?
if you think the fdiv32-to-rcp-folding.ll adequately covers it, then that's fine by me.

Dec 30 2018, 11:25 AM

Dec 13 2018

ronlieb added inline comments to D55570: [AMDGPU] Improve SDWA generation for V_OR_B32_E32..
Dec 13 2018, 5:07 AM

Dec 11 2018

ronlieb updated the diff for D55570: [AMDGPU] Improve SDWA generation for V_OR_B32_E32..
Dec 11 2018, 6:22 PM
ronlieb added a comment to D55570: [AMDGPU] Improve SDWA generation for V_OR_B32_E32..

next patch

Dec 11 2018, 6:22 PM
ronlieb created D55570: [AMDGPU] Improve SDWA generation for V_OR_B32_E32..
Dec 11 2018, 12:33 PM

Dec 3 2018

ronlieb committed rL348132: [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos.
[AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos
Dec 3 2018, 5:08 AM
ronlieb closed D54882: [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos.
Dec 3 2018, 5:08 AM

Dec 2 2018

ronlieb added inline comments to D54882: [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos.
Dec 2 2018, 6:09 PM
ronlieb updated the diff for D54882: [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos.
Dec 2 2018, 4:48 PM
ronlieb added inline comments to D54882: [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos.
Dec 2 2018, 4:44 PM
ronlieb updated the diff for D54882: [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos.
Dec 2 2018, 3:40 PM
ronlieb added inline comments to D54882: [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos.
Dec 2 2018, 3:39 PM
ronlieb added inline comments to D54882: [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos.
Dec 2 2018, 12:56 PM
ronlieb updated the diff for D54882: [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos.
Dec 2 2018, 11:06 AM
ronlieb added inline comments to D54882: [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos.
Dec 2 2018, 7:28 AM
ronlieb updated the diff for D54882: [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos.
Dec 2 2018, 7:24 AM

Nov 30 2018

ronlieb updated the diff for D54882: [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos.
Nov 30 2018, 6:35 PM
ronlieb added inline comments to D54882: [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos.
Nov 30 2018, 6:05 PM
ronlieb added inline comments to D54882: [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos.
Nov 30 2018, 6:01 PM
ronlieb updated the diff for D54882: [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos.
Nov 30 2018, 5:33 PM
ronlieb updated the diff for D54882: [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos.
Nov 30 2018, 12:45 PM
ronlieb committed rL348014: [AMDGPU] Disable SReg Global LD/ST, perf regression.
[AMDGPU] Disable SReg Global LD/ST, perf regression
Nov 30 2018, 10:32 AM
ronlieb closed D55093: [AMDGPU] Disable SReg Global LD/ST, perf regression.
Nov 30 2018, 10:32 AM
ronlieb updated the diff for D54882: [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos.
Nov 30 2018, 8:37 AM
ronlieb added inline comments to D54882: [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos.
Nov 30 2018, 8:06 AM
ronlieb added a comment to D55093: [AMDGPU] Disable SReg Global LD/ST, perf regression.

Patch passed our internal jenkins testing: #890

Nov 30 2018, 5:08 AM

Nov 29 2018

ronlieb created D55093: [AMDGPU] Disable SReg Global LD/ST, perf regression.
Nov 29 2018, 4:26 PM
ronlieb updated the diff for D54882: [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos.

Rebased

Nov 29 2018, 10:33 AM
ronlieb updated the diff for D54882: [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos.

Added MIR test, and changes per review comments.

Nov 29 2018, 9:39 AM
ronlieb added inline comments to D54882: [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos.
Nov 29 2018, 9:38 AM

Nov 27 2018

ronlieb added a comment to D54882: [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos.

Adding the shrink pass just before Peephole SDWA does not help the lit test, it made no difference.
I think at this point i should proceed with adding the MIR test to verify when we cannot fold.

Do you know why was it unable to shrink these instructions?

SIInstrInfo::splitScalar64BitAddSub converts the S_ADD_U64_PSEUDO into the two add instructions which use the SReg_64_XEXECRegClass instead of VCC.
Later when SIShrinkInstructions::runOnMachineFunction pass runs, it sees that the Carry regs are not VCC and simply marks them with a hint to later convert to VCC,
and then continues without doing a transformation.

if (SDst) {
  if (SDst->getReg() != AMDGPU::VCC) {
    if (TargetRegisterInfo::isVirtualRegister(SDst->getReg()))
      MRI.setRegAllocationHint(SDst->getReg(), 0, AMDGPU::VCC);
    continue;
  }
 
  // All of the instructions with carry outs also have an SGPR input in
  // src2.
  if (Src2 && Src2->getReg() != AMDGPU::VCC) {
    if (TargetRegisterInfo::isVirtualRegister(Src2->getReg()))
      MRI.setRegAllocationHint(Src2->getReg(), 0, AMDGPU::VCC);
 
    continue;
  }
}

OK, that makes sense. It just leaves it to post-RA shrink that way. Probably shrink pass could do the same, but it is not desirable as it limits scheduling opportunities.

But I see the problem in your code now: you do not check that vcc is not clobbered or used in between of two instructions.
I also think you need to shrink both instructions, otherwise you have carry-in of addc and carry-out of add in different registers, which just happen to be allocated to the same vcc. Note, that isConvertibleToSDWA() returning true does not guarantee final sdwa conversion, so you can end up with vop3 form for the first instruction anyway.

i think the following code does make sure that there are no intervening uses, i can also strengthen it to make sure the defining instruction of the CarryIn is the first ADD instruction.
+ if (!MRI->hasOneUse(CarryIn->getReg()) || !MRI->use_empty(CarryOut->getReg()))
+ return false;

It does not. It only checks that original sreg is not used. However you are replacing original carry sreg with vcc by shrinking instruction, and you do not check vcc uses.
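
The missing check described in this reply can be sketched on a toy model: after both instructions are rewritten to use vcc, what matters is whether anything between them reads or writes vcc, not whether the original virtual carry register has other uses. VccToyInstr and regUntouchedBetween are hypothetical names for illustration only, not LLVM APIs or code from D54882.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction with its defined and used registers.
struct VccToyInstr {
  std::string opcode;
  std::set<std::string> defs;
  std::set<std::string> uses;
};

// True when no instruction strictly between `first` and `second` reads or
// writes `reg`. For the carry pair, reg would be "vcc".
bool regUntouchedBetween(const std::vector<VccToyInstr> &block,
                         std::size_t first, std::size_t second,
                         const std::string &reg) {
  for (std::size_t i = first + 1; i < second; ++i) {
    if (block[i].defs.count(reg) || block[i].uses.count(reg))
      return false; // vcc is clobbered or consumed in between
  }
  return true;
}
```

In this model a compare that writes vcc between the add and the addc makes the check fail even though the original carry sreg still has exactly one use.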

Nov 27 2018, 2:48 PM
ronlieb updated the diff for D54882: [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos.

latest approach: transform the pair of ADDs/SUBs into e32, and tighten up check on def/use from one ADD to the other.

Nov 27 2018, 2:42 PM
ronlieb added a comment to D54882: [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos.

Adding the shrink pass just before Peephole SDWA does not help the lit test, it made no difference.
I think at this point i should proceed with adding the MIR test to verify when we cannot fold.

Do you know why was it unable to shrink these instructions?

SIInstrInfo::splitScalar64BitAddSub converts the S_ADD_U64_PSEUDO into the two add instructions which use the SReg_64_XEXECRegClass instead of VCC.
Later when SIShrinkInstructions::runOnMachineFunction pass runs, it sees that the Carry regs are not VCC and simply marks them with a hint to later convert to VCC,
and then continues without doing a transformation.

if (SDst) {
  if (SDst->getReg() != AMDGPU::VCC) {
    if (TargetRegisterInfo::isVirtualRegister(SDst->getReg()))
      MRI.setRegAllocationHint(SDst->getReg(), 0, AMDGPU::VCC);
    continue;
  }
 
  // All of the instructions with carry outs also have an SGPR input in
  // src2.
  if (Src2 && Src2->getReg() != AMDGPU::VCC) {
    if (TargetRegisterInfo::isVirtualRegister(Src2->getReg()))
      MRI.setRegAllocationHint(Src2->getReg(), 0, AMDGPU::VCC);
 
    continue;
  }
}

OK, that makes sense. It just leaves it to post-RA shrink that way. Probably shrink pass could do the same, but it is not desirable as it limits scheduling opportunities.

But I see the problem in your code now: you do not check that vcc is not clobbered or used in between of two instructions.
I also think you need to shrink both instructions, otherwise you have carry-in of addc and carry-out of add in different registers, which just happen to be allocated to the same vcc. Note, that isConvertibleToSDWA() returning true does not guarantee final sdwa conversion, so you can end up with vop3 form for the first instruction anyway.

Nov 27 2018, 2:39 PM
ronlieb added a comment to D54882: [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos.

Adding the shrink pass just before Peephole SDWA does not help the lit test, it made no difference.
I think at this point i should proceed with adding the MIR test to verify when we cannot fold.

Do you know why was it unable to shrink these instructions?

Nov 27 2018, 9:02 AM
ronlieb added a comment to D54882: [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos.

Essentially this is a limited version of shrinking. So I have several questions:

  1. Why not to run shrink pass before sdwa instead?

I tried adding Shrink pass before PeepholeSDWA and observed 88 lit test failures.
i tried moving Shrink pass before Peephole SDWA and observed 25 lit test failures

Which may be a good thing if these failures are progressions (as I suspect) and not regressions. Are they progressions?
That is the point of other comments too, this patch is limited to handle just two instructions while there is a clear possibility to do it for almost any VOP3.

I would also assume many of these failures are just commute which is attempted by shrink pass. That is normal and would only need to change the tests.

i tried an experiment of simply invoking the Shrink pass a 2nd time.

addPass(createSIShrinkInstructionsPass());
addPass(createSIShrinkInstructionsPass());

which resulted in 74 failures, and they do seem to be commute changes primarily (did not look at them all)
So then, i added a 3rd invocation and zero failures (i'm still laughing at this one).

So it must be commute. I guess you just need to add a new shrink pass before sdwa. Does it help to deal with these two instructions, e.g. does it help your lit test?

If yes there are two options, either:

  1. Revert the commute in shrink pass if it did not help.
  2. Just update tests.
Nov 27 2018, 7:33 AM
ronlieb added a comment to D54882: [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos.

Essentially this is a limited version of shrinking. So I have several questions:

  1. Why not to run shrink pass before sdwa instead?

I tried adding Shrink pass before PeepholeSDWA and observed 88 lit test failures.
i tried moving Shrink pass before Peephole SDWA and observed 25 lit test failures

Which may be a good thing if these failures are progressions (as I suspect) and not regressions. Are they progressions?
That is the point of other comments too, this patch is limited to handle just two instructions while there is a clear possibility to do it for almost any VOP3.

I would also assume many of these failures are just commute which is attempted by shrink pass. That is normal and would only need to change the tests.

Nov 27 2018, 6:45 AM
ronlieb updated the diff for D54882: [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos.

Incorporated changes for some of the Shrink suggestions.
Still need to do an MIR test.
Also, investigate if moving/adding Shrink pass results in test progressions or regressions

Nov 27 2018, 6:28 AM
ronlieb added a comment to D54882: [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos.

Essentially this is a limited version of shrinking. So I have several questions:

  1. Why not to run shrink pass before sdwa instead?

I tried adding Shrink pass before PeepholeSDWA and observed 88 lit test failures.
i tried moving Shrink pass before Peephole SDWA and observed 25 lit test failures

Nov 27 2018, 6:17 AM

Nov 26 2018

ronlieb updated the diff for D54882: [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos.

New patch addressing many (but not all) of the review comments.
Will look into the shrink related comments soon ...

Nov 26 2018, 5:36 PM
ronlieb added a comment to D54882: [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos.

New patch arriving momentarily ...

Nov 26 2018, 5:32 PM

Nov 25 2018

ronlieb created D54882: [AMDGPU] Add sdwa support for ADD|SUB U64 decomposed Pseudos.
Nov 25 2018, 3:50 PM

Nov 16 2018

ronlieb closed D52846: [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST.

rL347008: [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST

Nov 16 2018, 2:08 PM

Nov 15 2018

ronlieb committed rL347008: [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST.
[AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST
Nov 15 2018, 5:16 PM
ronlieb committed rL347002: [AMDGPU] NFC Test commit.
[AMDGPU] NFC Test commit
Nov 15 2018, 4:49 PM
ronlieb updated the diff for D52846: [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST.

Rebased.

Nov 15 2018, 4:46 AM

Nov 14 2018

ronlieb updated the diff for D52846: [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST.

Per review comments, now using getRegBitWidth(),
which exposed a missing case in getRegBitWidth() for AMDGPU::SReg_64_XEXECRegClassID

Nov 14 2018, 12:45 PM

Nov 13 2018

ronlieb updated the diff for D52846: [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST.

Added additional comments and some code related to subregs.

Nov 13 2018, 6:14 PM
ronlieb updated the diff for D52846: [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST.

Rebased with latest llvm, pushed up to trigger some AMD internal Jenkins testing.
Will work on the subregs suggestions next.

Nov 13 2018, 7:18 AM

Oct 10 2018

ronlieb updated the diff for D52846: [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST.

Cleaned up global-saddr.ll test, reducing number of functions, and use addrspace(1) i64

Oct 10 2018, 6:19 PM
ronlieb updated the diff for D52846: [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST.

Consolidated 5 tests into 2.
A few minor cleanups on some spacing and indentation.

Oct 10 2018, 10:29 AM

Oct 9 2018

ronlieb accepted D52817: AMDGPU: Only add implicit super-reg def for first subreg.
Oct 9 2018, 8:17 PM
ronlieb added a comment to D52817: AMDGPU: Only add implicit super-reg def for first subreg.

is this patch worthy of a lit test?

Oct 9 2018, 8:07 PM
ronlieb added inline comments to D52846: [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST.
Oct 9 2018, 7:50 PM
ronlieb updated the diff for D52846: [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST.

Cleaned up an if-else statement
Slightly reduced global-saddr-misc.ll

Oct 9 2018, 10:19 AM
ronlieb added inline comments to D52846: [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST.
Oct 9 2018, 9:14 AM
ronlieb added inline comments to D52846: [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST.
Oct 9 2018, 8:10 AM
ronlieb added inline comments to D52846: [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST.
Oct 9 2018, 7:45 AM
ronlieb updated the diff for D52846: [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST.

updated global-saddr-misc.ll with instnamer changes

Oct 9 2018, 5:22 AM
ronlieb added a comment to D52846: [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST.

ran instnamer on global-saddr-misc.ll

Oct 9 2018, 5:12 AM

Oct 8 2018

ronlieb added a comment to D52846: [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST.

Thanks Mark, the instnamer suggestion is a good idea. Completed locally, waiting on further review comments from Matt.

Oct 8 2018, 5:43 PM

Oct 6 2018

ronlieb updated the diff for D52846: [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST.

added a load/store globals test case (.mir)
updated global-saddr-misc.ll per offline discussion.
changed assert to simpler if (check) continue.

Oct 6 2018, 1:28 PM
ronlieb added a comment to D52846: [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST.

I am planning to add another lit test for coverage of all the Global load/store opcodes.

Oct 6 2018, 7:56 AM

Oct 5 2018

ronlieb updated the diff for D52846: [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST.

Revised with various changes to reflect review comments.

Oct 5 2018, 1:43 PM
ronlieb added a comment to D52846: [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST.

Matt, thanks for review, I have addressed most of the issues although the inreg/sgpr one may require some more investigation on my end.
see my responses to each of your review comments.
A revision to this patch will be uploaded in the next hour or so...

Oct 5 2018, 1:34 PM

Oct 3 2018

ronlieb created D52846: [AMDGPU] Add FixupVectorISel pass, currently Supports SREGs in GLOBAL LD/ST.
Oct 3 2018, 1:58 PM