
Sun, Nov 22

hliao abandoned D91928: [nvptx] Skip alloca for read-only byval arguments..

As long as it's not used in a PHI or SELECT, where we cannot ensure the result is also a pointer to the parameter space, we could skip alloca insertion.

I think an allowlist might be more appropriate than a denylist. Rather than "anything other than PHI and SELECT", could it be "if it's only transitively used by gep and load, we're good"?

I am not 100% sure even that works, though. The real problem is that this pass is trying to reason about what the addrspace inference pass is capable of. We can only do the transformation here if we're positive that addrspace inference will eliminate all generic loads from the arg. That's a layering violation and ultimately is fragile.
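
The allowlist idea above could be sketched roughly as follows. This is a toy model, not the LLVM API: `Op`, `Node`, and `onlyUsedByGepAndLoad` are hypothetical names introduced only for illustration, and a real check would walk `Value` users instead.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Toy stand-in for an IR value (assumption: not the LLVM Value class).
enum class Op { Arg, GEP, Load, Select, Phi, Store, Call };

struct Node {
  Op op;
  std::vector<const Node*> users;  // values that take this one as an operand
};

// Returns true if every transitive user of `root` is a GEP or a load.
// A load ends the walk: its result is data, not a pointer into the argument.
bool onlyUsedByGepAndLoad(const Node& root) {
  std::vector<const Node*> worklist{&root};
  std::unordered_set<const Node*> seen{&root};
  while (!worklist.empty()) {
    const Node* n = worklist.back();
    worklist.pop_back();
    for (const Node* u : n->users) {
      if (u->op == Op::Load) continue;     // allowed, terminal
      if (u->op != Op::GEP) return false;  // anything else: keep the alloca
      if (seen.insert(u).second) worklist.push_back(u);
    }
  }
  return true;
}
```

Even with such a check, the concern raised above still applies: it only helps if address space inference is known to handle exactly the same set of users.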

Sun, Nov 22, 3:02 PM · Restricted Project
hliao added a comment to D91928: [nvptx] Skip alloca for read-only byval arguments..

I don't believe there's any exception to prove deduction [of the readonly attribute] wrong.

Understood.

The address space inference here only refers to the one in the backend directly after this argument lowering pass.

Also understood.

This isn't speaking to my concern, though.

Suppose we have

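__global__ void foo(int x, const int* y, int* out, bool flag) {
  // &x points into the param space; y does not. Which one is picked is only known at run time.
  const int* ptr = flag ? &x : y;
  *out = *ptr;
}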

In this case we can say with confidence that x is readonly.

But address space inference cannot infer the address space of ptr (how could it?). Therefore we will do a generic load, which is wrong.
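
A minimal sketch of why the select defeats inference, assuming the usual lattice-style join in which two disagreeing address spaces collapse to the generic one. The enumerators and the `join` and `selectSpace` helpers are hypothetical and only model the idea; they are not the pass's actual interface.

```cpp
#include <cassert>

// Toy address spaces (assumption: names are illustrative only).
enum class AddrSpace { Uninitialized, Generic, Global, Param };

// Join for the inference lattice: agreeing spaces stay specific,
// disagreeing spaces collapse to Generic.
constexpr AddrSpace join(AddrSpace a, AddrSpace b) {
  if (a == AddrSpace::Uninitialized) return b;
  if (b == AddrSpace::Uninitialized) return a;
  return a == b ? a : AddrSpace::Generic;
}

// Address space of `flag ? x : y`: the join of both operands.
constexpr AddrSpace selectSpace(AddrSpace x, AddrSpace y) {
  return join(x, y);
}
```

Under this model, `selectSpace(AddrSpace::Param, AddrSpace::Global)` is `AddrSpace::Generic`, which corresponds to the generic load of `ptr` described above.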

Sun, Nov 22, 12:49 PM · Restricted Project
hliao added a comment to D91928: [nvptx] Skip alloca for read-only byval arguments..

This looks really simple, which is awesome. I am enthusiastic. But I am worried it may not be correct.

AIUI params are special in that they *must* be read from the param address space. It is illegal to do a generic load of a param.

So this change is correct only if we can guarantee that address space inference will infer the specific address space for all uses of the pointer.

But address space inference is not guaranteed. For example, you could select on two pointers of two different address spaces. So long as you only ever read from these pointers, the arg can still be marked as ReadOnly. But with this patch, we'd end up doing a generic load from the param space, which would be illegal.

Take it all with a grain of salt since I've also been out of the game for a while.

Sun, Nov 22, 9:20 AM · Restricted Project

Sat, Nov 21

hliao added a comment to D91928: [nvptx] Skip alloca for read-only byval arguments..

It turns out that the simplest way is to skip generating the alloca once that byval argument is readonly. As readonly will be attributed once there's no write to that argument, it's safe to just cast that pointer to the parameter space if it has readonly. Basically, that argument lowering pass does something similar to D91590 but, instead, applies it in the backend. I verified that, for that simple test CUDA code, it would generate the same SASS.

Sat, Nov 21, 11:55 PM · Restricted Project
hliao added a reviewer for D91928: [nvptx] Skip alloca for read-only byval arguments.: jlebar.
Sat, Nov 21, 11:52 PM · Restricted Project
hliao requested review of D91928: [nvptx] Skip alloca for read-only byval arguments..
Sat, Nov 21, 11:51 PM · Restricted Project
hliao committed rGdcc06597b1d6: Fix shared build. (authored by hliao).
Fix shared build.
Sat, Nov 21, 2:08 PM

Thu, Nov 19

hliao added a comment to D91513: [DeadMachineInstrctionElim] Post order visit all blocks and Iteratively run DeadMachineInstructionElim pass until nothing dead.

Do you have permission to commit?

Thu, Nov 19, 9:10 PM · Restricted Project

Wed, Nov 18

hliao added a comment to D91513: [DeadMachineInstrctionElim] Post order visit all blocks and Iteratively run DeadMachineInstructionElim pass until nothing dead.

Using post-order is quite straight-forward and only involves several lines of change. Please check the attachment.

That test passed with this traverse order change.

That's a great help, I pass all my related cases with this patch, Thanks a lot.

Now that we have decided to use post-order to visit all blocks of a function, I think we need to consider: what if the CFG contains cycles?


From this picture, we can see that post-order is not uniquely defined because there are cycles; one of the possible orders is [ m, g, d, e, c, b, t, x ].
So m comes before g. If we define something in m and use it in g, then even though both the def and the use are useless, because we visit m first, we will still be left with a dead definition after we post-order visit all blocks.
So is it theoretically possible that some cases still cannot be fixed by the post-order visit? That is, might we still need to run iteratively?

You are right, that's possible. That case should be rare, as it's a def in the back-edge with an acyclic dep. Could you merge the post-order change together with the iterative runs, so that, in the regular case, we run at most twice? Please keep an eye on compile time.

Post order visit and iteratively run are merged.
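
To illustrate the cycle concern discussed above, here is a small sketch of a DFS post-order over a toy CFG. The `CFG` alias and the `visit` and `postOrder` helpers are hypothetical names for this example; the actual patch would rely on LLVM's own post-order iterators over the machine function.

```cpp
#include <cassert>
#include <vector>

// Successor lists indexed by block number (assumption: a toy CFG).
using CFG = std::vector<std::vector<int>>;

// Depth-first walk; a block is emitted after all of its not-yet-seen
// successors. The `seen` set is what makes this terminate on cycles.
static void visit(const CFG& cfg, int bb, std::vector<bool>& seen,
                  std::vector<int>& out) {
  seen[bb] = true;
  for (int succ : cfg[bb])
    if (!seen[succ]) visit(cfg, succ, seen, out);
  out.push_back(bb);
}

// Post-order of the blocks reachable from `entry`.
std::vector<int> postOrder(const CFG& cfg, int entry) {
  std::vector<bool> seen(cfg.size(), false);
  std::vector<int> out;
  visit(cfg, entry, seen, out);
  return out;
}
```

For the CFG `{{1}, {2}, {1, 3}, {}}` (block 2 has a back-edge to block 1), `postOrder(cfg, 0)` yields 3, 2, 1, 0. Block 2 is visited before block 1 even though 1 is one of its successors, so a def in block 2 that is only used through the back-edge is seen before its use, and a second sweep is still needed to remove it.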

Wed, Nov 18, 8:15 AM · Restricted Project
hliao added a comment to D91590: [NVPTX] Efficently support dynamic index on CUDA kernel aggregate parameters..

As mentioned earlier, that's very experimental support. Even though the SASS looks reasonable, it still needs verifying on real systems. For non-kernel functions, it seems we share the same path, so we should do something similar there. The current approach fixes that in the codegen phase by adding back the alloca to match the parameter space semantics. Once that alloca is dynamically indexed, it won't be promoted in SROA. Only instcombine eliminates that alloca, when it is only modified once by copying from constant memory. As instcombine would break certain patterns prepared in the codegen preparation, it won't run in the backend, so that dynamically indexed alloca won't be removed.

Wed, Nov 18, 12:15 AM · Restricted Project, Restricted Project

Tue, Nov 17

hliao added a comment to D91513: [DeadMachineInstrctionElim] Post order visit all blocks and Iteratively run DeadMachineInstructionElim pass until nothing dead.

BTW, please add a test case with that def in back-edge with acyclic dep.

Tue, Nov 17, 10:23 PM · Restricted Project
hliao accepted D91513: [DeadMachineInstrctionElim] Post order visit all blocks and Iteratively run DeadMachineInstructionElim pass until nothing dead.

Using post-order is quite straight-forward and only involves several lines of change. Please check the attachment.

That test passed with this traverse order change.

That's a great help, I pass all my related cases with this patch, Thanks a lot.

Now that we have decided to use post-order to visit all blocks of a function, I think we need to consider: what if the CFG contains cycles?


From this picture, we can see that post-order is not uniquely defined because there are cycles; one of the possible orders is [ m, g, d, e, c, b, t, x ].
So m comes before g. If we define something in m and use it in g, then even though both the def and the use are useless, because we visit m first, we will still be left with a dead definition after we post-order visit all blocks.
So is it theoretically possible that some cases still cannot be fixed by the post-order visit? That is, might we still need to run iteratively?

Tue, Nov 17, 9:57 PM · Restricted Project
hliao added a comment to D91513: [DeadMachineInstrctionElim] Post order visit all blocks and Iteratively run DeadMachineInstructionElim pass until nothing dead.

Using post-order is quite straight-forward and only involves several lines of change. Please check the attachment.

That test passed with this traverse order change.

Tue, Nov 17, 10:58 AM · Restricted Project

Mon, Nov 16

hliao added a comment to D91590: [NVPTX] Efficently support dynamic index on CUDA kernel aggregate parameters..

This is an experimental, demo-only patch done in my spare time on eliminating private memory usage in https://godbolt.org/z/EPPn6h. The attachment includes both the reference and new IR, PTX, and SASS (sm_60) output. For the new code, that aggregate argument is loaded through an LDC instruction in SASS instead of MOV due to the non-static address. I don't have sm_60 hardware to verify that. Could you try it on real hardware?

Mon, Nov 16, 11:24 PM · Restricted Project, Restricted Project
hliao requested review of D91590: [NVPTX] Efficently support dynamic index on CUDA kernel aggregate parameters..
Mon, Nov 16, 11:15 PM · Restricted Project, Restricted Project
hliao added a comment to D91513: [DeadMachineInstrctionElim] Post order visit all blocks and Iteratively run DeadMachineInstructionElim pass until nothing dead.

Could you elaborate more on why we need to run that iteratively? Since the original one runs bottom-up, supposedly it should find all.

From iteratively-run-dead-mi-elim.mir we can see that bb.5 defines %6, and %6 is used in bb.2. When we traverse all the basic blocks bottom-up, we meet bb.5 first. For %6, we find that it is not dead because %3 in bb.2 uses it, so %6 survives. Then we continue traversing the other BBs. When we meet bb.2, we see that no one uses %5, so we kill it, and likewise %4 and %3. At this point %6 has actually become dead, because we killed %3 and there is no longer any user of %6.
However, because we only traverse the blocks once, we can't erase %6 at the end. So if we iteratively visit all blocks until nothing changes, we can ensure that all dead MIs are erased.
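
The bb.5 / bb.2 situation described above can be reproduced with a small toy model. This is only a sketch under simplifying assumptions: every instruction is treated as side-effect free, and `Inst`, `Block`, `sweep`, and `eliminateDead` are hypothetical names, not the pass's real data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

struct Inst {
  int def;                // virtual register defined
  std::vector<int> uses;  // virtual registers read
};
using Block = std::vector<Inst>;

// One sweep over `blocks` in the given order, each block bottom-up.
// Erases every instruction whose def has no remaining use.
// Returns true if anything was erased.
bool sweep(std::vector<Block>& blocks, std::map<int, int>& useCount) {
  bool changed = false;
  for (Block& bb : blocks) {
    for (std::size_t i = bb.size(); i-- > 0;) {
      if (useCount[bb[i].def] != 0) continue;
      for (int u : bb[i].uses) --useCount[u];
      bb.erase(bb.begin() + static_cast<std::ptrdiff_t>(i));
      changed = true;
    }
  }
  return changed;
}

// Repeats sweeps until nothing changes; returns the number of sweeps run,
// including the final one that only confirms the fixed point.
int eliminateDead(std::vector<Block>& blocks) {
  std::map<int, int> useCount;
  for (const Block& bb : blocks)
    for (const Inst& inst : bb)
      for (int u : inst.uses) ++useCount[u];
  int sweeps = 0;
  do {
    ++sweeps;
  } while (sweep(blocks, useCount));
  return sweeps;
}
```

Visiting bb.5 (defining %6) before bb.2 (holding %3, %4, %5) takes three sweeps in this model: one that clears bb.2, one that removes %6, and one that confirms nothing is left. Visiting bb.2 first takes two.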

Ah, I see. Even though we try to traverse basic blocks bottom-up, that's just the block placement order rather than block reachability. Could we replace that order with post-order, so that the use is always traversed before the definition?

That probably will not help if we have a loop?

It still works unless that value has a cyclic dependency through phi-node.

That's exactly what I had in mind, a phi node as the only way to get a cyclic dependency in SSA.

I tend to say this LGTM, although I would like to see a test with a cyclic dependency.

Mon, Nov 16, 9:21 PM · Restricted Project
hliao added a comment to D91513: [DeadMachineInstrctionElim] Post order visit all blocks and Iteratively run DeadMachineInstructionElim pass until nothing dead.

Could you elaborate more on why we need to run that iteratively? Since the original one runs bottom-up, supposedly it should find all.

From iteratively-run-dead-mi-elim.mir we can see that bb.5 defines %6, and %6 is used in bb.2. When we traverse all the basic blocks bottom-up, we meet bb.5 first. For %6, we find that it is not dead because %3 in bb.2 uses it, so %6 survives. Then we continue traversing the other BBs. When we meet bb.2, we see that no one uses %5, so we kill it, and likewise %4 and %3. At this point %6 has actually become dead, because we killed %3 and there is no longer any user of %6.
However, because we only traverse the blocks once, we can't erase %6 at the end. So if we iteratively visit all blocks until nothing changes, we can ensure that all dead MIs are erased.

Ah, I see. Even though we try to traverse basic blocks bottom-up, that's just the block placement order rather than block reachability. Could we replace that order with post-order, so that the use is always traversed before the definition?

That probably will not help if we have a loop?

Mon, Nov 16, 9:14 PM · Restricted Project
hliao added a comment to D91513: [DeadMachineInstrctionElim] Post order visit all blocks and Iteratively run DeadMachineInstructionElim pass until nothing dead.

Could you elaborate more on why we need to run that iteratively? Since the original one runs bottom-up, supposedly it should find all.

From iteratively-run-dead-mi-elim.mir we can see that bb.5 defines %6, and %6 is used in bb.2. When we traverse all the basic blocks bottom-up, we meet bb.5 first. For %6, we find that it is not dead because %3 in bb.2 uses it, so %6 survives. Then we continue traversing the other BBs. When we meet bb.2, we see that no one uses %5, so we kill it, and likewise %4 and %3. At this point %6 has actually become dead, because we killed %3 and there is no longer any user of %6.
However, because we only traverse the blocks once, we can't erase %6 at the end. So if we iteratively visit all blocks until nothing changes, we can ensure that all dead MIs are erased.

Mon, Nov 16, 8:51 PM · Restricted Project
hliao added inline comments to D91121: [InferAddrSpace] Teach to handle assumed address space..
Mon, Nov 16, 2:24 PM · Restricted Project, Restricted Project
hliao committed rGf375885ab86d: [InferAddrSpace] Teach to handle assumed address space. (authored by hliao).
[InferAddrSpace] Teach to handle assumed address space.
Mon, Nov 16, 2:07 PM
hliao closed D91121: [InferAddrSpace] Teach to handle assumed address space..
Mon, Nov 16, 2:06 PM · Restricted Project, Restricted Project
hliao added inline comments to D91121: [InferAddrSpace] Teach to handle assumed address space..
Mon, Nov 16, 10:15 AM · Restricted Project, Restricted Project
hliao added a comment to D91121: [InferAddrSpace] Teach to handle assumed address space..

Kindly ping for review.

Mon, Nov 16, 8:29 AM · Restricted Project, Restricted Project
hliao added a comment to D91513: [DeadMachineInstrctionElim] Post order visit all blocks and Iteratively run DeadMachineInstructionElim pass until nothing dead.

Could you elaborate more on why we need to run that iteratively? Since the original one runs bottom-up, supposedly it should find all.

Mon, Nov 16, 6:47 AM · Restricted Project

Fri, Nov 13

hliao added inline comments to D91121: [InferAddrSpace] Teach to handle assumed address space..
Fri, Nov 13, 1:38 PM · Restricted Project, Restricted Project
hliao updated the diff for D91121: [InferAddrSpace] Teach to handle assumed address space..

Revise the interface of that target hook.
Add a dedicated test case for value reading from parameter even though most cases are already covered in the clang test.

Fri, Nov 13, 1:31 PM · Restricted Project, Restricted Project
hliao added inline comments to D91121: [InferAddrSpace] Teach to handle assumed address space..
Fri, Nov 13, 9:57 AM · Restricted Project, Restricted Project
hliao updated the diff for D91121: [InferAddrSpace] Teach to handle assumed address space..

Revise the condition check.

Fri, Nov 13, 9:57 AM · Restricted Project, Restricted Project
hliao added inline comments to D91121: [InferAddrSpace] Teach to handle assumed address space..
Fri, Nov 13, 8:11 AM · Restricted Project, Restricted Project
hliao updated the diff for D91121: [InferAddrSpace] Teach to handle assumed address space..
  • Add a note in the AMDGPU usage document on the assumption made here.
  • Revise the test in clang.
Fri, Nov 13, 8:09 AM · Restricted Project, Restricted Project

Thu, Nov 12

hliao committed rG8920ef06a138: [hip] Remove the coercion on aggregate kernel arguments. (authored by hliao).
[hip] Remove the coercion on aggregate kernel arguments.
Thu, Nov 12, 6:20 PM
hliao closed D89980: [hip] Remove the coercion on aggregate kernel arguments..
Thu, Nov 12, 6:19 PM · Restricted Project
hliao added inline comments to D89980: [hip] Remove the coercion on aggregate kernel arguments..
Thu, Nov 12, 1:55 PM · Restricted Project
hliao updated the diff for D89980: [hip] Remove the coercion on aggregate kernel arguments..

Add a test case for the single element struct.

Thu, Nov 12, 1:54 PM · Restricted Project
hliao added inline comments to D91121: [InferAddrSpace] Teach to handle assumed address space..
Thu, Nov 12, 12:20 PM · Restricted Project, Restricted Project
hliao added inline comments to D91121: [InferAddrSpace] Teach to handle assumed address space..
Thu, Nov 12, 12:18 PM · Restricted Project, Restricted Project
hliao added inline comments to D91121: [InferAddrSpace] Teach to handle assumed address space..
Thu, Nov 12, 12:15 PM · Restricted Project, Restricted Project

Wed, Nov 11

hliao added a comment to D89980: [hip] Remove the coercion on aggregate kernel arguments..

PING for review

Wed, Nov 11, 1:54 PM · Restricted Project
hliao added a comment to D91121: [InferAddrSpace] Teach to handle assumed address space..

PING for review

Wed, Nov 11, 1:54 PM · Restricted Project, Restricted Project

Tue, Nov 10

hliao added a reviewer for D89980: [hip] Remove the coercion on aggregate kernel arguments.: msearles.
Tue, Nov 10, 9:42 PM · Restricted Project
hliao updated the diff for D91121: [InferAddrSpace] Teach to handle assumed address space..

Rebase

Tue, Nov 10, 6:17 PM · Restricted Project, Restricted Project
hliao updated the diff for D91121: [InferAddrSpace] Teach to handle assumed address space..

Fix clang-tidy warnings.

Tue, Nov 10, 12:26 PM · Restricted Project, Restricted Project
hliao updated the diff for D91121: [InferAddrSpace] Teach to handle assumed address space..

Revise the fix.

Tue, Nov 10, 8:58 AM · Restricted Project, Restricted Project
hliao retitled D89980: [hip] Remove the coercion on aggregate kernel arguments. from [hip] Remove kernel argument coercion. to [hip] Remove the coercion on aggregate kernel arguments..
Tue, Nov 10, 7:24 AM · Restricted Project
hliao updated the diff for D89980: [hip] Remove the coercion on aggregate kernel arguments..

Revise the commit message.

Tue, Nov 10, 7:23 AM · Restricted Project
hliao updated the diff for D89980: [hip] Remove the coercion on aggregate kernel arguments..

Remove aggregate kernel argument coercion only.

Tue, Nov 10, 7:11 AM · Restricted Project
hliao abandoned D89900: [amdgpu] Enhance disjoint memory accesses checking..

With multiple MMOs now supported in the scheduler, this patch is no longer needed for performance.

Tue, Nov 10, 7:10 AM · Restricted Project

Mon, Nov 9

hliao requested review of D91121: [InferAddrSpace] Teach to handle assumed address space..
Mon, Nov 9, 9:30 PM · Restricted Project, Restricted Project

Sun, Nov 8

hliao committed rGfa5d31f82569: [GlobalsAA] Teach to handle `addrspacecast`. (authored by hliao).
[GlobalsAA] Teach to handle `addrspacecast`.
Sun, Nov 8, 9:05 PM

Thu, Nov 5

hliao committed rG23c6d1501d80: [amdgpu] Add `llvm.amdgcn.endpgm` support. (authored by hliao).
[amdgpu] Add `llvm.amdgcn.endpgm` support.
Thu, Nov 5, 4:07 PM
hliao closed D90809: [amdgpu] Add `llvm.amdgcn.endpgm` support..
Thu, Nov 5, 4:07 PM · Restricted Project, Restricted Project
hliao added a comment to D90809: [amdgpu] Add `llvm.amdgcn.endpgm` support..

Should this also be IntrConvergent?

Probably yes... This is control flow after all.

Thu, Nov 5, 1:13 PM · Restricted Project, Restricted Project
hliao added inline comments to D90809: [amdgpu] Add `llvm.amdgcn.endpgm` support..
Thu, Nov 5, 11:51 AM · Restricted Project, Restricted Project
hliao updated the diff for D90809: [amdgpu] Add `llvm.amdgcn.endpgm` support..

Add IntrCold.

Thu, Nov 5, 11:51 AM · Restricted Project, Restricted Project

Wed, Nov 4

hliao requested review of D90809: [amdgpu] Add `llvm.amdgcn.endpgm` support..
Wed, Nov 4, 6:24 PM · Restricted Project, Restricted Project

Tue, Nov 3

hliao committed rG4b1120159274: [MachineInstr] Add support for instructions with multiple memory operands. (authored by hliao).
[MachineInstr] Add support for instructions with multiple memory operands.
Tue, Nov 3, 5:49 PM
hliao closed D89447: [MachineInstr] Add support for instructions with multiple memory operands..
Tue, Nov 3, 5:49 PM · Restricted Project
hliao added a comment to D89980: [hip] Remove the coercion on aggregate kernel arguments..

This should use byref, but I don't think this should come at the cost of the promotion. I would still like to see this promotion occur for the in-memory byref type

Once we use byref, that in-memory byref type has no way to be preserved based on the C model, as it will be treated as a local variable. The initial value with the coerced type won't be preserved after that. That happens in the case with a static index as well, but the promotion helps to build the chain from the initial value to the final use. But if we cannot promote the alloca in the end, we lose that information or can no longer assume it.

Then the promotion can also be applied to the temporary argument slot

Tue, Nov 3, 12:55 PM · Restricted Project
hliao added a comment to D89980: [hip] Remove the coercion on aggregate kernel arguments..

This should use byref, but I don't think this should come at the cost of the promotion. I would still like to see this promotion occur for the in-memory byref type

Tue, Nov 3, 12:10 PM · Restricted Project
hliao added a comment to D89980: [hip] Remove the coercion on aggregate kernel arguments..
In D89980#2371526, @tra wrote:

@jlebar -- FYI. This looks pretty similar to the issue you've reported recently for NVPTX.

Tue, Nov 3, 10:14 AM · Restricted Project
hliao added a comment to D89980: [hip] Remove the coercion on aggregate kernel arguments..

The code could be simply converted to a kernel one following the same pattern:

Tue, Nov 3, 8:49 AM · Restricted Project
hliao added a comment to D89980: [hip] Remove the coercion on aggregate kernel arguments..

I think this is a dead end approach. I don't see the connection to the original problem you are trying to solve. Can you send me an IR testcase that this is supposed to help?

Tue, Nov 3, 7:37 AM · Restricted Project
hliao added inline comments to D89447: [MachineInstr] Add support for instructions with multiple memory operands..
Tue, Nov 3, 7:08 AM · Restricted Project
hliao updated the diff for D89447: [MachineInstr] Add support for instructions with multiple memory operands..

Change how that limit is set.

Tue, Nov 3, 7:07 AM · Restricted Project

Fri, Oct 30

hliao added a comment to D89980: [hip] Remove the coercion on aggregate kernel arguments..

Even though GLOBAL may have a better addressing mode, the unpromotable alloca resolved in this change is an even more significant performance issue. We could favor GLOBAL LOAD/STORE for kernel functions as I proposed in other threads but, considering that an aggregate argument may be accessed indirectly, we need to pass it indirectly.

Fri, Oct 30, 4:57 PM · Restricted Project
hliao added a comment to D89447: [MachineInstr] Add support for instructions with multiple memory operands..

Kindly PING again

Fri, Oct 30, 12:57 PM · Restricted Project
hliao committed rGc82403d025f3: [gvn] PRE needs to skip convergent intrinsics/calls. (authored by hliao).
[gvn] PRE needs to skip convergent intrinsics/calls.
Fri, Oct 30, 8:25 AM
hliao closed D90391: [gvn] PRE needs to skip convergent intrinsics/calls..
Fri, Oct 30, 8:24 AM · Restricted Project
hliao added inline comments to D90391: [gvn] PRE needs to skip convergent intrinsics/calls..
Fri, Oct 30, 8:05 AM · Restricted Project
hliao updated the diff for D90391: [gvn] PRE needs to skip convergent intrinsics/calls..

Revise the test following reviewers' comments.

Fri, Oct 30, 8:02 AM · Restricted Project

Thu, Oct 29

hliao committed rG15a68fed111f: Fix shared build. (authored by hliao).
Fix shared build.
Thu, Oct 29, 9:46 PM

Oct 29 2020

hliao updated the diff for D90391: [gvn] PRE needs to skip convergent intrinsics/calls..

Revise the comment.

Oct 29 2020, 6:38 AM · Restricted Project
hliao requested review of D90391: [gvn] PRE needs to skip convergent intrinsics/calls..
Oct 29 2020, 6:26 AM · Restricted Project

Oct 27 2020

hliao added a comment to D89447: [MachineInstr] Add support for instructions with multiple memory operands..

PING for review

Oct 27 2020, 12:05 PM · Restricted Project
hliao added a comment to D89980: [hip] Remove the coercion on aggregate kernel arguments..

Besides the unpromotable alloca issue due to indirect accesses, coercing directly to a GLOBAL pointer is not safe because, in HIP/CUDA, both CONSTANT and GLOBAL pointers may be passed as kernel arguments. Without introducing a new address space combining GLOBAL/CONSTANT, such coercion would be unsafe.

Oct 27 2020, 11:30 AM · Restricted Project
hliao added inline comments to D89980: [hip] Remove the coercion on aggregate kernel arguments..
Oct 27 2020, 11:17 AM · Restricted Project
hliao committed rG46c3d5cb05d6: [amdgpu] Add the late codegen preparation pass. (authored by hliao).
[amdgpu] Add the late codegen preparation pass.
Oct 27 2020, 11:08 AM
hliao closed D80364: [amdgpu] Teach load widening to handle non-DWORD aligned loads..
Oct 27 2020, 11:08 AM · Restricted Project
hliao added inline comments to D89980: [hip] Remove the coercion on aggregate kernel arguments..
Oct 27 2020, 9:33 AM · Restricted Project
hliao added inline comments to D89980: [hip] Remove the coercion on aggregate kernel arguments..
Oct 27 2020, 9:29 AM · Restricted Project
hliao updated the diff for D80364: [amdgpu] Teach load widening to handle non-DWORD aligned loads..

Fix coding style following clang-tidy.

Oct 27 2020, 8:55 AM · Restricted Project
hliao updated the diff for D89980: [hip] Remove the coercion on aggregate kernel arguments..

Add amdgpu-kernel-arg-pointer-type.cu back and revise its checks.

Oct 27 2020, 8:49 AM · Restricted Project
hliao added inline comments to D89980: [hip] Remove the coercion on aggregate kernel arguments..
Oct 27 2020, 7:54 AM · Restricted Project
hliao updated the diff for D89980: [hip] Remove the coercion on aggregate kernel arguments..

Revise the comment and point out the safety issue of coercing the kernel argument from a generic pointer to a global one.

Oct 27 2020, 7:44 AM · Restricted Project
hliao added a comment to D89980: [hip] Remove the coercion on aggregate kernel arguments..
In D89980#2348339, @tra wrote:

Are there any tests to illustrate what this change does to IR or generated code?

Oct 27 2020, 7:41 AM · Restricted Project
hliao updated the diff for D89980: [hip] Remove the coercion on aggregate kernel arguments..

Test case is enhanced to check that no kernel argument type is coerced.

Oct 27 2020, 7:39 AM · Restricted Project
hliao committed rG0d092303b446: [amdgpu] Enable use of AA during codegen. (authored by hliao).
[amdgpu] Enable use of AA during codegen.
Oct 27 2020, 6:46 AM
hliao closed D89320: [amdgpu] Enable use of AA during codegen..
Oct 27 2020, 6:46 AM · Restricted Project

Oct 26 2020

hliao added inline comments to D89447: [MachineInstr] Add support for instructions with multiple memory operands..
Oct 26 2020, 3:01 PM · Restricted Project
hliao updated the diff for D89320: [amdgpu] Enable use of AA during codegen..

Fix more regression tests due to the enhanced AMDGPU AA.

Oct 26 2020, 7:38 AM · Restricted Project
hliao added inline comments to D89447: [MachineInstr] Add support for instructions with multiple memory operands..
Oct 26 2020, 7:14 AM · Restricted Project
hliao updated the diff for D89447: [MachineInstr] Add support for instructions with multiple memory operands..

Remove unordered check.

Oct 26 2020, 7:09 AM · Restricted Project

Oct 24 2020

hliao updated the diff for D89447: [MachineInstr] Add support for instructions with multiple memory operands..

Rebase

Oct 24 2020, 8:01 PM · Restricted Project
hliao added a comment to D89447: [MachineInstr] Add support for instructions with multiple memory operands..

Have the SystemZ prefetch issues been resolved?

Oct 24 2020, 8:51 AM · Restricted Project

Oct 23 2020

hliao committed rG9497e2e7d88f: Fix shared build. NFC. (authored by hliao).
Fix shared build. NFC.
Oct 23 2020, 12:53 PM
hliao updated the diff for D89447: [MachineInstr] Add support for instructions with multiple memory operands..

Rebase

Oct 23 2020, 8:27 AM · Restricted Project

Oct 22 2020

hliao added a comment to D89447: [MachineInstr] Add support for instructions with multiple memory operands..

PING for review. As a similar patch was approved, shall I just commit it again now that the compilation time issue is addressed?

Oct 22 2020, 10:01 PM · Restricted Project
hliao added a reviewer for D89447: [MachineInstr] Add support for instructions with multiple memory operands.: dmgreen.
Oct 22 2020, 1:11 PM · Restricted Project
hliao requested review of D89980: [hip] Remove the coercion on aggregate kernel arguments..
Oct 22 2020, 12:54 PM · Restricted Project
hliao added a comment to D89447: [MachineInstr] Add support for instructions with multiple memory operands..

Just kindly PING for review.

Oct 22 2020, 8:08 AM · Restricted Project