hliao (Michael Liao)
User

Projects

User does not belong to any projects.

User Details

User Since
Aug 7 2014, 12:01 PM (337 w, 2 d)

Recent Activity

Wed, Jan 20

hliao committed rG7b5d7c7b0a24: [hip] Fix `<complex>` compilation on Windows with VS2019. (authored by hliao).
[hip] Fix `<complex>` compilation on Windows with VS2019.
Wed, Jan 20, 1:44 PM
hliao closed D95075: [hip] Fix `<complex>` compilation on Windows with VS2019..
Wed, Jan 20, 1:44 PM · Restricted Project
hliao updated the diff for D95075: [hip] Fix `<complex>` compilation on Windows with VS2019..

Fix typo.

Wed, Jan 20, 12:10 PM · Restricted Project
hliao requested review of D95075: [hip] Fix `<complex>` compilation on Windows with VS2019..
Wed, Jan 20, 11:56 AM · Restricted Project

Thu, Jan 7

hliao committed rGf78d6af7319a: [hip] Enable HIP compilation with `<complex`> on MSVC. (authored by hliao).
[hip] Enable HIP compilation with `<complex`> on MSVC.
Thu, Jan 7, 2:42 PM
hliao closed D93638: [hip] Enable HIP compilation with `<complex`> on MSVC..
Thu, Jan 7, 2:41 PM · Restricted Project
hliao updated the diff for D93638: [hip] Enable HIP compilation with `<complex`> on MSVC..

I forgot that C functions can be overloaded in Clang with the overloadable
extension. With that, we don't need to mark functions from <ymath.h> as HD.
Instead, we can provide their device-side implementations directly.

Thu, Jan 7, 2:28 AM · Restricted Project
hliao added inline comments to D93638: [hip] Enable HIP compilation with `<complex`> on MSVC..
Thu, Jan 7, 1:46 AM · Restricted Project

Wed, Jan 6

hliao added inline comments to D93638: [hip] Enable HIP compilation with `<complex`> on MSVC..
Wed, Jan 6, 5:54 PM · Restricted Project
hliao added inline comments to D93638: [hip] Enable HIP compilation with `<complex`> on MSVC..
Wed, Jan 6, 5:32 PM · Restricted Project
hliao added inline comments to D93638: [hip] Enable HIP compilation with `<complex`> on MSVC..
Wed, Jan 6, 2:50 PM · Restricted Project
hliao committed rG2a29ce303451: [hip] Fix HIP version parsing. (authored by hliao).
[hip] Fix HIP version parsing.
Wed, Jan 6, 2:00 PM
hliao closed D93587: [hip] Fix HIP version parsing..
Wed, Jan 6, 2:00 PM · Restricted Project
hliao added inline comments to D93638: [hip] Enable HIP compilation with `<complex`> on MSVC..
Wed, Jan 6, 1:59 PM · Restricted Project
hliao added inline comments to D93638: [hip] Enable HIP compilation with `<complex`> on MSVC..
Wed, Jan 6, 10:55 AM · Restricted Project
hliao updated the diff for D93638: [hip] Enable HIP compilation with `<complex`> on MSVC..

Only mark HD attributes in ymath.h wrapper header when compiled with MSVC.

Wed, Jan 6, 10:55 AM · Restricted Project
hliao updated the diff for D93587: [hip] Fix HIP version parsing..

Revise following reviewers' comments.

Wed, Jan 6, 10:45 AM · Restricted Project
hliao added inline comments to D93587: [hip] Fix HIP version parsing..
Wed, Jan 6, 10:45 AM · Restricted Project

Tue, Jan 5

hliao added a comment to D93638: [hip] Enable HIP compilation with `<complex`> on MSVC..

PING

Tue, Jan 5, 8:47 AM · Restricted Project
hliao added a comment to D93587: [hip] Fix HIP version parsing..

PING

Tue, Jan 5, 8:47 AM · Restricted Project

Dec 22 2020

hliao added inline comments to D92999: [amdgpu] Enhance load widening in the constant address space..
Dec 22 2020, 9:51 AM · Restricted Project

Dec 21 2020

hliao updated the diff for D93638: [hip] Enable HIP compilation with `<complex`> on MSVC..

Fix the cmake to distribute that header wrapper.

Dec 21 2020, 3:33 PM · Restricted Project
hliao updated the diff for D93638: [hip] Enable HIP compilation with `<complex`> on MSVC..

These functions are pure C functions.

Dec 21 2020, 2:16 PM · Restricted Project
hliao updated the summary of D93638: [hip] Enable HIP compilation with `<complex`> on MSVC..
Dec 21 2020, 2:10 PM · Restricted Project
hliao committed rGbb8d20d9f3bb: [cuda][hip] Fix typoes in header wrappers. (authored by hliao).
[cuda][hip] Fix typoes in header wrappers.
Dec 21 2020, 10:03 AM
hliao added a comment to D93638: [hip] Enable HIP compilation with `<complex`> on MSVC..

Disclaimer: I request changes because of the next sentence, other than that I have no objection but also cannot review this.
All cuda_wrapper headers say something about complex in the first row, copy & paste error. All have the wrong license text (I think).

Dec 21 2020, 9:59 AM · Restricted Project
hliao updated the diff for D93638: [hip] Enable HIP compilation with `<complex`> on MSVC..

Fix license.

Dec 21 2020, 9:58 AM · Restricted Project
hliao updated the diff for D93638: [hip] Enable HIP compilation with `<complex`> on MSVC..

Fix typo again.

Dec 21 2020, 9:53 AM · Restricted Project
hliao updated the diff for D93638: [hip] Enable HIP compilation with `<complex`> on MSVC..

Fix typo.

Dec 21 2020, 8:28 AM · Restricted Project
hliao added a comment to D93638: [hip] Enable HIP compilation with `<complex`> on MSVC..

Beyond enabling compilation with <complex> on Windows, I have a real concern about the current approach to supporting <complex> in device compilation. Device compilation should not rely on the host STL implementation. That results in inconsistent compilation results across platforms, especially Linux vs. Windows.
BTW, the use of <complex> in CUDA cannot be compiled with NVCC directly even with --expt-relaxed-constexpr, cf. https://godbolt.org/z/3f79co

Dec 21 2020, 8:27 AM · Restricted Project
hliao requested review of D93638: [hip] Enable HIP compilation with `<complex`> on MSVC..
Dec 21 2020, 7:59 AM · Restricted Project

Dec 19 2020

hliao requested review of D93587: [hip] Fix HIP version parsing..
Dec 19 2020, 3:56 PM · Restricted Project

Dec 14 2020

hliao added inline comments to D93174: [amdgpu] Fix a crash case when `V_CNDMASK` could be simplified..
Dec 14 2020, 10:10 AM · Restricted Project
hliao committed rG1fd1f638b68c: [amdgpu] Fix a crash case when `V_CNDMASK` could be simplified. (authored by hliao).
[amdgpu] Fix a crash case when `V_CNDMASK` could be simplified.
Dec 14 2020, 10:08 AM
hliao closed D93174: [amdgpu] Fix a crash case when `V_CNDMASK` could be simplified..
Dec 14 2020, 10:08 AM · Restricted Project
hliao added inline comments to D93174: [amdgpu] Fix a crash case when `V_CNDMASK` could be simplified..
Dec 14 2020, 8:11 AM · Restricted Project
hliao added inline comments to D93174: [amdgpu] Fix a crash case when `V_CNDMASK` could be simplified..
Dec 14 2020, 6:58 AM · Restricted Project

Dec 13 2020

hliao added a comment to rG8904ee8ac7eb: [JITLink] Add JITLinkDylib type, thread through JITLinkMemoryManager APIs..

The build is broken due to the missing file.

Dec 13 2020, 6:15 PM

Dec 12 2020

hliao requested review of D93174: [amdgpu] Fix a crash case when `V_CNDMASK` could be simplified..
Dec 12 2020, 9:41 PM · Restricted Project

Dec 10 2020

hliao accepted D92893: [CUDA] Do not diagnose host/device variable access in dependent types..

LGTM if you revise the test based on Sam's suggestion on the test case.

Dec 10 2020, 9:34 PM · Restricted Project

Dec 9 2020

hliao requested review of D92999: [amdgpu] Enhance load widening in the constant address space..
Dec 9 2020, 10:22 PM · Restricted Project

Dec 8 2020

hliao added a comment to D92893: [CUDA] Do not diagnose host/device variable access in dependent types..

LGTM if there's a regression test available.

Dec 8 2020, 11:40 PM · Restricted Project

Dec 4 2020

hliao added inline comments to D92394: [amdgpu] Teach one more case for assumed global pointers..
Dec 4 2020, 6:44 PM · Restricted Project

Dec 2 2020

hliao committed rG21d74172dff7: Remove `-Wunused-result` and `-Wpedantic` warnings from GCC. NFC. (authored by hliao).
Remove `-Wunused-result` and `-Wpedantic` warnings from GCC. NFC.
Dec 2 2020, 7:54 AM
hliao committed rGd8949a8ad3ca: [hip] Fix host object creation from fatbin (authored by hliao).
[hip] Fix host object creation from fatbin
Dec 2 2020, 7:36 AM
hliao closed D92418: [hip] Fix host object creation from fatbin.
Dec 2 2020, 7:36 AM · Restricted Project

Dec 1 2020

hliao added a comment to D92418: [hip] Fix host object creation from fatbin.

Even though there's no functionality change, the original one breaks the kernel extraction script, which is designed to find the .hip_fatbin section. That internal tool is still required to extract kernels from objects generated from RDC linking.

Dec 1 2020, 12:07 PM · Restricted Project
hliao requested review of D92418: [hip] Fix host object creation from fatbin.
Dec 1 2020, 12:05 PM · Restricted Project
hliao requested review of D92394: [amdgpu] Teach one more case for assumed global pointers..
Dec 1 2020, 7:34 AM · Restricted Project

Nov 22 2020

hliao abandoned D91928: [nvptx] Skip alloca for read-only byval arguments..

In case it's not used in PHI or SELECT and cannot ensure the result is also a pointer to the parameter space, we could skip alloca insertion.

I think an allowlist might be more appropriate than a denylist. Rather than "anything other than PHI and SELECT", could it be "if it's only transitively used by gep and load, we're good"?

I am not 100% sure even that works, though. The real problem is that this pass is trying to reason about what the addrspace inference pass is capable of. We can only do the transformation here if we're positive that addrspace inference will eliminate all generic loads from the arg. That's a layering violation and ultimately is fragile.
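
The allowlist idea above can be sketched on a toy use graph. This is only an illustration under stated assumptions: ToyOp, ToyValue, and onlyUsedByGepAndLoad are hypothetical names, not LLVM classes or APIs, and the sketch models just the check itself. It says nothing about whether the check is sufficient, which is the open concern in the comment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy IR for illustration only; these are not LLVM classes.
enum class ToyOp { Arg, GEP, Load, Select, Phi, Store, Call };

struct ToyValue {
  ToyOp op;
  std::vector<std::size_t> users;  // indices of the values that use this one
};

// Allowlist check: returns true only if every transitive user of the pointer
// `root` is a GEP or a load. GEPs yield derived pointers, so their users are
// checked too. A load ends the chain because its result is data, not the
// pointer itself. Any other user (select, phi, store, call, ...) is rejected.
bool onlyUsedByGepAndLoad(const std::vector<ToyValue> &values,
                          std::size_t root) {
  assert(root < values.size());
  std::vector<bool> seen(values.size(), false);
  std::vector<std::size_t> worklist{root};
  seen[root] = true;
  while (!worklist.empty()) {
    std::size_t cur = worklist.back();
    worklist.pop_back();
    for (std::size_t user : values[cur].users) {
      assert(user < values.size());
      switch (values[user].op) {
      case ToyOp::Load:
        break;  // allowed; the loaded data is not followed further
      case ToyOp::GEP:
        if (!seen[user]) {
          seen[user] = true;
          worklist.push_back(user);
        }
        break;
      default:
        return false;  // anything not on the allowlist
      }
    }
  }
  return true;
}
```

With this model, an argument used as arg -> gep -> load passes the check, while arg -> select -> load is rejected because the select is not on the allowlist.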

Nov 22 2020, 3:02 PM · Restricted Project
hliao added a comment to D91928: [nvptx] Skip alloca for read-only byval arguments..

I don't believe there's any exception to prove deduction [of the readonly attribute] wrong.

Understood.

The address space inference here only refers to the one in the backend directly after this argument lowering pass.

Also understood.

This isn't speaking to my concern, though.

Suppose we have

__global__ void foo(int x, const int* y, int* out, bool flag) {
  const int* ptr = flag ? &x : y;
  *out = *ptr;
}

In this case we can say with confidence that x is readonly.

But address space inference cannot infer the address space of ptr (how could it?). Therefore we will do a generic load, which is wrong.

Nov 22 2020, 12:49 PM · Restricted Project
hliao added a comment to D91928: [nvptx] Skip alloca for read-only byval arguments..

This looks really simple, which is awesome. I am enthusiastic. But I am worried it may not be correct.

AIUI params are special in that they *must* be read from the param address space. It is illegal to do a generic load of a param.

So this change is correct only if we can guarantee that address space inference will infer the specific address space for all uses of the pointer.

But address space inference is not guaranteed. For example, you could select on two pointers of two different address spaces. So long as you only ever read from these pointers, the arg can still be marked as ReadOnly. But with this patch, we'd end up doing a generic load from the param space, which would be illegal.

Take it all with a grain of salt since I've also been out of the game for a while.

Nov 22 2020, 9:20 AM · Restricted Project

Nov 21 2020

hliao added a comment to D91928: [nvptx] Skip alloca for read-only byval arguments..

It turns out that the simplest way is to skip generating the alloca once that byval argument is readonly. As readonly will be attributed once there's no write to that argument, it's safe to just cast that pointer to the parameter space if it has readonly. Basically, that argument lowering pass does something similar to D91590 but, instead, applies it in the backend. I verified that, for that simple CUDA test code, it generates the same SASS.

Nov 21 2020, 11:55 PM · Restricted Project
hliao added a reviewer for D91928: [nvptx] Skip alloca for read-only byval arguments.: jlebar.
Nov 21 2020, 11:52 PM · Restricted Project
hliao requested review of D91928: [nvptx] Skip alloca for read-only byval arguments..
Nov 21 2020, 11:51 PM · Restricted Project
hliao committed rGdcc06597b1d6: Fix shared build. (authored by hliao).
Fix shared build.
Nov 21 2020, 2:08 PM

Nov 19 2020

hliao added a comment to D91513: [DeadMachineInstrctionElim] Post order visit all blocks and Iteratively run DeadMachineInstructionElim pass until nothing dead.

Do you have permission to commit?

Nov 19 2020, 9:10 PM · Restricted Project

Nov 18 2020

hliao added a comment to D91513: [DeadMachineInstrctionElim] Post order visit all blocks and Iteratively run DeadMachineInstructionElim pass until nothing dead.

Using post-order is quite straight-forward and only involves several lines of change. Please check the attachment.

That test passed with this traverse order change.

That's a great help, I pass all my related cases with this patch, Thanks a lot.

Now that we have decided to use post order to visit all blocks of a function, I think we need to consider: what if the CFG contains cycles?


From this picture, we can see that post order is not clearly defined because cycles exist; one of the possible orders is [ m, g, d, e, c, b, t, x ].
So m comes before g. If we define something in m and use it in g, then even though both def and use are useless, because we visit m first, we will still have a dead definition left after we post-order visit all blocks.
So is it possible that there still exist cases, theoretically, that cannot be fixed by a post-order visit? That is, might we still need to run iteratively?

You are right, that's possible. That case should be rare as that's a def in the back-edge with acyclic dep. Could you merge the post-order change together with the iterative runs? So that, in the regular case, we run at most twice. Please keep an eye on compile time.

Post order visit and iteratively run are merged.
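
To make the back-edge case concrete, here is a minimal sketch of a post-order walk over a toy CFG with one loop. ToyCFG, postOrder, and makeLoopCFG are hypothetical names for illustration, not the pass's actual code, and the block numbering is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy CFG: succs[b] holds the successor indices of block b.
using ToyCFG = std::vector<std::vector<std::size_t>>;

// Depth-first walk that appends a block only after all of its not-yet-seen
// successors have been appended.
static void postOrderVisit(const ToyCFG &succs, std::size_t block,
                           std::vector<bool> &seen,
                           std::vector<std::size_t> &order) {
  seen[block] = true;
  for (std::size_t s : succs[block]) {
    assert(s < succs.size());
    if (!seen[s])
      postOrderVisit(succs, s, seen, order);
  }
  order.push_back(block);
}

// Returns the blocks reachable from `entry` in post order.
std::vector<std::size_t> postOrder(const ToyCFG &succs, std::size_t entry) {
  assert(entry < succs.size());
  std::vector<bool> seen(succs.size(), false);
  std::vector<std::size_t> order;
  postOrderVisit(succs, entry, seen, order);
  return order;
}

// entry(0) -> header(1) -> latch(2) -> exit(3), plus the back-edge
// latch(2) -> header(1).
ToyCFG makeLoopCFG() {
  return {{1}, {2}, {1, 3}, {}};
}
```

For makeLoopCFG(), postOrder(cfg, 0) yields 3, 2, 1, 0. Outside the loop, successors come before predecessors, so uses are seen before defs. The latch (2) still comes before the header (1), so a value defined in the latch and used in the header is visited def-first, which is the case where a second run is still needed.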

Nov 18 2020, 8:15 AM · Restricted Project
hliao added a comment to D91590: [NVPTX] Efficently support dynamic index on CUDA kernel aggregate parameters..

As mentioned earlier, that's very experimental support. Even though the SASS looks reasonable, it still needs verifying on real systems. For non-kernel functions, it seems we share the path, so we should do a similar thing there. The current approach fixes that in the codegen phase by adding back the alloca to match the parameter space semantics. Once that alloca is dynamically indexed, it won't be promoted in SROA. Only instcombine eliminates that alloca, when it is only modified once by copying from constant memory. So as not to break certain patterns prepared in codegen preparation, instcombine doesn't run in the backend, so that dynamically indexed alloca won't be removed.

Nov 18 2020, 12:15 AM · Restricted Project, Restricted Project

Nov 17 2020

hliao added a comment to D91513: [DeadMachineInstrctionElim] Post order visit all blocks and Iteratively run DeadMachineInstructionElim pass until nothing dead.

BTW, please add a test case with that def in back-edge with acyclic dep.

Nov 17 2020, 10:23 PM · Restricted Project
hliao accepted D91513: [DeadMachineInstrctionElim] Post order visit all blocks and Iteratively run DeadMachineInstructionElim pass until nothing dead.

Using post-order is quite straight-forward and only involves several lines of change. Please check the attachment.

That test passed with this traverse order change.

That's a great help, I pass all my related cases with this patch, Thanks a lot.

Now that we have decided to use post order to visit all blocks of a function, I think we need to consider: what if the CFG contains cycles?


From this picture, we can see that post order is not clearly defined because cycles exist; one of the possible orders is [ m, g, d, e, c, b, t, x ].
So m comes before g. If we define something in m and use it in g, then even though both def and use are useless, because we visit m first, we will still have a dead definition left after we post-order visit all blocks.
So is it possible that there still exist cases, theoretically, that cannot be fixed by a post-order visit? That is, might we still need to run iteratively?

Nov 17 2020, 9:57 PM · Restricted Project
hliao added a comment to D91513: [DeadMachineInstrctionElim] Post order visit all blocks and Iteratively run DeadMachineInstructionElim pass until nothing dead.

Using post-order is quite straight-forward and only involves several lines of change. Please check the attachment.

That test passed with this traverse order change.

Nov 17 2020, 10:58 AM · Restricted Project

Nov 16 2020

hliao added a comment to D91590: [NVPTX] Efficently support dynamic index on CUDA kernel aggregate parameters..

This is an experimental, demo-only patch done in my spare time on eliminating private memory usage in https://godbolt.org/z/EPPn6h. The attachment includes both the reference and new IR, PTX, and SASS (sm_60) output. For the new code, that aggregate argument is loaded through the LDC instruction in SASS instead of MOV due to the non-static address. I don't have sm_60 hardware to verify that. Could you try that on real hardware?

Nov 16 2020, 11:24 PM · Restricted Project, Restricted Project
hliao requested review of D91590: [NVPTX] Efficently support dynamic index on CUDA kernel aggregate parameters..
Nov 16 2020, 11:15 PM · Restricted Project, Restricted Project
hliao added a comment to D91513: [DeadMachineInstrctionElim] Post order visit all blocks and Iteratively run DeadMachineInstructionElim pass until nothing dead.

could you elaborate more on why we need to run that iteratively? since the original one runs bottom-up, supposedly it should find all.

From iteratively-run-dead-mi-elim.mir we can see that bb.5 defines %6, and %6 is used in bb.2. When we traverse all the basic blocks bottom-up, we meet bb.5 first. For %6, we find that it is not dead because %3 in bb.2 uses it, so %6 survives. Then we continue traversing the other BBs. When we meet bb.2, we see that no one uses %5, so we kill it, and likewise %4 and %3. At that point %6 has actually become dead, because we killed %3 and nothing uses %6 any longer.
However, because we only traverse the blocks once, we can't erase %6 at the end. So if we iteratively visit all blocks until nothing changes, then we can ensure that all dead MIs are erased.

Ah, I see. Even though we try to traverse basic blocks bottom-up, that's just the block placement order instead of block reachability. Could we replace that order with the post-order? So that the use is always traversed before the definition.

That probably will not help if we have a loop?

It still works unless that value has a cyclic dependency through phi-node.

That's exactly what I had in mind, a phi node as the only way to get a cyclic dependency in SSA.

I tend to say this is LGTM. Although I wish to see a test with a cyclic dependency.

Nov 16 2020, 9:21 PM · Restricted Project
hliao added a comment to D91513: [DeadMachineInstrctionElim] Post order visit all blocks and Iteratively run DeadMachineInstructionElim pass until nothing dead.

could you elaborate more on why we need to run that iteratively? since the original one runs bottom-up, supposedly it should find all.

From iteratively-run-dead-mi-elim.mir we can see that bb.5 defines %6, and %6 is used in bb.2. When we traverse all the basic blocks bottom-up, we meet bb.5 first. For %6, we find that it is not dead because %3 in bb.2 uses it, so %6 survives. Then we continue traversing the other BBs. When we meet bb.2, we see that no one uses %5, so we kill it, and likewise %4 and %3. At that point %6 has actually become dead, because we killed %3 and nothing uses %6 any longer.
However, because we only traverse the blocks once, we can't erase %6 at the end. So if we iteratively visit all blocks until nothing changes, then we can ensure that all dead MIs are erased.

Ah, I see. Even though we try to traverse basic blocks bottom-up, that's just the block placement order instead of block reachability. Could we replace that order with the post-order? So that the use is always traversed before the definition.

That probably will not help if we have a loop?

Nov 16 2020, 9:14 PM · Restricted Project
hliao added a comment to D91513: [DeadMachineInstrctionElim] Post order visit all blocks and Iteratively run DeadMachineInstructionElim pass until nothing dead.

could you elaborate more on why we need to run that iteratively? since the original one runs bottom-up, supposedly it should find all.

From iteratively-run-dead-mi-elim.mir we can see that bb.5 defines %6, and %6 is used in bb.2. When we traverse all the basic blocks bottom-up, we meet bb.5 first. For %6, we find that it is not dead because %3 in bb.2 uses it, so %6 survives. Then we continue traversing the other BBs. When we meet bb.2, we see that no one uses %5, so we kill it, and likewise %4 and %3. At that point %6 has actually become dead, because we killed %3 and nothing uses %6 any longer.
However, because we only traverse the blocks once, we can't erase %6 at the end. So if we iteratively visit all blocks until nothing changes, then we can ensure that all dead MIs are erased.
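
The scenario can be reproduced with a small toy model. This is a sketch, not the pass itself: ToyInst, sweepOnce, sweepToFixpoint, and makeExample are hypothetical names, and the chain %5 -> %4 -> %3 -> %6 is an assumption about how the registers in the test depend on each other.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model (hypothetical, not LLVM's MachineInstr): each instruction defines
// one virtual register and reads some others.
struct ToyInst {
  int def;                // register defined by this instruction
  std::vector<int> uses;  // registers read by this instruction
  bool pinned;            // true if it must be kept (e.g. has side effects)
  bool erased;            // true once the instruction has been removed
};

// One sweep over the instructions in the given visiting order. An instruction
// is erased when it is not pinned and no remaining instruction reads its def.
// Returns the number of instructions erased by this sweep.
int sweepOnce(std::vector<ToyInst> &insts,
              const std::vector<std::size_t> &order) {
  int erasedCount = 0;
  for (std::size_t idx : order) {
    assert(idx < insts.size());
    ToyInst &inst = insts[idx];
    if (inst.erased || inst.pinned)
      continue;
    bool used = false;
    for (const ToyInst &other : insts) {
      if (other.erased)
        continue;
      for (int reg : other.uses)
        if (reg == inst.def)
          used = true;
    }
    if (!used) {
      inst.erased = true;
      ++erasedCount;
    }
  }
  return erasedCount;
}

// Repeats sweeps until one erases nothing. Returns the number of sweeps run,
// including the final one that found nothing to erase.
int sweepToFixpoint(std::vector<ToyInst> &insts,
                    const std::vector<std::size_t> &order) {
  int sweeps = 0;
  while (true) {
    ++sweeps;
    if (sweepOnce(insts, order) == 0)
      break;
  }
  return sweeps;
}

// Assumed shape of the example: %6 (bb.5) feeds %3, %3 feeds %4, %4 feeds %5,
// and nothing uses %5.
std::vector<ToyInst> makeExample() {
  return {
      {6, {}, false, false},   // index 0: %6, defined in bb.5
      {3, {6}, false, false},  // index 1: %3, in bb.2
      {4, {3}, false, false},  // index 2: %4, in bb.2
      {5, {4}, false, false},  // index 3: %5, in bb.2
  };
}
```

Visiting in the order {0, 3, 2, 1} (bb.5 first, then bb.2 bottom-up), the first sweep erases %5, %4, and %3 but keeps %6, and only a second sweep erases %6. With the order {3, 2, 1, 0}, where every use is seen before its def, a single sweep erases all four.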

Nov 16 2020, 8:51 PM · Restricted Project
hliao added inline comments to D91121: [InferAddrSpace] Teach to handle assumed address space..
Nov 16 2020, 2:24 PM · Restricted Project, Restricted Project
hliao committed rGf375885ab86d: [InferAddrSpace] Teach to handle assumed address space. (authored by hliao).
[InferAddrSpace] Teach to handle assumed address space.
Nov 16 2020, 2:07 PM
hliao closed D91121: [InferAddrSpace] Teach to handle assumed address space..
Nov 16 2020, 2:06 PM · Restricted Project, Restricted Project
hliao added inline comments to D91121: [InferAddrSpace] Teach to handle assumed address space..
Nov 16 2020, 10:15 AM · Restricted Project, Restricted Project
hliao added a comment to D91121: [InferAddrSpace] Teach to handle assumed address space..

Kindly ping for review.

Nov 16 2020, 8:29 AM · Restricted Project, Restricted Project
hliao added a comment to D91513: [DeadMachineInstrctionElim] Post order visit all blocks and Iteratively run DeadMachineInstructionElim pass until nothing dead.

could you elaborate more on why we need to run that iteratively? since the original one runs bottom-up, supposedly it should find all.

Nov 16 2020, 6:47 AM · Restricted Project

Nov 13 2020

hliao added inline comments to D91121: [InferAddrSpace] Teach to handle assumed address space..
Nov 13 2020, 1:38 PM · Restricted Project, Restricted Project
hliao updated the diff for D91121: [InferAddrSpace] Teach to handle assumed address space..

Revise the interface of that target hook.
Add a dedicated test case for value reading from parameter even though most cases are already covered in the clang test.

Nov 13 2020, 1:31 PM · Restricted Project, Restricted Project
hliao added inline comments to D91121: [InferAddrSpace] Teach to handle assumed address space..
Nov 13 2020, 9:57 AM · Restricted Project, Restricted Project
hliao updated the diff for D91121: [InferAddrSpace] Teach to handle assumed address space..

Revise the condition check.

Nov 13 2020, 9:57 AM · Restricted Project, Restricted Project
hliao added inline comments to D91121: [InferAddrSpace] Teach to handle assumed address space..
Nov 13 2020, 8:11 AM · Restricted Project, Restricted Project
hliao updated the diff for D91121: [InferAddrSpace] Teach to handle assumed address space..
  • Add a note in the AMDGPU usage document on the assumption made here.
  • Revise the test in clang.
Nov 13 2020, 8:09 AM · Restricted Project, Restricted Project

Nov 12 2020

hliao committed rG8920ef06a138: [hip] Remove the coercion on aggregate kernel arguments. (authored by hliao).
[hip] Remove the coercion on aggregate kernel arguments.
Nov 12 2020, 6:20 PM
hliao closed D89980: [hip] Remove the coercion on aggregate kernel arguments..
Nov 12 2020, 6:19 PM · Restricted Project
hliao added inline comments to D89980: [hip] Remove the coercion on aggregate kernel arguments..
Nov 12 2020, 1:55 PM · Restricted Project
hliao updated the diff for D89980: [hip] Remove the coercion on aggregate kernel arguments..

Add a test case for the single element struct.

Nov 12 2020, 1:54 PM · Restricted Project
hliao added inline comments to D91121: [InferAddrSpace] Teach to handle assumed address space..
Nov 12 2020, 12:20 PM · Restricted Project, Restricted Project
hliao added inline comments to D91121: [InferAddrSpace] Teach to handle assumed address space..
Nov 12 2020, 12:18 PM · Restricted Project, Restricted Project
hliao added inline comments to D91121: [InferAddrSpace] Teach to handle assumed address space..
Nov 12 2020, 12:15 PM · Restricted Project, Restricted Project

Nov 11 2020

hliao added a comment to D89980: [hip] Remove the coercion on aggregate kernel arguments..

PING for review

Nov 11 2020, 1:54 PM · Restricted Project
hliao added a comment to D91121: [InferAddrSpace] Teach to handle assumed address space..

PING for review

Nov 11 2020, 1:54 PM · Restricted Project, Restricted Project

Nov 10 2020

hliao added a reviewer for D89980: [hip] Remove the coercion on aggregate kernel arguments.: msearles.
Nov 10 2020, 9:42 PM · Restricted Project
hliao updated the diff for D91121: [InferAddrSpace] Teach to handle assumed address space..

Rebase

Nov 10 2020, 6:17 PM · Restricted Project, Restricted Project
hliao updated the diff for D91121: [InferAddrSpace] Teach to handle assumed address space..

Fix clang-tidy warnings.

Nov 10 2020, 12:26 PM · Restricted Project, Restricted Project
hliao updated the diff for D91121: [InferAddrSpace] Teach to handle assumed address space..

Revise the fix.

Nov 10 2020, 8:58 AM · Restricted Project, Restricted Project
hliao retitled D89980: [hip] Remove the coercion on aggregate kernel arguments. from [hip] Remove kernel argument coercion. to [hip] Remove the coercion on aggregate kernel arguments..
Nov 10 2020, 7:24 AM · Restricted Project
hliao updated the diff for D89980: [hip] Remove the coercion on aggregate kernel arguments..

Revise the commit message.

Nov 10 2020, 7:23 AM · Restricted Project
hliao updated the diff for D89980: [hip] Remove the coercion on aggregate kernel arguments..

Remove aggregate kernel argument coercion only.

Nov 10 2020, 7:11 AM · Restricted Project
hliao abandoned D89900: [amdgpu] Enhance disjoint memory accesses checking..

Now that multiple MMOs are supported in the scheduler, this patch is no longer needed for performance.

Nov 10 2020, 7:10 AM · Restricted Project

Nov 9 2020

hliao requested review of D91121: [InferAddrSpace] Teach to handle assumed address space..
Nov 9 2020, 9:30 PM · Restricted Project, Restricted Project

Nov 8 2020

hliao committed rGfa5d31f82569: [GlobalsAA] Teach to handle `addrspacecast`. (authored by hliao).
[GlobalsAA] Teach to handle `addrspacecast`.
Nov 8 2020, 9:05 PM

Nov 5 2020

hliao committed rG23c6d1501d80: [amdgpu] Add `llvm.amdgcn.endpgm` support. (authored by hliao).
[amdgpu] Add `llvm.amdgcn.endpgm` support.
Nov 5 2020, 4:07 PM
hliao closed D90809: [amdgpu] Add `llvm.amdgcn.endpgm` support..
Nov 5 2020, 4:07 PM · Restricted Project, Restricted Project