Page MenuHomePhabricator

hliao (Michael Liao)
User

Projects

User does not belong to any projects.

User Details

User Since
Aug 7 2014, 12:01 PM (320 w, 1 d)

Recent Activity

Yesterday

hliao updated the diff for D88303: [clang][codegen] Remove the insertion of `correctly-rounded-divide-sqrt-fp-math` fn-attr..

Remove the irrelevant change on .clang-format.

Fri, Sep 25, 7:38 AM · Restricted Project
hliao requested review of D88303: [clang][codegen] Remove the insertion of `correctly-rounded-divide-sqrt-fp-math` fn-attr..
Fri, Sep 25, 7:32 AM · Restricted Project

Tue, Sep 22

hliao committed rGd4e3e1e54879: Fix build due to renaming in LoopInfo. (authored by hliao).
Fix build due to renaming in LoopInfo.
Tue, Sep 22, 2:34 PM
hliao committed rG534f6e171808: [PeepholeOptimizer] Enhance the redundant COPY elimination. (authored by hliao).
[PeepholeOptimizer] Enhance the redundant COPY elimination.
Tue, Sep 22, 7:12 AM
hliao closed D87939: [PeepholeOptimizer] Enhance the redundant COPY elimination..
Tue, Sep 22, 7:12 AM · Restricted Project
hliao added inline comments to D87939: [PeepholeOptimizer] Enhance the redundant COPY elimination..
Tue, Sep 22, 7:02 AM · Restricted Project

Mon, Sep 21

hliao added a comment to D87939: [PeepholeOptimizer] Enhance the redundant COPY elimination..

The comment seems outdated, if my understanding is right, and even the original code cannot perform that change since, once the 2nd COPY with same source is found in L1407, the check @ L1419 just skips that earlier as the 1st COPY has no subreg and the 2nd COPY has sub1.

Good point!

Now, I am wondering why is this change not just NFC then?

Mon, Sep 21, 12:04 PM · Restricted Project
hliao added a comment to D87939: [PeepholeOptimizer] Enhance the redundant COPY elimination..

Hi @hliao,

I must be missing something, but it feels to me that this patch is actually making the situation worse.

Could you look at my example inlined below and explain how it would still work with this patch?

Cheers,
-Quentin

Mon, Sep 21, 11:49 AM · Restricted Project
hliao added a comment to D87972: [OldPM] Pass manager: run SROA after (simple) loop unrolling.

I have tested this patch internally and seen gains and losses. On one document search related benchmark 3~5% improvement. One zippy (snappy) there is 3~5% regression. Perhaps we do need a conditional extra SROA run.

Snappy - you mean public https://github.com/google/snappy?

Well, it should be possible to analyze it...

@lebedev.ri any perf data from testsuite/rawspeed?

I did look.


This suggests that geomean is -0.8% runtime improvement,
with ups&downs.

But as i have said in the patch's description, i stumbled into this when writing new code, where the effect is much larger.

Mon, Sep 21, 7:44 AM · Restricted Project, Restricted Project
hliao updated the diff for D87939: [PeepholeOptimizer] Enhance the redundant COPY elimination..

Rebase.

Mon, Sep 21, 6:32 AM · Restricted Project

Fri, Sep 18

hliao added inline comments to D87939: [PeepholeOptimizer] Enhance the redundant COPY elimination..
Fri, Sep 18, 1:34 PM · Restricted Project
hliao added a comment to D87939: [PeepholeOptimizer] Enhance the redundant COPY elimination..

This patch enhances the peephole-opt to fix the redundant copy issues once to be fixed in D87556. With the enhancement, we could remove that redundant COPY locally. Test cases are revised due to the code quality improvement or change. Fortunately, AMDGPU and ARM tests need addressing that difference.

Fri, Sep 18, 1:29 PM · Restricted Project
hliao requested review of D87939: [PeepholeOptimizer] Enhance the redundant COPY elimination..
Fri, Sep 18, 1:25 PM · Restricted Project
hliao added a comment to rGc3492a1aa1b9: [amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel..

Can you revert this? I think that this transform itself is a workaround, and even if it were a good idea, I think it doesn't belong in another loop over the function in finalizeLowering

Could you elaborate on why that would be a workaround? Basically, after instruction selection, the COPY from SGPR to VGPR should be lowered to a native instruction.

Because this should be done after register allocation like is already done. Replacing a copy with something else should only interfere with generic optimizations

That would be too late for MachineCSE and other optimization to remove the redundant COPYs and reduce the register usage. Moving that after RA won't reduce register pressure.

So you are working around something MahcineCSE isn't doing on copies. You could just have MachineCSE do this for these copies

MachineCSE and other optimizations are designed not to handle that target-independent COPY or like, which is added with the intention that, potentially, the source and destination operands are coalesced and that COPY is removed finally. As SGPR and VGPR are different register banks and won't be coalesced anyway, native instruction should be used instead.

This does not make sense, COPY is what generic optimizations do understand. If it's useful to CSE cross bank copies, MachineCSE should handle them. As a representational choice, not-copy is worse than copy

A target-independent cross-bank COPY is definitely useful but the current COPY should not be used for that purpose considering how it's used in RA-related passes, especially only architectural constraints are changed between the source and destination operands where the propagation should be stopped.

This is exactly what copy is for. It wouldn't make sense to have a separate copy for this

Fri, Sep 18, 12:04 PM
hliao added a comment to rGc3492a1aa1b9: [amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel..

Can you revert this? I think that this transform itself is a workaround, and even if it were a good idea, I think it doesn't belong in another loop over the function in finalizeLowering

Could you elaborate on why that would be a workaround? Basically, after instruction selection, the COPY from SGPR to VGPR should be lowered to a native instruction.

Because this should be done after register allocation like is already done. Replacing a copy with something else should only interfere with generic optimizations

That would be too late for MachineCSE and other optimization to remove the redundant COPYs and reduce the register usage. Moving that after RA won't reduce register pressure.

So you are working around something MahcineCSE isn't doing on copies. You could just have MachineCSE do this for these copies

MachineCSE and other optimizations are designed not to handle that target-independent COPY or like, which is added with the intention that, potentially, the source and destination operands are coalesced and that COPY is removed finally. As SGPR and VGPR are different register banks and won't be coalesced anyway, native instruction should be used instead.

This does not make sense, COPY is what generic optimizations do understand. If it's useful to CSE cross bank copies, MachineCSE should handle them. As a representational choice, not-copy is worse than copy

Fri, Sep 18, 8:23 AM

Thu, Sep 17

hliao updated the diff for D87858: [hip] Add HIP scope atomic ops..

Revise formatting following the clang-format suggestion.

Thu, Sep 17, 2:37 PM · Restricted Project
hliao added a comment to rGc3492a1aa1b9: [amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel..

Can you revert this? I think that this transform itself is a workaround, and even if it were a good idea, I think it doesn't belong in another loop over the function in finalizeLowering

Could you elaborate on why that would be a workaround? Basically, after instruction selection, the COPY from SGPR to VGPR should be lowered to a native instruction.

Because this should be done after register allocation like is already done. Replacing a copy with something else should only interfere with generic optimizations

That would be too late for MachineCSE and other optimization to remove the redundant COPYs and reduce the register usage. Moving that after RA won't reduce register pressure.

So you are working around something MahcineCSE isn't doing on copies. You could just have MachineCSE do this for these copies

Thu, Sep 17, 2:32 PM
hliao added a comment to D87858: [hip] Add HIP scope atomic ops..

singlethread scope is added for completeness for the atomicity considering signal handling. It may not be applicable to GPU yet. We could simply ignore them in the late phases of the compiler. If desired, I may remove it as well.

Thu, Sep 17, 2:04 PM · Restricted Project
hliao requested review of D87858: [hip] Add HIP scope atomic ops..
Thu, Sep 17, 1:59 PM · Restricted Project
hliao added a comment to rGc3492a1aa1b9: [amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel..

Can you revert this? I think that this transform itself is a workaround, and even if it were a good idea, I think it doesn't belong in another loop over the function in finalizeLowering

Could you elaborate on why that would be a workaround? Basically, after instruction selection, the COPY from SGPR to VGPR should be lowered to a native instruction.

Because this should be done after register allocation like is already done. Replacing a copy with something else should only interfere with generic optimizations

Thu, Sep 17, 11:50 AM
hliao added a comment to rGc3492a1aa1b9: [amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel..

Can you revert this? I think that this transform itself is a workaround, and even if it were a good idea, I think it doesn't belong in another loop over the function in finalizeLowering

Thu, Sep 17, 11:45 AM
hliao committed rGc3492a1aa1b9: [amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel. (authored by hliao).
[amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel.
Thu, Sep 17, 8:04 AM
hliao closed D87556: [amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel..
Thu, Sep 17, 8:04 AM · Restricted Project

Wed, Sep 16

hliao committed rG4e4c89b22c3f: [EarlyCSE] Simplify max/min pattern matching. NFC. (authored by hliao).
[EarlyCSE] Simplify max/min pattern matching. NFC.
Wed, Sep 16, 3:35 PM
hliao committed rG6859d95ea2d0: Fix build. (authored by hliao).
Fix build.
Wed, Sep 16, 11:52 AM
hliao committed rG4d4f0922837d: [clang][codegen] Skip adding default function attributes on intrinsics. (authored by hliao).
[clang][codegen] Skip adding default function attributes on intrinsics.
Wed, Sep 16, 11:10 AM
hliao closed D87761: [clang][codegen] Skip adding default function attributes on intrinsics..
Wed, Sep 16, 11:10 AM · Restricted Project
hliao requested review of D87761: [clang][codegen] Skip adding default function attributes on intrinsics..
Wed, Sep 16, 7:26 AM · Restricted Project

Tue, Sep 15

hliao accepted D87709: InferAddressSpaces: Fix assert with unreachable code.

LGTM

Tue, Sep 15, 11:41 AM · Restricted Project

Mon, Sep 14

hliao updated the diff for D87556: [amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel..

Follow suggestions from clang-tidy.

Mon, Sep 14, 8:34 AM · Restricted Project

Fri, Sep 11

hliao added inline comments to D87556: [amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel..
Fri, Sep 11, 10:37 PM · Restricted Project
hliao added a comment to D87556: [amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel..

With this patch, we reduce the register pressure for certain tests which could be improved up to 10%. As the code generated would be visibly different from the previous one, regression tests are updated as well.

Fri, Sep 11, 10:08 PM · Restricted Project
hliao requested review of D87556: [amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel..
Fri, Sep 11, 10:06 PM · Restricted Project

Thu, Sep 10

hliao committed rGf787fe15d8e1: [EarlyCSE] Remove unnecessary operand swap. (authored by hliao).
[EarlyCSE] Remove unnecessary operand swap.
Thu, Sep 10, 11:14 PM
hliao added a comment to D86843: [EarlyCSE] Equivalent SELECTs should hash equally.

Basically, the logic is wrong as we only canonicalize the condition and operands in the select. But, we also need to inverse the flavor to match the original logic. The original change plus inversing flavor is committed in 41e68f7ee7b

Thu, Sep 10, 8:31 PM · Restricted Project
hliao committed rG41e68f7ee7b3: [EarlyCSE] Fix and recommit the revised c9826829d74e637163fdb0351870b8204e62d6e6 (authored by hliao).
[EarlyCSE] Fix and recommit the revised c9826829d74e637163fdb0351870b8204e62d6e6
Thu, Sep 10, 8:31 PM
hliao added a reverting change for rGc9826829d74e: [EarlyCSE] Equivalent SELECTs should hash equally: rG39dc75f66c60: Revert "[EarlyCSE] Equivalent SELECTs should hash equally".
Thu, Sep 10, 7:38 PM
hliao committed rG39dc75f66c60: Revert "[EarlyCSE] Equivalent SELECTs should hash equally" (authored by hliao).
Revert "[EarlyCSE] Equivalent SELECTs should hash equally"
Thu, Sep 10, 7:38 PM
hliao added a reverting change for D86843: [EarlyCSE] Equivalent SELECTs should hash equally: rG39dc75f66c60: Revert "[EarlyCSE] Equivalent SELECTs should hash equally".
Thu, Sep 10, 7:38 PM · Restricted Project
hliao committed rGb22d45049682: Remove dependency on clangASTMatchers. (authored by hliao).
Remove dependency on clangASTMatchers.
Thu, Sep 10, 7:18 PM

Thu, Sep 3

hliao committed rGbf41c4d29e44: [codegen] Ensure target flags are cleared/set properly. NFC. (authored by hliao).
[codegen] Ensure target flags are cleared/set properly. NFC.
Thu, Sep 3, 3:38 PM
hliao closed D87109: [codegen] Ensure target flags are cleared/set properly. NFC..
Thu, Sep 3, 3:38 PM · Restricted Project
hliao added a comment to D87109: [codegen] Ensure target flags are cleared/set properly. NFC..

Testcase? Or cleanup a few places where I've worked around this. I assume you found another

Thu, Sep 3, 1:30 PM · Restricted Project
hliao updated the diff for D87109: [codegen] Ensure target flags are cleared/set properly. NFC..

Clean-up.

Thu, Sep 3, 1:29 PM · Restricted Project
hliao added a reviewer for D87109: [codegen] Ensure target flags are cleared/set properly. NFC.: arsenm.
Thu, Sep 3, 12:30 PM · Restricted Project
hliao requested review of D87109: [codegen] Ensure target flags are cleared/set properly. NFC..
Thu, Sep 3, 12:28 PM · Restricted Project

Tue, Sep 1

hliao committed rG1f4e7463b5e3: [amdgpu] Run SROA after loop unrolling. (authored by hliao).
[amdgpu] Run SROA after loop unrolling.
Tue, Sep 1, 1:10 PM
hliao closed D84252: [amdgpu] Run SROA after loop unrolling..
Tue, Sep 1, 1:10 PM · Restricted Project
hliao added a comment to D84252: [amdgpu] Run SROA after loop unrolling..
Tue, Sep 1, 12:40 PM · Restricted Project
hliao updated the diff for D84252: [amdgpu] Run SROA after loop unrolling..

revise the test case

Tue, Sep 1, 12:40 PM · Restricted Project
hliao updated the diff for D84252: [amdgpu] Run SROA after loop unrolling..

clean up.

Tue, Sep 1, 12:18 PM · Restricted Project
hliao updated the diff for D84252: [amdgpu] Run SROA after loop unrolling..

Add a simple enough test case.

Tue, Sep 1, 12:12 PM · Restricted Project

Aug 20 2020

hliao committed rG5257a60ee02e: [amdgpu] Add codegen support for HIP dynamic shared memory. (authored by hliao).
[amdgpu] Add codegen support for HIP dynamic shared memory.
Aug 20 2020, 6:30 PM
hliao closed D82496: [amdgpu] Add codegen support for HIP dynamic shared memory..
Aug 20 2020, 6:29 PM · Restricted Project
hliao updated the diff for D82496: [amdgpu] Add codegen support for HIP dynamic shared memory..

Add comment and revist the test case.

Aug 20 2020, 5:36 PM · Restricted Project
hliao added inline comments to D82496: [amdgpu] Add codegen support for HIP dynamic shared memory..
Aug 20 2020, 1:43 PM · Restricted Project
hliao updated the diff for D82496: [amdgpu] Add codegen support for HIP dynamic shared memory..

Check the explicit alignment if any and fix the negative test case.

Aug 20 2020, 1:41 PM · Restricted Project
hliao updated the diff for D82496: [amdgpu] Add codegen support for HIP dynamic shared memory..

Rebase.

Aug 20 2020, 7:48 AM · Restricted Project

Aug 19 2020

hliao added inline comments to D82496: [amdgpu] Add codegen support for HIP dynamic shared memory..
Aug 19 2020, 9:54 AM · Restricted Project
hliao added inline comments to D82496: [amdgpu] Add codegen support for HIP dynamic shared memory..
Aug 19 2020, 9:50 AM · Restricted Project
hliao updated the diff for D82496: [amdgpu] Add codegen support for HIP dynamic shared memory..

Add the invalid case for alignment parsing.

Aug 19 2020, 9:50 AM · Restricted Project
hliao added inline comments to D82496: [amdgpu] Add codegen support for HIP dynamic shared memory..
Aug 19 2020, 9:39 AM · Restricted Project
hliao added inline comments to D82496: [amdgpu] Add codegen support for HIP dynamic shared memory..
Aug 19 2020, 9:14 AM · Restricted Project
hliao added inline comments to D82496: [amdgpu] Add codegen support for HIP dynamic shared memory..
Aug 19 2020, 9:10 AM · Restricted Project
hliao updated the diff for D82496: [amdgpu] Add codegen support for HIP dynamic shared memory..

Minor coding style fix.

Aug 19 2020, 7:05 AM · Restricted Project

Aug 18 2020

hliao updated the diff for D82496: [amdgpu] Add codegen support for HIP dynamic shared memory..

Rebase

Aug 18 2020, 10:45 PM · Restricted Project
hliao updated the diff for D82496: [amdgpu] Add codegen support for HIP dynamic shared memory..

Revise following comments.

Aug 18 2020, 8:10 PM · Restricted Project

Aug 17 2020

hliao updated the diff for D82496: [amdgpu] Add codegen support for HIP dynamic shared memory..

Remove adjustLDSSizeForDynLDSAlign.

Aug 17 2020, 9:38 PM · Restricted Project
hliao added inline comments to D82496: [amdgpu] Add codegen support for HIP dynamic shared memory..
Aug 17 2020, 2:40 PM · Restricted Project
hliao added inline comments to D82496: [amdgpu] Add codegen support for HIP dynamic shared memory..
Aug 17 2020, 1:48 PM · Restricted Project
hliao updated the diff for D82496: [amdgpu] Add codegen support for HIP dynamic shared memory..

Add preFinalizeLowering so that both DAGISel and GISel shares the same path
to adjust LDS size.

Aug 17 2020, 1:46 PM · Restricted Project

Aug 10 2020

hliao added inline comments to D82496: [amdgpu] Add codegen support for HIP dynamic shared memory..
Aug 10 2020, 1:05 PM · Restricted Project
hliao added inline comments to D85276: [PGO][CUDA][HIP] Skip generating profile on the device stub and wrong-side functions..
Aug 10 2020, 8:03 AM · Restricted Project
hliao committed rGc7b683c126b8: [PGO][CUDA][HIP] Skip generating profile on the device stub and wrong-side… (authored by hliao).
[PGO][CUDA][HIP] Skip generating profile on the device stub and wrong-side…
Aug 10 2020, 8:02 AM
hliao closed D85276: [PGO][CUDA][HIP] Skip generating profile on the device stub and wrong-side functions..
Aug 10 2020, 8:02 AM · Restricted Project

Aug 9 2020

hliao added inline comments to D82496: [amdgpu] Add codegen support for HIP dynamic shared memory..
Aug 9 2020, 11:15 PM · Restricted Project
hliao updated the diff for D82496: [amdgpu] Add codegen support for HIP dynamic shared memory..

Add GlobalISel and MIR support.

Aug 9 2020, 11:11 PM · Restricted Project

Aug 7 2020

hliao updated the diff for D82496: [amdgpu] Add codegen support for HIP dynamic shared memory..

Address the alignment issue.

Aug 7 2020, 1:05 AM · Restricted Project

Aug 6 2020

hliao added inline comments to D85276: [PGO][CUDA][HIP] Skip generating profile on the device stub and wrong-side functions..
Aug 6 2020, 10:38 AM · Restricted Project
hliao updated the diff for D85276: [PGO][CUDA][HIP] Skip generating profile on the device stub and wrong-side functions..

Revise the comment.

Aug 6 2020, 10:35 AM · Restricted Project
hliao added a comment to D85276: [PGO][CUDA][HIP] Skip generating profile on the device stub and wrong-side functions..
In D85276#2200108, @tra wrote:

Do we need to disable pgo and coverage mapping for device compilation? Or it is already disabled?

We already disable profiling during device compilation for NVIDIA and AMD GPUs:
https://github.com/llvm/llvm-project/blob/394db2259575ef3cac8d3d37836b11eb2373c435/clang/lib/Driver/ToolChains/Clang.cpp#L4876

Aug 6 2020, 9:41 AM · Restricted Project

Aug 5 2020

hliao added inline comments to D85276: [PGO][CUDA][HIP] Skip generating profile on the device stub and wrong-side functions..
Aug 5 2020, 10:26 AM · Restricted Project
hliao requested review of D85276: [PGO][CUDA][HIP] Skip generating profile on the device stub and wrong-side functions..
Aug 5 2020, 12:00 AM · Restricted Project

Jul 28 2020

hliao accepted D84523: AMDGPU: Serialize MFI spill fields.

LGTM

Jul 28 2020, 2:14 PM · Restricted Project
hliao added a comment to D80858: [CUDA][HIP] Support accessing static device variable in host code for -fno-gpu-rdc.
In D80858#2177159, @tra wrote:

It's a good point. Perhaps this is one of the cases where we should *not* follow nvcc.
We can't have our cake (preserve static behavior) and eat it (treat it as non-static in case something on the host side may decide to use an API which uses symbol names). Something's got to give. While we could make it work in some cases, I don't think we can make it work consistently.
I think it would be reasonable to restrict APIs that access symbols by name to be applicable to visible symbols only.

Jul 28 2020, 12:29 AM · Restricted Project
hliao added a comment to D80858: [CUDA][HIP] Support accessing static device variable in host code for -fno-gpu-rdc.

We may try to solve the issue without RDC firstly, where we don't need to change that static variable name (if the runtime maintains the device binaries correctly.) We only need to ensure the linker won't remove their symbols.

This is essentially externalizing it in linker. However, it may not work.

Consider a static device var was accessed in host code through hipMemCpyToSymbol or hipMemCpyFromSymbol. In device code, this static var may be initialized through host code, or its final value may be read by host code after kernel execution. The existence of these operations mean that this static device variable is actually having external linkage, instead of internal linkage, since it is accessed by external modules. Fail to reflect this truth in IR will cause optimization passes to make incorrect assumptions about this variable and perform incorrect optimizations on it. e.g. the llvm passes can assume the value in this static var never changes, or its final value will not be used, then the llvm passes may simply remove it. Marking it as used will not solve the issue, since the llvm passes may still assume its value never changes after initialization, whereas in reality it may be changed by hipMemCpyToSymbol before kernel execution.

Jul 28 2020, 12:28 AM · Restricted Project

Jul 23 2020

hliao added a comment to D80858: [CUDA][HIP] Support accessing static device variable in host code for -fno-gpu-rdc.
In D80858#2170821, @tra wrote:

The problem is not whether we have solution to tell them but when we need to add that. Not all static device variables need to be visible to the host side. Externalizing them adds the overhead for the linker and may pose additional restrictions on aggressive optimizations. Do we have to support every ambiguous usage in the burden of the compiler change?

It's a multi-part problem.
The key request here is, IIUIC, to make TU-local variables visible within the TU on both host and the device.
If and how it should be implemented is up for the discusstion.
For me the request is reasonable and it would be good to have it as it would make the code behave according to general expectations of how TU-local variables behave -- "if it's in the same source file, it should be accessible from the functions in that file".

Jul 23 2020, 8:58 PM · Restricted Project
hliao added a comment to D80858: [CUDA][HIP] Support accessing static device variable in host code for -fno-gpu-rdc.
In D80858#2170668, @tra wrote:
In D80858#2170547, @tra wrote:

Would it work if we generate a globally unique visible aliases for the static vars and use the alias' name to register device-side entities without changing their visibility?

We still need to define how a static device variable should be visible on the host side.

As far as the host is concerned, if a 'static' variable is within the same TU it (or, rather, its shadow) should be accessible.

How that behaves in the context of relocatable code generation as well as anonymous namespaces.

AFAICT it would just work -- each TU only sees static things within that TU on both host and device sides. Visible module-unique aliases would allow host-side binary to get address of an otherwise non-visible variables. I expect that will work with RDC -- because we didn't change the visibility of the symbols themselves, they behave according to the regular linking rules. The aliases are also globally unique, so they should present no issues either.

Jul 23 2020, 2:35 PM · Restricted Project
hliao added a comment to D80858: [CUDA][HIP] Support accessing static device variable in host code for -fno-gpu-rdc.
In D80858#2170547, @tra wrote:

Would it work if we generate a globally unique visible aliases for the static vars and use the alias' name to register device-side entities without changing their visibility?

Jul 23 2020, 1:30 PM · Restricted Project
hliao added a comment to D80858: [CUDA][HIP] Support accessing static device variable in host code for -fno-gpu-rdc.

Another reason is that we need to support it in rdc mode, where different TU can have static var with the same name.

That's an issue of our current RDC support through LLVM IR instead of native code. The name conflicts are introduced as we link all TUs into a single module at IR level. The frontend should not be changed to support that.

I am not sure if ISA level linking would help for this. My understanding is that the linker will rename the internal symbols having the same name from different objects. It does not matter if it is llvm-link or native linker. Once they are renamed we can no longer find them by name.

I don't see a reason why we cannot support accessing static device variable in FE. We have requests for this feature from different users.

Jul 23 2020, 1:27 PM · Restricted Project
hliao added a comment to D80858: [CUDA][HIP] Support accessing static device variable in host code for -fno-gpu-rdc.

Another reason is that we need to support it in rdc mode, where different TU can have static var with the same name.

Jul 23 2020, 11:50 AM · Restricted Project
hliao added a comment to D80858: [CUDA][HIP] Support accessing static device variable in host code for -fno-gpu-rdc.

I don't that's proper way to support file-scope static device variables. As we discuss APIs like cudaMemcpyToSymol, that's a runtime API instead of driver API. The later needs to specify the module (or code object) id in addition to a symbol name and won't have the conflict issues. For the runtime API, all named device variables (static or not) are identified at the host side by their host stub variables. That stub variable is used to register the corresponding device variables associated with a module id to unique identify that variables across all TUs. As long as we look up device variables using their host stub variable pointers, they are uniquely identified already. The runtime implementation needs to find the module id and the variable symbol from the pointer of its host stub variable. It's not the compiler job to fabricate name uniquely across TUs.

The problem is that even though the static variable is registered through __hipRigisterVariable, the runtime still relies on looking up symbol name to get the address of the device variable. That's why we need to externalize the static variable.

Jul 23 2020, 11:47 AM · Restricted Project
hliao requested changes to D80858: [CUDA][HIP] Support accessing static device variable in host code for -fno-gpu-rdc.

I don't that's proper way to support file-scope static device variables. As we discuss APIs like cudaMemcpyToSymol, that's a runtime API instead of driver API. The later needs to specify the module (or code object) id in addition to a symbol name and won't have the conflict issues. For the runtime API, all named device variables (static or not) are identified at the host side by their host stub variables. That stub variable is used to register the corresponding device variables associated with a module id to unique identify that variables across all TUs. As long as we look up device variables using their host stub variable pointers, they are uniquely identified already. The runtime implementation needs to find the module id and the variable symbol from the pointer of its host stub variable. It's not the compiler job to fabricate name uniquely across TUs.

Jul 23 2020, 7:09 AM · Restricted Project

Jul 22 2020

hliao added a comment to D71227: [cuda][hip] Fix function overload resolution in the global initiailizer..

Hi @rsmith, @rjmccall and @tra what's your suggestion to make progress on this review?

Jul 22 2020, 1:45 PM · Restricted Project

Jul 21 2020

hliao added a comment to D84252: [amdgpu] Run SROA after loop unrolling..

I would expect the generic code to add this

Jul 21 2020, 2:19 PM · Restricted Project
Herald added a project to D84252: [amdgpu] Run SROA after loop unrolling.: Restricted Project.
Jul 21 2020, 10:55 AM · Restricted Project

Jul 20 2020

hliao accepted D84081: AMDGPU: Fix not accounting for constantexpr uses of LDS globals.

LGTM as a quick fix.

Jul 20 2020, 8:15 AM · Restricted Project

Jul 11 2020

hliao committed rGb8409c03ed90: Fix `-Wreturn-type` warning. NFC. (authored by hliao).
Fix `-Wreturn-type` warning. NFC.
Jul 11 2020, 1:21 PM
hliao committed rG81db614411bd: Fix `-Wunused-variable` warnings. NFC. (authored by hliao).
Fix `-Wunused-variable` warnings. NFC.
Jul 11 2020, 7:10 AM
hliao committed rG0b4cf802fad4: [fix-irreducible] Skip unreachable predecessors. (authored by hliao).
[fix-irreducible] Skip unreachable predecessors.
Jul 11 2020, 7:10 AM