Page MenuHomePhabricator

hliao (Michael Liao)
User

Projects

User does not belong to any projects.

User Details

User Since
Aug 7 2014, 12:01 PM (324 w, 5 d)

Recent Activity

Today

hliao added a comment to D89447: [MachineInstr] Add support for instructions with multiple memory operands..

PING for review

Tue, Oct 27, 12:05 PM · Restricted Project
hliao added a comment to D89980: [hip] Remove kernel argument coercion..

Besides the unpromotable alloca issue due to indirect accesses, such coercion to GLOBAL pointer directly is not safe as, in HIP/CUDA, both CONSTANT and GLOBAL pointers would be passed as the kernel arguments. Without introducing a new address space combing GLOBAL/CONSTANT, such coercion would be unsafe.

Tue, Oct 27, 11:30 AM · Restricted Project
hliao added inline comments to D89980: [hip] Remove kernel argument coercion..
Tue, Oct 27, 11:17 AM · Restricted Project
hliao committed rG46c3d5cb05d6: [amdgpu] Add the late codegen preparation pass. (authored by hliao).
[amdgpu] Add the late codegen preparation pass.
Tue, Oct 27, 11:08 AM
hliao closed D80364: [amdgpu] Teach load widening to handle non-DWORD aligned loads..
Tue, Oct 27, 11:08 AM · Restricted Project
hliao added inline comments to D89980: [hip] Remove kernel argument coercion..
Tue, Oct 27, 9:33 AM · Restricted Project
hliao added inline comments to D89980: [hip] Remove kernel argument coercion..
Tue, Oct 27, 9:29 AM · Restricted Project
hliao updated the diff for D80364: [amdgpu] Teach load widening to handle non-DWORD aligned loads..

Fix coding style following clang-tidy.

Tue, Oct 27, 8:55 AM · Restricted Project
hliao updated the diff for D89980: [hip] Remove kernel argument coercion..

Add amdgpu-kernel-arg-pointer-type.cu back and revise its checks.

Tue, Oct 27, 8:49 AM · Restricted Project
hliao added inline comments to D89980: [hip] Remove kernel argument coercion..
Tue, Oct 27, 7:54 AM · Restricted Project
hliao updated the diff for D89980: [hip] Remove kernel argument coercion..

Revise the comment and point the safety issue by coercing the kernel argument
from a generic pointer to a global one.

Tue, Oct 27, 7:44 AM · Restricted Project
hliao added a comment to D89980: [hip] Remove kernel argument coercion..
In D89980#2348339, @tra wrote:

Are there any tests to illustrate what this change does to IR or generated code?

Tue, Oct 27, 7:41 AM · Restricted Project
hliao updated the diff for D89980: [hip] Remove kernel argument coercion..

Test case is enhanced to check that no kernel argument type is coerced.

Tue, Oct 27, 7:39 AM · Restricted Project
hliao committed rG0d092303b446: [amdgpu] Enable use of AA during codegen. (authored by hliao).
[amdgpu] Enable use of AA during codegen.
Tue, Oct 27, 6:46 AM
hliao closed D89320: [amdgpu] Enable use of AA during codegen..
Tue, Oct 27, 6:46 AM · Restricted Project

Yesterday

hliao added inline comments to D89447: [MachineInstr] Add support for instructions with multiple memory operands..
Mon, Oct 26, 3:01 PM · Restricted Project
hliao updated the diff for D89320: [amdgpu] Enable use of AA during codegen..

Fix more regression tests due to the enhanced AMDGPU AA.

Mon, Oct 26, 7:38 AM · Restricted Project
hliao added inline comments to D89447: [MachineInstr] Add support for instructions with multiple memory operands..
Mon, Oct 26, 7:14 AM · Restricted Project
hliao updated the diff for D89447: [MachineInstr] Add support for instructions with multiple memory operands..

Remove unordered check.

Mon, Oct 26, 7:09 AM · Restricted Project

Sat, Oct 24

hliao updated the diff for D89447: [MachineInstr] Add support for instructions with multiple memory operands..

Rebase

Sat, Oct 24, 8:01 PM · Restricted Project
hliao added a comment to D89447: [MachineInstr] Add support for instructions with multiple memory operands..

Has the SystremZ Prefetch issues been resolved?

Sat, Oct 24, 8:51 AM · Restricted Project

Fri, Oct 23

hliao committed rG9497e2e7d88f: Fix shared build. NFC. (authored by hliao).
Fix shared build. NFC.
Fri, Oct 23, 12:53 PM
hliao updated the diff for D89447: [MachineInstr] Add support for instructions with multiple memory operands..

Rebase

Fri, Oct 23, 8:27 AM · Restricted Project

Thu, Oct 22

hliao added a comment to D89447: [MachineInstr] Add support for instructions with multiple memory operands..

PING for review. As a similar patch was approved, shall I just commit it again with the compilation time issue is addressed?

Thu, Oct 22, 10:01 PM · Restricted Project
hliao added a reviewer for D89447: [MachineInstr] Add support for instructions with multiple memory operands.: dmgreen.
Thu, Oct 22, 1:11 PM · Restricted Project
hliao requested review of D89980: [hip] Remove kernel argument coercion..
Thu, Oct 22, 12:54 PM · Restricted Project
hliao added a comment to D89447: [MachineInstr] Add support for instructions with multiple memory operands..

Just kingly PING for review.

Thu, Oct 22, 8:08 AM · Restricted Project

Wed, Oct 21

hliao updated the diff for D89320: [amdgpu] Enable use of AA during codegen..

Add an option to turn on/off the use of AA during codegen.

Wed, Oct 21, 12:23 PM · Restricted Project
hliao added a comment to D80364: [amdgpu] Teach load widening to handle non-DWORD aligned loads..

This patch is still required as MMO's alignment is calculated based on the offset from the base alignment. As the base alignment is the alignment from the pointer in the IR, it cannot be modified. We need extra logic to re-align MMO operand if we widen the original one. For instance of a 16-bit load from ptr has an alignment of 2, if ptr is equivalent to base - 2 and base's alignment is 4, we could widen that 16-bit load to 32-bit load from ptr - 2with an alignment 4. But, as we cannot change IR in MMO, we need extra stuff to in the new MMO could assume that new alignment.

Wed, Oct 21, 12:03 PM · Restricted Project
hliao updated the diff for D80364: [amdgpu] Teach load widening to handle non-DWORD aligned loads..

Rebase and revise.

Wed, Oct 21, 11:53 AM · Restricted Project
hliao added reviewers for D89900: [amdgpu] Enhance disjoint memory accesses checking.: arsenm, rampitec.
Wed, Oct 21, 11:27 AM · Restricted Project
hliao requested review of D89900: [amdgpu] Enhance disjoint memory accesses checking..
Wed, Oct 21, 11:16 AM · Restricted Project
hliao committed rG1bcec29afb32: Only run when `arm` is registered. NFC. (authored by hliao).
Only run when `arm` is registered. NFC.
Wed, Oct 21, 6:30 AM

Tue, Oct 20

hliao added a reverting change for rG1ed506deaddb: [clang] Fix warnings on the missing of explicitly copy constructor on the base…: rGe7a69158635a: Revert "[clang] Fix warnings on the missing of explicitly copy constructor on….
Tue, Oct 20, 7:25 AM
hliao committed rGe7a69158635a: Revert "[clang] Fix warnings on the missing of explicitly copy constructor on… (authored by hliao).
Revert "[clang] Fix warnings on the missing of explicitly copy constructor on…
Tue, Oct 20, 7:25 AM
hliao added a comment to D89447: [MachineInstr] Add support for instructions with multiple memory operands..

PING for review.

Tue, Oct 20, 7:07 AM · Restricted Project
hliao committed rG1ed506deaddb: [clang] Fix warnings on the missing of explicitly copy constructor on the base… (authored by hliao).
[clang] Fix warnings on the missing of explicitly copy constructor on the base…
Tue, Oct 20, 7:07 AM
hliao committed rG2a0e4d1c01c9: [amdgpu] Enhance AMDGPU AA. (authored by hliao).
[amdgpu] Enhance AMDGPU AA.
Tue, Oct 20, 7:07 AM
hliao closed D89525: [amdgpu] Enhance AMDGPU AA..
Tue, Oct 20, 7:06 AM · Restricted Project

Sat, Oct 17

hliao added a comment to D89525: [amdgpu] Enhance AMDGPU AA..

I think they are correct for OpenCL, since in OpenCL shared var can only be declared in kernel function or passed by kernel arg.

However I am not sure whether a constant pointer can pointer to shared memory, i.e, whether the address of a shared variable is compile time constant, or whether the following is valid code:

__shared__ int a;

__constant__ int *b = &a;

Currently clang allows it but nvcc does not https://godbolt.org/z/9W8vee

I tends to agree with nvcc's treatment since this allows more flexible way of implementing shared variable supports in backend. @tra for advice

But you are not checking for a constant pointer here!

In HIP __constant__ is a variable attribute, not the address space of the pointee. __constant__ int * means a pointer itself in constant address space and pointing to generic/flat address space.

Where do you check for this specifically in this block:

} else if (const Argument *Arg = dyn_cast<Argument>(ObjA)) {
   const Function *F = Arg->getParent();
   switch (F->getCallingConv()) {
   case CallingConv::AMDGPU_KERNEL:
     // In the kernel function, kernel arguments won't alias to (local)
     // variables in shared or private address space.
     return NoAlias;

I was talking about semantic check in language. Here is the IR. In IR a kernel arg can pointing to constant or global addr due to promotion. Originally all kernel arg of HIP points to generic addr space only.

But not in OpenCL.

For OpenCL, since it won't allow generic pointer as kernel function arguments, there never be such a case, a generic pointer argument.

OK, that makes sense. Now we only need to make sure we never compile anything but HIP and OpenCL <= 2.0. For example that we will not support fortran maybe? And never extend any of the languages to allow different address space arguments.

My point here is that it is language specific, but there is nothing language specific on the AA implementation. A possible solution is to define some attributes or metadata saying that a certain situation may not ever happen and then check it in AA. That property has to ve set by the language though as AA may not know the source semantics.

Sat, Oct 17, 9:25 AM · Restricted Project
hliao added a comment to D89447: [MachineInstr] Add support for instructions with multiple memory operands..

I removed that check that one of those two MMOs needs to be store. No change to test now. But, that sounds a little weird that one instruction claims mayLoad and mayStore but only has isLoad MMO. Maybe we need to change that MMO in prefetch to be both isLoad and isStore.

The SystemZ Prefetch Data (PFD) instruction has a flag bit to control if the prefetch is read or write, so in a way those two flags make sense. However, since the PFD does not in fact clobber any memory I think the idea behind the mayLoad and mayStore flags is to keep the instruction in place in the block as much as possible. It shouldn't necessarily matter, but generally it is probably better spread out the memory accesses / prefetches rather than having them all happen at once, I would think. If the block is big with many prefetches they shouldn't all end up in the end of the block...

If you think that check is valuable, then maybe we could try to find another way to keep the PFD instructions in their places. I tried adding hasSideEffects = 1 on PFD/PFDRL instructions instead of removing the check on the MMOs, but that did not seem to be NFC on benchmarks...

Sat, Oct 17, 9:02 AM · Restricted Project

Fri, Oct 16

hliao added inline comments to D89525: [amdgpu] Enhance AMDGPU AA..
Fri, Oct 16, 9:03 PM · Restricted Project
hliao added a comment to D89525: [amdgpu] Enhance AMDGPU AA..

I think they are correct for OpenCL, since in OpenCL shared var can only be declared in kernel function or passed by kernel arg.

However I am not sure whether a constant pointer can pointer to shared memory, i.e, whether the address of a shared variable is compile time constant, or whether the following is valid code:

__shared__ int a;

__constant__ int *b = &a;

Currently clang allows it but nvcc does not https://godbolt.org/z/9W8vee

I tends to agree with nvcc's treatment since this allows more flexible way of implementing shared variable supports in backend. @tra for advice

But you are not checking for a constant pointer here!

In HIP __constant__ is a variable attribute, not the address space of the pointee. __constant__ int * means a pointer itself in constant address space and pointing to generic/flat address space.

Where do you check for this specifically in this block:

} else if (const Argument *Arg = dyn_cast<Argument>(ObjA)) {
   const Function *F = Arg->getParent();
   switch (F->getCallingConv()) {
   case CallingConv::AMDGPU_KERNEL:
     // In the kernel function, kernel arguments won't alias to (local)
     // variables in shared or private address space.
     return NoAlias;

I was talking about semantic check in language. Here is the IR. In IR a kernel arg can pointing to constant or global addr due to promotion. Originally all kernel arg of HIP points to generic addr space only.

But not in OpenCL.

Fri, Oct 16, 9:00 PM · Restricted Project
hliao added a comment to D89525: [amdgpu] Enhance AMDGPU AA..

@yaxunl could you double-check that OpenCL also follows that rule.
@nhaehnle could you check whether that potentially breaks graphics.

I think they are correct for OpenCL, since in OpenCL shared var can only be declared in kernel function or passed by kernel arg.

However I am not sure whether a constant pointer can pointer to shared memory, i.e, whether the address of a shared variable is compile time constant, or whether the following is valid code:

__shared__ int a;

__constant__ int *b = &a;

Currently clang allows it but nvcc does not https://godbolt.org/z/9W8vee

I tends to agree with nvcc's treatment since this allows more flexible way of implementing shared variable supports in backend. @tra for advice

Fri, Oct 16, 8:59 PM · Restricted Project
hliao added a comment to D89447: [MachineInstr] Add support for instructions with multiple memory operands..

@jonpa can you check whether the SystemZ test case you added still checks what it was intended to check here?

I actually see a problem with this patch: The SystemZ test case passes the original test by not producing any scalar load + vector element insertions instructions. However now I see that all PFD (prefetch) instructions end up last in the block, whereas before they were not. This may or not be good if the block is huge and the prefetching is therefore not done too many iterations ahead. Maybe this should be checked with benchmarks to make sure the current prefetching tuning does not loose by this.

The reason seems to be that the SystemZ::PFD instruction is marked both with mayLoad and mayStore, but now this patch looks at the *memory operand* and figures out that it is only loading and therefore does not alias with another load. This check was previously only done with the MI flags. The post-RA scheduler now puts all of them at the end.

Fri, Oct 16, 6:30 PM · Restricted Project
hliao updated the diff for D89447: [MachineInstr] Add support for instructions with multiple memory operands..

Remove the MMO non-store check.

Fri, Oct 16, 6:27 PM · Restricted Project
hliao added inline comments to D89525: [amdgpu] Enhance AMDGPU AA..
Fri, Oct 16, 1:18 PM · Restricted Project
hliao added a comment to D89525: [amdgpu] Enhance AMDGPU AA..

@yaxunl could you double-check that OpenCL also follows that rule.
@nhaehnle could you check whether that potentially breaks graphics.

FYI I've done some Vulkan CTS testing using LLPC with this patch and didn't notice any problems.

Fri, Oct 16, 12:42 PM · Restricted Project
hliao added inline comments to D89525: [amdgpu] Enhance AMDGPU AA..
Fri, Oct 16, 12:41 PM · Restricted Project
hliao committed rG98f254960f0c: [globalopt] Teach to look through `addrspacecast`. (authored by hliao).
[globalopt] Teach to look through `addrspacecast`.
Fri, Oct 16, 5:43 AM
hliao closed D89140: [globalopt] Teach to look through `addrspacecast`..
Fri, Oct 16, 5:43 AM · Restricted Project
hliao added inline comments to D89525: [amdgpu] Enhance AMDGPU AA..
Fri, Oct 16, 5:41 AM · Restricted Project
hliao updated the diff for D89525: [amdgpu] Enhance AMDGPU AA..

Fix typos and revise the coding style following clang-tidy.

Fri, Oct 16, 5:38 AM · Restricted Project

Thu, Oct 15

hliao added reviewers for D89140: [globalopt] Teach to look through `addrspacecast`.: greened, bkramer, nicholas, jmolloy.
Thu, Oct 15, 8:49 PM · Restricted Project
hliao added a comment to D89525: [amdgpu] Enhance AMDGPU AA..

@yaxunl could you double-check that OpenCL also follows that rule.
@nhaehnle could you check whether that potentially breaks graphics.

Thu, Oct 15, 8:00 PM · Restricted Project
hliao requested review of D89525: [amdgpu] Enhance AMDGPU AA..
Thu, Oct 15, 7:58 PM · Restricted Project
hliao added a comment to D89447: [MachineInstr] Add support for instructions with multiple memory operands..

Add limit on memory operand AA check.

Thu, Oct 15, 2:48 PM · Restricted Project
hliao updated the diff for D89447: [MachineInstr] Add support for instructions with multiple memory operands..

Add limit on memory operand AA check.

Thu, Oct 15, 2:47 PM · Restricted Project
hliao added a comment to D89447: [MachineInstr] Add support for instructions with multiple memory operands..

This looks like the same as my old patch (https://reviews.llvm.org/D80161). There are additional details available on the commit's Phab page. To put it in a nutshell, some codes can trigger a very large amount of calls to the aliasing check (see the repro provided by @nemanjai: https://pastebin.com/tRtTQdSa), which results in a very large increase in compilation time.

Bounding the number of checks may be a good solution, even though it would be nicer to have something more clever that could allow all the operands to be checked. Not sure how feasible this would be though.

Thu, Oct 15, 9:45 AM · Restricted Project
hliao added a comment to D89447: [MachineInstr] Add support for instructions with multiple memory operands..

This looks like the same as my old patch (https://reviews.llvm.org/D80161). There are additional details available on the commit's Phab page. To put it in a nutshell, some codes can trigger a very large amount of calls to the aliasing check (see the repro provided by @nemanjai: https://pastebin.com/tRtTQdSa), which results in a very large increase in compilation time.

Bounding the number of checks may be a good solution, even though it would be nicer to have something more clever that could allow all the operands to be checked. Not sure how feasible this would be though.

Thu, Oct 15, 5:54 AM · Restricted Project
hliao added a comment to D89447: [MachineInstr] Add support for instructions with multiple memory operands..

This looks familiar. It is the same as https://reviews.llvm.org/D80161? That patch was apparently reverted in 65cd2c7a8015577fea15c861f41d2e4b5768961f, because it was hitting timeouts on certain targets.

I'm not sure if @Kayjukh remembers more? But it may be worth putting a limit on the total number of alias checks it can perform.

The test changes seem fine to me.

Thu, Oct 15, 5:53 AM · Restricted Project

Wed, Oct 14

hliao added a comment to D89447: [MachineInstr] Add support for instructions with multiple memory operands..

That change triggers a few regression test failures. All of them are due to different code schedule due to memory instructions with multiple memory operands. Please help me double-check that's the case and whether that sounds a better code sequence. Thanks.

Wed, Oct 14, 10:13 PM · Restricted Project
hliao requested review of D89447: [MachineInstr] Add support for instructions with multiple memory operands..
Wed, Oct 14, 10:10 PM · Restricted Project
hliao committed rGae40d2858e20: Fix an apparent typo. `assert()` must not contain side-effects. NFC. (authored by hliao).
Fix an apparent typo. `assert()` must not contain side-effects. NFC.
Wed, Oct 14, 8:35 AM
hliao committed rGb21ad3b66bce: Fix `-Wparentheses` warnings. NFC. (authored by hliao).
Fix `-Wparentheses` warnings. NFC.
Wed, Oct 14, 7:12 AM

Tue, Oct 13

hliao added a comment to D89320: [amdgpu] Enable use of AA during codegen..

Register pressure tests have to disable AMDGPU AA to pass the test; otherwise, the register pressure is reduced after using AA.

Tue, Oct 13, 8:27 AM · Restricted Project
hliao requested review of D89320: [amdgpu] Enable use of AA during codegen..
Tue, Oct 13, 8:24 AM · Restricted Project

Mon, Oct 12

hliao added a comment to D89140: [globalopt] Teach to look through `addrspacecast`..

Kindly PING for review.

Mon, Oct 12, 9:07 PM · Restricted Project

Fri, Oct 9

hliao added a reviewer for D89140: [globalopt] Teach to look through `addrspacecast`.: espindola.
Fri, Oct 9, 9:40 AM · Restricted Project
hliao added a reviewer for D89140: [globalopt] Teach to look through `addrspacecast`.: eli.friedman.
Fri, Oct 9, 9:38 AM · Restricted Project
hliao requested review of D89140: [globalopt] Teach to look through `addrspacecast`..
Fri, Oct 9, 9:37 AM · Restricted Project

Thu, Oct 1

hliao committed rG8c36eaf03772: [clang][opencl][codegen] Remove the insertion of `correctly-rounded-divide-sqrt… (authored by hliao).
[clang][opencl][codegen] Remove the insertion of `correctly-rounded-divide-sqrt…
Thu, Oct 1, 8:08 AM
hliao closed D88424: [clang][codegen] Remove the insertion of `correctly-rounded-divide-sqrt-fp-math` fn-attr..
Thu, Oct 1, 8:07 AM · Restricted Project

Wed, Sep 30

hliao added a comment to D85254: [llvm-exegesis] Add option to check the hardware support for a given feature before benchmarking..

This change is reverted as, on hosts without LBR supported but with LIBPFM installed and used, this change makes llvm/test/tools/llvm-exegesis/X86/lbr/mov-add.s failed. On that host, perf_event_open fails with EOPNOTSUPP on LBR config. That change's basic assumption

Wed, Sep 30, 8:26 PM · Restricted Project
hliao added a reverting change for rG4fcd1a8e6528: [llvm-exegesis] Add option to check the hardware support for a given feature…: rG2c9dc7bbbf51: Revert "[llvm-exegesis] Add option to check the hardware support for a given….
Wed, Sep 30, 8:22 PM
hliao committed rG2c9dc7bbbf51: Revert "[llvm-exegesis] Add option to check the hardware support for a given… (authored by hliao).
Revert "[llvm-exegesis] Add option to check the hardware support for a given…
Wed, Sep 30, 8:22 PM
hliao added a reverting change for D85254: [llvm-exegesis] Add option to check the hardware support for a given feature before benchmarking.: rG2c9dc7bbbf51: Revert "[llvm-exegesis] Add option to check the hardware support for a given….
Wed, Sep 30, 8:22 PM · Restricted Project

Mon, Sep 28

hliao updated the diff for D88424: [clang][codegen] Remove the insertion of `correctly-rounded-divide-sqrt-fp-math` fn-attr..

Rebase

Mon, Sep 28, 10:11 AM · Restricted Project
hliao committed rG5dbf80cad955: [clang][codegen] Annotate `correctly-rounded-divide-sqrt-fp-math` fn-attr for… (authored by hliao).
[clang][codegen] Annotate `correctly-rounded-divide-sqrt-fp-math` fn-attr for…
Mon, Sep 28, 8:41 AM
hliao closed D88303: [clang][codegen] Remove the insertion of `correctly-rounded-divide-sqrt-fp-math` fn-attr..
Mon, Sep 28, 8:40 AM · Restricted Project
hliao added a comment to D88303: [clang][codegen] Remove the insertion of `correctly-rounded-divide-sqrt-fp-math` fn-attr..

Thanks!

Mon, Sep 28, 8:39 AM · Restricted Project
hliao requested review of D88424: [clang][codegen] Remove the insertion of `correctly-rounded-divide-sqrt-fp-math` fn-attr..
Mon, Sep 28, 8:39 AM · Restricted Project
hliao updated the diff for D88303: [clang][codegen] Remove the insertion of `correctly-rounded-divide-sqrt-fp-math` fn-attr..

Split the original into 2. This is the first part, which add
correctly-rounded-device-sqrt-fp-math for OpenCL only. The second part will
remove that attribute annotating completely.

Mon, Sep 28, 8:34 AM · Restricted Project

Sep 25 2020

hliao updated the diff for D88303: [clang][codegen] Remove the insertion of `correctly-rounded-divide-sqrt-fp-math` fn-attr..

Remove the irrelevant change on .clang-format.

Sep 25 2020, 7:38 AM · Restricted Project
hliao requested review of D88303: [clang][codegen] Remove the insertion of `correctly-rounded-divide-sqrt-fp-math` fn-attr..
Sep 25 2020, 7:32 AM · Restricted Project

Sep 22 2020

hliao committed rGd4e3e1e54879: Fix build due to renaming in LoopInfo. (authored by hliao).
Fix build due to renaming in LoopInfo.
Sep 22 2020, 2:34 PM
hliao committed rG534f6e171808: [PeepholeOptimizer] Enhance the redundant COPY elimination. (authored by hliao).
[PeepholeOptimizer] Enhance the redundant COPY elimination.
Sep 22 2020, 7:12 AM
hliao closed D87939: [PeepholeOptimizer] Enhance the redundant COPY elimination..
Sep 22 2020, 7:12 AM · Restricted Project
hliao added inline comments to D87939: [PeepholeOptimizer] Enhance the redundant COPY elimination..
Sep 22 2020, 7:02 AM · Restricted Project

Sep 21 2020

hliao added a comment to D87939: [PeepholeOptimizer] Enhance the redundant COPY elimination..

The comment seems outdated, if my understanding is right, and even the original code cannot perform that change since, once the 2nd COPY with same source is found in L1407, the check @ L1419 just skips that earlier as the 1st COPY has no subreg and the 2nd COPY has sub1.

Good point!

Now, I am wondering why is this change not just NFC then?

Sep 21 2020, 12:04 PM · Restricted Project
hliao added a comment to D87939: [PeepholeOptimizer] Enhance the redundant COPY elimination..

Hi @hliao,

I must be missing something, but it feels to me that this patch is actually making the situation worse.

Could you look at my example inlined below and explain how it would still work with this patch?

Cheers,
-Quentin

Sep 21 2020, 11:49 AM · Restricted Project
hliao added a comment to D87972: [OldPM] Pass manager: run SROA after (simple) loop unrolling.

I have tested this patch internally and seen gains and losses. On one document search related benchmark 3~5% improvement. One zippy (snappy) there is 3~5% regression. Perhaps we do need a conditional extra SROA run.

Snappy - you mean public https://github.com/google/snappy?

Well, it should be possible to analyze it...

@lebedev.ri any perf data from testsuite/rawspeed?

I did look.


This suggests that geomean is -0.8% runtime improvement,
with ups&downs.

But as i have said in the patch's description, i stumbled into this when writing new code, where the effect is much larger.

Sep 21 2020, 7:44 AM · Restricted Project, Restricted Project
hliao updated the diff for D87939: [PeepholeOptimizer] Enhance the redundant COPY elimination..

Rebase.

Sep 21 2020, 6:32 AM · Restricted Project

Sep 18 2020

hliao added inline comments to D87939: [PeepholeOptimizer] Enhance the redundant COPY elimination..
Sep 18 2020, 1:34 PM · Restricted Project
hliao added a comment to D87939: [PeepholeOptimizer] Enhance the redundant COPY elimination..

This patch enhances the peephole-opt to fix the redundant copy issues once to be fixed in D87556. With the enhancement, we could remove that redundant COPY locally. Test cases are revised due to the code quality improvement or change. Fortunately, AMDGPU and ARM tests need addressing that difference.

Sep 18 2020, 1:29 PM · Restricted Project
hliao requested review of D87939: [PeepholeOptimizer] Enhance the redundant COPY elimination..
Sep 18 2020, 1:25 PM · Restricted Project
hliao added a comment to rGc3492a1aa1b9: [amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel..

Can you revert this? I think that this transform itself is a workaround, and even if it were a good idea, I think it doesn't belong in another loop over the function in finalizeLowering

Could you elaborate on why that would be a workaround? Basically, after instruction selection, the COPY from SGPR to VGPR should be lowered to a native instruction.

Because this should be done after register allocation like is already done. Replacing a copy with something else should only interfere with generic optimizations

That would be too late for MachineCSE and other optimization to remove the redundant COPYs and reduce the register usage. Moving that after RA won't reduce register pressure.

So you are working around something MahcineCSE isn't doing on copies. You could just have MachineCSE do this for these copies

MachineCSE and other optimizations are designed not to handle that target-independent COPY or like, which is added with the intention that, potentially, the source and destination operands are coalesced and that COPY is removed finally. As SGPR and VGPR are different register banks and won't be coalesced anyway, native instruction should be used instead.

This does not make sense, COPY is what generic optimizations do understand. If it's useful to CSE cross bank copies, MachineCSE should handle them. As a representational choice, not-copy is worse than copy

A target-independent cross-bank COPY is definitely useful but the current COPY should not be used for that purpose considering how it's used in RA-related passes, especially only architectural constraints are changed between the source and destination operands where the propagation should be stopped.

This is exactly what copy is for. It wouldn't make sense to have a separate copy for this

Sep 18 2020, 12:04 PM
hliao added a comment to rGc3492a1aa1b9: [amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel..

Can you revert this? I think that this transform itself is a workaround, and even if it were a good idea, I think it doesn't belong in another loop over the function in finalizeLowering

Could you elaborate on why that would be a workaround? Basically, after instruction selection, the COPY from SGPR to VGPR should be lowered to a native instruction.

Because this should be done after register allocation like is already done. Replacing a copy with something else should only interfere with generic optimizations

That would be too late for MachineCSE and other optimization to remove the redundant COPYs and reduce the register usage. Moving that after RA won't reduce register pressure.

So you are working around something MahcineCSE isn't doing on copies. You could just have MachineCSE do this for these copies

MachineCSE and other optimizations are designed not to handle that target-independent COPY or like, which is added with the intention that, potentially, the source and destination operands are coalesced and that COPY is removed finally. As SGPR and VGPR are different register banks and won't be coalesced anyway, native instruction should be used instead.

This does not make sense, COPY is what generic optimizations do understand. If it's useful to CSE cross bank copies, MachineCSE should handle them. As a representational choice, not-copy is worse than copy

Sep 18 2020, 8:23 AM

Sep 17 2020

hliao updated the diff for D87858: [hip] Add HIP scope atomic ops..

Revise formatting following the clang-format suggestion.

Sep 17 2020, 2:37 PM · Restricted Project
hliao added a comment to rGc3492a1aa1b9: [amdgpu] Lower SGPR-to-VGPR copy in the final phase of ISel..

Can you revert this? I think that this transform itself is a workaround, and even if it were a good idea, I think it doesn't belong in another loop over the function in finalizeLowering

Could you elaborate on why that would be a workaround? Basically, after instruction selection, the COPY from SGPR to VGPR should be lowered to a native instruction.

Because this should be done after register allocation like is already done. Replacing a copy with something else should only interfere with generic optimizations

That would be too late for MachineCSE and other optimization to remove the redundant COPYs and reduce the register usage. Moving that after RA won't reduce register pressure.

So you are working around something MahcineCSE isn't doing on copies. You could just have MachineCSE do this for these copies

Sep 17 2020, 2:32 PM