hliao (Michael Liao)
User

Projects

User does not belong to any projects.

User Details

User Since
Aug 7 2014, 12:01 PM (358 w, 4 d)

Recent Activity

Today

hliao updated the diff for D103282: [MIRParser] Add machine metadata..
  • Relocate tests and add extra tests for parsing errors.
  • Enhance the parser to report any missing definitions on forward referenced metadata.
Mon, Jun 21, 2:04 PM · Restricted Project

Sat, Jun 19

hliao abandoned D103106: Add pre-commit tests for [D102215](https://reviews.llvm.org/D102215)..
Sat, Jun 19, 10:23 AM · Restricted Project
hliao committed rGb9c05aff205b: [MIRPrinter] Add machine metadata support. (authored by hliao).
[MIRPrinter] Add machine metadata support.
Sat, Jun 19, 9:48 AM
hliao closed D103205: [MIRPrinter] Add machine metadata support..
Sat, Jun 19, 9:48 AM · Restricted Project
hliao committed rG940efa4f6981: [amdgpu] Improve the from f32 to i64. (authored by hliao).
[amdgpu] Improve the from f32 to i64.
Sat, Jun 19, 9:47 AM
hliao closed D104427: [amdgpu] Improve the from f32 to i64..
Sat, Jun 19, 9:47 AM · Restricted Project

Fri, Jun 18

hliao added a comment to D103205: [MIRPrinter] Add machine metadata support..

PING for the further comment on the round-trip test, which requires MIR parser support reviewed @ D103282

Fri, Jun 18, 12:16 PM · Restricted Project

Thu, Jun 17

hliao updated the diff for D104427: [amdgpu] Improve the from f32 to i64..

Fix typos.

Thu, Jun 17, 11:36 AM · Restricted Project
hliao updated the diff for D104427: [amdgpu] Improve the from f32 to i64..

Only apply that abs/flip on fptosi for f32.

Thu, Jun 17, 11:24 AM · Restricted Project
hliao added inline comments to D104427: [amdgpu] Improve the from f32 to i64..
Thu, Jun 17, 9:05 AM · Restricted Project
hliao updated the diff for D104427: [amdgpu] Improve the from f32 to i64..
  • Add global-isel support.
  • Revise the method name following the suggestion.
Thu, Jun 17, 9:02 AM · Restricted Project

Wed, Jun 16

hliao requested review of D104427: [amdgpu] Improve the from f32 to i64..
Wed, Jun 16, 4:39 PM · Restricted Project

Tue, Jun 15

hliao added a comment to D103205: [MIRPrinter] Add machine metadata support..

Needs standalone MIR tests (i.e. round trip tests not bundled in a unit test)

Also I would like to see some tests where there are inconsistencies between the metadata present in the IR section, and the explicit metadata in the MIR

Tue, Jun 15, 7:44 AM · Restricted Project

Wed, Jun 9

hliao added a comment to D103205: [MIRPrinter] Add machine metadata support..

Kindly PING again

Wed, Jun 9, 6:45 AM · Restricted Project

Tue, Jun 8

hliao committed rG27332968d85e: [amdgpu] Add `-enable-ocl-mangling-mismatch-workaround`. (authored by hliao).
[amdgpu] Add `-enable-ocl-mangling-mismatch-workaround`.
Tue, Jun 8, 12:43 PM
hliao closed D103920: [amdgpu] Add `-enable-ocl-mangling-mismatch-workaround`..
Tue, Jun 8, 12:42 PM · Restricted Project
hliao updated the diff for D103920: [amdgpu] Add `-enable-ocl-mangling-mismatch-workaround`..

Add the 'amdgpu-' prefix to that option.

Tue, Jun 8, 12:38 PM · Restricted Project
hliao added a comment to D103920: [amdgpu] Add `-enable-ocl-mangling-mismatch-workaround`..

This is the short-term solution until all issues exposed after the frontend change are fixed.

Tue, Jun 8, 11:43 AM · Restricted Project
hliao requested review of D103920: [amdgpu] Add `-enable-ocl-mangling-mismatch-workaround`..
Tue, Jun 8, 11:42 AM · Restricted Project

Tue, Jun 1

hliao added a comment to D103205: [MIRPrinter] Add machine metadata support..

Kindly PING

Tue, Jun 1, 9:15 AM · Restricted Project

Fri, May 28

hliao added inline comments to D103205: [MIRPrinter] Add machine metadata support..
Fri, May 28, 8:50 AM · Restricted Project

Thu, May 27

hliao updated the diff for D103205: [MIRPrinter] Add machine metadata support..

Fix warnings from clang-tidy.

Thu, May 27, 2:38 PM · Restricted Project
hliao updated the diff for D102255: [SelectionDAG] Generate scoped AA metadata when lowering memcpy..

Rebase

Thu, May 27, 2:01 PM · Restricted Project
hliao updated the diff for D103282: [MIRParser] Add machine metadata..

Rebase

Thu, May 27, 2:00 PM · Restricted Project
hliao added a comment to D102255: [SelectionDAG] Generate scoped AA metadata when lowering memcpy..

Clean up after splitting MIR printer and parser changes.

Thu, May 27, 1:56 PM · Restricted Project
hliao added a comment to D103205: [MIRPrinter] Add machine metadata support..

The patch is revised following the comments.

Thu, May 27, 1:55 PM · Restricted Project
hliao updated the diff for D103205: [MIRPrinter] Add machine metadata support..

Separate that test into MIR.

Thu, May 27, 1:46 PM · Restricted Project
hliao updated the diff for D102255: [SelectionDAG] Generate scoped AA metadata when lowering memcpy..

Clean up after splitting MIR printer and parser changes.

Thu, May 27, 1:14 PM · Restricted Project
hliao requested review of D103282: [MIRParser] Add machine metadata..
Thu, May 27, 1:03 PM · Restricted Project
hliao updated the diff for D103205: [MIRPrinter] Add machine metadata support..

Revise that machine tracker a bit to ensure machine metadata is always per-function state.

Thu, May 27, 11:50 AM · Restricted Project

Wed, May 26

hliao added a comment to D102255: [SelectionDAG] Generate scoped AA metadata when lowering memcpy..

Can you please split off the metadata parsing part into a separate patch? It's not directly related to the memcpy lowering.

OK, I need to add unit tests to verify that, as we won't be able to generate machine metadata within the current backend. I will prepare that tonight.

Wed, May 26, 1:44 PM · Restricted Project
hliao requested review of D103205: [MIRPrinter] Add machine metadata support..
Wed, May 26, 1:44 PM · Restricted Project

Tue, May 25

hliao added a comment to D102255: [SelectionDAG] Generate scoped AA metadata when lowering memcpy..

Can you please split off the metadata parsing part into a separate patch? It's not directly related to the memcpy lowering.

Tue, May 25, 1:31 PM · Restricted Project
hliao updated the diff for D102255: [SelectionDAG] Generate scoped AA metadata when lowering memcpy..

Rebase.

Tue, May 25, 12:14 PM · Restricted Project
hliao committed rGc9dd29925f0c: [SelectionDAG] Propagate scoped AA metadata when lowering mem intrinsics. (authored by hliao).
[SelectionDAG] Propagate scoped AA metadata when lowering mem intrinsics.
Tue, May 25, 11:43 AM
hliao closed D102215: [SelectionDAG] Propagate scoped AA metadata when lowering mem intrinsics..
Tue, May 25, 11:43 AM · Restricted Project
hliao committed rG4df3b60199ef: Add pre-commit tests for [D102215](https://reviews.llvm.org/D102215). (authored by hliao).
Add pre-commit tests for [D102215](https://reviews.llvm.org/D102215).
Tue, May 25, 11:43 AM
hliao abandoned D96980: [amdgpu] Revert agnostic SGPR spill..
Tue, May 25, 11:36 AM · Restricted Project
hliao updated the diff for D102215: [SelectionDAG] Propagate scoped AA metadata when lowering mem intrinsics..

Rebase on the pre-committed test D103106.

Tue, May 25, 11:35 AM · Restricted Project
hliao requested review of D103106: Add pre-commit tests for [D102215](https://reviews.llvm.org/D102215)..
Tue, May 25, 11:34 AM · Restricted Project
hliao added inline comments to D102215: [SelectionDAG] Propagate scoped AA metadata when lowering mem intrinsics..
Tue, May 25, 9:25 AM · Restricted Project
hliao updated the diff for D102215: [SelectionDAG] Propagate scoped AA metadata when lowering mem intrinsics..

Move the AAInfo preparation into the code lowering memcpy/memmove/memset.

Tue, May 25, 9:22 AM · Restricted Project
hliao added a comment to D102821: [SelectionDAG] Re-calculate scoped AA metadata when merging stores..

PING

Tue, May 25, 5:38 AM · Restricted Project
hliao added a comment to D102255: [SelectionDAG] Generate scoped AA metadata when lowering memcpy..

PING

Tue, May 25, 5:38 AM · Restricted Project
hliao added a comment to D102215: [SelectionDAG] Propagate scoped AA metadata when lowering mem intrinsics..

PING for review

Tue, May 25, 5:37 AM · Restricted Project

May 20 2021

hliao added a comment to D102255: [SelectionDAG] Generate scoped AA metadata when lowering memcpy..

I just read the relevant threads and bug reports on the change that allows exact overlap in llvm.memcpy (see the reference list at the end). Personally, I think it's OK to assume the NoAlias added here. By allowing exact overlap in llvm.memcpy, the most significant change is in basic-aa, which must consider the case where the source and destination of llvm.memcpy are the same. That makes sense at the IR level, where llvm.memcpy is treated as a single op: exact overlap means the copy is a no-op and won't always overwrite the destination memory. But when we lower that copy into loads/stores, we say that those loads/stores won't alias. That's fine, as the order between loads and stores at the same offset (or location, in the exact-overlap case) is established through the data dependency. In addition, the no-alias established here is a scoped one, which applies only to loads/stores from this llvm.memcpy. It won't affect the AA result between them and loads/stores outside the scope. (This patch depends on https://reviews.llvm.org/D102215, which propagates scoped AA on mem ops into loads/stores after lowering.)
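The scoping argument above can be illustrated with a small, simplified model. This is only a sketch in Python, not LLVM code: the `Access` class, the `scoped_no_alias` helper, and the scope name "src" are all hypothetical, and the model assumes a single scope domain. It shows that scoped metadata can separate the loads and stores lowered from one memcpy without saying anything about accesses that carry no such metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """Hypothetical stand-in for a load/store carrying scoped AA metadata."""
    name: str
    alias_scope: frozenset = frozenset()  # scopes this access belongs to
    noalias: frozenset = frozenset()      # scopes this access is known not to alias


def scoped_no_alias(a, b):
    """True if the scoped metadata alone proves a and b do not alias.

    Simplified single-domain rule: one access's noalias set must cover
    every scope the other access belongs to.
    """
    def covers(x, y):
        return bool(y.alias_scope) and y.alias_scope <= x.noalias
    return covers(a, b) or covers(b, a)


# One possible assignment (an assumption, not taken from the patch):
# loads belong to scope "src", stores declare noalias against "src".
load0 = Access("load src+0", alias_scope=frozenset({"src"}))
store0 = Access("store dst+0", noalias=frozenset({"src"}))
outside = Access("load elsewhere")  # no scoped metadata at all

print(scoped_no_alias(load0, store0))   # True
print(scoped_no_alias(store0, outside)) # False: nothing is claimed outside the scope
```

In this model the result for `outside` is decided entirely by other alias analyses, which matches the point that the scoped NoAlias is local to the lowered memcpy.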

May 20 2021, 12:49 PM · Restricted Project

May 19 2021

hliao requested review of D102821: [SelectionDAG] Re-calculate scoped AA metadata when merging stores..
May 19 2021, 7:02 PM · Restricted Project
hliao added a comment to D102255: [SelectionDAG] Generate scoped AA metadata when lowering memcpy..

I agree that rescheduling the loads/stores is correct even if src and dst are equal. However, the metadata itself is still incorrect: It will claim that the loads/stores are NoAlias, even though they are actually MustAlias.

The proposed version introduces 2 scopes, assigning all stores to one scope and all loads to another.

In order to avoid the MustAlias vs NoAlias problem, we could introduce a scope per load-store pair. Of course, that might be a lot of extra scopes.

This was discussed earlier today in LLVM's AA Technical call. This method should work, but one possible gotcha came up: sometimes a memcpy lowering results in overlapping load/stores. Those overlapping load/stores must remain 'aliasing', so they should belong to the same scope.

possible example:

// memcpy(dst, src, 23) becomes:
store i64 dst+0, (load i64, src+0)   // scope 0
store i64 dst+8, (load i64, src+8)   // scope 1, not overlapping with previous pair
store i64 dst+15, (load i64, src+15) // also scope 1, overlapping with previous pair
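
The grouping rule in this example (overlapping load/store pairs share a scope, non-overlapping ones get a fresh scope) can be sketched as follows. This is a hypothetical Python helper, not code from the patch; `assign_scopes` and its `(offset, size)` input format are assumptions made for illustration, and the chunks are assumed to be sorted by offset.

```python
def assign_scopes(chunks):
    """Assign a scope id to each (offset, size) chunk.

    A chunk that overlaps the byte range covered by the current group
    stays in that group's scope; otherwise a new scope is started.
    """
    scopes = []
    scope = -1
    group_end = 0  # one past the last byte covered by the current group
    for offset, size in chunks:
        if scope < 0 or offset >= group_end:
            scope += 1                      # no overlap: start a new scope
            group_end = offset + size
        else:
            group_end = max(group_end, offset + size)  # overlap: extend the group
        scopes.append(scope)
    return scopes


# memcpy(dst, src, 23) lowered as three 8-byte copies at offsets 0, 8 and 15.
print(assign_scopes([(0, 8), (8, 8), (15, 8)]))  # [0, 1, 1]
```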
May 19 2021, 1:08 PM · Restricted Project
hliao updated the diff for D102255: [SelectionDAG] Generate scoped AA metadata when lowering memcpy..

Add MIR printer and parser support for scoped AA metadata generated in the backend.

May 19 2021, 1:02 PM · Restricted Project
hliao added a reviewer for D102215: [SelectionDAG] Propagate scoped AA metadata when lowering mem intrinsics.: jeroen.dobbelaere.
May 19 2021, 12:23 PM · Restricted Project
hliao updated the diff for D102215: [SelectionDAG] Propagate scoped AA metadata when lowering mem intrinsics..

Revise the AA metadata matching.

May 19 2021, 12:18 PM · Restricted Project
hliao updated the diff for D102215: [SelectionDAG] Propagate scoped AA metadata when lowering mem intrinsics..

Include tests examining the MIR to ensure scoped AA metadata are attached on loads/stores lowered from mem ops.

May 19 2021, 9:24 AM · Restricted Project

May 11 2021

hliao accepted D102278: AMDGPU: Fix assert on constant load from addrspacecasted pointer.

Good catch! LGTM!

May 11 2021, 2:27 PM · Restricted Project
hliao added a comment to D102255: [SelectionDAG] Generate scoped AA metadata when lowering memcpy..

It does look like a valid way to indicate that the individual loads and stores are independent, except for their value dependency.

Given the very late introduction of new !alias.scope and !noalias metadata, is there a way to have a testcase look at a machine IR dump, together with the metadata output at that phase?

May 11 2021, 2:11 PM · Restricted Project
hliao added a comment to D102255: [SelectionDAG] Generate scoped AA metadata when lowering memcpy..

Do I understand correctly that this patch is trying to claim that the memcpy src and dst do not alias through scoped alias metadata? If so, I'm afraid this is incorrect. Our contract for llvm.memcpy requires that src/dst are either NoAlias or MustAlias (but not PartialAlias). From LangRef:

The ‘llvm.memcpy.*’ intrinsics copy a block of memory from the source location to the destination location, which must either be equal or non-overlapping.

May 11 2021, 1:17 PM · Restricted Project
hliao added inline comments to D102255: [SelectionDAG] Generate scoped AA metadata when lowering memcpy..
May 11 2021, 10:14 AM · Restricted Project
hliao added reviewers for D102255: [SelectionDAG] Generate scoped AA metadata when lowering memcpy.: gchatelet, hfinkel, bogner, t.p.northover, nemanjai.
May 11 2021, 10:06 AM · Restricted Project
hliao requested review of D102255: [SelectionDAG] Generate scoped AA metadata when lowering memcpy..
May 11 2021, 10:03 AM · Restricted Project
hliao updated the diff for D102215: [SelectionDAG] Propagate scoped AA metadata when lowering mem intrinsics..

Add more tests and revise following the comment.

May 11 2021, 8:00 AM · Restricted Project

May 10 2021

hliao added reviewers for D102215: [SelectionDAG] Propagate scoped AA metadata when lowering mem intrinsics.: hfinkel, bogner.
May 10 2021, 8:52 PM · Restricted Project
hliao added a reviewer for D102215: [SelectionDAG] Propagate scoped AA metadata when lowering mem intrinsics.: gchatelet.
May 10 2021, 8:40 PM · Restricted Project
hliao requested review of D102215: [SelectionDAG] Propagate scoped AA metadata when lowering mem intrinsics..
May 10 2021, 8:34 PM · Restricted Project

May 7 2021

hliao committed rG631da3b15203: Replace a remaining CRLF with LF. NFC. (authored by hliao).
Replace a remaining CRLF with LF. NFC.
May 7 2021, 10:10 PM

Apr 16 2021

hliao added a reverting change for rGef620c40f371: [Support] Don't include <algorithm> in Hashing.h: rG853da5977e74: Revert "[Support] Don't include <algorithm> in Hashing.h".
Apr 16 2021, 9:19 AM
hliao committed rG853da5977e74: Revert "[Support] Don't include <algorithm> in Hashing.h" (authored by hliao).
Revert "[Support] Don't include <algorithm> in Hashing.h"
Apr 16 2021, 9:19 AM
hliao added a reverting change for D100657: [Support] Don't include <algorithm> in Hashing.h: rG853da5977e74: Revert "[Support] Don't include <algorithm> in Hashing.h".
Apr 16 2021, 9:19 AM · Restricted Project

Apr 9 2021

hliao added inline comments to D99635: [SelectionDAG] Add extra check on asm operand legalization..
Apr 9 2021, 5:24 PM · Restricted Project
hliao added a comment to D99635: [SelectionDAG] Add extra check on asm operand legalization..

Missing test

Apr 9 2021, 6:50 AM · Restricted Project

Apr 5 2021

hliao abandoned D99507: [amdgpu] Add a pass to avoid jump into blocks with 0 exec mask..
Apr 5 2021, 8:06 AM · Restricted Project
hliao resigned from D96517: [AMDGPU] Optimize SGPR to scratch spilling.
Apr 5 2021, 8:02 AM · Restricted Project
hliao added a comment to D99507: [amdgpu] Add a pass to avoid jump into blocks with 0 exec mask..

I think this requires a lot more thought.

+1

What I'd like to know: why are we reloading a lane mask via V_READFIRSTLANE in the first place? I would expect one of two types of reload:

  1. Load from a fixed lane of a VGPR using V_READLANE.

That depends on whether we spill an SGPR by writing a fixed lane or by writing an active lane. With the first approach, without saving/restoring, we will overwrite the live values in the inactive lanes. HPC workloads are hit by that issue and cannot run correctly.

Let me rephrase to make sure I understood you correctly. You're saying that spilling an SGPR to a fixed lane of a VGPR may cause data of an inactive lane to be overwritten. This is a problem if the spill/reload happens in a called function, because VGPR save/reload doesn't save those inactive lanes. (HPC is irrelevant here.)
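
The problem described here can be reproduced in a toy model. The sketch below is Python, not AMDGPU code: the 8-lane wave, the helper names, and the assumption that the callee's VGPR save/restore only touches lanes enabled in the exec mask are all simplifications chosen for illustration.

```python
WAVE = 8  # toy wave size; real hardware uses 32 or 64 lanes


def v_writelane(vgpr, lane, value):
    # Model assumption: writes one fixed lane regardless of the exec mask.
    vgpr[lane] = value


def save_active(vgpr, exec_mask):
    # Models a VGPR save that only stores lanes enabled in exec.
    return {i: vgpr[i] for i in range(WAVE) if (exec_mask >> i) & 1}


def restore_active(vgpr, saved):
    for lane, value in saved.items():
        vgpr[lane] = value


def callee_spills_to_fixed_lane(vgpr, exec_mask, sgpr_value, lane=0):
    saved = save_active(vgpr, exec_mask)
    v_writelane(vgpr, lane, sgpr_value)  # SGPR spill into a fixed lane
    restore_active(vgpr, saved)          # only active lanes are restored


vgpr = list(range(100, 100 + WAVE))  # caller's live values, one per lane
exec_mask = 0b11111110               # lane 0 is inactive inside the callee
callee_spills_to_fixed_lane(vgpr, exec_mask, sgpr_value=-1)
print(vgpr[0])  # -1: the caller's value 100 in the inactive lane was lost
```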

Apr 5 2021, 7:06 AM · Restricted Project

Apr 3 2021

hliao added a comment to D99507: [amdgpu] Add a pass to avoid jump into blocks with 0 exec mask..

I think this requires a lot more thought.

+1

What I'd like to know: why are we reloading a lane mask via V_READFIRSTLANE in the first place? I would expect one of two types of reload:

  1. Load from a fixed lane of a VGPR using V_READLANE.
Apr 3 2021, 8:16 AM · Restricted Project

Mar 31 2021

hliao added inline comments to D99635: [SelectionDAG] Add extra check on asm operand legalization..
Mar 31 2021, 6:03 PM · Restricted Project

Mar 30 2021

hliao abandoned D92394: [amdgpu] Teach one more case for assumed global pointers..

The case is no longer valid considering concurrent kernel execution.

Mar 30 2021, 9:05 PM · Restricted Project
hliao edited reviewers for D99635: [SelectionDAG] Add extra check on asm operand legalization., added: bogner; removed: JustinBorb.
Mar 30 2021, 8:24 PM · Restricted Project
hliao requested review of D99635: [SelectionDAG] Add extra check on asm operand legalization..
Mar 30 2021, 8:23 PM · Restricted Project
hliao added a comment to D99507: [amdgpu] Add a pass to avoid jump into blocks with 0 exec mask..

It seems to me that we may need to revise CFG lowering to avoid updating EXEC directly and later revise it based on whether the restoring mask needs reloading or not. Here's the brief thought in my mind:

  • Instead of lowering CFG early before RA, lower it after RA. As a byproduct, it also removes the need for the "terminator" versions of the exec mask manipulation instructions.
  • When CFG is being lowered, it could update EXEC eagerly if the merge point doesn't need to reload the mask; otherwise, it just needs to translate it as we currently do.
Mar 30 2021, 6:11 AM · Restricted Project

Mar 29 2021

hliao added a comment to D99507: [amdgpu] Add a pass to avoid jump into blocks with 0 exec mask..

I'm not comfortable adding a pass to fixup a bug in control flow lowering. I think we just need to actually try to model divergent predecessors/successors explicitly

Mar 29 2021, 12:57 PM · Restricted Project
hliao added inline comments to D99507: [amdgpu] Add a pass to avoid jump into blocks with 0 exec mask..
Mar 29 2021, 10:30 AM · Restricted Project
hliao added inline comments to D99507: [amdgpu] Add a pass to avoid jump into blocks with 0 exec mask..
Mar 29 2021, 10:03 AM · Restricted Project
hliao added inline comments to D99507: [amdgpu] Add a pass to avoid jump into blocks with 0 exec mask..
Mar 29 2021, 9:59 AM · Restricted Project
hliao added inline comments to D99507: [amdgpu] Add a pass to avoid jump into blocks with 0 exec mask..
Mar 29 2021, 9:56 AM · Restricted Project
hliao added a reviewer for D99507: [amdgpu] Add a pass to avoid jump into blocks with 0 exec mask.: dstuttard.
Mar 29 2021, 8:12 AM · Restricted Project
hliao added inline comments to D99507: [amdgpu] Add a pass to avoid jump into blocks with 0 exec mask..
Mar 29 2021, 7:54 AM · Restricted Project
hliao added a comment to D99507: [amdgpu] Add a pass to avoid jump into blocks with 0 exec mask..

This is the companion fix for D96980. It explains why the original agnostic SGPR spill/reload was proposed: to solve the issue that an SGPR spill/reload may be executed when the exec mask goes to zero.
We did try to skip executing code when the exec mask goes to zero by branching on EXECZ (to the target block) or EXECNZ (to the fallthrough block). We may still run instructions with a zero exec mask, but that's usually not an issue, as we immediately restore the exec mask in the target block. Code following that exec mask restoration won't be executed with a zero mask. However, if that mask restoration needs to reload a spilled exec mask, we will run the SGPR reload with a zero mask, where v_readfirstlane has undefined behavior.
This patch tries to mitigate that case by not evaluating the exec mask that early, or clearing it, when the branch target has a mask restoration following an SGPR reload. Instead of checking EXECZ or EXECNZ, the exec mask evaluation is duplicated with a temporary SGPR as the destination (without updating the exec mask directly); checking SCC0 is then equivalent to checking EXECZ. The exec mask is only evaluated when the result won't be zero. For instance,
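
The SCC-based check described above can be modeled roughly as follows. This is a hypothetical Python sketch and not the example from the patch: `s_and` and `enter_branch` are invented names, and the model assumes a scalar AND that sets SCC whenever its result is non-zero.

```python
def s_and(a, b):
    """Toy scalar AND: returns (result, scc), with scc set when result != 0."""
    result = a & b
    return result, int(result != 0)


def enter_branch(exec_mask, cond_mask):
    """Return (new_exec, taken) without ever letting exec become zero."""
    # Evaluate the would-be mask into a temporary instead of writing exec.
    tmp, scc = s_and(exec_mask, cond_mask)
    if scc == 0:
        # SCC0 here plays the role of EXECZ: skip the block, exec is untouched.
        return exec_mask, False
    # Only commit the new mask when it is known to be non-zero.
    return tmp, True


print(enter_branch(0b1100, 0b0011))  # (12, False): no lane takes the branch
print(enter_branch(0b1100, 0b0110))  # (4, True): lane 2 takes the branch
```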

Mar 29 2021, 7:52 AM · Restricted Project
hliao requested review of D99507: [amdgpu] Add a pass to avoid jump into blocks with 0 exec mask..
Mar 29 2021, 7:38 AM · Restricted Project

Feb 25 2021

hliao added inline comments to D97318: [clang][CodeGen] Allow fp16 arg pass by register.
Feb 25 2021, 8:28 AM
hliao added inline comments to D97318: [clang][CodeGen] Allow fp16 arg pass by register.
Feb 25 2021, 8:24 AM

Feb 24 2021

hliao committed rG0d4e12e3c110: [amdgpu] Atomic should be source of divergence. (authored by hliao).
[amdgpu] Atomic should be source of divergence.
Feb 24 2021, 12:29 PM
hliao closed D97392: [amdgpu] Atomic should be source of divergence..
Feb 24 2021, 12:29 PM · Restricted Project
hliao added inline comments to D97392: [amdgpu] Atomic should be source of divergence..
Feb 24 2021, 12:06 PM · Restricted Project
hliao added inline comments to D97392: [amdgpu] Atomic should be source of divergence..
Feb 24 2021, 11:10 AM · Restricted Project
hliao updated the diff for D97392: [amdgpu] Atomic should be source of divergence..

Update tests with more atomic ops.

Feb 24 2021, 11:10 AM · Restricted Project
hliao added reviewers for D97392: [amdgpu] Atomic should be source of divergence.: arsenm, rampitec, cfang.
Feb 24 2021, 8:15 AM · Restricted Project
hliao updated the diff for D97392: [amdgpu] Atomic should be source of divergence..

Update summary.

Feb 24 2021, 8:15 AM · Restricted Project
hliao requested review of D97392: [amdgpu] Atomic should be source of divergence..
Feb 24 2021, 8:13 AM · Restricted Project
hliao added a reviewer for D96980: [amdgpu] Revert agnostic SGPR spill.: sebastian-ne.
Feb 24 2021, 7:04 AM · Restricted Project
hliao added a comment to D96980: [amdgpu] Revert agnostic SGPR spill..

In addition, we already use v_readfirstlane heavily in our codegen, since some patterns benefit from vector instructions when no corresponding scalar instruction is available. I believe it's quite safe, as it's guaranteed that v_readfirstlane won't be executed when the exec mask goes to 0.

Feb 24 2021, 7:03 AM · Restricted Project
hliao updated the diff for D96980: [amdgpu] Revert agnostic SGPR spill..

Remove that unnecessary change and add a rationale for why that's safe with respect to the original concerns.

Feb 24 2021, 6:59 AM · Restricted Project

Feb 23 2021

hliao added inline comments to D96980: [amdgpu] Revert agnostic SGPR spill..
Feb 23 2021, 9:33 AM · Restricted Project