Mon, Jun 29

hliao accepted D82764: [NFC] Fixed ignored .hip test..

Thanks for catching that!

Mon, Jun 29, 7:31 AM · Restricted Project

Fri, Jun 26

hliao added a comment to D78655: [CUDA][HIP] Let lambda be host device by default.

Now, back to the specifics of your example. I'm still not 100% sure I understand what the problem is. Can you boil down the use case to an example on godbolt?

I don't have a specific example, but there could be code like this generic clip operator:

template<class F, class T>
void clip(F f,
          const T& min_val,
          const T& max_val)
{
    f([=](auto x) {
        return ::min<decltype(x)>(::max<decltype(x)>(min_val, x), max_val);
    });
}
Fri, Jun 26, 9:50 AM

Thu, Jun 25

hliao committed rG471c806a45bb: [hip] Refine `clang/test/CodeGenCUDA/amdgpu-kernel-arg-pointer-type.cu` (authored by hliao).
[hip] Refine `clang/test/CodeGenCUDA/amdgpu-kernel-arg-pointer-type.cu`
Thu, Jun 25, 9:17 PM
hliao abandoned D80237: [hip] Ensure pointer in struct argument has proper `addrspacecast`..
Thu, Jun 25, 8:45 PM · Restricted Project
hliao abandoned D81670: [TTI] Expose isNoopAddrSpaceCast from TLI. [SROA] Teach SROA to recognize no-op addrspacecast..
Thu, Jun 25, 8:45 PM · Restricted Project, Restricted Project
hliao committed rG0723b1891fac: [hip] Re-enable `clang/test/CodeGenCUDA/amdgpu-kernel-arg-pointer-type.cu` (authored by hliao).
[hip] Re-enable `clang/test/CodeGenCUDA/amdgpu-kernel-arg-pointer-type.cu`
Thu, Jun 25, 7:40 PM
hliao committed rGd3f437d35189: [hip] Disable test temporarily due to failures on build servers. (authored by hliao).
[hip] Disable test temporarily due to failures on build servers.
Thu, Jun 25, 7:08 PM
hliao committed rGdccfaacf93e1: [InferAddressSpaces] Handle the pair of `ptrtoint`/`inttoptr`. (authored by hliao).
[InferAddressSpaces] Handle the pair of `ptrtoint`/`inttoptr`.
Thu, Jun 25, 6:02 PM
hliao closed D81938: [InferAddressSpaces] Handle the pair of `ptrtoint`/`inttoptr`..
Thu, Jun 25, 6:02 PM · Restricted Project, Restricted Project
hliao added inline comments to D62911: WIP: AMDGPU: Use fixup for local linkage functions.
Thu, Jun 25, 4:23 PM
hliao updated the diff for D81938: [InferAddressSpaces] Handle the pair of `ptrtoint`/`inttoptr`..

remove 'an'

Thu, Jun 25, 3:50 PM · Restricted Project, Restricted Project
hliao updated the diff for D81938: [InferAddressSpaces] Handle the pair of `ptrtoint`/`inttoptr`..

Revise the grammar. s/is/as/ in the first sentence.

Thu, Jun 25, 2:44 PM · Restricted Project, Restricted Project
hliao updated the diff for D81938: [InferAddressSpaces] Handle the pair of `ptrtoint`/`inttoptr`..

Revise

Thu, Jun 25, 12:28 PM · Restricted Project, Restricted Project
hliao added a comment to D81938: [InferAddressSpaces] Handle the pair of `ptrtoint`/`inttoptr`..

ping for code review

Thu, Jun 25, 9:40 AM · Restricted Project, Restricted Project
hliao added a comment to D82496: [amdgpu] Add codegen support for HIP dynamic shared memory..

My understanding is this feature is equivalent to the OpenCL dynamic group segment allocation. The runtime would presumably implement it in a similar way.

So the HIP runtime must take the static LDS size, round up to the alignment requirement of the dynamic allocation (OpenCL just uses the maximally aligned OpenCL data type), then add the size of the dynamic LDS. The AQL packet group segment field is set to the total LDS size.

In OpenCL there can be multiple kernel arguments, and the LDS address is passed to each. But for HIP there is only one dynamic area denoted by this weird extern. How is the dynamic LDS storage accessed? Is the address passed as an implicit kernel argument, or does the compiler implicitly use the aligned static LDS size?

Thu, Jun 25, 8:00 AM · Restricted Project
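
The size computation described in the D82496 comment above (static LDS size rounded up to the dynamic allocation's alignment, plus the dynamic size) can be sketched as follows. This is only an illustration under stated assumptions: `alignUp` and `totalLdsSize` are hypothetical names, not HIP runtime or LLVM functions, and the alignment is assumed to be a power of two.

```cpp
#include <cstdint>

// Hypothetical helper: round `value` up to the next multiple of `align`.
// Assumes `align` is a nonzero power of two.
constexpr std::uint32_t alignUp(std::uint32_t value, std::uint32_t align) {
  return (value + align - 1) & ~(align - 1);
}

// Hypothetical helper: total LDS size for a dispatch, following the
// description in the comment: pad the static LDS size to the alignment of
// the dynamic allocation, then append the dynamic LDS size.
constexpr std::uint32_t totalLdsSize(std::uint32_t staticSize,
                                     std::uint32_t dynamicAlign,
                                     std::uint32_t dynamicSize) {
  return alignUp(staticSize, dynamicAlign) + dynamicSize;
}
```

Under these assumptions, the dynamic area would start at `alignUp(staticSize, dynamicAlign)`, which is one possible answer to the question of where the compiler or runtime would place it; the comment itself leaves that open.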

Wed, Jun 24

hliao updated the diff for D81938: [InferAddressSpaces] Handle the pair of `ptrtoint`/`inttoptr`..

Rebase to the trunk.

Wed, Jun 24, 10:14 PM · Restricted Project, Restricted Project
hliao added inline comments to D82496: [amdgpu] Add codegen support for HIP dynamic shared memory..
Wed, Jun 24, 2:07 PM · Restricted Project
hliao added a comment to D82496: [amdgpu] Add codegen support for HIP dynamic shared memory..

I just found that change for non-HSA/-PAL environments. I need to check how it works and how it fits with the other tests. So far, it is a critical change to ensure we won't change the original source code too much. Is it possible to address that relocation over a longer run (say, 1 to 3 weeks) to avoid the tight schedule?

Wed, Jun 24, 2:07 PM · Restricted Project
hliao updated the diff for D81938: [InferAddressSpaces] Handle the pair of `ptrtoint`/`inttoptr`..

Revise again.

Wed, Jun 24, 2:07 PM · Restricted Project, Restricted Project
hliao created D82496: [amdgpu] Add codegen support for HIP dynamic shared memory..
Wed, Jun 24, 1:01 PM · Restricted Project
hliao committed rGebc9e0f1f078: Fix coding style. NFC. (authored by hliao).
Fix coding style. NFC.
Wed, Jun 24, 10:18 AM
hliao updated the diff for D81938: [InferAddressSpaces] Handle the pair of `ptrtoint`/`inttoptr`..

Enhance the tests further.

Wed, Jun 24, 8:36 AM · Restricted Project, Restricted Project
hliao added inline comments to D81938: [InferAddressSpaces] Handle the pair of `ptrtoint`/`inttoptr`..
Wed, Jun 24, 8:36 AM · Restricted Project, Restricted Project
hliao updated the diff for D81938: [InferAddressSpaces] Handle the pair of `ptrtoint`/`inttoptr`..

Add comments and enhance tests.

Wed, Jun 24, 8:36 AM · Restricted Project, Restricted Project
hliao added inline comments to D81938: [InferAddressSpaces] Handle the pair of `ptrtoint`/`inttoptr`..
Wed, Jun 24, 8:04 AM · Restricted Project, Restricted Project

Tue, Jun 23

hliao added a comment to D81938: [InferAddressSpaces] Handle the pair of `ptrtoint`/`inttoptr`..

ping for code review

Tue, Jun 23, 8:31 PM · Restricted Project, Restricted Project

Mon, Jun 22

hliao updated the diff for D81938: [InferAddressSpaces] Handle the pair of `ptrtoint`/`inttoptr`..

Rebase to trunk. All prerequisites are landed.

Mon, Jun 22, 11:25 PM · Restricted Project, Restricted Project
hliao committed rGf95850ce9c75: [SROA] Teach SROA to perform no-op pointer conversion. (authored by hliao).
[SROA] Teach SROA to perform no-op pointer conversion.
Mon, Jun 22, 11:05 PM
hliao closed D81943: [SROA] Teach SROA to perform no-op pointer conversion..
Mon, Jun 22, 11:04 PM · Restricted Project
hliao committed rGb1360caa823d: [SDAG] Add new AssertAlign ISD node. (authored by hliao).
[SDAG] Add new AssertAlign ISD node.
Mon, Jun 22, 10:01 PM
hliao closed D81711: [SDAG] Add new AssertAlign ISD node..
Mon, Jun 22, 10:01 PM · Restricted Project
hliao updated the diff for D81943: [SROA] Teach SROA to perform no-op pointer conversion..

Add a new test on non-integral pointers.

Mon, Jun 22, 9:29 PM · Restricted Project
hliao updated the diff for D81943: [SROA] Teach SROA to perform no-op pointer conversion..

fix typo!

Mon, Jun 22, 8:56 PM · Restricted Project
hliao added inline comments to D81943: [SROA] Teach SROA to perform no-op pointer conversion..
Mon, Jun 22, 8:56 PM · Restricted Project
hliao updated the diff for D81943: [SROA] Teach SROA to perform no-op pointer conversion..

Check non-integral pointers.

Mon, Jun 22, 8:56 PM · Restricted Project
hliao added a comment to D81711: [SDAG] Add new AssertAlign ISD node..

ping for code review

Mon, Jun 22, 10:44 AM · Restricted Project
hliao added a comment to D81943: [SROA] Teach SROA to perform no-op pointer conversion..

ping for code review

Mon, Jun 22, 9:07 AM · Restricted Project

Sun, Jun 21

hliao committed rG20a1700293f6: [amdgpu] Fix REL32 relocations with negative offsets. (authored by hliao).
[amdgpu] Fix REL32 relocations with negative offsets.
Sun, Jun 21, 8:13 PM
hliao closed D82234: [amdgpu] Fix REL32 relocations with negative offsets..
Sun, Jun 21, 8:13 PM · Restricted Project

Fri, Jun 19

hliao added a comment to D81938: [InferAddressSpaces] Handle the pair of `ptrtoint`/`inttoptr`..

PING

Fri, Jun 19, 6:59 PM · Restricted Project, Restricted Project
hliao added a comment to D81943: [SROA] Teach SROA to perform no-op pointer conversion..

PING

Fri, Jun 19, 6:59 PM · Restricted Project
hliao added a comment to D82234: [amdgpu] Fix REL32 relocations with negative offsets..

GlobalISel part isn't tested

Fri, Jun 19, 6:59 PM · Restricted Project
hliao added a comment to D81711: [SDAG] Add new AssertAlign ISD node..

PING

Fri, Jun 19, 6:59 PM · Restricted Project
hliao added a comment to D82234: [amdgpu] Fix REL32 relocations with negative offsets..

BTW, in the real example, that negative offset is created by the LSR pass when it strength-reduces code in a loop.

Fri, Jun 19, 2:41 PM · Restricted Project
hliao added a comment to D82234: [amdgpu] Fix REL32 relocations with negative offsets..

This patch only handles the case where that offset is representable in a 32-bit signed integer. For a generic 64-bit offset outside the range of a 32-bit integer, we need to revise the relocation spec to enhance REL32_HI from the original

Fri, Jun 19, 2:09 PM · Restricted Project
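
The restriction mentioned in the D82234 comment above (the offset must be representable in a 32-bit signed integer) amounts to a range check like the one below. This is a minimal sketch, not code from the patch; `fitsInSigned32` is a hypothetical name.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper: true when a 64-bit offset can be stored in a
// 32-bit signed field without loss, including negative offsets.
constexpr bool fitsInSigned32(std::int64_t offset) {
  return offset >= std::numeric_limits<std::int32_t>::min() &&
         offset <= std::numeric_limits<std::int32_t>::max();
}
```

A small negative offset such as -4 passes this check, while an offset of 2^31 does not and would fall into the 64-bit case the comment says is not yet handled.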
hliao created D82234: [amdgpu] Fix REL32 relocations with negative offsets..
Fri, Jun 19, 2:09 PM · Restricted Project

Thu, Jun 18

hliao added inline comments to D81711: [SDAG] Add new AssertAlign ISD node..
Thu, Jun 18, 6:03 PM · Restricted Project
hliao added inline comments to D81711: [SDAG] Add new AssertAlign ISD node..
Thu, Jun 18, 2:14 PM · Restricted Project
hliao committed rG2defe557226d: [TTI] Expose isNoopAddrSpaceCast in TTI. (authored by hliao).
[TTI] Expose isNoopAddrSpaceCast in TTI.
Thu, Jun 18, 12:01 PM
hliao closed D82025: [TTI] Expose isNoopAddrSpaceCast in TTI..
Thu, Jun 18, 12:01 PM · Restricted Project
hliao updated the diff for D81711: [SDAG] Add new AssertAlign ISD node..

Revise following reviewer's comments.

Thu, Jun 18, 12:01 PM · Restricted Project
hliao added a comment to D81943: [SROA] Teach SROA to perform no-op pointer conversion..

PING

Thu, Jun 18, 9:13 AM · Restricted Project
hliao added a comment to D82025: [TTI] Expose isNoopAddrSpaceCast in TTI..

PING

Thu, Jun 18, 9:13 AM · Restricted Project

Wed, Jun 17

hliao updated the diff for D81938: [InferAddressSpaces] Handle the pair of `ptrtoint`/`inttoptr`..

Add TTI hooks to double-check that address space casting is no-op.

Wed, Jun 17, 7:58 PM · Restricted Project, Restricted Project
hliao created D82025: [TTI] Expose isNoopAddrSpaceCast in TTI..
Wed, Jun 17, 10:13 AM · Restricted Project
hliao added a comment to D81711: [SDAG] Add new AssertAlign ISD node..

ping

Wed, Jun 17, 8:03 AM · Restricted Project

Tue, Jun 16

hliao added inline comments to D81943: [SROA] Teach SROA to perform no-op pointer conversion..
Tue, Jun 16, 7:42 PM · Restricted Project
hliao updated the diff for D81943: [SROA] Teach SROA to perform no-op pointer conversion..

remove redundant code.

Tue, Jun 16, 7:42 PM · Restricted Project
hliao added a comment to D81938: [InferAddressSpaces] Handle the pair of `ptrtoint`/`inttoptr`..

I'm not entirely convinced this is safe in all contexts. I think you can argue that this is safe if it directly feeds a memory instruction, as the access would be undefined if it weren't valid to do the no-op cast. However, I'm not sure if this is safe if used purely in arithmetic contexts. If you're just comparing the reinterpreted pointer values for example, I don't think that would be undefined

Tue, Jun 16, 2:18 PM · Restricted Project, Restricted Project
hliao added a comment to D81938: [InferAddressSpaces] Handle the pair of `ptrtoint`/`inttoptr`..
In D81938#2095982, @tra wrote:

This should be two separate patches - inferaddressspace and SROA.

Yes, I prepared it as 2 commits, but arc combined them.

I usually put the patches into separate branches, with one having the other one as an upstream branch and then do arc diff with each branch checked out.
The downside is that you will need to rebase infer-branch every time you update sroa-branch.
Another approach that may work is to set the arc config to consider only the checked-out commit. arc set-config base "arc:this, arc:prompt" should do that.

Tue, Jun 16, 12:39 PM · Restricted Project, Restricted Project
hliao updated the diff for D81938: [InferAddressSpaces] Handle the pair of `ptrtoint`/`inttoptr`..

Fix constant expression handling.

Tue, Jun 16, 12:39 PM · Restricted Project, Restricted Project
hliao retitled D81938: [InferAddressSpaces] Handle the pair of `ptrtoint`/`inttoptr`. from [SROA] Teach SROA to perform no-op pointer conversion. [InferAddressSpaces] Handle the pair of `ptrtoint`/`inttoptr`. to [InferAddressSpaces] Handle the pair of `ptrtoint`/`inttoptr`..
Tue, Jun 16, 9:21 AM · Restricted Project, Restricted Project
hliao created D81943: [SROA] Teach SROA to perform no-op pointer conversion..
Tue, Jun 16, 9:21 AM · Restricted Project
hliao added a comment to D81938: [InferAddressSpaces] Handle the pair of `ptrtoint`/`inttoptr`..

This should be two separate patches - inferaddressspace and SROA.

Tue, Jun 16, 8:16 AM · Restricted Project, Restricted Project
hliao committed rGe830fa260da9: [clang][amdgpu] Prefer not using `fp16` conversion intrinsics. (authored by hliao).
[clang][amdgpu] Prefer not using `fp16` conversion intrinsics.
Tue, Jun 16, 7:43 AM
hliao closed D81849: [clang][amdgpu] Prefer not using `fp16` conversion intrinsics..
Tue, Jun 16, 7:43 AM · Restricted Project
hliao created D81938: [InferAddressSpaces] Handle the pair of `ptrtoint`/`inttoptr`..
Tue, Jun 16, 7:42 AM · Restricted Project, Restricted Project

Mon, Jun 15

hliao created D81849: [clang][amdgpu] Prefer not using `fp16` conversion intrinsics..
Mon, Jun 15, 9:45 AM · Restricted Project
hliao added inline comments to D81711: [SDAG] Add new AssertAlign ISD node..
Mon, Jun 15, 8:07 AM · Restricted Project

Fri, Jun 12

hliao updated the diff for D81711: [SDAG] Add new AssertAlign ISD node..

Rebase to trunk.

Fri, Jun 12, 9:01 PM · Restricted Project
hliao committed rGec02635d104c: [amdgpu] Skip OR combining on 64-bit integer before legalizing ops. (authored by hliao).
[amdgpu] Skip OR combining on 64-bit integer before legalizing ops.
Fri, Jun 12, 12:36 PM
hliao closed D81710: [amdgpu] Skip OR combining on 64-bit integer before legalizing ops..
Fri, Jun 12, 12:35 PM · Restricted Project
hliao added a reviewer for D81711: [SDAG] Add new AssertAlign ISD node.: rampitec.
Fri, Jun 12, 12:35 PM · Restricted Project
hliao added a comment to D81710: [amdgpu] Skip OR combining on 64-bit integer before legalizing ops..

What's the rationale, and is there a practical test?

The full or is a better canonical form and splitting it will interfere with other combines

OK, but is there any test?

Fri, Jun 12, 12:02 PM · Restricted Project
hliao committed rGe7b920e6fe7c: [DAGCombine] Generalize the case (add (or x, c1), c2) -> (add x, (c1 + c2)) (authored by hliao).
[DAGCombine] Generalize the case (add (or x, c1), c2) -> (add x, (c1 + c2))
Fri, Jun 12, 10:54 AM
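
The fold named in the commit title above, (add (or x, c1), c2) -> (add x, (c1 + c2)), can be checked on plain integers. The title does not state a precondition; this sketch assumes the usual one for treating `or` as `add`, namely that `x` and `c1` have no set bits in common. All function names here are hypothetical and are not DAGCombiner APIs.

```cpp
#include <cstdint>

// Assumed precondition: `x` and `c1` share no set bits, so no carries can
// occur and `x | c1` equals `x + c1`.
constexpr bool noCommonBits(std::uint32_t x, std::uint32_t c1) {
  return (x & c1) == 0;
}

// The expression before the fold: (add (or x, c1), c2).
constexpr std::uint32_t beforeFold(std::uint32_t x, std::uint32_t c1,
                                   std::uint32_t c2) {
  return (x | c1) + c2;
}

// The expression after the fold: (add x, (c1 + c2)).
constexpr std::uint32_t afterFold(std::uint32_t x, std::uint32_t c1,
                                  std::uint32_t c2) {
  return x + (c1 + c2);
}
```

When `noCommonBits(x, c1)` holds, `beforeFold` and `afterFold` agree; when it does not (for example `x = 1`, `c1 = 1`), they can differ, which is why the precondition matters.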
hliao closed D81708: [DAGCombine] Generalize the case (add (or x, c1), c2) -> (add x, (c1 + c2)).
Fri, Jun 12, 10:54 AM · Restricted Project
hliao updated the diff for D81708: [DAGCombine] Generalize the case (add (or x, c1), c2) -> (add x, (c1 + c2)).

Revise again

Fri, Jun 12, 8:38 AM · Restricted Project
hliao added inline comments to D81711: [SDAG] Add new AssertAlign ISD node..
Fri, Jun 12, 8:37 AM · Restricted Project
hliao updated the diff for D81708: [DAGCombine] Generalize the case (add (or x, c1), c2) -> (add x, (c1 + c2)).

Revise the original comment.

Fri, Jun 12, 8:03 AM · Restricted Project

Thu, Jun 11

hliao created D81711: [SDAG] Add new AssertAlign ISD node..
Thu, Jun 11, 8:52 PM · Restricted Project
hliao created D81710: [amdgpu] Skip OR combining on 64-bit integer before legalizing ops..
Thu, Jun 11, 7:47 PM · Restricted Project
hliao created D81708: [DAGCombine] Generalize the case (add (or x, c1), c2) -> (add x, (c1 + c2)).
Thu, Jun 11, 7:15 PM · Restricted Project
hliao added a comment to D81670: [TTI] Expose isNoopAddrSpaceCast from TLI. [SROA] Teach SROA to recognize no-op addrspacecast..

We should instead allow bitcast to perform no-op addrspacecasts

Thu, Jun 11, 1:13 PM · Restricted Project, Restricted Project
hliao updated the diff for D81670: [TTI] Expose isNoopAddrSpaceCast from TLI. [SROA] Teach SROA to recognize no-op addrspacecast..

Revise the formatting.

Thu, Jun 11, 10:27 AM · Restricted Project, Restricted Project
hliao created D81670: [TTI] Expose isNoopAddrSpaceCast from TLI. [SROA] Teach SROA to recognize no-op addrspacecast..
Thu, Jun 11, 9:53 AM · Restricted Project, Restricted Project

Wed, Jun 10

hliao committed rG6dd058083208: [hip] Fix the failed test case due to the additional backend phase. (authored by hliao).
[hip] Fix the failed test case due to the additional backend phase.
Wed, Jun 10, 12:15 PM
hliao added a comment to D81427: [hip] Fix device-only relocatable code compilation..

This doesn't pass tests: http://45.33.8.238/linux/19977/step_7.txt

Please take a look, and please revert for now if fixing takes a while.

Wed, Jun 10, 12:13 PM · Restricted Project
hliao committed rG8b6821a5843b: [hip] Fix device-only relocatable code compilation. (authored by hliao).
[hip] Fix device-only relocatable code compilation.
Wed, Jun 10, 11:44 AM
hliao closed D81427: [hip] Fix device-only relocatable code compilation..
Wed, Jun 10, 11:44 AM · Restricted Project
hliao added a comment to D81427: [hip] Fix device-only relocatable code compilation..

This doesn't pass tests: http://45.33.8.238/linux/19977/step_7.txt

Please take a look, and please revert for now if fixing takes a while.

Wed, Jun 10, 11:43 AM · Restricted Project

Tue, Jun 9

hliao updated the diff for D81427: [hip] Fix device-only relocatable code compilation..

Revise following reviewer's comment.

Tue, Jun 9, 8:46 AM · Restricted Project

Mon, Jun 8

hliao created D81427: [hip] Fix device-only relocatable code compilation..
Mon, Jun 8, 1:51 PM · Restricted Project
hliao committed rG43793b89a079: [lld] Fix shared library build by adding the missing dependency. (authored by hliao).
[lld] Fix shared library build by adding the missing dependency.
Mon, Jun 8, 1:19 PM
hliao accepted D63403: Make myself code owner of InferAddressSpaces.

LGTM

Mon, Jun 8, 11:35 AM
hliao added inline comments to D81297: AMDGPU: Implement computeKnownAlignForTargetInstr.
Mon, Jun 8, 7:40 AM · Restricted Project

Jun 2 2020

hliao updated the diff for D80364: [amdgpu] Teach load widening to handle non-DWORD aligned loads..

Add test case and comment on why we need to run scalar load widening after LSV.

Jun 2 2020, 9:20 AM · Restricted Project

May 29 2020

hliao added a comment to D80364: [amdgpu] Teach load widening to handle non-DWORD aligned loads..

I did some experiments locally and think this can stay in AMDGPUCodeGenPrepare, and doesn't need the split pass. Since you restrict this widening to the case where you're rebasing the load anyway, I don't think this will cause the same problems with the vectorizer the previous IR load widening had (and may help it even?)

test3 should also come back, but should have the explicit align 4 added to the load. This could also use some loads of i8, and <2 x i8>. We could also extend this to handle wider, sub-dword aligned types but that's a separate patch.

May 29 2020, 8:42 PM · Restricted Project

May 28 2020

hliao added a comment to D80364: [amdgpu] Teach load widening to handle non-DWORD aligned loads..

Remove an impractical test.

I mean beyond the test, I think the whole patch is unnecessary now. Do you have an example of real source that still needs this?

May 28 2020, 8:19 PM · Restricted Project
hliao updated the diff for D80364: [amdgpu] Teach load widening to handle non-DWORD aligned loads..

Remove an impractical test.

May 28 2020, 2:17 PM · Restricted Project
hliao added a comment to D80364: [amdgpu] Teach load widening to handle non-DWORD aligned loads..

I'd still like to find a way to avoid a whole extra pass run for this. In the test here, the LoadStoreVectorizer should have vectorized these? Why didn't it?

That's due to the misaligned load after coalescing. This example is written intentionally to skip LSV and verify that common widened load could be CSEd within DAG. In practice, there won't be such input as the 1st i16 load should be properly annotated with align 4.

But the load isn't misaligned. The point of adding the alignment to the intrinsic declaration was so that the whole optimization pipeline would know about the alignment. Taking this example:

after opt -instcombine:

May 28 2020, 1:13 PM · Restricted Project