Page MenuHomePhabricator

alex-t (Alexander)
User

Projects

User does not belong to any projects.

User Details

User Since
Jul 26 2016, 7:17 AM (164 w, 5 d)

Recent Activity

Fri, Sep 20

alex-t accepted D67596: AMDGPU/GlobalISel: Allow selection of scalar min/max.

LGTM but make sure it passes recently updated lit tests.

Fri, Sep 20, 12:04 AM

Thu, Sep 19

alex-t committed rL372340: [AMDGPU] Unnecessary -amdgpu-scalarize-global-loads=false flag removed from….
[AMDGPU] Unnecessary -amdgpu-scalarize-global-loads=false flag removed from…
Thu, Sep 19, 9:45 AM
alex-t closed D67712: [AMDGPU] Unnecessary -amdgpu-scalarize-global-loads=false flag removed from min/max lit tests..
Thu, Sep 19, 9:45 AM · Restricted Project
alex-t added inline comments to D67662: [AMDGPU] SIFoldOperands should not fold register acrocc the EXEC definition.
Thu, Sep 19, 12:48 AM · Restricted Project

Wed, Sep 18

alex-t added a comment to D67596: AMDGPU/GlobalISel: Allow selection of scalar min/max.

I also have a question: why do you need to remove that predicates at all? Does it mean that to enable GlobalISel you'd need to remove all the divergent predicates stuff?

The patterns are rejected for having custom predicate code. We don't need it as a standard pattern checks the register banks, which is what we really want. I think the cleanest compatibility solution would be to use divergence to guide register class selection, and somehow add the equivalent checks in the DAG patterns. Defining the equivalent GlobalISel pattern as always true would probably work, although leave a lot of clutter in the matching tables

Wed, Sep 18, 11:48 PM
alex-t created D67712: [AMDGPU] Unnecessary -amdgpu-scalarize-global-loads=false flag removed from min/max lit tests..
Wed, Sep 18, 8:06 AM · Restricted Project
alex-t added a comment to D67596: AMDGPU/GlobalISel: Allow selection of scalar min/max.

I also have a question: why do you need to remove that predicates at all? Does it mean that to enable GlobalISel you'd need to remove all the divergent predicates stuff?

Wed, Sep 18, 6:55 AM
alex-t added inline comments to D67596: AMDGPU/GlobalISel: Allow selection of scalar min/max.
Wed, Sep 18, 6:55 AM

Tue, Sep 17

alex-t created D67662: [AMDGPU] SIFoldOperands should not fold register acrocc the EXEC definition.
Tue, Sep 17, 8:32 AM · Restricted Project
alex-t added inline comments to D67596: AMDGPU/GlobalISel: Allow selection of scalar min/max.
Tue, Sep 17, 2:58 AM
alex-t committed rL372086: [AMDGPU]: PHI Elimination hooks added for custom COPY insertion. Fixed.
[AMDGPU]: PHI Elimination hooks added for custom COPY insertion. Fixed
Tue, Sep 17, 2:08 AM

Mon, Sep 16

alex-t updated the diff for D67101: Target hooks for custom COPY insertion..

Changed according the reviewer's request + some refactoring done.

Mon, Sep 16, 9:33 AM · Restricted Project

Fri, Sep 13

alex-t updated the diff for D67101: Target hooks for custom COPY insertion..

Wave32/Wave64 cases

Fri, Sep 13, 1:42 PM · Restricted Project
alex-t added inline comments to D67101: Target hooks for custom COPY insertion..
Fri, Sep 13, 1:32 PM · Restricted Project
alex-t added inline comments to D67101: Target hooks for custom COPY insertion..
Fri, Sep 13, 1:32 PM · Restricted Project
alex-t updated the diff for D67101: Target hooks for custom COPY insertion..

Empty block handling added

Fri, Sep 13, 1:32 PM · Restricted Project
alex-t updated the diff for D67101: Target hooks for custom COPY insertion..

Commit was reverted due to the EXPENSIVE_CHECKS failure.
Issue fixed

Fri, Sep 13, 11:33 AM · Restricted Project
alex-t committed rL371873: Revert for: [AMDGPU]: PHI Elimination hooks added for custom COPY insertion..
Revert for: [AMDGPU]: PHI Elimination hooks added for custom COPY insertion.
Fri, Sep 13, 10:35 AM
alex-t reopened D67101: Target hooks for custom COPY insertion..
Fri, Sep 13, 10:35 AM · Restricted Project
alex-t added a comment to D67101: Target hooks for custom COPY insertion..
Fri, Sep 13, 12:37 AM · Restricted Project

Tue, Sep 10

alex-t closed D67101: Target hooks for custom COPY insertion..
Tue, Sep 10, 5:05 AM · Restricted Project
alex-t committed rL371508: [AMDGPU]: PHI Elimination hooks added for custom COPY insertion..
[AMDGPU]: PHI Elimination hooks added for custom COPY insertion.
Tue, Sep 10, 3:57 AM

Thu, Sep 5

alex-t updated the diff for D67101: Target hooks for custom COPY insertion..

Changed according to the reviewers requests. Tests added.

Thu, Sep 5, 7:00 AM · Restricted Project

Tue, Sep 3

alex-t added a comment to D67101: Target hooks for custom COPY insertion..

Would be good to have test coverage either in the same patch or have dependent patch explicitly marked as such.

Tue, Sep 3, 10:45 AM · Restricted Project
alex-t added a reviewer for D67101: Target hooks for custom COPY insertion.: vpykhtin.
Tue, Sep 3, 10:00 AM · Restricted Project
alex-t created D67101: Target hooks for custom COPY insertion..
Tue, Sep 3, 8:03 AM · Restricted Project

Aug 21 2019

alex-t committed rL369532: [AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions.
[AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions
Aug 21 2019, 8:14 AM
alex-t closed D63731: [AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions.
Aug 21 2019, 8:14 AM · Restricted Project

Aug 15 2019

alex-t updated the diff for D63731: [AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions.

Suggested nits added.

Aug 15 2019, 7:58 AM · Restricted Project

Aug 8 2019

alex-t updated the diff for D63731: [AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions.

According the reviewer's request the check for implicit defs has been added.
New helper functions in MachineInstr.h are necessary because the current interface like MachineInstr::getNumExplicitDefs returns just the MCInstDesc::NumDefs for non variadic opcodes.

Aug 8 2019, 5:58 AM · Restricted Project

Jul 30 2019

alex-t added a comment to D63731: [AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions.

Just ping! Does anybody has any objections?

Jul 30 2019, 5:26 AM · Restricted Project
alex-t added a reviewer for D63731: [AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions: qcolombet.
Jul 30 2019, 5:25 AM · Restricted Project

Jul 2 2019

alex-t committed rL364952: [AMDGPU] LCSSA pass added in preISel. Fixing typo in previous commit.
[AMDGPU] LCSSA pass added in preISel. Fixing typo in previous commit
Jul 2 2019, 11:18 AM
alex-t committed rL364950: [AMDGPU] LCSSA pass added in preISel. Uniform values defined in the divergent….
[AMDGPU] LCSSA pass added in preISel. Uniform values defined in the divergent…
Jul 2 2019, 11:01 AM
alex-t closed D63953: [AMDGPU] LCSSA pass added in preISel. .
Jul 2 2019, 11:01 AM · Restricted Project

Jul 1 2019

alex-t updated the diff for D63953: [AMDGPU] LCSSA pass added in preISel. .

isLCSSAForm check is under EXPENSIVE_CHECKS

Jul 1 2019, 8:29 AM · Restricted Project

Jun 28 2019

alex-t created D63953: [AMDGPU] LCSSA pass added in preISel. .
Jun 28 2019, 12:14 PM · Restricted Project

Jun 25 2019

alex-t added inline comments to D63731: [AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions.
Jun 25 2019, 10:31 AM · Restricted Project
alex-t added inline comments to D63731: [AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions.
Jun 25 2019, 6:57 AM · Restricted Project

Jun 24 2019

alex-t added inline comments to D63731: [AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions.
Jun 24 2019, 12:00 PM · Restricted Project
alex-t created D63731: [AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions.
Jun 24 2019, 11:09 AM · Restricted Project

Jun 20 2019

alex-t abandoned D63489: [InstSimplify] LCSSA PHIs should not be simplified away.
Jun 20 2019, 6:35 AM · Restricted Project

Jun 19 2019

alex-t added a comment to D63489: [InstSimplify] LCSSA PHIs should not be simplified away.

This causes generation of incorrect code in AMDGPU backend.

This sounds like some other check is missing elsewhere?
What happens if you feed it such an ir as-if after this transform, but manually written?
("that will result in broken asm/crashes" is hopefully not the answer)

That being said, why is LCSSAPass not sufficient?
It's already supposed to undo transforms like this.

It will result in syntactically correct asm and no crashes. In runtime we'll get incorrect result though :)
Adding LCSSA pass again later on is difficult in the sense of the pass dependencies.
So, it's better to fix the explicit bug in SimplifyPHI....

Aha, so it's not -instsimplify pass itself, but how it's used during transition into backend.

  1. You certainly don't want to make this blacklist unconditional, it should still run when the -instsimplify pass itself is run. (+instsimplify test)
  2. How does this affect other targets (backends)? Does this need some TLI hook?

In fact I still insist that this is a bug in -instsimplify pass.
The algorithm is written in such a way that it always expect more then one input in PHI node.
That means that the person who's written it had no intention to really remove the LCSSA PHIs but was just unaware of their existence!
It is not correct to remove LCSSA in similar manner as PHI nodes with equal inputs are removed.
So, any other backend should not suffer from this change. It is not about the convenience for AMDGPU backend but about correctness.
Could you provide an example where LCSSA PHIs are added and then intentionally removed?

Can you please specify why this transform is invalid from LLVM IR point of view? https://godbolt.org/z/D8gKNc
In endloop BB, which has a single predecessor BB - loop, %counter.lcssa value can only be %counter value.

Nothing is invalid from the IR point of view. This is all a kludge to get divergence information into SelectionDAG. There needs to be an IR instruction at the use point for the DAG to query the divergence analysis

That means I have to change all the function declarations on the way to be able to pass TTI to check if it is the "divergent" target (

I don't think InstSimplify should depend on target information

Can't you just schedule an LCSSAPass somewhere late in pipeline, before SelectionDAG and after early-cse?

Jun 19 2019, 8:52 AM · Restricted Project
alex-t added a comment to D63489: [InstSimplify] LCSSA PHIs should not be simplified away.

This causes generation of incorrect code in AMDGPU backend.

This sounds like some other check is missing elsewhere?
What happens if you feed it such an ir as-if after this transform, but manually written?
("that will result in broken asm/crashes" is hopefully not the answer)

That being said, why is LCSSAPass not sufficient?
It's already supposed to undo transforms like this.

It will result in syntactically correct asm and no crashes. In runtime we'll get incorrect result though :)
Adding LCSSA pass again later on is difficult in the sense of the pass dependencies.
So, it's better to fix the explicit bug in SimplifyPHI....

Aha, so it's not -instsimplify pass itself, but how it's used during transition into backend.

  1. You certainly don't want to make this blacklist unconditional, it should still run when the -instsimplify pass itself is run. (+instsimplify test)
  2. How does this affect other targets (backends)? Does this need some TLI hook?

In fact I still insist that this is a bug in -instsimplify pass.
The algorithm is written in such a way that it always expect more then one input in PHI node.
That means that the person who's written it had no intention to really remove the LCSSA PHIs but was just unaware of their existence!
It is not correct to remove LCSSA in similar manner as PHI nodes with equal inputs are removed.
So, any other backend should not suffer from this change. It is not about the convenience for AMDGPU backend but about correctness.
Could you provide an example where LCSSA PHIs are added and then intentionally removed?

Can you please specify why this transform is invalid from LLVM IR point of view? https://godbolt.org/z/D8gKNc
In endloop BB, which has a single predecessor BB - loop, %counter.lcssa value can only be %counter value.

Nothing is invalid from the IR point of view. This is all a kludge to get divergence information into SelectionDAG. There needs to be an IR instruction at the use point for the DAG to query the divergence analysis

That means I have to change all the function declarations on the way to be able to pass TTI to check if it is the "divergent" target (

I don't think InstSimplify should depend on target information

Jun 19 2019, 8:49 AM · Restricted Project

Jun 18 2019

alex-t added a comment to D63489: [InstSimplify] LCSSA PHIs should not be simplified away.

This causes generation of incorrect code in AMDGPU backend.

This sounds like some other check is missing elsewhere?
What happens if you feed it such an ir as-if after this transform, but manually written?
("that will result in broken asm/crashes" is hopefully not the answer)

That being said, why is LCSSAPass not sufficient?
It's already supposed to undo transforms like this.

It will result in syntactically correct asm and no crashes. In runtime we'll get incorrect result though :)
Adding LCSSA pass again later on is difficult in the sense of the pass dependencies.
So, it's better to fix the explicit bug in SimplifyPHI....

Aha, so it's not -instsimplify pass itself, but how it's used during transition into backend.

  1. You certainly don't want to make this blacklist unconditional, it should still run when the -instsimplify pass itself is run. (+instsimplify test)
  2. How does this affect other targets (backends)? Does this need some TLI hook?

In fact I still insist that this is a bug in -instsimplify pass.
The algorithm is written in such a way that it always expect more then one input in PHI node.
That means that the person who's written it had no intention to really remove the LCSSA PHIs but was just unaware of their existence!
It is not correct to remove LCSSA in similar manner as PHI nodes with equal inputs are removed.
So, any other backend should not suffer from this change. It is not about the convenience for AMDGPU backend but about correctness.
Could you provide an example where LCSSA PHIs are added and then intentionally removed?

Can you please specify why this transform is invalid from LLVM IR point of view? https://godbolt.org/z/D8gKNc
In endloop BB, which has a single predecessor BB - loop, %counter.lcssa value can only be %counter value.

Nothing is invalid from the IR point of view. This is all a kludge to get divergence information into SelectionDAG. There needs to be an IR instruction at the use point for the DAG to query the divergence analysis

Jun 18 2019, 11:53 AM · Restricted Project
alex-t added a comment to D63489: [InstSimplify] LCSSA PHIs should not be simplified away.

This causes generation of incorrect code in AMDGPU backend.

This sounds like some other check is missing elsewhere?
What happens if you feed it such an ir as-if after this transform, but manually written?
("that will result in broken asm/crashes" is hopefully not the answer)

That being said, why is LCSSAPass not sufficient?
It's already supposed to undo transforms like this.

It will result in syntactically correct asm and no crashes. In runtime we'll get incorrect result though :)
Adding LCSSA pass again later on is difficult in the sense of the pass dependencies.
So, it's better to fix the explicit bug in SimplifyPHI....

Aha, so it's not -instsimplify pass itself, but how it's used during transition into backend.

  1. You certainly don't want to make this blacklist unconditional, it should still run when the -instsimplify pass itself is run. (+instsimplify test)
  2. How does this affect other targets (backends)? Does this need some TLI hook?

In fact I still insist that this is a bug in -instsimplify pass.
The algorithm is written in such a way that it always expect more then one input in PHI node.
That means that the person who's written it had no intention to really remove the LCSSA PHIs but was just unaware of their existence!
It is not correct to remove LCSSA in similar manner as PHI nodes with equal inputs are removed.
So, any other backend should not suffer from this change. It is not about the convenience for AMDGPU backend but about correctness.
Could you provide an example where LCSSA PHIs are added and then intentionally removed?

Can you please specify why this transform is invalid from LLVM IR point of view? https://godbolt.org/z/D8gKNc
In endloop BB, which has a single predecessor BB - loop, %counter.lcssa value can only be %counter value.

Jun 18 2019, 11:53 AM · Restricted Project
alex-t added a comment to D63489: [InstSimplify] LCSSA PHIs should not be simplified away.

This causes generation of incorrect code in AMDGPU backend.

This sounds like some other check is missing elsewhere?
What happens if you feed it such an ir as-if after this transform, but manually written?
("that will result in broken asm/crashes" is hopefully not the answer)

That being said, why is LCSSAPass not sufficient?
It's already supposed to undo transforms like this.

It will result in syntactically correct asm and no crashes. In runtime we'll get incorrect result though :)
Adding LCSSA pass again later on is difficult in the sense of the pass dependencies.
So, it's better to fix the explicit bug in SimplifyPHI....

Aha, so it's not -instsimplify pass itself, but how it's used during transition into backend.

  1. You certainly don't want to make this blacklist unconditional, it should still run when the -instsimplify pass itself is run. (+instsimplify test)
  2. How does this affect other targets (backends)? Does this need some TLI hook?
Jun 18 2019, 10:58 AM · Restricted Project
alex-t added a comment to D63489: [InstSimplify] LCSSA PHIs should not be simplified away.

This causes generation of incorrect code in AMDGPU backend.

This sounds like some other check is missing elsewhere?
What happens if you feed it such an ir as-if after this transform, but manually written?
("that will result in broken asm/crashes" is hopefully not the answer)

That being said, why is LCSSAPass not sufficient?
It's already supposed to undo transforms like this.

It will result in syntactically correct asm and no crashes. In runtime we'll get incorrect result though :)
Adding LCSSA pass again later on is difficult in the sense of the pass dependencies.
So, it's better to fix the explicit bug in SimplifyPHI....

Aha, so it's not -instsimplify pass itself, but how it's used during transition into backend.

  1. You certainly don't want to make this blacklist unconditional, it should still run when the -instsimplify pass itself is run. (+instsimplify test)
  2. How does this affect other targets (backends)? Does this need some TLI hook?

The other condition should probably be TTI.hasBranchDivergence().

Jun 18 2019, 10:20 AM · Restricted Project
alex-t added a comment to D63489: [InstSimplify] LCSSA PHIs should not be simplified away.

This causes generation of incorrect code in AMDGPU backend.

This sounds like some other check is missing elsewhere?
What happens if you feed it such an ir as-if after this transform, but manually written?
("that will result in broken asm/crashes" is hopefully not the answer)

That being said, why is LCSSAPass not sufficient?
It's already supposed to undo transforms like this.

Jun 18 2019, 7:55 AM · Restricted Project
alex-t updated the diff for D63489: [InstSimplify] LCSSA PHIs should not be simplified away.

MIR test added

Jun 18 2019, 7:42 AM · Restricted Project
alex-t added a comment to D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence..

I have updated the change ttps://reviews.llvm.org/D62614 this Sunday.
The new one takes completely different approach. I'd appreciate very much If you could try it.

D62614 doesn't fix the issue.

Okay. I'm about to start partial revert of the change.
Could you please provide me test cases so that I can check if my further fixes help.

See below the good and bad outputs for one CTS failure:

GOOD: https://hastebin.com/muwuwivofu
BAD: https://hastebin.com/gofawejoku

Thanks again for looking into this.

Note that this change also breaks https://bugs.freedesktop.org/show_bug.cgi?id=110811

Jun 18 2019, 6:17 AM · Restricted Project
alex-t created D63489: [InstSimplify] LCSSA PHIs should not be simplified away.
Jun 18 2019, 6:13 AM · Restricted Project

Jun 6 2019

alex-t committed rL362749: [AMDGPU] Partial revert for the ba447bae7448435c9986eece0811da1423972fdd.
[AMDGPU] Partial revert for the ba447bae7448435c9986eece0811da1423972fdd
Jun 6 2019, 2:10 PM

Jun 5 2019

alex-t added a comment to D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence..

I have updated the change ttps://reviews.llvm.org/D62614 this Sunday.
The new one takes completely different approach. I'd appreciate very much If you could try it.

D62614 doesn't fix the issue.

Jun 5 2019, 2:07 AM · Restricted Project

Jun 4 2019

alex-t created D62869: [AMDGPU] Partial revert for the "Assign register class for cross block values according to the divergence.".
Jun 4 2019, 11:26 AM · Restricted Project
alex-t added a comment to D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence..

A number of Mesa piglit tests are also affected at least on Bonaire (but it seems to not be GPU-specific, I haven't had a chance to look at it further).

- bin/ext_transform_feedback-order elements triangles
- bin/ext_transform_feedback-order elements points
- bin/ext_transform_feedback-order elements lines
- bin/ext_transform_feedback-order arrays triangles>
- bin/ext_transform_feedback-order arrays points
- bin/ext_transform_feedback-order arrays lines
- arb_clear_buffer_object-formats (96-bit clears)

It's unclear whether the regression is caused by this particular commit or by the subsequent ASAN fix.

Jun 4 2019, 5:13 AM · Restricted Project
alex-t added a comment to D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence..

Hi there,

This change introduces a regression with RADV, all dEQP-VK.subgroups.arithmetic.framebuffer.* are failing now.
Can someone look into this?
Thanks!

I have a patch that fixes the issue in another test suite. Could you please suggest how to check if it also fixes RADV?
https://reviews.llvm.org/D62614

This patch fixes the CTS failures on my side. I have just tried the latest version.

Err, only a subset is fixed actually.

Jun 4 2019, 5:07 AM · Restricted Project

Jun 3 2019

alex-t added inline comments to D62614: Fix for the OCL/LC to failure on some OCLPerf tests.
Jun 3 2019, 8:01 AM · Restricted Project

Jun 2 2019

alex-t updated the diff for D62614: Fix for the OCL/LC to failure on some OCLPerf tests.

Alternative fix.

Jun 2 2019, 9:27 AM · Restricted Project

May 31 2019

alex-t added a comment to D62614: Fix for the OCL/LC to failure on some OCLPerf tests.

See D60834 for what I think is the right direction to fix this.

Once again:

LCSSA is not sufficient for the following reason:
Since definition is considered uniform it is selected to SALU and produces SGPR.
The value in this SGPR is correct for all laves leave in loop body.
LCSSA PHI node that inserted in loop exit block will turn into the copy from SGPR to VGPR – this is incorrect.

Why isn't sgpr to vgpr copy correct? If it is done still inside the loop it is suboptimal but correct. When done outside of the loop it is incorrect, that's true.

May 31 2019, 9:32 AM · Restricted Project
alex-t updated the diff for D62614: Fix for the OCL/LC to failure on some OCLPerf tests.

added the Divergent Analysis test update that was missed

May 31 2019, 8:45 AM · Restricted Project
alex-t added a comment to D62614: Fix for the OCL/LC to failure on some OCLPerf tests.

I don't agree that the enhancement the definition of the "divergence" to the scope is correct way at all.
literally, the value is uniform if all threads observe same value. Nothing about exec mask, lanes or GPU :)
All threads in our case are all executing loop body. That's it.

May 31 2019, 3:35 AM · Restricted Project
alex-t added a comment to D62614: Fix for the OCL/LC to failure on some OCLPerf tests.

See D60834 for what I think is the right direction to fix this.

May 31 2019, 3:26 AM · Restricted Project

May 30 2019

alex-t added a reviewer for D62614: Fix for the OCL/LC to failure on some OCLPerf tests: nhaehnle.
May 30 2019, 7:02 AM · Restricted Project
alex-t added inline comments to D62614: Fix for the OCL/LC to failure on some OCLPerf tests.
May 30 2019, 4:38 AM · Restricted Project
alex-t updated the diff for D62614: Fix for the OCL/LC to failure on some OCLPerf tests.

Added comments describing the reason for the change.

May 30 2019, 4:38 AM · Restricted Project

May 29 2019

alex-t added a reviewer for D62614: Fix for the OCL/LC to failure on some OCLPerf tests: rampitec.
May 29 2019, 11:29 AM · Restricted Project
alex-t created D62614: Fix for the OCL/LC to failure on some OCLPerf tests.
May 29 2019, 11:29 AM · Restricted Project
alex-t added a comment to D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence..

Hi there,

This change introduces a regression with RADV, all dEQP-VK.subgroups.arithmetic.framebuffer.* are failing now.
Can someone look into this?
Thanks!

May 29 2019, 9:34 AM · Restricted Project

May 28 2019

alex-t added a comment to D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence..

Hi there,

This change introduces a regression with RADV, all dEQP-VK.subgroups.arithmetic.framebuffer.* are failing now.
Can someone look into this?
Thanks!

May 28 2019, 10:26 AM · Restricted Project

May 27 2019

alex-t committed rL361776: [AMDGPU] Fix for the address sanitizer failure. Fixing typo.
[AMDGPU] Fix for the address sanitizer failure. Fixing typo
May 27 2019, 11:15 AM
alex-t committed rL361770: [AMDGPU] Fix for the address sanitizer failure caused by the ifollowing….
[AMDGPU] Fix for the address sanitizer failure caused by the ifollowing…
May 27 2019, 8:00 AM

May 26 2019

alex-t committed rL361741: [AMDGPU] Divergence driven ISel. Assign register class for cross block….
[AMDGPU] Divergence driven ISel. Assign register class for cross block…
May 26 2019, 1:31 PM

May 24 2019

alex-t committed rL361644: [AMDGPU] Divergence driven ISel. Assign register class for cross block values….
[AMDGPU] Divergence driven ISel. Assign register class for cross block values…
May 24 2019, 8:34 AM
alex-t closed D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence..
May 24 2019, 8:33 AM · Restricted Project

May 23 2019

alex-t updated the diff for D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence..

rebased

May 23 2019, 9:15 AM · Restricted Project

May 20 2019

alex-t updated the diff for D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence..

more formatting + new test updated

May 20 2019, 1:37 AM · Restricted Project

May 15 2019

alex-t updated the diff for D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence..

formatting etc

May 15 2019, 5:13 AM · Restricted Project

May 14 2019

alex-t updated the diff for D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence..

Added fixes after extended testing. Also GFX10 related update.

May 14 2019, 5:07 AM · Restricted Project

Apr 23 2019

alex-t accepted D60999: AMDGPU: Fix LCSSA phi lowering in SILowerI1Copies.

LGTM

Apr 23 2019, 4:53 AM · Restricted Project

Apr 8 2019

alex-t added inline comments to D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence..
Apr 8 2019, 8:52 AM · Restricted Project
alex-t added a comment to D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence..
Apr 8 2019, 8:10 AM · Restricted Project

Apr 5 2019

alex-t added a comment to D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence..

Adding llvm-commits to the CC. Please be more careful about that in the future... see http://llvm.org/docs/Phabricator.html

Apr 5 2019, 1:42 AM · Restricted Project

Apr 4 2019

alex-t added inline comments to D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence..
Apr 4 2019, 7:50 AM · Restricted Project

Apr 3 2019

alex-t added inline comments to D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence..
Apr 3 2019, 4:37 AM · Restricted Project

Apr 2 2019

alex-t added a reviewer for D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence.: efriedma.
Apr 2 2019, 6:00 AM · Restricted Project
alex-t updated the diff for D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence..

changed according the reviewer request

Apr 2 2019, 5:58 AM · Restricted Project

Mar 29 2019

alex-t updated the diff for D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence..
Mar 29 2019, 8:15 AM · Restricted Project
alex-t created D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence..
Mar 29 2019, 6:41 AM · Restricted Project

Jan 3 2019

alex-t committed rL350350: [AMDGPU] Fix scalar operand folding bug that causes SHOC performance regression..
[AMDGPU] Fix scalar operand folding bug that causes SHOC performance regression.
Jan 3 2019, 11:59 AM
alex-t closed D56161: [AMDGPU] Fix scalar operand folding bug that causes SHOC performance regression.
Jan 3 2019, 11:59 AM
alex-t updated the summary of D56161: [AMDGPU] Fix scalar operand folding bug that causes SHOC performance regression.
Jan 3 2019, 10:50 AM
alex-t updated the summary of D56161: [AMDGPU] Fix scalar operand folding bug that causes SHOC performance regression.
Jan 3 2019, 10:08 AM
alex-t retitled D56161: [AMDGPU] Fix scalar operand folding bug that causes SHOC performance regression from [AMDGPU] Fix scalar operand folding. to [AMDGPU] Fix scalar operand folding bug that causes SHOC performance regression.
Jan 3 2019, 10:06 AM

Dec 30 2018

alex-t created D56161: [AMDGPU] Fix scalar operand folding bug that causes SHOC performance regression.
Dec 30 2018, 11:18 AM

Nov 14 2018

alex-t added inline comments to D54340: AMDGPU: Fix various issues around the VirtReg2Value mapping.
Nov 14 2018, 3:44 AM

Oct 26 2018

alex-t added a comment to D53496: AMDGPU: Rewrite SILowerI1Copies to always stay on SALU.

It seems like we have to further develop this approach to deal with the scalar comparison instructions.
For instance, S_CMP_* does not produce any result but implicitly defines SCC.
Thus, InstrEmitter will insert the copies all the time.
Since DAG operator SETCC produces i1 value there will be the SCC to VReg_1 copies.
I not trying to invent a method to lower that copies.
First issue: in case all the uses are not divergent I don't need the V_CND_MASK -1,0 -> V_CMP_NE 0 pair
I need S_CSELECT -1, 0 immediately after the definition (to save SCC) and S_CMP_NE 0 just before use to rematerialize SCC
Second issue: I only need to save/restore if there are SCC defs in between.
So, we need to take into account not divergent flow as well.

Oct 26 2018, 6:53 AM

Oct 25 2018

alex-t added inline comments to D53496: AMDGPU: Rewrite SILowerI1Copies to always stay on SALU.
Oct 25 2018, 6:27 AM

Oct 16 2018

alex-t accepted D53283: AMDGPU: Divergence-driven selection of scalar buffer load intrinsics.

LGTM

Oct 16 2018, 12:26 AM · Restricted Project

Oct 1 2018

alex-t committed rL343455: [AMDGPU] Divergence driven instruction selection. Shift operations..
[AMDGPU] Divergence driven instruction selection. Shift operations.
Oct 1 2018, 4:08 AM