Page MenuHomePhabricator
Feed Advanced Search

Sat, Jan 4

alex-t committed rGca8b20ca3ba1: [AMDGPU] need to insert wait between the scalar load and vector store to the… (authored by alex-t).
[AMDGPU] need to insert wait between the scalar load and vector store to the…
Sat, Jan 4, 7:28 AM
alex-t closed D71934: [AMDGPU] need to insert wait between the scalar load and vector store to the same address to avoid WAR conflict..
Sat, Jan 4, 7:28 AM · Restricted Project

Thu, Jan 2

alex-t updated the diff for D71934: [AMDGPU] need to insert wait between the scalar load and vector store to the same address to avoid WAR conflict..

Test updated.

Thu, Jan 2, 9:19 AM · Restricted Project

Fri, Dec 27

alex-t updated the summary of D71934: [AMDGPU] need to insert wait between the scalar load and vector store to the same address to avoid WAR conflict..
Fri, Dec 27, 8:32 AM · Restricted Project
alex-t created D71934: [AMDGPU] need to insert wait between the scalar load and vector store to the same address to avoid WAR conflict..
Fri, Dec 27, 8:31 AM · Restricted Project

Dec 17 2019

alex-t committed rGe7f585ed6109: PostRA Machine Sink should take care of COPY defining register that is a sub… (authored by alex-t).
PostRA Machine Sink should take care of COPY defining register that is a sub…
Dec 17 2019, 4:26 AM
alex-t closed D71132: PostRA Machine Sink should take care of COPY defining register that is a sub-register by another COPY source operand.
Dec 17 2019, 4:25 AM · Restricted Project

Dec 13 2019

alex-t updated the diff for D71132: PostRA Machine Sink should take care of COPY defining register that is a sub-register by another COPY source operand.

Updated according to reviewer request

Dec 13 2019, 12:23 PM · Restricted Project

Dec 12 2019

alex-t added inline comments to D71132: PostRA Machine Sink should take care of COPY defining register that is a sub-register by another COPY source operand.
Dec 12 2019, 8:42 AM · Restricted Project

Dec 11 2019

alex-t updated the diff for D71132: PostRA Machine Sink should take care of COPY defining register that is a sub-register by another COPY source operand.

Just correct solution + tests

Dec 11 2019, 1:42 PM · Restricted Project
alex-t retitled D71132: PostRA Machine Sink should take care of COPY defining register that is a sub-register by another COPY source operand from PostRA Machine Sink should count COPY defining register used by another COPY to PostRA Machine Sink should take care of COPY defining register that is a sub-register by another COPY source operand.
Dec 11 2019, 1:42 PM · Restricted Project

Dec 9 2019

alex-t added a comment to D71132: PostRA Machine Sink should take care of COPY defining register that is a sub-register by another COPY source operand.

With this change, it looks as though accumulateUsedDefed is called multiple times on some paths. That seems wrong.

Dec 9 2019, 6:04 AM · Restricted Project

Dec 6 2019

alex-t added a comment to D71089: [AMDGPU] Optimizing unnecessary copies for REG_SEQUENCE PHI operand. Also fixes rocBLAS error.

Definitely needs a test. I'm not sure I understand the fix here. This is some kind of workaround? I would be happier with the subreg fix happening first. Removing unnecessary copies should not be relevant to the correctness fix?

Dec 6 2019, 10:31 AM · Restricted Project
alex-t created D71132: PostRA Machine Sink should take care of COPY defining register that is a sub-register by another COPY source operand.
Dec 6 2019, 10:31 AM · Restricted Project
alex-t added a reviewer for D71089: [AMDGPU] Optimizing unnecessary copies for REG_SEQUENCE PHI operand. Also fixes rocBLAS error: arsenm.
Dec 6 2019, 4:57 AM · Restricted Project

Dec 5 2019

alex-t created D71089: [AMDGPU] Optimizing unnecessary copies for REG_SEQUENCE PHI operand. Also fixes rocBLAS error.
Dec 5 2019, 1:22 PM · Restricted Project

Nov 12 2019

alex-t updated the diff for D70085: [AMDGPU] NFC target dependent requiresUniformRegister refactored out.

Typo and formatting fixed.

Nov 12 2019, 6:25 AM · Restricted Project

Nov 11 2019

alex-t created D70085: [AMDGPU] NFC target dependent requiresUniformRegister refactored out.
Nov 11 2019, 8:04 AM · Restricted Project

Oct 24 2019

alex-t closed D68635: [AMDGPU] Come back patch for the 'Assign register class for cross block values according to the divergence.'.
Oct 24 2019, 8:35 AM · Restricted Project

Oct 14 2019

alex-t committed rL374767: [AMDGPU] Come back patch for the 'Assign register class for cross block values….
[AMDGPU] Come back patch for the 'Assign register class for cross block values…
Oct 14 2019, 5:01 AM

Oct 9 2019

alex-t updated the diff for D68635: [AMDGPU] Come back patch for the 'Assign register class for cross block values according to the divergence.'.

Changed according reviewers requests.

Oct 9 2019, 10:46 AM · Restricted Project

Oct 8 2019

alex-t created D68635: [AMDGPU] Come back patch for the 'Assign register class for cross block values according to the divergence.'.
Oct 8 2019, 5:19 AM · Restricted Project

Oct 1 2019

alex-t closed D67662: [AMDGPU] SIFoldOperands should not fold register acrocc the EXEC definition.
Oct 1 2019, 7:23 AM · Restricted Project

Sep 30 2019

alex-t committed rL373221: [AMDGPU] SIFoldOperands should not fold register acrocc the EXEC definition.
[AMDGPU] SIFoldOperands should not fold register acrocc the EXEC definition
Sep 30 2019, 8:32 AM

Sep 27 2019

alex-t updated the diff for D67662: [AMDGPU] SIFoldOperands should not fold register acrocc the EXEC definition.

test added

Sep 27 2019, 10:21 AM · Restricted Project

Sep 25 2019

alex-t updated the diff for D67662: [AMDGPU] SIFoldOperands should not fold register acrocc the EXEC definition.

lightweight solution.

Sep 25 2019, 5:58 AM · Restricted Project

Sep 23 2019

alex-t added a reviewer for D67662: [AMDGPU] SIFoldOperands should not fold register acrocc the EXEC definition: nhaehnle.
Sep 23 2019, 11:45 AM · Restricted Project
alex-t updated the diff for D67662: [AMDGPU] SIFoldOperands should not fold register acrocc the EXEC definition.

LoopFinder class was employed to detect if the register copy source cannot be folded.
The idea is to use uniform method to detect the exec mask merge necessity and the COPY movement/folding restrictions.
LoopFinder was refactored out of the SILowerI1Copies to AMDGPU/Utils for this purpose.
addLoopEntries method was refactored from the LoopFinder to SILowerI1Copies
because it changes the Machine Function and naturally belongs to the pass that is the ancestor of MachineFunctionPass.
LoopFinder is the analysis that should not change the MachineFunction.

Sep 23 2019, 11:15 AM · Restricted Project
alex-t closed D67101: Target hooks for custom COPY insertion..
Sep 23 2019, 7:46 AM · Restricted Project

Sep 20 2019

alex-t accepted D67596: AMDGPU/GlobalISel: Allow selection of scalar min/max.

LGTM but make sure it passes recently updated lit tests.

Sep 20 2019, 12:04 AM

Sep 19 2019

alex-t committed rL372340: [AMDGPU] Unnecessary -amdgpu-scalarize-global-loads=false flag removed from….
[AMDGPU] Unnecessary -amdgpu-scalarize-global-loads=false flag removed from…
Sep 19 2019, 9:45 AM
alex-t closed D67712: [AMDGPU] Unnecessary -amdgpu-scalarize-global-loads=false flag removed from min/max lit tests..
Sep 19 2019, 9:45 AM · Restricted Project
alex-t added inline comments to D67662: [AMDGPU] SIFoldOperands should not fold register acrocc the EXEC definition.
Sep 19 2019, 12:48 AM · Restricted Project

Sep 18 2019

alex-t added a comment to D67596: AMDGPU/GlobalISel: Allow selection of scalar min/max.

I also have a question: why do you need to remove that predicates at all? Does it mean that to enable GlobalISel you'd need to remove all the divergent predicates stuff?

The patterns are rejected for having custom predicate code. We don't need it as a standard pattern checks the register banks, which is what we really want. I think the cleanest compatibility solution would be to use divergence to guide register class selection, and somehow add the equivalent checks in the DAG patterns. Defining the equivalent GlobalISel pattern as always true would probably work, although leave a lot of clutter in the matching tables

Sep 18 2019, 11:48 PM
alex-t created D67712: [AMDGPU] Unnecessary -amdgpu-scalarize-global-loads=false flag removed from min/max lit tests..
Sep 18 2019, 8:06 AM · Restricted Project
alex-t added a comment to D67596: AMDGPU/GlobalISel: Allow selection of scalar min/max.

I also have a question: why do you need to remove that predicates at all? Does it mean that to enable GlobalISel you'd need to remove all the divergent predicates stuff?

Sep 18 2019, 6:55 AM
alex-t added inline comments to D67596: AMDGPU/GlobalISel: Allow selection of scalar min/max.
Sep 18 2019, 6:55 AM

Sep 17 2019

alex-t created D67662: [AMDGPU] SIFoldOperands should not fold register acrocc the EXEC definition.
Sep 17 2019, 8:32 AM · Restricted Project
alex-t added inline comments to D67596: AMDGPU/GlobalISel: Allow selection of scalar min/max.
Sep 17 2019, 2:58 AM
alex-t committed rL372086: [AMDGPU]: PHI Elimination hooks added for custom COPY insertion. Fixed.
[AMDGPU]: PHI Elimination hooks added for custom COPY insertion. Fixed
Sep 17 2019, 2:08 AM

Sep 16 2019

alex-t updated the diff for D67101: Target hooks for custom COPY insertion..

Changed according the reviewer's request + some refactoring done.

Sep 16 2019, 9:33 AM · Restricted Project

Sep 13 2019

alex-t updated the diff for D67101: Target hooks for custom COPY insertion..

Wave32/Wave64 cases

Sep 13 2019, 1:42 PM · Restricted Project
alex-t added inline comments to D67101: Target hooks for custom COPY insertion..
Sep 13 2019, 1:32 PM · Restricted Project
alex-t added inline comments to D67101: Target hooks for custom COPY insertion..
Sep 13 2019, 1:32 PM · Restricted Project
alex-t updated the diff for D67101: Target hooks for custom COPY insertion..

Empty block handling added

Sep 13 2019, 1:32 PM · Restricted Project
alex-t updated the diff for D67101: Target hooks for custom COPY insertion..

Commit was reverted due to the EXPENSIVE_CHECKS failure.
Issue fixed

Sep 13 2019, 11:33 AM · Restricted Project
alex-t committed rL371873: Revert for: [AMDGPU]: PHI Elimination hooks added for custom COPY insertion..
Revert for: [AMDGPU]: PHI Elimination hooks added for custom COPY insertion.
Sep 13 2019, 10:35 AM
alex-t reopened D67101: Target hooks for custom COPY insertion..
Sep 13 2019, 10:35 AM · Restricted Project
alex-t added a comment to D67101: Target hooks for custom COPY insertion..
Sep 13 2019, 12:37 AM · Restricted Project

Sep 10 2019

alex-t closed D67101: Target hooks for custom COPY insertion..
Sep 10 2019, 5:05 AM · Restricted Project
alex-t committed rL371508: [AMDGPU]: PHI Elimination hooks added for custom COPY insertion..
[AMDGPU]: PHI Elimination hooks added for custom COPY insertion.
Sep 10 2019, 3:57 AM

Sep 5 2019

alex-t updated the diff for D67101: Target hooks for custom COPY insertion..

Changed according to the reviewers requests. Tests added.

Sep 5 2019, 7:00 AM · Restricted Project

Sep 3 2019

alex-t added a comment to D67101: Target hooks for custom COPY insertion..

Would be good to have test coverage either in the same patch or have dependent patch explicitly marked as such.

Sep 3 2019, 10:45 AM · Restricted Project
alex-t added a reviewer for D67101: Target hooks for custom COPY insertion.: vpykhtin.
Sep 3 2019, 10:00 AM · Restricted Project
alex-t created D67101: Target hooks for custom COPY insertion..
Sep 3 2019, 8:03 AM · Restricted Project

Aug 21 2019

alex-t committed rL369532: [AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions.
[AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions
Aug 21 2019, 8:14 AM
alex-t closed D63731: [AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions.
Aug 21 2019, 8:14 AM · Restricted Project

Aug 15 2019

alex-t updated the diff for D63731: [AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions.

Suggested nits added.

Aug 15 2019, 7:58 AM · Restricted Project

Aug 8 2019

alex-t updated the diff for D63731: [AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions.

According the reviewer's request the check for implicit defs has been added.
New helper functions in MachineInstr.h are necessary because the current interface like MachineInstr::getNumExplicitDefs returns just the MCInstDesc::NumDefs for non variadic opcodes.

Aug 8 2019, 5:58 AM · Restricted Project

Jul 30 2019

alex-t added a comment to D63731: [AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions.

Just ping! Does anybody has any objections?

Jul 30 2019, 5:26 AM · Restricted Project
alex-t added a reviewer for D63731: [AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions: qcolombet.
Jul 30 2019, 5:25 AM · Restricted Project

Jul 2 2019

alex-t committed rL364952: [AMDGPU] LCSSA pass added in preISel. Fixing typo in previous commit.
[AMDGPU] LCSSA pass added in preISel. Fixing typo in previous commit
Jul 2 2019, 11:18 AM
alex-t committed rL364950: [AMDGPU] LCSSA pass added in preISel. Uniform values defined in the divergent….
[AMDGPU] LCSSA pass added in preISel. Uniform values defined in the divergent…
Jul 2 2019, 11:01 AM
alex-t closed D63953: [AMDGPU] LCSSA pass added in preISel. .
Jul 2 2019, 11:01 AM · Restricted Project

Jul 1 2019

alex-t updated the diff for D63953: [AMDGPU] LCSSA pass added in preISel. .

isLCSSAForm check is under EXPENSIVE_CHECKS

Jul 1 2019, 8:29 AM · Restricted Project

Jun 28 2019

alex-t created D63953: [AMDGPU] LCSSA pass added in preISel. .
Jun 28 2019, 12:14 PM · Restricted Project

Jun 25 2019

alex-t added inline comments to D63731: [AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions.
Jun 25 2019, 10:31 AM · Restricted Project
alex-t added inline comments to D63731: [AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions.
Jun 25 2019, 6:57 AM · Restricted Project

Jun 24 2019

alex-t added inline comments to D63731: [AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions.
Jun 24 2019, 12:00 PM · Restricted Project
alex-t created D63731: [AMDGPU] Prevent VGPR copies from moving across the EXEC mask definitions.
Jun 24 2019, 11:09 AM · Restricted Project

Jun 20 2019

alex-t abandoned D63489: [InstSimplify] LCSSA PHIs should not be simplified away.
Jun 20 2019, 6:35 AM · Restricted Project

Jun 19 2019

alex-t added a comment to D63489: [InstSimplify] LCSSA PHIs should not be simplified away.

This causes generation of incorrect code in AMDGPU backend.

This sounds like some other check is missing elsewhere?
What happens if you feed it such an ir as-if after this transform, but manually written?
("that will result in broken asm/crashes" is hopefully not the answer)

That being said, why is LCSSAPass not sufficient?
It's already supposed to undo transforms like this.

It will result in syntactically correct asm and no crashes. In runtime we'll get incorrect result though :)
Adding LCSSA pass again later on is difficult in the sense of the pass dependencies.
So, it's better to fix the explicit bug in SimplifyPHI....

Aha, so it's not -instsimplify pass itself, but how it's used during transition into backend.

  1. You certainly don't want to make this blacklist unconditional, it should still run when the -instsimplify pass itself is run. (+instsimplify test)
  2. How does this affect other targets (backends)? Does this need some TLI hook?

In fact I still insist that this is a bug in -instsimplify pass.
The algorithm is written in such a way that it always expect more then one input in PHI node.
That means that the person who's written it had no intention to really remove the LCSSA PHIs but was just unaware of their existence!
It is not correct to remove LCSSA in similar manner as PHI nodes with equal inputs are removed.
So, any other backend should not suffer from this change. It is not about the convenience for AMDGPU backend but about correctness.
Could you provide an example where LCSSA PHIs are added and then intentionally removed?

Can you please specify why this transform is invalid from LLVM IR point of view? https://godbolt.org/z/D8gKNc
In endloop BB, which has a single predecessor BB - loop, %counter.lcssa value can only be %counter value.

Nothing is invalid from the IR point of view. This is all a kludge to get divergence information into SelectionDAG. There needs to be an IR instruction at the use point for the DAG to query the divergence analysis

That means I have to change all the function declarations on the way to be able to pass TTI to check if it is the "divergent" target (

I don't think InstSimplify should depend on target information

Can't you just schedule an LCSSAPass somewhere late in pipeline, before SelectionDAG and after early-cse?

Jun 19 2019, 8:52 AM · Restricted Project
alex-t added a comment to D63489: [InstSimplify] LCSSA PHIs should not be simplified away.

This causes generation of incorrect code in AMDGPU backend.

This sounds like some other check is missing elsewhere?
What happens if you feed it such an ir as-if after this transform, but manually written?
("that will result in broken asm/crashes" is hopefully not the answer)

That being said, why is LCSSAPass not sufficient?
It's already supposed to undo transforms like this.

It will result in syntactically correct asm and no crashes. In runtime we'll get incorrect result though :)
Adding LCSSA pass again later on is difficult in the sense of the pass dependencies.
So, it's better to fix the explicit bug in SimplifyPHI....

Aha, so it's not -instsimplify pass itself, but how it's used during transition into backend.

  1. You certainly don't want to make this blacklist unconditional, it should still run when the -instsimplify pass itself is run. (+instsimplify test)
  2. How does this affect other targets (backends)? Does this need some TLI hook?

In fact I still insist that this is a bug in -instsimplify pass.
The algorithm is written in such a way that it always expect more then one input in PHI node.
That means that the person who's written it had no intention to really remove the LCSSA PHIs but was just unaware of their existence!
It is not correct to remove LCSSA in similar manner as PHI nodes with equal inputs are removed.
So, any other backend should not suffer from this change. It is not about the convenience for AMDGPU backend but about correctness.
Could you provide an example where LCSSA PHIs are added and then intentionally removed?

Can you please specify why this transform is invalid from LLVM IR point of view? https://godbolt.org/z/D8gKNc
In endloop BB, which has a single predecessor BB - loop, %counter.lcssa value can only be %counter value.

Nothing is invalid from the IR point of view. This is all a kludge to get divergence information into SelectionDAG. There needs to be an IR instruction at the use point for the DAG to query the divergence analysis

That means I have to change all the function declarations on the way to be able to pass TTI to check if it is the "divergent" target (

I don't think InstSimplify should depend on target information

Jun 19 2019, 8:49 AM · Restricted Project

Jun 18 2019

alex-t added a comment to D63489: [InstSimplify] LCSSA PHIs should not be simplified away.

This causes generation of incorrect code in AMDGPU backend.

This sounds like some other check is missing elsewhere?
What happens if you feed it such an ir as-if after this transform, but manually written?
("that will result in broken asm/crashes" is hopefully not the answer)

That being said, why is LCSSAPass not sufficient?
It's already supposed to undo transforms like this.

It will result in syntactically correct asm and no crashes. In runtime we'll get incorrect result though :)
Adding LCSSA pass again later on is difficult in the sense of the pass dependencies.
So, it's better to fix the explicit bug in SimplifyPHI....

Aha, so it's not -instsimplify pass itself, but how it's used during transition into backend.

  1. You certainly don't want to make this blacklist unconditional, it should still run when the -instsimplify pass itself is run. (+instsimplify test)
  2. How does this affect other targets (backends)? Does this need some TLI hook?

In fact I still insist that this is a bug in -instsimplify pass.
The algorithm is written in such a way that it always expect more then one input in PHI node.
That means that the person who's written it had no intention to really remove the LCSSA PHIs but was just unaware of their existence!
It is not correct to remove LCSSA in similar manner as PHI nodes with equal inputs are removed.
So, any other backend should not suffer from this change. It is not about the convenience for AMDGPU backend but about correctness.
Could you provide an example where LCSSA PHIs are added and then intentionally removed?

Can you please specify why this transform is invalid from LLVM IR point of view? https://godbolt.org/z/D8gKNc
In endloop BB, which has a single predecessor BB - loop, %counter.lcssa value can only be %counter value.

Nothing is invalid from the IR point of view. This is all a kludge to get divergence information into SelectionDAG. There needs to be an IR instruction at the use point for the DAG to query the divergence analysis

Jun 18 2019, 11:53 AM · Restricted Project
alex-t added a comment to D63489: [InstSimplify] LCSSA PHIs should not be simplified away.

This causes generation of incorrect code in AMDGPU backend.

This sounds like some other check is missing elsewhere?
What happens if you feed it such an ir as-if after this transform, but manually written?
("that will result in broken asm/crashes" is hopefully not the answer)

That being said, why is LCSSAPass not sufficient?
It's already supposed to undo transforms like this.

It will result in syntactically correct asm and no crashes. In runtime we'll get incorrect result though :)
Adding LCSSA pass again later on is difficult in the sense of the pass dependencies.
So, it's better to fix the explicit bug in SimplifyPHI....

Aha, so it's not -instsimplify pass itself, but how it's used during transition into backend.

  1. You certainly don't want to make this blacklist unconditional, it should still run when the -instsimplify pass itself is run. (+instsimplify test)
  2. How does this affect other targets (backends)? Does this need some TLI hook?

In fact I still insist that this is a bug in -instsimplify pass.
The algorithm is written in such a way that it always expect more then one input in PHI node.
That means that the person who's written it had no intention to really remove the LCSSA PHIs but was just unaware of their existence!
It is not correct to remove LCSSA in similar manner as PHI nodes with equal inputs are removed.
So, any other backend should not suffer from this change. It is not about the convenience for AMDGPU backend but about correctness.
Could you provide an example where LCSSA PHIs are added and then intentionally removed?

Can you please specify why this transform is invalid from LLVM IR point of view? https://godbolt.org/z/D8gKNc
In endloop BB, which has a single predecessor BB - loop, %counter.lcssa value can only be %counter value.

Jun 18 2019, 11:53 AM · Restricted Project
alex-t added a comment to D63489: [InstSimplify] LCSSA PHIs should not be simplified away.

This causes generation of incorrect code in AMDGPU backend.

This sounds like some other check is missing elsewhere?
What happens if you feed it such an ir as-if after this transform, but manually written?
("that will result in broken asm/crashes" is hopefully not the answer)

That being said, why is LCSSAPass not sufficient?
It's already supposed to undo transforms like this.

It will result in syntactically correct asm and no crashes. In runtime we'll get incorrect result though :)
Adding LCSSA pass again later on is difficult in the sense of the pass dependencies.
So, it's better to fix the explicit bug in SimplifyPHI....

Aha, so it's not -instsimplify pass itself, but how it's used during transition into backend.

  1. You certainly don't want to make this blacklist unconditional, it should still run when the -instsimplify pass itself is run. (+instsimplify test)
  2. How does this affect other targets (backends)? Does this need some TLI hook?
Jun 18 2019, 10:58 AM · Restricted Project
alex-t added a comment to D63489: [InstSimplify] LCSSA PHIs should not be simplified away.

This causes generation of incorrect code in AMDGPU backend.

This sounds like some other check is missing elsewhere?
What happens if you feed it such an ir as-if after this transform, but manually written?
("that will result in broken asm/crashes" is hopefully not the answer)

That being said, why is LCSSAPass not sufficient?
It's already supposed to undo transforms like this.

It will result in syntactically correct asm and no crashes. In runtime we'll get incorrect result though :)
Adding LCSSA pass again later on is difficult in the sense of the pass dependencies.
So, it's better to fix the explicit bug in SimplifyPHI....

Aha, so it's not -instsimplify pass itself, but how it's used during transition into backend.

  1. You certainly don't want to make this blacklist unconditional, it should still run when the -instsimplify pass itself is run. (+instsimplify test)
  2. How does this affect other targets (backends)? Does this need some TLI hook?

The other condition should probably be TTI.hasBranchDivergence().

Jun 18 2019, 10:20 AM · Restricted Project
alex-t added a comment to D63489: [InstSimplify] LCSSA PHIs should not be simplified away.

This causes generation of incorrect code in AMDGPU backend.

This sounds like some other check is missing elsewhere?
What happens if you feed it such an ir as-if after this transform, but manually written?
("that will result in broken asm/crashes" is hopefully not the answer)

That being said, why is LCSSAPass not sufficient?
It's already supposed to undo transforms like this.

Jun 18 2019, 7:55 AM · Restricted Project
alex-t updated the diff for D63489: [InstSimplify] LCSSA PHIs should not be simplified away.

MIR test added

Jun 18 2019, 7:42 AM · Restricted Project
alex-t added a comment to D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence..

I have updated the change ttps://reviews.llvm.org/D62614 this Sunday.
The new one takes completely different approach. I'd appreciate very much If you could try it.

D62614 doesn't fix the issue.

Okay. I'm about to start partial revert of the change.
Could you please provide me test cases so that I can check if my further fixes help.

See below the good and bad outputs for one CTS failure:

GOOD: https://hastebin.com/muwuwivofu
BAD: https://hastebin.com/gofawejoku

Thanks again for looking into this.

Note that this change also breaks https://bugs.freedesktop.org/show_bug.cgi?id=110811

Jun 18 2019, 6:17 AM · Restricted Project
alex-t created D63489: [InstSimplify] LCSSA PHIs should not be simplified away.
Jun 18 2019, 6:13 AM · Restricted Project

Jun 6 2019

alex-t committed rL362749: [AMDGPU] Partial revert for the ba447bae7448435c9986eece0811da1423972fdd.
[AMDGPU] Partial revert for the ba447bae7448435c9986eece0811da1423972fdd
Jun 6 2019, 2:10 PM

Jun 5 2019

alex-t added a comment to D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence..

I have updated the change ttps://reviews.llvm.org/D62614 this Sunday.
The new one takes completely different approach. I'd appreciate very much If you could try it.

D62614 doesn't fix the issue.

Jun 5 2019, 2:07 AM · Restricted Project

Jun 4 2019

alex-t created D62869: [AMDGPU] Partial revert for the "Assign register class for cross block values according to the divergence.".
Jun 4 2019, 11:26 AM · Restricted Project
alex-t added a comment to D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence..

A number of Mesa piglit tests are also affected at least on Bonaire (but it seems to not be GPU-specific, I haven't had a chance to look at it further).

- bin/ext_transform_feedback-order elements triangles
- bin/ext_transform_feedback-order elements points
- bin/ext_transform_feedback-order elements lines
- bin/ext_transform_feedback-order arrays triangles>
- bin/ext_transform_feedback-order arrays points
- bin/ext_transform_feedback-order arrays lines
- arb_clear_buffer_object-formats (96-bit clears)

It's unclear whether the regression is caused by this particular commit or by the subsequent ASAN fix.

Jun 4 2019, 5:13 AM · Restricted Project
alex-t added a comment to D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence..

Hi there,

This change introduces a regression with RADV, all dEQP-VK.subgroups.arithmetic.framebuffer.* are failing now.
Can someone look into this?
Thanks!

I have a patch that fixes the issue in another test suite. Could you please suggest how to check if it also fixes RADV?
https://reviews.llvm.org/D62614

This patch fixes the CTS failures on my side. I have just tried the latest version.

Err, only a subset is fixed actually.

Jun 4 2019, 5:07 AM · Restricted Project

Jun 3 2019

alex-t added inline comments to D62614: Fix for the OCL/LC to failure on some OCLPerf tests.
Jun 3 2019, 8:01 AM · Restricted Project

Jun 2 2019

alex-t updated the diff for D62614: Fix for the OCL/LC to failure on some OCLPerf tests.

Alternative fix.

Jun 2 2019, 9:27 AM · Restricted Project

May 31 2019

alex-t added a comment to D62614: Fix for the OCL/LC to failure on some OCLPerf tests.

See D60834 for what I think is the right direction to fix this.

Once again:

LCSSA is not sufficient for the following reason:
Since definition is considered uniform it is selected to SALU and produces SGPR.
The value in this SGPR is correct for all laves leave in loop body.
LCSSA PHI node that inserted in loop exit block will turn into the copy from SGPR to VGPR – this is incorrect.

Why isn't sgpr to vgpr copy correct? If it is done still inside the loop it is suboptimal but correct. When done outside of the loop it is incorrect, that's true.

May 31 2019, 9:32 AM · Restricted Project
alex-t updated the diff for D62614: Fix for the OCL/LC to failure on some OCLPerf tests.

added the Divergent Analysis test update that was missed

May 31 2019, 8:45 AM · Restricted Project
alex-t added a comment to D62614: Fix for the OCL/LC to failure on some OCLPerf tests.

I don't agree that the enhancement the definition of the "divergence" to the scope is correct way at all.
literally, the value is uniform if all threads observe same value. Nothing about exec mask, lanes or GPU :)
All threads in our case are all executing loop body. That's it.

May 31 2019, 3:35 AM · Restricted Project
alex-t added a comment to D62614: Fix for the OCL/LC to failure on some OCLPerf tests.

See D60834 for what I think is the right direction to fix this.

May 31 2019, 3:26 AM · Restricted Project

May 30 2019

alex-t added a reviewer for D62614: Fix for the OCL/LC to failure on some OCLPerf tests: nhaehnle.
May 30 2019, 7:02 AM · Restricted Project
alex-t added inline comments to D62614: Fix for the OCL/LC to failure on some OCLPerf tests.
May 30 2019, 4:38 AM · Restricted Project
alex-t updated the diff for D62614: Fix for the OCL/LC to failure on some OCLPerf tests.

Added comments describing the reason for the change.

May 30 2019, 4:38 AM · Restricted Project

May 29 2019

alex-t added a reviewer for D62614: Fix for the OCL/LC to failure on some OCLPerf tests: rampitec.
May 29 2019, 11:29 AM · Restricted Project
alex-t created D62614: Fix for the OCL/LC to failure on some OCLPerf tests.
May 29 2019, 11:29 AM · Restricted Project
alex-t added a comment to D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence..

Hi there,

This change introduces a regression with RADV, all dEQP-VK.subgroups.arithmetic.framebuffer.* are failing now.
Can someone look into this?
Thanks!

May 29 2019, 9:34 AM · Restricted Project

May 28 2019

alex-t added a comment to D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence..

Hi there,

This change introduces a regression with RADV, all dEQP-VK.subgroups.arithmetic.framebuffer.* are failing now.
Can someone look into this?
Thanks!

May 28 2019, 10:26 AM · Restricted Project

May 27 2019

alex-t committed rL361776: [AMDGPU] Fix for the address sanitizer failure. Fixing typo.
[AMDGPU] Fix for the address sanitizer failure. Fixing typo
May 27 2019, 11:15 AM