Page MenuHomePhabricator

hakzsam (Samuel Pitoiset)
User

Projects

User does not belong to any projects.

User Details

User Since
Jan 4 2017, 12:11 PM (132 w, 3 h)

Recent Activity

Thu, Jul 11

hakzsam added a comment to D64604: AMDGPU: s_waitcnt field should be treated as unsigned.

Thanks for the quick fix Matt!

Thu, Jul 11, 11:11 PM
hakzsam added a comment to D64333: AMDGPU: Move waitcnt intrinsic to instruction definition pattern.
; ModuleID = 'mesa-shader'
source_filename = "mesa-shader"
target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-ni:7"
target triple = "amdgcn-mesa-mesa3d"
Thu, Jul 11, 3:30 PM
hakzsam added a comment to D64333: AMDGPU: Move waitcnt intrinsic to instruction definition pattern.

Is this expected that a bunch of tests with Mesa now crash because amdgcn.s.waitcnt is not found?

Thu, Jul 11, 3:26 PM
hakzsam added a comment to D64497: [AMDGPU] Allow abs/neg source modifiers on v_cndmask_b32.

This change introduced CTS regressions with RADV like dEQP-VK.spirv_assembly.instruction.graphics.float16.arithmetic_3.atan2_frag and friends.
Can you investigate ?

Thu, Jul 11, 3:21 PM · Restricted Project

Fri, Jun 28

hakzsam added a comment to D61494: AMDGPU: Write LDS objects out as global symbols in code generation.

Do you plan to fix it? It breaks a bunch of games.

Fri, Jun 28, 6:55 AM · Restricted Project

Wed, Jun 26

hakzsam added a comment to D61494: AMDGPU: Write LDS objects out as global symbols in code generation.

Yes.

Wed, Jun 26, 11:46 PM · Restricted Project
hakzsam added a comment to D61494: AMDGPU: Write LDS objects out as global symbols in code generation.

Hi Nicolai,

Wed, Jun 26, 2:17 AM · Restricted Project

Jun 14 2019

Herald added a project to D53283: AMDGPU: Divergence-driven selection of scalar buffer load intrinsics: Restricted Project.

Hi Nicolai,

Jun 14 2019, 5:23 AM · Restricted Project
hakzsam added a comment to D63301: [AMDGPU] gfx1010 wave32 icmp/fcmp intrinsic changes for wave32.

Sorry for the noise.
Should be fixed with https://patchwork.freedesktop.org/patch/310419/?series=62097&rev=1

Jun 14 2019, 2:59 AM · Restricted Project
hakzsam added a comment to D63301: [AMDGPU] gfx1010 wave32 icmp/fcmp intrinsic changes for wave32.

Do we need to update the intrinsic name from mesa?
Btw, it's not related to RADV, RadeonSI is probably affected too because the LLVM backend is common.

Jun 14 2019, 2:42 AM · Restricted Project
hakzsam added a comment to D63301: [AMDGPU] gfx1010 wave32 icmp/fcmp intrinsic changes for wave32.

This change introduces a bunch of failures with CTS on GFX8/GFX9. Looks like the IR validation fail now, see below:

Jun 14 2019, 2:03 AM · Restricted Project

Jun 5 2019

hakzsam added a comment to D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence..

I have updated the change ttps://reviews.llvm.org/D62614 this Sunday.
The new one takes completely different approach. I'd appreciate very much If you could try it.

D62614 doesn't fix the issue.

Okay. I'm about to start partial revert of the change.
Could you please provide me test cases so that I can check if my further fixes help.

Jun 5 2019, 3:29 AM · Restricted Project
hakzsam added a comment to D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence..

I have updated the change ttps://reviews.llvm.org/D62614 this Sunday.
The new one takes completely different approach. I'd appreciate very much If you could try it.

Jun 5 2019, 12:14 AM · Restricted Project

Jun 3 2019

hakzsam added a comment to D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence..

Hi there,

This change introduces a regression with RADV, all dEQP-VK.subgroups.arithmetic.framebuffer.* are failing now.
Can someone look into this?
Thanks!

I have a patch that fixes the issue in another test suite. Could you please suggest how to check if it also fixes RADV?
https://reviews.llvm.org/D62614

This patch fixes the CTS failures on my side. I have just tried the latest version.

Jun 3 2019, 5:04 AM · Restricted Project
hakzsam added a comment to D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence..

Hi there,

This change introduces a regression with RADV, all dEQP-VK.subgroups.arithmetic.framebuffer.* are failing now.
Can someone look into this?
Thanks!

I have a patch that fixes the issue in another test suite. Could you please suggest how to check if it also fixes RADV?
https://reviews.llvm.org/D62614

Jun 3 2019, 5:03 AM · Restricted Project

May 27 2019

hakzsam added a comment to D59990: AMDGPU. Divergence driven ISel. Assign register class for cross block values according to the divergence..

Hi there,

May 27 2019, 2:23 AM · Restricted Project

May 16 2019

hakzsam added a comment to D61313: [AMDGPU] detect WaW hazards when moving/merging load/store instructions.

Thank you too!
Can you eventually push the patch for me? I think I no longer have commit access since the LLVM relicense stuff.

May 16 2019, 12:19 AM · Restricted Project
hakzsam updated the diff for D61313: [AMDGPU] detect WaW hazards when moving/merging load/store instructions.

v3: - remove most of the LLVM IR

May 16 2019, 12:03 AM · Restricted Project

May 15 2019

hakzsam added inline comments to D61313: [AMDGPU] detect WaW hazards when moving/merging load/store instructions.
May 15 2019, 4:33 AM · Restricted Project
hakzsam added inline comments to D61313: [AMDGPU] detect WaW hazards when moving/merging load/store instructions.
May 15 2019, 4:21 AM · Restricted Project

May 9 2019

hakzsam added a comment to D52019: [AMDGPU] Divergence driven instruction selection. Part 1..

Hi there,

This change introduces a regression with RADV and Doom 2016, see https://bugs.freedesktop.org/show_bug.cgi?id=110636
We just discovered this, so LLVM 8 and master are affected. :(
I'm trying to find the root cause, I will let you know.

Did r360293 fix it?

Yes, it did fix the problem.
Do you plan to backport to LLVM 8?

May 9 2019, 2:53 AM
hakzsam added a comment to D52019: [AMDGPU] Divergence driven instruction selection. Part 1..

Hi there,

May 9 2019, 2:32 AM

May 8 2019

hakzsam added a comment to D60846: [ValueTracking] Improve isKnowNonZero for Ints.

Hi folks,

This introduces a regression with RADV (the open source Vulkan driver in mesa) and some CTS tests, here's the list of failures:

Is the regression a compiler-crash, program crash, miscompile, or performance loss?
Probably related to addrspace?

May 8 2019, 7:24 AM · Restricted Project
hakzsam added a comment to D60846: [ValueTracking] Improve isKnowNonZero for Ints.

Hi folks,

May 8 2019, 1:42 AM · Restricted Project

May 7 2019

hakzsam updated the diff for D61313: [AMDGPU] detect WaW hazards when moving/merging load/store instructions.
May 7 2019, 6:39 AM · Restricted Project
hakzsam added a comment to D61313: [AMDGPU] detect WaW hazards when moving/merging load/store instructions.

That's an interesting catch. Basically a Write-after-Write hazard.

Though this seems like the wrong place to add this check, since the function is specifically called canMoveInstsAcrossMemOp.

I rather suspect that instead addToListsIfDependent needs to be fixed instead, so that when the scan hits the S_CMP_EQ_U32 it recognizes the WAW hazard and adds it to the instructions to be moved.

I also wonder if this is related to D60459.

May 7 2019, 2:46 AM · Restricted Project

May 6 2019

hakzsam added a comment to D61313: [AMDGPU] detect WaW hazards when moving/merging load/store instructions.

have you run all the lit tests on this patch ?

May 6 2019, 12:32 AM · Restricted Project
hakzsam added a comment to D61313: [AMDGPU] detect WaW hazards when moving/merging load/store instructions.

This sure does look like the same problem to me. https://reviews.llvm.org/D60459

May 6 2019, 12:29 AM · Restricted Project

Apr 30 2019

hakzsam added reviewers for D61313: [AMDGPU] detect WaW hazards when moving/merging load/store instructions: tpr, arsenm, nhaehnle.
Apr 30 2019, 7:34 AM · Restricted Project
hakzsam created D61313: [AMDGPU] detect WaW hazards when moving/merging load/store instructions.
Apr 30 2019, 7:34 AM · Restricted Project

Apr 26 2019

hakzsam added a comment to D60457: [CodeGen] Fixed de-optimization of legalize subvector extract.

ping?
SI is still broken without this patch.

Apr 26 2019, 12:40 AM · Restricted Project

Apr 22 2019

hakzsam added a comment to D59772: AMDGPU: Remove unnecessary check for isFullCopy.

Hi Matt,

This change introduces a regression with Overwatch and RADV (cf. https://bugs.freedesktop.org/show_bug.cgi?id=110476).

Here's the attached LLVM IR https://pastebin.com/raw/VqRF4unW

Can you have a look?
Thanks!

r358890

Apr 22 2019, 11:55 PM

Apr 21 2019

hakzsam added a comment to D59772: AMDGPU: Remove unnecessary check for isFullCopy.

Hi Matt,

Apr 21 2019, 12:56 PM

Apr 9 2019

hakzsam added a comment to D60457: [CodeGen] Fixed de-optimization of legalize subvector extract.

This fixes the SI regression with RADV.
Thanks a lot Tim.

Apr 9 2019, 7:26 AM · Restricted Project

Mar 28 2019

hakzsam added a comment to D58902: [AMDGPU] Support for v3i32/v3f32.

Hi,
This change breaks SI, at least with RADV.
Here's how to reproduce: ./deqp-vk --deqp-case=dEQP-VK.glsl.builtin.function.integer.bitfieldreverse.ivec3_highp_tess_control

Mar 28 2019, 2:18 AM · Restricted Project

Mar 26 2019

hakzsam added a comment to D59851: AMDGPU: Fix areLoadsFromSameBasePtr for DS atomics.

Thanks for the quick fix Matt!

Mar 26 2019, 3:30 PM
hakzsam added a comment to D58991: AMDGPU: Don't bother checking the chain in areLoadsFromSameBasePtr.

This introduces a regression with this LLVM IR https://hastebin.com/yuzomezazi
It crashes inside llvm::SIInstrInfo::areLoadsFromSameBasePtr for some reasons.

Mar 26 2019, 11:10 AM
hakzsam added a comment to D58918: [AMDGPU] Add support for 64 bit buffer atomic artihmetic instructions.

Hi there,
Do you have the cmpswap fix somewhere?
Thanks

Mar 26 2019, 9:09 AM · Restricted Project

Jan 9 2019

hakzsam added a comment to D56434: [AMDGPU] Fix dwordx3/southern-islands failures..

Thanks for the fix!

Jan 9 2019, 6:50 AM · Restricted Project

Dec 21 2018

hakzsam added a comment to D54042: [AMDGPU] Extend the SI Load/Store optimizer to combine more things..

Please have look at https://bugs.llvm.org/show_bug.cgi?id=40129

Dec 21 2018, 5:02 AM · Restricted Project

Dec 14 2018

hakzsam added a comment to D50200: AMDGPU: Handle "uniform-work-group-size" attribute.

Unfortunately, I can't get any LLVM IR because it crashes too early in the process. You can also reproduce the problem with RadeonSI btw.

Dec 14 2018, 12:39 AM · Restricted Project

Dec 13 2018

hakzsam added a comment to D54231: AMDGPU/InsertWaitcnts: Remove the dependence on MachineLoopInfo.

Yes, it does fix the issue. Thanks!

Dec 13 2018, 3:07 AM
hakzsam added a comment to D50200: AMDGPU: Handle "uniform-work-group-size" attribute.

This patch breaks RADV (and probably RadeonSI as well). Here's a backtrace of the problem:

Dec 13 2018, 2:58 AM · Restricted Project

Dec 11 2018

hakzsam added a comment to D54231: AMDGPU/InsertWaitcnts: Remove the dependence on MachineLoopInfo.

Did you get a chance to look into this?

Dec 11 2018, 3:03 AM

Dec 6 2018

hakzsam added a comment to D55314: AMDGPU: Turn on the DPP combiner by default.

This change breaks most of the subgroups tests with RADV (ie. dEQP-VK.subgroups.arithmetic.*).

Dec 6 2018, 3:10 AM

Dec 4 2018

hakzsam added a comment to D54231: AMDGPU/InsertWaitcnts: Remove the dependence on MachineLoopInfo.

Hi Nicolai,

Dec 4 2018, 3:22 AM

Oct 31 2018

hakzsam added a comment to D53496: AMDGPU: Rewrite SILowerI1Copies to always stay on SALU.

This regresses the following tests on RADV:

Oct 31 2018, 9:50 AM
hakzsam added a comment to D53283: AMDGPU: Divergence-driven selection of scalar buffer load intrinsics.

Hi Nicolai,

Oct 31 2018, 6:59 AM · Restricted Project

Oct 17 2018

hakzsam added a comment to D53359: AMDGPU: Remove PHI loop condition optimization.

Thank you for this really important fix!

Oct 17 2018, 2:48 AM

Aug 22 2018

hakzsam added a comment to D47261: AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space.

r340417

Aug 22 2018, 9:10 AM
hakzsam added a comment to D50973: AMDGPU: fix existing alias rules for constant and global.

r340416

Aug 22 2018, 9:10 AM
hakzsam committed rL340417: AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space.
AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space
Aug 22 2018, 9:09 AM
hakzsam committed rL340416: AMDGPU: fix existing alias rules for constant and global.
AMDGPU: fix existing alias rules for constant and global
Aug 22 2018, 9:09 AM
hakzsam added a comment to D47261: AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space.

Should I push the patch as is? Or does it need another revision?

Aug 22 2018, 12:22 AM

Aug 21 2018

hakzsam updated the diff for D47261: AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space.

v5: rename MAX_COMMON_ADDRESS to MAX_AMDGPU_ADDRESS

Aug 21 2018, 9:26 AM
hakzsam added inline comments to D47261: AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space.
Aug 21 2018, 12:56 AM

Aug 20 2018

hakzsam abandoned D50974: AMDGPU: fix updating the alias rules since r340171.

LGTM

Aug 20 2018, 12:55 PM
hakzsam updated the diff for D47261: AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space.

Both patches (r340171 and r340172) have been reverted with r340202 because it will be easier for a backport.
This new version squashes these two and includes the alias rules table fix (ie. out of bounds access).

Aug 20 2018, 12:54 PM
hakzsam added inline comments to D50974: AMDGPU: fix updating the alias rules since r340171.
Aug 20 2018, 12:24 PM
hakzsam updated the diff for D50973: AMDGPU: fix existing alias rules for constant and global.

v2: add a test with swapped parameters

Aug 20 2018, 12:20 PM
hakzsam added inline comments to D50973: AMDGPU: fix existing alias rules for constant and global.
Aug 20 2018, 12:15 PM
hakzsam created D50974: AMDGPU: fix updating the alias rules since r340171.
Aug 20 2018, 8:21 AM
hakzsam created D50973: AMDGPU: fix existing alias rules for constant and global.
Aug 20 2018, 8:03 AM
hakzsam added a comment to D47261: AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space.

NACK. This patch is clearly wrong.
MAX_COMMON_ADDRESS is used in AMDGPUAAResult::ASAliasRulesTy::getAliasResult to filter indices to the ASAliasRules table which is 6x6. Allowing address space 6 leads to out of bounds access to the array.

This has been reported and answered in https://bugs.llvm.org/show_bug.cgi?id=38113

Aug 20 2018, 6:56 AM
hakzsam committed rL340172: AMDGPU: fix compilation errors since r340171.
AMDGPU: fix compilation errors since r340171
Aug 20 2018, 6:32 AM
hakzsam committed rL340171: AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space.
AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space
Aug 20 2018, 6:19 AM
hakzsam updated the diff for D47261: AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space.

v3: use static_assert()

Aug 20 2018, 3:01 AM

Aug 17 2018

hakzsam updated the diff for D47261: AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space.

v2: add a very simple test for 32-bit addr space

Aug 17 2018, 3:27 AM

May 25 2018

hakzsam added a comment to D47261: AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space.

Needs a test, preferably the full set of AA checks with 32 bit constant

May 25 2018, 5:27 AM

May 23 2018

hakzsam created D47261: AMDGPU: bump AS.MAX_COMMON_ADDRESS to 6 since 32-bit addr space.
May 23 2018, 7:54 AM

May 3 2018

hakzsam updated the diff for D46351: StructurizeCFG: fix inverting conditions.

Without this patch, it appears to me that we are selecting
the wrong operand when inverting conditions. In the attached
test, it will select %tmp3 instead of %tmp4. To fix it, just
use 'A' as everywhere.

May 3 2018, 1:20 AM

May 2 2018

hakzsam added reviewers for D46351: StructurizeCFG: fix inverting conditions: spatel, arsenm.
May 2 2018, 5:25 AM
hakzsam created D46351: StructurizeCFG: fix inverting conditions.
May 2 2018, 1:25 AM

Apr 25 2018

hakzsam added a comment to D40556: SIFixSGPRCopies should not change non-divergent PHI.

Sorry for the delay. Just submitted to trunk.
It was not about the reverse patch only. I had to change tests accordingly.

Apr 25 2018, 5:46 AM
hakzsam added a comment to D40556: SIFixSGPRCopies should not change non-divergent PHI.

Any news on the revert? Thanks!

Apr 25 2018, 12:13 AM

Apr 13 2018

hakzsam added a comment to D43831: [AMDGPU] Do not only rely on BB number when finding bottom loop.
  • Change isLoopBottom() to return true in the case of a single basic block loop
  • Clear LoopWaitcntDataMap in between functions
Apr 13 2018, 12:38 AM
hakzsam added a comment to D40556: SIFixSGPRCopies should not change non-divergent PHI.

In fact yes. This is the most correct solution. The stuff in
SIFixSGPRCopies.cpp has got to gone.
So it does not make sense to spend efforts on it. Moreover there is no
correct solution at all
unless we implement one more DivergenceAnalysis upon the machine IR.
So, if it's okay I'd prefer to revert.

Apr 13 2018, 12:36 AM

Apr 9 2018

hakzsam added a comment to D40556: SIFixSGPRCopies should not change non-divergent PHI.

It seems like this patch should be reverted. Recently we have no reliable
way to determine the PHI (or whatever else) divergence on the MI level.
The only correct way is to add the DA algorithm that re-computes all the MI
divergence.
Since we're going to remove the SGPRFix stuff as soon as diverence driven
ISel is ready I consider this patch is not necessary.

Apr 9 2018, 8:39 AM
hakzsam added a comment to D43831: [AMDGPU] Do not only rely on BB number when finding bottom loop.

Hi Mark,

Apr 9 2018, 12:31 AM

Apr 6 2018

hakzsam added a comment to D44974: AMDGPU: enable 128-bit for local addr space under an option.

ping?

Apr 6 2018, 5:34 AM

Apr 3 2018

hakzsam updated the diff for D44974: AMDGPU: enable 128-bit for local addr space under an option.

v3:

  • init EnableDS128 in the constructor
Apr 3 2018, 2:33 AM
hakzsam added inline comments to D44974: AMDGPU: enable 128-bit for local addr space under an option.
Apr 3 2018, 2:30 AM

Mar 30 2018

hakzsam updated the diff for D44974: AMDGPU: enable 128-bit for local addr space under an option.

v2:

  • use a return ternary operator
  • fix feature format
  • add small test load-local-f32-no-ds128.ll
Mar 30 2018, 8:52 AM

Mar 29 2018

hakzsam added inline comments to D44974: AMDGPU: enable 128-bit for local addr space under an option.
Mar 29 2018, 11:12 AM

Mar 28 2018

hakzsam created D44974: AMDGPU: enable 128-bit for local addr space under an option.
Mar 28 2018, 2:10 AM

Mar 15 2018

hakzsam added a comment to D40556: SIFixSGPRCopies should not change non-divergent PHI.

Hi guys,

Mar 15 2018, 8:46 AM
hakzsam added inline comments to D43831: [AMDGPU] Do not only rely on BB number when finding bottom loop.
Mar 15 2018, 1:17 AM

Mar 7 2018

hakzsam updated the diff for D43831: [AMDGPU] Do not only rely on BB number when finding bottom loop.

v8:

  • add a sanity check when a block is looping over itself
  • cosmetic changes
Mar 7 2018, 8:27 AM
hakzsam updated the diff for D43831: [AMDGPU] Do not only rely on BB number when finding bottom loop.

v7:

  • replace the LLVM IR testcase with a MIR one
  • remove one unused variable in isLoopBottom()
  • do not rely on block numbfer in isLoopBottom()
Mar 7 2018, 7:07 AM

Mar 6 2018

hakzsam updated the diff for D43831: [AMDGPU] Do not only rely on BB number when finding bottom loop.

v6: fix the multiple back-edges case and update the testcase

Mar 6 2018, 7:03 AM
hakzsam added a comment to D43831: [AMDGPU] Do not only rely on BB number when finding bottom loop.

I haven't yet had the time to fully grok what's going on here, but I suspect this code is fundamentally broken because the whole assumption of having a single loopBottom is wrong. It happens to be true for structured loops with non-uniform control flow, but when there's uniform control flow, there may well be multiple back-edges. What happens in those cases?

I'm not sure either if the current code correctly supports multiple back-edges. I think if there is more than one, loopBottom() will select the basic block with the highest number if it's a successor of the header loop.

The other way around: it will select the basic block with the highest number which has the loop header as its successor.

The question is whether that ends up doing the right thing in all cases.

https://hastebin.com/joqihayohe One case that doesn't work (ie. a vmcnt(0) is missing).

Mar 6 2018, 6:10 AM
hakzsam added a comment to D43831: [AMDGPU] Do not only rely on BB number when finding bottom loop.

I haven't yet had the time to fully grok what's going on here, but I suspect this code is fundamentally broken because the whole assumption of having a single loopBottom is wrong. It happens to be true for structured loops with non-uniform control flow, but when there's uniform control flow, there may well be multiple back-edges. What happens in those cases?

I'm not sure either if the current code correctly supports multiple back-edges. I think if there is more than one, loopBottom() will select the basic block with the highest number if it's a successor of the header loop.

The other way around: it will select the basic block with the highest number which has the loop header as its successor.

The question is whether that ends up doing the right thing in all cases.

Mar 6 2018, 2:16 AM
hakzsam updated the diff for D43831: [AMDGPU] Do not only rely on BB number when finding bottom loop.

v5: run -dce to remove redundant phis

Mar 6 2018, 1:56 AM
hakzsam added a comment to D43831: [AMDGPU] Do not only rely on BB number when finding bottom loop.

I haven't yet had the time to fully grok what's going on here, but I suspect this code is fundamentally broken because the whole assumption of having a single loopBottom is wrong. It happens to be true for structured loops with non-uniform control flow, but when there's uniform control flow, there may well be multiple back-edges. What happens in those cases?

I'm not sure either if the current code correctly supports multiple back-edges. I think if there is more than one, loopBottom() will select the basic block with the highest number if it's a successor of the header loop.

The other way around: it will select the basic block with the highest number which has the loop header as its successor.

The question is whether that ends up doing the right thing in all cases.

Mar 6 2018, 1:50 AM
hakzsam added a comment to D43831: [AMDGPU] Do not only rely on BB number when finding bottom loop.

I haven't yet had the time to fully grok what's going on here, but I suspect this code is fundamentally broken because the whole assumption of having a single loopBottom is wrong. It happens to be true for structured loops with non-uniform control flow, but when there's uniform control flow, there may well be multiple back-edges. What happens in those cases?

Mar 6 2018, 1:33 AM

Mar 5 2018

hakzsam updated the diff for D43831: [AMDGPU] Do not only rely on BB number when finding bottom loop.

v4: - run 'opt -S -deadarghaX0r -strip -strip-debug -strip-dead-prototypes -instnamer'

  • remove RFC on the subject
Mar 5 2018, 10:54 AM
hakzsam added a comment to D43831: [AMDGPU] Do not only rely on BB number when finding bottom loop.

For the testcase, would you also run -instnamer to rename instances of %<number>? You may want to combine it with other clean-up options, something like this: opt -S -deadarghaX0r -strip -strip-debug -strip-dead-prototypes -instnamer

Mar 5 2018, 10:44 AM
hakzsam updated the diff for D43831: [AMDGPU] Do not only rely on BB number when finding bottom loop.

v3: reduced testcase

Mar 5 2018, 7:31 AM
hakzsam added a comment to D43831: [AMDGPU] Do not only rely on BB number when finding bottom loop.

Could you please reduce the testcase?

Mar 5 2018, 5:21 AM

Mar 2 2018

hakzsam added a comment to D43831: [AMDGPU] Do not only rely on BB number when finding bottom loop.

So, the expcnt(0) is generated because there is a buffer_atomic_add in that test case, which actually makes sense to me. Looks like this patch also fixes that case.

Mar 2 2018, 5:55 AM