bcahoon (Brendon Cahoon)
User

Projects

User does not belong to any projects.

User Details

User Since
Apr 10 2015, 3:10 PM (441 w, 2 d)

Recent Activity

Fri, Sep 22

bcahoon accepted D158579: [AMDGPU] Add DAG ISel support for preloaded kernel arguments.

Looks good to me. I also tried some additional tests and they also worked as expected.

Fri, Sep 22, 3:29 PM · Restricted Project, Restricted Project
bcahoon accepted D156853: [AMDGPU] Add IR lowering changes for preloaded kernargs.

LGTM. The part I'm wondering about is in tryAllocPreloadSGPRs, the part about multiple arguments in the same SGPR. I think there is code in https://reviews.llvm.org/D158579 that handles this case as well.

Fri, Sep 22, 7:58 AM · Restricted Project, Restricted Project

Mon, Aug 28

bcahoon added a comment to D155077: [MachinePipeliner] Fix a bug in write-after-read scheduling.

In your test case, does the addLoopCarriedDependences function add a Barrier edge between the load and store? If so, that should prevent the case described in the comment.

Mon, Aug 28, 3:28 PM · Restricted Project, Restricted Project

Jul 12 2023

bcahoon updated subscribers of D155080: [AMDGPU] Correctly emit AGPR copies in tryFoldPhiAGPR.
Jul 12 2023, 7:26 AM · Restricted Project, Restricted Project

Jun 29 2023

bcahoon committed rG853b2a84cb99: [AMDGPU] Reserve SGPR pair when long branches are present (authored by bcahoon).
Jun 29 2023, 2:53 PM · Restricted Project, Restricted Project

May 15 2023

bcahoon added a comment to D149281: Don't disable loop unroll for vectorized loops on AMDGPU target.

Here are some observations about the performance regression that we see once the unrolling is disabled by the vectorizer.

May 15 2023, 6:36 PM · Restricted Project, Restricted Project

Apr 12 2023

bcahoon accepted D148139: [BranchRelaxation] Correct JumpToFT value.
Apr 12 2023, 7:18 PM · Restricted Project, Restricted Project
bcahoon added a reviewer for D148139: [BranchRelaxation] Correct JumpToFT value: dhoekwater.
Apr 12 2023, 7:15 PM · Restricted Project, Restricted Project

Feb 14 2023

bcahoon committed rG20cdf7c70667: [InstCombine] Increase limit for max copied from constant fold (authored by bcahoon).
Feb 14 2023, 2:09 PM · Restricted Project, Restricted Project
bcahoon closed D144032: [InstCombine] Increase limit for max copied from constant fold.
Feb 14 2023, 2:08 PM · Restricted Project, Restricted Project
bcahoon requested review of D144032: [InstCombine] Increase limit for max copied from constant fold.
Feb 14 2023, 10:27 AM · Restricted Project, Restricted Project

Dec 20 2022

bcahoon added inline comments to D137954: Enable roundeven..
Dec 20 2022, 9:02 AM · Restricted Project, Restricted Project

Dec 19 2022

bcahoon added inline comments to D140168: Do not mark non-phi loop var def as loop-carried dep.
Dec 19 2022, 10:35 AM · Restricted Project, Restricted Project

Nov 29 2022

bcahoon committed rGb32a5666a8ee: [AMDGPU] Unify uniform return and divergent unreachable blocks (authored by bcahoon).
Nov 29 2022, 11:34 AM · Restricted Project, Restricted Project
bcahoon closed D136892: [AMDGPU] Unify uniform return and divergent unreachable blocks.
Nov 29 2022, 11:34 AM · Restricted Project, Restricted Project

Nov 16 2022

bcahoon added a comment to D136051: [CodeGen] Introduce a flag to allow same cycle def-use schedule.

Hi @ssarda, thanks for the test case. That helps me understand the problem, which appears to be the following MIR,

Nov 16 2022, 8:58 AM · Restricted Project, Restricted Project

Nov 6 2022

bcahoon accepted D136463: [SWP] Recognize mem carried dep with different base.
Nov 6 2022, 3:41 PM · Restricted Project, Restricted Project

Nov 4 2022

bcahoon updated the diff for D136892: [AMDGPU] Unify uniform return and divergent unreachable blocks.

I changed the patch based upon @ruiling's comment. He's correct that my patch doesn't catch the case when there is a uniform exit and a divergent exit. I looked at adding a HasDivergentBranch condition suggested in his comments, but that seemed to be overly conservative and created a lot of lit test changes. Instead, the patch checks whether there is any exit block (return/unreachable) that is not uniformly reached. If that condition occurs, then all return and unreachable blocks are unified. That fixes the test case, and only affects a couple of existing test cases. Two of those test cases, bool-legalization and skip-if-dead, are different because unreachable blocks are now added only if there is a divergent exit block. I think this conservatively will create a unified exit block correctly, but I'm interested in feedback.

Nov 4 2022, 6:48 PM · Restricted Project, Restricted Project

Oct 31 2022

bcahoon committed rGf59205aef957: [BasicBlockUtils] Add a new way for CreateControlFlowHub() (authored by bcahoon).
Oct 31 2022, 7:04 AM · Restricted Project, Restricted Project
bcahoon committed rG9265b7fa83b9: NFC: restructure code for CreateControlFlowHub() (authored by bcahoon).
Oct 31 2022, 6:45 AM · Restricted Project, Restricted Project

Oct 30 2022

bcahoon added inline comments to D127831: BasicBlockUtils: Add a new way for CreateControlFlowHub().
Oct 30 2022, 9:24 AM · Restricted Project, Restricted Project

Oct 28 2022

bcahoon added a comment to D127831: BasicBlockUtils: Add a new way for CreateControlFlowHub().

Why do we still need the "old way"? Does the "new way" generate worse code in some cases?

Good question. The new way does require an additional compare instruction to test where control continues after the loop. One for each output edge. But, yes, there is a tradeoff here. I think it's worth assessing if the "new way" can replace the "old way" at some point. My thinking here is to be conservative and keep the "old way" for now as I'm a little hesitant to replace it without more performance data.

Right, I understand that using an i32 instead of an i1 means that you need extra icmp instructions in the IR, but does it actually cause more instructions in the final generated code (for AMDGPU)? If the i1 was stored in a general purpose register then you would need a cmp anyway to use it to control a conditional branch.

Oct 28 2022, 3:45 PM · Restricted Project, Restricted Project
bcahoon updated the diff for D127831: BasicBlockUtils: Add a new way for CreateControlFlowHub().

Based upon reviewer feedback, this update changes the name of the command-line option, and passes the command-line value as a parameter to CreateControlFlowHub.

Oct 28 2022, 3:30 PM · Restricted Project, Restricted Project
bcahoon added inline comments to D136463: [SWP] Recognize mem carried dep with different base.
Oct 28 2022, 7:59 AM · Restricted Project, Restricted Project

Oct 27 2022

bcahoon added a comment to D127831: BasicBlockUtils: Add a new way for CreateControlFlowHub().

Why do we still need the "old way"? Does the "new way" generate worse code in some cases?

Oct 27 2022, 3:37 PM · Restricted Project, Restricted Project
bcahoon requested review of D136892: [AMDGPU] Unify uniform return and divergent unreachable blocks.
Oct 27 2022, 3:18 PM · Restricted Project, Restricted Project

Oct 23 2022

bcahoon added inline comments to D136463: [SWP] Recognize mem carried dep with different base.
Oct 23 2022, 12:48 PM · Restricted Project, Restricted Project

Oct 21 2022

bcahoon added a comment to D136051: [CodeGen] Introduce a flag to allow same cycle def-use schedule.

Hi @ssarda, can you share a test case that is fixed by your patch?

Oct 21 2022, 9:51 AM · Restricted Project, Restricted Project

Oct 18 2022

bcahoon added a comment to D136051: [CodeGen] Introduce a flag to allow same cycle def-use schedule.

Hi @ssarda, I assume the case in question is for a .new use on a physical register? It sounds like the issue is that the use appears before the definition in the linear list of instructions after the pipeliner because orderDependences doesn't order them properly? If so, that's a bug. I'm surprised that doesn't cause other issues. For example, there is no guarantee that the use will appear in the same packet as the def if they aren't ordered correctly.

Oct 18 2022, 11:41 AM · Restricted Project, Restricted Project

Oct 11 2022

bcahoon updated the diff for D127830: NFC: restructure code for CreateControlFlowHub().

Remove \n from assert.

Oct 11 2022, 12:23 PM · Restricted Project, Restricted Project
bcahoon updated subscribers of D135434: Propagate tied operands when copying a MachineInstr..

I think the changes in this patch, moving the handling of tied operands to CloneMachineInstr and removing the code
from ModuloSchedule.cpp, are good. I'm sure there is a test case for this in the Hexagon internal repo. Adding
@sgundapa in case he is able to check. I don't remember why I added this code to MachinePipeliner rather than
changing CloneMachineInstr, other than that I was trying to quickly fix a pipeliner bug and wasn't sure what effect
changing CloneMachineInstr would have. But it does make more sense to put it in CloneMachineInstr.

Oct 11 2022, 6:44 AM · Restricted Project, Restricted Project

Oct 10 2022

bcahoon updated the diff for D127831: BasicBlockUtils: Add a new way for CreateControlFlowHub().

I'd like to try to move this patch by Ruiling forward. Using integer values instead of
boolean values for regions with many outgoing blocks is beneficial because it reduces
register pressure. Using boolean values requires a live value for each outgoing block,
but using an integer requires only one. In loops with a lot of outgoing blocks, we see
a significant reduction in register pressure and faster compile-time.
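
As a rough illustration of the comparison above (a hypothetical model, not code from the patch or from LLVM), the sketch below counts the values that must stay live into the hub under each encoding. The names HubEncoding, liveValuesAtHub, and dispatchByIndex are invented for this example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model, not LLVM code: the two ways a hub can record which
// outgoing block control should continue to.
enum class HubEncoding { BooleanPerTarget, SingleInteger };

// Number of values that must stay live into the hub for a region with
// numOutgoing outgoing blocks: one boolean per block, or a single integer.
std::size_t liveValuesAtHub(std::size_t numOutgoing, HubEncoding enc) {
  assert(numOutgoing > 0 && "a hub needs at least one outgoing block");
  return enc == HubEncoding::BooleanPerTarget ? numOutgoing : 1;
}

// Integer encoding: the hub compares the single index against each
// candidate target in turn and continues to the first match.
std::size_t dispatchByIndex(std::size_t index, std::size_t numOutgoing) {
  assert(index < numOutgoing && "index must name an outgoing block");
  for (std::size_t target = 0; target < numOutgoing; ++target)
    if (index == target)
      return target;
  return numOutgoing; // not reached for a valid index
}
```

In this model the boolean encoding grows linearly with the number of outgoing blocks, while the integer encoding stays at one live value and pays for it with a compare per outgoing edge.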

Oct 10 2022, 2:22 PM · Restricted Project, Restricted Project
bcahoon commandeered D127831: BasicBlockUtils: Add a new way for CreateControlFlowHub().
Oct 10 2022, 2:00 PM · Restricted Project, Restricted Project
bcahoon updated the diff for D127830: NFC: restructure code for CreateControlFlowHub().

Just rebasing. No change intended.

Oct 10 2022, 1:59 PM · Restricted Project, Restricted Project
bcahoon commandeered D127830: NFC: restructure code for CreateControlFlowHub().
Oct 10 2022, 1:57 PM · Restricted Project, Restricted Project

Oct 3 2022

bcahoon added inline comments to D134557: [BranchRelaxation] Fall through only if block has no unconditional branches.
Oct 3 2022, 6:55 PM · Restricted Project, Restricted Project

Sep 26 2022

bcahoon added inline comments to D134557: [BranchRelaxation] Fall through only if block has no unconditional branches.
Sep 26 2022, 10:46 AM · Restricted Project, Restricted Project

Sep 23 2022

bcahoon added inline comments to D134557: [BranchRelaxation] Fall through only if block has no unconditional branches.
Sep 23 2022, 1:47 PM · Restricted Project, Restricted Project
bcahoon added inline comments to D134557: [BranchRelaxation] Fall through only if block has no unconditional branches.
Sep 23 2022, 11:28 AM · Restricted Project, Restricted Project

Aug 30 2022

bcahoon accepted D132879: [AMDGPU] Limit TID / wavefrontsize uniformness to 1D kernels.

LGTM - thanks!

Aug 30 2022, 12:19 PM · Restricted Project, Restricted Project

Aug 26 2022

bcahoon accepted D132511: [AMDGPU] Detect uniformness of TID / wavefrontsize.

LGTM. Thanks for adding this patch!

Aug 26 2022, 2:42 PM · Restricted Project, Restricted Project

Aug 24 2022

bcahoon added a comment to D132511: [AMDGPU] Detect uniformness of TID / wavefrontsize.

This looks good to me. The patch enables the compiler to generate s_load when the user writes code that divides threadIdx.x by the wavefront size. The only suggestion I have is to add some test cases showing explicitly that the amdgpu.uniform metadata is added via the divergence analysis (by the AnnotateUniformValues pass). The test cases provided rely upon that working correctly, though they show the end result rather than the steps needed to get the result.
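
To see why threadIdx.x divided by the wavefront size can be treated as uniform, here is a small host-side sketch. It assumes a 1D launch in which lane L of wave W holds thread ID W * waveSize + L; the function name waveIndexIsUniform is hypothetical and is not part of the patch.

```cpp
#include <cassert>
#include <cstdint>

// Assumption for illustration: in a 1D launch, lane L of wave W has
// thread ID W * waveSize + L. Returns true when every lane of the given
// wave computes the same value for tid / waveSize.
bool waveIndexIsUniform(std::uint32_t wave, std::uint32_t waveSize) {
  assert(waveSize > 0 && "wave size must be nonzero");
  const std::uint32_t firstTid = wave * waveSize;
  const std::uint32_t expected = firstTid / waveSize;
  for (std::uint32_t lane = 0; lane < waveSize; ++lane) {
    if ((firstTid + lane) / waveSize != expected)
      return false; // some lane disagrees, so the value would be divergent
  }
  return true;
}
```

Under that assumption every lane in a wave gets the same quotient, which is what allows an address derived from it to be loaded with a scalar load.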

Aug 24 2022, 7:14 PM · Restricted Project, Restricted Project

Aug 16 2022

bcahoon added inline comments to D131951: Use ISA versions instead of attributes to determine intrinsic legality..
Aug 16 2022, 6:45 AM · Restricted Project, Restricted Project

Aug 8 2022

bcahoon accepted D127840: [MachinePipeliner] Fix Phi generation failure for large stages.

Thanks for the fix. LGTM.

Aug 8 2022, 7:00 PM · Restricted Project, Restricted Project

Aug 5 2022

bcahoon added a comment to D131276: AMDGPU: Implicit kernel arguments related optimization when uniform-workgroup-size=true.

I also think we're still missing a module flag to indicate the code object version

The module flag was implemented quite a while back. See D119026 from February. From a recent compile:

!llvm.module.flags = !{!0, !1, !2}
!0 = !{i32 1, !"amdgpu_code_object_version", i32 500}

We were using --amdhsa-code-object-version=5 to run LIT tests. Do you mean this flag will no longer take effect if we
switch to module flag for code object version?

Aug 5 2022, 4:59 PM · Restricted Project, Restricted Project

Aug 4 2022

bcahoon added a comment to D128073: [SROA] Try harder to find a vector promotion viable type when rewriting.

Is anything else needed for this patch or any additional review comments?

Aug 4 2022, 6:49 AM · Restricted Project, Restricted Project

Aug 2 2022

bcahoon added a comment to D127840: [MachinePipeliner] Fix Phi generation failure for large stages.

This is an interesting bug. The patch does fix a flaw with the current Phi generation by the pipeliner. I think the approach here is a good one. It is a little difficult to understand why the two different VRMaps are needed though (much of the code in generatePhis is difficult to understand). But, the two maps do reduce some of the complexity in the code, which is good. I'll accept it once the test case comment is resolved. Thanks for fixing the issue!

Aug 2 2022, 3:25 PM · Restricted Project, Restricted Project

Aug 1 2022

bcahoon added inline comments to D123693: Transform illegal intrinsics to V_ILLEGAL.
Aug 1 2022, 7:28 AM · Restricted Project, Restricted Project

Jul 29 2022

bcahoon added a comment to D123693: Transform illegal intrinsics to V_ILLEGAL.

A couple more tests are needed: a test for the image intrinsic, and an assembler test for the v_illegal encoding.

Jul 29 2022, 10:10 AM · Restricted Project, Restricted Project

Jul 27 2022

bcahoon added inline comments to D123693: Transform illegal intrinsics to V_ILLEGAL.
Jul 27 2022, 3:33 PM · Restricted Project, Restricted Project

Jul 25 2022

bcahoon added inline comments to D123693: Transform illegal intrinsics to V_ILLEGAL.
Jul 25 2022, 7:03 PM · Restricted Project, Restricted Project

Jul 14 2022

bcahoon added a reverting change for rGe13248ab0e79: [UnifyLoopExits] Reduce number of guard blocks: rG58fec78231dc: Revert "[UnifyLoopExits] Reduce number of guard blocks".
Jul 14 2022, 8:46 AM · Restricted Project, Restricted Project
bcahoon committed rG58fec78231dc: Revert "[UnifyLoopExits] Reduce number of guard blocks" (authored by bcahoon).
Jul 14 2022, 8:46 AM · Restricted Project, Restricted Project
bcahoon added a reverting change for D123230: [UnifyLoopExits] Reduce number of guard blocks: rG58fec78231dc: Revert "[UnifyLoopExits] Reduce number of guard blocks".
Jul 14 2022, 8:45 AM · Restricted Project, Restricted Project
bcahoon added a reverting change for rGf1b05a0a2bbb: [StructurizeCFG] Improve basic block ordering: rGc945d88d2b88: Revert "[StructurizeCFG] Improve basic block ordering".
Jul 14 2022, 7:45 AM · Restricted Project, Restricted Project
bcahoon committed rGc945d88d2b88: Revert "[StructurizeCFG] Improve basic block ordering" (authored by bcahoon).
Jul 14 2022, 7:45 AM · Restricted Project, Restricted Project
bcahoon added a reverting change for D123231: [StructurizeCFG] Improve basic block ordering: rGc945d88d2b88: Revert "[StructurizeCFG] Improve basic block ordering".
Jul 14 2022, 7:45 AM · Restricted Project, Restricted Project

Jun 28 2022

bcahoon added a reviewer for D128001: apply DivergenceAnalysis for SLU: sameerds.
Jun 28 2022, 7:39 AM · Restricted Project, Restricted Project

Jun 27 2022

bcahoon added a comment to D123231: [StructurizeCFG] Improve basic block ordering.

The issue @critson reported in the related change reminds me to take a careful look at this change. I think there is one critical correctness issue that needs to be fixed. When we are moving the exit block into the loop, we need to make sure there is no convergent operation in the exit block; otherwise we may change the set of threads that participate in the convergent operation if the threads exit the loop non-uniformly. So we have to scan all the instructions in the exit block before making the reorder decision.
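
The scan described above could look roughly like the following. This is a simplified stand-in rather than the StructurizeCFG code: Inst, Block, hasConvergentOp, and mayMoveExitIntoLoop are hypothetical names, and a real pass would query the LLVM instruction itself instead of a boolean field.

```cpp
#include <vector>

// Hypothetical stand-in for an instruction; only the property that
// matters for this check is modeled.
struct Inst {
  bool isConvergent;
};

using Block = std::vector<Inst>;

// Scan every instruction in the exit block for a convergent operation.
bool hasConvergentOp(const Block &exitBlock) {
  for (const Inst &inst : exitBlock) {
    if (inst.isConvergent)
      return true;
  }
  return false;
}

// Reordering is only considered when the exit block is free of
// convergent operations.
bool mayMoveExitIntoLoop(const Block &exitBlock) {
  return !hasConvergentOp(exitBlock);
}
```

The check is deliberately conservative: a single convergent operation anywhere in the exit block is enough to leave the block where it is.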

Jun 27 2022, 2:11 PM · Restricted Project, Restricted Project

Jun 25 2022

bcahoon added a comment to D123230: [UnifyLoopExits] Reduce number of guard blocks.

Hi @critson. Thanks for reducing the test case. I don't see an error with the test case though, so I may be missing something. Or, perhaps I need to see the complete test case? Here's my understanding. With the patch, control flow still converges at TrueExit. It's just that the Exit3 and Exit4 blocks have moved in the CFG so that they appear prior to loop.exit.guard rather than after. Following the exit paths from F, G, and C, I think that should be ok?

Jun 25 2022, 10:58 AM · Restricted Project, Restricted Project

Jun 24 2022

bcahoon added a comment to D123230: [UnifyLoopExits] Reduce number of guard blocks.

Hi @critson, thanks for pointing out the issue, and a potential fix. I'm looking at it now.

Jun 24 2022, 10:13 AM · Restricted Project, Restricted Project

Jun 22 2022

bcahoon committed rGf1b05a0a2bbb: [StructurizeCFG] Improve basic block ordering (authored by bcahoon).
Jun 22 2022, 2:13 PM · Restricted Project, Restricted Project
bcahoon closed D123231: [StructurizeCFG] Improve basic block ordering.
Jun 22 2022, 2:13 PM · Restricted Project, Restricted Project
bcahoon committed rGe13248ab0e79: [UnifyLoopExits] Reduce number of guard blocks (authored by bcahoon).
Jun 22 2022, 1:49 PM · Restricted Project, Restricted Project
bcahoon closed D123230: [UnifyLoopExits] Reduce number of guard blocks.
Jun 22 2022, 1:49 PM · Restricted Project, Restricted Project

Jun 19 2022

bcahoon added a comment to D123231: [StructurizeCFG] Improve basic block ordering.

I had tried a different approach to avoid inserting an excessive number of boolean values during the loop-exit-unify in D127831. I just did some testing of that change against the LLVM IR Brendon shared with me. It shows the change could help reduce the number of registers as well as compile time. But it is sad that I still hit the error: "unhandled SGPR spill to memory" from SGPRSpillBuilder in SIRegisterInfo.cpp. Can the limitation be fixed? I did some register pressure comparison, and it seems the way I proposed would use much less VGPR than (D123230 + D123231), but more SGPR. I haven't looked further into why there is such a behavior difference. I think we need more investigation to know why. But it looks like D127831 might help us generate better code because we can use one VGPR as the backup storage for spilling of 64/32 SGPRs. And the idea used there is much easier to follow.

Jun 19 2022, 4:32 PM · Restricted Project, Restricted Project
bcahoon added inline comments to D123231: [StructurizeCFG] Improve basic block ordering.
Jun 19 2022, 4:27 PM · Restricted Project, Restricted Project
bcahoon updated the diff for D123231: [StructurizeCFG] Improve basic block ordering.

Updates to the comments in the improve-order.ll test, based upon reviewer suggestions.

Jun 19 2022, 4:12 PM · Restricted Project, Restricted Project

Jun 16 2022

bcahoon accepted D127971: [MachinePipeliner] Handle failing constrainRegClass.

Thanks for making this change. It's an interesting case. We handled a related issue, with subregisters, by adding code to preprocessPhiNodes to create copies. I don't have a problem with the approach in this patch. There may be other cases in this code that need a similar change? Not sure...

Jun 16 2022, 4:17 PM · Restricted Project, Restricted Project

Jun 14 2022

bcahoon added a comment to D123693: Transform illegal intrinsics to V_ILLEGAL.

The rationale is for library code that looks like:
void example(float *p, float v) {
  // Use the hardware atomic only on the ISA versions listed here.
  if (__ISA_version == 9008 || __ISA_version == 9010)
    global_atomic_fadd(p, v);
  else
    generic_atomic_fadd(p, v);
}

Jun 14 2022, 6:52 AM · Restricted Project, Restricted Project

Jun 13 2022

bcahoon updated the diff for D123231: [StructurizeCFG] Improve basic block ordering.

No changes with this patch. It's been a while since this patch was added. I was investigating an internal test failure, which turns out to be unrelated to this patch. So, I'd like to move forward with reviewing this again.

Jun 13 2022, 7:41 AM · Restricted Project, Restricted Project

May 24 2022

bcahoon accepted D126009: [AMDGPU] Enforce alignment of image vaddr on gfx90a.
May 24 2022, 9:42 AM · Restricted Project, Restricted Project

May 20 2022

bcahoon added a comment to D126009: [AMDGPU] Enforce alignment of image vaddr on gfx90a.

This looks good. Just a couple of minor comments.

May 20 2022, 8:01 PM · Restricted Project, Restricted Project

Apr 24 2022

bcahoon added inline comments to D124267: [MachinePipeliner] Fix unscheduled instruction.
Apr 24 2022, 7:08 PM · Restricted Project, Restricted Project

Apr 22 2022

bcahoon added inline comments to D124267: [MachinePipeliner] Fix unscheduled instruction.
Apr 22 2022, 1:43 PM · Restricted Project, Restricted Project
bcahoon added a reviewer for D124267: [MachinePipeliner] Fix unscheduled instruction: sgundapa.

Hi @sgundapa, just checking if you made a similar change that wasn't upstreamed?

Apr 22 2022, 8:44 AM · Restricted Project, Restricted Project

Apr 14 2022

bcahoon added a comment to D123231: [StructurizeCFG] Improve basic block ordering.

which increases compilation time and register pressure.

Have you looked at which part is responsible for the compilation time increase? Is it possible that we hit an inefficiency in a certain pass?
The "register pressure" here specifically means SGPR usage, right?

Apr 14 2022, 7:49 AM · Restricted Project, Restricted Project
bcahoon updated the diff for D123231: [StructurizeCFG] Improve basic block ordering.

Made several changes based upon suggestions and to fix a correctness issue:

  • The new function, reorderNodes, must appear before collectInfos because collectInfos collects branch predicate information based upon the order of the nodes, so the wrong branch information was used after nodes were reordered.
  • Changed the name of the function from improveNodeOrder to reorderNodes, because changing the order is an improvement only under certain conditions.
  • Only reorder the nodes for large regions. The size of the region can be changed with a command-line option. The issues fixed by this patch are seen only with very large regions, so this flag limits the impact of this patch. I'm open to other ways to do this.
  • Since reorderNodes appears before collectInfos, it no longer can use Visited. Instead, reorderNodes keeps track of the basic blocks in Order and uses that as part of the criteria to reorder. We only want to reorder a node that has a predecessor in Order.
Apr 14 2022, 7:19 AM · Restricted Project, Restricted Project

Apr 11 2022

bcahoon added reviewers for D123231: [StructurizeCFG] Improve basic block ordering: foad, nhaehnle.
Apr 11 2022, 12:20 PM · Restricted Project, Restricted Project

Apr 9 2022

bcahoon added inline comments to D123230: [UnifyLoopExits] Reduce number of guard blocks.
Apr 9 2022, 2:46 PM · Restricted Project, Restricted Project
bcahoon updated the diff for D123230: [UnifyLoopExits] Reduce number of guard blocks.

Messed up the last revision by including another patch. This hopefully fixes it. The only new change is an additional test case, three_loops, added to test/Transforms/UnifyLoopExits/reduce_guards.ll

Apr 9 2022, 2:44 PM · Restricted Project, Restricted Project
bcahoon updated the diff for D123230: [UnifyLoopExits] Reduce number of guard blocks.

Added a test to show the case when we need to move the guard block predecessor from an outer loop to an inner loop. Three nested loops are needed for this. Initially, the guard block predecessor is added to the outer loop, but it then needs to change to the middle loop.

Apr 9 2022, 2:27 PM · Restricted Project, Restricted Project

Apr 7 2022

bcahoon added inline comments to D123230: [UnifyLoopExits] Reduce number of guard blocks.
Apr 7 2022, 10:08 AM · Restricted Project, Restricted Project
bcahoon updated the diff for D123230: [UnifyLoopExits] Reduce number of guard blocks.

Changes based upon review comments.

Apr 7 2022, 10:02 AM · Restricted Project, Restricted Project

Apr 6 2022

bcahoon added a comment to D122672: [CodeGen][ARM] Enable Swing Module Scheduling for ARM.

It's great to see extensions to the pipeliner so that it works on more targets and loops that don't assume a hardware loop. A little while back a couple of functions, see shouldIgnoreForPipelining, were added to support regular loops. However, I don't see that function used. Perhaps that effort was never completely finished? Do we need to get rid of that function, or try to combine the work you've done with that work?

Apr 6 2022, 6:39 PM · Restricted Project, Restricted Project
bcahoon requested review of D123231: [StructurizeCFG] Improve basic block ordering.
Apr 6 2022, 9:08 AM · Restricted Project, Restricted Project
bcahoon requested review of D123230: [UnifyLoopExits] Reduce number of guard blocks.
Apr 6 2022, 9:05 AM · Restricted Project, Restricted Project

Feb 22 2022

bcahoon committed rG3cc15e2cb657: [SLP] Fix assert from non-constant index in insertelement (authored by bcahoon).
Feb 22 2022, 2:01 PM
bcahoon closed D120223: [SLP] Fix assert from non-constant index in insertelement.
Feb 22 2022, 2:01 PM · Restricted Project
bcahoon updated the diff for D120223: [SLP] Fix assert from non-constant index in insertelement.

Changes based upon review. Don't just skip over insertelements with a
non-constant index. Instead, fall through to include additional cost
calculation.

Feb 22 2022, 12:55 PM · Restricted Project
bcahoon added inline comments to D120223: [SLP] Fix assert from non-constant index in insertelement.
Feb 22 2022, 11:28 AM · Restricted Project
bcahoon added inline comments to D120223: [SLP] Fix assert from non-constant index in insertelement.
Feb 22 2022, 11:13 AM · Restricted Project
bcahoon added inline comments to D120223: [SLP] Fix assert from non-constant index in insertelement.
Feb 22 2022, 10:06 AM · Restricted Project

Feb 20 2022

bcahoon requested review of D120223: [SLP] Fix assert from non-constant index in insertelement.
Feb 20 2022, 4:43 PM · Restricted Project

Jan 31 2022

bcahoon accepted D118272: AMDGPU: Add code object v5 support and generate metadata for implicit kernel args.

There are some clang-format issues to be addressed before committing. Other than that, the patch LGTM. Thanks!

Jan 31 2022, 5:09 PM · Restricted Project

Jan 7 2022

bcahoon accepted D116803: [GlobalISel] Use getPreferredShiftAmountTy in one more G_UBFX combine.

Looks good to me.

Jan 7 2022, 12:30 PM · Restricted Project
bcahoon accepted D116802: [GlobalISel] Fix legality checks for G_UBFX combines.

Looks good to me. Thanks!

Jan 7 2022, 12:27 PM · Restricted Project

Dec 23 2021

bcahoon committed rGd45a24799899: [AMDGPU] Don't remove VGPR to AGPR dead spills from frame info (authored by bcahoon).
Dec 23 2021, 9:12 AM
bcahoon closed D115996: [AMDGPU] Don't remove VGPR to AGPR dead spills from frame info.
Dec 23 2021, 9:12 AM · Restricted Project

Dec 22 2021

bcahoon updated the diff for D115996: [AMDGPU] Don't remove VGPR to AGPR dead spills from frame info.

Changes so that the VGPR to AGPR spill slot can be removed if stack slot coloring has not allocated multiple objects to the same slot. Added a couple extra test cases to check for different cases when the slot can/can't be removed.

Dec 22 2021, 3:08 PM · Restricted Project