alex-t (Alexander)
User

Projects

User does not belong to any projects.

User Details

User Since
Jul 26 2016, 7:17 AM (102 w, 5 d)

Recent Activity

Jun 4 2018

alex-t added a comment to D46298: AMDGPU: Move isSDNodeSourceOfDivergence() implementation to SITargetLowering.
Should I just disable this analysis completely for r600?
Jun 4 2018, 3:34 AM
alex-t accepted D47148: [CodeGen] Always update divergence in SelectionDAG::UpdateNodeOperands.
Jun 4 2018, 3:25 AM
alex-t added a comment to D47148: [CodeGen] Always update divergence in SelectionDAG::UpdateNodeOperands.

My apologies for the delay. Thanks for handling this. LGTM.

Jun 4 2018, 3:24 AM

May 24 2018

alex-t added a comment to D46298: AMDGPU: Move isSDNodeSourceOfDivergence() implementation to SITargetLowering.

I agree isVGPR should be illegal to call for R600

May 24 2018, 3:02 AM

May 21 2018

alex-t accepted D47151: [AMDGPU] Add divergence analysis as a dependency for ISel.

Thanks for catching this

May 21 2018, 11:21 AM

May 3 2018

alex-t added a comment to D46298: AMDGPU: Move isSDNodeSourceOfDivergence() implementation to SITargetLowering.

Could you please clarify why you consider that check meaningless for r600?
I see that this line: "const SISubtarget &ST = MF->getSubtarget<SISubtarget>();" is misleading and in fact is not correct.
I'd better check and choose between R600Subtarget and SISubtarget.
If I understand correctly, we just need to check which subtarget to retrieve for the physregs check.

May 3 2018, 4:49 AM

Apr 25 2018

alex-t added a comment to D40556: SIFixSGPRCopies should not change non-divergent PHI.

Sorry for the delay. Just submitted to trunk.
It was not only about the revert patch; I had to change the tests accordingly.

Apr 25 2018, 5:40 AM
alex-t committed rL330818: [AMDGPU] Revert b0efc4fd6 (https://reviews.llvm.org/D40556).
Apr 25 2018, 5:36 AM

Apr 16 2018

alex-t added a comment to D45372: [AMDGPU] Fix issues for backend divergence tracking.
Apr 16 2018, 8:01 AM

Apr 13 2018

alex-t added a comment to D40556: SIFixSGPRCopies should not change non-divergent PHI.

I have no idea how to backport yet.
I'm going to submit a revert patch.
This change has existed for a few months and I'm not sure what may be broken
if it is reverted. I have to figure this out. Some pre-checkin tests are needed.

Apr 13 2018, 12:57 AM

Apr 10 2018

alex-t accepted D45372: [AMDGPU] Fix issues for backend divergence tracking.
Apr 10 2018, 6:37 AM
alex-t added a comment to D40556: SIFixSGPRCopies should not change non-divergent PHI.

In fact, yes. This is the most correct solution. The stuff in
SIFixSGPRCopies.cpp has got to go,
so it does not make sense to spend effort on it. Moreover, there is no
correct solution at all
unless we implement one more DivergenceAnalysis on the machine IR.
So, if it's okay, I'd prefer to revert.

Apr 10 2018, 6:34 AM
alex-t added a comment to D45372: [AMDGPU] Fix issues for backend divergence tracking.

In general LGTM.
I am also concerned about the magic test that has no checks and no visible side effects on shared data but is necessary to reproduce the buggy behavior.
Could you please clarify what you need it for?

Apr 10 2018, 6:28 AM

Apr 9 2018

alex-t added a comment to D40556: SIFixSGPRCopies should not change non-divergent PHI.

It seems like this patch should be reverted. Currently we have no reliable
way to determine PHI (or any other) divergence at the MI level.
The only correct way is to add a DA algorithm that recomputes all the MI
divergence.
Since we're going to remove the SGPRFix stuff as soon as divergence-driven
ISel is ready, I consider this patch unnecessary.

Apr 9 2018, 4:37 AM

Mar 27 2018

alex-t accepted D40547: AMDGPU: Fix copying i1 value out of loop with non-uniform exit.
Mar 27 2018, 6:04 AM
alex-t accepted D43743: StructurizeCFG: Test for branch divergence correctly.
Mar 27 2018, 6:04 AM

Mar 17 2018

alex-t added a comment to D40556: SIFixSGPRCopies should not change non-divergent PHI.

I'm looking into this.

Mar 17 2018, 8:20 AM

Mar 15 2018

alex-t updated subscribers of D40556: SIFixSGPRCopies should not change non-divergent PHI.

Hi Samuel,

Mar 15 2018, 9:29 AM

Mar 14 2018

alex-t committed rL327488: [AMDGPU] Fix for DAGCombiner infinite loop in OCLtst.
Mar 14 2018, 2:51 AM
alex-t closed D44417: Fix for DAGCombiner infinite loop in AMDGPU OCLtst.
Mar 14 2018, 2:51 AM

Mar 13 2018

alex-t created D44417: Fix for DAGCombiner infinite loop in AMDGPU OCLtst.
Mar 13 2018, 4:38 AM

Mar 5 2018

alex-t committed rL326703: Pass Divergence Analysis data to Selection DAG to drive divergence.
Mar 5 2018, 7:17 AM
alex-t closed D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.
Mar 5 2018, 7:17 AM
alex-t updated the diff for D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.
Mar 5 2018, 6:00 AM

Mar 2 2018

alex-t updated the diff for D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.
Mar 2 2018, 9:21 AM
alex-t updated the diff for D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.

ready to land

Mar 2 2018, 5:39 AM

Mar 1 2018

alex-t committed rL326451: [AMDGPU] : fix for the crash in SIRegisterInfo when the regiser class not found.
Mar 1 2018, 9:41 AM

Feb 26 2018

alex-t added a comment to D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.

make check-llvm has passed

Feb 26 2018, 12:00 PM
alex-t updated the diff for D43334: AMDGPU: fix for SIRegisterInfo::isVGPR() crash.

Following the discussion: only the register classes that appeared necessary according to the test coverage were added.
This preview can be the starting point for a discussion of the proper approach.

Feb 26 2018, 11:24 AM
alex-t updated the diff for D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.

One test fixed

Feb 26 2018, 11:20 AM
alex-t added a comment to D43334: AMDGPU: fix for SIRegisterInfo::isVGPR() crash.

One more question: where should R600 reg classes be processed?
Should we implement getRegClass in R600RegisterInfo?
Or is it okay to handle them all in SIRegisterInfo?

Feb 26 2018, 9:54 AM
alex-t added inline comments to D43334: AMDGPU: fix for SIRegisterInfo::isVGPR() crash.
Feb 26 2018, 7:50 AM
alex-t updated the diff for D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.

Formatting fixed.
DAG divergence verification for "divergent" targets only.

Feb 26 2018, 3:09 AM

Feb 22 2018

alex-t added inline comments to D43334: AMDGPU: fix for SIRegisterInfo::isVGPR() crash.
Feb 22 2018, 11:41 AM
alex-t updated the diff for D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.

Verification algorithm of linear complexity

Feb 22 2018, 11:36 AM

Feb 21 2018

alex-t updated the diff for D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.

This is a preview of the implementation that provides walk-through consistency of the divergence bits.
Please note that the verification algorithm has polynomial complexity and is expected to be switched ON/OFF by an option (coming soon) that defaults to OFF.

Feb 21 2018, 12:22 PM
alex-t accepted D40546: StructurizeCFG: Test for branch divergence correctly.
Feb 21 2018, 7:46 AM

Feb 20 2018

alex-t added a comment to D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.
Feb 20 2018, 6:02 AM
alex-t added inline comments to D43334: AMDGPU: fix for SIRegisterInfo::isVGPR() crash.
Feb 20 2018, 2:43 AM

Feb 16 2018

alex-t added a comment to D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.

ping @efriedma

Feb 16 2018, 7:39 AM

Feb 15 2018

alex-t updated the diff for D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.

Some bugfixes and changes according to the reviewers' requirements.

Feb 15 2018, 9:03 AM
alex-t created D43334: AMDGPU: fix for SIRegisterInfo::isVGPR() crash.
Feb 15 2018, 6:56 AM
alex-t added a reviewer for D43334: AMDGPU: fix for SIRegisterInfo::isVGPR() crash: rampitec.
Feb 15 2018, 6:56 AM
alex-t added a comment to D41651: AMDGPU: Add 32-bit constant address space.

In fact, v_readfirstlane is inserted by ISel to glue a vector input to an unexpected scalar instruction.
This means that a compiler user writing valid IR will get unexpected behavior.
Is this documented somewhere?

Feb 15 2018, 4:04 AM

Feb 14 2018

alex-t added inline comments to D41651: AMDGPU: Add 32-bit constant address space.
Feb 14 2018, 11:48 AM
alex-t added inline comments to D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.
Feb 14 2018, 1:00 AM

Feb 13 2018

alex-t added inline comments to D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.
Feb 13 2018, 10:06 AM
alex-t added inline comments to D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.
Feb 13 2018, 7:34 AM
alex-t added inline comments to D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.
Feb 13 2018, 6:30 AM

Feb 9 2018

alex-t updated the diff for D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.

Preliminary revision illustrating a possible approach to keeping divergence information consistent across DAG transformations.

Feb 9 2018, 7:42 AM
alex-t added a comment to D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.

If we're going to include the "divergent" bit in SDNodes, so we can query it all the time, the bit needs to be correct all the time. The goal of a verifier is to ensure that at any given point, the bits stored in the SelectionDAG are the same as the bits we would compute from scratch. So code still needs to do the right thing to update the divergence bits, if necessary, but the verifier lets us catch mistakes early. This is similar to the way we have a domtree verifier, to ensure transforms correctly update the domtree.

Feb 9 2018, 5:08 AM

Feb 8 2018

alex-t added inline comments to D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.
Feb 8 2018, 2:18 AM
alex-t added a comment to D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.

I'd like to see a verifier somewhere that the divergence bit is still correct after DAGCombine (it could be different from what SelectionDAG::createOperands would compute given how ReplaceAllUsesWith works).

Feb 8 2018, 2:14 AM

Feb 5 2018

alex-t updated the diff for D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.

Here is an alternative implementation based on the TargetLoweringInfo hooks.

Feb 5 2018, 11:12 AM

Jan 31 2018

alex-t added a comment to D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.
  1. FunctionLoweringInfo::ValueMap is created during the SelectionDAGBuilder walk through the BasicBlock. So we cannot query live-in register divergence from the CreateOperands => TargetLoweringInfo::isSDNodeSourceOfDivergence. By this point ValueMap has not yet been filled in.

Really? I thought we fill it in before we actually start building the SelectionDAG (in FunctionLoweringInfo::set). But you can move it earlier if you need to.

All of the above means that we cannot just validate the flag values and assert if they do not match. We have to run an iterative solver for each block just before selection to account for the control dependencies and to propagate the flag values.

Jan 31 2018, 9:06 AM

Jan 30 2018

alex-t added inline comments to D40546: StructurizeCFG: Test for branch divergence correctly.
Jan 30 2018, 7:00 AM

Jan 29 2018

alex-t added a comment to D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.

One more item that should be discussed is the target-specific exceptions to the common divergence modeling algorithm.
For instance, in the AMDGPU target we have the amdgcn.readfirstlane/readlane intrinsics. They accept a vector register and return the value of the first lane or of a specific lane.
So both accept a naturally divergent VGPR but return a scalar value.
Following the common divergence computing algorithm ("the divergence of an operation's result is the superposition of its operands' divergence"), we would mark %scalar = tail call i32 @llvm.amdgcn.readfirstlane(i32 %tid) as divergent, which is not true.
In the IR form of the divergence-driven selection we rely on the TargetTransformInfo::isAlwaysUniform hook that was added to the interface for this purpose.
It allows the target to declare an arbitrary set of target operations as "always uniform" so that the analysis does not take their operands' divergence into account.
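
The exception described above can be illustrated with a small standalone sketch. It is not LLVM code: ToyOp, Kind and computeDivergence are hypothetical names, and the operations are assumed to be stored in topological order (operands before users). It only shows how an "always uniform" operation cuts the propagation of its operands' divergence.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical classification of an operation for divergence purposes.
enum class Kind { Ordinary, SourceOfDivergence, AlwaysUniform };

// Hypothetical stand-in for an instruction: its kind plus operand indices.
struct ToyOp {
  Kind K;
  std::vector<std::size_t> Operands; // indices of earlier operations
};

// Common rule: a result is divergent if the operation is a source of
// divergence or any operand is divergent. Exception: an "always uniform"
// operation (e.g. a readfirstlane-like intrinsic) stays uniform.
std::vector<bool> computeDivergence(const std::vector<ToyOp> &Ops) {
  std::vector<bool> Div(Ops.size(), false);
  for (std::size_t I = 0; I < Ops.size(); ++I) {
    if (Ops[I].K == Kind::AlwaysUniform)
      continue; // operand divergence is deliberately ignored
    bool D = Ops[I].K == Kind::SourceOfDivergence;
    for (std::size_t Op : Ops[I].Operands)
      D = D || Div[Op];
    Div[I] = D;
  }
  return Div;
}
```

With a thread-ID-like source at index 0 and an always-uniform operation consuming it at index 1, the result at index 1 and all of its users remain uniform, while other users of index 0 become divergent.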

Jan 29 2018, 5:11 AM
alex-t added a comment to D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.

Please also note that this addition does not depend on "tid" or any other divergent data. It is not possible to discover this dependency by analyzing an individual block. We need CFG information.

Yes, this is what I was getting at with "We should query the IR DivergenceAnalysis to compute isSDNodeSourceOfDivergence for a CopyFromReg from a live-in virtual register."; the nodes which need CFG information are precisely CopyFromReg nodes from virtual registers. Each virtual register created by the SelectionDAGBuilder should correspond to exactly one IR instruction.

Jan 29 2018, 5:01 AM
alex-t added a comment to D40547: AMDGPU: Fix copying i1 value out of loop with non-uniform exit.

If I understand everything correctly...
The problem you're trying to solve is well known.
You have a divergent loop exit and a value that is uniformly defined inside the loop but used outside the loop.

More or less. However, whether the value is uniform or not doesn't really make a difference: I can change the test case so that %cc is non-uniform, and the same issue occurs. So this isn't really about DivergenceAnalysis.

Could you please look here: https://reviews.llvm.org/D40556

Could you use the same approach?

You have 2 blocks: defBlock and useBlock and you want to know:

  1. Is useBlock control dependent on defBlock?
  2. If 1 is true, is defBlock's terminating branch uniform?

The set of control dependencies for defBlock is its post-dominance frontier set.
The set of control dependencies for useBlock is its post-dominance frontier set.
We need to check the branches that are NOT common to the two sets above.
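
The proposed check ("branches that are NOT common" to the two post-dominance frontier sets) amounts to a symmetric difference of the two sets. A minimal sketch, assuming the frontier sets and the set of divergent branch blocks are already known and that blocks are identified by name; BlockSet, nonCommonControlDeps and valueStaysUniform are hypothetical names, not LLVM APIs:

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

using BlockSet = std::set<std::string>; // blocks identified by name

// Blocks whose branch controls exactly one of defBlock and useBlock,
// i.e. the symmetric difference of the two post-dominance frontiers.
std::vector<std::string> nonCommonControlDeps(const BlockSet &DefPDF,
                                              const BlockSet &UsePDF) {
  std::vector<std::string> Out;
  std::set_symmetric_difference(DefPDF.begin(), DefPDF.end(),
                                UsePDF.begin(), UsePDF.end(),
                                std::back_inserter(Out));
  return Out;
}

// True when none of the non-common controlling branches is divergent.
bool valueStaysUniform(const BlockSet &DefPDF, const BlockSet &UsePDF,
                       const BlockSet &DivergentBranches) {
  for (const std::string &B : nonCommonControlDeps(DefPDF, UsePDF))
    if (DivergentBranches.count(B))
      return false;
  return true;
}
```

This only restates the proposal; whether the post-dominance frontiers carry enough information for a given test case is a separate question.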

I don't think this works, but perhaps I'm misunderstanding you. In the test case which I've added, the defBlock is %for.body, and the useBlock is %for.end.

%for.end post-dominates the entire loop, so its post-dominance frontier is empty.

%for.body post-dominates %entry and %end.loop, so its PDF is only %mid.loop.

None of that information seems to help?

Jan 29 2018, 3:56 AM

Jan 23 2018

alex-t added a comment to D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.

Specifically which nodes are a problem here? We should query the IR DivergenceAnalysis to compute isSDNodeSourceOfDivergence for a CopyFromReg from a live-in virtual register. (Not sure there's an existing map from registers to values, but you could easily construct one; basically the inverse of FunctionLoweringInfo::ValueMap.)

Jan 23 2018, 5:42 AM

Jan 16 2018

alex-t added a comment to D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.

This is only true if your original computation is correct, and if DAGCombine/Legalization doesn't create any nodes which are naturally divergent. Neither of those are safe assumptions, I think. DAGCombine and legalization will transform loads and stores, which could end up creating a naturally divergent node.

So, my question is: can you imagine even a theoretical sensible transformation that converts the graph in such a way that a uniform node gets a divergent input?

No, but that isn't the point. The problem is that you could replace a naturally divergent node with an equivalent naturally divergent node, but the new node doesn't have the divergent bit set (since the bit only gets set in DAGCombine for nodes with divergent operands, and naturally divergent nodes might not have divergent operands). Thinking about it a bit more, I guess regular load/store operations are a bad example; if a load produced multiple values given a uniform address, it would be a data race. But I think atomic memory operations could run into this issue? (Consider, for example, the code in DAGTypeLegalizer::PromoteIntRes_Atomic1.)

And some divergent nodes will never be passed to SelectionDAGBuilder::setValue when you build the DAG, due to the way SelectionDAGBuilder handles values with illegal types. But I'm not sure that's a complete list of the issues with the current version, and there's no practical way to check without a verifier.

Even if it creates a new DAG pattern, it returns its root, which (because of CreateOperands) has the correct divergence that will be passed to setValue. Or did I not understand what you meant?

That's not what I meant.

Say you have a call to a divergent function which returns an i64, but i64 isn't legal on your target (so the function effectively returns two values of type i32). We create the call, a couple CopyFromReg nodes, and then a MERGE_VALUES to merge the value. Then you set the MERGE_VALUES to be divergent... but that isn't really helpful: legalization for MERGE_VALUES erases the node, so the "divergent" bit goes away.

Jan 16 2018, 8:37 AM
alex-t updated the diff for D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.

This is a draft of the divergence analysis solver on the selection DAG. In the course of discussion, divergence bit verification was requested.
Analysis of a single block cannot cover control dependencies. Thus the divergence bits set from the IR, which reflect control dependencies, cannot match those computed on the DAG of one isolated block. That's why it is not exactly verification. The analysis performed on the DAG augments the divergence information passed from the IR.

Jan 16 2018, 8:26 AM

Dec 21 2017

alex-t added a comment to D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.

To start with, let's make sure that we agree on terms.
A divergent machine runs a set of threads (a warp or wavefront) that execute the same set of instructions in the same order (SIMT).
A divergent operation operates on "vector" registers such that each register consists of many lanes; each thread operates on the data in its corresponding lane.
From the above it immediately follows that the only source of divergence is the thread ID or any data derived from the thread ID.
Usually it is a small set of target intrinsics that may be the source of such data.

Dec 21 2017, 11:40 PM
alex-t added a comment to D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.

Actually, what I'd really like to see here is some sort of verifier for the divergent bit. It should be possible to recompute the divergence of the SelectionDAG at any point from first principles. There's a small set of operations which are fundamentally divergent: CopyFromReg where the register contains a divergent value (you should be able to derive this from DivergenceAnalysis), divergent memory accesses, and some target-specific intrinsics. (Not sure that's a complete list, but should be close.) All other operations are divergent if and only if they have a divergent predecessor.
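
The "recompute from first principles" idea can be sketched on a toy graph. This is only an illustration under stated assumptions: ToyNode, recomputeDivergence and verifyDivergence are hypothetical names rather than SelectionDAG APIs, and nodes are assumed to be stored in topological order (operands before users), so a single linear pass is enough.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an SDNode.
struct ToyNode {
  std::vector<std::size_t> Operands;  // indices of earlier nodes
  bool IsSourceOfDivergence = false;  // fundamentally divergent operation
  bool IsDivergent = false;           // cached bit that is being verified
};

// A node is divergent if it is a source of divergence or has a divergent
// operand; everything else is uniform.
std::vector<bool> recomputeDivergence(const std::vector<ToyNode> &Nodes) {
  std::vector<bool> Div(Nodes.size(), false);
  for (std::size_t I = 0; I < Nodes.size(); ++I) {
    bool D = Nodes[I].IsSourceOfDivergence;
    for (std::size_t Op : Nodes[I].Operands)
      D = D || Div[Op];
    Div[I] = D;
  }
  return Div;
}

// Returns true when every cached bit equals the recomputed value.
bool verifyDivergence(const std::vector<ToyNode> &Nodes) {
  std::vector<bool> Expected = recomputeDivergence(Nodes);
  for (std::size_t I = 0; I < Nodes.size(); ++I)
    if (Nodes[I].IsDivergent != Expected[I])
      return false;
  return true;
}
```

A transform that replaces a divergent node but forgets to set the cached bit on the replacement would make verifyDivergence return false, which is the kind of mistake such a verifier is meant to catch early.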

Dec 21 2017, 11:24 AM

Dec 13 2017

alex-t updated the diff for D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.

Divergence bit propagation added to ReplaceAllUsesWith
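
A rough sketch of what propagating the bit during a replace-all-uses step can look like, on a toy graph rather than the real SelectionDAG. UseNode, computeBit and replaceAllUsesWith are hypothetical names; the sketch assumes From and To are distinct nodes, the graph is acyclic, and each user is listed once in Users.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical node with both operand and user edges.
struct UseNode {
  std::vector<std::size_t> Operands;
  std::vector<std::size_t> Users;
  bool IsSource = false;    // fundamentally divergent operation
  bool IsDivergent = false; // cached divergence bit
};

// Recompute one node's bit from its current operands.
static bool computeBit(const std::vector<UseNode> &G, std::size_t N) {
  bool D = G[N].IsSource;
  for (std::size_t Op : G[N].Operands)
    D = D || G[Op].IsDivergent;
  return D;
}

// Redirect every use of From to To, then push bit changes to transitive
// users with a worklist until nothing changes.
void replaceAllUsesWith(std::vector<UseNode> &G, std::size_t From,
                        std::size_t To) {
  std::vector<std::size_t> Worklist;
  for (std::size_t U : G[From].Users) {
    for (std::size_t &Op : G[U].Operands)
      if (Op == From)
        Op = To;
    G[To].Users.push_back(U);
    Worklist.push_back(U);
  }
  G[From].Users.clear();

  while (!Worklist.empty()) {
    std::size_t N = Worklist.back();
    Worklist.pop_back();
    bool D = computeBit(G, N);
    if (D == G[N].IsDivergent)
      continue; // bit unchanged, users are unaffected
    G[N].IsDivergent = D;
    for (std::size_t U : G[N].Users)
      Worklist.push_back(U);
  }
}
```

Replacing a divergent node with a uniform one clears the bit on its users and on their users in turn; replacing a uniform node with a divergent one sets it the same way.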

Dec 13 2017, 8:58 AM

Dec 12 2017

alex-t added a comment to D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.

There actually can be a problem with folding the node if we patch it after creation. At least this needs to be checked.

That's true. The problem is that in SelectionDAG::getNode (where the CSEMap insertion is) we have no Value and no chance to check its divergence.
And this is correct: SelectionDAG is for selection and we should not expose the IR Values to it.

The only way I see is to pass the Divergence parameter to getNode from all the SelectionDAGBuilder visitors. This would be correct but requires changing each of the 109 visitors and getNode().

Dec 12 2017, 1:02 AM

Dec 11 2017

alex-t added a comment to D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.

Any DAG transformation that changes a divergent pattern to a non-divergent one or vice versa is illegal.

Transforming "x*0 -> 0" is illegal if x is divergent? That seems surprising.

Dec 11 2017, 2:30 AM

Dec 8 2017

alex-t added inline comments to D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.
Dec 8 2017, 5:06 AM
alex-t added a comment to D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.

There actually can be a problem with folding the node if we patch it after creation. At least this needs to be checked.

Dec 8 2017, 4:42 AM
alex-t added a comment to D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.

Does ReplaceAllUsesWith need to propagate changes to the "IsDivergent" bit?

Dec 8 2017, 4:38 AM

Dec 6 2017

alex-t updated the diff for D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.

Attention please! If nobody has objections this will be committed next Friday.

Dec 6 2017, 7:27 AM

Dec 5 2017

alex-t updated the diff for D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.

Targets that have no divergence do not depend on Divergence Analysis anymore.

Dec 5 2017, 10:53 AM

Dec 1 2017

alex-t committed rL319534: [AMDGPU] SiFixSGPRCopies should not modify non-divergent PHI.
Dec 1 2017, 3:57 AM
alex-t closed D40556: SIFixSGPRCopies should not change non-divergent PHI by committing rL319534: [AMDGPU] SiFixSGPRCopies should not modify non-divergent PHI.
Dec 1 2017, 3:57 AM

Nov 29 2017

alex-t updated the diff for D40556: SIFixSGPRCopies should not change non-divergent PHI.

updated according to the comments

Nov 29 2017, 6:04 AM
alex-t added a comment to D40547: AMDGPU: Fix copying i1 value out of loop with non-uniform exit.

If I understand everything correctly...
The problem you're trying to solve is well known.
You have a divergent loop exit and a value that is uniformly defined inside the loop but used outside the loop.
In this case different threads would have different values.
Traditional Divergence Analysis cannot handle this: since the definition inside the loop body is uniform, the use is considered uniform as well.
Since the value has no explicit data dependency on the loop index, the PHI node in the loop header (which is divergent if the loop exit is) does not formally affect its divergence.
The value does in fact have a loop-carried dependency. For example:

Nov 29 2017, 4:05 AM
alex-t added a comment to D40546: StructurizeCFG: Test for branch divergence correctly.

In general this looks correct to me. You definitely should check the branch itself, not the condition.
In your case (divergent loop exit) the branch itself is divergent because of the control dependency.

Nov 29 2017, 3:01 AM

Nov 28 2017

alex-t created D40556: SIFixSGPRCopies should not change non-divergent PHI.
Nov 28 2017, 6:38 AM

Nov 10 2017

alex-t added a reviewer for D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection: bogner.
Nov 10 2017, 4:40 AM
alex-t committed rL317884: [AMDGPU] Prevent Machine Copy Propagation from replacing live copy with the….
Nov 10 2017, 4:21 AM
alex-t closed D38754: Prevent Machine Copy Propagation from replacing live copy with the dead one by committing rL317884: [AMDGPU] Prevent Machine Copy Propagation from replacing live copy with the….
Nov 10 2017, 4:21 AM

Nov 9 2017

alex-t updated the diff for D35267: Pass Divergence Analysis data to selection DAG to drive divergence dependent instruction selection.

Implementation changed according to the reviewers' suggestions.

Nov 9 2017, 11:06 AM

Oct 20 2017

alex-t added a comment to D38754: Prevent Machine Copy Propagation from replacing live copy with the dead one.

In other words:

Oct 20 2017, 10:38 AM
alex-t added a comment to D38754: Prevent Machine Copy Propagation from replacing live copy with the dead one.

I'm not sure I understand what exactly is going on. dead and kill flags work just as well for subregisters; I don't see a single COPY in your MI excerpt so I'm not sure what is going on or how copy propagation could get it wrong.

Oct 20 2017, 10:23 AM

Oct 17 2017

alex-t updated the diff for D38754: Prevent Machine Copy Propagation from replacing live copy with the dead one.

Back to initial approach.

Oct 17 2017, 12:49 PM
alex-t closed D38293: Avoid predicated execution of the basic blocks containing scalar instructions.

r314828

Oct 17 2017, 12:23 PM
alex-t reopened D38754: Prevent Machine Copy Propagation from replacing live copy with the dead one.

The fix you suggested breaks x86 backend on SingleSource/Benchmarks/Adobe-C++/CMakeFiles/simple_types_loop_invariant.dir/simple_types_loop_invariant.cpp

Oct 17 2017, 12:15 PM
alex-t reopened D38293: Avoid predicated execution of the basic blocks containing scalar instructions.
Oct 17 2017, 12:03 PM

Oct 16 2017

alex-t committed rL315916: [AMDGPU] : revert r315908.
Oct 16 2017, 9:58 AM
alex-t committed rL315908: [AMDGPU] Prevent Machine Copy Propagation from replacing live copy with the….
Oct 16 2017, 7:35 AM
alex-t closed D38754: Prevent Machine Copy Propagation from replacing live copy with the dead one by committing rL315908: [AMDGPU] Prevent Machine Copy Propagation from replacing live copy with the….
Oct 16 2017, 7:35 AM

Oct 13 2017

alex-t updated the diff for D38754: Prevent Machine Copy Propagation from replacing live copy with the dead one.

Fixed according to Matthias's suggestion.

Oct 13 2017, 8:32 AM

Oct 11 2017

alex-t added a comment to D38754: Prevent Machine Copy Propagation from replacing live copy with the dead one.

Do you know whether the situation is already detected by DetectDeadLanes? I'd be more open to add dead code elimination to that pass (as we could run it instead of dead code elimination I believe).

Oct 11 2017, 12:24 PM
alex-t added a comment to D38754: Prevent Machine Copy Propagation from replacing live copy with the dead one.
Oct 11 2017, 7:54 AM

Oct 10 2017

alex-t created D38754: Prevent Machine Copy Propagation from replacing live copy with the dead one.
Oct 10 2017, 11:43 AM

Oct 3 2017

alex-t committed rL314828: [AMDGPU] Avoid predicated execution of the basic blocks containing scalar.
Oct 3 2017, 11:57 AM
alex-t closed D38293: Avoid predicated execution of the basic blocks containing scalar instructions by committing rL314828: [AMDGPU] Avoid predicated execution of the basic blocks containing scalar.
Oct 3 2017, 11:57 AM
alex-t updated the diff for D38293: Avoid predicated execution of the basic blocks containing scalar instructions.

Reverted back to the simplest approach.

Oct 3 2017, 10:06 AM

Oct 2 2017

alex-t updated the diff for D38293: Avoid predicated execution of the basic blocks containing scalar instructions.

It really makes sense to take care of the V_READFIRSTLANE/V_READLANE destination register under the exec == 0 condition in case their source VGPR is redefined in the SI_MASK_BRANCH target block. Otherwise we assume that the source VGPR is defined in one of the dominating blocks and contains the correct value.

Oct 2 2017, 1:50 PM
alex-t updated the diff for D38293: Avoid predicated execution of the basic blocks containing scalar instructions.
  1. Added the code that checks that the scalar register produced by V_READFIRSTLANE/V_READLANE is really used by some SALU instruction. This is necessary to avoid de-optimization of those cases when the scalar register is distinctly the scalar operand of a vector instruction.

Oct 2 2017, 9:17 AM