
alex-t (Alexander)
User

Projects

User does not belong to any projects.

User Details

User Since
Jul 26 2016, 7:17 AM (286 w, 2 d)

Recent Activity

Today

alex-t added a comment to D116270: [AMDGPU] Enable divergence-driven XNOR selection.

Initial DAG (image not preserved)

DAG after the transformation and constant folding (image not preserved)

This can be selected to v_perm_b32

Thu, Jan 20, 9:30 AM · Restricted Project
alex-t added inline comments to D116270: [AMDGPU] Enable divergence-driven XNOR selection.
Thu, Jan 20, 9:04 AM · Restricted Project
alex-t updated the diff for D116270: [AMDGPU] Enable divergence-driven XNOR selection.

condition was made more readable

Thu, Jan 20, 7:09 AM · Restricted Project

Mon, Jan 17

alex-t updated the diff for D116270: [AMDGPU] Enable divergence-driven XNOR selection.

DAG combiner hook added to control divergence-driven peephole optimizations.

Mon, Jan 17, 12:29 PM · Restricted Project

Thu, Jan 13

alex-t added a comment to D116270: [AMDGPU] Enable divergence-driven XNOR selection.

In general, I don't like the idea of making the DAGCombiner::reassociateOpsCommutative take into account the divergence.

Thu, Jan 13, 6:27 AM · Restricted Project

Tue, Jan 11

alex-t added a comment to D116270: [AMDGPU] Enable divergence-driven XNOR selection.

I think it is worth trying to change this generic combine to give up if x is uniform and y is divergent.

Tue, Jan 11, 12:11 PM · Restricted Project
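Below is a minimal sketch of the guard suggested in the quoted comment. It uses a toy node type rather than the real SelectionDAG classes. It assumes the combine in question is the generic (op (op x, c), y) -> (op (op x, y), c) reassociation performed by DAGCombiner::reassociateOpsCommutative. `Node`, `matchOpWithConst` and `shouldReassociate` are hypothetical names.

```cpp
#include <cassert>

// Toy DAG node: an opcode, up to two operands and a divergence bit.
// Hypothetical model for illustration; this is not llvm::SDNode.
enum class Op { Value, Const, Xor };

struct Node {
  Op op;
  bool divergent;             // true if the value may differ between lanes
  const Node *lhs = nullptr;
  const Node *rhs = nullptr;
  long long imm = 0;          // only meaningful for Op::Const
};

// Matches (xor x, c) with a constant c and returns x and c via out-parameters.
bool matchOpWithConst(const Node &n, const Node *&x, const Node *&c) {
  if (n.op != Op::Xor || !n.lhs || !n.rhs) return false;
  if (n.rhs->op == Op::Const) { x = n.lhs; c = n.rhs; return true; }
  if (n.lhs->op == Op::Const) { x = n.rhs; c = n.lhs; return true; }
  return false;
}

// Should (op (op x, c), y) -> (op (op x, y), c) fire?
// Hypothetical guard: give up when x is uniform and y is divergent. In that
// case (op x, c) is uniform and can stay on the SALU (e.g. s_not_b32 for
// c == -1), whereas the reassociated (op x, y) would be divergent and drag
// both operations onto the VALU.
bool shouldReassociate(const Node &n0, const Node &y) {
  const Node *x = nullptr;
  const Node *c = nullptr;
  if (!matchOpWithConst(n0, x, c)) return false;
  if (!x->divergent && y.divergent) return false;
  return true;
}
```

With x = %s.load (uniform) and y = %v (divergent), the guard keeps ~%s.load as a separate uniform node.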
alex-t added a comment to D116270: [AMDGPU] Enable divergence-driven XNOR selection.

Once again, in my case BOTH nodes (not, xor) are divergent!

 %s.load = load i32, i32 addrspace(4)* %s.kernarg.offset.cast, align 4, !invariant.load !0
DIVERGENT:       %v = call i32 @llvm.amdgcn.workitem.id.x(), !range !1
DIVERGENT:       %xor = xor i32 %v, %s.load
DIVERGENT:       %d = xor i32 %xor, -1
DIVERGENT:       store i32 %d, i32 addrspace(1)* %out.load, align 4

I know. I am suggesting that a DAG combine can rewrite this code to the equivalent of:

                 %s.load = load i32, i32 addrspace(4)* %s.kernarg.offset.cast, align 4, !invariant.load !0
DIVERGENT:       %v = call i32 @llvm.amdgcn.workitem.id.x(), !range !1
                 %not = xor i32 %s.load, -1
DIVERGENT:       %d = xor i32 %v, %not
DIVERGENT:       store i32 %d, i32 addrspace(1)* %out.load, align 4

Now %not is uniform, so it is trivial to select it to s_not.

Tue, Jan 11, 8:36 AM · Restricted Project
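The suggested rewrite relies on the bitwise identity ~(v ^ s) == v ^ ~s, which holds for every bit pattern. The C++ sketch below checks that identity. It also adds a hypothetical placement rule for when the rewrite pays off. The function and enum names are illustrative, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Original shape: %xor = xor %v, %s ; %d = xor %xor, -1   i.e.  ~(v ^ s)
std::uint32_t notOfXor(std::uint32_t v, std::uint32_t s) { return ~(v ^ s); }

// Proposed shape: %not = xor %s, -1 ; %d = xor %v, %not   i.e.  v ^ ~s
// %not depends on s alone, so it stays uniform and maps to s_not_b32.
std::uint32_t xorWithNot(std::uint32_t v, std::uint32_t s) { return v ^ ~s; }

// Hypothetical decision for such a combine: push the NOT onto an operand of
// the inner XOR when exactly one operand is uniform; otherwise keep ~(a ^ b),
// since both instructions end up on the same unit (SALU or VALU) anyway.
enum class NotPlacement { OnUniformOperand, OnResult };

NotPlacement chooseNotPlacement(bool aDivergent, bool bDivergent) {
  return aDivergent != bDivergent ? NotPlacement::OnUniformOperand
                                  : NotPlacement::OnResult;
}
```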

Mon, Jan 10

alex-t added a comment to D116270: [AMDGPU] Enable divergence-driven XNOR selection.

Mon, Jan 10, 8:09 AM · Restricted Project
alex-t added a comment to D116270: [AMDGPU] Enable divergence-driven XNOR selection.

Now:

We select the divergent NOT to V_NOT_B32_e32 and the divergent XOR to V_XOR_B32_e64. The selection is correct, but we miss the opportunity to exploit the fact that even a divergent NOT may be selected to S_NOT_B32 without loss of correctness.

No, you cannot correctly select divergent NOT to S_NOT_B32. That is not what was happening before your patch (see https://reviews.llvm.org/D116270?vs=on&id=396159#change-5HrmrjqhUdXJ). What was happening was that an input like ~(uniform ^ divergent) was being "reassociated" to ~uniform ^ divergent so it could be correctly selected to S_NOT + V_XOR. I assume this was done with a very clever selection pattern, but I am suggesting that instead of that you could implement it as a DAG combine (to do the reassociation), so there is no need for clever selection patterns.

Mon, Jan 10, 7:56 AM · Restricted Project
alex-t added a comment to D116270: [AMDGPU] Enable divergence-driven XNOR selection.

This looks like a regression in xnor.ll:

Left (before the patch):
	s_not_b32 s0, s0
	v_xor_b32_e32 v0, s0, v0
Right (with the patch):
	v_not_b32_e32 v0, v0
	v_xor_b32_e32 v0, s4, v0

but it is not really. All the nodes in the example are divergent, and the divergent (xor x, -1) has been selected to V_NOT_B32 since https://reviews.llvm.org/D115884 was committed.
S_NOT_B32 appears on the left because of the custom optimization that converts S_XNOR_B32 back to NOT (XOR) for targets that have no V_XNOR. This optimization relies on the NOT operand being an SGPR and on V_XOR_B32_e32 accepting an SGPR as its first source operand.
I am not sure it is always safe: VALU instruction execution is controlled by the EXEC mask, but SALU execution is not.

To repeat what I have already said elsewhere: this is not a correctness issue. This is just an optimization, where you can choose to calculate either ~s0 ^ v0 or s0 ^ ~v0 (or ~(s0 ^ v0)) and get exactly the same result. The optimization is to prefer the first form, because the intermediate result ~s0 is uniform, so you can keep it in an sgpr and not waste vgprs and valu instructions.

Mon, Jan 10, 7:37 AM · Restricted Project
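The equivalence also answers the EXEC concern raised in the quote. s_not_b32 ignores EXEC, but ~s0 is a single value shared by all lanes. The VALU write into the destination is still masked. The toy 4-lane simulation below shows this. The helper names mimic, but do not reproduce, the ISA semantics, and the lane count is an assumption chosen for brevity.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy wave with 4 lanes (real AMDGPU waves have 32 or 64). Hypothetical model.
constexpr int kLanes = 4;
using VReg = std::array<std::uint32_t, kLanes>;  // one 32-bit value per lane

// SALU: computed once for the whole wave; EXEC is not consulted.
std::uint32_t s_not_b32(std::uint32_t s) { return ~s; }

// VALU: per lane; lanes whose EXEC bit is clear keep the old destination value.
VReg v_not_b32(const VReg &src, const VReg &oldDst, std::uint32_t exec) {
  VReg out = oldDst;
  for (int lane = 0; lane < kLanes; ++lane)
    if (exec & (1u << lane)) out[lane] = ~src[lane];
  return out;
}

VReg v_xor_b32(std::uint32_t sSrc, const VReg &vSrc, const VReg &oldDst,
               std::uint32_t exec) {
  VReg out = oldDst;
  for (int lane = 0; lane < kLanes; ++lane)
    if (exec & (1u << lane)) out[lane] = sSrc ^ vSrc[lane];
  return out;
}

// ~s0 ^ v0: one SALU op (result kept in an SGPR) plus one VALU op.
VReg uniformNotForm(std::uint32_t s0, const VReg &v0, const VReg &oldDst,
                    std::uint32_t exec) {
  return v_xor_b32(s_not_b32(s0), v0, oldDst, exec);
}

// s0 ^ ~v0: two VALU ops and a temporary VGPR for ~v0.
VReg divergentNotForm(std::uint32_t s0, const VReg &v0, const VReg &oldDst,
                      std::uint32_t exec) {
  VReg tmp = v_not_b32(v0, VReg{}, exec);
  return v_xor_b32(s0, tmp, oldDst, exec);
}
```

Both forms produce identical active lanes and leave inactive lanes untouched. The first form needs one VALU instruction and no extra VGPR.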

Fri, Jan 7

alex-t added inline comments to D116270: [AMDGPU] Enable divergence-driven XNOR selection.
Fri, Jan 7, 1:12 PM · Restricted Project
alex-t updated the diff for D116270: [AMDGPU] Enable divergence-driven XNOR selection.

Added postprocessing of the selected machine IR. This makes it on par with the existing selection mechanism.

Fri, Jan 7, 7:56 AM · Restricted Project
alex-t committed rG5d46263a5ac5: [AMDGPU] Enable divergence-driven 'ctpop' selection (authored by alex-t).
[AMDGPU] Enable divergence-driven 'ctpop' selection
Fri, Jan 7, 5:05 AM
alex-t closed D116284: [AMDGPU] Enable divergence-driven 'ctpop' selection.
Fri, Jan 7, 5:05 AM · Restricted Project
alex-t added a comment to D116270: [AMDGPU] Enable divergence-driven XNOR selection.

This is indeed a regression. It is always safe to keep s_not_b32 on SALU. Also note this effectively makes SIInstrInfo::lowerScalarXnor() useless. This is why XNOR was left behind by the D111907.

SIInstrInfo::lowerScalarXnor() is exactly the part of the "manual" SALU to VALU lowering that I am trying to get rid of.
The divergent "not" must be selected to V_NOT_B32_e32/64; otherwise we still get illegal VGPR-to-SGPR copies.
This happens because the divergent "not" node has divergent operands, and their results will likely be in VGPRs.
Also, we should first select everything correctly and apply peephole optimizations afterwards.
In other words, we should not "cheat ourselves" during selection: each node should be selected strictly according to its divergence bit.
Then we can apply the optimization where it is safe.
Note that this is not the only case where we would like to further optimize the code after selection.
I plan to add a separate pass for that.

We cannot solve the problem in the custom selection procedure because the NOT node's operand has not been selected yet, so we do not know whether it is an SGPR or a VGPR.
The only way, for now, is to post-process not(xor)/xor(not) in SIFixSGPRCopies. This may be considered a temporary hack until we have a proper pass for that.

SIInstrInfo::lowerScalarXnor() is dead after your patch and thus the patch has to remove it.

Then this is a clear regression, so if this requires a separate peephole later we need that peephole first and make sure the test does not regress.

Fri, Jan 7, 3:21 AM · Restricted Project
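The two-phase approach described above can be sketched on a toy machine-IR list. Phase one selects strictly by the divergence bit. Phase two post-processes not(xor) after selection, once register classes are known. `Reg`, `MI`, `selectNot`, `selectXor`, `countUses` and `foldNotOfXor` are hypothetical. For brevity, the pattern is only matched on adjacent instructions, and this is not how SIFixSGPRCopies is actually structured.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy machine IR; not llvm::MachineInstr.
struct Reg { int id; bool sgpr; };
struct MI {
  std::string opcode;
  Reg dst;
  std::vector<Reg> srcs;
};

// Phase 1: the opcode depends only on the node's divergence bit.
std::string selectNot(bool divergent) { return divergent ? "V_NOT_B32_e32" : "S_NOT_B32"; }
std::string selectXor(bool divergent) { return divergent ? "V_XOR_B32_e32" : "S_XOR_B32"; }

// Number of instruction operands that read register `id`.
int countUses(const std::vector<MI> &mis, int id) {
  int uses = 0;
  for (const MI &mi : mis)
    for (const Reg &r : mi.srcs)
      if (r.id == id) ++uses;
  return uses;
}

// Phase 2: rewrite   t = V_XOR_B32 a, b ; d = V_NOT_B32 t
//          into      n = S_NOT_B32 s    ; d = V_XOR_B32 n, v
// when exactly one XOR source (s) is an SGPR and t has no other users.
// `freshSgpr` is the id to use for the new SGPR n.
bool foldNotOfXor(std::vector<MI> &mis, int freshSgpr) {
  for (std::size_t i = 0; i + 1 < mis.size(); ++i) {
    const MI &xorMI = mis[i];
    const MI &notMI = mis[i + 1];
    if (xorMI.opcode != "V_XOR_B32_e32" || notMI.opcode != "V_NOT_B32_e32") continue;
    if (xorMI.srcs.size() != 2 || notMI.srcs.size() != 1) continue;
    if (notMI.srcs[0].id != xorMI.dst.id) continue;
    if (countUses(mis, xorMI.dst.id) != 1) continue;
    if (xorMI.srcs[0].sgpr == xorMI.srcs[1].sgpr) continue;  // need one SGPR, one VGPR
    // Copy everything we need before overwriting the two instructions.
    const Reg s = xorMI.srcs[0].sgpr ? xorMI.srcs[0] : xorMI.srcs[1];
    const Reg v = xorMI.srcs[0].sgpr ? xorMI.srcs[1] : xorMI.srcs[0];
    const Reg d = notMI.dst;
    const Reg notS{freshSgpr, true};
    mis[i] = MI{"S_NOT_B32", notS, {s}};
    // The e32 encoding accepts an SGPR as src0, so n can feed the XOR directly.
    mis[i + 1] = MI{"V_XOR_B32_e32", d, {notS, v}};
    return true;
  }
  return false;
}
```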
alex-t added a comment to D116270: [AMDGPU] Enable divergence-driven XNOR selection.

SIInstrInfo::lowerScalarXnor() is dead after your patch

I don't understand why it is dead. In general moveToVALU moves instructions to VALU if any of their inputs are VGPRs, which can happen even if the result is uniform -- e.g. if some of the inputs are derived from a floating point calculation which had to use VALU instructions.

Fri, Jan 7, 3:10 AM · Restricted Project
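The sketch below models two things: the rule quoted above, and the S_XNOR_B32 expansion discussed earlier in the thread for targets without V_XNOR_B32. It is a hypothetical sketch of the described behavior, not the code of SIInstrInfo::moveToVALU or SIInstrInfo::lowerScalarXnor. `RegBank`, `mustMoveToVALU`, `expandXnor` and the helper functions are illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical: where each input of an SALU instruction currently lives.
enum class RegBank { SGPR, VGPR };

// An SALU instruction must move to the VALU if any input is in a VGPR,
// even when its result is uniform (e.g. a uniform value produced by a
// floating-point VALU instruction).
bool mustMoveToVALU(const std::vector<RegBank> &inputs) {
  for (RegBank b : inputs)
    if (b == RegBank::VGPR) return true;
  return false;
}

// Expansion of xnor(a, b) == ~(a ^ b) when V_XNOR_B32 is unavailable.
// If one source is still an SGPR, negate it on the SALU and XOR on the VALU,
// using ~(a ^ b) == (~a) ^ b. Otherwise do the XOR and then the NOT on the VALU.
enum class XnorExpansion { SNotThenVXor, VXorThenVNot };

XnorExpansion expandXnor(RegBank src0, RegBank src1) {
  if (src0 == RegBank::SGPR || src1 == RegBank::SGPR)
    return XnorExpansion::SNotThenVXor;
  return XnorExpansion::VXorThenVNot;
}

std::uint32_t xnorRef(std::uint32_t a, std::uint32_t b) { return ~(a ^ b); }

std::uint32_t sNotThenVXor(std::uint32_t sgprSrc, std::uint32_t other) {
  return (~sgprSrc) ^ other;
}
```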

Thu, Jan 6

alex-t added inline comments to D116284: [AMDGPU] Enable divergence-driven 'ctpop' selection.
Thu, Jan 6, 5:15 AM · Restricted Project
alex-t updated the diff for D116284: [AMDGPU] Enable divergence-driven 'ctpop' selection.

Redundant COPY_TO_REGCLASS removed. Test updated.

Thu, Jan 6, 5:11 AM · Restricted Project
alex-t added a comment to D116270: [AMDGPU] Enable divergence-driven XNOR selection.

Thu, Jan 6, 3:39 AM · Restricted Project

Sun, Dec 26

alex-t updated the diff for D116284: [AMDGPU] Enable divergence-driven 'ctpop' selection.

test file attributes corrected

Sun, Dec 26, 4:32 AM · Restricted Project
alex-t requested review of D116284: [AMDGPU] Enable divergence-driven 'ctpop' selection.
Sun, Dec 26, 4:30 AM · Restricted Project

Fri, Dec 24

alex-t committed rG8020458c5dc2: [AMDGPU] Changing S_AND_B32 to V_AND_B32_e64 in the divergent 'trunc' to i1… (authored by alex-t).
[AMDGPU] Changing S_AND_B32 to V_AND_B32_e64 in the divergent 'trunc' to i1…
Fri, Dec 24, 7:22 AM
alex-t closed D116241: [AMDGPU] Changing S_AND_B32 to V_AND_B32_e64 in the divergent 'trunc' to i1 pattern.
Fri, Dec 24, 7:22 AM · Restricted Project
alex-t added a comment to D116270: [AMDGPU] Enable divergence-driven XNOR selection.

This looks like a regression in xnor.ll:

Fri, Dec 24, 7:19 AM · Restricted Project
alex-t updated the diff for D116270: [AMDGPU] Enable divergence-driven XNOR selection.

LIT test file attributes corrected

Fri, Dec 24, 6:57 AM · Restricted Project
alex-t requested review of D116270: [AMDGPU] Enable divergence-driven XNOR selection.
Fri, Dec 24, 6:55 AM · Restricted Project

Thu, Dec 23

alex-t updated the diff for D116241: [AMDGPU] Changing S_AND_B32 to V_AND_B32_e64 in the divergent 'trunc' to i1 pattern.

test attributes corrected

Thu, Dec 23, 2:08 PM · Restricted Project
alex-t requested review of D116241: [AMDGPU] Changing S_AND_B32 to V_AND_B32_e64 in the divergent 'trunc' to i1 pattern.
Thu, Dec 23, 2:05 PM · Restricted Project

Wed, Dec 22

alex-t committed rGe4103c91f857: [AMDGPU] Select build_vector DAG nodes according to the divergence (authored by alex-t).
[AMDGPU] Select build_vector DAG nodes according to the divergence
Wed, Dec 22, 3:27 PM
alex-t closed D116187: [AMDGPU] Select build_vector DAG nodes according to the divergence.
Wed, Dec 22, 3:27 PM · Restricted Project
alex-t updated the summary of D116187: [AMDGPU] Select build_vector DAG nodes according to the divergence.
Wed, Dec 22, 1:46 PM · Restricted Project
alex-t requested review of D116187: [AMDGPU] Select build_vector DAG nodes according to the divergence.
Wed, Dec 22, 1:45 PM · Restricted Project

Dec 20 2021

alex-t committed rG19727e31fb2c: [AMDGPU] Enable divergence predicates for ctlz/cttz (authored by alex-t).
[AMDGPU] Enable divergence predicates for ctlz/cttz
Dec 20 2021, 9:52 AM
alex-t closed D116044: [AMDGPU] Enable divergence predicates for ctlz/cttz.
Dec 20 2021, 9:51 AM · Restricted Project
alex-t updated the diff for D116044: [AMDGPU] Enable divergence predicates for ctlz/cttz.

test corrected

Dec 20 2021, 8:28 AM · Restricted Project
alex-t retitled D116044: [AMDGPU] Enable divergence predicates for ctlz/cttz from [AMDGPU] Enable devergence predicates for ctlz/cttz to [AMDGPU] Enable divergence predicates for ctlz/cttz.
Dec 20 2021, 8:15 AM · Restricted Project
alex-t updated the summary of D116044: [AMDGPU] Enable divergence predicates for ctlz/cttz.
Dec 20 2021, 8:14 AM · Restricted Project
alex-t updated the summary of D116044: [AMDGPU] Enable divergence predicates for ctlz/cttz.
Dec 20 2021, 8:14 AM · Restricted Project
alex-t updated the summary of D116044: [AMDGPU] Enable divergence predicates for ctlz/cttz.
Dec 20 2021, 8:12 AM · Restricted Project
alex-t requested review of D116044: [AMDGPU] Enable divergence predicates for ctlz/cttz.
Dec 20 2021, 8:10 AM · Restricted Project
alex-t committed rG98d09705e15c: [AMDGPU] Re-enabling divergence predicates for min/max (authored by alex-t).
[AMDGPU] Re-enabling divergence predicates for min/max
Dec 20 2021, 5:09 AM
alex-t closed D115954: [AMDGPU] Re-enabling divergence predicates for min/max.
Dec 20 2021, 5:08 AM · Restricted Project
alex-t committed rG1448aa9dbdd9: [AMDGPU] Expand not pattern according to the XOR node divergence (authored by alex-t).
[AMDGPU] Expand not pattern according to the XOR node divergence
Dec 20 2021, 3:40 AM
alex-t closed D115884: [AMDGPU] Expand not pattern according to the XOR node divergence.
Dec 20 2021, 3:40 AM · Restricted Project

Dec 17 2021

alex-t updated the summary of D115954: [AMDGPU] Re-enabling divergence predicates for min/max.
Dec 17 2021, 10:41 AM · Restricted Project
alex-t requested review of D115954: [AMDGPU] Re-enabling divergence predicates for min/max.
Dec 17 2021, 10:40 AM · Restricted Project
alex-t updated the diff for D115884: [AMDGPU] Expand not pattern according to the XOR node divergence.

Uniform XOR src, -1 pattern removed and existing S_NOT_B32/64 patterns equipped with the divergence predicates

Dec 17 2021, 7:23 AM · Restricted Project
alex-t added inline comments to D115884: [AMDGPU] Expand not pattern according to the XOR node divergence.
Dec 17 2021, 7:19 AM · Restricted Project
alex-t added inline comments to D115884: [AMDGPU] Expand not pattern according to the XOR node divergence.
Dec 17 2021, 4:59 AM · Restricted Project

Dec 16 2021

alex-t updated the summary of D115884: [AMDGPU] Expand not pattern according to the XOR node divergence.
Dec 16 2021, 9:05 AM · Restricted Project
alex-t requested review of D115884: [AMDGPU] Expand not pattern according to the XOR node divergence.
Dec 16 2021, 9:03 AM · Restricted Project

Nov 23 2021

alex-t committed rG9e03e8c99ec5: [AMDGPU] Enable fneg and fabs divergence-driven instruction selection. (authored by alex-t).
[AMDGPU] Enable fneg and fabs divergence-driven instruction selection.
Nov 23 2021, 8:35 AM
alex-t closed D114257: [AMDGPU] Enable fneg and fabs divergence-driven instruction selection..
Nov 23 2021, 8:35 AM · Restricted Project

Nov 22 2021

alex-t added a comment to D114257: [AMDGPU] Enable fneg and fabs divergence-driven instruction selection..

GlobalISel tests were updated to make them really auto-generatable.
update_mir_test_checks.py doesn't work if the prefixes in different RUN lines are the same.

Nov 22 2021, 12:57 PM · Restricted Project
alex-t updated the summary of D114257: [AMDGPU] Enable fneg and fabs divergence-driven instruction selection..
Nov 22 2021, 12:50 PM · Restricted Project
alex-t updated the diff for D114257: [AMDGPU] Enable fneg and fabs divergence-driven instruction selection..

VOP2 forms changed to VOP3. Tests updated.

Nov 22 2021, 12:50 PM · Restricted Project
alex-t updated the diff for D114257: [AMDGPU] Enable fneg and fabs divergence-driven instruction selection..

test changed to enable fp16 patterns for fp16-capable subtarget

Nov 22 2021, 7:49 AM · Restricted Project
alex-t added inline comments to D114257: [AMDGPU] Enable fneg and fabs divergence-driven instruction selection..
Nov 22 2021, 7:22 AM · Restricted Project

Nov 19 2021

alex-t retitled D114257: [AMDGPU] Enable fneg and fabs divergence-driven instruction selection. from [AMDGPU] Enable fneg and fabs divergence-deriven instruction selection. to [AMDGPU] Enable fneg and fabs divergence-driven instruction selection..
Nov 19 2021, 9:35 AM · Restricted Project
alex-t requested review of D114257: [AMDGPU] Enable fneg and fabs divergence-driven instruction selection..
Nov 19 2021, 9:31 AM · Restricted Project

Nov 3 2021

alex-t committed rG0a3d755ee9fc: [AMDGPU] Enable divergence-driven BFE selection (authored by alex-t).
[AMDGPU] Enable divergence-driven BFE selection
Nov 3 2021, 1:25 PM
alex-t closed D110950: [AMDGPU] Enable divergence-driven BFE selection.
Nov 3 2021, 1:25 PM · Restricted Project

Nov 2 2021

alex-t updated the diff for D110950: [AMDGPU] Enable divergence-driven BFE selection.

IsDivergent argument removed from getBFE32. The BFE node's divergence is derived from its variable operand.

Nov 2 2021, 4:11 PM · Restricted Project
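The reasoning in this update can be sketched with a hypothetical model of a 32-bit unsigned bitfield extract whose offset and width are constants. Once IsDivergent is gone, the BFE node's divergence simply follows from its operands. The struct, the function names and the opcode strings are illustrative only.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical operands of a 32-bit BFE node: a variable value plus an
// offset and a width that are usually constants (and therefore uniform).
struct BfeOperands {
  bool valueDivergent;
  bool offsetDivergent = false;
  bool widthDivergent = false;
};

// Generic rule: a node is divergent if any operand is divergent. With
// constant offset/width this reduces to the divergence of the value operand,
// so no separate IsDivergent flag is needed.
bool bfeIsDivergent(const BfeOperands &ops) {
  return ops.valueDivergent || ops.offsetDivergent || ops.widthDivergent;
}

// Opcode choice driven by that bit (opcode names as strings, for illustration).
std::string selectBfeU32(const BfeOperands &ops) {
  return bfeIsDivergent(ops) ? "V_BFE_U32_e64" : "S_BFE_U32";
}

// Reference semantics of an unsigned extract, assuming offset < 32 and
// offset + width <= 32.
std::uint32_t bfeU32(std::uint32_t x, unsigned offset, unsigned width) {
  assert(offset < 32 && offset + width <= 32);
  std::uint32_t mask = width >= 32 ? 0xFFFFFFFFu : ((1u << width) - 1u);
  return (x >> offset) & mask;
}
```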
alex-t updated the diff for D110950: [AMDGPU] Enable divergence-driven BFE selection.

test updated to check constants, no else after return.

Nov 2 2021, 2:59 PM · Restricted Project
alex-t added inline comments to D110950: [AMDGPU] Enable divergence-driven BFE selection.
Nov 2 2021, 2:32 PM · Restricted Project

Oct 31 2021

alex-t updated the diff for D110950: [AMDGPU] Enable divergence-driven BFE selection.

sign_extend_inreg patterns added
getS_BFE and getV_BFE replaced with the unified getBFE32 function

Oct 31 2021, 1:37 PM · Restricted Project

Oct 19 2021

alex-t added a reviewer for D112060: [NARY-REASSOCIATE] Fix infinite recursion optimizing min\max: alex-t.
Oct 19 2021, 9:17 AM · Restricted Project

Oct 1 2021

alex-t requested review of D110950: [AMDGPU] Enable divergence-driven BFE selection.
Oct 1 2021, 11:05 AM · Restricted Project

Sep 28 2021

alex-t accepted D109889: AMDGPU: Broadcast scalar boolean to vector boolean explicitly.

The change LGTM
It follows the currently accepted way of copying SCC.
And it does nothing except replacing the one incorrect SCC copying with the correct one.

Sep 28 2021, 9:45 AM · Restricted Project
alex-t added a comment to D109889: AMDGPU: Broadcast scalar boolean to vector boolean explicitly.

The problem is "COPY from SCC" by itself is not a semantically meaningful concept. We can make up whatever we want. I think ZeroOrOneBooleanContent is a better choice, since it's not fighting an uphill battle for optimization priority, and there really is only one bit. Places that semantically need to use -1 can emit the select directly.

Sep 28 2021, 9:35 AM · Restricted Project
alex-t added a comment to D109889: AMDGPU: Broadcast scalar boolean to vector boolean explicitly.

Overall I think we should not have contexts where copy from SCC is being used as a broadcast to a vector boolean. I think these only arise as a side effect of the hacky way SIFixSGPRCopies rewrites the function one instruction at a time

Sep 28 2021, 9:25 AM · Restricted Project
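The sketch below is a hypothetical wave64 model of the two boolean representations discussed here. A scalar boolean is kept as 0/1, as with ZeroOrOneBooleanContent and SCC. A vector boolean is held as a 64-bit lane mask. The explicit broadcast is modeled as a select between all-ones and zero. The function names are illustrative only.

```cpp
#include <cassert>
#include <cstdint>

using LaneMask = std::uint64_t;  // one bit per lane in a wave64

// Explicit broadcast of a scalar boolean (0 or 1) to a vector boolean:
// every lane receives the same value. Conceptually a select(b, -1, 0),
// rather than relying on what a "COPY from SCC" happens to mean.
LaneMask broadcastScalarBool(std::uint32_t b) {
  return b ? ~LaneMask{0} : LaneMask{0};
}

bool laneValue(LaneMask mask, unsigned lane) { return ((mask >> lane) & 1u) != 0; }

// Where a consumer really needs the all-ones form of a scalar boolean,
// it can emit the select itself.
std::int32_t toAllOnesBool(std::uint32_t b) { return b ? -1 : 0; }
```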

Sep 21 2021

alex-t committed rG1a33294652b2: [AMDGPU] Filtering out the inactive lanes bits when lowering copy to SCC (authored by alex-t).
[AMDGPU] Filtering out the inactive lanes bits when lowering copy to SCC
Sep 21 2021, 11:18 AM
alex-t closed D109900: [AMDGPU] Filtering out the inactive lanes bits when lowering copy to SCC.
Sep 21 2021, 11:18 AM · Restricted Project
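The hypothetical wave64 model below shows what this fix guards against. A vector boolean may carry stale set bits in inactive lanes. A copy to SCC is assumed to mean "true iff any active lane is true", so it has to mask with EXEC first. Conceptually this is an S_AND_B64 with EXEC, which sets SCC when its result is nonzero. The helper names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

using LaneMask = std::uint64_t;  // one bit per lane in a wave64

// Unfiltered lowering: bits of inactive lanes leak into the result.
bool copyToSCCUnfiltered(LaneMask vcc) { return vcc != 0; }

// Filtered lowering: only active lanes (EXEC bit set) are considered.
bool copyToSCC(LaneMask vcc, LaneMask exec) { return (vcc & exec) != 0; }
```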

Sep 20 2021

alex-t updated the diff for D109900: [AMDGPU] Filtering out the inactive lanes bits when lowering copy to SCC.

Test changed to end-to-end variant.
Minor change regarding the variable name.

Sep 20 2021, 3:40 AM · Restricted Project
alex-t added a comment to D109900: [AMDGPU] Filtering out the inactive lanes bits when lowering copy to SCC.

IMO, it is more useful to use an end-to-end test ( from LLVM IR to assembly). We do lots of work scattered in different places to deal with boolean values. Things may change in the future, and we may move this logic to other passes.

Sep 20 2021, 3:33 AM · Restricted Project

Sep 17 2021

alex-t updated the diff for D109900: [AMDGPU] Filtering out the inactive lanes bits when lowering copy to SCC.
  • detailed summary
  • MIR test added
  • formatting corrected
Sep 17 2021, 12:16 PM · Restricted Project
alex-t updated the summary of D109900: [AMDGPU] Filtering out the inactive lanes bits when lowering copy to SCC.
Sep 17 2021, 10:16 AM · Restricted Project

Sep 16 2021

alex-t added reviewers for D109900: [AMDGPU] Filtering out the inactive lanes bits when lowering copy to SCC: piotr, ruiling.
Sep 16 2021, 10:35 AM · Restricted Project
alex-t requested review of D109900: [AMDGPU] Filtering out the inactive lanes bits when lowering copy to SCC.
Sep 16 2021, 10:33 AM · Restricted Project

Sep 2 2021

alex-t accepted D109099: [AMDGPU] Use S_BITCMP0_* to replace AND in optimizeCompareInstr.

LGTM

Sep 2 2021, 8:02 AM · Restricted Project

Sep 1 2021

alex-t committed rGe3cbf1d43741: [AMDGPU] enable scalar compare in truncate selection (authored by alex-t).
[AMDGPU] enable scalar compare in truncate selection
Sep 1 2021, 1:35 PM
alex-t closed D108925: [AMDGPU] enable scalar compare in truncate selection.
Sep 1 2021, 1:35 PM · Restricted Project

Aug 31 2021

alex-t added inline comments to D108925: [AMDGPU] enable scalar compare in truncate selection.
Aug 31 2021, 7:33 AM · Restricted Project

Aug 30 2021

alex-t added a reviewer for D108925: [AMDGPU] enable scalar compare in truncate selection: critson.
Aug 30 2021, 10:34 AM · Restricted Project
alex-t requested review of D108925: [AMDGPU] enable scalar compare in truncate selection.
Aug 30 2021, 10:33 AM · Restricted Project

Aug 25 2021

alex-t committed rGed0f4415f002: [AMDGPU] Divergence-driven compare operations instruction selection (authored by alex-t).
[AMDGPU] Divergence-driven compare operations instruction selection
Aug 25 2021, 8:30 AM
alex-t closed D106079: [AMDGPU] Divergence-driven compare operations instruction selection.
Aug 25 2021, 8:30 AM · Restricted Project

Aug 19 2021

alex-t updated the diff for D106079: [AMDGPU] Divergence-driven compare operations instruction selection.

Unused variable initialization removed.

Aug 19 2021, 1:45 PM · Restricted Project
alex-t added a comment to D106079: [AMDGPU] Divergence-driven compare operations instruction selection.

Following the discussion regarding a uniform "setcc" with a divergent use:
In the current implementation of divergence-driven ISel, a node is selected based only on its divergence bit, regardless of whether its uses are VALU or SALU.
Whether a use is VALU is not necessarily related to the user's divergence: the user may be uniform but selected to VALU just because the corresponding SALU instruction does not exist.
As mentioned above, the alternative approach is to select the given node to VALU if it has VALU users. First, this requires a reasonable heuristic.
Let's say we have a uniform SDNode that has 1 VALU user but 10 SALU users. Is it profitable to select it to VALU?
The selection hook that looks ahead at the users needs to be controlled by an option so that different heuristics can be tried.
Also, the problem in question is common to all opcodes, not only "setcc". Thus, adding such a hook would require changing all the places in the target where the divergence bit is currently checked. That is why I insist this should go into a separate patch.

Aug 19 2021, 1:10 PM · Restricted Project
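The kind of heuristic mentioned above could look like the sketch below. Every name in it is hypothetical. The ratio threshold stands in for the option that would let different heuristics be tried. Divergent nodes keep going to the VALU, as in the current implementation.

```cpp
#include <cassert>

// Hypothetical use counts for a uniform SDNode, split by the unit each user
// was (or will be) selected to.
struct UseCounts {
  unsigned valu;
  unsigned salu;
};

// Hypothetical heuristic: move a uniform node to the VALU only when its VALU
// users outnumber its SALU users by at least `ratioThreshold`.
bool preferVALUForUniform(const UseCounts &uses, double ratioThreshold) {
  if (uses.valu == 0) return false;  // no VALU users: keep it on the SALU
  if (uses.salu == 0) return true;   // only VALU users in this toy model
  return static_cast<double>(uses.valu) / uses.salu >= ratioThreshold;
}

// Divergent nodes always go to the VALU; only uniform nodes consult the heuristic.
bool selectToVALU(bool divergent, const UseCounts &uses, double ratioThreshold) {
  return divergent || preferVALUForUniform(uses, ratioThreshold);
}
```

With a threshold of 1.0, the 1 VALU / 10 SALU example from the comment stays on the SALU.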

Jul 22 2021

alex-t added inline comments to D106079: [AMDGPU] Divergence-driven compare operations instruction selection.
Jul 22 2021, 8:27 AM · Restricted Project

Jul 19 2021

alex-t added inline comments to D106079: [AMDGPU] Divergence-driven compare operations instruction selection.
Jul 19 2021, 11:02 AM · Restricted Project
alex-t updated the diff for D106079: [AMDGPU] Divergence-driven compare operations instruction selection.

Formatting fixed

Jul 19 2021, 10:52 AM · Restricted Project

Jul 17 2021

alex-t added inline comments to D106079: [AMDGPU] Divergence-driven compare operations instruction selection.
Jul 17 2021, 7:44 AM · Restricted Project
alex-t added inline comments to D106079: [AMDGPU] Divergence-driven compare operations instruction selection.
Jul 17 2021, 7:39 AM · Restricted Project
alex-t updated the diff for D106079: [AMDGPU] Divergence-driven compare operations instruction selection.

Fixed amdgpu-codegenprepare-idiv.ll test regression + several minor fixes.

Jul 17 2021, 7:27 AM · Restricted Project

Jul 15 2021

alex-t requested review of D106079: [AMDGPU] Divergence-driven compare operations instruction selection.
Jul 15 2021, 9:47 AM · Restricted Project

Jun 30 2021

alex-t committed rGe585b332e423: [AMDGPU] PHI node cost should not be counted for the size and latency. (authored by alex-t).
[AMDGPU] PHI node cost should not be counted for the size and latency.
Jun 30 2021, 6:11 AM
alex-t closed D105104: [AMDGPU] PHI node cost should not be counted for the size and latency..
Jun 30 2021, 6:11 AM · Restricted Project
alex-t abandoned D105186: [AMDGPU] PHI node cost should not be counted for the size and latency..
Jun 30 2021, 6:10 AM · Restricted Project
alex-t updated the diff for D105104: [AMDGPU] PHI node cost should not be counted for the size and latency..

:

Jun 30 2021, 6:08 AM · Restricted Project