- User Since
- Jul 26 2016, 7:17 AM (286 w, 2 d)
DAG after the transformation and constant folding
This can be selected to v_perm_b32
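The DAG itself is not reproduced here, so the following only sketches what V_PERM_B32 computes, to make the claim above concrete. It is a simplified model based on our reading of the ISA description: each result byte i is picked from the 64-bit value {S0, S1} by byte i of the selector. We assume S1 supplies bytes 0-3, S0 supplies bytes 4-7, and selector 0x0C yields a zero byte. The name perm_b32 is hypothetical, and the other selector values are not modeled.

```cpp
#include <cassert>
#include <cstdint>

// Simplified, hypothetical model of V_PERM_B32 byte selection.
// Assumption: result byte i = byte sel[i] of the 64-bit value {s0, s1},
// with s1 in bytes 0-3 and s0 in bytes 4-7; selector 0x0C gives 0x00.
// Other selector values exist in hardware but are not modeled here.
uint32_t perm_b32(uint32_t s0, uint32_t s1, uint32_t sel) {
  const uint64_t combined = (static_cast<uint64_t>(s0) << 32) | s1;
  uint32_t result = 0;
  for (int i = 0; i < 4; ++i) {
    const uint32_t byteSel = (sel >> (8 * i)) & 0xFFu;
    uint32_t byte = 0;
    if (byteSel < 8) {
      // Pick one of the eight source bytes.
      byte = static_cast<uint32_t>((combined >> (8 * byteSel)) & 0xFFu);
    } else {
      assert(byteSel == 0x0C && "selector not modeled in this sketch");
      byte = 0;  // 0x0C selects a constant zero byte
    }
    result |= byte << (8 * i);
  }
  return result;
}
```

For example, a selector of 0x00010203 byte-reverses s1. That is the kind of OR-of-shifted-bytes computation a single v_perm_b32 can replace.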
condition was made more readable
Mon, Jan 17
DAG combiner hook added to control divergence-driven peephole optimizations.
Thu, Jan 13
In general, I don't like the idea of making DAGCombiner::reassociateOpsCommutative take divergence into account.
Tue, Jan 11
I think it is worth trying to change this generic combine to give up if x is uniform and y is divergent.
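Here is a minimal sketch of the proposed guard. It assumes the combine in question is the reassociation (op (op x, c1), y) -> (op (op x, y), c1) with c1 a constant. The Node struct and shouldReassociate are hypothetical stand-ins rather than DAGCombiner code, and only the divergence bits are modeled.

```cpp
#include <cassert>

// Hypothetical stand-in for an SDNode: only the divergence bit is modeled.
struct Node {
  bool divergent;
};

// Guard for (op (op x, c1), y) -> (op (op x, y), c1).
// If x is uniform, (op x, c1) is uniform and can stay on the scalar unit.
// Reassociating with a divergent y would make the new inner node divergent,
// so the combine gives up exactly when x is uniform and y is divergent.
bool shouldReassociate(const Node &x, const Node &y) {
  return x.divergent || !y.divergent;
}
```

Every other combination is left to the generic combine as before. Only the uniform-x, divergent-y case is blocked.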
Mon, Jan 10
Fri, Jan 7
Added postprocessing of the selected machine IR. This makes it on par with the existing selection mechanism.
Thu, Jan 6
Redundant COPY_TO_REGCLASS removed. Test updated.
Sun, Dec 26
test file attributes corrected
Fri, Dec 24
This looks like a regression in xnor.ll:
LIT test file attributes corrected
Thu, Dec 23
test attributes corrected
Wed, Dec 22
Dec 20 2021
Dec 17 2021
The uniform (xor src, -1) pattern was removed, and the existing S_NOT_B32/64 patterns were equipped with divergence predicates.
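(xor x, -1) is exactly bitwise NOT, so a dedicated uniform XOR pattern duplicates what the NOT patterns already cover. The sketch below checks that identity. It also gives a hypothetical model of how a divergence predicate routes the node: uniform nodes go to S_NOT_B32/S_NOT_B64, and divergent ones go to the vector side. We assume the divergent 64-bit case is split into two 32-bit V_NOT_B32 operations. selectNot is an illustrative helper, not an LLVM API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// (xor x, -1) flips every bit, which is exactly bitwise NOT.
uint32_t xorAllOnes(uint32_t x) { return x ^ 0xFFFFFFFFu; }
uint32_t bitwiseNot(uint32_t x) { return ~x; }

// Hypothetical model of divergence-predicated NOT selection.
// Uniform nodes match the scalar S_NOT_* patterns; divergent nodes are left
// to the vector patterns (the 64-bit case is assumed to become two 32-bit
// V_NOT_B32 operations on the low and high halves).
std::string selectNot(bool isDivergent, unsigned bits) {
  assert((bits == 32 || bits == 64) && "only 32- and 64-bit NOT are modeled");
  if (!isDivergent)
    return bits == 32 ? "S_NOT_B32" : "S_NOT_B64";
  return bits == 32 ? "V_NOT_B32" : "2 x V_NOT_B32";
}
```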
Dec 16 2021
Nov 23 2021
Nov 22 2021
GlobalISel tests were updated so that they are truly auto-generated.
update_mir_test_checks.py doesn't work if different RUN lines share the same check prefixes.
VOP2 forms changed to VOP3. Tests updated.
Test changed to enable fp16 patterns for an fp16-capable subtarget.
Nov 19 2021
Nov 3 2021
Nov 2 2021
In getBFE32, the IsDivergent argument was removed. The BFE node's divergence is now derived from its variable operand.
Test updated to check constants; no else after return.
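The reasoning behind dropping the IsDivergent argument can be sketched as follows. Assume a node is divergent when any of its operands is divergent, and that a BFE's offset and width are constants and therefore uniform. Then the BFE's divergence is simply that of its source operand. Both functions are hypothetical models, not SelectionDAG code.

```cpp
#include <algorithm>
#include <cassert>
#include <initializer_list>

// Hypothetical model: a node is divergent if any operand is divergent.
bool nodeIsDivergent(std::initializer_list<bool> operandDivergence) {
  return std::any_of(operandDivergence.begin(), operandDivergence.end(),
                     [](bool d) { return d; });
}

// BFE(src, offset, width) with constant offset and width: the constants are
// uniform, so only the variable operand src can carry divergence.
bool bfeIsDivergent(bool srcDivergent) {
  const bool offsetDivergent = false;  // constant operand
  const bool widthDivergent = false;   // constant operand
  return nodeIsDivergent({srcDivergent, offsetDivergent, widthDivergent});
}
```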
Oct 31 2021
sign_extend_inreg patterns added
getS_BFE and getV_BFE replaced with the unified getBFE32 function
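For reference, here is a sketch of the bit-field-extract semantics a unified getBFE32 has to produce, and of how sign_extend_inreg maps onto a signed BFE at offset 0. The opcode choice and the packed scalar operand reflect our understanding of S_BFE_* versus V_BFE_*: the scalar form packs the offset in the low bits and the width starting at bit 16, while the vector form takes separate operands. chooseBFE32 and the other names are hypothetical. This is not the actual helper.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Unsigned bit-field extract: `width` bits of `x` starting at bit `offset`.
uint32_t bfeU32(uint32_t x, unsigned offset, unsigned width) {
  assert(offset < 32 && width <= 32 && offset + width <= 32);
  if (width == 0)
    return 0;
  const uint32_t mask = width == 32 ? 0xFFFFFFFFu : (1u << width) - 1u;
  return (x >> offset) & mask;
}

// Signed bit-field extract: the field's top bit is replicated upward.
int32_t bfeI32(uint32_t x, unsigned offset, unsigned width) {
  const uint32_t field = bfeU32(x, offset, width);
  if (width == 0 || width == 32)
    return static_cast<int32_t>(field);
  const uint32_t signBit = 1u << (width - 1);
  // (field ^ signBit) - signBit sign-extends a width-bit value.
  return static_cast<int32_t>((field ^ signBit) - signBit);
}

// sign_extend_inreg x, iN (N < 32) is a signed BFE of the low N bits.
int32_t signExtendInReg(uint32_t x, unsigned fromBits) {
  return bfeI32(x, 0, fromBits);
}

// Hypothetical unified helper: one entry point instead of separate S/V
// helpers, with the opcode picked from the source operand's divergence.
struct BFE32Choice {
  std::string opcode;
  uint32_t packedOperand;  // only meaningful for the scalar form
};

BFE32Choice chooseBFE32(bool isSigned, bool srcDivergent, unsigned offset,
                        unsigned width) {
  if (srcDivergent)
    return {isSigned ? "V_BFE_I32" : "V_BFE_U32", 0};
  // Assumed S_BFE_* encoding: offset and width packed into one operand.
  return {isSigned ? "S_BFE_I32" : "S_BFE_U32", offset | (width << 16)};
}
```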
Oct 19 2021
Oct 1 2021
Sep 28 2021
The change LGTM
It follows the currently accepted way of copying SCC.
And it does nothing except replace the one incorrect SCC copy with the correct one.
The problem is "COPY from SCC" by itself is not a semantically meaningful concept. We can make up whatever we want. I think ZeroOrOneBooleanContent is a better choice, since it's not fighting an uphill battle for optimization priority, and there really is only one bit. Places that semantically need to use -1 can emit the select directly.
Overall, I think we should not have contexts where a copy from SCC is used as a broadcast to a vector boolean. I think these only arise as a side effect of the hacky way SIFixSGPRCopies rewrites the function one instruction at a time.
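Here is an illustration of the point that SCC is a single bit. Under a 0/1 convention the bit can be used as-is. A 0/-1 convention needs an extra operation, such as an explicit select, whenever the value is materialized. The enum below is a standalone stand-in that borrows the names of the BooleanContent kinds. It is not the TargetLowering type, and materializeSCC is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for the two conventions discussed (names borrowed from LLVM's
// BooleanContent kinds; this is not the TargetLowering enum).
enum class BooleanContent { ZeroOrOne, ZeroOrNegativeOne };

// Hypothetical: materialize the single SCC bit as a 32-bit scalar.
int32_t materializeSCC(bool scc, BooleanContent convention) {
  // The raw bit, zero-extended: no extra work is needed for 0/1.
  const int32_t bit = scc ? 1 : 0;
  if (convention == BooleanContent::ZeroOrOne)
    return bit;
  // 0/-1 needs an extra operation, modeled here as a select (scc ? -1 : 0).
  return scc ? -1 : 0;
}
```

Under the 0/1 choice, only the places that truly need an all-ones mask pay for the select.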
Sep 21 2021
Sep 20 2021
Test changed to end-to-end variant.
Minor change regarding the variable name.
Sep 17 2021
- detailed summary
- MIR test added
- formatting corrected
Sep 16 2021
Sep 2 2021
Sep 1 2021
Aug 31 2021
Aug 30 2021
Aug 25 2021
Aug 19 2021
Unused variable initialization removed.
Following the discussion regarding uniform "setcc" with a divergent use:
In the current implementation of divergence-driven ISel, a node is selected based only on its divergence bit, regardless of whether its uses are VALU or SALU.
Whether a use is VALU or SALU is not necessarily related to the user's divergence. The user may be uniform but still be selected to VALU simply because no corresponding SALU instruction exists.
As mentioned above, the alternative approach is to select the given node to VALU if it has VALU users. First, this requires a reasonable heuristic.
Let's say we have a uniform SDNode that has 1 VALU but 10 SALU users. Is it profitable to select it to VALU?
A selection hook that looks ahead at the users would need to be controlled by an option, so that different heuristics can be tried.
Also, the problem in question is common to all opcodes, not only "setcc". Thus, adding such a hook would require changing every place in the target where the divergence bit is currently checked. That is why I insist this should go into a separate patch.
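To make the trade-off concrete, here is a sketch contrasting the current rule with one possible look-ahead heuristic. The heuristic sends a uniform node to VALU only if the fraction of VALU users reaches a tunable threshold. The threshold parameter stands in for the option mentioned above. UserCounts, selectWithLookahead, and the fraction-based rule are all hypothetical. This is one candidate heuristic, not a proposal from the discussion.

```cpp
#include <cassert>

enum class Unit { SALU, VALU };

// Hypothetical summary of a node's users.
struct UserCounts {
  unsigned valuUsers;
  unsigned saluUsers;
};

// Current behavior as described: only the divergence bit decides.
Unit selectByDivergence(bool divergent) {
  return divergent ? Unit::VALU : Unit::SALU;
}

// Hypothetical look-ahead: a uniform node goes to VALU only if the share of
// VALU users reaches valuFractionThreshold (a stand-in for a tuning option).
Unit selectWithLookahead(bool divergent, UserCounts users,
                         double valuFractionThreshold) {
  if (divergent)
    return Unit::VALU;  // a divergent value needs per-lane (VGPR) storage
  const unsigned total = users.valuUsers + users.saluUsers;
  if (total == 0)
    return Unit::SALU;
  const double valuFraction = static_cast<double>(users.valuUsers) / total;
  return valuFraction >= valuFractionThreshold ? Unit::VALU : Unit::SALU;
}
```

With a threshold of 0.5, the "1 VALU but 10 SALU users" case stays on SALU. A node whose users are mostly VALU would move to VALU.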
Jul 22 2021
Jul 19 2021
Jul 17 2021
Fixed the amdgpu-codegenprepare-idiv.ll test regression, plus several minor fixes.
Jul 15 2021
Jun 30 2021