This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Implement copy to scc with s_bitcmp1_b32
Needs ReviewPublic

Authored by arsenm on Jul 20 2020, 5:32 PM.

Details

Summary

This avoids a dependence on the high bits which I was never entirely
comfortable with. This should also allow folding out some clamping to
a single bit in the future.

Diff Detail

Event Timeline

arsenm created this revision.Jul 20 2020, 5:32 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 20 2020, 5:32 PM
piotr added a comment.EditedJul 21 2020, 9:35 AM

Hmm, I am getting Vulkan CTS failures with this patch. I'll try to gather more information.

Hmm, I am getting Vulkan CTS failures with this patch. I'll try to gather more information.

Did you ever solve this?

piotr added a comment.Aug 11 2020, 8:30 AM

Yes, this patch is not safe in general, because in some cases isel produces patterns where all bits of an SCC source are significant.

I think this can be related to how the lowering of boolean operations works (XOR/AND/OR) and the fact that the true value gets at times confused (1 becomes -1).

Already noticed a related problem in SITargetLowering::performAddCombine where ZERO_EXTEND as an operand of ADD is dropped and that leads to incorrect code (true becoming -1). I am working on fixing that.

Anyway, in the context of this patch since I realized the high bits cannot be currently ignored I submitted a fix for 64-bit copy to SCC in D85207.

Yes, this patch is not safe in general, because in some cases isel produces patterns where all bits of an SCC source are significant.

I think this can be related to how the lowering of boolean operations works (XOR/AND/OR) and the fact that the true value gets at times confused (1 becomes -1).

I think we need to just stop treating scc as a 1-bit register, and start treating it as 32-bit. This is what GlobalISel is doing, and makes more sense in general. 1 bit values are always a vector mask, and 32-bit booleans are 32-bit copies from scc