This is an archive of the discontinued LLVM Phabricator instance.

[TargetLowering] Update optimization to check for boolean content
ClosedPublic

Authored by ErikHogeman on Oct 14 2020, 6:28 AM.

Details

Summary

Updates an optimization that relies on boolean contents being either 0 or 1 to properly check for this before triggering.

The following:
(X & 8) != 0 --> (X & 8) >> 3
Produces unexpected results when a boolean 'true' value is represented by negative one.

Diff Detail

Event Timeline

ErikHogeman created this revision.Oct 14 2020, 6:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 14 2020, 6:28 AM
ErikHogeman requested review of this revision.Oct 14 2020, 6:28 AM

I added you (Craig) since you have done similar changes fairly recently in this file, but let me know if you think I should include someone else.

Looks like we have a couple of in-tree targets where this will make a difference, so we should add a test.

Try something like this with -mtriple=nvptx ?

define i32 @pow2_mask_cmp(i32 %x) {
  %a = and i32 %x, 8
  %cmp = icmp ne i32 %a, 0
  %r = zext i1 %cmp to i32
  ret i32 %r
}

Looks like we have a couple of in-tree targets where this will make a difference, so we should add a test.

Try something like this with -mtriple=nvptx ?

define i32 @pow2_mask_cmp(i32 %x) {
  %a = and i32 %x, 8
  %cmp = icmp ne i32 %a, 0
  %r = zext i1 %cmp to i32
  ret i32 %r
}

Thanks for your comment. I already spent quite some time trying write a test, it seems pretty difficult to write a small one. I think in your suggested example we still end up getting correct code when this transform triggers, the generated code produces 0 or 1 but that's expected since we are zero extending the i1 result I guess? We do end up generating slightly different code with this patch, so I could write a test that fails before and passes afterwards, but the new code still produces 0 or 1, so not sure that would really be a valid test case then, since the original code is still technically correct?

To see this bug I suspect you might need a SETCC DAG node that has a type larger than i1. The case that I saw that made me write this patch was a 32-bit SETCC later used by an arithmetic instruction (SUB). This was constructed by several other DAG transforms triggering first.
I tried to modify this test a bit to hopefully get a similar pattern to trigger, for example changing the zext to a sext hoping that would result in the SETCC being promoted to 32 bits and then triggering the bug, but it didn't work unfortunately.

I can spend some more time if you think it's important to have a test for this, but it seems non-trivial to me to write one.

Looks like we have a couple of in-tree targets where this will make a difference, so we should add a test.

Try something like this with -mtriple=nvptx ?

define i32 @pow2_mask_cmp(i32 %x) {
  %a = and i32 %x, 8
  %cmp = icmp ne i32 %a, 0
  %r = zext i1 %cmp to i32
  ret i32 %r
}

Thanks for your comment. I already spent quite some time trying write a test, it seems pretty difficult to write a small one. I think in your suggested example we still end up getting correct code when this transform triggers, the generated code produces 0 or 1 but that's expected since we are zero extending the i1 result I guess? We do end up generating slightly different code with this patch, so I could write a test that fails before and passes afterwards, but the new code still produces 0 or 1, so not sure that would really be a valid test case then, since the original code is still technically correct?

I agree that the best case would demonstrate a miscompile, but if my suggestion still shows a diff, that's better than nothing. We want to make sure that the code doesn't accidentally change back without someone noticing. If you add the test with a code comment describing what we have discussed here, it should meet the basic need. Even better - add the test first to show what we currently generate, then apply this patch and update/rebase the FileCheck lines. Let me know if that makes sense and/or you need help creating/committing the test file.

Looks like we have a couple of in-tree targets where this will make a difference, so we should add a test.

Try something like this with -mtriple=nvptx ?

define i32 @pow2_mask_cmp(i32 %x) {
  %a = and i32 %x, 8
  %cmp = icmp ne i32 %a, 0
  %r = zext i1 %cmp to i32
  ret i32 %r
}

Thanks for your comment. I already spent quite some time trying write a test, it seems pretty difficult to write a small one. I think in your suggested example we still end up getting correct code when this transform triggers, the generated code produces 0 or 1 but that's expected since we are zero extending the i1 result I guess? We do end up generating slightly different code with this patch, so I could write a test that fails before and passes afterwards, but the new code still produces 0 or 1, so not sure that would really be a valid test case then, since the original code is still technically correct?

I agree that the best case would demonstrate a miscompile, but if my suggestion still shows a diff, that's better than nothing. We want to make sure that the code doesn't accidentally change back without someone noticing. If you add the test with a code comment describing what we have discussed here, it should meet the basic need. Even better - add the test first to show what we currently generate, then apply this patch and update/rebase the FileCheck lines. Let me know if that makes sense and/or you need help creating/committing the test file.

Alright, that sounds good to me. I will give that a try then, and let you know if I have any issues. Thanks a lot!

I took at look, and there is one thing I thought I should clarify with you before I push another diff. So the code looks like this before applying this patch:

ld.param.u32    %r1, [pow2_mask_cmp_param_0];
bfe.u32     %r2, %r1, 3, 1;
st.param.b32    [func_retval0+0], %r2;
ret;

After this patch is applied:

ld.param.u32    %r1, [pow2_mask_cmp_param_0];
and.b32     %r2, %r1, 8;
setp.ne.s32     %p1, %r2, 0;
selp.u32    %r3, 1, 0, %p1;
st.param.b32    [func_retval0+0], %r3;
ret;

So it looks like we generate more instructions when disabling this transformation, which is unfortunate.
But I'm thinking now, this transformation is probably correct for setcc nodes with i1 type regardless of the boolean content, so maybe I should update the patch to only check the boolean contents when the type has a larger bitsize than 1?
That would make sure we still generate the code as in the first case above, but it would mean I'm back to not being sure how to write a good test for this.
Any thoughts?

I took at look, and there is one thing I thought I should clarify with you before I push another diff. So the code looks like this before applying this patch:

ld.param.u32    %r1, [pow2_mask_cmp_param_0];
bfe.u32     %r2, %r1, 3, 1;
st.param.b32    [func_retval0+0], %r2;
ret;

After this patch is applied:

ld.param.u32    %r1, [pow2_mask_cmp_param_0];
and.b32     %r2, %r1, 8;
setp.ne.s32     %p1, %r2, 0;
selp.u32    %r3, 1, 0, %p1;
st.param.b32    [func_retval0+0], %r3;
ret;

So it looks like we generate more instructions when disabling this transformation, which is unfortunate.
But I'm thinking now, this transformation is probably correct for setcc nodes with i1 type regardless of the boolean content, so maybe I should update the patch to only check the boolean contents when the type has a larger bitsize than 1?
That would make sure we still generate the code as in the first case above, but it would mean I'm back to not being sure how to write a good test for this.
Any thoughts?

Let's add the nvptx test with current codegen as a preliminary step. That way, we'll at least have better coverage for today's behavior.
Then, it's a question of how to expose the buggy behavior. I'm not familiar with nvptx; can we make a larger test that would still show a diff? Is your motivating example/target public, or can we adapt the larger example to work with nvptx?
There's another way to improve this whole block and probably demonstrate the bug on multiple in-tree targets if you're up for it: fix the TODO to allow vector types.
If none of the above, then we can continue as originally proposed without a regression test.

Let's add the nvptx test with current codegen as a preliminary step. That way, we'll at least have better coverage for today's behavior.
Then, it's a question of how to expose the buggy behavior. I'm not familiar with nvptx; can we make a larger test that would still show a diff? Is your motivating example/target public, or can we adapt the larger example to work with nvptx?
There's another way to improve this whole block and probably demonstrate the bug on multiple in-tree targets if you're up for it: fix the TODO to allow vector types.
If none of the above, then we can continue as originally proposed without a regression test.

The original IR that triggered this issue is very large, and unfortunately for a target not checked in upstream, so cannot be used as-is for a test. I have tried cutting it down a bit and keep the relevant parts but I have not been able to get the bug to trigger, either the SETCC is rewritten in some way early or it stays as an i1 type and doesn't trigger the bug. I am also not too familiar with nvptx so not sure what conditions would trigger the SETCC to be rewritten to a larger type. I spent a bit more time trying to cut down the original problematic IR but I'm struggling to reproduce it.
If there was a way to write unit tests directly for the selection DAG (so you could specify exactly which DAG node patterns to test) it would have been easy to write a test I think.

Sorry just to clarify that I understood you correctly, would you prefer that I add a test with the nvptx code using a SETCC with i1 type, or would you rather I change the patch to not abort on i1 types since that resulted in more instructions in the example you posted?
Uploading this example as a test works for me, but I'm slightly concerned that it seems to generate more instructions now in a case where it should not have to do that.

The original IR that triggered this issue is very large, and unfortunately for a target not checked in upstream, so cannot be used as-is for a test. I have tried cutting it down a bit and keep the relevant parts but I have not been able to get the bug to trigger, either the SETCC is rewritten in some way early or it stays as an i1 type and doesn't trigger the bug. I am also not too familiar with nvptx so not sure what conditions would trigger the SETCC to be rewritten to a larger type. I spent a bit more time trying to cut down the original problematic IR but I'm struggling to reproduce it.

Thanks for experimenting with it. Let's just proceed with what we have.

Sorry just to clarify that I understood you correctly, would you prefer that I add a test with the nvptx code using a SETCC with i1 type, or would you rather I change the patch to not abort on i1 types since that resulted in more instructions in the example you posted?
Uploading this example as a test works for me, but I'm slightly concerned that it seems to generate more instructions now in a case where it should not have to do that.

IIUC, we can do it as 2 patches: (1) add the boolean content clause to cause the extra instructions to show up and (2) add the i1 clause to restore it to optimal. We end up where we started for nvptx, but we still have commit history and this review to document the logic.

The original IR that triggered this issue is very large, and unfortunately for a target not checked in upstream, so cannot be used as-is for a test. I have tried cutting it down a bit and keep the relevant parts but I have not been able to get the bug to trigger, either the SETCC is rewritten in some way early or it stays as an i1 type and doesn't trigger the bug. I am also not too familiar with nvptx so not sure what conditions would trigger the SETCC to be rewritten to a larger type. I spent a bit more time trying to cut down the original problematic IR but I'm struggling to reproduce it.

Thanks for experimenting with it. Let's just proceed with what we have.

Sorry just to clarify that I understood you correctly, would you prefer that I add a test with the nvptx code using a SETCC with i1 type, or would you rather I change the patch to not abort on i1 types since that resulted in more instructions in the example you posted?
Uploading this example as a test works for me, but I'm slightly concerned that it seems to generate more instructions now in a case where it should not have to do that.

IIUC, we can do it as 2 patches: (1) add the boolean content clause to cause the extra instructions to show up and (2) add the i1 clause to restore it to optimal. We end up where we started for nvptx, but we still have commit history and this review to document the logic.

Alright, that sounds good to me. Thanks a lot for the feedback.
Then I will update this diff with the example from earlier as an nvptx test, then once this has been committed I will open a second diff (adding you again), updating the patch to check for the i1 case, and update the test for that as well.

Added test according to comments.

spatel accepted this revision.Oct 20 2020, 7:17 AM

LGTM - although it would be nicer to add the test first (without this patch), so we see the difference in output.

This revision is now accepted and ready to land.Oct 20 2020, 7:17 AM

LGTM - although it would be nicer to add the test first (without this patch), so we see the difference in output.

Ah right, I assumed it would be fine since the instruction diff will be visible in the next patch I will upload.
If you prefer it I can submit the test first and then the patch as a follow-up, and then a third commit that improves the condition.

LGTM - although it would be nicer to add the test first (without this patch), so we see the difference in output.

Ah right, I assumed it would be fine since the instruction diff will be visible in the next patch I will upload.
If you prefer it I can submit the test first and then the patch as a follow-up, and then a third commit that improves the condition.

Yeah, I like the layered commits, so it's clear how the code is changing with the compiler change.

Adds the test without the patch, to make it explicit what code was generated before.

Adds the patch as a diff on top of the test, so that the new code diff becomes more visible.

Added two new diffs, let me know if this is what you had in mind, and in that case I can commit these two diffs as two separate commits.

Added two new diffs, let me know if this is what you had in mind, and in that case I can commit these two diffs as two separate commits.

Yes - LGTM.