This is an archive of the discontinued LLVM Phabricator instance.

[NVPTX] Fix shr/and pair replace with bfe
ClosedPublic

Authored by AidanBeltonS on Jan 12 2022, 7:40 AM.

Details

Summary

This PR changes the types of two unsigned 64-bit ints to signed 64-bit ints.
This fixes an issue where GoodBits should be a negative value, however due to the type being unsigned it becomes a very large positive number.
The replacement causes a bug where shr/and pairs are incorrectly replaced with bfe. Despite the shr shifting all values to become the same value as the signed bit.

Two checks are added to test that the correct behaviour with too large shifts is occurring.

Diff Detail

Event Timeline

AidanBeltonS created this revision.Jan 12 2022, 7:40 AM
AidanBeltonS requested review of this revision.Jan 12 2022, 7:40 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 12 2022, 7:40 AM
npmiller added a reviewer: tra.Feb 9 2023, 5:42 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 9 2023, 5:42 AM
tra added a comment.Feb 9 2023, 11:04 AM

Can you, please, update the patch with more context (e.g. git diff -U999999), as described in LLVM documentation here: https://llvm.org/docs/Phabricator.html#requesting-a-review-via-the-web-interface

tra added a comment.Feb 9 2023, 11:24 AM

Does your patch make any difference for the first test case? AFACIT, it currently does not produce bfe: https://godbolt.org/z/94qG3E4xM

In the second example, we do produce bfe.u64 %rd2, %rd1, 63, 3; which extracts the field with upper bits outside of the input. AFAICT, PTX manual says (https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#integer-arithmetic-instructions-bfe) that in cases like this, extracted field will replicate the MSB of the input value, which is indeed not what we want here.

Does your patch make any difference for the first test case? AFACIT, it currently does not produce bfe: https://godbolt.org/z/94qG3E4xM

It does not affect the first case indeed. This can be either updated to use signed Int64 to cause the overflow and properly test this change, or removed leaving only the second test case which is the relevant one to testing this change.

@tra I think removing the current bfe3 test and leaving bfe4 as the only test, as well as adding more context to the diff should be sufficient completion of this patch. What do you think?

Thanks!

tra added a comment.Feb 15 2023, 10:45 AM

It's OK to keep both 32-bit and 64-bit variants. Even if we already generate correct code for 32-bit variant, it's still useful to make sure we do not regress in the future.

llvm/test/CodeGen/NVPTX/bfe.ll
34

CHECK-LABEL:

I'd also give it a more meaningful name. no_bfe_on_32bit_oveflow.

36

This would never match as the actual instruction would be bfe.u32

46

This will also never match. The instruction will be bfe.u64 and the registers are 64-bit %r*d*.

Please run the tests with unmodified llvm and make sure it does fail when it detects bfe.

As it happens, it will still fail when it would fail to find shr, but it would still be great to correctly test for what we intended to test.
As an alternative, you could simplify it all to CHECK-NOT: bfe

gmirazchiyski added a comment.EditedFeb 16 2023, 9:49 AM

I can simplify this but I don't feel it's ideal to strip the checks to a plain CHECK-NOT: bfe only, because the test is intended to test that bfe is not replacing shr/and pair due to the overflow issue.
For that reason I've split the test in two identical labels but one only checking for no bfe and the other for the presence of the shr/and pair.

So the following version I think should be good.

; CHECK-LABEL: no_bfe_on_64bit_overflow
define i64 @no_bfe_on_64bit_overflow(i64 %a) {
; CHECK-NOT: bfe.u64 %rd{{[0-9]+}}, %rd{{[0-9]+}}, 63, 3
  %val0 = ashr i64 %a, 63
  %val1 = and i64 %val0, 7
  ret i64 %val1
}

; CHECK-LABEL: no_bfe_on_64bit_overflow_shr_and_pair
define i64 @no_bfe_on_64bit_overflow_shr_and_pair(i64 %a) {
; CHECK: shr.s64 %rd{{[0-9]+}}, %rd{{[0-9]+}}, 63
; CHECK: and.b64 %rd{{[0-9]+}}, %rd{{[0-9]+}}, 7
  %val0 = ashr i64 %a, 63
  %val1 = and i64 %val0, 7
  ret i64 %val1
}

While not the prettiest, this we we at least should be able to properly test the complete thing as we intended.
With the applied changes, the assembly for the 64-bit test look like:

	// .globl	no_bfe_on_64bit_overflow // -- Begin function no_bfe_on_64bit_overflow
.visible .func  (.param .b64 func_retval0) no_bfe_on_64bit_overflow(
	.param .b64 no_bfe_on_64bit_overflow_param_0
)                                       // @no_bfe_on_64bit_overflow
{
	.reg .b64 	%rd<4>;

// %bb.0:
	ld.param.u64 	%rd1, [no_bfe_on_64bit_overflow_param_0];
	shr.s64 	%rd2, %rd1, 63;
	and.b64  	%rd3, %rd2, 7;
	st.param.b64 	[func_retval0+0], %rd3;
	ret;
                                        // -- End function
}

So everything will be tested correctly and if it ever regresses again will report nicely but at the cost of duplicating the labels in order for us to be able to run the CHECK-NOT exclusively.

@tra Thanks for your feedback! What do you think about this approach instead of just simplifying the test?

llvm/test/CodeGen/NVPTX/bfe.ll
34

Sure! Looks more clear that way.

36

Yeah, thanks! This needed changed to bfe.u32 and the check for the 64-bit test-case changed to bfe.u64 to match.

46

This will also never match. The instruction will be bfe.u64 and the registers are 64-bit %r*d*.

Good catch!
Yes, the expression also wasn't correct as you pointed out. I checked exactly what to match the produced bfe to and it has to be as follows:

; CHECK-NOT: bfe.u64 %rd{{[0-9]+}}, %rd{{[0-9]+}}, 63, 3

Please run the tests with unmodified llvm and make sure it does fail when it detects bfe.

I have run those and it does fail when bfe is detected. Without the llvm changes, bfe.u64 is produced and unfortunately failed by reporting it's not being able find shr rather than because the bfe instruction was present.

That is not great because we're not communicating the issue too well in case this ever regresses in the future. This is because CHECKs seem to take precedence over CHECK-NOTs.

tra added a comment.Apr 20 2023, 1:42 PM

What do you think about this approach instead of just simplifying the test?

SGTM.

llvm/test/CodeGen/NVPTX/bfe.ll
46

This is because CHECKs seem to take precedence over CHECK-NOTs.

I guess it's a fundamental difference between proving that something exists and proving that something does not exist.
Proving a negative is hard as it has to be done on all possible locations we're interested in, while a positive check needs to match only once, anywhere.
So, in order for a negative checker to work, we need to define the area which we'll be searching. That area is determined by the surrounding positive checks, so they have to be matched first.

I usually suggest using -NOT checks in a separate test run. Mixing them with positive checks is often tricky in non-obvious ways.

@tra The revision has been updated. Does it look like enough at this point? Thanks!

tra accepted this revision.Jul 10 2023, 11:49 AM
This revision is now accepted and ready to land.Jul 10 2023, 11:49 AM
This revision was automatically updated to reflect the committed changes.