This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Don't merge DS opcodes on SI to fix corruption in Hitman
AbandonedPublic

Authored by mareko on Oct 4 2018, 2:18 PM.

Details

Reviewers
arsenm
nhaehnle

Diff Detail

Event Timeline

mareko created this revision.Oct 4 2018, 2:18 PM
arsenm requested changes to this revision.Oct 4 2018, 7:02 PM

This isn't a correct fix. If there's an issue with 64-bit DS instructions, it's a lowering problem. If we can't use them for some reason, changing this here might be a helpful heuristic but as-is this is not a real fix

This revision now requires changes to proceed.Oct 4 2018, 7:02 PM

This isn't a correct fix. If there's an issue with 64-bit DS instructions, it's a lowering problem. If we can't use them for some reason, changing this here might be a helpful heuristic but as-is this is not a real fix

So what are you suggesting?

What is it exactly that breaks with the merging?

mareko added a comment.Oct 5 2018, 1:33 PM

Multi-dword LDS opcodes seem to be the culprit.

arsenm added a comment.Oct 8 2018, 6:17 PM

Why exactly? Is it possible the Mesa lds allocation isn’t aligning properly?

mareko added a comment.Oct 8 2018, 6:25 PM

Why exactly? Is it possible the Mesa lds allocation isn’t aligning properly?

Since the issue only occurs on SI, I don't think Mesa is doing anything bad. Unless there is some LDS hw difference on SI...

arsenm added a comment.Oct 8 2018, 7:08 PM

Why exactly? Is it possible the Mesa lds allocation isn’t aligning properly?

Since the issue only occurs on SI, I don't think Mesa is doing anything bad. Unless there is some LDS hw difference on SI...

I do remember one bug we have that may be related. We try to use the ds_read2_b32 with 4-byte signed trick on SI, without checking that we can use the offsets if the base address isn't known positive

Why exactly? Is it possible the Mesa lds allocation isn’t aligning properly?

Since the issue only occurs on SI, I don't think Mesa is doing anything bad. Unless there is some LDS hw difference on SI...

I do remember one bug we have that may be related. We try to use the ds_read2_b32 with 4-byte signed trick on SI, without checking that we can use the offsets if the base address isn't known positive

Are you saying that on SI, if the base address (from VGPR) is negative, but (base address + offset) is in range, the instruction won't execute correctly? Is there documentation on this somewhere?

Why exactly? Is it possible the Mesa lds allocation isn’t aligning properly?

Since the issue only occurs on SI, I don't think Mesa is doing anything bad. Unless there is some LDS hw difference on SI...

I do remember one bug we have that may be related. We try to use the ds_read2_b32 with 4-byte signed trick on SI, without checking that we can use the offsets if the base address isn't known positive

Are you saying that on SI, if the base address (from VGPR) is negative, but (base address + offset) is in range, the instruction won't execute correctly? Is there documentation on this somewhere?

The problem is specifically on SI the adder for the offset is only 16-bit, so if a carry happens it computes the wrong address. The overly strong condition we use for this is that the base address is known positive (see isDSOffsetLegal)

That was a good hint.

It turns out that there is a shader which unconditionally loads from lds_array[n] and lds_array[n+1] in a loop that starts with n == -1...

This patch should be superceded by D53160.

mareko abandoned this revision.Nov 12 2018, 7:37 PM

Why exactly? Is it possible the Mesa lds allocation isn’t aligning properly?

Since the issue only occurs on SI, I don't think Mesa is doing anything bad. Unless there is some LDS hw difference on SI...

I do remember one bug we have that may be related. We try to use the ds_read2_b32 with 4-byte signed trick on SI, without checking that we can use the offsets if the base address isn't known positive