This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU : Add V_SAD_U32 instruction pattern and it's corresponding LIT tests.
ClosedPublic

Authored by wdng on Aug 2 2016, 10:06 AM.

Details

Summary

AMDGPU : Add V_SAD_U32 instruction patterns and it's corresponding LIT tests.

Diff Detail

Repository
rL LLVM

Event Timeline

wdng updated this revision to Diff 66496.Aug 2 2016, 10:06 AM
wdng retitled this revision from to AMDGPU : Add V_SAD_U32 instruction patterns and it's corresponding LIT tests..
wdng updated this object.
wdng added reviewers: tstellarAMD, arsenm.
wdng set the repository for this revision to rL LLVM.
wdng retitled this revision from AMDGPU : Add V_SAD_U32 instruction patterns and it's corresponding LIT tests. to AMDGPU : Add V_SAD_U32 instruction pattern and it's corresponding LIT tests..Aug 2 2016, 10:13 AM
test/CodeGen/AMDGPU/sad.ll
18–19 ↗(On Diff #66496)

I think we should also do a test case that selects from the result of the two sub operations, because I think this will give you different DagNodes. For example:

%sub0 = sub i32 %a, %b
%sub1 = sub i32 %b, %a
%ret0 = select i1 %icmp0, i32 %sub0, i32 %sub1

wdng updated this revision to Diff 66535.Aug 2 2016, 1:04 PM
wdng removed rL LLVM as the repository for this revision.
wdng marked an inline comment as done.

Add a second pattern for V_SAD_U32 based on Tom's comments.

wdng updated this revision to Diff 66539.Aug 2 2016, 1:19 PM

Add a line break to reformat the V_SAD_U32 pattern.

arsenm edited edge metadata.Aug 2 2016, 6:58 PM

You should add more tests using i32 vectors, as well as i8/i16 types. The packed sads won't be matched yet, but you should still see the legalized to i32 select to this

test/CodeGen/AMDGPU/sad.ll
1 ↗(On Diff #66539)

Remove the -mcpu=SI

4 ↗(On Diff #66539)

This should be GCN. You should also check the operands

arsenm added a comment.Aug 2 2016, 8:07 PM

You should also test when there are multiple uses of the components. You might want to add hasOneUse checks

wdng marked 2 inline comments as done.Aug 3 2016, 1:17 PM

You should also test when there are multiple uses of the components. You might want to add hasOneUse checks

Could you please say in more details about "add hasOneUse checks"? Thanks!

You should also test when there are multiple uses of the components. You might want to add hasOneUse checks

Wouldn't you only want to not match the pattern when all the intermediate nodes had more than one use?

You should also test when there are multiple uses of the components. You might want to add hasOneUse checks

Wouldn't you only want to not match the pattern when all the intermediate nodes had more than one use?

For the first pattern, there are 4 instructions in the input (min, max, add sub), so
min or max multiple use -> min, sad
min and max multiple use -> min, max, sad
sub multiple use -> sub, min, max, sad

so I think the first pattern just needs a hasOneUse on the sub. The other one should be similar

lib/Target/AMDGPU/SIInstructions.td
3411–3413 ↗(On Diff #66539)

I think you can remove most of the i32s. Only one should be sufficient

3411–3413 ↗(On Diff #66539)

Probably just need to keep the ones on the named operands and not the results

wdng updated this revision to Diff 67765.EditedAug 11 2016, 3:41 PM
wdng edited edge metadata.
wdng set the repository for this revision to rL LLVM.
  1. Added LIT tests for i8, i16, i32 vector
  2. Add hasOneUse() for V_SAD pattern and added multiple use of component LIT tests.
  3. Remove unnecessary i32s from the V_SAD pattern.
arsenm added inline comments.Aug 12 2016, 10:33 AM
lib/Target/AMDGPU/AMDGPUInstructions.td
621 ↗(On Diff #67765)

sub_su?

test/CodeGen/AMDGPU/sad.ll
34 ↗(On Diff #67765)

The sad is not supposed to be emitted if the sub has multiple uses

42–48 ↗(On Diff #67765)

It's easiest to add another use by storing the vale, rather than adding a lot more instructions

wdng updated this revision to Diff 67897.Aug 12 2016, 1:36 PM
  1. Renamed hasOneUse for sub instruction
  2. Added a LIT test with no sad is generated.
arsenm added inline comments.Aug 15 2016, 1:16 PM
lib/Target/AMDGPU/SIInstructions.td
3417–3419 ↗(On Diff #67897)

For this pattern the hasOneUse check belongs on the select.

test/CodeGen/AMDGPU/sad.ll
1 ↗(On Diff #67897)

Space before <

70 ↗(On Diff #67897)

You should end the label lines with the :, this prevents it from accidentally matching for something like max_pat12

104–110 ↗(On Diff #67897)

It's simpler to use a store of a value instead of adding more complex uses for the multi use. It also helps to avoid adding instruction types similar to the ones you want to write checks for

wdng updated this revision to Diff 68242.Aug 16 2016, 12:39 PM
wdng marked 2 inline comments as done.
  1. Use store to replace multiple use of components.
  2. Fix LIT test syntax issues.
arsenm added inline comments.Aug 18 2016, 9:54 AM
lib/Target/AMDGPU/SIInstructions.td
3417–3419 ↗(On Diff #68242)

The hasOneUse belongs on the select here

test/CodeGen/AMDGPU/sad.ll
163–164 ↗(On Diff #68242)

I don't think it's necessary to have multiple use checks for all of the vector versions

wdng added inline comments.Aug 19 2016, 10:04 AM
lib/Target/AMDGPU/SIInstructions.td
3417–3419 ↗(On Diff #68242)

Why do we have to apply hasOneUse on "select" instead of "sub" for the second pattern?

wdng updated this revision to Diff 68724.Aug 19 2016, 12:59 PM
  1. Apply hasOneUse on select for the second pattern.
  2. Remove checks for all of the vector versions.

You should add some more tests with some constant operands. For the sub patterns, that may require looking for add of negative constants

lib/Target/AMDGPU/AMDGPUInstructions.td
628 ↗(On Diff #68724)

select_oneuse should not go into the Properties block since it isn't commutative or associative

test/CodeGen/AMDGPU/sad.ll
101–114 ↗(On Diff #68724)

There's no point to testing multiple uses of the final result

132 ↗(On Diff #68724)

This is to specific for a not check. Not checks can be fragile especially when checking the operands. It would be better to check for the separate subs, select and add

wdng updated this revision to Diff 68741.Aug 19 2016, 3:34 PM

This patch only adds constant test case for the first pattern. For the second pattern, it cannot handle it. As for an improvement, I will create a second patch for handling the constant case for the second pattern.

wdng updated this revision to Diff 68743.Aug 19 2016, 4:00 PM
wdng marked an inline comment as done.

Changes based on Matt's last comments.

wdng updated this revision to Diff 68900.Aug 22 2016, 12:21 PM

Removed one test which has multiple use of add for pattern 1.

arsenm added inline comments.Aug 22 2016, 12:37 PM
lib/Target/AMDGPU/SIInstructions.td
3418–3419 ↗(On Diff #68900)

These operands should be indented to the same level as the other select operand to make it clear it's part of the select, and the add src2 should be on the next line after those

test/CodeGen/AMDGPU/sad.ll
41 ↗(On Diff #68900)

You should ad another test where the compared values don't match, and another where the second sub uses a different value to make sure those aren't matched

wdng marked an inline comment as done.Aug 22 2016, 2:05 PM
wdng added inline comments.
test/CodeGen/AMDGPU/sad.ll
41 ↗(On Diff #68900)

What do you mean the compared values don't match? If the second sub uses a different value, this is incorrect logic for the v_sad_u32 second pattern?

wdng updated this revision to Diff 68919.Aug 22 2016, 3:35 PM
wdng removed rL LLVM as the repository for this revision.
  1. Add indentation for v_sad_u32 patterns for better code understanding.
  2. Added one test where the compared values don't match, and another one where the second sub uses a different value to make sure those aren't matched.
arsenm added inline comments.Aug 22 2016, 3:41 PM
test/CodeGen/AMDGPU/sad.ll
153 ↗(On Diff #68919)

This should be checking 4 times

169 ↗(On Diff #68919)

This should be checking 4 times

wdng updated this revision to Diff 68924.Aug 22 2016, 3:57 PM

Add check for 4 times.

arsenm added inline comments.Aug 23 2016, 10:44 AM
test/CodeGen/AMDGPU/sad.ll
191–193 ↗(On Diff #68924)

These should not have zexts, the point of the test is what happens when it is legalized

211–213 ↗(On Diff #68924)

Ditto

245–247 ↗(On Diff #68924)

ditto, this should be like the previous test without the extensions

wdng updated this revision to Diff 69026.Aug 23 2016, 12:35 PM

Remove zexts.

arsenm accepted this revision.Aug 23 2016, 4:49 PM
arsenm edited edge metadata.

LGTM

This revision is now accepted and ready to land.Aug 23 2016, 4:49 PM
This revision was automatically updated to reflect the committed changes.