This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU][llvm-mc] v_cndmask_b32: src2 is mandatory; do not enforce VOP2 when src2 == VCC.
ClosedPublic

Authored by artem.tamazov on May 30 2016, 11:07 AM.

Details

Summary

The change unifies llvm assembler/disassembler syntax with sp3's one.
Besides, CodeGen output is a bit improved, thus changes in CodeGen tests.
Assembler/Disassembler tests updated/added.

Diff Detail

Repository
rL LLVM

Event Timeline

artem.tamazov retitled this revision from to [AMDGPU][llvm-mc] v_cndmask_b32: src2 is mandatory; do not enforce VOP2 when src2 == VCC..
artem.tamazov updated this object.
artem.tamazov set the repository for this revision to rL LLVM.
artem.tamazov added a project: Restricted Project.
artem.tamazov added subscribers: Restricted Project, vpykhtin, nhaustov.
SamWot accepted this revision.May 31 2016, 2:50 AM
SamWot edited edge metadata.
This revision is now accepted and ready to land.May 31 2016, 2:50 AM
artem.tamazov requested a review of this revision.Jun 3 2016, 7:51 AM
artem.tamazov edited edge metadata.

Ping

This revision was automatically updated to reflect the committed changes.
arsenm edited edge metadata.Jun 6 2016, 8:45 AM

Why did this change codegen at all?

artem.tamazov added a comment.EditedJun 6 2016, 8:56 AM

Why did this change codegen at all?

The CodeGen itself is not changed, but its output is.

Prior this change VOP3 was enforced when src2 is specified, even if src2 is VCC. The change allows for 32-bit encoding in that case.

arsenm added inline comments.Jun 6 2016, 1:14 PM
llvm/trunk/test/CodeGen/AMDGPU/fceil64.ll
28

Where did the s_and_b64 go?

llvm/trunk/test/CodeGen/AMDGPU/sint_to_fp.i64.ll
28

This changes looks like it regressed to now use e64

artem.tamazov added inline comments.Jun 7 2016, 4:06 AM
llvm/trunk/test/CodeGen/AMDGPU/fceil64.ll
28

There was superfluous s_and_b64. Output of fceil_f64 contained only one s_and_b64 even prior this change. The issue somehow became apparent after my changes.

llvm/trunk/test/CodeGen/AMDGPU/sint_to_fp.i64.ll
28

Yes, this is a case of regression. However, overall stats for this test looks progressed:

Function                 codeSize nSgps  nVprs
                         (prior/after change)
----------------------------------------------
s_sint_to_fp_i64_to_f32  152/148  14/14  4/4
v_sint_to_fp_i64_to_f32  180/188  12/14  7/6
s_sint_to_fp_v2i64       296/284  17/17  8/8
v_sint_to_fp_v4i64       656/624  15/17  21/18
----------------------------------------------
Total diff               -40      +4     -4

For fceil64.ll, stats progressed as well:

                         codeSize  nSgps  nVprs
Function                 (prior/after change)
-----------------------------------------------
fceil_f64                132/128   12/12  4/4
fceil_v2f64              248/240   16/16  7/7
fceil_v4f64              480/464   20/20  11/11
fceil_v8f64              916/876   27/27  21/20
fceil_v16f64             1996/1988 62/62  46/46
-----------------------------------------------
Total diff               -76       0      -1

As overall stats looks good, I decided that no further investigation necessary right away.

Regarding v_sint_to_fp_i64_to_f32 regression. After looking at ISA changes, I suspect that instruction scheduler prefers to save a VGPR at the cost of code size. I can send you both ISAs if you would like to look at this.