This is an archive of the discontinued LLVM Phabricator instance.

[NVPTX] Improve lowering of llvm.ctlz.
ClosedPublic

Authored by jlebar on Jan 13 2017, 7:17 PM.

Details

Summary
  • Disable "ctlz speculation", which inserts a branch on every ctlz(x) that has defined behavior on x == 0, to check whether x is, in fact, zero.
  • Add DAG patterns that avoid re-truncating or re-expanding the result of the 16- and 64-bit ctlz instructions.
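
As a rough illustration of the first bullet, the sketch below models the guarded form in Python. The function names are hypothetical and this is not LLVM code; it only assumes that the target's count-leading-zeros operation already returns the bit width for a zero input, which is what makes the extra branch redundant.

```python
def clz32(x: int) -> int:
    """Count leading zeros of a 32-bit value; returns 32 when x == 0."""
    x &= 0xFFFFFFFF
    return 32 - x.bit_length()


def clz32_zero_undef(x: int) -> int:
    """Stand-in for a count whose result on zero is undefined (hypothetical)."""
    x &= 0xFFFFFFFF
    assert x != 0, "result is undefined for zero input"
    return 32 - x.bit_length()


def ctlz_guarded(x: int) -> int:
    """Shape of the guarded form: branch on zero, then use the zero-undef count."""
    x &= 0xFFFFFFFF
    if x == 0:
        return 32
    return clz32_zero_undef(x)
```

When the native operation behaves like clz32 above, ctlz_guarded computes the same value with an extra comparison and branch, so the guard buys nothing.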

Event Timeline

jlebar updated this revision to Diff 84421. Jan 13 2017, 7:17 PM
jlebar retitled this revision to [NVPTX] Improve lowering of llvm.ctlz.
jlebar updated this object.
jlebar added a reviewer: tra.
jlebar added a subscriber: llvm-commits.
tra accepted this revision. Jan 17 2017, 1:28 PM
tra added inline comments.
llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
2802

PTX has mov.b32 %dest, {%src1, %src2}
Instead of explicit conversion + subtracting 16, perhaps we could do something like this:

mov.b32 %t, {0xffff, %src}
clz.b32 %result, %t

I'm not sure whether it makes any difference in SASS, though.
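
The equivalence behind this suggestion can be checked with a small Python sketch. It assumes the packed 32-bit word holds the 16-bit source in its upper half and 0xffff in its lower half; the function names are hypothetical and only model the arithmetic, not the PTX itself.

```python
def clz32(x: int) -> int:
    """Count leading zeros of a 32-bit value; returns 32 when x == 0."""
    x &= 0xFFFFFFFF
    return 32 - x.bit_length()


def clz16_convert_then_subtract(x: int) -> int:
    """Zero-extend the 16-bit value to 32 bits, count, then subtract 16."""
    return clz32(x & 0xFFFF) - 16


def clz16_packed(x: int) -> int:
    """Place x in the upper half and 0xffff in the lower half, then count."""
    packed = ((x & 0xFFFF) << 16) | 0xFFFF
    return clz32(packed)
```

The all-ones lower half caps the count at 16, so a zero input still yields 16 without a separate adjustment.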

This revision is now accepted and ready to land. Jan 17 2017, 1:28 PM
jlebar updated this revision to Diff 84746. Jan 17 2017, 2:28 PM

Add TODO.

jlebar marked an inline comment as done. Jan 17 2017, 2:28 PM
jlebar added inline comments.
llvm/lib/Target/NVPTX/NVPTXInstrInfo.td
2802

Oh, that is sneaky. I like it. It is one less SASS instruction.

Orig:

/*0008*/                   MOV R1, c[0x0][0x44];       /* 0x64c03c00089c0006 */
/*0010*/                   LDC.U16 R0, c[0x0][0x140];  /* 0x7c900000a01ffc02 */
/*0018*/                   FLO.U32 R0, R0;             /* 0xe1800000001c0002 */
/*0020*/                   ISUB R0, 0x1f, R0;          /* 0xc09000000f9c0001 */
/*0028*/                   I2I.U16.U32 R2, R0;         /* 0xe6000000001c240a */
/*0030*/                   MOV R0, c[0x0][0x144];      /* 0x64c03c00289c0002 */
/*0038*/                   IADD R2, R2, -0x10;         /* 0xc88003fff81c0809 */
                                                       /* 0x080000000000b810 */
/*0048*/                   ST.U16 [R0], R2;            /* 0xe2000000001c0008 */
/*0050*/                   EXIT;                       /* 0x18000000001c003c */

Clever hack:

/*0008*/                   MOV R1, c[0x0][0x44];         /* 0x64c03c00089c0006 */
/*0010*/                   LDC.U16 R0, c[0x0][0x140];    /* 0x7c900000a01ffc02 */
/*0018*/                   ISCADD R0, R0, 0xffff, 0x10;  /* 0xc0c0407fff9c0001 */
/*0020*/                   FLO.U32 R2, R0;               /* 0xe1800000001c000a */
/*0028*/                   MOV R0, c[0x0][0x144];        /* 0x64c03c00289c0002 */
/*0030*/                   ISUB R2, 0x1f, R2;            /* 0xc09000000f9c0809 */
/*0038*/                   ST.U16 [R0], R2;              /* 0xe2000000001c0008 */
                                                         /* 0x08000000000000b8 */
/*0048*/                   EXIT;                         /* 0x18000000001c003c */

However, we don't currently have a mechanism to generate mov.b32 b32reg, {imm, b16reg}. If it's OK with you, I'll just leave a TODO. Clever as it is, I seriously doubt it will ever matter.

This revision was automatically updated to reflect the committed changes.
jlebar marked an inline comment as done.