There is no v_mov_b64, but a v_lshlrev_b64 can accomplish the same by
shifting a 64-bit register by 0.
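To illustrate the claim above, here is a minimal C++ sketch that models a 64-bit VGPR pair and the shift semantics in software. The names `VReg64`, `vLshlrevB64`, `copyViaShift` and `copyViaTwoMoves` are hypothetical helpers for illustration only, not LLVM or hardware APIs. The model assumes the shift uses only the low 6 bits of the shift operand.

```cpp
#include <cstdint>

// Hypothetical model of a 64-bit VGPR pair: lo is the lower-numbered
// 32-bit register, hi is the next one.
struct VReg64 {
  std::uint32_t lo;
  std::uint32_t hi;
};

constexpr std::uint64_t toU64(VReg64 r) {
  return (static_cast<std::uint64_t>(r.hi) << 32) | r.lo;
}

constexpr VReg64 fromU64(std::uint64_t v) {
  return VReg64{static_cast<std::uint32_t>(v),
                static_cast<std::uint32_t>(v >> 32)};
}

// Software model of v_lshlrev_b64: "rev" means the shift amount is the
// first source operand. Assumption: only the low 6 bits of it are used.
constexpr VReg64 vLshlrevB64(std::uint32_t shift, VReg64 src) {
  return fromU64(toU64(src) << (shift & 63u));
}

// A shift by 0 leaves every bit in place, so it acts as a 64-bit copy.
constexpr VReg64 copyViaShift(VReg64 src) { return vLshlrevB64(0, src); }

// The conventional alternative: two independent 32-bit moves.
constexpr VReg64 copyViaTwoMoves(VReg64 src) {
  return VReg64{src.lo, src.hi};
}

static_assert(toU64(copyViaShift(VReg64{0xdeadbeefu, 0x12345678u})) ==
                  toU64(copyViaTwoMoves(VReg64{0xdeadbeefu, 0x12345678u})),
              "shift by 0 must match two 32-bit moves");
```

Both helpers produce the same register contents; they differ only in how many instructions they stand for and at what rate those instructions issue.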
Repository: rG LLVM Github Monorepo
Interesting. I see you're doing this when expanding V_MOV_B64_PSEUDO, but I don't really understand when we use V_MOV_B64_PSEUDO in the first place. copyPhysReg() does not generate it, instead it copies the logic from here to emit V_PK_MOV_B32. So does that mean you need to add your V_LSHLREV_B64_e64 code to copyPhysReg too?
The Write64Bit definitions in SISchedule.td suggest they are half rate on most subtargets and full rate on gfx90a.
I think that's probably wrong. Comments in performShlCombine, for example, say it's quarter rate.
It seems to be quarter rate (or something slow) on gfx9, full rate on gfx90a and half rate on gfx10?
Then it would be worth using on gfx90a and gfx10+.
You do not need this on gfx90a because there is pk_mov. It is arguably the same performance as 2 moves on gfx10.
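The rate argument can be sketched as a toy cost model in C++. This is an assumption-laden illustration, not a description of the real scheduler: it treats a full-rate instruction as costing 1 issue slot, half rate as 2 and quarter rate as 4, and it ignores latency and dependencies. `Rate`, `issueCost` and `shiftIsFaster` are hypothetical names.

```cpp
// Assumed relative issue cost per instruction: full rate = 1 slot,
// half rate = 2 slots, quarter rate = 4 slots.
enum class Rate : unsigned { Full = 1, Half = 2, Quarter = 4 };

constexpr unsigned issueCost(Rate r) { return static_cast<unsigned>(r); }

// Two V_MOV_B32 instructions, each assumed to be full rate.
constexpr unsigned twoMovesCost() { return 2 * issueCost(Rate::Full); }

// One 64-bit shift only wins if it is strictly cheaper than the two moves.
constexpr bool shiftIsFaster(Rate shiftRate) {
  return issueCost(shiftRate) < twoMovesCost();
}

static_assert(shiftIsFaster(Rate::Full), "full rate beats two moves");
static_assert(!shiftIsFaster(Rate::Half), "half rate only ties");
static_assert(!shiftIsFaster(Rate::Quarter), "quarter rate loses");
```

Under these assumptions a half-rate shift merely ties with two moves, which matches the point that it is arguably the same performance on gfx10.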
The pseudo was created to deal with 64-bit immediates and to fold them. It is not needed that late.
For GFX10, I don't think this is worth doing unless V_LSHLREV_B64 is full rate.
2x V_MOV_B32 in VOP1 takes the same space as V_LSHLREV_B64 in VOP3.
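The size comparison is simple arithmetic, sketched here in C++. It assumes the base encodings with no trailing 32-bit literal operand: 4 bytes for a VOP1 instruction and 8 bytes for a VOP3 instruction. The constant and function names are hypothetical.

```cpp
// Assumed base encoding sizes in bytes, without any trailing literal.
constexpr unsigned kVOP1Bytes = 4; // e.g. V_MOV_B32 in its VOP1 form
constexpr unsigned kVOP3Bytes = 8; // e.g. V_LSHLREV_B64 in its VOP3 form

constexpr unsigned twoMovesBytes() { return 2 * kVOP1Bytes; }
constexpr unsigned oneShiftBytes() { return kVOP3Bytes; }

// Code size is a wash: neither sequence is smaller than the other.
static_assert(twoMovesBytes() == oneShiftBytes(),
              "two VOP1 moves occupy the same space as one VOP3 shift");
```

With code size equal, the choice comes down to issue rate and scheduling freedom.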
I think this is right. The two moves can also be scheduled apart, leaving room for something else to be scheduled in between. A 64-bit shift is rarely beneficial in general if you can get away without it.