SMEM opcodes are faster, so we want to use them if possible.
Deus Ex performance: +13%
(with on-demand shader compilation enabled in Deus Ex, so the real
improvement should be higher)
Authored by mareko on Jan 22 2017, 2:16 PM.
I see the same problem as Tom here. Do those shaders use read-only SSBOs? If so, this could perhaps be done at the Mesa level. But even then, there'd be a problem if the same memory is bound to two different SSBOs, and one of them is written to, unless the SSBO is marked 'restrict'.
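The aliasing condition described above can be written as a small predicate. This is only a sketch: `BufferInfo`, its fields, and `canUseScalarLoads` are hypothetical names made up for illustration, not part of Mesa or LLVM, and it assumes the front-end already knows whether a buffer is read-only, whether it is marked 'restrict', and whether the same memory might also be bound to an SSBO that is written.

```cpp
// Hypothetical per-SSBO facts a front-end might track (illustration only).
struct BufferInfo {
  bool readOnly;              // the shader never writes through this SSBO
  bool restrictQualified;     // the SSBO is marked 'restrict'
  bool mayAliasWrittenBuffer; // same memory may be bound to a written SSBO
};

// Returns true when loads from this buffer could plausibly use SMEM:
// the buffer must be read-only, and either it is 'restrict' or no
// written SSBO can alias the same memory.
bool canUseScalarLoads(const BufferInfo &b) {
  if (!b.readOnly)
    return false;
  return b.restrictQualified || !b.mayAliasWrittenBuffer;
}
```

Under these assumptions, a read-only buffer that may alias a written one is rejected unless it is 'restrict', which matches the concern raised in the comment.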
Can you be more specific about why it's incorrect? I only see an issue with L1 coherency (GLC=0).
Another possible issue is that SMEM instructions ignore some bits of the resource descriptor. So you would need some way to tell the compiler that selecting SMEM would not ignore any relevant resource bits.
Doesn't this make this change unworkable? Presumably the front-end would need to annotate in some way to indicate that this is a legitimate transformation, in which case you might as well use a different intrinsic anyway. Are there any circumstances where you can determine if this is definitely the case?
I've got a situation that benefits from this change, but equally could use the solution in D27586. Perhaps that change could be enhanced with the non-const offset change in this review?
Yes, having separate intrinsics like D27586 is preferable, not just because of the sL1 vs vL1 coherency stuff, but also because SMEM instructions have many differences compared to VMEM, and sometimes even the same-looking VMEM and SMEM instructions have different behavior. It's also important that s.load intrinsics support non-constant offsets and are lowered to VMEM when the address comes from a VGPR.
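The lowering rule in the last sentence can be sketched as below. This is a simplified model, not the actual instruction selector: `OperandKind`, `MemOp`, and `selectForScalarLoad` are hypothetical names, and the only assumption encoded is the one stated above, that a scalar-load intrinsic falls back to VMEM when any part of its address lives in a VGPR.

```cpp
// Where an address operand lives (hypothetical simplification).
enum class OperandKind { Constant, SGPR, VGPR };

// Opcode family chosen for the load.
enum class MemOp { SMEM, VMEM };

// Pick the opcode family for a scalar-load intrinsic. A constant or
// SGPR offset is uniform across the wave and can use SMEM; a VGPR
// descriptor or offset may differ per lane, so it is lowered to VMEM.
MemOp selectForScalarLoad(OperandKind descriptor, OperandKind offset) {
  if (descriptor == OperandKind::VGPR || offset == OperandKind::VGPR)
    return MemOp::VMEM;
  return MemOp::SMEM;
}
```

With a dedicated intrinsic, the front-end states that SMEM is legal, and the backend only has to decide whether the operands make it possible.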