This brings LLVM-generated PTX closer to what nvcc generates and avoids
triggering issues in ptxas.
For instance, ptxas does not accept .s16 (or .u16) registers as operands
for .fp16 instructions.
Differential D23460
[NVPTX] Use untyped (.b) integer registers in PTX.
Authored by tra on Aug 12 2016, 11:31 AM.
Since we're working around a ptxas bug, I feel like we should have a comment somewhere explaining ourselves.
I believe that different --check-prefixes are evaluated independently, so if you did FOO-NOT, that would effectively check that something never appears in the file. Would that work?
Empirical evidence suggests that -check-prefixes are *not* evaluated independently. As far as I can tell, it just creates a set of labels to pay attention to, and then all labels within the set are treated the same. That said, I can always do a negative check in a separate RUN line, so the idea is helpful.
Test looks good to me, but I do think we want a comment in the source explaining why we use .b.
As written, this comment says we're making this change to avoid the hypothetical possibility of hitting bugs in ptxas in the future. But that's not really true, right? My understanding is there's an explicit bug in ptxas with fp16 that we are working around.
We chose to use this big hammer to work around the ptxas bug for the reasons you explain here, and that's also an important part, but it's not the whole story.
Sorry to be nitpicky about this, but spelling it out (ideally with examples of good/bad PTX) is really important so that if someone in the future comes along and wants to change this, they have some chance of being able to evaluate whether they will regress something.
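As an illustration of the kind of good/bad PTX the comment above asks for, here is a minimal hand-written sketch. It is not taken from the patch or from actual llc output: the kernel name half_to_float, its parameters, the PTX version, and the target are all assumptions chosen for illustration, and it has not been checked against any particular ptxas release. The only claim carried over from the patch summary is that ptxas does not accept .s16 (or .u16) registers as operands of fp16 instructions.

```ptx
// Hypothetical example; version and target are assumptions.
.version 4.3
.target sm_35
.address_size 64

.visible .entry half_to_float(.param .u64 out, .param .u16 h)
{
    // Good: untyped 16-bit register holds the raw fp16 bits.
    .reg .b16 %rs<2>;
    .reg .f32 %f<2>;
    .reg .b64 %rd<3>;

    // Bad (per the patch summary): the same code with a typed integer
    // register as the fp16 operand, which ptxas does not accept:
    //   .reg .s16 %rs<2>;
    //   cvt.f32.f16 %f1, %rs1;

    ld.param.u64        %rd1, [out];
    ld.param.u16        %rs1, [h];
    cvta.to.global.u64  %rd2, %rd1;
    cvt.f32.f16         %f1, %rs1;    // fp16 -> fp32, source in a .b16 register
    st.global.f32       [%rd2], %f1;
    ret;
}
```

The design point this is meant to show: as far as the PTX operand type rules go, bit-size (.b) registers are compatible with instructions of any type of the same width, so declaring integer registers as .b leaves the signed/unsigned/fp16 choice to the instruction opcode rather than to the register declaration.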