Modify ValidateLiveOuts to track 'FalseZeros' more precisely, including checks on specific operations that can generate non-zeros from zero values, e.g VMVN. We can then check that any instructions that retain some information in their output register (all narrowing instructions) that they only use and def registers that always have zeros in their falsely predicated bytes, whether or not tail predication happens.
Most of the logic remains the same, just the names of the data structures and helpers have been renamed to reflect the change in logic. The key change, apart from the opcode checkers, is that the FalseZeros set now strictly contains only instructions which will always generate zeros, and not instructions that could also have their false bytes masked away later.
Details
Diff Detail
Event Timeline
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp | ||
---|---|---|
521 | This is becoming a very tricky pass (if it wasn't already that), for a bunch of different reason: the analysis, the transformations, etc. To keep things readable, I am now going to nitpick on comments. I believe there is one school of thought that code should be self-documenting, and comments will always be out of date or misleading, which I think I can mostly agree with. But here in this case, this function, that checks certain opcodes and returning true/false to a function named canGenerateNonZeros doesn't tell me much. So my request is here is if you can define what exactly canGenerateNonZeros mean, and why these opcodes. | |
538 | Same request here: what is retainsPreviousValue, and why these opcodes? | |
583 | Same here | |
588 | Probably this comment describes the function's purpose, so can be moved up. |
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp | ||
---|---|---|
521 | Fair! Here, any MVE instruction that could generate a non-zero result given nothing but zero'd registers should be listed. This allows us to track zeros in means zeros out. | |
538 | All these instructions operate upon half a lane of the source register, writing to the destination, but they also leave the other half of the destination register untouched. The reference manual uses this sentence: 'The other half of the destination vector element retains its previous value'. |
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp | ||
---|---|---|
521 | Thanks, makes sense. If you can later add this as comments, that would be much appreciated. I am now going to do a second pass over this patch, let things sink in, and continue the review. |
I am now up to date with this, and don't have any questions about the logic, that looks good.
I would like to go over it once more when (some of) the nits have been addressed as I would like to get an overview how things read and look as this introduces quite a few new things and concepts.
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp | ||
---|---|---|
538 | Thanks for pointing this out. Just a nit/suggestion for the function name then: A few questions about the list below. Is this list meant to be complete? From looking at the ArmARM a bit more for this behaviour, it looks like there are more, the first search result e.g. showed VQMOVN. Are we for example not interested in this one? Also curious if it would be worth describing this property in a different way, or if this switch will do. | |
583 | nit: perhaps something along the lines of: alwaysZero -> laneAlwaysZero | |
617 | nit: bytes -> lanes? | |
625 | nit: KnownFalseZeros -> Predicated? | |
629 | nit: perhaps something with "lane" in it:
|
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp | ||
---|---|---|
538 | It is supposed to be complete, so thanks! Which also reminds me that there's no tests for any movn either! |
This is becoming a very tricky pass (if it wasn't already that), for a bunch of different reason: the analysis, the transformations, etc. To keep things readable, I am now going to nitpick on comments. I believe there is one school of thought that code should be self-documenting, and comments will always be out of date or misleading, which I think I can mostly agree with. But here in this case, this function, that checks certain opcodes and returning true/false to a function named canGenerateNonZeros doesn't tell me much. So my request is here is if you can define what exactly canGenerateNonZeros mean, and why these opcodes.