Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
| llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp | ||
|---|---|---|
| 580 | Nit: Asserts are only present if not NDEBUG anyway, so the ifndef seems superfluous (I guess, delete it from the nearby ones since we're here?) | |
| llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp | ||
|---|---|---|
| 485 | This "#ifndef NDEBUG" is probably still necessary. | |
- Opt to put the #ifndef DEBUG back in, rationale being that it's clearer even if technically the asserts don't need to be surrounded by them.
| llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp | ||
|---|---|---|
| 500 | This looks weird to me because there's nothing to say DOPReg is actually unique. Here I think ConstructiveUnaryPassthru is special as for it's associated pseudo instructions the passthru operand is effectively irrelevant because all active lanes come from the data operand with the inactive lanes being undef or zero. I imagine you are forcing true so that the new assert doesn't fire, but I think that new assert is also irrelevant because we don't care about the other operands [1]. I don't have a problem forcing true but the assert just seems like dead code. That said I'm then wondering what value there is referencing the passthrough value at all, but instead treating the data operand as DOP instead. This seems weird but at the same time consistent with other instructions in that DOP contains real data for the active lanes. If you did this then you will not need special handling within the Create the destructive operation block and in fact the if (FalseZero) part of that block will now be correct for if we even need zeroing pseudos for these unary ops. What do you think? I guess I'm suggesting: [1] For what it worth I suspect there's a bug somewhere in this logic whereby DstReg is part of some of the DOPRegIsUnique computations but that's not your problem. | |
| llvm/lib/Target/AArch64/AArch64InstrFormats.td | ||
| 39 | I'll admit that I had no issues with this name until I wrote the large comment for expand_DestructiveOp. That said it has me wondering if these instructions can truely be considered constructive. As I see it the unary passthrough instructions take two operands and munges them to produce a result. This result is then used to overwrite the first operand (the one that is used when calculating the result for inactive lanes), which to my mind means that it is still a destructive operation. So DestructiveUnaryPassthru is a more accurate name. What do you think? | |
| llvm/test/CodeGen/AArch64/sve2-unary-movprfx.ll | ||
| 8 | I don't think you need to test all instructions but there should be a test for each new pattern. Can you ensure there is a test for one instruction within each modified group. For example, I don't see any tests for the changes made to sve_int_un_pred_arit_1_fp, or is it the case there are existing tests for these? | |
| 10 | For the first few tests can you add a short comment to describe what the test is verifying. I ask because I was initially confused with the output of sqabs_i8_active until I realised _active meant the predicate is all active and thus making the passthrough value dead. | |
| llvm/lib/Target/AArch64/AArch64ExpandPseudoInsts.cpp | ||
|---|---|---|
| 500 | I think it possibly makes sense to do std::tie(PredIdx, DOPIdx, SrcIdx) = std::make_tuple(2, 3, 3);, as that would make it more consistent with the others, and would remove the special case in the movprfx construction (I assume you meant there rather than the Create the destructive operation block?), 2,3,1 seems strange to me as the passthru is not the source operand? | |
| llvm/lib/Target/AArch64/AArch64InstrFormats.td | ||
| 39 | I agree, now I that I think about it like that, I'll change the name of this. | |
| llvm/test/CodeGen/AArch64/sve2-unary-movprfx.ll | ||
| 8 | sve_int_un_pred_arit_1_fp is tested in sve-unary-movprfx.ll (which you possibly missed because it is being hidden since it's large), but point taken, I probably have gone overboard here on tests.. | |
- Rename ConstructiveUnaryPassthru to DestructiveUnaryPassthru
- Alter/Simplfy pseudo expand to treat the source register as the data operand
- Reduce number of tests to only cover each pattern once
- Add comments clarifying what the tests are checking
Hi, @bsmith, sorry to bother you, could you tell me the background of the patch? 
I have a question about the patch, as I notice that the patch adds movprfx before some instructions, and the optimization may affect the performance(degradation).
; CHECK-NEXT:    ptrue p0.d
+; CHECK-NEXT:    movprfx z0, z1
; CHECK-NEXT:    fsqrt z0.d, p0/m, z1.d
The rational for the patch is to ensure the inactive lanes of unary operations don't have a dependence on previous operations when the results for inactive lanes is not important (i.e. undef). Taking the above example, the problem with not breaking the dependency is that if z0.d is used as the result for a preceding long latency instruction then it can negatively impact the issue latency of this instructions despite there being no "real" dependency.
For this reason it is preferred to always break such dependencies via a movprfx instruction. The intent here is that movprfx is suppose to be free because the architecture prefers implementations to combine movprfx'd destructive instructions into constructive instructions. However, we are aware of issues where register allocation is making some poor decisions that is leading to movprfx being over used because dedicated destination registers are being allocated even though it would be advantageous to reuse an operand register.
This "#ifndef NDEBUG" is probably still necessary.