Page MenuHomePhabricator

[CodeGen][AArch64][SVE] Fold [rdffr, ptest] => rdffrs; bugfix for optimizePTestInstr
ClosedPublic

Authored by peterwaller-arm on Tue, Apr 27, 5:55 AM.

Details

Summary

When a ptest is used to set flags from the output of rdffr, the ptest
can be eliminated, using a flags-setting rdffrs instead.

Additionally, check that nothing consumes flags between rdffr and ptest;
this case appears to have been missed previously.

  • There is no unpredicated RDFFRS instruction.
  • If substituting RDFFR_PP, require that the mask argument of the PTEST matches that of the RDFFR_PP.
  • Move some precondition code up inside optimizePTestInstr, so that it covers the new code paths for RDFFR which return earlier.
    • Only consider RDFFR, PTEST in same basic block.
    • Check for other flag setting instructions between the two, abort if found.
    • Drop an old TODO comment about removing dead PTEST instructions.

RDFFR_P to follow in later patch.

Diff Detail

Event Timeline

peterwaller-arm requested review of this revision.Tue, Apr 27, 5:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptTue, Apr 27, 5:55 AM

Fix clang-format issues.

Matt added a subscriber: Matt.Tue, Apr 27, 9:33 AM
paulwalker-arm added inline comments.Sun, May 2, 3:37 AM
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
1369–1372

Is this true? I don't see how you know the value of p0 just prior to rdffr. There's nothing to say p0 has been defined at this point nor that it contains the same value as it will by the time the ptest is hit.

Although not exactly the same this is what the

auto *RDFFRMask = MRI->getUniqueVRegDef(RDFFR->getOperand(1).getReg());
if (Mask != RDFFRMask)
  return false;

code for the AArch64::RDFFR_PPz case is protecting against.

Perhaps this case is worth a separate patch?

1394–1396

Not saying there's anything wrong but I just wondered why you cannot fall through here? Is the following code specific to the other case blocks? I only ask because RDFFR_PPz->RDFFRS_PPz looks very similar to BRKN_PPzP->BRKNS_PPzP.

peterwaller-arm added inline comments.Tue, May 4, 3:39 AM
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
1369–1372

Great catch, agree that seems problematic.

If the only use of p1.b after the rdffr is at the ptest, can we think of it as optimizing the ptest instead (as the comment is written)? e.g. hasOneNonDBGUser(rdffr.Operand) && getUniqueVRegDef(ptest.Pred) == rdffr.

If that is a simple enough solution and seems obviously correct to you, I'll keep it in-patch, otherwise split.

1394–1396

I think you are correct that this could fall through.

It ended up like this because I first implemented RDFFR_P, which AIUI cannot fall through because it grows an operand. Then I used the same implementation strategy for RDFFR_PP. When I came to consider falling through I concluded that the non-fall-through code was more straightforward to read/reason about. It didn't seem to me a too much extra code (we 'spend 4 extra lines of obvious code' to 'get to a return true 30 lines of code earlier' which seemed a worthwhile trade).

I'm guessing that updating the existing MI is done to avoid additional allocations and reduce the total code slightly? Additionally it has to add NZCV as an operand and mark it live, none of which looks necessary if the instruction has been built.

What is your preference in this case? Shall I change to fall-through?

  • Update for code review.
  • Drop RDFFR_P - harder than I thought, for a later patch.
peterwaller-arm marked an inline comment as done.Thu, May 6, 5:59 AM
peterwaller-arm edited the summary of this revision. (Show Details)

Update commit message.

paulwalker-arm accepted this revision.Tue, May 11, 4:51 AM

Looks good to me. I've a few comments that you're free to ignore.

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
1313–1321

I have no proof on this so feel free to ignore if you disagree.

Hoisting the Pred->getParent() != PTest->getParent() part offers a faster route to termination so I'm good with that.

However, I feel iterating across the BB's instructions may not and so that part might be better to keep until after we've decided Mask and Pred are interesting.

1373

Given there's only a single use, which matches the style of the previous blocks, I don't see much value in creating this alias to Pred. Especially as the comment describes the situation.

This revision is now accepted and ready to land.Tue, May 11, 4:51 AM
peterwaller-arm retitled this revision from [CodeGen][AArch64][SVE] Substitute [rdffr, ptest] => rdffrs to [CodeGen][AArch64][SVE] Fold [rdffr, ptest] => rdffrs; bugfix for optimizePTestInstr.
peterwaller-arm edited the summary of this revision. (Show Details)
  • Update for code review
  • Factor loop with pre-existing areCFlagsAccessedBetweenInstrs.
  • Add test case for flags used between Pred and PTest, and fix bug where this was not covered.
peterwaller-arm marked 2 inline comments as done.Wed, May 12, 4:31 AM

Found and fixed additional bug here.

llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
1313–1321

Opted to leave the code as is for now and change it in next patch.

paulwalker-arm added inline comments.Wed, May 12, 4:52 AM
llvm/lib/Target/AArch64/AArch64InstrInfo.cpp
1388–1392

I believe this case is handled within areCFlagsAccessedBetweenInstrs, making this redundant code that can be removed?

peterwaller-arm marked an inline comment as done.

Drop redundant same-bb check.

peterwaller-arm marked an inline comment as done.Wed, May 12, 5:53 AM
paulwalker-arm accepted this revision.Wed, May 12, 7:03 AM