This is an archive of the discontinued LLVM Phabricator instance.

[ARM] ParallelDSP: multiple reduction stmts in loop
ClosedPublic

Authored by SjoerdMeijer on Jul 10 2018, 1:37 AM.

Details

Summary

This fixes an issue that we were not properly supporting multiple
reduction stmts in a loop, and not generating SMLADs for these cases. The
alias analysis checks were done too early, making it too conservative.

Diff Detail

Repository
rL LLVM

Event Timeline

SjoerdMeijer created this revision.Jul 10 2018, 1:37 AM
samparker added inline comments.Jul 10 2018, 3:21 AM
lib/Target/ARM/ARMParallelDSP.cpp
584 ↗(On Diff #154768)

For this to be safe, don't we have to ensure that the reductions are all rooted at the same phi?

Thanks for the suggestion.

This update makes explicit that the different MAC chains don't interfere/alias with each other. I.e. MAC-chains are a simple chains that starts with a reducing integer add statement, and then a chain of adds and muls, which only has sext and loads as operands. Thus, we don't write memory, and we set a ReadOnly flag.

I still think you're overcomplicating things here... Keeping the ReadOnly flag, you can then just check what's writing to memory in the rest of the block:

static void AliasCandidates(BasicBlock *Header,
                            Instructions &Reads, Instructions &Writes) {
  for (auto &I : *Header) {
    if (I.mayReadFromMemory())
      Reads.push_back(&I);
    if (I.mayWriteToMemory())
      Writes.push_back(&I);
  }
}

static bool AreAliased(AliasAnalysis *AA, Instructions &Reads,
                       Instructions &Writes,
                       ParallelMACList &MACCandidates) {
  LLVM_DEBUG(dbgs() << "Alias checks:\n");
  for (auto &MAC : MACCandidates) {
    LLVM_DEBUG(dbgs() << "mul: "; MAC.Mul->dump());
    if (!MAC.isReadOnly)
      return true;
    for (auto *I : Writes) {
      LLVM_DEBUG(dbgs() << "- "; I->dump());
      assert(MAC.MemLocs.size() >= 2 && "expecting at least 2 memlocs");
      for (auto &MemLoc : MAC.MemLocs) {
        if (isModOrRefSet(intersectModRef(AA->getModRefInfo(I, MemLoc),
                                          ModRefInfo::ModRef))) {
          LLVM_DEBUG(dbgs() << "Yes, aliases found\n");
          return true;
        }
      }
    }
  }
  return false;
}

As before, this will allow each MAC chain to be processed independently (which makes sense because they are independent) and allows us to be continue being assumption free. I'm sorry this removes the for_each loop though :p

thanks, suggestions implemented.

samparker accepted this revision.Jul 11 2018, 3:52 AM

Cheers!

This revision is now accepted and ready to land.Jul 11 2018, 3:52 AM
This revision was automatically updated to reflect the committed changes.