[WIP] [Polly] [ManagedMemoryRewrite] Rewrite global arrays with global pointers that are polly_mallocManage'd
ClosedPublic

Authored by bollu on Wed, Aug 9, 6:06 AM.

Diff Detail

Repository
rL LLVM
There are a very large number of changes, so older changes are hidden. Show Older Changes

The function editAllUses look more complicated than necessary. Here is what I'd try:

  1. For every function, insert a arrptr.load in the entry (or lazily when needed) (might be beneficial to have just one such load per function)
  1. Iterate of the instructions that contain a use of GlobalArray.
  1. Recurse over the operands of the instruction. Pseudo code:
Value *recursiveReplace(Value *Replacee, GlobalArray*Old, LoadInst *New) {
  if (Value == Old)
    return New;

  Instuction *Replacement = dyn_cast<Instruction>(Replacee); // Replacee is only an instruction for the recursion root.
  for (Use &Op : Value->operands()) {
    auto Replacer= recursiveReplace(Op>get());
    if (Replacer== Op->get())
     continue;
    if (!Replacement) {
     Replacement = Replacee->getAsInstruction();
     // Insert Replacement somewhere, e.g. right after arrptr.load (It was previously a constant, i.e. there are no other dependencies than to arrptr.load)
   }
    Replacement->setOperand(Op.getOperandNo(),  Replacer);
  }
  return Replacement;
}
  1. Call
for (Instruction *UserOfArrayInst : ArrayUserInstructions) {
  auto New = recursiveReplace(UserOfArrayInst , Array, `arrptr.load`);
  if (New != UserOfArrayInst)
    UserOfArrayInst->replaceAllUsesWith(New);
}
lib/CodeGen/ManagedMemoryRewrite.cpp
100 ↗(On Diff #110693)

[Suggestion] SmallVector instead of std::vector. Most instructions have only one or two operands, such that we don't need a memory location in the majority of cases.

120–122 ↗(On Diff #110693)

[Style] Doxygen comment.

128 ↗(On Diff #110693)

[Style] GEP->getPointerOperand() != ArrToRewrite

132 ↗(On Diff #110693)

[Suggestion] SmallVector and .reserve()

143–145 ↗(On Diff #110693)

[Suggestion] std::set is inefficient. Try SmallPtrSet?

151–152 ↗(On Diff #110693)

Why does GEP need to be handled different than other instructions?

There is also GetElementPtrConstantExpr which is not handled here.

159–165 ↗(On Diff #110693)

An assert is enough.

222 ↗(On Diff #110693)

[Style]

if (auto *I = dyn_cast<Instruction>(Current))
230–235 ↗(On Diff #110693)

AFAIK there is nothing else than Instruction and Constant deriving from User. An assert(isa<Constant>(Current)) or auto *C = cast<Constant>(Current) would be enough.

239 ↗(On Diff #110693)

Isn't it more replacing than removing?

Also: functions start with lower-case letters

247 ↗(On Diff #110693)

What does this return for multidimension arrays?

258 ↗(On Diff #110693)

Uninitialized data would be ok as well, no? Could handle them the same as zero-initialized data, which is a value uninitialized data can have.

267 ↗(On Diff #110693)

[Style] cast<> instead of dyn_cast<> gives you a type check in asset-builds.

279 ↗(On Diff #110693)

Where does the magic number 100 come from?

291–292 ↗(On Diff #110693)

Is the priority even important? AFAIK it can be just 0 for everything.

297 ↗(On Diff #110693)

unfinished sentence

test/GPGPU/simple-managed-memory-rewrite.ll
7 ↗(On Diff #110693)

REQUIRES: pollyacc missing

bollu updated this revision to Diff 110889.Sun, Aug 13, 1:09 PM
  • [Code dump] Dump of all changes that now allow global arrays to be rewritten

correctly.

We still need to rewrite allocas correctly.

bollu updated this revision to Diff 110894.Sun, Aug 13, 2:17 PM
  • [UNDEBUG] remove debug printing in GPUJIT
  • [NFC] check polly and fix test case to reduce size by x100
bollu marked 13 inline comments as done.Tue, Aug 15, 2:00 AM
bollu added inline comments.
lib/CodeGen/ManagedMemoryRewrite.cpp
128 ↗(On Diff #110693)

rewriteGEP has been removed.

132 ↗(On Diff #110693)

rewriteGEP has been removed.

151–152 ↗(On Diff #110693)

Yep, GEP special casing is removed.

247 ↗(On Diff #110693)

It would return the inner array type. However, I don't believe this is a problem due to the way we allocate memory.

We issue a single cudaMallocManaged, and then we convert the [T] to T*. This should still work, memory-layout wise unless I'm missing something.

258 ↗(On Diff #110693)

Yes, it would. I set it to zero as a safe default. Do we gain / lose anything by switching to uninitialized?

279 ↗(On Diff #110693)

Removed.

bollu updated this revision to Diff 111142.Tue, Aug 15, 4:11 AM
bollu marked 4 inline comments as done.
  • [WIP] Update according to Michael's comments.
bollu updated this revision to Diff 111198.Tue, Aug 15, 10:15 AM
  • [NFC] undo all changes to GPUJIT.

@Meinersbur - Second round of review, please :)

grosser accepted this revision.Wed, Aug 16, 2:40 AM

LGTM.

lib/CodeGen/ManagedMemoryRewrite.cpp
98 ↗(On Diff #111198)

is populated WITH all the

123 ↗(On Diff #111198)

No braces and no ";"

Maybe also reduce casting.

127 ↗(On Diff #111198)

You don't rewrite GEP, so remove the comment.

136 ↗(On Diff #111198)

keeps

153 ↗(On Diff #111198)

No braces for single statements

166 ↗(On Diff #111198)

No braces for single-statements

218 ↗(On Diff #111198)

Why is there a "false"?

310 ↗(On Diff #111198)

Not needed any more.

lib/Support/RegisterPasses.cpp
48 ↗(On Diff #111198)

Drop.

353 ↗(On Diff #111198)

Unrelated change?

This revision is now accepted and ready to land.Wed, Aug 16, 2:40 AM

The idea seems sensible. Some nits inline. Most importantly the intent and algorithm of the pass should be documented much more explicitely.

lib/CodeGen/ManagedMemoryRewrite.cpp
37 ↗(On Diff #111198)

Unused include?

58 ↗(On Diff #111198)

Function names should be humbleCamelCase

100 ↗(On Diff #111198)

This algorithm requires a much more extensive documentation.

106 ↗(On Diff #111198)

This is unused?

134 ↗(On Diff #111198)

InstsToBeDeleted is unused?

190 ↗(On Diff #111198)

Use SmalVvector instead of std::vector

206 ↗(On Diff #111198)

What's the purpose of this?

212 ↗(On Diff #111198)

ElemTy->getPointerTo(AddrSpace)

237 ↗(On Diff #111198)

No need to pass the empty array here.

244 ↗(On Diff #111198)

Isn't this off by a factor of 8?

257 ↗(On Diff #111198)

Use a SmallVector instead

263 ↗(On Diff #111198)

Superfluous braces.

268 ↗(On Diff #111198)

InstsToBeDeleted is never populated with anything.

284 ↗(On Diff #111198)

static

303 ↗(On Diff #111198)

static

316 ↗(On Diff #111198)

Just Builder.getInt64(Size)

324 ↗(On Diff #111198)

Is this actually ever relevant?

341 ↗(On Diff #111198)

Why is this a member?

369 ↗(On Diff #111198)

Here and below: Superfluous braces

383 ↗(On Diff #111198)

Use a SmallSet instead.

bollu updated this revision to Diff 111351.Wed, Aug 16, 8:25 AM
bollu marked 25 inline comments as done.
  • [NFC] Rewrite based on Phillip & Tobias' comments
bollu updated this revision to Diff 111475.Thu, Aug 17, 1:28 AM
  • [Bugfix] Send bytes, not bits. Also, move the alloca function extraction to the correct place.
bollu added inline comments.Thu, Aug 17, 1:28 AM
lib/CodeGen/ManagedMemoryRewrite.cpp
206 ↗(On Diff #111198)

The AddrSpace is a parameter to all pointers. I believe the default address space is zero.

As for why LLVM uses it, for example, the NVPTX backend uses address spaces to refer to where the pointer lives in memory.

218 ↗(On Diff #111198)

I need some help with this one, actually. What is the correct linkage type?

324 ↗(On Diff #111198)

Probably not, but I'd much rather be defensive about things like this.

bollu added a comment.Thu, Aug 17, 1:29 AM

@philip.pfaffe Another round of review, please?

bollu updated this revision to Diff 111476.Thu, Aug 17, 1:32 AM
  • [Re-upload] diff against the newest HEAD.
bollu updated this revision to Diff 111478.Thu, Aug 17, 1:36 AM
  • [Merge] Merged with master, hoping that GPUJIT does not show up from arc diff this time.
philip.pfaffe added inline comments.Thu, Aug 17, 1:58 AM
lib/CodeGen/ManagedMemoryRewrite.cpp
206 ↗(On Diff #111198)

The question was not what an AddrSpace is, but why you're actively passing the default everywhere, and even store this in an extra variable.

122 ↗(On Diff #111478)

Should you not assert before using I above?

149 ↗(On Diff #111478)

Does this actually ever happen? You're recursively expanding ConstExpr operands, so after a single pass that shouldn't be necessary anymore. Does this mean that this loop's trip count is actually never greater than two?

219 ↗(On Diff #111478)

Internal or Common linkage should be used.

bollu updated this revision to Diff 111484.Thu, Aug 17, 2:32 AM
  • [NFC] Discuss algorithm with phillip offline, he helped to simplify it further.
  • [Linkage] Update linkage code to use the correct linker options as well as the ignore linkage flag. Update test case to match this change
  • [ReplaceUsesOfWith] remove double-loop that was not required.
philip.pfaffe accepted this revision.Thu, Aug 17, 3:32 AM

Besides the AddrSpace, LGTM!

bollu updated this revision to Diff 111488.Thu, Aug 17, 3:40 AM
  • [NFC] remove 0 address space because that is the default value.
This revision was automatically updated to reflect the committed changes.