This is an archive of the discontinued LLVM Phabricator instance.

Elide argument copies during instruction selection
ClosedPublic

Authored by rnk on Feb 7 2017, 11:53 AM.

Details

Summary

Avoids tons of prologue boilerplate when arguments are passed in memory
and left in memory. This can happen in a debug build or in a release
build when an argument alloca is escaped. This will dramatically affect
the code size of x86 debug builds, because X86 fast isel doesn't handle
arguments passed in memory at all. It only handles the x86_64 case of up
to 6 basic register parameters.

This is implemented by analyzing the entry block before to ISel to
identify copy elision candidates. If an argument is a copy elision
candidate, we set a flag on the InputArg that we pass to
TargetLowering::LowerFormalArguments, similar to how we mark byval
arguments. The target does normal calling convention processing to
compute argument locations, and if the argument lives in memory and the
target recognizes the elision candidate flag, it will return a
FrameIndexSDNode instead of the usual load or physical register copy
SDValue. The generic ISel code recognizes this and replaces the old
frame index with the new fixed frame index in the StaticAllocaMap and
the debug declare table.

Supersedes D28388

Fixes PR26328

Diff Detail

Repository
rL LLVM

Event Timeline

rnk created this revision.Feb 7 2017, 11:53 AM
hans edited edge metadata.Feb 7 2017, 2:47 PM

Looks good as far as I can tell.

It adds more complexity to SelectionDAG, but I suppose it's the best place to do it.

include/llvm/CodeGen/SelectionDAGISel.h
59 ↗(On Diff #87492)

These are pretty generalist names in a broad scope. Maybe name them around "ArgCopy" rather than just Copy or Elided instructions?

I suppose for the consumer of this set it doesn't care what kind of instruction it is really.. bot for ElisionFrameIndexMap at least maybe ArgCpyElisionFrameIndexMap would be better? (Pretty long though..)

lib/CodeGen/AsmPrinter/DwarfDebug.h
88 ↗(On Diff #87492)

Unrelated, just commit?

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8057 ↗(On Diff #87492)

I might have gone with just return &StaticAllocas[AI]

8463 ↗(On Diff #87492)

You cleared it above too before starting to fill it, so maybe one of these is redundant?

test/CodeGen/X86/arg-copy-elide.ll
19 ↗(On Diff #87492)

Is there a pop or something restoring the stack after the call? Should we check for that too?

rnk updated this revision to Diff 87670.Feb 8 2017, 9:46 AM
rnk marked 2 inline comments as done.
  • Rename things & tweak code
chandlerc added inline comments.Feb 8 2017, 10:54 AM
include/llvm/CodeGen/SelectionDAGISel.h
318 ↗(On Diff #87670)

no doxygen?

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8047 ↗(On Diff #87670)

SmallDenseMap? Seems like we can predict common upper bounds and this is a leaf-y function, so can avoid most memory allocation here.

8077 ↗(On Diff #87670)

For a sequence of N casts of an alloca, doesn't this do O(N^2) work?

Maybe make GetInfoIfStaticAlloca fully memoize each Value?

8168–8171 ↗(On Diff #87670)

Both findArgumentCopyElisionCandidates and elideArgumentCopy are only ever called from this function. Maybe make them local static helpers instead of putting them into the API?

Also, the only data structure used outside this function is the ElidedArgCopyInstrs. Maybe sink all the other data structtures down to be local? That seems like it would also help address Hans' comments about name.

I also don't see why you need flags for this when you have a data structure that tracks which things are candidates...

rnk added inline comments.Feb 8 2017, 11:13 AM
lib/CodeGen/AsmPrinter/DwarfDebug.h
88 ↗(On Diff #87492)

It's related because this change triggers the assertion. As written, the assert is equivalent to:

assert(FI != -1);

However, -1 is a valid fixed frame index, and it is typically the one used for the first argument in memory.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8057 ↗(On Diff #87492)

This is C++! We can't do two hash lookups. :)

8168–8171 ↗(On Diff #87670)

I had it that way originally, but the set of elided instructions needs to make it into SelectBasicBlock so we can skip over them. At that point I made these things instance methods and fields. I'll flip it around and pass the data explicitly and see if we like that better.

test/CodeGen/X86/arg-copy-elide.ll
19 ↗(On Diff #87492)

There is, but I don't want to overfit our choice for how to adjust the stack. It could be "pop" or "add $4, %esp". Matching the push doesn't feel like overfitting because it's a profitable transform that we're likely to keep.

rnk updated this revision to Diff 87700.Feb 8 2017, 1:11 PM
rnk marked an inline comment as done.
  • Sink elision back to static helpers
  • Try to optimize away heap allocation in elision analysis
rnk added inline comments.Feb 8 2017, 2:21 PM
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8057 ↗(On Diff #87492)

Elaborating on this, I think that it would be less clear that on first use, a static alloca starts in the "unknown" state.

8047 ↗(On Diff #87670)

Done, but I don't think the signals are that good.

8077 ↗(On Diff #87670)

Do you think the memory use of memoization is worth optimizing for long chains of casts that mid-level optimizations usually remove?

MatzeB added inline comments.Feb 9 2017, 8:59 PM
include/llvm/CodeGen/MachineFrameInfo.h
573 ↗(On Diff #87700)

Do not repeat the name of the documented function.

include/llvm/Target/TargetCallingConv.h
71–72 ↗(On Diff #87700)

This header makes me cry: Define everything twice as bit and as the number of the bit. Just to use the number of the bit in a One << XXX expression below... Anyway unrelated to this patch which stays consistent with the existing madness.

128–129 ↗(On Diff #87700)

Wrong Copy&Paste!

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8051 ↗(On Diff #87700)

FuncInfo could be a reference as it mustn't be nullptr.

8074 ↗(On Diff #87700)

Maybe document that the address of those allocas must not escape until a store is found (because that explains some of the complexity in the loop).

8077–8082 ↗(On Diff #87700)

You could move those two checks into the if (!SI) block.

8131–8135 ↗(On Diff #87700)

Most arguments could be reference as they mustn't be nullptr.

8402 ↗(On Diff #87700)

Intuitively I would assume this should get a MO_Invariant flag for pretty much all ABIs here. I think however there is other code deducing this from the frame index, so it is probably fine to use the default pointer info here. If you haven't yet, please test that when you build an optimized function with/without your patch you get the same memory operands.

8462 ↗(On Diff #87700)

extra space after =

lib/Target/X86/X86ISelLowering.cpp
2718 ↗(On Diff #87700)

I don't know right now what the ABIs say, but I would expect the argument memory to be immutable. So maybe something like !AlwayseUseMutable would work here?

rnk marked 6 inline comments as done.Feb 13 2017, 4:51 PM

Thanks for the review!

include/llvm/Target/TargetCallingConv.h
71–72 ↗(On Diff #87700)

Given my copy-pasto, I fixed this in rL294989. I can't believe how long we've lived with this manual masking.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8131–8135 ↗(On Diff #87700)

Many of these are used in pointer-y ways like being inserted into a pointer set. IMO they're better as pointers.

8402 ↗(On Diff #87700)

Actually, we need to ensure that the load is not invariant. Once we elide the copy, we can end up mutating the argument in the caller allocated memory. I have a test for this.

lib/Target/X86/X86ISelLowering.cpp
2718 ↗(On Diff #87700)

x86 at least says the opposite. If the ABI required that we not mutate the argument in memory, we wouldn't be able to do this transformation. So, maybe this is a good place to comment that the non-immutability is important.

Also, it looks like during the evolution of this patch, I ended up not needing setIsImmutableObjectIndex. I'll remove that. It's redundant with this object creation.

rnk updated this revision to Diff 88278.Feb 13 2017, 4:52 PM
rnk marked an inline comment as done.
  • Implement suggestions from Mathias
rnk added a reviewer: sunfish.Feb 13 2017, 6:10 PM
MatzeB accepted this revision.Feb 13 2017, 6:58 PM

LGTM

include/llvm/Target/TargetCallingConv.h
71–72 ↗(On Diff #87700)

Thanks, this new version is so much simpler!

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8131–8135 ↗(On Diff #87700)

It's certainly a subjective thing given that much of the existing code is written with pointers. I don't have a strong opinion on it, so keep it.

8402 ↗(On Diff #87700)

Interesting, I didn't know that is legal. Fine with me then.

This revision is now accepted and ready to land.Feb 13 2017, 6:58 PM
chandlerc edited edge metadata.Feb 13 2017, 7:19 PM

More comments...

include/llvm/Target/TargetCallingConv.h
48 ↗(On Diff #88278)

This isn't really that the argument is an elision candidate, it means the argument's copy is *elided* and you transform based on this.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8106 ↗(On Diff #88278)

What if the value being stored is the address of a static alloca? We don't mark it as clobbered above because we're in the store instruction case.

8132–8134 ↗(On Diff #88278)

Would it be worth stopping the scan when you're out of arguments?

8278–8279 ↗(On Diff #88278)

You erase things from the set below but don't clear the flags.... I assume this can't manifest because of how the flag is currently used, but it seems surprising at the least. Maybe sink this to the below code where you're going to do the elision?

majnemer added inline comments.
lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8108–8109 ↗(On Diff #88278)

Why not just pass Dst into GetInfoIfStaticAlloca?

8122 ↗(On Diff #88278)

Would !Arg->getType()->isSized() be more precise?

rnk marked 2 inline comments as done.Feb 14 2017, 11:23 AM
rnk added inline comments.
include/llvm/Target/TargetCallingConv.h
48 ↗(On Diff #88278)

No, we don't. This flag tells the target that if the argument lives in memory, then it should return a FrameIndexSDNode, similar to the way that we would if the IsByVal flag was set. If the argument doesn't live in memory, it should ignore this flag. We only do the transform if a frame index comes back from TLI->LowerFormalArguments.

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8122 ↗(On Diff #88278)

That ignores element counts on arrays and vectors, right? Do you think that's right? I think what I really wanted to say was "I don't know what the heck would happen if we elided the copy of a zero sized argument, so let's not try".

8132–8134 ↗(On Diff #88278)

Yeah, probably for -O0 builds with tons of allocas.

8278–8279 ↗(On Diff #88278)

The flags are just temporary data, and they're only supposed to indicate candidates, not successfully transformed arguments. It doesn't seem worth clearing them.

I did notice that I can check the flag instead of doing a hash lookup below, though.

rnk updated this revision to Diff 88405.Feb 14 2017, 11:23 AM
rnk marked an inline comment as done.
  • David and Chandler's comments
rnk updated this revision to Diff 88938.Feb 17 2017, 11:59 AM
  • Fix bug when splitting i64 between registers and memory
  • Add support for simple copy elision on non-x86 platforms
rnk updated this revision to Diff 88978.Feb 17 2017, 3:38 PM
  • Fix bug when splitting i64 between registers and memory
  • Add support for simple copy elision on non-x86 platforms
rnk requested review of this revision.Feb 17 2017, 3:54 PM
rnk edited edge metadata.

This changed enough that it's worth another look.

FWIW, this looks fine to me with minor tweaks below. Would be great to have @MatzeB be happy as well though, this is a part of the backend I'm somewhat less familiar with.

include/llvm/Target/TargetCallingConv.h
48 ↗(On Diff #88278)

Ah, OK. The term "candidate" is confusing me I think, but I don't reall yhave a better idea for a name...

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
8077 ↗(On Diff #87670)

I just don't want a quadratic path. You could add a recursion limit instead if you never expect the tall chains in practice?

lib/Target/X86/X86ISelLowering.cpp
2733 ↗(On Diff #88978)

Dy ou need the else? It's after a return...

2747–2755 ↗(On Diff #88978)

You could just put this inside the if above in the for loop... Dunno, the loop above just feels awkward, but not sure there is really better way to do this.

FWIW, this looks fine to me with minor tweaks below. Would be great to have @MatzeB be happy as well though, this is a part of the backend I'm somewhat less familiar with.

My LGTM still stands (though I agree with all the comments made in between). I'm not super familiar with SelectionDAG either though.

test/CodeGen/X86/pr30430.ll
104–111 ↗(On Diff #88978)

Why are those classified as spills?

This revision was automatically updated to reflect the committed changes.
MatzeB added a comment.Mar 3 2017, 4:26 PM

This seems to break swift. We probably have an i1 parameter ending in

int FI = MFI.CreateFixedObject(ArgVT.getSizeInBits() / 8,
                               VA.getLocMemOffset(), /*Immutable=*/false);

resulting in Assertion failed: (Size != 0 && "Cannot allocate zero size fixed stack objects!"), function CreateFixedObject, file /Volumes/Data/swift-next/lib/CodeGen/MachineFunction.cpp, line 824.

I am still working on a reproducer.

MatzeB added a comment.Mar 3 2017, 5:53 PM

I disabled the argument copy elision for illegal types in r296950 to get the swift bots back to green.