This is an archive of the discontinued LLVM Phabricator instance.

[WinEH] Require token linkage in EH pad/ret signatures
ClosedPublic

Authored by JosephTremoulet on Aug 18 2015, 11:17 AM.

Details

Summary

WinEHPrepare is going to require that cleanuppad and catchpad produce values
of token type which are consumed by any cleanupret or catchret exiting the
pad. This change updates the signatures of those operators to require/enforce
that the type produced by the pads is token type and that the rets have an
appropriate argument.

The catchpad argument of a CatchReturnInst must be a CatchPadInst (and
similarly for CleanupReturnInst/CleanupPadInst). To accommodate that
restriction, this change adds a notion of an operator constraint to both
LLParser and BitcodeReader, allowing appropriate sentinels to be constructed
for forward references and appropriate error messages to be emitted for
illegal inputs.

Also add a verifier rule (noted in LangRef) that a catchpad with a catchpad
predecessor must have no other predecessors; this ensures that WinEHPrepare
will see the expected linear relationship between sibling catches on the
same try.

Lastly, remove some superfluous/vestigial casts from instruction operand
setters operating on BasicBlocks.

Diff Detail

Event Timeline

JosephTremoulet retitled this revision from to [WinEH] Require token linkage in EH pad/ret signatures.
JosephTremoulet updated this object.
JosephTremoulet added reviewers: majnemer, rnk.
JosephTremoulet added a subscriber: llvm-commits.

Rebase/update to include state numbering change

rnk added inline comments.Aug 18 2015, 4:02 PM
docs/LangRef.rst
5197–5198

s/consumes the catchpad's value/consumes the catchpad/ maybe?

I don't think the parenthetical makes this any clearer. WDYT?

include/llvm/IR/Instructions.h
3581–3584

Should we tighten up these Value* types to be CleanupBlockInst*?

3622–3623

Regardless of whether we tighten the constructor, these should probably use CleanupPadInst* instead of Value*. Use cast<> to pull it out, it'll assert if things go wrong.

4022–4025

Ditto, should we be more strict, or does that have wide-ranging complexity? At the very least, we should assert in init if we don't already.

4048–4049

Ditto, we should tighten the accesssors.

Tightening the constructors and accessors isn't straightforward because we allow UndefValue in addition to FooPadInst. I thought it was important to allow UndefValue so that it's legal to replaceAllUsesWith(UndefValue) on, say, an unreachable instruction that happens to be an EH pad.

Since we allow UndefValue, we need to be able to serialize it, which in turn means we need to deserialize it, so we need a constructor that takes UndefValue *. I thought typing the constructor as Value * was cleanest, but I could switch to different overloads for the Undef and non-Undef cases if you prefer (assuming I was misinterpreting the dummy operands thing).

I actually first tried this change with the accessors tightened. The getFooPad() methods were straightforward enough; just dynamic_cast and let UndefValue get translated to nullptr. The setFooPad() methods are where I got stuck. I felt like you should be able to round-trip FooRetX->setFooPad(FooRetY->getFooPad()), like we do in the copy ctor, and since getFooPad() can return nullptr that meant that setFooPad needed to accept nullptr and translate it to UndefValue. With setFooPad doing that translation, I thought that FooPadInst * was a more appropriate type for the argument than Value *. But once I made setFooPad take a FooPadInst *, the constructor (that I had taking a Value * for the reasons above) couldn't pass it straight through, and that didn't feel right either.

docs/LangRef.rst
5197–5198

s/consumes the catchpad's value/consumes the catchpad/ maybe?

Yeah, that's better, will update.

I don't think the parenthetical makes this any clearer. WDYT?

I wish the parenthetical were more clear, but I think the sentence would be incorrect without it. I need EH Prep to be able to assume that there isn't a cycle from a pad to itself in a single function, but entering the same pad twice is of course totally fine in different activations of a recursive function. I'd be happy to reword if you have a suggestion that makes it clear. And I can remove the parenthetical if you think it's obvious that the recursive case is allowed, but that was my thinking.

include/llvm/IR/Instructions.h
4022–4025

At the very least, we should assert in init if we don't already

I originally had an assert in init, but it was failing running llvm-as over feature/exception.ll. I hypothesized that maybe dummy operands are created during parsing to facilitate lexically-forward references? I will put that assert back and debug deeper.

I suppose that the accessors could be tightened by presenting it as two logical properties -- a bool indicating whether the value is undef (what would be a good name for that?) and a FooPadInst *, each with a get/set pair. The constructors could take Value*s and translate, or maybe have different overloads (one taking a FooPadInst * and one not) and force callers to decide.

rnk edited edge metadata.Aug 18 2015, 4:40 PM

Oh, shoot, undef. I think we may be able to get by without allowing undef then. LLVM's main utility for removing unreachable code is removeUnreachableBlocks(), and it doesn't do RAUW(undef). I think it assumes that if something is unreachable, then all direct uses of it must also be unreachable, since they must be dominated by unreachable code. Because of the token type rules, the EH pads will always dominate their EH return instructions, and they'll all get knocked out at once.

docs/LangRef.rst
5197–5198

I think the recursive case is eliminated because the recursive call would produce and consume a new catchpad value, sort of like how re-executing the same alloca in a recursive frame gives you new stack memory.

Ok, if we can disallow undef that will certainly simplify this.

I'm not sure what to do about the llvm-as case, though. I debugged into it, and it is indeed a forward reference. What happens is we process this in exception.ll:

  catchret %cbv to label %exit
...
  %cbv = catchpad [i7 4] to label %bb unwind label %bb3

and then we go to construct the CatchReturnInst with this code in LLParser.cpp:

bool LLParser::ParseCatchRet(Instruction *&Inst, PerFunctionState &PFS) {
  Value *CatchPad = nullptr;

  if (ParseValue(Type::getTokenTy(Context), CatchPad, PFS))
    return true;

  BasicBlock *BB;
  if (ParseToken(lltok::kw_to, "expected 'to' in catchret") ||
      ParseTypeAndBasicBlock(BB, PFS))
      return true;

  Inst = CatchReturnInst::Create(CatchPad, BB);

that call to ParseValue eventually winds up in this code in PerFunctionState::GetVal:

// Otherwise, create a new forward reference for this value and remember it.
Value *FwdVal;
if (Ty->isLabelTy())
  FwdVal = BasicBlock::Create(F.getContext(), Name, &F);
else
  FwdVal = new Argument(Ty, Name);

so we need a way to create a CatchReturnInst with an Argument. Should I just reinterpret_cast the Argument * to a CatchPadInst * in LLParser? Maybe tighten the accessor and the public Create method (and IRBuilder's method) but leave the private constructors typed as Value * , make LLParser a friend class, and have the parser call the private constructor directly? The bitcode reader has the same issue btw (BitcodeReader::parseFunctionBody calls geValue which gets down into BitcodeReaderValueList::getValueFwdRef that creates an Argument).

majnemer edited edge metadata.Aug 18 2015, 5:38 PM

I think it's fine to let them take Value * instead of something more specific. The verifier should catch any mistakes.

JosephTremoulet edited edge metadata.
  • Tidy up some LangRef language per review feedback
  • Fix a gcc build warning
JosephTremoulet marked 3 inline comments as done.Aug 18 2015, 6:01 PM

I suppose I could change just the FooReturnInst::getFooPad() accessors' return types to FooPad * (and remove support for undef from the verifier, LangRef, examples, and tests). That way:

  • Parsers could still sneak in an Argument through the public API (and likewise for any transforms that temporarily use placeholder values during a rewrite)
  • Create and setFooPad() would agree with each other in accepting arbitrary Value * (but the verifier would catch non-FooPads if any were left)
  • Callers of getFooPad() wanting a FooPadInst * wouldn't have to downcast, and wouldn't have to deal with the possibility of undef

That would mean we'll have values that can't have all their uses replaced with undef (unless those uses are rewritten/removed before the next verifier pass), but if that's not an important property to preserve for LLVM (I may have been biased by other codebases) then callers of getFooPad will benefit.

Would that be a happy compromise?

I suppose I could change just the FooReturnInst::getFooPad() accessors' return types to FooPad * (and remove support for undef from the verifier, LangRef, examples, and tests). That way:

  • Parsers could still sneak in an Argument through the public API (and likewise for any transforms that temporarily use placeholder values during a rewrite)
  • Create and setFooPad() would agree with each other in accepting arbitrary Value * (but the verifier would catch non-FooPads if any were left)
  • Callers of getFooPad() wanting a FooPadInst * wouldn't have to downcast, and wouldn't have to deal with the possibility of undef

That would mean we'll have values that can't have all their uses replaced with undef (unless those uses are rewritten/removed before the next verifier pass), but if that's not an important property to preserve for LLVM (I may have been biased by other codebases) then callers of getFooPad will benefit.

Would that be a happy compromise?

I think it's OK to let non-pad operands have undef so long as they are transient and will get wiped out before the next pass runs. We permit this sort of thing for degenerate phi and basic blocks.

docs/ExceptionHandling.rst
676

Shouldn't the cleanuppad be followed by []

681

Should the void be removed?

JosephTremoulet marked 2 inline comments as done.
  • Fix some bad syntax in docs per review feedback

Updating just the accessors' return types didn't work like I'd hoped. It would require the verifier to reject mismatches, but in order to do so with a proper error message both the verifier and the asm printer would have needed to get the bogus operand without calling getFooPad(), which meant either hardcoding the pad operand index in the verifier/printer (seems bad) or adding another method to the FooReturnInst classes returning the index or returning the pad operand as a Value *, at which point the API surface on the FooRetInstrs looked pretty convoluted. I also explored using a dynamic_cast in the parsers, but of course that just leads to the same issues with the verifier/printer. So I think that if we don't want the FooReturnInst ctors/accessors/mutators to all take Value * (which certainly seems in-line with keeping the IR approachable), the only thing to do that doesn't involve horrible hacks is to teach the parsers that some forward references need to be FooPads. I will look into doing that, but let me know if that seems off-base, and feel free to suggest something better.

docs/ExceptionHandling.rst
676

Yep, thanks.

681

Yes, updated.

JosephTremoulet updated this object.
  • Disallow undef in catchret/cleanupret
  • Change signatures of ctors/getters/setters to on FooReturnInsts to use FooPad * rather than Value *
  • Update parsers to understand operator contraints and create appropriate sentinels for forward references
  • Add negative assembler tests for operator constraints
JosephTremoulet marked 7 inline comments as done.Aug 21 2015, 8:40 AM
majnemer added inline comments.Aug 21 2015, 10:48 AM
include/llvm/IR/Instructions.h
37

Please sort this relative to the other forward declarations.

3707

Why is the reinterpret_cast necessary here?

4035–4036

We can also assert that CatchPad isn't null.

lib/AsmParser/LLParser.cpp
2237–2253

Could we add a case for none? I think it'd let us reduce the indentation here.

lib/AsmParser/LLParser.h
114–116 ↗(On Diff #32829)

Enumerators should start with a capital letter per the coding standards: http://llvm.org/docs/CodingStandards.html#name-types-functions-variables-and-enumerators-properly

lib/Bitcode/Reader/BitcodeReader.cpp
46–50

Likewise with this these enumerators.

include/llvm/IR/Instructions.h
3707

Evidently it's not, or the current ToT would be broken. When I tried to omit the cast from CatchPadInst * or CleanupPadInst * to Value * in setCatchPad or setCleanupPad, MSVC rejected it, which I presume has to do with some rule about concatenating implicit and explicit conversions that I didn't fully dig into. I took a look, and in the rest of this file (outside the new EH instructions) all the setters that take BasicBlock * use reinterpret_cast<Value *>, so I decided to follow suit and reinterpret_cast<Value *> in all the new EH instruction setters taking CatchPadInst * or CleanupPadInst * or BasicBlock *, for fear that different host compilers accept/reject different patterns. Since this one was clearly working before, I'm happy to revert it if you want me to, but that was my reasoning.

  • sort forward declarations
  • assert catchpads/cleanuppads are non-null in catchret/cleanupret constructors and setters
  • PascalCase new enum values, add prefix and drop qualifier in references
  • fold OC_None into switches to save indentation
JosephTremoulet marked 5 inline comments as done.Aug 21 2015, 12:17 PM

Feedback incorporated.

lib/AsmParser/LLParser.h
114–116 ↗(On Diff #32852)

Updated, thanks for the pointer.

majnemer added inline comments.Aug 21 2015, 1:11 PM
include/llvm/IR/Instructions.h
3707

I am having trouble imagining what use those reinterpret_cast are doing. If there are places where MSVC rejects, can you use static_cast instead of reinterpret_cast?

JosephTremoulet updated this object.
JosephTremoulet marked an inline comment as done.
JosephTremoulet marked 4 inline comments as done.Aug 21 2015, 9:37 PM
JosephTremoulet added inline comments.
include/llvm/IR/Instructions.h
3707

I dug deeper into this. It turned out that the case MSVC was rejecting was in CleanupReturnInst::setCleanupPad, which was in fact quite reasonable since at that point CleanupPadInst was just forward-referenced, not defined, so of course it couldn't have known how to statically cast it, let alone realize that it could do so implicitly. Obvious in hindsight :/ I moved CleanupReturnInst later in the file and removed the cast.

I also took a closer look at the similar casts in the file. I was incorrect when I said "the rest of this file (outside the new EH instructions) all the setters that take BasicBlock * use reinterpret_cast<Value *>, " -- what was actually the case was that switch and invoke used reinterpret_cast, and br and indirectbr used c-style casts. I looked back through the history of InvokeInst, and it actually has bounced back and forth between c-style and reinterpret casts, but the commit messages don't discuss it. At any rate, either kind of cast appears unnecessary now (Instructions.h #includes Function.h which #includes BasicBlock.h), so I've removed those as well.

majnemer accepted this revision.Aug 21 2015, 9:50 PM
majnemer edited edge metadata.

LGTM, thanks so much for working on this!

This revision is now accepted and ready to land.Aug 21 2015, 9:50 PM
JosephTremoulet marked an inline comment as done.