For these cases, i.e. if (!AFI->hasStackFrame() && !windowsRequiresStackProbe && !NumBytes), we already omit the prologue directives.
When writing the epilogue (which happens after the prologue has been written), if the function doesn't have the WinCFI flag set (i.e. if no prologue was generated), assume that no epilogue will be needed either, and don't emit any epilog start pseudo instruction. After completing the epilogue, make sure that we end up with matching epilog start/end.
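A minimal sketch of that epilogue decision, assuming a heavily simplified model: MockFunction and emitEpilogueSketch are hypothetical stand-ins rather than the real LLVM classes or the code in this patch, and the strings only name the pseudo instructions for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for MachineFunction; only the WinCFI flag is modeled.
struct MockFunction {
  bool HasWinCFI = false;
  bool hasWinCFI() const { return HasWinCFI; }
};

// Returns the sequence the epilogue would contain, with SEH pseudo
// instructions shown as plain strings (illustration only).
std::vector<std::string> emitEpilogueSketch(const MockFunction &MF) {
  std::vector<std::string> Out;

  // If the prologue emitted no SEH directives, the flag is unset and the
  // epilogue skips its start marker as well.
  const bool EmitSEH = MF.hasWinCFI();
  if (EmitSEH)
    Out.push_back("SEH_EpilogStart");

  Out.push_back("<epilogue body>");

  if (EmitSEH)
    Out.push_back("SEH_EpilogEnd");

  // After completing the epilogue, start and end must come as a pair.
  const bool SawStart = Out.front() == "SEH_EpilogStart";
  const bool SawEnd = Out.back() == "SEH_EpilogEnd";
  assert(SawStart == SawEnd && "unmatched epilog start/end");
  (void)SawStart;
  (void)SawEnd;
  return Out;
}
```

With the flag unset the sketch yields only the epilogue body; with it set, the body is bracketed by one start and one end marker.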
Previously, when epilog start/end was generated but no prologue, the unwind info for such functions was needlessly large: 12 bytes of xdata (4 bytes header, 4 bytes for one non-folded epilogue, 4 bytes for padded opcodes) and 8 bytes of pdata. Because the epilogue consisted of one opcode (end) but the prologue was empty (no .seh_endprologue), the epilogue couldn't be folded into the prologue, and thus the function couldn't be considered for packed form either.
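The per-function cost described above can be tallied as follows. This is only arithmetic on the byte counts quoted in this description; the constant and function names are made up for illustration.

```cpp
#include <cstddef>

// Byte counts as quoted in the description; the names are hypothetical.
constexpr std::size_t XdataHeaderBytes = 4;        // xdata header
constexpr std::size_t XdataEpilogScopeBytes = 4;   // one non-folded epilogue
constexpr std::size_t XdataPaddedOpcodeBytes = 4;  // opcodes, padded to 4 bytes
constexpr std::size_t PdataEntryBytes = 8;         // one pdata entry

constexpr std::size_t xdataBytesPerFunction() {
  return XdataHeaderBytes + XdataEpilogScopeBytes + XdataPaddedOpcodeBytes;
}

constexpr std::size_t unwindBytesPerFunction() {
  return xdataBytesPerFunction() + PdataEntryBytes;
}

static_assert(xdataBytesPerFunction() == 12, "12 bytes of xdata");
static_assert(unwindBytesPerFunction() == 20, "20 bytes of unwind info in total");
```

So each such function previously carried 20 bytes of unwind info that described nothing beyond an epilogue end marker.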
On a 6.5 MB DLL with 110 KB pdata and 166 KB xdata, this gets rid of 38 KB pdata and 62 KB xdata.
Not sure the HasEpilogStart boolean is useful here; could you just assert(HasWinCFI == MF.hasWinCFI())? And we shouldn't ever need to call setHasWinCFI().