Both X86FrameLowering and X86CallFrameOptimization emit PUSHes/POPs and ADDs/SUBs to the stack pointer. If each of these instructions is not accompanied by an appropriate CFI instruction, backtraces will be unavailable at many points when single-stepping through the resulting binaries in a debugger. Worse, the debugger will sometimes show a wildly incorrect backtrace.
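To illustrate why every stack adjustment needs its own CFI update, here is a minimal sketch that models the bookkeeping. It is not LLVM code: `StackOp` and `cfiDirectivesFor` are hypothetical names, and it assumes an x86-64 function without a frame pointer, where the CFA is defined as the stack pointer plus an offset that starts at 8 on entry (the return address).

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of a stack-adjusting instruction; not an LLVM type.
struct StackOp {
  enum Kind { Push, Pop, SubSP, AddSP };
  Kind kind;
  int64_t bytes; // bytes by which the stack pointer moves
};

// For each op, produce the .cfi_def_cfa_offset directive that must follow it
// so the unwinder can still locate the CFA from the stack pointer.
// Assumption: the CFA is rsp-relative and entryOffset is its initial offset.
std::vector<std::string> cfiDirectivesFor(const std::vector<StackOp> &ops,
                                          int64_t entryOffset = 8) {
  std::vector<std::string> directives;
  int64_t cfaOffset = entryOffset;
  for (const StackOp &op : ops) {
    // PUSH and SUB move the stack pointer down, so the CFA is further away;
    // POP and ADD move it back up.
    bool grows = op.kind == StackOp::Push || op.kind == StackOp::SubSP;
    cfaOffset += grows ? op.bytes : -op.bytes;
    directives.push_back(".cfi_def_cfa_offset " + std::to_string(cfaOffset));
  }
  return directives;
}
```

If any one of these directives is left out, the unwinder keeps using the stale offset from that instruction onward, which would explain a wrong backtrace rather than just a missing one.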
There are still places where more work is needed on CFI generation, but this seems to cover the most important ones. At least, I can single-step through a test program and get a correct backtrace at every point.
Note that some optimizations which merge consecutive instructions may need to become smarter, so that they can still merge when a CFI instruction sits between the instructions being merged.
I'm not happy with the duplication of the (rather verbose) code to check whether generation of DWARF data was requested or not; it seems that this should just be done once per build, or once per compilation unit, and the result used everywhere else.
I won't ask you to refactor this goo; I'll see if I can do it myself at some point.
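For what it's worth, the kind of refactoring I have in mind could look roughly like the sketch below. Everything in it is hypothetical: `FunctionFacts` and `CFIPolicy` are made-up names, the condition is only a stand-in for the real check (assumed here to be "debug info present or an unwind table entry needed"), and the granularity at which the result is computed is also an assumption.

```cpp
// Hypothetical stand-in for the facts the verbose check consults.
struct FunctionFacts {
  bool hasDebugInfo;          // assumed input, not an LLVM field
  bool needsUnwindTableEntry; // assumed input, not an LLVM field
};

// Evaluates the check once and hands out the cached answer.
class CFIPolicy {
public:
  explicit CFIPolicy(const FunctionFacts &facts)
      : NeedsDwarfCFI(facts.hasDebugInfo || facts.needsUnwindTableEntry) {}

  // Callers ask this instead of repeating the condition inline.
  bool needsDwarfCFI() const { return NeedsDwarfCFI; }

private:
  bool NeedsDwarfCFI;
};
```

Each place that currently repeats the check would then just query `needsDwarfCFI()`, so the condition lives in exactly one spot.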