The offsets were wrong. The result now matches what the compiler would generate for a function that spills lr normally.
(Looking over the code, I'm concerned this doesn't work correctly on Windows, since the unwind info there is different, but that's a separate issue, I guess.)