This is a refactored and rebased version of D49691 so that it looks reasonable on Phabricator. I've factored out an initial NFC patch (the first snapshot in the history) to highlight the actual functional change.
I don't follow why it matters if the output is glued, given there isn't a glue input. If I'm understanding correctly, if the copy isn't glued, it doesn't modify memory. If the copy is glued, the only way to reach this case should be through a recursive call of ImproveChain. Or are you just trying to be conservative here?
In terms of the actual checks here, is this sufficient to catch any inline asm that accesses memory? Do you need to check for Extra_HasSideEffects? Do you need to check for "indirect" operands separately? (I think it's possible to write an indirect register operand in IR, although I'm not sure clang ever generates them...)
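To make the question concrete, here is a rough, self-contained sketch of the kind of conservative predicate I have in mind. Everything in it is hypothetical: `AsmExtraFlags`, `AsmNodeInfo`, and `mayAccessMemory` are illustrative stand-ins, and the flag values are made up rather than taken from the LLVM headers. It also assumes that side effects should be treated as a possible memory access, which is exactly the point I'm asking about.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the inline asm "extra info" bits.
// The numeric values are illustrative only, not LLVM's actual encoding.
enum AsmExtraFlags : std::uint32_t {
  HasSideEffects = 1u << 0,
  MayLoad        = 1u << 1,
  MayStore       = 1u << 2,
};

// Hypothetical summary of an inline asm node, for illustration.
struct AsmNodeInfo {
  std::uint32_t ExtraFlags; // bitwise OR of AsmExtraFlags
  bool HasIndirectOperand;  // true if any operand is marked indirect
};

// Conservative answer: report a possible memory access if any relevant
// flag is set, or if any operand is indirect.
bool mayAccessMemory(const AsmNodeInfo &Info) {
  const std::uint32_t MemoryBits = HasSideEffects | MayLoad | MayStore;
  return (Info.ExtraFlags & MemoryBits) != 0 || Info.HasIndirectOperand;
}
```

With a check shaped like this, an asm node with no flags and no indirect operands is the only case that would be treated as not touching memory.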