The TC_RETURN/TCRETURNdi under Arm does not currently get a register-mask operand when lowering a tail call, which means registers such as LR are not marked as 'used' by the return. This changes the code to set the register mask on the call unconditionally, instead of skipping it for tail calls.
I don't believe this currently alters any codegen, but it should glue things together better after frame lowering. It also matches the AArch64 code more closely, though I don't know this code very well.
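
For illustration only, here is a minimal toy sketch of the before/after behaviour in plain C++. It does not use any LLVM classes: `ToyOperand`, `ToyCall`, `buildCall`, `hasRegMask`, and the `maskOnTailCalls` flag are all hypothetical names made up for this example, and the `"BL"` opcode string is just a placeholder for a normal call. The real change is in the Arm call lowering, which this does not reproduce.

```cpp
#include <string>
#include <vector>

// Toy stand-ins for a machine operand and a call instruction.
// These are NOT LLVM types; they only model "does the call carry a regmask?".
struct ToyOperand {
  std::string kind;  // e.g. "callee" or "regmask"
  std::string name;
};

struct ToyCall {
  std::string opcode;
  std::vector<ToyOperand> operands;
};

// Builds a toy call. maskOnTailCalls == false models the old behaviour
// (tail calls skip the register mask); true models the behaviour described
// in this patch (the mask is added unconditionally).
ToyCall buildCall(const std::string &callee, bool isTailCall,
                  bool maskOnTailCalls) {
  ToyCall call;
  call.opcode = isTailCall ? "TC_RETURN" : "BL";
  call.operands.push_back({"callee", callee});
  if (!isTailCall || maskOnTailCalls)
    call.operands.push_back({"regmask", "call-preserved"});
  return call;
}

// Returns true if the toy call carries a register-mask operand.
bool hasRegMask(const ToyCall &call) {
  for (const ToyOperand &op : call.operands)
    if (op.kind == "regmask")
      return true;
  return false;
}
```

In this model, `buildCall("f", true, false)` yields a TC_RETURN with no register mask, while `buildCall("f", true, true)` yields one with the mask attached, which is the shape later passes would see after this change.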