When -save-temps is specified, add two actions to the compilation that generate two new .ll outputs: one containing the unoptimized IR and another containing the optimized IR.
Note that these additional outputs are not generated for compilations that target multiple architectures. In that case, the compiler reports an error if any output produced by the Actions cannot be processed by lipo.
I'm not sure I understand why unoptimized IR can't be written out for multi-arch builds like CUDA. Could you elaborate?