Added an NVPTX codegen test to verify that our change is in effect. It
also shows the unnecessary register pressure caused by over-sinking.
Updated an affected AArch64 test. I am not an expert on AArch64, but
the new machine code for this test seems equivalent to the original.