Instead of generating implicit loads within basic blocks, put them before the instructions of the statment itself, including non-affine subregions. The region's entry node is dominating all blocks in the region and therefore the loaded value will be available there.
Implicit writes in block-stmts were already stored back at the end of the block. Now, also generate the stores of non-affine subregions when leaving the statement, i.e. in the exiting block.
This change is required for array-mapped implicits ("De-LICM") to ensure that there are no dependencies of demoted scalars within statments. Statement load all required values, operator on copied in registers, and then write back the changed value to the demoted memory. Lifetimes analysis within statements becomes unecessary.
If we keep the SetInsertPoint and the generateScalarXXX calls in the other copyBB function we don't need to set the insert point manually outside (noted somewhere below). If there is a reason you moved it here please explain.