AMDGPUPrintfRuntimeBindingPass is not run in the IR optimization
pipeline with -O0.
As a result, with OpenCL the printf definition coming from
device_libs gets linked with the user's code, and once that linkage is
done AMDGPUPrintfRuntimeBindingPass can no longer do its work.
This patch also adds an opt-pipeline.ll test, modeled on
llc-pipeline.ll, to document the optimization pipeline.
This will be a really fragile negative check; just use generated checks for the output.