Repository: rG LLVM Github Monorepo
This patch enables dumping the actions as a hierarchy (tree). In most cases the actions form a linear list but, for offload compilation, a tree representation is more intuitive. Even though there are cross-subtree edges, they are rare and are also noted in the corresponding actions (e.g., action 9 in the CUDA output consumes {6}, which is printed under action 7).
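The numbering in the dumps can be sketched as a post-order walk over the action graph: each action is printed after its inputs, and an input shared by two consumers is printed once, under its first consumer, while later consumers only refer to it by id (the "cross-subtree edge"). This is a minimal sketch under those assumptions; `Node`, `printPostOrder`, and `dumpActions` are hypothetical names, not clang's `Action` API or the patch's code.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a driver action: a label plus the actions it consumes.
struct Node {
  std::string Name;
  std::vector<const Node *> Inputs;
};

// Print inputs before their consumer (post-order), so every id listed in
// "{...}" has already been printed. A node reachable from two consumers is
// printed once; the second consumer only refers to it by id.
static unsigned printPostOrder(const Node *N, std::map<const Node *, unsigned> &Ids,
                               std::ostream &OS) {
  auto It = Ids.find(N);
  if (It != Ids.end())
    return It->second; // already printed: reuse its id (cross-subtree edge)
  std::string Deps;
  for (const Node *In : N->Inputs) {
    unsigned InId = printPostOrder(In, Ids, OS);
    Deps += (Deps.empty() ? "" : ", ") + std::to_string(InId);
  }
  unsigned Id = Ids.size(); // ids are handed out in print order
  Ids[N] = Id;
  OS << Id << ": " << N->Name;
  if (!N->Inputs.empty())
    OS << ", {" << Deps << "}";
  OS << "\n";
  return Id;
}

// Hypothetical entry point: dump every top-level action and its inputs.
std::string dumpActions(const std::vector<const Node *> &TopLevel) {
  std::map<const Node *, unsigned> Ids;
  std::ostringstream OS;
  for (const Node *N : TopLevel)
    printPostOrder(N, Ids, OS);
  return OS.str();
}
```

For a graph where one backend step feeds both an assembler step and a second offload action (like actions 6, 7 and 9 in the CUDA dump), the backend step gets a single id and both consumers list it.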
$ clang -x cuda -ccc-print-phases --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_60 -c ~/dummy.cpp
0: input, "/home/michliao/dummy.cpp", cuda, (host-cuda)
1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
2: compiler, {1}, ir, (host-cuda)
3: input, "/home/michliao/dummy.cpp", cuda, (device-cuda, sm_30)
4: preprocessor, {3}, cuda-cpp-output, (device-cuda, sm_30)
5: compiler, {4}, ir, (device-cuda, sm_30)
6: backend, {5}, assembler, (device-cuda, sm_30)
7: assembler, {6}, object, (device-cuda, sm_30)
8: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {7}, object
9: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {6}, assembler
10: input, "/home/michliao/dummy.cpp", cuda, (device-cuda, sm_60)
11: preprocessor, {10}, cuda-cpp-output, (device-cuda, sm_60)
12: compiler, {11}, ir, (device-cuda, sm_60)
13: backend, {12}, assembler, (device-cuda, sm_60)
14: assembler, {13}, object, (device-cuda, sm_60)
15: offload, "device-cuda (nvptx64-nvidia-cuda:sm_60)" {14}, object
16: offload, "device-cuda (nvptx64-nvidia-cuda:sm_60)" {13}, assembler
17: linker, {8, 9, 15, 16}, cuda-fatbin, (device-cuda)
18: offload, "host-cuda (x86_64-unknown-linux-gnu)" {2}, "device-cuda (nvptx64-nvidia-cuda)" {17}, ir
19: backend, {18}, assembler, (host-cuda)
20: assembler, {19}, object, (host-cuda)
For HIP:
$ clang -x hip -ccc-print-phases --cuda-gpu-arch=gfx900 --cuda-gpu-arch=gfx906 -c ~/dummy.cpp
0: input, "/home/michliao/dummy.cpp", hip, (host-hip)
1: preprocessor, {0}, hip-cpp-output, (host-hip)
2: compiler, {1}, ir, (host-hip)
3: input, "/home/michliao/dummy.cpp", hip, (device-hip, gfx900)
4: preprocessor, {3}, hip-cpp-output, (device-hip, gfx900)
5: compiler, {4}, ir, (device-hip, gfx900)
6: linker, {5}, image, (device-hip, gfx900)
7: offload, "device-hip (amdgcn-amd-amdhsa:gfx900)" {6}, image
8: input, "/home/michliao/dummy.cpp", hip, (device-hip, gfx906)
9: preprocessor, {8}, hip-cpp-output, (device-hip, gfx906)
10: compiler, {9}, ir, (device-hip, gfx906)
11: linker, {10}, image, (device-hip, gfx906)
12: offload, "device-hip (amdgcn-amd-amdhsa:gfx906)" {11}, image
13: linker, {7, 12}, hip-fatbin, (device-hip)
14: offload, "host-hip (x86_64-unknown-linux-gnu)" {2}, "device-hip (amdgcn-amd-amdhsa)" {13}, ir
15: backend, {14}, assembler, (host-hip)
16: assembler, {15}, object, (host-hip)
This is... rather oddly structured output. My brain refuses to accept that the most-indented phase is the input.
Perhaps we should use llvm::errs().indent(MaxIdent-Ident) instead. That should give us something like this (with MaxIdent=9), which is somewhat easier to grok, IMO:
0: input, "/home/michliao/dummy.cpp", cuda, (host-cuda)
1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
2: compiler, {1}, ir, (host-cuda)
3: input, "/home/michliao/dummy.cpp", cuda, (device-cuda, sm_30)
4: preprocessor, {3}, cuda-cpp-output, (device-cuda, sm_30)
5: compiler, {4}, ir, (device-cuda, sm_30)
6: backend, {5}, assembler, (device-cuda, sm_30)
7: assembler, {6}, object, (device-cuda, sm_30)
8: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {7}, object
9: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {6}, assembler
10: input, "/home/michliao/dummy.cpp", cuda, (device-cuda, sm_60)
11: preprocessor, {10}, cuda-cpp-output, (device-cuda, sm_60)
12: compiler, {11}, ir, (device-cuda, sm_60)
13: backend, {12}, assembler, (device-cuda, sm_60)
14: assembler, {13}, object, (device-cuda, sm_60)
15: offload, "device-cuda (nvptx64-nvidia-cuda:sm_60)" {14}, object
16: offload, "device-cuda (nvptx64-nvidia-cuda:sm_60)" {13}, assembler
17: linker, {8, 9, 15, 16}, cuda-fatbin, (device-cuda)
18: offload, "host-cuda (x86_64-unknown-linux-gnu)" {2}, "device-cuda (nvptx64-nvidia-cuda)" {17}, ir
19: backend, {18}, assembler, (host-cuda)
20: assembler, {19}, object, (host-cuda)
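The `indent(MaxIdent-Ident)` idea can be sketched in two passes, assuming `Ident` is the depth (in levels, one space each) at which an action is first printed: the first pass finds every action's depth and the maximum depth, and the second prints in post-order with `MaxDepth - Depth` leading spaces, so the deepest input starts at column 0. `Node`, `measure`, `print`, and `dumpReverseIndented` are hypothetical names for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a driver action (not clang's Action class).
struct Node {
  std::string Name;
  std::vector<const Node *> Inputs;
};

// Pass 1: record the depth at which each node is first reached (which is where
// the printing pass will emit it) and track the maximum depth.
static void measure(const Node *N, unsigned Depth,
                    std::map<const Node *, unsigned> &DepthOf, unsigned &MaxDepth) {
  if (!DepthOf.emplace(N, Depth).second)
    return; // already reached through an earlier consumer
  MaxDepth = std::max(MaxDepth, Depth);
  for (const Node *In : N->Inputs)
    measure(In, Depth + 1, DepthOf, MaxDepth);
}

// Pass 2: post-order print, indented by (MaxDepth - Depth) spaces.
static unsigned print(const Node *N, const std::map<const Node *, unsigned> &DepthOf,
                      unsigned MaxDepth, std::map<const Node *, unsigned> &Ids,
                      std::ostream &OS) {
  auto It = Ids.find(N);
  if (It != Ids.end())
    return It->second; // shared input: printed under its first consumer
  std::string Deps;
  for (const Node *In : N->Inputs) {
    unsigned InId = print(In, DepthOf, MaxDepth, Ids, OS);
    Deps += (Deps.empty() ? "" : ", ") + std::to_string(InId);
  }
  unsigned Id = Ids.size();
  Ids[N] = Id;
  OS << std::string(MaxDepth - DepthOf.at(N), ' ') << Id << ": " << N->Name;
  if (!N->Inputs.empty())
    OS << ", {" << Deps << "}";
  OS << "\n";
  return Id;
}

std::string dumpReverseIndented(const std::vector<const Node *> &TopLevel) {
  std::map<const Node *, unsigned> DepthOf;
  unsigned MaxDepth = 0;
  for (const Node *N : TopLevel)
    measure(N, 0, DepthOf, MaxDepth);
  std::map<const Node *, unsigned> Ids;
  std::ostringstream OS;
  for (const Node *N : TopLevel)
    print(N, DepthOf, MaxDepth, Ids, OS);
  return OS.str();
}
```

Under this scheme the top-level action always receives the most indentation, and the maximum depth must be known before the first line is printed.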
Revised the output to draw tree lines. Now the output looks like this:
$ clang -x cuda -ccc-print-phases --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_60 -c dummy.cpp
            +- 0: input, "/home/michliao/dummy.cpp", cuda, (host-cuda)
         +- 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
      +- 2: compiler, {1}, ir, (host-cuda)
      |                 +- 3: input, "/home/michliao/dummy.cpp", cuda, (device-cuda, sm_30)
      |              +- 4: preprocessor, {3}, cuda-cpp-output, (device-cuda, sm_30)
      |           +- 5: compiler, {4}, ir, (device-cuda, sm_30)
      |        +- 6: backend, {5}, assembler, (device-cuda, sm_30)
      |     +- 7: assembler, {6}, object, (device-cuda, sm_30)
      |  +- 8: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {7}, object
      |  |- 9: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {6}, assembler
      |  |              +- 10: input, "/home/michliao/dummy.cpp", cuda, (device-cuda, sm_60)
      |  |           +- 11: preprocessor, {10}, cuda-cpp-output, (device-cuda, sm_60)
      |  |        +- 12: compiler, {11}, ir, (device-cuda, sm_60)
      |  |     +- 13: backend, {12}, assembler, (device-cuda, sm_60)
      |  |  +- 14: assembler, {13}, object, (device-cuda, sm_60)
      |  |- 15: offload, "device-cuda (nvptx64-nvidia-cuda:sm_60)" {14}, object
      |  |- 16: offload, "device-cuda (nvptx64-nvidia-cuda:sm_60)" {13}, assembler
      |- 17: linker, {8, 9, 15, 16}, cuda-fatbin, (device-cuda)
   +- 18: offload, "host-cuda (x86_64-unknown-linux-gnu)" {2}, "device-cuda (nvptx64-nvidia-cuda)" {17}, ir
+- 19: backend, {18}, assembler, (host-cuda)
20: assembler, {19}, object, (host-cuda)
$ clang -x hip -ccc-print-phases --cuda-gpu-arch=gfx900 --cuda-gpu-arch=gfx906 -c dummy.cpp
            +- 0: input, "/home/michliao/dummy.cpp", hip, (host-hip)
         +- 1: preprocessor, {0}, hip-cpp-output, (host-hip)
      +- 2: compiler, {1}, ir, (host-hip)
      |              +- 3: input, "/home/michliao/dummy.cpp", hip, (device-hip, gfx900)
      |           +- 4: preprocessor, {3}, hip-cpp-output, (device-hip, gfx900)
      |        +- 5: compiler, {4}, ir, (device-hip, gfx900)
      |     +- 6: linker, {5}, image, (device-hip, gfx900)
      |  +- 7: offload, "device-hip (amdgcn-amd-amdhsa:gfx900)" {6}, image
      |  |           +- 8: input, "/home/michliao/dummy.cpp", hip, (device-hip, gfx906)
      |  |        +- 9: preprocessor, {8}, hip-cpp-output, (device-hip, gfx906)
      |  |     +- 10: compiler, {9}, ir, (device-hip, gfx906)
      |  |  +- 11: linker, {10}, image, (device-hip, gfx906)
      |  |- 12: offload, "device-hip (amdgcn-amd-amdhsa:gfx906)" {11}, image
      |- 13: linker, {7, 12}, hip-fatbin, (device-hip)
   +- 14: offload, "host-hip (x86_64-unknown-linux-gnu)" {2}, "device-hip (amdgcn-amd-amdhsa)" {13}, ir
+- 15: backend, {14}, assembler, (host-hip)
16: assembler, {15}, object, (host-hip)
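One way to produce tree lines like these is sketched below, assuming three characters per level: the first input of a consumer is drawn with `+- ` and later inputs with `|- `; children of a first input are indented with three spaces, while children of a later input keep a `|  ` column open so the edge back to the consumer stays visible. Top-level actions get no prefix at all. This is a hypothetical reconstruction, not the patch's code; `Node`, `Kind`, `printTree`, and `dumpTree` are illustrative names.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a driver action (not clang's Action class).
struct Node {
  std::string Name;
  std::vector<const Node *> Inputs;
};

// Position of a node relative to its consumer.
enum class Kind { TopLevel, HeadSibling, OtherSibling };

static unsigned printTree(const Node *N, std::map<const Node *, unsigned> &Ids,
                          const std::string &Indent, Kind K, std::ostream &OS) {
  auto It = Ids.find(N);
  if (It != Ids.end())
    return It->second; // shared input: already drawn under its first consumer
  // Indentation for this node's inputs: keep a "|" column open below an
  // OtherSibling so its edge to the consumer continues past the subtree.
  std::string ChildIndent = Indent + (K == Kind::HeadSibling    ? "   "
                                      : K == Kind::OtherSibling ? "|  "
                                                                : "");
  std::string Deps;
  Kind ChildKind = Kind::HeadSibling; // first input gets "+- "
  for (const Node *In : N->Inputs) {
    unsigned InId = printTree(In, Ids, ChildIndent, ChildKind, OS);
    Deps += (Deps.empty() ? "" : ", ") + std::to_string(InId);
    ChildKind = Kind::OtherSibling; // remaining inputs get "|- "
  }
  unsigned Id = Ids.size();
  Ids[N] = Id;
  const char *Self = K == Kind::HeadSibling    ? "+- "
                     : K == Kind::OtherSibling ? "|- "
                                               : "";
  OS << Indent << Self << Id << ": " << N->Name;
  if (!N->Inputs.empty())
    OS << ", {" << Deps << "}";
  OS << "\n";
  return Id;
}

std::string dumpTree(const std::vector<const Node *> &TopLevel) {
  std::map<const Node *, unsigned> Ids;
  std::ostringstream OS;
  for (const Node *N : TopLevel)
    printTree(N, Ids, "", Kind::TopLevel, OS);
  return OS.str();
}
```

For a small host/device graph (a host compile and a device compile feeding one offload action, followed by a backend step), `dumpTree` yields:

```text
      +- 0: input
   +- 1: compiler, {0}
   |  +- 2: input
   |- 3: compiler, {2}
+- 4: offload, {1, 3}
5: backend, {4}
```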
As the top-level actions are the last actions to be performed, they should have no indentation. It's also difficult to choose a proper MaxIdent that neither adds unnecessary leading whitespace nor misaligns the output. How about drawing the tree edges in the original output instead? With the edges drawn, I find it much easier to identify sibling actions.
Neat. I like the visual cues showing what gets passed on to the next processing stage.