
[clang][driver] Print compilation phases with indentation.
ClosedPublic

Authored by hliao on Oct 17 2019, 11:06 AM.

hliao created this revision. Oct 17 2019, 11:06 AM
Herald added a project: Restricted Project. Oct 17 2019, 11:06 AM
Herald added a subscriber: cfe-commits.

This patch enables dumping the actions as a hierarchy (tree). In most cases the actions form a linear list, but for offload compilation a tree representation is more intuitive. Cross-subtree edges do exist, but they are rare and are still noted in the corresponding actions.
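
As a rough illustration of the idea (not the patch's actual code), the self-contained sketch below prints a small action graph in post-order and indents each input one column deeper than the action that consumes it. `Action`, `printAction`, and `dumpActions` are hypothetical stand-ins for the driver's own types and helpers, and a `std::ostringstream` replaces `llvm::errs()` so the snippet builds without LLVM.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the driver's action class.
struct Action {
  std::string Kind;                   // e.g. "input", "compiler"
  std::vector<const Action *> Inputs; // actions whose results this one consumes
};

// Prints the inputs before the action itself, one column deeper per level.
// Returns the id assigned to A; an action reached twice is printed only once,
// so a cross-subtree edge shows up only as an id in the "{...}" list.
static unsigned printAction(const Action &A, unsigned Depth,
                            std::map<const Action *, unsigned> &Ids,
                            std::ostringstream &OS) {
  auto It = Ids.find(&A);
  if (It != Ids.end())
    return It->second;

  std::string Refs;
  for (const Action *In : A.Inputs) {
    unsigned InId = printAction(*In, Depth + 1, Ids, OS);
    Refs += (Refs.empty() ? "" : ", ") + std::to_string(InId);
  }

  unsigned Id = static_cast<unsigned>(Ids.size());
  Ids[&A] = Id;
  OS << std::string(Depth, ' ') << Id << ": " << A.Kind;
  if (!A.Inputs.empty())
    OS << ", {" << Refs << "}";
  OS << "\n";
  return Id;
}

// Convenience wrapper: dumps the whole graph reachable from Root.
static std::string dumpActions(const Action &Root) {
  std::map<const Action *, unsigned> Ids;
  std::ostringstream OS;
  printAction(Root, 0, Ids, OS);
  return OS.str();
}
```

For a chain input → preprocessor → compiler, `dumpActions` puts the input on the most-indented line and the final action at column zero.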

tra added a comment. Oct 17 2019, 11:29 AM

Could you give an example of before/after output?

In D69124#1713360, @tra wrote:

Could you give an example of before/after output?

$ clang -x cuda -ccc-print-phases --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_60 -c ~/dummy.cpp 
     0: input, "/home/michliao/dummy.cpp", cuda, (host-cuda)
    1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
   2: compiler, {1}, ir, (host-cuda)
         3: input, "/home/michliao/dummy.cpp", cuda, (device-cuda, sm_30)
        4: preprocessor, {3}, cuda-cpp-output, (device-cuda, sm_30)
       5: compiler, {4}, ir, (device-cuda, sm_30)
      6: backend, {5}, assembler, (device-cuda, sm_30)
     7: assembler, {6}, object, (device-cuda, sm_30)
    8: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {7}, object
    9: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {6}, assembler
         10: input, "/home/michliao/dummy.cpp", cuda, (device-cuda, sm_60)
        11: preprocessor, {10}, cuda-cpp-output, (device-cuda, sm_60)
       12: compiler, {11}, ir, (device-cuda, sm_60)
      13: backend, {12}, assembler, (device-cuda, sm_60)
     14: assembler, {13}, object, (device-cuda, sm_60)
    15: offload, "device-cuda (nvptx64-nvidia-cuda:sm_60)" {14}, object
    16: offload, "device-cuda (nvptx64-nvidia-cuda:sm_60)" {13}, assembler
   17: linker, {8, 9, 15, 16}, cuda-fatbin, (device-cuda)
  18: offload, "host-cuda (x86_64-unknown-linux-gnu)" {2}, "device-cuda (nvptx64-nvidia-cuda)" {17}, ir
 19: backend, {18}, assembler, (host-cuda)
20: assembler, {19}, object, (host-cuda)
hliao added a comment. Edited Oct 17 2019, 11:41 AM
In D69124#1713360, @tra wrote:

Could you give an example of before/after output?

For HIP

$ clang -x hip -ccc-print-phases --cuda-gpu-arch=gfx900 --cuda-gpu-arch=gfx906 -c ~/dummy.cpp 
     0: input, "/home/michliao/dummy.cpp", hip, (host-hip)
    1: preprocessor, {0}, hip-cpp-output, (host-hip)
   2: compiler, {1}, ir, (host-hip)
        3: input, "/home/michliao/dummy.cpp", hip, (device-hip, gfx900)
       4: preprocessor, {3}, hip-cpp-output, (device-hip, gfx900)
      5: compiler, {4}, ir, (device-hip, gfx900)
     6: linker, {5}, image, (device-hip, gfx900)
    7: offload, "device-hip (amdgcn-amd-amdhsa:gfx900)" {6}, image
        8: input, "/home/michliao/dummy.cpp", hip, (device-hip, gfx906)
       9: preprocessor, {8}, hip-cpp-output, (device-hip, gfx906)
      10: compiler, {9}, ir, (device-hip, gfx906)
     11: linker, {10}, image, (device-hip, gfx906)
    12: offload, "device-hip (amdgcn-amd-amdhsa:gfx906)" {11}, image
   13: linker, {7, 12}, hip-fatbin, (device-hip)
  14: offload, "host-hip (x86_64-unknown-linux-gnu)" {2}, "device-hip (amdgcn-amd-amdhsa)" {13}, ir
 15: backend, {14}, assembler, (host-hip)
16: assembler, {15}, object, (host-hip)
tra added a comment. Oct 17 2019, 12:03 PM

This is... rather oddly-structured output. My brain refuses to accept that the most-indented phase is the input.
Perhaps we should do llvm::errs().indent(MaxIdent-Ident). This should give us something like this (with MaxIdent=9), which is somewhat easier to grok, IMO:

    0: input, "/home/michliao/dummy.cpp", cuda, (host-cuda)
     1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
      2: compiler, {1}, ir, (host-cuda)
3: input, "/home/michliao/dummy.cpp", cuda, (device-cuda, sm_30)
 4: preprocessor, {3}, cuda-cpp-output, (device-cuda, sm_30)
  5: compiler, {4}, ir, (device-cuda, sm_30)
   6: backend, {5}, assembler, (device-cuda, sm_30)
    7: assembler, {6}, object, (device-cuda, sm_30)
     8: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {7}, object
     9: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {6}, assembler
10: input, "/home/michliao/dummy.cpp", cuda, (device-cuda, sm_60)
 11: preprocessor, {10}, cuda-cpp-output, (device-cuda, sm_60)
  12: compiler, {11}, ir, (device-cuda, sm_60)
   13: backend, {12}, assembler, (device-cuda, sm_60)
    14: assembler, {13}, object, (device-cuda, sm_60)
     15: offload, "device-cuda (nvptx64-nvidia-cuda:sm_60)" {14}, object
     16: offload, "device-cuda (nvptx64-nvidia-cuda:sm_60)" {13}, assembler
      17: linker, {8, 9, 15, 16}, cuda-fatbin, (device-cuda)
       18: offload, "host-cuda (x86_64-unknown-linux-gnu)" {2}, "device-cuda (nvptx64-nvidia-cuda)" {17}, ir
        19: backend, {18}, assembler, (host-cuda)
         20: assembler, {19}, object, (host-cuda)
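
One way to read this suggestion, sketched without LLVM: a first pass finds the longest input chain and uses it as `MaxIdent`, and a second pass prints each action with `MaxIdent - Ident` leading spaces. Deriving `MaxIdent` with a pre-pass is an assumption made for this sketch (the example above simply fixes it at 9). `Action`, `maxDepth`, `printFlipped`, and `dumpFlipped` are hypothetical names, and `std::string(N, ' ')` stands in for `llvm::errs().indent(N)`.

```cpp
#include <algorithm>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the driver's action class.
struct Action {
  std::string Kind;
  std::vector<const Action *> Inputs;
};

// Length of the longest input chain below A (0 for a leaf input).
static unsigned maxDepth(const Action &A) {
  unsigned Deepest = 0;
  for (const Action *In : A.Inputs)
    Deepest = std::max(Deepest, 1 + maxDepth(*In));
  return Deepest;
}

// Post-order print where the indentation shrinks with the distance from the
// top-level action, so the deepest input starts at column zero.
static unsigned printFlipped(const Action &A, unsigned Ident, unsigned MaxIdent,
                             std::map<const Action *, unsigned> &Ids,
                             std::ostringstream &OS) {
  auto It = Ids.find(&A);
  if (It != Ids.end())
    return It->second; // shared action, already printed

  std::string Refs;
  for (const Action *In : A.Inputs) {
    unsigned InId = printFlipped(*In, Ident + 1, MaxIdent, Ids, OS);
    Refs += (Refs.empty() ? "" : ", ") + std::to_string(InId);
  }

  unsigned Id = static_cast<unsigned>(Ids.size());
  Ids[&A] = Id;
  OS << std::string(MaxIdent - Ident, ' ') << Id << ": " << A.Kind;
  if (!A.Inputs.empty())
    OS << ", {" << Refs << "}";
  OS << "\n";
  return Id;
}

// Convenience wrapper: computes MaxIdent, then dumps everything below Root.
static std::string dumpFlipped(const Action &Root) {
  std::map<const Action *, unsigned> Ids;
  std::ostringstream OS;
  printFlipped(Root, 0, maxDepth(Root), Ids, OS);
  return OS.str();
}
```

The pre-pass costs one extra traversal of the action graph. In this sketch the top-level action is always the most-indented line.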
hliao updated this revision to Diff 225620. Oct 18 2019, 7:44 AM

Revised the output by drawing tree lines. The output now looks like this:

$ clang -x cuda -ccc-print-phases --cuda-gpu-arch=sm_30 --cuda-gpu-arch=sm_60 -c dummy.cpp
            +- 0: input, "/home/michliao/dummy.cpp", cuda, (host-cuda)
         +- 1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
      +- 2: compiler, {1}, ir, (host-cuda)
      |                 +- 3: input, "/home/michliao/dummy.cpp", cuda, (device-cuda, sm_30)
      |              +- 4: preprocessor, {3}, cuda-cpp-output, (device-cuda, sm_30)
      |           +- 5: compiler, {4}, ir, (device-cuda, sm_30)
      |        +- 6: backend, {5}, assembler, (device-cuda, sm_30)
      |     +- 7: assembler, {6}, object, (device-cuda, sm_30)
      |  +- 8: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {7}, object
      |  |- 9: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {6}, assembler
      |  |              +- 10: input, "/home/michliao/dummy.cpp", cuda, (device-cuda, sm_60)
      |  |           +- 11: preprocessor, {10}, cuda-cpp-output, (device-cuda, sm_60)
      |  |        +- 12: compiler, {11}, ir, (device-cuda, sm_60)
      |  |     +- 13: backend, {12}, assembler, (device-cuda, sm_60)
      |  |  +- 14: assembler, {13}, object, (device-cuda, sm_60)
      |  |- 15: offload, "device-cuda (nvptx64-nvidia-cuda:sm_60)" {14}, object
      |  |- 16: offload, "device-cuda (nvptx64-nvidia-cuda:sm_60)" {13}, assembler
      |- 17: linker, {8, 9, 15, 16}, cuda-fatbin, (device-cuda)
   +- 18: offload, "host-cuda (x86_64-unknown-linux-gnu)" {2}, "device-cuda (nvptx64-nvidia-cuda)" {17}, ir
+- 19: backend, {18}, assembler, (host-cuda)
20: assembler, {19}, object, (host-cuda)
$ clang -x hip -ccc-print-phases --cuda-gpu-arch=gfx900 --cuda-gpu-arch=gfx906 -c dummy.cpp
            +- 0: input, "/home/michliao/dummy.cpp", hip, (host-hip)
         +- 1: preprocessor, {0}, hip-cpp-output, (host-hip)
      +- 2: compiler, {1}, ir, (host-hip)
      |              +- 3: input, "/home/michliao/dummy.cpp", hip, (device-hip, gfx900)
      |           +- 4: preprocessor, {3}, hip-cpp-output, (device-hip, gfx900)
      |        +- 5: compiler, {4}, ir, (device-hip, gfx900)
      |     +- 6: linker, {5}, image, (device-hip, gfx900)
      |  +- 7: offload, "device-hip (amdgcn-amd-amdhsa:gfx900)" {6}, image
      |  |           +- 8: input, "/home/michliao/dummy.cpp", hip, (device-hip, gfx906)
      |  |        +- 9: preprocessor, {8}, hip-cpp-output, (device-hip, gfx906)
      |  |     +- 10: compiler, {9}, ir, (device-hip, gfx906)
      |  |  +- 11: linker, {10}, image, (device-hip, gfx906)
      |  |- 12: offload, "device-hip (amdgcn-amd-amdhsa:gfx906)" {11}, image
      |- 13: linker, {7, 12}, hip-fatbin, (device-hip)
   +- 14: offload, "host-hip (x86_64-unknown-linux-gnu)" {2}, "device-hip (amdgcn-amd-amdhsa)" {13}, ir
+- 15: backend, {14}, assembler, (host-hip)
16: assembler, {15}, object, (host-hip)
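
The edge drawing above can be approximated with a small recursive printer. This is a sketch under stated assumptions, not the code in the diff: `Action`, `SiblingKind`, `printTree`, and `dumpTree` are hypothetical names. Each action passes a prefix string down to its inputs; the first input of an action is marked `+-`, later inputs are marked `|-`, and only the later inputs extend the prefix with a `|` column.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the driver's action class.
struct Action {
  std::string Kind;
  std::vector<const Action *> Inputs;
};

// Position of an action among the inputs of its consumer.
enum SiblingKind { TopLevel, HeadSibling, OtherSibling };

static unsigned printTree(const Action &A, const std::string &Indent,
                          SiblingKind Kind,
                          std::map<const Action *, unsigned> &Ids,
                          std::ostringstream &OS) {
  auto It = Ids.find(&A);
  if (It != Ids.end())
    return It->second; // shared action, already printed

  // Prefix handed to this action's inputs: a later sibling keeps a '|'
  // column open above itself, a first sibling contributes blanks.
  std::string ChildIndent =
      Indent + (Kind == HeadSibling ? "   " : Kind == OtherSibling ? "|  " : "");

  std::string Refs;
  SiblingKind ChildKind = HeadSibling;
  for (const Action *In : A.Inputs) {
    unsigned InId = printTree(*In, ChildIndent, ChildKind, Ids, OS);
    Refs += (Refs.empty() ? "" : ", ") + std::to_string(InId);
    ChildKind = OtherSibling;
  }

  unsigned Id = static_cast<unsigned>(Ids.size());
  Ids[&A] = Id;
  OS << Indent
     << (Kind == HeadSibling ? "+- " : Kind == OtherSibling ? "|- " : "")
     << Id << ": " << A.Kind;
  if (!A.Inputs.empty())
    OS << ", {" << Refs << "}";
  OS << "\n";
  return Id;
}

// Convenience wrapper: dumps the graph below Root with tree edges.
static std::string dumpTree(const Action &Root) {
  std::map<const Action *, unsigned> Ids;
  std::ostringstream OS;
  printTree(Root, "", TopLevel, Ids, OS);
  return OS.str();
}
```

Because the prefix is built while descending, no maximum depth has to be known in advance, and the top-level action is printed without any indentation.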
hliao added a comment. Oct 18 2019, 7:46 AM
In D69124#1713427, @tra wrote:

This is... rather oddly-structured output. My brain refuses to accept that the most-indented phase is the input.
Perhaps we should do llvm::errs().indent(MaxIdent-Ident). This should give us something like this (with MaxIdent=9), which is somewhat easier to grok, IMO:

    0: input, "/home/michliao/dummy.cpp", cuda, (host-cuda)
     1: preprocessor, {0}, cuda-cpp-output, (host-cuda)
      2: compiler, {1}, ir, (host-cuda)
3: input, "/home/michliao/dummy.cpp", cuda, (device-cuda, sm_30)
 4: preprocessor, {3}, cuda-cpp-output, (device-cuda, sm_30)
  5: compiler, {4}, ir, (device-cuda, sm_30)
   6: backend, {5}, assembler, (device-cuda, sm_30)
    7: assembler, {6}, object, (device-cuda, sm_30)
     8: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {7}, object
     9: offload, "device-cuda (nvptx64-nvidia-cuda:sm_30)" {6}, assembler
10: input, "/home/michliao/dummy.cpp", cuda, (device-cuda, sm_60)
 11: preprocessor, {10}, cuda-cpp-output, (device-cuda, sm_60)
  12: compiler, {11}, ir, (device-cuda, sm_60)
   13: backend, {12}, assembler, (device-cuda, sm_60)
    14: assembler, {13}, object, (device-cuda, sm_60)
     15: offload, "device-cuda (nvptx64-nvidia-cuda:sm_60)" {14}, object
     16: offload, "device-cuda (nvptx64-nvidia-cuda:sm_60)" {13}, assembler
      17: linker, {8, 9, 15, 16}, cuda-fatbin, (device-cuda)
       18: offload, "host-cuda (x86_64-unknown-linux-gnu)" {2}, "device-cuda (nvptx64-nvidia-cuda)" {17}, ir
        19: backend, {18}, assembler, (host-cuda)
         20: assembler, {19}, object, (host-cuda)

As the top-level actions are the last actions to be performed, they should have no indentation.

It's difficult, though, to choose a proper MaxIdent that avoids both unnecessary leading whitespace and misaligned output. How about we draw the tree edges in the original output? That makes it much easier for me to identify sibling actions.

tra accepted this revision. Oct 18 2019, 4:15 PM

Neat. I like the visual cues showing what gets passed on to the next processing stage.

This revision is now accepted and ready to land. Oct 18 2019, 4:15 PM
This revision was automatically updated to reflect the committed changes.