
[mlir][vulkan-runner] Add basic timing for compute pipeline
ClosedPublic

Authored by antiagainst on Mar 3 2020, 8:32 AM.

Details

Summary

This commit adds timestamp query commands to the Vulkan runner's
compute pipeline to measure how long the compute shader takes to
run. It also adds CPU-side timing for vkQueueSubmit and
vkQueueWaitIdle.
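
For readers unfamiliar with the two kinds of timing mentioned in the summary, here is a minimal, self-contained sketch. It is not the code from this revision. The helper names `ticksToNanoseconds` and `timeCallMicros` are hypothetical. The parameters `validBits` and `periodNs` are assumed to carry the values Vulkan reports as `VkQueueFamilyProperties::timestampValidBits` and `VkPhysicalDeviceLimits::timestampPeriod` (nanoseconds per tick); the sketch itself makes no Vulkan calls.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <utility>

// Hypothetical helper: convert two raw timestamp query values into elapsed
// nanoseconds. `validBits` is assumed to mirror timestampValidBits (0 would
// mean timestamps are unsupported) and `periodNs` to mirror timestampPeriod.
inline double ticksToNanoseconds(uint64_t begin, uint64_t end,
                                 uint32_t validBits, double periodNs) {
  assert(validBits >= 1 && validBits <= 64);
  const uint64_t mask =
      validBits == 64 ? ~uint64_t{0} : ((uint64_t{1} << validBits) - 1);
  // Unsigned subtraction followed by masking tolerates one counter wrap.
  const uint64_t ticks = (end - begin) & mask;
  return static_cast<double>(ticks) * periodNs;
}

// Hypothetical helper: wall-clock time, in microseconds, spent inside `fn`
// as observed from the CPU. steady_clock is used because it is monotonic.
template <typename Fn>
double timeCallMicros(Fn &&fn) {
  const auto start = std::chrono::steady_clock::now();
  std::forward<Fn>(fn)();
  const auto end = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::micro>(end - start).count();
}
```

In a runner like this one, `fn` would presumably be a lambda wrapping the vkQueueSubmit or vkQueueWaitIdle call. Note that the CPU-side number includes driver and submission overhead, while the timestamp difference covers only the GPU work recorded between the two queries.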

Event Timeline

antiagainst created this revision.Mar 3 2020, 8:32 AM
Herald added a project: Restricted Project.Mar 3 2020, 8:32 AM
rriddle added inline comments.Mar 3 2020, 9:24 AM
mlir/tools/mlir-vulkan-runner/VulkanRuntime.cpp
17
benvanik added inline comments.Mar 3 2020, 2:05 PM
mlir/tools/mlir-vulkan-runner/VulkanRuntime.cpp
721

another*

722

you want to insert this after the vkCmdDispatch below - the timestamp only latches the value after previous commands complete (in this case, just the timestamp reset/initial query)

antiagainst marked 4 inline comments as done.

Address comments

mlir/tools/mlir-vulkan-runner/VulkanRuntime.cpp
17

Changed to llvm::outs. I saw we are using iostream already in RunnerUtils.h and cblas_interface.h so wanted to be consistent there. Should those be fixed?

722

Thanks! I find the spec a bit difficult to parse: "latches the value of the timer when all previous commands have completed executing as far as the specified pipeline stage". I interpret it as the pipeline stage (VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT) controls when to write out the timestamp, not the placement of the vkCmdWriteTimestamp command. (Otherwise the pipeline stage specification does not really make sense.) But I do see there is a difference if I move the second vkCmdWriteTimestamp below vkCmdDispatch. So I'm a bit puzzled here.

benvanik added inline comments.Mar 4 2020, 8:41 AM
mlir/tools/mlir-vulkan-runner/VulkanRuntime.cpp
722

I believe that vkCmdWriteTimestamp acts as an implicit vkCmdPipelineBarrier with the srcStageMask set to what you pass in (BOTTOM_OF_PIPE), so it effectively partitions the command buffer into two synchronization scopes based on the order the commands were recorded.
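
The ordering point discussed in this thread can be illustrated with a toy model. This is only a sketch: the `Command` type and `latchTimestamps` function are hypothetical and are not Vulkan API. It assumes strictly sequential execution and that a timestamp latches once every command recorded before it has finished, which is the interpretation given in the comment above for BOTTOM_OF_PIPE. Real hardware and drivers may behave differently.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model only: these types are hypothetical and are not Vulkan API.
struct Command {
  enum Kind { Work, WriteTimestamp } kind;
  uint64_t durationTicks;  // meaningful only for Work commands
};

// Returns the value latched by each WriteTimestamp, in recording order,
// assuming a timestamp latches once all earlier commands have completed.
inline std::vector<uint64_t> latchTimestamps(const std::vector<Command> &cmds) {
  std::vector<uint64_t> latched;
  uint64_t now = 0;  // completion time of all work recorded so far
  for (const Command &cmd : cmds) {
    if (cmd.kind == Command::Work)
      now += cmd.durationTicks;  // the dispatch finishes `durationTicks` later
    else
      latched.push_back(now);    // the timestamp sees only earlier commands
  }
  return latched;
}
```

Under this model, recording both timestamp writes before the dispatch yields a difference of zero, while recording the second write after the dispatch yields the dispatch duration. That matches the change requested in the review.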

mehdi_amini added inline comments.Mar 4 2020, 9:22 AM
mlir/tools/mlir-vulkan-runner/VulkanRuntime.cpp
17

Isn’t cblas_interface intended to be used as a runtime for generated code, including outside of a JIT environment?

Address comments

antiagainst marked 5 inline comments as done.Mar 4 2020, 10:37 AM
antiagainst added inline comments.
mlir/tools/mlir-vulkan-runner/VulkanRuntime.cpp
17

Good point!

722

Done. I think your interpretation is correct. This makes me want to look into the driver implementation of this. :)

mravishankar resigned from this revision.Mar 4 2020, 10:37 AM
benvanik accepted this revision.Mar 4 2020, 1:25 PM
This revision is now accepted and ready to land.Mar 4 2020, 1:25 PM
This revision was automatically updated to reflect the committed changes.
antiagainst marked 2 inline comments as done.