Before serializing, optimizations on the LLVM IR were only run on the path to
hsaco, and not on the path to cubin. Define an opt-level for the gpu-to-cubin
pass as well, and move the call that optimizes the LLVM IR to a common place.
Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
mlir/lib/Dialect/GPU/CMakeLists.txt:127
I am a little skeptical about adding this. Please let me know if there is a better way to do this.
The update looks good but the layering needs to be fixed.
mlir/lib/Dialect/GPU/CMakeLists.txt:127
This can't be right. GPUTransforms can't depend on the ExecutionEngine. You'll need to move the OptUtils to its own library.
mlir/lib/Dialect/GPU/CMakeLists.txt:127
You can create an MLIRLLVMOptUtils with the needed functions shared between SerializeToBlob and ExecutionEngine.
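The suggested split might look roughly like this in CMake (only the MLIRLLVMOptUtils name comes from the comment; the source list and surrounding layout are assumptions):

```cmake
# Hypothetical new library holding the shared LLVM-IR optimization
# helpers (source layout is illustrative):
add_mlir_library(MLIRLLVMOptUtils
  OptUtils.cpp
)
# Both former users then link against it, so GPUTransforms no longer
# needs to pull in the whole ExecutionEngine:
#   MLIRExecutionEngine -> MLIRLLVMOptUtils
#   MLIRGPUTransforms   -> MLIRLLVMOptUtils
```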
mlir/lib/Dialect/GPU/Transforms/SerializeToHsaco.cpp:323
Since this was removed, does a header include need to be updated/trimmed?
mlir/lib/Dialect/GPU/Transforms/SerializeToHsaco.cpp:323
The declaration was in the same file. I have removed it.
mlir/include/mlir/Dialect/GPU/Transforms/Passes.h:136
... with optLevel (default level 2).
mlir/lib/Dialect/GPU/CMakeLists.txt:127
Missing dep on MLIRExecutionEngineUtils - please see below.
mlir/lib/Dialect/GPU/Transforms/SerializeToBlob.cpp:16
This is again an issue, I believe. OptUtils.h/.cpp is part of MLIRExecutionEngineUtils. If you include this file, you'll need to depend on MLIRExecutionEngineUtils. Without it, and in this form, the build/link will simply fail with shared libs turned on.
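Until a library split happens, the immediate fix for the link failure would be to add the dependency directly (a sketch; the actual contents of mlir/lib/Dialect/GPU/CMakeLists.txt are elided here):

```cmake
add_mlir_dialect_library(MLIRGPUTransforms
  # ... existing sources ...

  LINK_LIBS PUBLIC
  # Needed because SerializeToBlob.cpp includes OptUtils.h from
  # MLIRExecutionEngineUtils; without it, shared-lib builds fail to link.
  MLIRExecutionEngineUtils
)
```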
mlir/include/mlir/Dialect/GPU/Transforms/Passes.h:115
I wouldn't have the default level be 2. This is surprising for both "classical compiler" users, who would expect it to be 0 (as in clang/gcc), and "ML performance-oriented compiler" users, who would expect it to be 3 or the maximum level. I see that this seems to have been the case for HSACO before; can we track down the commit that introduced that and ask the author for their rationale?
mlir/include/mlir/Dialect/GPU/Transforms/Passes.h:140
Ditto. Especially dangerous because CUBIN defaults to 2 and HSACO doesn't have a default.
mlir/lib/Dialect/GPU/Transforms/SerializeToBlob.cpp:109
Ultra-nit: don't start diagnostic messages from a letter.
mlir/include/mlir/Dialect/GPU/Transforms/Passes.h:115
Sure. @krzysz00 Can you please check this comment on the rationale behind the opt level?
mlir/lib/Dialect/GPU/Transforms/SerializeToBlob.cpp:109
You meant capital letters?
mlir/lib/Dialect/GPU/Transforms/SerializeToBlob.cpp:114
This method is in the MLIRExecutionEngineUtils library. The dependency on LLVMTarget itself won't help.
mlir/include/mlir/Dialect/GPU/Transforms/Passes.h:115
As for HSACO: if we want to change the default opt level, it should be done in another revision. This revision is about adding the optimizer on the serialize-to-cubin path; the commit shouldn't surprise users by changing the existing default.
mlir/lib/Dialect/GPU/Transforms/SerializeToBlob.cpp:109
Diagnostic messages shouldn't end with a newline; they get appropriately suffixed with more things. You are also missing a space after "level". The message should start in lowercase, I believe.
A note about the default opt level.
Before this patch:
- The HSACO path had a default opt level of 2.
- The CUBIN path had only a dummy (no-op) call to optimizeLlvm, so its effective opt level was 0.
Since this patch makes the default opt level common to both paths:
- if we keep it at 2, cubin users will see the change;
- if we make it 0, hsaco users will see the change.
The commit title and purpose already capture the fact that the cubin path will now be optimized, so keeping 2 is fine: the HSACO path's behavior is unchanged. We can change both to O3 in a subsequent commit if that's natural for the amd/hsaco path as well.