Modify export clustering DAG mutation to move position exports
before other exports types.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Simplify the code and improve scheduling by removing unnecessary
dependencies edges on exports.
llvm/lib/Target/AMDGPU/AMDGPUExportClustering.cpp | ||
---|---|---|
38 | Can you say why it's beneficial to do position exports first? | |
41 | It's a shame that this sorting is O(n^2) but I guess it's not a problem because the average chain length will be 2? You could probably do this sorting in a cute way with std::partition_copy if you felt inclined: https://en.cppreference.com/w/cpp/algorithm/partition_copy | |
89–94 | Why are the barrier edges there in the first place? Either the exports can be reordered, so the barrier edges should not be there; or they can't be, so we shouldn't ignore the barrier edges! |
llvm/lib/Target/AMDGPU/AMDGPUExportClustering.cpp | ||
---|---|---|
38 | I will add comment. | |
41 | This sort is O(n), it passes through the list only once moving elements to the top as it goes. | |
89–94 | At a high level, what is a barrier dependency on an export? call void @llvm.amdgcn.exp.f32(i32 32, i32 15, float 1.0, float 1.0, float 1.0, float 1.0, i1 false, i1 false) call void @llvm.amdgcn.exp.f32(i32 33, i32 15, float 1.0, float 1.0, float 1.0, float 0.5, i1 false, i1 false) %load = call float @llvm.amdgcn.raw.buffer.load.f32(<4 x i32> undef, i32 %idx, i32 0, i32 0) call void @llvm.amdgcn.exp.f32(i32 12, i32 15, float 0.0, float 0.0, float 0.0, float %load, i1 true, i1 false) The load forces ordering between the exports either side of it. I do not think there is a hardware motivation for this? |
llvm/lib/Target/AMDGPU/AMDGPUExportClustering.cpp | ||
---|---|---|
41 | Yeah but each move is done with an erase and an insert, which both have to move the all the remaining items in the vector. | |
89–94 | EXP and EXP_DONE instructions have mayStore=1 and it seems reasonable that the scheduler doesn't have any information about what they're storing or where, so it adds barriers to stop them from being reordered. In particular they satisfy isGlobalMemoryObject in ScheduleDAGInstrs.cpp. So I think it's fine to remove the existing barriers and add your own, if you know more about it than the scheduler :-) |
- Improve efficiency of sorting position exports
- Comment on why position exports should come before others
- Add test for reordering over loads
Can you say why it's beneficial to do position exports first?