A function with fewer memory instructions but wider accesses is equivalent, in terms of memory boundedness, to a function with more instructions but narrower accesses. In fact, without this change the pass would give different answers before and after vectorization.
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Target/AMDGPU/AMDGPUPerfHintAnalysis.cpp
- 222–223: The "InstCount" names seem like a lie since we are no longer counting the number of instructions, but I don't have a better idea.
llvm/lib/Target/AMDGPU/AMDGPUPerfHintAnalysis.cpp
- 222–223: Yes, I also didn't come up with a better name. In fact it always was a lie, since an IR instruction is not a HW instruction. All of this was inspired by the desire to move the pass later in the pipeline, and then I realized that to preserve its behavior I need to adjust it for LD/ST combining. But the metric's name has now drifted even further from reality than it used to be.
llvm/lib/Target/AMDGPU/AMDGPUPerfHintAnalysis.cpp
- 222–223: How about Count => Cost? Also, we need to make the same change to IAMInstCount and LSMInstCount to keep the cost calculation consistent.
llvm/lib/Target/AMDGPU/AMDGPUPerfHintAnalysis.cpp
- 222–223: Thanks, cost sounds better.
- Renamed variables to read 'cost' instead of 'count'.
- Adjusted IAM and LSM costs accordingly.