A function with fewer but wider memory accesses is just as
memory bound as a function with more but narrower accesses.
Without this change, the pass would give different answers
before and after vectorization.
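A minimal sketch of the idea, not the code from this patch: each memory access is charged by its width instead of counting as one instruction. The `MemAccess` struct, the `accessCost` and `memCost` helpers, and the 32-bit cost granularity are all assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an IR load or store; only the width matters here.
struct MemAccess {
  unsigned Bits; // width of the accessed value in bits
};

// Cost of one access in 32-bit units, rounded up.
// The 32-bit granularity is an assumption of this sketch.
constexpr unsigned accessCost(unsigned Bits) { return (Bits + 31) / 32; }

// Width-weighted memory cost of a function body, as opposed to a plain
// count of memory instructions.
inline unsigned memCost(const std::vector<MemAccess> &Accesses) {
  unsigned Cost = 0;
  for (const MemAccess &A : Accesses) {
    assert(A.Bits > 0 && "access must have a nonzero width");
    Cost += accessCost(A.Bits);
  }
  return Cost;
}
```

Under these assumptions, four 32-bit loads and the single 128-bit load they are combined into get the same cost, so a ratio of memory cost to total cost no longer shifts just because loads and stores were combined.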
llvm/lib/Target/AMDGPU/AMDGPUPerfHintAnalysis.cpp | ||
---|---|---|
220–221 | The "InstCount" names seem like a lie since we are no longer counting the number of instructions, but I don't have a better idea. |
220–221 | Yes, I also didn't come up with a better name. In fact it was always a lie, since an IR instruction is not a HW instruction. All of this was inspired by the desire to move the pass later in the pipeline; I then realized that to preserve its behavior I need to adjust it for LD/ST combining. But the metric name has drifted even further from reality than it used to be. |
220–221 | How about Count => Cost? Also, we need to make the same change to IAMInstCount and LSMInstCount so that the cost is calculated consistently. |
220–221 | Thanks, cost sounds better. |
- Renamed variables to read 'cost' instead of 'count'.
- Adjusted IAM and LSM costs accordingly.