Edit: Updated the summary based on review comments.
Even though the granularity is 8, on GFX9 the rounded-up value must be an even number of 8-granules, i.e. a multiple of 16.
Probably this also needs to be mentioned in https://llvm.org/docs/AMDGPUUsage.html#amdgpu-amdhsa-compute-pgm-rsrc1-gfx6-gfx10-table for GRANULATED_WAVEFRONT_SGPR_COUNT.
The difference is seen when the rounded value aligns to 8 but not to 16 (for example, 40 or 56).
This patch corrects the roundup for GFX9, hence correcting the number of SGPRBlocks.
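To illustrate the rounding difference described above, here is a minimal sketch. The helper names (`alignUp`, `roundUpGranule8`, `roundUpGFX9`, `numGranules`) are hypothetical and are not the identifiers used in the patch; the sketch only shows the roundup and the resulting count of 8-granules, not any further adjustment applied when the value is encoded into the register field.

```cpp
// Hypothetical names for illustration only; not the patch's actual code.
constexpr unsigned kSGPRGranule = 8;

// Round Value up to the next multiple of Align (Align > 0).
constexpr unsigned alignUp(unsigned Value, unsigned Align) {
  return (Value + Align - 1) / Align * Align;
}

// Rounding to a single 8-granule.
constexpr unsigned roundUpGranule8(unsigned NumSGPRs) {
  return alignUp(NumSGPRs, kSGPRGranule);
}

// GFX9 rounding as described in this summary: an even number of
// 8-granules, i.e. a multiple of 16.
constexpr unsigned roundUpGFX9(unsigned NumSGPRs) {
  return alignUp(NumSGPRs, 2 * kSGPRGranule);
}

// Number of 8-granules covered by an already rounded value.
constexpr unsigned numGranules(unsigned RoundedSGPRs) {
  return RoundedSGPRs / kSGPRGranule;
}
```

Under these assumptions, 40 stays 40 (5 granules) with 8-rounding but becomes 48 (6 granules) with the GFX9 rounding, and 56 becomes 64 (8 granules instead of 7). A value such as 48 is unchanged by either rounding.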