When the nvgpu.tma.async.load op is used to asynchronously load data into shared memory, it does not account for the provided offsets, which can lead to incorrect memory accesses. Using offsets is common practice, especially with dynamic shared memory. This patch fixes the problem by taking the offsets into account.
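As a rough illustration of why a dropped offset matters, the following sketch models the destination address arithmetic in plain Python. It is not the actual lowering code: the helper name `dst_byte_address`, its parameters, and the example shapes are all hypothetical and chosen only to show the difference between using the base address alone and using the base address plus the offset.

```python
def dst_byte_address(base_addr, indices, strides, elem_bytes):
    """Hypothetical helper: byte address of the element at `indices`.

    base_addr  -- byte address of the start of the shared-memory buffer
    indices    -- per-dimension element offsets into the buffer
    strides    -- per-dimension strides, in elements
    elem_bytes -- size of one element in bytes
    """
    if len(indices) != len(strides):
        raise ValueError("indices and strides must have the same rank")
    # Linearize the multi-dimensional offset into an element count.
    linear = sum(i * s for i, s in zip(indices, strides))
    return base_addr + linear * elem_bytes


# Assumed example: a 2-D f32 buffer with row stride 64, destination at row 2.
base = 0
ignoring_offset = dst_byte_address(base, [0, 0], [64, 1], 4)  # offset dropped
with_offset = dst_byte_address(base, [2, 0], [64, 1], 4)      # offset applied
```

In this assumed setup, dropping the offset makes the load write to the start of the buffer instead of to the intended sub-region, which is the kind of incorrect access the summary describes.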
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo