Trim elements that won't be written. The equivalent still needs to be
done for writes. Also start widening 3 elements to 4
elements. Selection will get the count from the dmask.
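
To illustrate the summary above, here is a minimal sketch of how the result element count could be derived from the dmask. The helper names `dmaskLanes` and `legalizedElts` are hypothetical and are not taken from the patch. The sketch assumes one result element per set dmask bit and does not handle a dmask of 0.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of components an image load writes,
// assuming one component per set bit in the 4-bit dmask.
unsigned dmaskLanes(uint8_t dmask) {
  assert(dmask <= 0xF && "dmask is a 4-bit field");
  return static_cast<unsigned>(std::bitset<4>(dmask).count());
}

// Hypothetical helper: element count of the result type after trimming
// unwritten elements and widening 3 elements to 4.
unsigned legalizedElts(uint8_t dmask) {
  unsigned lanes = dmaskLanes(dmask);
  return lanes == 3 ? 4 : lanes;
}
```

Under these assumptions, a dmask of `0b0111` yields a 4-element result type while the dmask itself still records that only 3 components are written, which is how selection can recover the real count.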
Mostly looks good, but I do have some questions.
Inline comments on llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp:

| Line(s) | Comment |
|---|---|
| 3201 | Why the type change? DMaskLanes doesn't need to be signed, does it? |
| 3210–3211 | This seems like it might mess with atomics. RMW atomics are 32 or 64 bits depending on whether dmask is 1 or 3. cmpswap atomics are 32 or 64 bits depending on whether dmask is 3 or 15. I guess you're not testing atomics on the GlobalISel path yet? |
| 3254–3255 | You do now have an Observer argument here, right? |
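
The atomics concern can be made concrete with a small sketch. It encodes only the dmask-to-width mapping stated in the comment on lines 3210–3211. The enum `ImageAtomicKind` and the function `atomicResultBits` are hypothetical names used for illustration, and returning 0 for other dmask values is an assumption of this sketch.

```cpp
#include <cstdint>

// Hypothetical classification of image atomics, for illustration only.
enum class ImageAtomicKind { RMW, CmpSwap };

// Result width in bits for an image atomic, following the mapping in the
// review comment. Returns 0 for dmask values the comment does not cover.
unsigned atomicResultBits(ImageAtomicKind kind, uint8_t dmask) {
  switch (kind) {
  case ImageAtomicKind::RMW:
    if (dmask == 0x1) return 32;
    if (dmask == 0x3) return 64;
    break;
  case ImageAtomicKind::CmpSwap:
    if (dmask == 0x3) return 32;
    if (dmask == 0xF) return 64;
    break;
  }
  return 0;
}
```

Note that for cmpswap a dmask of 3 has two bits set but corresponds to a single 32-bit result, so a count based purely on the set bits of the dmask would not match the result type for atomics.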