If we have a masked.load from a location we know to be dereferenceable, we can simply issue a speculative unconditional load against that address. The key advantage is that it produces IR which is well understood by the optimizer. The select (cnd, load, passthrough) form produced should be pattern matchable back to hardware predication if profitable.
Details
Diff Detail
Event Timeline
I'm not really an expert in InstCombine but it looks good to me. Someone else should probably take a look as well. I added some minor comments though.
lib/Transforms/InstCombine/InstCombineCalls.cpp | ||
---|---|---|
1169 | Isn't that implemented below (in this patch)? | |
1247 | I would have preferred an actual type here (and in one or two other places below where it is not obvious). However, I don't know if this is the "standard" in this part of the code base. | |
test/Transforms/InstCombine/masked_intrinsics.ll | ||
211 | The zero here is the lane, correct? It looks "like a special case", even if it's not. Could we copy this test, change the width to 8 and gather lane 5? That should make sure it works "in general" with minimal effort on your part. Idk if there is, or if we need, a negative test as well, e.g., two active lanes in a constant mask. |
These are 2 independent transforms, right? It would be better to split that into 2 smaller reviews/commits, so we are not missing anything (and easier to diagnose if there's trouble post-commit). The tests should show minimal IR patterns to exercise those 2 independent transforms.
You can view them that way, or not. I was looking at this as "what was needed to decompose a single element gather from a dereferenceable address", but I can see your point. I'll split.
lib/Transforms/InstCombine/InstCombineCalls.cpp | ||
---|---|---|
1169 | Nope, that's the gather case. Analogous, but distinct intrinsics. | |
1247 | I'm happy to change to explicit types, which did you find non-obvious? | |
test/Transforms/InstCombine/masked_intrinsics.ll | ||
211 | Will do. |
LGTM. This canonicalization is reverse of the typical (shorter code is preferred), but the reasoning in the summary makes sense: we have lots of IR optimizer logic for selects, and this should be reversible in the backend.
lib/Transforms/InstCombine/InstCombineCalls.cpp | ||
---|---|---|
1183 | typo: sensative | |
1187 | I think LLVM standards prefer to make this an explicit "Value *" rather than "auto *": |
Isn't that implemented below (in this patch)?