The patch allows to produce dwordx3 loads out of v3i32/v3f32 loads.
There is still a future work to allow vectorizer to create that
This is the wrong way to handle this. I did most of the work to avoid having to select the machine nodes so early a long time ago. I have the patches to add v3* to MVT. Short of that a new LOAD_V3 node would be better than going direct to the instruction here
There's a global offset subtarget feature. I also have the patch to start selecting global, but I haven't committed it yet
V3 is an alien to LLVM, so I had to do it this way. It works.
What about patches for MVT to support V3? Are they anywhere ready to be submitted?
This is not about offsets, this is about support of global_load instructions, which is started from GFX9. Then flat_load's also have offsets there and that is checked above as Subtarget->hasFlatInstOffsets(). A global offset is actually a feature to support offsetting workitem ids, so unrelated.
Probably making v3 generally legal and simple type is a right thing to do. This will solve not only problem with loads, but compute on the resulting vector as well. Currently such compute is done on a 4 component vector created by the legalization with the promote of v3 to v4.
It however seems to be long way because a lot of places just designed to work with a power of 2 vectors, halfing and doubling them. I would do this style vec3 load in the short term and target legal v3 in a long term.
The first 2 commits here seem to work without test failures (but a few cost model regressions). They succeed in adding the basic types, the few after that need some more work
I wander what happens to passes which like to split a vector by half? I also can see that vectorizer (load/store and SLP) for instance is written in a way that does not support non power of 2 vectors. I hope some passes will just silently bail instead of silently fail at least when v3* will be reported as legal.
Anyway, if you are planning to fix these patches and merge I will hold current review. It also does not solve v3 operations problem other than load and potentially store, because v3 will be promoted on any arithmetic, so it is way not perfect.