Details
Diff Detail
Event Timeline
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp | ||
---|---|---|
230–231 | For the 2 element case, we should do this in the pre-legalizer combiner (I guess the 16-bit case code would be different, since it would be bitcast and shift) | |
252 | Register Vec = ...? | |
258 | It would be better to produce a single G_UNMERGE_VALUES and index out of the result list than to produce a sequence of G_EXTRACTs. Also FYI I'm planning on making 16-bit vector indexing bitcast promote to 32-bit vectors, so this would only see 32 and 64-bit element vectors | |
264–267 | You can fold these like | |
268 | Weird spacing, NumElem - 1? |
llvm/lib/Target/AMDGPU/AMDGPUPostLegalizerCombiner.cpp | ||
---|---|---|
230–231 | Why is that? The problem is we need to know if index is divergent or not in general, so we avoid creating loops. I assume if we need some special case for 2 elements, that shall be a separate combine. | |
258 | Thanks. Apparently there is a bug in the CombinerHelper::matchEqualDefs() when it works with an opcode producing multiple values. I will have to fix it first. |
Addressed review comments.
It also contains fix for CombinerHelper::matchEqualDefs(), it likely needs a separate change.
llvm/lib/Target/AMDGPU/SIInstrInfo.cpp | ||
---|---|---|
6965–6966 | This kind of thing belongs in TargetLowering still, it just needs to be made public |
Moved to the RegBankSelect itself. It is too late to do it past RegBankSelect because waterfall loops are already created, and it is too early to do it before because we need divergence info.
LGTM. You could use the ApplyRegBankMapping observer to avoid all the explicit setRegBankCalls, but not everywhere is consistently using it yet anyway
For the 2 element case, we should do this in the pre-legalizer combiner (I guess the 16-bit case code would be different, since it would be bitcast and shift)