The previous code placed the first ldtilecfg at a point that dominates all definitions of AMX registers. This may result in the ldtilecfg being inserted into a loop.
This patch tries to calculate the nearest point that post-dominates all shapes of the AMX registers.
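As a rough illustration of the problem described above (not the patch's code), the sketch below computes dominator sets for a toy CFG with the textbook iterative algorithm and then picks the nearest block that dominates a set of definition blocks. The `Cfg` type and the `dominators` and `nearestCommonDominator` helpers are hypothetical names made up for this example, and all blocks are assumed to be reachable from block 0; a real LLVM pass would query `MachineDominatorTree` instead.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy CFG (assumption for this sketch): Succs[B] lists the successors of
// block B, and block 0 is the entry block.
using Cfg = std::vector<std::vector<int>>;

// Iterative dominator sets: Dom(B) is B plus the intersection of Dom(P)
// over all predecessors P of B. Assumes every block is reachable from 0.
std::vector<std::set<int>> dominators(const Cfg &Succs) {
  const int N = static_cast<int>(Succs.size());
  assert(N > 0 && "CFG must have an entry block");
  std::vector<std::vector<int>> Preds(N);
  for (int B = 0; B < N; ++B)
    for (int S : Succs[B])
      Preds[S].push_back(B);

  std::set<int> All;
  for (int B = 0; B < N; ++B)
    All.insert(B);
  std::vector<std::set<int>> Dom(N, All);
  Dom[0] = {0};

  for (bool Changed = true; Changed;) {
    Changed = false;
    for (int B = 1; B < N; ++B) {
      std::set<int> New = All;
      for (int P : Preds[B]) {
        std::set<int> Tmp;
        for (int D : New)
          if (Dom[P].count(D))
            Tmp.insert(D);
        New = Tmp;
      }
      New.insert(B);
      if (New != Dom[B]) {
        Dom[B] = New;
        Changed = true;
      }
    }
  }
  return Dom;
}

// The common dominators of a set of blocks form a chain, so the nearest one
// is the common dominator that itself has the largest dominator set.
int nearestCommonDominator(const std::vector<std::set<int>> &Dom,
                           const std::vector<int> &Blocks) {
  assert(!Blocks.empty() && "need at least one block");
  int Best = 0; // The entry block dominates every reachable block.
  for (int C : Dom[Blocks[0]]) {
    bool Common = true;
    for (int B : Blocks)
      Common = Common && Dom[B].count(C) > 0;
    if (Common && Dom[C].size() > Dom[Best].size())
      Best = C;
  }
  return Best;
}
```

For the CFG `{{1}, {2, 3}, {4}, {4}, {1, 5}, {}}`, block 1 is a loop header (block 4 branches back to it). With AMX definitions assumed to sit in blocks 2 and 3, the nearest common dominator is block 1, so a placement based only on dominating the definitions lands inside the loop, even if the shapes were already available in block 0.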
Differential D99010
[X86][AMX] Hoist ldtilecfg
Authored by pengfei on Mar 19 2021, 11:24 PM.
Event Timeline
Comment Actions Since D98845 has landed, I'd like to do the ldtilecfg hoist together with this patch. WIP.
Comment Actions Address Yuanke and Xiang's comments.
Comment Actions Perhaps we need more comments and more test cases (maybe in a separate file) to cover those scenarios.
Comment Actions Fixed the problem when the sink needs to be forked, i.e.

          +------+
          |Entry | BB0
          +------+
          /      \
     +------+  +------+
 BB1 |Shape1|  |Shape2| BB2
     +------+  +------+
          \      /
          +------+
          | AMX  | BB3
          +------+

If BB1 and BB2 don't have a call, we will try to insert ldtilecfg from BB0.
Comment Actions The algorithm for updating the shape post-dominating BBs is buggy. For a given ShapeBB, clearing the flags of all its predecessors is not enough, since its unreachable BBs also need to be cleared.
Comment Actions I have thought out a new method, but it needs a major refactor. Stay tuned~ Worked it out.
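To make the diamond in the comment above concrete, here is a minimal sketch that checks post-dominance on a toy CFG by block removal: X post-dominates B if the exit can no longer be reached from B once X is removed. This is not the algorithm from the patch; `Cfg`, `reachesAvoiding`, and `postDominates` are hypothetical names, and the exit block is assumed to be reachable from every block.

```cpp
#include <cassert>
#include <vector>

// Toy CFG (assumption for this sketch): Succs[B] lists the successors of B.
using Cfg = std::vector<std::vector<int>>;

// Returns true if Exit is reachable from From without entering Removed.
static bool reachesAvoiding(const Cfg &Succs, int From, int Exit,
                            int Removed) {
  if (From == Removed)
    return false;
  std::vector<bool> Seen(Succs.size(), false);
  std::vector<int> Work{From};
  Seen[From] = true;
  while (!Work.empty()) {
    int B = Work.back();
    Work.pop_back();
    if (B == Exit)
      return true;
    for (int S : Succs[B])
      if (S != Removed && !Seen[S]) {
        Seen[S] = true;
        Work.push_back(S);
      }
  }
  return false;
}

// X post-dominates B iff every path from B to Exit goes through X.
// Assumes Exit is reachable from B in the unmodified CFG.
bool postDominates(const Cfg &Succs, int X, int B, int Exit) {
  assert(X >= 0 && X < static_cast<int>(Succs.size()) && "bad block X");
  assert(B >= 0 && B < static_cast<int>(Succs.size()) && "bad block B");
  return !reachesAvoiding(Succs, B, Exit, X);
}
```

With the diamond encoded as `{{1, 2}, {3}, {3}, {}}` and BB3 as the exit, `postDominates(G, 3, 0, 3)` holds, while neither `postDominates(G, 1, 0, 3)` nor `postDominates(G, 2, 0, 3)` does: no single shape block lies on every path from BB0 to the AMX block, which is presumably the situation the comment has in mind.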
Comment Actions Address Yuanke's comments.
Comment Actions Address Xiang's comments.
Comment Actions Address Yuanke's comments. Add test5 for checking the shape peek's loop break.
Comment Actions Fix a silly bug that used != where == was needed, and a bug when recording the shape for a phi. These were found while working on the PostRA implementation. They should have been caught by the test case amx-ldtilecfg-insert.ll, but it happened to produce the expected output even with these 2 bugs.
Comment Actions Refactor for excluding DBG_VALUE case.
Since we changed the algorithm, we need to update the pass description.