The previous code placed the first ldtilecfg at a point that dominates all definitions of AMX registers. This may result in the ldtilecfg being inserted into a loop.
This patch tries to compute the nearest point that post-dominates all shapes of the AMX registers.
Differential D99010
[X86][AMX] Hoist ldtilecfg
Authored by pengfei on Mar 19 2021, 11:24 PM.
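As a rough illustration of what "the nearest point that post-dominates all shapes" can mean, the sketch below computes post-dominator sets on a toy CFG and picks the nearest block that post-dominates every shape-defining block. It is not the patch's code: `CFG`, `computePostDoms`, and `nearestCommonPostDom` are hypothetical names, blocks are plain integers rather than LLVM's own data structures, and the sketch works at block granularity only.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical toy CFG: Succs[B] lists the successors of block B.
using CFG = std::vector<std::vector<int>>;

// Iterative data-flow computation of post-dominator sets:
//   PDom[B] = {B} united with the intersection of PDom[S] over all
//   successors S of B. Exit blocks (no successors) are post-dominated
//   only by themselves.
std::vector<std::set<int>> computePostDoms(const CFG &Succs) {
  const int N = static_cast<int>(Succs.size());
  std::set<int> All;
  for (int I = 0; I < N; ++I)
    All.insert(I);

  // Start from "everything" for non-exit blocks and shrink to a fixed point.
  std::vector<std::set<int>> PDom(N, All);
  for (int B = 0; B < N; ++B)
    if (Succs[B].empty())
      PDom[B] = {B};

  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (int B = N - 1; B >= 0; --B) {
      if (Succs[B].empty())
        continue;
      std::set<int> New = All;
      for (int S : Succs[B]) {
        std::set<int> Tmp;
        for (int X : New)
          if (PDom[S].count(X))
            Tmp.insert(X);
        New = std::move(Tmp);
      }
      New.insert(B);
      if (New != PDom[B]) {
        PDom[B] = std::move(New);
        Changed = true;
      }
    }
  }
  return PDom;
}

// Returns the nearest block that post-dominates every block in Blocks,
// or -1 if no such block exists. Common post-dominators form a chain, so
// the nearest one is the one with the largest post-dominator set.
int nearestCommonPostDom(const std::vector<std::set<int>> &PDom,
                         const std::vector<int> &Blocks) {
  assert(!Blocks.empty() && "need at least one block");
  std::set<int> Common = PDom[Blocks.front()];
  for (int B : Blocks) {
    std::set<int> Tmp;
    for (int X : Common)
      if (PDom[B].count(X))
        Tmp.insert(X);
    Common = std::move(Tmp);
  }
  int Best = -1;
  std::size_t BestSize = 0;
  for (int X : Common)
    if (PDom[X].size() > BestSize) {
      Best = X;
      BestSize = PDom[X].size();
    }
  return Best;
}
```

On a diamond with successors `{{1, 2}, {3}, {3}, {}}` and shapes defined in blocks 1 and 2, the only block that post-dominates both shape blocks is block 3, so that is what `nearestCommonPostDom` returns.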
Event Timeline
Comment Actions Fixed the problem that occurs when the sink needs to be forked, e.g.:

        +------+
        |Entry | BB0
        +------+
        /      \
  +------+    +------+
  |Shape1|    |Shape2| BB2
  +------+    +------+
   BB1   \    /
        +------+
        | AMX  | BB3
        +------+

If BB1 and BB2 don't have a call, we will try to insert ldtilecfg from BB0.
Comment Actions The algorithm for updating the BBs that post-dominate shapes is buggy. For a given ShapeBB, clearing the flags of all its predecessors is not enough, since its unreachable BBs also need to be cleared.
Comment Actions I have thought of a new method, but it needs a major refactor. Stay tuned~ Worked it out.
Comment Actions Address Yuanke's comments.
Comment Actions Address Xiang's comments.
Comment Actions Address Yuanke's comments. Add test5 to check shape peek's loop break.
Comment Actions Fix a silly bug that used != where == was intended, and a bug when recording the shape for a phi. These were found while working on the PostRA implementation. They should have been caught by the test case amx-ldtilecfg-insert.ll, but it happened to produce the expected output even with these 2 bugs.
Comment Actions Refactor for excluding DBG_VALUE case.