This patch implements the AMX programming model that was discussed on llvm-dev (http://lists.llvm.org/pipermail/llvm-dev/2020-August/144302.html). Thanks to Hal for the good suggestion on register allocation. Fast register allocation is not covered in this patch yet. The patch includes:
- The C interface for end users.
- The AMX intrinsics in LLVM IR.
- The lowering from AMX intrinsics to AMX pseudo instructions.
- Inserting a pseudo ldtilecfg and building the def-use chain between ldtilecfg and the AMX instructions.
- The register allocation for tile registers.
- Morphing AMX pseudo instructions into real AMX instructions.
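
For reviewers less familiar with tile operations, the following is a minimal scalar sketch of the load / dot-product / store flow that the C interface exposes. It is not the interface added by this patch: the type `model_tile` and the functions `model_tile_load`, `model_tile_dpbssd`, `model_tile_store` and `model_demo` are hypothetical names used only for illustration, and the arithmetic assumes the signed-byte dot-product semantics (each int32 element accumulates products of groups of four int8 values).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of one tile: up to 16 rows of 64 bytes. */
enum { MODEL_MAX_ROWS = 16, MODEL_MAX_COLSB = 64 };

typedef struct {
    uint16_t rows;  /* rows in use */
    uint16_t colsb; /* bytes per row in use */
    uint8_t data[MODEL_MAX_ROWS][MODEL_MAX_COLSB];
} model_tile;

/* Copy rows x colsb bytes from memory into the tile, row by row. */
static void model_tile_load(model_tile *t, const void *base, size_t stride) {
    assert(t->rows <= MODEL_MAX_ROWS && t->colsb <= MODEL_MAX_COLSB);
    for (uint16_t r = 0; r < t->rows; ++r)
        memcpy(t->data[r], (const uint8_t *)base + r * stride, t->colsb);
}

/* Copy the tile contents back to memory, row by row. */
static void model_tile_store(void *base, size_t stride, const model_tile *t) {
    for (uint16_t r = 0; r < t->rows; ++r)
        memcpy((uint8_t *)base + r * stride, t->data[r], t->colsb);
}

/* dst (M x N int32) += a (M x K int8) * b (K/4 rows, each holding N groups
 * of four int8 values). Both inputs are treated as signed bytes. */
static void model_tile_dpbssd(model_tile *dst, const model_tile *a,
                              const model_tile *b) {
    assert(dst->rows == a->rows);
    assert(dst->colsb == b->colsb);
    assert(a->colsb == 4 * b->rows);
    for (uint16_t m = 0; m < dst->rows; ++m) {
        for (uint16_t n = 0; n < dst->colsb / 4; ++n) {
            int32_t acc;
            memcpy(&acc, &dst->data[m][4 * n], sizeof acc);
            for (uint16_t k = 0; k < a->colsb / 4; ++k)
                for (uint16_t i = 0; i < 4; ++i)
                    acc += (int32_t)(int8_t)a->data[m][4 * k + i] *
                           (int32_t)(int8_t)b->data[k][4 * n + i];
            memcpy(&dst->data[m][4 * n], &acc, sizeof acc);
        }
    }
}

/* Small worked example: a 2x4 int8 matrix times a packed 4x2 int8 matrix. */
static void model_demo(int32_t out[2][2]) {
    const int8_t a_mem[2][4] = {{1, 2, 3, 4}, {-1, -2, -3, -4}};
    const int8_t b_mem[1][8] = {{1, 1, 1, 1, 2, 0, -1, 3}};
    const int32_t c_mem[2][2] = {{0, 0}, {0, 0}};

    model_tile a = {.rows = 2, .colsb = 4};
    model_tile b = {.rows = 1, .colsb = 8};
    model_tile c = {.rows = 2, .colsb = 8};

    model_tile_load(&a, a_mem, sizeof a_mem[0]);
    model_tile_load(&b, b_mem, sizeof b_mem[0]);
    model_tile_load(&c, c_mem, sizeof c_mem[0]);
    model_tile_dpbssd(&c, &a, &b);
    model_tile_store(out, 2 * sizeof(int32_t), &c);
}
```

In this model the tile shape travels with the value, which is the property the ldtilecfg insertion and tile register allocation listed above have to preserve when real tile registers are used.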
 
We also need to support two more features, which will be implemented in follow-up patches:
- Support fast register allocation for tile registers.
- Support inline assembly, including allocating tile registers for inline assembly.
 
Change-Id: I935e1080916ffcb72af54c2c83faa8b2e97d5cb0