This patch removes the DenseMap used to keep track of FMA3 grouping information, avoiding the startup cost of populating the map and the associated memory usage.
Three new bits are added to the instruction TSFlags to keep track of the form (132, 213, or 231) and whether the instruction is a scalar intrinsic instruction.
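As a rough illustration of how three bits could carry this information, the sketch below packs the form into two bits and the scalar intrinsic flag into one. The bit positions, constant names, and helper functions are all hypothetical and are not the actual TSFlags layout used by the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout: positions and names are invented for illustration
// and do not match the real X86 TSFlags definitions.
constexpr unsigned FMA3FormShift = 0;           // two bits hold the form
constexpr uint64_t FMA3FormMask = 0x3;
constexpr uint64_t FMA3FormNone = 0;            // not an FMA3 instruction
constexpr uint64_t FMA3Form132 = 1;
constexpr uint64_t FMA3Form213 = 2;
constexpr uint64_t FMA3Form231 = 3;
constexpr unsigned FMA3IntrinsicShift = 2;      // one bit for scalar intrinsic
constexpr uint64_t FMA3IntrinsicBit = 1ULL << FMA3IntrinsicShift;

// Extract the form field from a flags word.
uint64_t getFMA3Form(uint64_t TSFlags) {
  return (TSFlags >> FMA3FormShift) & FMA3FormMask;
}

// Test the scalar intrinsic bit.
bool isFMA3ScalarIntrinsic(uint64_t TSFlags) {
  return (TSFlags & FMA3IntrinsicBit) != 0;
}
```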
The three forms of each instruction are combined into groups, as in the current code, but the groups are now stored in static tables. Each table is sorted by the opcode of each form. Since opcode encodings are assigned alphabetically and the three forms of an instruction are named identically except for the 132, 213, or 231, sorting by one form also sorts the other two. With the tables sorted, we can find the group for a given opcode by getting the form from the TSFlags and doing a binary search through the appropriate column of the table.
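The lookup described above can be sketched roughly as follows. This is a minimal standalone illustration, not the code in the patch: the struct name, the form enum, the helper name findGroup, and the opcode numbers are all made up, and in the real code the form would come from the TSFlags of the opcode rather than being passed in.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>

// Hypothetical stand-ins for illustration; not LLVM's real definitions.
enum Form : unsigned { Form132 = 0, Form213 = 1, Form231 = 2 };

struct FMA3Group {
  uint16_t Opcodes[3]; // indexed by Form
};

// Invented opcode numbers. Every column is ascending, mirroring the
// property that sorting by one form also sorts the other two.
static const FMA3Group Groups[] = {
    {{100, 200, 300}},
    {{101, 201, 301}},
    {{105, 205, 305}},
    {{110, 210, 310}},
};

// Binary search over the column selected by F. Returns nullptr when the
// opcode does not appear in that column.
const FMA3Group *findGroup(unsigned Opcode, Form F) {
  const FMA3Group *End = std::end(Groups);
  const FMA3Group *I = std::lower_bound(
      std::begin(Groups), End, Opcode,
      [F](const FMA3Group &G, unsigned Opc) { return G.Opcodes[F] < Opc; });
  if (I != End && I->Opcodes[F] == Opcode)
    return I;
  return nullptr;
}
```

Searching only the column that matches the form avoids the need for one table per form or a hash map keyed on every opcode.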
There are 6 tables, split by the evex.b bit, memory/register, and masked/unmasked. The two evex.b tables contain masked and unmasked together. The masked/unmasked split for non-evex.b makes it easy to populate the load folding tables. The instructions that use evex.b cannot be folded. For the tables without evex.b, each register table is the same size as its equivalent memory table and the opcodes are in the same order, so converting from register form to memory form is as simple as finding the row in one table and looking up the same row in the other table. Which table an opcode is in can be determined from other TSFlags bits.
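The register-to-memory conversion for the non-evex.b tables can be sketched as a row index carried from one table to the other. This is only an illustration under assumed names: FMA3Row, RegTable, MemTable, toMemoryForm, and the opcode numbers are hypothetical and are not taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical row type and tables with invented opcode numbers.
struct FMA3Row {
  uint16_t Opcodes[3]; // 132, 213, 231 forms
};

static const FMA3Row RegTable[] = {
    {{100, 200, 300}},
    {{101, 201, 301}},
    {{105, 205, 305}},
};

// Same size and same ordering as RegTable, one memory row per register row.
static const FMA3Row MemTable[] = {
    {{150, 250, 350}},
    {{151, 251, 351}},
    {{155, 255, 355}},
};

static_assert(sizeof(RegTable) == sizeof(MemTable),
              "register and memory tables must have matching rows");

constexpr std::size_t NumRows = sizeof(RegTable) / sizeof(RegTable[0]);

// Map a row of RegTable to the row at the same index in MemTable.
// RegRow is assumed to point into RegTable.
const FMA3Row *toMemoryForm(const FMA3Row *RegRow) {
  assert(RegRow >= RegTable && RegRow < RegTable + NumRows);
  std::size_t Idx = static_cast<std::size_t>(RegRow - RegTable);
  return &MemTable[Idx];
}
```

Because the correspondence is positional, no separate register-to-memory mapping has to be stored; the static_assert guards against the two tables drifting apart in size.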
Currently the getFMA3Group function is a private function in X86InstrInfo.cpp, but could be made a static function in the X86InstrInfo class if it becomes needed outside this file.
The load folding table creation and the commuting code have been updated to use the new interface. The commuting code uses new and existing TSFlags to determine additional information about the opcodes beyond which group they are in.