Hi,
I decided it was time to take a look at improving how we handle the special symbolic AArch64 sysreg names (and those of related instructions). I think the current implementation has a few issues:
- Large-scale duplication between AArch64BaseInfo.h and AArch64BaseInfo.cpp
- That weird Mapping class. I have no idea what I was on when I thought it was a good idea.
- Searches are performed linearly through the entire list.
- We print absolutely all registers in upper-case, even though some are canonically mixed case (SPSel for example).
- The ARM ARM specifies sysregs in terms of 5 fields, but those are relegated to comments in our implementation, with a slightly opaque hex value indicating the canonical encoding LLVM will use.
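To make the last point concrete, here is a minimal sketch of how the five fields relate to the 16-bit hex value, assuming the packing op0:op1:CRn:CRm:op2 with widths 2:3:4:4:3 bits. The names encodeSysReg, decodeSysReg and SysRegFields are made up for illustration and are not what the patch uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical holder for the five fields the ARM ARM uses to name a sysreg.
struct SysRegFields {
  unsigned Op0, Op1, CRn, CRm, Op2;
};

// Pack the five fields into one 16-bit value.
// Assumed layout: op0 in bits 15-14, op1 in 13-11, CRn in 10-7,
// CRm in 6-3, op2 in 2-0.
uint16_t encodeSysReg(unsigned Op0, unsigned Op1, unsigned CRn, unsigned CRm,
                      unsigned Op2) {
  assert(Op0 < 4 && Op1 < 8 && CRn < 16 && CRm < 16 && Op2 < 8 &&
         "sysreg field out of range");
  return static_cast<uint16_t>(Op0 << 14 | Op1 << 11 | CRn << 7 | CRm << 3 |
                               Op2);
}

// Inverse of encodeSysReg: split a 16-bit value back into its fields.
SysRegFields decodeSysReg(uint16_t Enc) {
  SysRegFields F;
  F.Op0 = (Enc >> 14) & 0x3;
  F.Op1 = (Enc >> 11) & 0x7;
  F.CRn = (Enc >> 7) & 0xf;
  F.CRm = (Enc >> 3) & 0xf;
  F.Op2 = Enc & 0x7;
  return F;
}
```

With that layout, SPSel (op0=3, op1=0, CRn=4, CRm=2, op2=0) comes out as 0xc210, which is the sort of opaque constant I'd rather see derived from the fields than written by hand.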
So, with all that in mind I decided to do something about it. This patch adds a new TableGen backend: -gen-searchable-tables, which is quite a bit of code but seems like it ought to be useful elsewhere (though I didn't manage to think of any places immediately). That backend will emit a primary data table (reasonably generic, at the moment it can contain strings, ints and opaque code blobs) together with specified indexes (sorted for a binary search) and lookup functions.
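For anyone who wants a feel for the shape of the output, here is a hand-written sketch of roughly what I mean: one primary table, a sorted index per key, and binary-search lookups. It is not the backend's actual output; the struct and function names are hypothetical and only four registers are listed.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdint>
#include <cstring>
#include <iterator>
#include <string>

// Primary data table, in no particular order. Names keep their canonical
// (possibly mixed) case so they can be printed as-is.
struct SysReg {
  const char *Name;
  uint16_t Encoding;
};
static const SysReg SysRegsList[] = {
    {"SPSel", 0xc210},
    {"NZCV", 0xda10},
    {"DAIF", 0xda11},
    {"CurrentEL", 0xc212},
};

// Index sorted by upper-cased name; Idx points into SysRegsList.
struct NameIndexEntry {
  const char *UpperName;
  unsigned Idx;
};
static const NameIndexEntry SysRegsByName[] = {
    {"CURRENTEL", 3}, {"DAIF", 2}, {"NZCV", 1}, {"SPSEL", 0},
};

// Index sorted by encoding; Idx points into SysRegsList.
struct EncodingIndexEntry {
  uint16_t Encoding;
  unsigned Idx;
};
static const EncodingIndexEntry SysRegsByEncoding[] = {
    {0xc210, 0}, {0xc212, 3}, {0xda10, 1}, {0xda11, 2},
};

// Case-insensitive lookup by name: upper-case the key, then binary search.
const SysReg *lookupSysRegByName(const std::string &Name) {
  std::string Upper;
  for (char C : Name)
    Upper += static_cast<char>(std::toupper(static_cast<unsigned char>(C)));
  auto I = std::lower_bound(
      std::begin(SysRegsByName), std::end(SysRegsByName), Upper,
      [](const NameIndexEntry &E, const std::string &Key) {
        return std::strcmp(E.UpperName, Key.c_str()) < 0;
      });
  if (I == std::end(SysRegsByName) || Upper != I->UpperName)
    return nullptr;
  return &SysRegsList[I->Idx];
}

// Lookup by encoding, again via binary search over the sorted index.
const SysReg *lookupSysRegByEncoding(uint16_t Enc) {
  auto I = std::lower_bound(
      std::begin(SysRegsByEncoding), std::end(SysRegsByEncoding), Enc,
      [](const EncodingIndexEntry &E, uint16_t Key) {
        return E.Encoding < Key;
      });
  if (I == std::end(SysRegsByEncoding) || I->Encoding != Enc)
    return nullptr;
  return &SysRegsList[I->Idx];
}
```

The assembler side would use something like lookupSysRegByName("spsel") and the printer lookupSysRegByEncoding(0xc210); both land on the same entry, whose Name is still the mixed-case "SPSel".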
This was actually inspired by trying to make Clang's diagnostics better (specifically warning when a 64-bit register is used for arm_rsr or arm_wsr), so any ideas on how best to share it would be welcome (the best I've come up with is putting the AArch64-specific file in include/llvm/Target and rebuilding it from Clang).
The other nasty hack is DBGDTRTX_EL0 and DBGDTRRX_EL0 which share an encoding but are written differently in MRS/MSR. I just completely hacked around that in InstPrinter, deciding the extra infrastructure for a generic solution wasn't worth implementing.
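Roughly, the hack amounts to something like the sketch below. This is an illustration rather than the code in the patch: printSysRegOperand and its IsRead flag are hypothetical, and the fallback skips the table lookup entirely and just prints the generic S<op0>_<op1>_C<n>_C<m>_<op2> form.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Shared encoding: op0=2, op1=3, CRn=0, CRm=5, op2=0.
static const uint16_t DBGDTR_EL0_Encoding = 0x9828;

// Hypothetical printer helper. IsRead is true for MRS and false for MSR.
std::string printSysRegOperand(uint16_t Enc, bool IsRead) {
  // Special case: one encoding, two spellings depending on direction.
  if (Enc == DBGDTR_EL0_Encoding)
    return IsRead ? "DBGDTRRX_EL0" : "DBGDTRTX_EL0";

  // A real printer would consult the generated table here; this sketch
  // falls straight through to the generic spelling.
  unsigned Op0 = (Enc >> 14) & 0x3;
  unsigned Op1 = (Enc >> 11) & 0x7;
  unsigned CRn = (Enc >> 7) & 0xf;
  unsigned CRm = (Enc >> 3) & 0xf;
  unsigned Op2 = Enc & 0x7;
  return "S" + std::to_string(Op0) + "_" + std::to_string(Op1) + "_C" +
         std::to_string(CRn) + "_C" + std::to_string(CRm) + "_" +
         std::to_string(Op2);
}
```

A generic solution would need the table to carry a readable/writable flag per name and let several entries share an encoding, which is the extra infrastructure I didn't think was worth it for one register pair.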
So, any suggestions for improvements? Anyone think it's a terrible idea from the start?
Worth a (yes, duplicated) comment, if only for consistency?