This adds a command line option, -compact-records. Currently it only affects the behavior of -tpi-records and -ipi-records, but it can eventually be updated to affect the behavior of symbol records as well.
Instead of dumping full record details, it outputs a format like this:
    TPI Version: 20040203
    Record count: 75
    Records [
      Index: 0x1000 (8 bytes, offset 0) LF_ARGLIST "()"
      Index: 0x1001 (16 bytes, offset 8) LF_PROCEDURE "int ()"
      Index: 0x1002 (76 bytes, offset 24) LF_FIELDLIST "<field list>"
      Index: 0x1003 (120 bytes, offset 100) LF_ENUM "__vc_attributes::threadingAttribute::threading_e"
      Index: 0x1004 (100 bytes, offset 220) LF_STRUCTURE "__vc_attributes::threadingAttribute"
      Index: 0x1005 (12 bytes, offset 320) LF_POINTER "const __vc_attributes::threadingAttribute*"
      Index: 0x1006 (12 bytes, offset 332) LF_ARGLIST "(__vc_attributes::threadingAttribute::threading_e)"
      Index: 0x1007 (28 bytes, offset 344) LF_MFUNCTION "void __vc_attributes::threadingAttribute::(__vc_attributes::threadingAttribute::threading_e)"
      Index: 0x1008 (28 bytes, offset 372) LF_MFUNCTION "void __vc_attributes::threadingAttribute::()"
      Index: 0x1009 (20 bytes, offset 400) LF_METHODLIST ""
      Index: 0x100a (68 bytes, offset 420) LF_FIELDLIST "<field list>"
      Index: 0x100b (100 bytes, offset 488) LF_STRUCTURE "__vc_attributes::threadingAttribute"
      Index: 0x100c (48 bytes, offset 588) LF_FIELDLIST "<field list>"
      Index: 0x100d (120 bytes, offset 636) LF_ENUM "__vc_attributes::event_receiverAttribute::type_e"
      Index: 0x100e (112 bytes, offset 756) LF_STRUCTURE "__vc_attributes::event_receiverAttribute"
      Index: 0x100f (12 bytes, offset 868) LF_POINTER "const __vc_attributes::event_receiverAttribute*"
      Index: 0x1010 (16 bytes, offset 880) LF_ARGLIST "(__vc_attributes::event_receiverAttribute::type_e, bool)"
      Index: 0x1011 (28 bytes, offset 896) LF_MFUNCTION "void __vc_attributes::event_receiverAttribute::(__vc_attributes::event_receiverAttribute::type_e, bool)"
      Index: 0x1012 (12 bytes, offset 924) LF_ARGLIST "(__vc_attributes::event_receiverAttribute::type_e)"
      Index: 0x1013 (28 bytes, offset 936) LF_MFUNCTION "void __vc_attributes::event_receiverAttribute::(__vc_attributes::event_receiverAttribute::type_e)"
      Index: 0x1014 (28 bytes, offset 964) LF_MFUNCTION "void __vc_attributes::event_receiverAttribute::()"
      Index: 0x1015 (28 bytes, offset 992) LF_METHODLIST ""
      Index: 0x1016 (96 bytes, offset 1,020) LF_FIELDLIST "<field list>"
      Index: 0x1017 (112 bytes, offset 1,116) LF_STRUCTURE "__vc_attributes::event_receiverAttribute"
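Because each compact line carries the record's size and offset, the output is easy to post-process. The Python sketch below parses the lines and checks that every record starts exactly where the previous one ended, which holds for all 24 records listed above. The helper names `parse_compact_records` and `is_contiguous` are hypothetical, and the regex assumes exactly the line format shown in this dump.

```python
import re

# One compact record line looks like:
#   Index: 0x1001 (16 bytes, offset 8) LF_PROCEDURE "int ()"
# Sizes and offsets are printed with thousands separators ("1,020").
RECORD_RE = re.compile(
    r'Index: (0x[0-9a-fA-F]+) \(([\d,]+) bytes, offset ([\d,]+)\) (\w+) "([^"]*)"'
)


def parse_compact_records(text):
    """Return (type_index, size, offset, kind, name) for each record line."""
    records = []
    for m in RECORD_RE.finditer(text):
        ti, size, offset, kind, name = m.groups()
        records.append(
            (int(ti, 16), int(size.replace(",", "")),
             int(offset.replace(",", "")), kind, name)
        )
    return records


def is_contiguous(records):
    """True if type indices are consecutive and each record starts
    exactly where the previous one ended (offset + size)."""
    return all(
        cur[0] == prev[0] + 1 and cur[2] == prev[2] + prev[1]
        for prev, cur in zip(records, records[1:])
    )


# Four consecutive lines copied from the dump above.
SAMPLE = """\
Index: 0x1014 (28 bytes, offset 964) LF_MFUNCTION "void __vc_attributes::event_receiverAttribute::()"
Index: 0x1015 (28 bytes, offset 992) LF_METHODLIST ""
Index: 0x1016 (96 bytes, offset 1,020) LF_FIELDLIST "<field list>"
Index: 0x1017 (112 bytes, offset 1,116) LF_STRUCTURE "__vc_attributes::event_receiverAttribute"
"""

if __name__ == "__main__":
    recs = parse_compact_records(SAMPLE)
    for ti, size, offset, kind, name in recs:
        print(f"{ti:#x} {kind:<14} size={size:<4} offset={offset}")
    print("contiguous:", is_contiguous(recs))
```

The same check run over a whole dump is a cheap way to catch a record whose printed size or offset is off.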
In addition, this patch prints out the type index offset array in the TPI / IPI streams. After printing all type records, you will get some output that looks like this:
      Index: 0x1691 (16 bytes, offset 153,924) LF_ARRAY ""
      Index: 0x1692 (720 bytes, offset 153,940) LF_FIELDLIST "<field list>"
      Index: 0x1693 (92 bytes, offset 154,660) LF_STRUCTURE "_IMAGE_LOAD_CONFIG_DIRECTORY32"
      Index: 0x1694 (7,484 bytes, offset 154,752) LF_FIELDLIST "<field list>"
      Index: 0x1695 (52 bytes, offset 162,236) LF_ENUM "CXCursorKind"
      Index: 0x1696 (7,660 bytes, offset 162,288) LF_FIELDLIST "<field list>"
      Index: 0x1697 (52 bytes, offset 169,948) LF_ENUM "CXCursorKind"
      Index: 0x1698 (7,716 bytes, offset 170,000) LF_FIELDLIST "<field list>"
      Index: 0x1699 (52 bytes, offset 177,716) LF_ENUM "CXCursorKind"
      TypeIndexOffsets [
        Index: 0x1000, Offset: 0
        Index: 0x103e, Offset: 9,408
        Index: 0x1093, Offset: 16,392
        Index: 0x1104, Offset: 24,624
        Index: 0x11b8, Offset: 32,808
        Index: 0x126a, Offset: 40,964
        Index: 0x133f, Offset: 52,388
        Index: 0x1343, Offset: 115,832
        Index: 0x1355, Offset: 123,100
        Index: 0x1431, Offset: 131,080
        Index: 0x1523, Offset: 141,164
        Index: 0x15c0, Offset: 147,472
        Index: 0x1695, Offset: 162,236
        Index: 0x1697, Offset: 169,948
        Index: 0x1699, Offset: 177,716
      ]
    ]
With these two changes combined, I was able to decipher how the Index / Offsets array is laid out. It is an array of pairs, sorted by type index, where each pair holds the type index of the first record in a "chunk" of records and the offset at which that record begins. The number of records in chunk N is IndexOffsets[N+1].First - IndexOffsets[N].First (the last chunk extends to the end of the stream), and the spacing appears to be chosen so that each chunk contains the minimum number of consecutive records needed for the chunk to be >= 8KB.
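Read this way, the array supports random access: to locate a record, binary-search the pairs for the last chunk whose first index is <= the target, then walk forward from that chunk's offset, adding record sizes. The Python sketch below shows that idea together with the per-chunk record count formula. `records_per_chunk`, `record_offset`, and the `record_size` callback are hypothetical names; the values all come from the second dump, and the total of 0x69A records assumes 0x1699 is the last record in that stream.

```python
import bisect

# (first type index, byte offset) pairs copied from the TypeIndexOffsets dump above.
INDEX_OFFSETS = [
    (0x1000, 0), (0x103E, 9408), (0x1093, 16392), (0x1104, 24624),
    (0x11B8, 32808), (0x126A, 40964), (0x133F, 52388), (0x1343, 115832),
    (0x1355, 123100), (0x1431, 131080), (0x1523, 141164), (0x15C0, 147472),
    (0x1695, 162236), (0x1697, 169948), (0x1699, 177716),
]
FIRST_NON_SIMPLE_INDEX = 0x1000  # first type index that refers to a record


def records_per_chunk(index_offsets, record_count):
    """Chunk N holds IndexOffsets[N+1].First - IndexOffsets[N].First records;
    the last chunk runs to the end of the stream."""
    firsts = [ti for ti, _ in index_offsets]
    firsts.append(FIRST_NON_SIMPLE_INDEX + record_count)
    return [b - a for a, b in zip(firsts, firsts[1:])]


def record_offset(ti, index_offsets, record_size):
    """Byte offset of record `ti`: binary-search for the chunk containing it,
    then walk forward from the chunk start, adding record sizes.
    `record_size(ti)` is a caller-supplied (hypothetical) size lookup."""
    firsts = [first for first, _ in index_offsets]
    n = bisect.bisect_right(firsts, ti) - 1
    if n < 0:
        raise KeyError(f"type index {ti:#x} precedes the first chunk")
    first, offset = index_offsets[n]
    for cur in range(first, ti):
        offset += record_size(cur)
    return offset


# Record sizes from the tail of the dump above (the walks below only need these).
SIZES = {0x1695: 52, 0x1696: 7660, 0x1697: 52, 0x1698: 7716}

if __name__ == "__main__":
    # Assumes 0x1699 is the last record, i.e. 0x69A records in this stream.
    print(records_per_chunk(INDEX_OFFSETS, 0x69A))
    for ti in (0x1695, 0x1696, 0x1698, 0x1699):
        print(f"{ti:#x} -> {record_offset(ti, INDEX_OFFSETS, SIZES.__getitem__)}")
```

Walking from the chunk start bounds each lookup to one chunk's worth of records rather than the whole stream, which is presumably why the array exists.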
There are some exceptions to this, which I'm investigating. For example, see indices 0x1695 and 0x1697 above: the distance between them is 7,712 bytes, which is < 8KB. But the rule seems to hold almost everywhere else, so the rest of this edge case shouldn't be too hard to figure out.
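While investigating, it may help to test candidate rules mechanically against the dump. Taking 8KB as 8,192 bytes, the entry list above also has gaps below that size from 0x103e to 0x1093 (6,984 bytes) and from 0x1104 to 0x11b8 (8,184 bytes), not only from 0x1695 to 0x1697. The Python sketch below replays the stated minimum-chunk rule and one alternative hypothesis over records 0x1691 to 0x1699. The alternative (start a new entry at the first record whose offset falls in a later 8,192-byte-aligned block than the previous entry) is purely an assumption that happens to fit these nine records; it has not been checked against the code that wrote this PDB or against any other PDB. The function names are hypothetical.

```python
# Records 0x1691..0x1699 as (type index, size, offset), copied from the dump above.
TAIL = [
    (0x1691, 16, 153924), (0x1692, 720, 153940), (0x1693, 92, 154660),
    (0x1694, 7484, 154752), (0x1695, 52, 162236), (0x1696, 7660, 162288),
    (0x1697, 52, 169948), (0x1698, 7716, 170000), (0x1699, 52, 177716),
]
PREV_ENTRY_OFFSET = 147472  # offset of the preceding TypeIndexOffsets entry (0x15c0)
CHUNK = 8192                # "8KB", assumed to mean 8 * 1024 bytes
OBSERVED = [0x1695, 0x1697, 0x1699]


def predict_min_chunk(records, chunk_start):
    """Rule as stated above: start a new chunk at the first record that lies
    at least CHUNK bytes past the start of the current chunk."""
    entries = []
    for ti, _size, offset in records:
        if offset - chunk_start >= CHUNK:
            entries.append(ti)
            chunk_start = offset
    return entries


def predict_aligned_block(records, prev_offset):
    """Alternative (hypothetical) rule: start a new chunk at the first record
    whose offset falls in a later CHUNK-aligned block than the previous entry."""
    entries = []
    block = prev_offset // CHUNK
    for ti, _size, offset in records:
        if offset // CHUNK > block:
            entries.append(ti)
            block = offset // CHUNK
    return entries


if __name__ == "__main__":
    print("observed:      ", [hex(t) for t in OBSERVED])
    print("min-chunk rule:", [hex(t) for t in predict_min_chunk(TAIL, PREV_ENTRY_OFFSET)])
    print("aligned rule:  ", [hex(t) for t in predict_aligned_block(TAIL, PREV_ENTRY_OFFSET)])
```

On this excerpt the minimum-chunk rule predicts entries only at 0x1695 and 0x1699, while the aligned-block rule reproduces all three observed entries. Every listed entry offset also falls in a later 8,192-byte block than the entry before it, so the hypothesis is at least not contradicted by the rest of the array; confirming it would need the full record list.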