This is an archive of the discontinued LLVM Phabricator instance.

[libcxxabi] Replace names with single letters in test_demangle.
Needs Review (Public)

Authored by jyknight on Feb 15 2020, 2:17 PM.

Details

Summary

...and then remove all the duplicates after this process.

Many test-cases here were taken from the actual symbols defined in a
build of llvm/clang. But, having this file contain almost every
llvm/clang symbol name is somewhat irritating when using 'git grep',
and makes this file much larger than is necessary.

So, for the set of llvm/clang-extracted symbols, rewrite the names to
single-letter names, using this rather-hacky Python script:

import sys, re, string
from collections import OrderedDict

highletters=''.join([chr(0x80+n) for n in range(26)])
letters=''.join([chr(ord('a')+n) for n in range(26)])
translation = string.maketrans(highletters, letters)

word_re = re.compile('[a-zA-Z_][a-zA-Z0-9_]*')

def rewrite_demangle(mangled, demangled):
  allwords = [word for word in word_re.findall(demangled)
              if word != 'const' and
                 mangled.find('%d%s' % (len(word), word)) != -1]
  allwords = list(enumerate(OrderedDict.fromkeys(allwords)))
  allwords.sort(key=lambda x: len(x[1]), reverse=True)

  # Replace names with a unique character first, so that subsequent
  # replacements don't accidentally replace it a second time.
  for namenum, word in allwords:
    mangled = mangled.replace('%d%s' % (len(word), word),
                              '1' + chr(namenum + 0x80))
    demangled = re.sub(r'\b'+word+r'\b', chr(namenum + 0x80), demangled)

  # Then return, with actual alphabetic characters.
  return mangled.translate(translation), demangled.translate(translation)

for l in sys.stdin:
  mangled, unmangled = l.rstrip('\n').split(' ', 1)
  sys.stdout.write("    {\"%s\", \"%s\"},\n" % rewrite_demangle(mangled, unmangled))
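
The script above relies on Python 2 behaviour (string.maketrans and byte-valued chr()). As a rough illustration of the same idea, here is a minimal Python 3 sketch. It is not the script used for this change. It assumes Unicode Private Use Area code points as temporary placeholders instead of the 0x80+ bytes, adds a guard for more than 26 distinct names, and uses a hypothetical helper, format_entries, to show one possible way of dropping duplicates while keeping order. The symbol names in the usage note are made up.

```python
import re

# Assumption: Private Use Area code points stand in for the original's
# 0x80+ byte placeholders, and never occur in real input.
PLACEHOLDER_BASE = 0xE000
WORD_RE = re.compile(r'[a-zA-Z_][a-zA-Z0-9_]*')


def rewrite_demangle(mangled, demangled):
    # Keep identifiers that occur in the mangled name as <length><name>.
    words = [w for w in WORD_RE.findall(demangled)
             if w != 'const' and '%d%s' % (len(w), w) in mangled]
    # Number names by first appearance, then process the longest first.
    numbered = list(enumerate(dict.fromkeys(words)))
    numbered.sort(key=lambda item: len(item[1]), reverse=True)
    if len(numbered) > 26:
        # Guard added in this sketch; the original script has no such check.
        raise ValueError('more than 26 distinct names in one symbol')

    # Substitute placeholders first, so that a later replacement cannot
    # match text produced by an earlier one.
    for num, word in numbered:
        placeholder = chr(PLACEHOLDER_BASE + num)
        mangled = mangled.replace('%d%s' % (len(word), word),
                                  '1' + placeholder)
        demangled = re.sub(r'\b' + re.escape(word) + r'\b',
                           placeholder, demangled)

    # Map the placeholders to 'a'..'z'.
    table = {PLACEHOLDER_BASE + n: ord('a') + n for n in range(26)}
    return mangled.translate(table), demangled.translate(table)


def format_entries(pairs):
    # Hypothetical helper: rewrite each (mangled, demangled) pair, drop
    # duplicates while preserving order, and format as test-table rows.
    rewritten = (rewrite_demangle(m, d) for m, d in pairs)
    return ['    {"%s", "%s"},' % pair for pair in dict.fromkeys(rewritten)]
```

For example, rewrite_demangle('_ZN3foo6widget4drawEv', 'foo::widget::draw()') returns ('_ZN1a1b1cEv', 'a::b::c()'). A second symbol with the same shape, such as '_ZN3bar6gadget4spinEv', collapses to the same pair, which is why duplicates appear after the rewrite and need removing.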

Diff Detail