This patch extends the llvm-strings tool so that it can dump strings from LLVM bitcode (.bc) files.
Repository: rL LLVM
Event Timeline
This is not printing strings from bitcode; it's printing the function list and the global list. This functionality is better homed in llvm-bcanalyzer, IMO.
@compnerd instead of printing the function and global lists (which llvm-nm already provides), WDYT about fixing this patch to actually print the strings in bitcode? I'd expect the llvm- tools that replace the binutils ones to handle bitcode just as they do objects.
@mehdi_amini that should already work. The tool right now doesn't do anything format specific.
Thinking more about this, why is llvm-nm insufficient for printing out the functions/data symbols? That should be able to process bitcode files.
Well, strings in bitcode are usually compressed (packed into 7 bits per character where possible, for instance).
Thinking more about this, why is llvm-nm insufficient for printing out the functions/data symbols? That should be able to process bitcode files.
Yes, llvm-nm already handles functions and globals.
Ah, that is certainly a missing feature. There is the -e or --encoding parameter that we should implement:
- s: single 7-bit characters (ISO 8859)
- S: single 8-bit characters
- b: 16-bit big endian
- l: 16-bit little endian
- B: 32-bit big endian
- L: 32-bit little endian
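To make the encoding letters above concrete, here is a minimal Python sketch of how such an option could pick the unit width and byte order while still treating the input as opaque bytes. It is not the llvm-strings implementation: `ENCODINGS`, `is_printable`, and `extract_strings` are hypothetical names, the printable range for the wide encodings is an assumption, and wide units are only read at offsets aligned to the start of the buffer.

```python
import struct

# Hypothetical table: -e letter -> (unit width in bytes, struct format,
# highest accepted code). The 0xFF cap for wide units is an assumption.
ENCODINGS = {
    "s": (1, "B", 0x7F),
    "S": (1, "B", 0xFF),
    "b": (2, ">H", 0xFF),
    "l": (2, "<H", 0xFF),
    "B": (4, ">I", 0xFF),
    "L": (4, "<I", 0xFF),
}


def is_printable(code, limit):
    """Accept tab and anything from space up to limit, except DEL."""
    return code == 0x09 or (0x20 <= code <= limit and code != 0x7F)


def extract_strings(data, encoding="s", min_len=4):
    """Return runs of at least min_len printable units found in data."""
    width, fmt, limit = ENCODINGS[encoding]
    found, run = [], []
    # Simplification: units are read only at multiples of width.
    for offset in range(0, len(data) - width + 1, width):
        (code,) = struct.unpack_from(fmt, data, offset)
        if is_printable(code, limit):
            run.append(chr(code))
            continue
        if len(run) >= min_len:
            found.append("".join(run))
        run = []
    if len(run) >= min_len:
        found.append("".join(run))
    return found
```

With this sketch, a UTF-16 little-endian buffer yields its text only under `l`; the same bytes scanned with `s` or `b` produce nothing, which is the kind of difference the option is meant to expose.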
Sure, I don't think that I am likely to get to that right away, so if you have the time to implement it, patches would be welcome :-).
The bitcode format allows characters to be 6-bit encoded inside abbreviated records. To extract them I need to (at least partially) parse the bitcode file. So are you OK with adding some bitcode parsing functions here?
No, the file should be treated opaquely. You shouldn't need anything specific to bitcode here, only character encodings.
I'm fine with whatever solution as long as we get the "same" results as with an object file.
This is the 6-bit character array used for encoding characters in abbreviated records: "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789._". So, for example, 'b' is encoded as 1. If the file is treated opaquely, i.e. read 6 bits at a time without locating the abbreviated record headers, every 6-bit group decodes to a valid character, so nothing distinguishes real strings from the surrounding data. Additionally, the actual characters won't be recognized if the string doesn't start at a bit offset divisible by 6.
As previously mentioned, if this absolutely requires that the file not be treated opaquely, then we should be putting this functionality into another tool. Perhaps llvm-bc would be a good home for this. I'd really rather llvm-strings be kept simple and treat all input as opaque.
I don't understand this. The LLVM tools that replace binutils (and co.) all have extra handling for bitcode files. This is a key point of supplying our replacement tools, IMO.
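The two problems described here can be reproduced with a small Python sketch. It is only an illustration under simplifying assumptions: bits are packed least-significant-first into a little-endian byte string, there are no record headers or abbreviations at all, and `pack_char6` and `read_char6` are hypothetical helpers, not LLVM functions.

```python
CHAR6 = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789._"


def pack_char6(text, bit_offset=0):
    """Pack text as 6-bit codes, LSB first, starting at bit_offset."""
    value, nbits = 0, bit_offset
    for ch in text:
        value |= CHAR6.index(ch) << nbits
        nbits += 6
    return value.to_bytes((nbits + 7) // 8, "little")


def read_char6(data, bit_offset=0):
    """Decode every complete 6-bit group from bit_offset on, blindly."""
    value = int.from_bytes(data, "little")
    total_bits = len(data) * 8
    out = []
    for pos in range(bit_offset, total_bits - 5, 6):
        out.append(CHAR6[(value >> pos) & 0x3F])
    return "".join(out)


# Any bytes at all decode to "characters", because all 64 codes are valid.
noise = read_char6(b"\xff\x00\x13")

# A string that starts 3 bits into the buffer is lost when reading from
# bit 0, and recovered only when the reader knows the real start offset.
shifted = pack_char6("main", bit_offset=3)
misread = read_char6(shifted, bit_offset=0)
recovered = read_char6(shifted, bit_offset=3)
```

In a real bitcode file the start offset and length come from the enclosing abbreviated record, which is why some amount of parsing seems unavoidable for this encoding.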