This is an archive of the discontinued LLVM Phabricator instance.

llvm-strings - dumping strings from LLVM bitcode
Needs Revision
Public

Authored by spetrovic on Nov 22 2016, 2:17 AM.

Details

Summary

This patch extends the functionality of the llvm-strings tool. It enables dumping strings from LLVM bitcode (.bc) files.

Diff Detail

Repository
rL LLVM

Event Timeline

spetrovic updated this revision to Diff 78841. Nov 22 2016, 2:17 AM
spetrovic retitled this revision from to llvm-strings - dumping strings from LLVM bitcode.
spetrovic updated this object.
spetrovic added reviewers: compnerd, mclow.lists.
spetrovic set the repository for this revision to rL LLVM.
spetrovic added subscribers: ivanbaev, petarj.
compnerd requested changes to this revision. Nov 22 2016, 8:03 AM
compnerd edited edge metadata.

This is not printing strings from bitcode, it's printing the function list and the global list. This functionality is better homed in llvm-bcanalyzer imo.

This revision now requires changes to proceed. Nov 22 2016, 8:03 AM

> This is not printing strings from bitcode, it's printing the function list and the global list. This functionality is better homed in llvm-bcanalyzer imo.

@compnerd instead of printing function and global list (which llvm-nm is already providing somehow), WDYT about fixing this patch to actually print the strings in bitcode? I'd expect the llvm- tools that replace binutils ones to handle bitcode just as they do with objects.

@mehdi_amini that should already work. The tool right now doesn't do anything format specific.

Thinking more about this, why is llvm-nm insufficient for printing out the functions/data symbols? That should be able to process bitcode files.

> @mehdi_amini that should already work. The tool right now doesn't do anything format specific.

Well, strings in bitcode are usually compressed (packed on 7 bits per character wherever possible, for instance).
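
To illustrate the point about 7-bit packing, here is a minimal Python sketch. It is not the bitcode writer: `pack_fixed_width` and `printable_runs` are hypothetical helpers, and the least-significant-bit-first packing is an assumption chosen to resemble the bitstream layout. It only shows that once characters are packed at 7 bits each, a byte-oriented scan no longer sees the original string.

```python
import re


def pack_fixed_width(text: str, width: int) -> bytes:
    """Pack each character code into `width` bits, least-significant bit first.

    Hypothetical helper: a simplified stand-in for a fixed-width bitstream
    field, not the actual LLVM bitstream writer.
    """
    acc = 0
    nbits = 0
    for ch in text:
        code = ord(ch)
        if code >= (1 << width):
            raise ValueError(f"{ch!r} does not fit in {width} bits")
        acc |= code << nbits
        nbits += width
    return acc.to_bytes((nbits + 7) // 8, "little")


def printable_runs(data: bytes, min_len: int = 4) -> list:
    """Return runs of at least `min_len` printable ASCII bytes, as a byte scan would."""
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_len, data)


name = b"llvm.global_ctors"
packed = pack_fixed_width(name.decode("ascii"), 7)

# Stored byte-aligned, the name is found as-is.
print(printable_runs(b"\x00\x01" + name + b"\x00"))
# Packed at 7 bits per character, the same name takes fewer bytes and
# cannot appear as a byte-aligned run any more.
print(len(name), len(packed), name in packed)
```

The packed form is shorter than the original (119 bits instead of 136), so no byte-level scan can recover the full name from it, whatever offset it starts at.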

> Thinking more about this, why is llvm-nm insufficient for printing out the functions/data symbols? That should be able to process bitcode files.

Yes, llvm-nm already handles functions and globals.

Ah, that is certainly a missing function. There is the -e or --encoding parameter that we should implement.

  • s: single 7-bit characters (ISO 8859)
  • S: single 8-bit characters
  • b: 16-bit big endian
  • l: 16-bit little endian
  • B: 32-bit big endian
  • L: 32-bit little endian
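
As a rough illustration of what such an encoding option involves, here is a minimal Python sketch. It is not llvm-strings code: `ENCODINGS`, `is_printable` and `find_strings` are hypothetical names, and the definition of "printable" (tab, 0x20 to 0x7E, plus 0x80 to 0xFF for `S` only) is a simplifying assumption. The file is still treated opaquely; only the code-unit width and byte order change.

```python
# Hypothetical table: letter -> (code unit size in bytes, byte order, highest printable value).
ENCODINGS = {
    "s": (1, "big", 0x7E),
    "S": (1, "big", 0xFF),
    "b": (2, "big", 0x7E),
    "l": (2, "little", 0x7E),
    "B": (4, "big", 0x7E),
    "L": (4, "little", 0x7E),
}


def is_printable(value: int, max_value: int) -> bool:
    """Assumed notion of printable: tab, ASCII 0x20-0x7E, and 0x80..max_value."""
    return value == 0x09 or 0x20 <= value <= 0x7E or 0x80 <= value <= max_value


def find_strings(data: bytes, encoding: str = "s", min_len: int = 4) -> list:
    """Scan opaque bytes for runs of printable code units in the given encoding."""
    size, order, max_value = ENCODINGS[encoding]
    results = []
    current = []
    pos = 0
    while pos + size <= len(data):
        value = int.from_bytes(data[pos:pos + size], order)
        if is_printable(value, max_value):
            current.append(chr(value))
            pos += size
        else:
            if len(current) >= min_len:
                results.append("".join(current))
            current = []
            pos += 1  # resynchronise on the next byte boundary
    if len(current) >= min_len:
        results.append("".join(current))
    return results


data = b"\x00\x01" + "hello".encode("utf-16-le") + b"\xff\xff" + b"world" + b"\x00"
print(find_strings(data, "s"))
print(find_strings(data, "l"))
```

With the sample buffer, the 7-bit scan only reports the byte-aligned text, while the 16-bit little-endian scan only reports the wide string.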

Should I try to implement this option?

Sure, I don't think that I am likely to get to that right away, so if you have the time to implement it, patches would be welcome :-).

The bitcode format allows characters to be 6-bit encoded inside abbreviated records. To extract them I need to (at least partially) parse the bitcode file. So are you OK with adding some bitcode parsing functions here?

No, the file should be treated opaquely. You shouldn't need anything specific to bitcode here, only character encodings.

I'm fine with whatever solution as long as we get the "same" results as with an object file.

This is the 6-bit character array: "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789._" used for encoding characters in abbreviated records. So, for example, 'b' is encoded as 1. If the file is treated opaquely, i.e. read 6 bits at a time (without finding abbreviated record headers), every 6-bit group will decode to a character. Additionally, the actual characters won't be recognized if the string doesn't start at a bit offset divisible by 6.
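
Both problems can be reproduced with a small Python sketch. This is an illustration under stated assumptions, not a bitcode parser: `pack_char6` and `unpack_char6` are hypothetical helpers, bits are packed least-significant bit first, and the leading zero bits stand in for whatever record data would precede the string in a real file.

```python
CHAR6 = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789._"


def pack_char6(text: str, bit_offset: int = 0) -> bytes:
    """Pack `text` as 6-bit indices into CHAR6, starting at `bit_offset` (LSB first)."""
    acc = 0
    nbits = bit_offset  # leading zero bits model the data before the string
    for ch in text:
        acc |= CHAR6.index(ch) << nbits
        nbits += 6
    return acc.to_bytes((nbits + 7) // 8, "little")


def unpack_char6(data: bytes, bit_offset: int = 0) -> str:
    """Blindly decode every complete 6-bit group from `bit_offset` onwards."""
    acc = int.from_bytes(data, "little")
    total_bits = len(data) * 8
    out = []
    pos = bit_offset
    while pos + 6 <= total_bits:
        out.append(CHAR6[(acc >> pos) & 0x3F])
        pos += 6
    return "".join(out)


print(CHAR6.index("b"))                      # index used to encode 'b'

# Any bytes at all decode to "characters", since all 64 values are valid.
print(unpack_char6(b"\x00\xff\x10\x80"))

# A string that starts at bit 3 is only recovered when decoding starts at bit 3.
packed = pack_char6("main", bit_offset=3)
print(unpack_char6(packed, 3))
print(unpack_char6(packed, 0))
```

Because every 6-bit value maps to some character, an opaque scan has no way to tell real strings from noise, and it would also have to try all six possible bit alignments.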

Do you have any feedback on this?

As previously mentioned, if this absolutely requires that the file not be treated opaquely, then we should be putting this functionality into another tool. Perhaps llvm-bc would be a good home for this. I'd really rather llvm-strings be kept simple and treat all input as opaque.

> As previously mentioned, if this absolutely requires that the file not be treated opaquely, then we should be putting this functionality into another tool. Perhaps llvm-bc would be a good home for this. I'd really rather llvm-strings be kept simple and treat all input as opaque.

I don't understand this. The LLVM tools that replace binutils (and co.) all have extra treatment for bitcode files. This is a key point of supplying our replacement tools IMO.