This is an archive of the discontinued LLVM Phabricator instance.

This is not printing strings from bitcode; it's printing the function list and the global list. This functionality is better homed in llvm-bcanalyzer, IMO.

This revision now requires changes to proceed. Nov 22 2016, 8:03 AM

In D26959#602643, @compnerd wrote:

This is not printing strings from bitcode; it's printing the function list and the global list. This functionality is better homed in llvm-bcanalyzer, IMO.

@compnerd instead of printing the function and global lists (which llvm-nm already provides), WDYT about fixing this patch to actually print the strings in bitcode? I'd expect the llvm- tools that replace the binutils ones to handle bitcode just as they do objects.

@mehdi_amini that should already work. The tool right now doesn't do anything format specific.

Thinking more about this, why is llvm-nm insufficient for printing out the functions/data symbols? That should be able to process bitcode files.

In D26959#621174, @compnerd wrote:

@mehdi_amini that should already work. The tool right now doesn't do anything format specific.

Well, strings in bitcode are usually compressed (packed into 7 bits per character where possible, for instance).

Thinking more about this, why is llvm-nm insufficient for printing out the functions/data symbols? That should be able to process bitcode files.

Yes, llvm-nm already handles functions and globals.

Ah, that is certainly a missing function. There is the -e or --encoding parameter that we should implement.

s: single 7-bit characters (ISO 8859)
S: single 8-bit characters
b: 16-bit big endian
l: 16-bit little endian
B: 32-bit big endian
L: 32-bit little endian
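
To make the encoding letters above concrete, here is a minimal sketch of an encoding-aware scan that still treats the input as opaque bytes. It is not taken from the patch: the names `Encoding`, `unitSize`, `readUnit`, `isPrintableUnit` and `extractStrings` are hypothetical, and the printability rule (tab and printable ASCII, plus high bytes for `S`) is a simplifying assumption rather than the exact binutils behaviour. The sketch also assumes a string may start at any byte offset.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical selector mirroring the letters s, S, b, l, B, L listed above.
enum class Encoding { S7, S8, BE16, LE16, BE32, LE32 };

// Width in bytes of one character under the given encoding.
static std::size_t unitSize(Encoding E) {
  switch (E) {
  case Encoding::S7:
  case Encoding::S8:
    return 1;
  case Encoding::BE16:
  case Encoding::LE16:
    return 2;
  case Encoding::BE32:
  case Encoding::LE32:
    return 4;
  }
  return 1;
}

// Assemble one character from unitSize(E) bytes starting at Data[Pos].
static std::uint32_t readUnit(const std::string &Data, std::size_t Pos,
                              Encoding E) {
  const std::size_t N = unitSize(E);
  const bool Big = E == Encoding::BE16 || E == Encoding::BE32;
  std::uint32_t V = 0;
  for (std::size_t I = 0; I < N; ++I) {
    std::uint32_t Byte = static_cast<unsigned char>(Data[Pos + I]);
    V |= Byte << (8 * (Big ? N - 1 - I : I));
  }
  return V;
}

// Assumption of this sketch: tab and printable ASCII always count; the 8-bit
// mode additionally accepts 0x80..0xFF.
static bool isPrintableUnit(std::uint32_t V, Encoding E) {
  if (V == '\t' || (V >= 0x20 && V <= 0x7E))
    return true;
  return E == Encoding::S8 && V >= 0x80 && V <= 0xFF;
}

// Collect every run of at least MinLength printable characters.
std::vector<std::string> extractStrings(const std::string &Data, Encoding E,
                                        std::size_t MinLength) {
  std::vector<std::string> Out;
  const std::size_t N = unitSize(E);
  std::size_t Pos = 0;
  while (Pos + N <= Data.size()) {
    std::string Run;
    std::size_t Cur = Pos;
    while (Cur + N <= Data.size()) {
      std::uint32_t V = readUnit(Data, Cur, E);
      if (!isPrintableUnit(V, E))
        break;
      Run.push_back(static_cast<char>(V));
      Cur += N;
    }
    if (!Run.empty() && Run.size() >= MinLength) {
      Out.push_back(Run);
      Pos = Cur; // continue after the run just reported
    } else {
      ++Pos; // try the next byte offset
    }
  }
  return Out;
}
```

With this sketch, a buffer holding "hi!?" as 16-bit little-endian characters yields that string under `Encoding::LE16` and nothing under `Encoding::S7`, because the interleaved zero bytes cut every single-byte run short.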

Should I try to implement this option?

ping

Ping @compnerd?

Sure, I don't think that I am likely to get to that right away, so if you have the time to implement it, patches would be welcome :-).

The bitcode format allows characters to be 6-bit encoded inside abbreviated records. To extract them I need to (at least partially) parse the bitcode file. So are you OK with adding some bitcode parsing functions here?

No, the file should be treated opaquely. You shouldn't need anything specific to bitcode here, only character encodings.

I'm fine with whatever solution as long as we get the "same" results as with an object file.

This is the 6-bit character array used for encoding characters in abbreviated records: "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789._". So, for example, 'b' is encoded as 1. If the file is treated opaquely, i.e. read 6 bits at a time (without finding abbreviated record headers), every 6-bit group will decode to some character. Additionally, the actual characters won't be recognized if the string doesn't start at an offset divisible by 6.
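
The alignment problem described above can be shown with a small sketch. It is illustrative only: `decodeChar6`, `encodeChar6`, `packChar6` and `readChar6` are hypothetical helpers, not LLVM APIs, and the sketch assumes fields are packed least-significant bit first into consecutive bytes. It packs a string as 6-bit fields at a chosen bit offset, then reads 6-bit fields back from an arbitrary offset.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// The 6-bit alphabet quoted above: index 0 is 'a', index 1 is 'b', and so on.
static const char Char6Table[] =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789._";

// Every 6-bit value maps to some character, so decoding can never fail.
char decodeChar6(unsigned V) { return Char6Table[V & 63]; }

// Returns the 6-bit code for C, or -1 if C is not in the alphabet.
int encodeChar6(char C) {
  for (int I = 0; I < 64; ++I)
    if (Char6Table[I] == C)
      return I;
  return -1;
}

// Pack S as consecutive 6-bit fields starting at bit BitOffset.
std::vector<std::uint8_t> packChar6(const std::string &S,
                                    std::size_t BitOffset) {
  std::vector<std::uint8_t> Bytes((BitOffset + 6 * S.size() + 7) / 8, 0);
  std::size_t Bit = BitOffset;
  for (char C : S) {
    int V = encodeChar6(C);
    if (V < 0)
      throw std::invalid_argument("character has no char6 encoding");
    for (int I = 0; I < 6; ++I, ++Bit)
      if (V & (1 << I))
        Bytes[Bit / 8] |= static_cast<std::uint8_t>(1u << (Bit % 8));
  }
  return Bytes;
}

// Read up to Count 6-bit fields starting at bit BitOffset, with no knowledge
// of where records begin.
std::string readChar6(const std::vector<std::uint8_t> &Bytes,
                      std::size_t BitOffset, std::size_t Count) {
  std::string Out;
  std::size_t Bit = BitOffset;
  for (std::size_t N = 0; N < Count && Bit + 6 <= Bytes.size() * 8; ++N) {
    unsigned V = 0;
    for (int I = 0; I < 6; ++I, ++Bit)
      if (Bytes[Bit / 8] & (1u << (Bit % 8)))
        V |= 1u << I;
    Out.push_back(decodeChar6(V));
  }
  return Out;
}
```

Reading `packChar6("main", 3)` back from bit 3 recovers "main"; reading the same bytes from bit 0 still produces four characters from the alphabet, just not the right ones. That is the difficulty with an opaque scan: nothing in the decoded output distinguishes a real string from misaligned noise.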

spetrovic added a subscriber: vstefanovic. Jan 23 2017, 7:00 AM

Do you have any feedback on this?

As previously mentioned, if this absolutely requires that the file not be treated opaquely, then we should be putting this functionality into another tool. Perhaps llvm-bc would be a good home for this. I'd really rather llvm-strings be kept simple and treat all input as opaque.

In D26959#708762, @compnerd wrote:

As previously mentioned, if this absolutely requires that the file not be treated opaquely, then we should be putting this functionality into another tool. Perhaps llvm-bc would be a good home for this. I'd really rather llvm-strings be kept simple and treat all input as opaque.

I don't understand this. The llvm tools that replace binutils (and co.) all have extra treatment for bitcode files. This is a key point of supplying our replacement tools, IMO.
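
Any bitcode-specific treatment starts with recognizing the input, which the patch below does through `isBitcode`. As a rough sketch of what such a check amounts to, the code here classifies a buffer by its first four bytes: raw bitcode begins with 'B', 'C', 0xC0, 0xDE, and the bitcode wrapper begins with the little-endian word 0x0B17C0DE. `InputKind` and `classifyInput` are hypothetical names for illustration, not LLVM APIs.

```cpp
#include <cstddef>

// Hypothetical classification a tool could use to pick a code path.
enum class InputKind { RawBitcode, WrappedBitcode, Opaque };

// Inspect the first four bytes only; everything else is treated as opaque.
InputKind classifyInput(const unsigned char *Data, std::size_t Size) {
  if (Size < 4)
    return InputKind::Opaque;
  // Raw bitcode magic: 'B' 'C' 0xC0 0xDE.
  if (Data[0] == 'B' && Data[1] == 'C' && Data[2] == 0xC0 && Data[3] == 0xDE)
    return InputKind::RawBitcode;
  // Wrapper magic 0x0B17C0DE, stored little-endian.
  if (Data[0] == 0xDE && Data[1] == 0xC0 && Data[2] == 0x17 && Data[3] == 0x0B)
    return InputKind::WrappedBitcode;
  return InputKind::Opaque;
}
```

A tool built this way could send `Opaque` inputs through the plain byte scan and reserve format-aware decoding for the two bitcode cases, which is the split the discussion above is arguing over.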

Revision Contents

Path                                   Size
tools/llvm-strings/CMakeLists.txt      1 line
tools/llvm-strings/llvm-strings.cpp    81 lines

Diff 78841

tools/llvm-strings/CMakeLists.txt

	set(LLVM_LINK_COMPONENTS			set(LLVM_LINK_COMPONENTS
				BitReader
	Core			Core
	Object			Object
	Support			Support
	)			)

	add_llvm_tool(llvm-strings			add_llvm_tool(llvm-strings
	llvm-strings.cpp			llvm-strings.cpp
	)			)

tools/llvm-strings/llvm-strings.cpp

	//===-- llvm-strings.cpp - Printable String dumping utility ---------------===//			//===-- llvm-strings.cpp - Printable String dumping utility ---------------===//
	//			//
	// The LLVM Compiler Infrastructure			// The LLVM Compiler Infrastructure
	//			//
	// This file is distributed under the University of Illinois Open Source			// This file is distributed under the University of Illinois Open Source
	// License. See LICENSE.TXT for details.			// License. See LICENSE.TXT for details.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// This program is a utility that works like binutils "strings", that is, it			// This program is a utility that works like binutils "strings", that is, it
	// prints out printable strings in a binary, objdump, or archive file.			// prints out printable strings in a binary, objdump, or archive file.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "llvm/Object/Binary.h"			#include "llvm/Object/Binary.h"
				#include "llvm/IR/Constants.h"
	#include "llvm/Support/CommandLine.h"			#include "llvm/Support/CommandLine.h"
	#include "llvm/Support/Error.h"			#include "llvm/Support/Error.h"
	#include "llvm/Support/MemoryBuffer.h"			#include "llvm/Support/MemoryBuffer.h"
	#include "llvm/Support/PrettyStackTrace.h"			#include "llvm/Support/PrettyStackTrace.h"
	#include "llvm/Support/Program.h"			#include "llvm/Support/Program.h"
	#include "llvm/Support/Signals.h"			#include "llvm/Support/Signals.h"
				#include "llvm/Bitcode/BitcodeReader.h"
	#include <cctype>			#include <cctype>
	#include <string>			#include <string>

	using namespace llvm;			using namespace llvm;
	using namespace llvm::object;			using namespace llvm::object;

	static cl::list<std::string> InputFileNames(cl::Positional,			static cl::list<std::string> InputFileNames(cl::Positional,
	cl::desc("<input object files>"),			cl::desc("<input object files>"),
	cl::ZeroOrMore);			cl::ZeroOrMore);

	static cl::opt<bool>			static cl::opt<bool>
	PrintFileName("print-file-name",			PrintFileName("print-file-name",
	cl::desc("Print the name of the file before each string"));			cl::desc("Print the name of the file before each string"));
	static cl::alias PrintFileNameShort("f", cl::desc(""),			static cl::alias PrintFileNameShort("f", cl::desc(""),
	cl::aliasopt(PrintFileName));			cl::aliasopt(PrintFileName));

	static cl::opt<int>			static cl::opt<int>
	MinLength("bytes", cl::desc("Print sequences of the specified length"),			MinLength("bytes", cl::desc("Print sequences of the specified length"),
	cl::init(4));			cl::init(4));
	static cl::alias MinLengthShort("n", cl::desc(""), cl::aliasopt(MinLength));			static cl::alias MinLengthShort("n", cl::desc(""), cl::aliasopt(MinLength));

				static void diagnosticHandler(const DiagnosticInfo &DI, void *Context) {
				raw_ostream &OS = errs();
				OS << (char *)Context << ": ";
				switch (DI.getSeverity()) {
				case DS_Error:
				OS << "error: ";
				break;
				case DS_Warning:
				OS << "warning: ";
				break;
				case DS_Remark:
				OS << "remark: ";
				break;
				case DS_Note:
				OS << "note: ";
				break;
				}
				}

	static void strings(raw_ostream &OS, StringRef FileName, StringRef Contents) {			static void strings(raw_ostream &OS, StringRef FileName, StringRef Contents) {
	auto print = [&OS, FileName](StringRef L) {			auto print = [&OS, FileName](StringRef L) {
	if (L.size() < static_cast<size_t>(MinLength))			if (L.size() < static_cast<size_t>(MinLength))
	return;			return;
	if (PrintFileName)			if (PrintFileName)
	OS << FileName << ": ";			OS << FileName << ": ";
	OS << L << '\n';			OS << L << '\n';
	};			};

	const char *P = nullptr, *E = nullptr, *S = nullptr;			const char *P = nullptr, *E = nullptr, *S = nullptr;
	for (P = Contents.begin(), E = Contents.end(); P < E; ++P) {			for (P = Contents.begin(), E = Contents.end(); P < E; ++P) {
	if (std::isgraph(*P) || std::isblank(*P)) {			if (std::isgraph(*P) || std::isblank(*P)) {
	if (S == nullptr)			if (S == nullptr)
	S = P;			S = P;
	} else if (S) {			} else if (S) {
	print(StringRef(S, P - S));			print(StringRef(S, P - S));
	S = nullptr;			S = nullptr;
	}			}
	}			}
	if (S)			if (S)
	print(StringRef(S, E - S));			print(StringRef(S, E - S));
	}			}

				static void printGlobalVariablesAsString(const Module &Mod, StringRef FileName) {
				raw_ostream &OS = errs();
				const Module::GlobalListType &GlobalList = Mod.getGlobalList();
				for (Module::const_global_iterator I = GlobalList.begin(),
				E = GlobalList.end(); I != E; ++I) {
				if (PrintFileName)
				OS << FileName + ": ";
				OS << I->getName() + "\n";
				if (const ConstantDataArray *CA =
				dyn_cast<ConstantDataArray>(I->getOperandList()->get())) {
				if (!CA->isString())
				continue;
				StringRef PrintStr = CA->getAsString();
				PrintStr = PrintStr.ltrim("\n");
				size_t pos = PrintStr.find_first_of("\n");
				PrintStr = PrintStr.substr(0, pos);
				if (PrintFileName)
				OS << FileName + ": ";
				OS << PrintStr + "\n";
				}
				}
				}

				static void printFunctionNamesAsString(const Module &Mod,
				StringRef FileName) {
				raw_ostream &OS = errs();
				const Module::FunctionListType &FunctionList = Mod.getFunctionList();
				for (Module::const_iterator I = FunctionList.begin(),
				E = FunctionList.end(); I!=E; ++I) {
				if (PrintFileName)
				OS << FileName + ": ";
				OS << I->getName() + "\n";
				}
				}

	int main(int argc, char **argv) {			int main(int argc, char **argv) {
	sys::PrintStackTraceOnErrorSignal(argv[0]);			sys::PrintStackTraceOnErrorSignal(argv[0]);
	PrettyStackTraceProgram X(argc, argv);			PrettyStackTraceProgram X(argc, argv);
				ExitOnError ExitOnErr;
	cl::ParseCommandLineOptions(argc, argv, "llvm string dumper\n");			cl::ParseCommandLineOptions(argc, argv, "llvm string dumper\n");
	if (MinLength == 0) {			if (MinLength == 0) {
	errs() << "invalid minimum string length 0\n";			errs() << "invalid minimum string length 0\n";
	return EXIT_FAILURE;			return EXIT_FAILURE;
	}			}

	if (InputFileNames.empty())			if (InputFileNames.empty())
	InputFileNames.push_back("-");			InputFileNames.push_back("-");

	for (const auto &File : InputFileNames) {			for (const auto &File : InputFileNames) {
	ErrorOr<std::unique_ptr<MemoryBuffer>> Buffer =			ErrorOr<std::unique_ptr<MemoryBuffer>> Buffer =
	MemoryBuffer::getFileOrSTDIN(File);			MemoryBuffer::getFileOrSTDIN(File);
	if (std::error_code EC = Buffer.getError())			if (std::error_code EC = Buffer.getError()) {
	errs() << File << ": " << EC.message() << '\n';			errs() << File << ": " << EC.message() << '\n';
	else			} else if (isBitcode((const unsigned char *)
				Buffer.get().get()->getBufferStart(),
				(const unsigned char *)
				Buffer.get().get()->getBufferEnd())) {
				LLVMContext Context;
				Context.setDiagnosticHandler(diagnosticHandler, argv[0]);
				std::unique_ptr<Module> M =
				ExitOnErr(getOwningLazyBitcodeModule(std::move(Buffer.get()), Context,
				/*ShouldLazyLoadMetadata=*/true));
				printGlobalVariablesAsString(*M,
				File == "-" ? "{standard input}" : File);
				printFunctionNamesAsString(*M,
				File == "-" ? "{standard input}" : File);
				} else {
	strings(llvm::outs(), File == "-" ? "{standard input}" : File,			strings(llvm::outs(), File == "-" ? "{standard input}" : File,
	Buffer.get()->getMemBufferRef().getBuffer());			Buffer.get()->getMemBufferRef().getBuffer());
	}			}
				}
	return EXIT_SUCCESS;			return EXIT_SUCCESS;
	}			}

llvm-strings - dumping strings from LLVM bitcode
Needs Revision, Public
