This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/lib/BinaryFormat/
-
lib/
-
BinaryFormat/
-
Magic.cpp

Differential D44225

Fix identification of COFF executable files
ClosedPublic

Authored by zturner on Mar 7 2018, 1:21 PM.

Download Raw Diff

Details

Reviewers

rnk
compnerd
majnemer
• espindola

Commits

rG9899b5feb696: Fix detection of COFF executable files.
rL327050: Fix detection of COFF executable files.

Summary

identify_magic on COFF executables has been broken for apparently quite some time. The reason is because we only read the first 32 bytes of the file, but the detection algorithm for COFF requires at *minimum* 64 bytes, but in practice you need to read the entire file because the algorithm reads a 4-byte value at offset 60 and then treats that as an offset, from which it reads additional data.

Diff Detail

Event Timeline

zturner created this revision.Mar 7 2018, 1:21 PM

Herald added a subscriber: hiraditya. · View Herald TranscriptMar 7 2018, 1:21 PM

smeenai added a subscriber: smeenai.Mar 7 2018, 1:23 PM

Does anyone know why we bother with the "PE\0\0" validation instead of just relying on the "MZ" magic check? File magic checks are supposed to be heuristic anyway.

I did some archaelogy and found that this is from rL128799, committed by @echristo on behalf of Graydon Hoare. I think we should just go ahead and remove the "PE" check.

In D44225#1030532, @rnk wrote:

I did some archaelogy and found that this is from rL128799, committed by @echristo on behalf of Graydon Hoare. I think we should just go ahead and remove the "PE" check.

It's true that file magic checks are heuristic, but having it rely on only 2 bytes will have quite a lot of false positives on binary files.

How expensive is it to MemoryBuffer::getFilePath() in all cases? Unless we have reason to believe it is expensive, I would say let's just do that, and then we can remove the code that reads the 32 bytes and detect PE/COFF executables properly.

I took a look at http://cvsweb.netbsd.org/bsdweb.cgi/src/external/bsd/file/dist/magic/magdir/msdos?rev=1.1.1.10&content-type=text/x-cvsweb-markup&only_with_tag=MAIN to see what file does, and that also looks for the PE header to determine if this is a PE/COFF executable.

It's probably not super expensive, but we want library code to behave as nicely as possible in 32-bit, and mapping a potentially huge file could kill a 32-bit process.

Always read the whole file.

Most callers already map the whole file and then call the other identify_magic overload. Only llvm-pdbutil uses this overload. Maybe we should remove this overload and push the complexity of mapping the file up to there?

@rnk, I like the idea of pushing the mapping behavior to the single use in llvm-pdbutil and removing this overload.

LGTM.

A followup deleting this completely like Reid Kleckner suggests also
LGTM.

Cheers,
Rafael

Zachary Turner via Phabricator via llvm-commits
<llvm-commits@lists.llvm.org> writes:

zturner updated this revision to Diff 137482.
zturner added a comment.

Always read the whole file.

https://reviews.llvm.org/D44225

Files:
llvm/lib/BinaryFormat/Magic.cpp
Index: llvm/lib/BinaryFormat/Magic.cpp

llvm/lib/BinaryFormat/Magic.cpp

+++ llvm/lib/BinaryFormat/Magic.cpp
@@ -14,6 +14,7 @@
#include "llvm/BinaryFormat/MachO.h"
#include "llvm/Support/Endian.h"
#include "llvm/Support/FileSystem.h"
+#include "llvm/Support/MemoryBuffer.h"

#if !defined(_MSC_VER) && !defined(MINGW32)
#include <unistd.h>
@@ -205,15 +206,12 @@
}

std::error_code llvm::identify_magic(const Twine &Path, file_magic &Result) {

int FD;

if (std::error_code EC = openFileForRead(Path, FD))

return EC;

+ auto FileOrError = MemoryBuffer::getFile(Path);
+ if (!FileOrError)
+ return FileOrError.getError();

char Buffer[32];

int Length = read(FD, Buffer, sizeof(Buffer));

if (close(FD) != 0 || Length < 0)

return std::error_code(errno, std::generic_category());

+ std::unique_ptr<MemoryBuffer> FileBuffer = std::move(*FileOrError);
+ Result = identify_magic(FileBuffer->getBuffer());

Result = identify_magic(StringRef(Buffer, Length)); return std::error_code(); }

Index: llvm/lib/BinaryFormat/Magic.cpp

llvm/lib/BinaryFormat/Magic.cpp

+++ llvm/lib/BinaryFormat/Magic.cpp
@@ -14,6 +14,7 @@
#include "llvm/BinaryFormat/MachO.h"
#include "llvm/Support/Endian.h"
#include "llvm/Support/FileSystem.h"
+#include "llvm/Support/MemoryBuffer.h"

#if !defined(_MSC_VER) && !defined(MINGW32)
#include <unistd.h>
@@ -205,15 +206,12 @@
}

std::error_code llvm::identify_magic(const Twine &Path, file_magic &Result) {

int FD;

if (std::error_code EC = openFileForRead(Path, FD))

return EC;

+ auto FileOrError = MemoryBuffer::getFile(Path);
+ if (!FileOrError)
+ return FileOrError.getError();

char Buffer[32];

int Length = read(FD, Buffer, sizeof(Buffer));

if (close(FD) != 0 || Length < 0)

return std::error_code(errno, std::generic_category());

+ std::unique_ptr<MemoryBuffer> FileBuffer = std::move(*FileOrError);
+ Result = identify_magic(FileBuffer->getBuffer());

Result = identify_magic(StringRef(Buffer, Length)); return std::error_code(); }

llvm-commits mailing list
llvm-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits

Only if we have a caller that is trying to identify large files but only
ends up using small ones.

Cheers,
Rafael

Zachary Turner via Phabricator via llvm-commits
<llvm-commits@lists.llvm.org> writes:

zturner added a comment.

It's probably not super expensive, but we want library code to behave as nicely as possible in 32-bit, and mapping a potentially huge file could kill a 32-bit process.

https://reviews.llvm.org/D44225

llvm-commits mailing list
llvm-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits

I agree. Lets replace the read with MemoryBuffer::getFile. Most callers
will probably use the file anyway and should be using the
identify_magic(StringRef magic) variant.

Cheers,
Rafael

Bob Haarman via Phabricator via llvm-commits
<llvm-commits@lists.llvm.org> writes:

inglorion added a comment.

How expensive is it to MemoryBuffer::getFilePath() in all cases? Unless we have reason to believe it is expensive, I would say let's just do that, and then we can remove the code that reads the 32 bytes and detect PE/COFF executables properly.

I took a look at http://cvsweb.netbsd.org/bsdweb.cgi/src/external/bsd/file/dist/magic/magdir/msdos?rev=1.1.1.10&content-type=text/x-cvsweb-markup&only_with_tag=MAIN to see what file does, and that also looks for the PE header to determine if this is a PE/COFF executable.

https://reviews.llvm.org/D44225

llvm-commits mailing list
llvm-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits

This revision was not accepted when it landed; it landed in state Needs Review.Mar 8 2018, 11:49 AM

Closed by commit rL327050: Fix detection of COFF executable files. (authored by zturner). · Explain Why

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path

Size

llvm/

lib/

BinaryFormat/

Magic.cpp

14 lines

Diff 137482

llvm/lib/BinaryFormat/Magic.cpp

//===- llvm/BinaryFormat/Magic.cpp - File magic identification --- C++ --===//		//===- llvm/BinaryFormat/Magic.cpp - File magic identification --- C++ --===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "llvm/BinaryFormat/Magic.h"		#include "llvm/BinaryFormat/Magic.h"

#include "llvm/BinaryFormat/COFF.h"		#include "llvm/BinaryFormat/COFF.h"
#include "llvm/BinaryFormat/ELF.h"		#include "llvm/BinaryFormat/ELF.h"
#include "llvm/BinaryFormat/MachO.h"		#include "llvm/BinaryFormat/MachO.h"
#include "llvm/Support/Endian.h"		#include "llvm/Support/Endian.h"
#include "llvm/Support/FileSystem.h"		#include "llvm/Support/FileSystem.h"
		#include "llvm/Support/MemoryBuffer.h"

#if !defined(_MSC_VER) && !defined(__MINGW32__)		#if !defined(_MSC_VER) && !defined(__MINGW32__)
#include <unistd.h>		#include <unistd.h>
#else		#else
#include <io.h>		#include <io.h>
#endif		#endif

using namespace llvm;		using namespace llvm;
▲ Show 20 Lines • Show All 175 Lines • ▼ Show 20 Lines	file_magic llvm::identify_magic(StringRef Magic) {

default:		default:
break;		break;
}		}
return file_magic::unknown;		return file_magic::unknown;
}		}

std::error_code llvm::identify_magic(const Twine &Path, file_magic &Result) {		std::error_code llvm::identify_magic(const Twine &Path, file_magic &Result) {
int FD;		auto FileOrError = MemoryBuffer::getFile(Path);
if (std::error_code EC = openFileForRead(Path, FD))		if (!FileOrError)
return EC;		return FileOrError.getError();

char Buffer[32];		std::unique_ptr<MemoryBuffer> FileBuffer = std::move(*FileOrError);
int Length = read(FD, Buffer, sizeof(Buffer));		Result = identify_magic(FileBuffer->getBuffer());
if (close(FD) != 0 \|\| Length < 0)
return std::error_code(errno, std::generic_category());

Result = identify_magic(StringRef(Buffer, Length));
return std::error_code();		return std::error_code();
}		}

This is an archive of the discontinued LLVM Phabricator instance.

Fix identification of COFF executable filesClosedPublic

Details

Diff Detail

Event Timeline

Index: llvm/lib/BinaryFormat/Magic.cpp

Index: llvm/lib/BinaryFormat/Magic.cpp

Revision Contents

Diff 137482

llvm/lib/BinaryFormat/Magic.cpp

Fix identification of COFF executable files
ClosedPublic