This is an archive of the discontinued LLVM Phabricator instance.

Fix buffer overrun in WindowsResourceCOFFWriter::writeSymbolTable()
ClosedPublic

Authored by inglorion on Dec 14 2017, 7:48 PM.

Details

Summary

We were using sprintf(..., "$R06X", <some uint32_t>) to create strings
that are expected to be exactly length 8, but this results in longer
strings if the uint32_t is greater than 0xffffff. This change modifies
the behavior as follows:

  • Uses the loop counter instead of the data offset. This gives us sequential symbol names, avoiding collisions as much as possible.
  • Masks the value to 0xffffff to avoid generating names longer than 8 bytes.
  • Uses formatv instead of sprintf.

Fixes PR35581.

Diff Detail

Repository
rL LLVM

Event Timeline

inglorion created this revision.Dec 14 2017, 7:48 PM

I'll see if I can cook up a test for this tomorrow; basically we need to look at the generated symbol names and see that they match \$R......, where the last 6 bytes match the last 6 bytes of the symbol's value, encoded in hexadecimal.

ruiu added inline comments.Dec 14 2017, 7:56 PM
llvm/lib/Object/WindowsResource.cpp
564 ↗(On Diff #127061)

There seems to be a small chance that this line could create duplicate names, and if two symbols have the same name, something strange could happen. As we discussed, I think it is better to use i instead of DataOffsets[i] to generate a naem.

Uhh, not to ask the obvious, but why are we even using sprintf at all? Use something like formatv, or even raw_ostream

pcc added inline comments.Dec 14 2017, 8:14 PM
llvm/lib/Object/WindowsResource.cpp
564 ↗(On Diff #127061)

But these are static symbols, right? Do we need to give them unique names?

Also, if they only exist for the purpose of relocations, do we even need symbols at all? Can we make each of the relocations relative to the section symbol and store the data offset in the section data?

I also thought of using the iteration count for the names instead of the offset, but decided to match Microsoft's cvtres for now. Thinking about this some more, I don't think the symbol names actually matter, and so I'm happy to go with using the loop counter instead. I would like to keep the symbols at least for now, as at least some tools seem to expect them (e.g. https://github.com/dotnet/roslyn/blob/614299ff83da9959fa07131c6d0ffbc58873b6ae/src/Compilers/Core/Portable/CvtRes.cs).

Can you still get rid of the sprintf? I don't think we should be using this function in production code honestly.

ruiu added a comment.Dec 15 2017, 11:41 AM

I think this is where we should use snprintf to write directly to Symbol->Name.ShortName.

inglorion edited the summary of this revision. (Show Details)
inglorion removed a reviewer: pcc.

Rewrote using formatv and using loop counter instead of offset.

ruiu added inline comments.Dec 18 2017, 1:25 PM
llvm/lib/Object/WindowsResource.cpp
566 ↗(On Diff #127403)

I don't know much about the format string of the formatv function, but is RelocationName guaranteed to be COFF:NameSize byte long? If not, this memcpy overruns a given buffer.

I think snprintf is much better. People are familiar with that, and that's exactly what you want to do here (format a string while not overrunning a given string buffer).

inglorion added inline comments.Dec 18 2017, 1:50 PM
llvm/lib/Object/WindowsResource.cpp
566 ↗(On Diff #127403)

One problem with snprintf is that it always writes a NUL terminator. We need the string to be NUL terminated if it is less than 8 bytes long, and not NUL terminated if it is exactly 8 bytes long - which, in our case, it always is. So I don't think we can use snprintf to avoid copying the bytes here.

The thing we get out of formatv(...).sstr<8> is a SmallString<8>, and the format specifier combined with the bitmask causes is to always write all 8 bytes, so that memcpy should be safe here.

We could do without the memcpy if we had some way to write directly into the 8 bytes Symbol->Name.ShortName already has, but neither formatv nor raw_ostream seem to provide that, and snprintf would truncate the string, so we would have to implement something else for that. Could be done, but I would like to fix the buffer overrun first.

inglorion updated this revision to Diff 127407.Dec 18 2017, 1:51 PM

fixed format specifier and updated tests

ruiu accepted this revision.Dec 18 2017, 2:03 PM

LGTM

This revision is now accepted and ready to land.Dec 18 2017, 2:03 PM
This revision was automatically updated to reflect the committed changes.