This is an archive of the discontinued LLVM Phabricator instance.

[ELF] - Fix for: PR29093 - version script does not support [chars] wildcards
ClosedPublic

Authored by grimar on Aug 23 2016, 7:48 AM.

Details

Summary

From PR29093 description:

GNU ld supports [chars] wildcards in version scripts, to match a single instance of any of the chars.

Here is an excerpt from libstdc++'s version script in FreeBSD:

extern "C++"
{

...

std::locale::_[T-Za-z]*;                                                                                                       
std::[A-Zm]*;                                                            
std::n[^u]*;                                                             
std::nu[^m]*;                                                            
std::num[^e]*;

...

}

Patch adds support for scripts above.

Diff Detail

Repository
rL LLVM

Event Timeline

grimar updated this revision to Diff 68992.Aug 23 2016, 7:48 AM
grimar retitled this revision from to [ELF] - Fix for: PR29093 - version script does not support [chars] wildcards.
grimar updated this object.
grimar added reviewers: ruiu, rafael.
grimar added subscribers: llvm-commits, grimar, evgeny777, emaste.
emaste added inline comments.Aug 23 2016, 7:57 AM
ELF/SymbolTable.cpp
580 ↗(On Diff #68992)

Should - and ^ be in here? They're only related to wildcards if inside of [].

grimar added inline comments.Aug 23 2016, 8:01 AM
ELF/SymbolTable.cpp
580 ↗(On Diff #68992)

I think they can not be a regular part of symbol name. Am I missing something ?

grimar updated this revision to Diff 68994.Aug 23 2016, 8:05 AM
  • Simplified.

Something looks wrong with diff2, I`ll update it soon.

emaste added inline comments.Aug 23 2016, 8:22 AM
ELF/SymbolTable.cpp
580 ↗(On Diff #68994)

not sure, just that e.g. foo-bar does not have a wildcard

grimar updated this revision to Diff 69001.Aug 23 2016, 8:34 AM
  • Last minute fix.
  • Improved testcase
  • Cosmetic changes.
ELF/SymbolTable.cpp
580 ↗(On Diff #68994)

This function is used for checking symbol names. Probably we can name it differently to avoid confusing, but what I want to say is that I think foo-bar is not a valid name for symbols and we should not do more complex checks here IMO.

grimar added inline comments.Aug 23 2016, 8:43 AM
ELF/SymbolTable.cpp
580 ↗(On Diff #69001)

Though after revisiting, since '-' and '^' can not be used without '[' or ']' we can remove it from here to reduce checks I think.

emaste added inline comments.Aug 23 2016, 9:06 AM
ELF/SymbolTable.cpp
580 ↗(On Diff #69001)

Yeah, that's my point. If foo-bar is not valid it'll just be handled as it is today (without the wildcard change).

grimar added inline comments.Aug 23 2016, 9:09 AM
ELF/SymbolTable.cpp
580 ↗(On Diff #69001)

I wonder if we need to check ']' then :) It makes no sence without '[',
so we probably can check only for '['..

emaste added inline comments.Aug 23 2016, 9:13 AM
ELF/SymbolTable.cpp
580 ↗(On Diff #69001)

Good point, I agree. A version script entry must have one of ?*[ to be a wildcard.

Note that \ is also supported to escape a character, so e.g. foo\*bar is not a wildcard. But this is a complexity we can revisit later.

ruiu edited edge metadata.Aug 23 2016, 10:24 PM

I wonder if we should convert glob patterns to regexs and compile them. How hard is it to convert? I believe that compiled regexps are faster than our hand-written, recursion-based glob matcher, so it may be worth doing even if conversion is as hard as our glob matcher.

ruiu added a comment.Aug 23 2016, 10:57 PM

Please take a look at https://reviews.llvm.org/D23827. This is a proof-of-concept patch to do glob pattern matching using std::regex. This needs to be improved (creating regex instance is expensive, so you want to do it only once for each glob pattern instead of doing it in globMatch), but I think you can get the idea.

In D23803#523883, @ruiu wrote:

Please take a look at https://reviews.llvm.org/D23827. This is a proof-of-concept patch to do glob pattern matching using std::regex. This needs to be improved (creating regex instance is expensive, so you want to do it only once for each glob pattern instead of doing it in globMatch), but I think you can get the idea.

Looks good IMO. I can prepare initial batch basing on this approach if you do not mind.

In D23803#523883, @ruiu wrote:

Please take a look at https://reviews.llvm.org/D23827. This is a proof-of-concept patch to do glob pattern matching using std::regex. This needs to be improved (creating regex instance is expensive, so you want to do it only once for each glob pattern instead of doing it in globMatch), but I think you can get the idea.

There is one problem with it:
"If the syntax used in the sequence of characters has some syntax error, the constructor throws a regex_error exception.". Not sure how we going to handle this ?

grimar updated this revision to Diff 70320.Sep 5 2016, 5:23 AM
grimar updated this object.
grimar edited edge metadata.
  • Rebased (now code is based on llvm::Regex).
ruiu accepted this revision.Sep 6 2016, 4:18 PM
ruiu edited edge metadata.

LGTM

ELF/Strings.cpp
33 ↗(On Diff #70320)

InBracketExpr -> InBracket

This revision is now accepted and ready to land.Sep 6 2016, 4:18 PM
This revision was automatically updated to reflect the committed changes.