This is an archive of the discontinued LLVM Phabricator instance.

IR: Properly handle escape characters in Attribute::getAsString()
ClosedPublic

Authored by honggyu.kim on Aug 22 2016, 9:07 PM.

Details

Summary

If an attribute name has special characters such as '\01', it is not
properly printed in LLVM assembly language format. Since the format
expects the special characters are printed as it is, it has to contain
escape characters to make it printable.

Before:

attributes #0 = { ... "counting-function"="^A__gnu_mcount_nc" ...

After:

attributes #0 = { ... "counting-function"="\01__gnu_mcount_nc" ...

Signed-off-by: Honggyu Kim <hong.gyu.kim@lge.com>

Diff Detail

Repository
rL LLVM

Event Timeline

honggyu.kim retitled this revision from to IR: Properly handle escape characters in Attribute::getAsString().
honggyu.kim updated this object.
rjmccall edited edge metadata.Aug 22 2016, 9:23 PM

I'm not sure what you're saying. The existing function PrintEscapedString in AsmWriter.cpp does exactly what you want; you just need to call it from getAsString as well. You'll need to expose it somewhere, of course.

honggyu.kim added a comment.EditedAug 22 2016, 9:56 PM

Hi John,

I'm not sure what you're saying. The existing function PrintEscapedString in AsmWriter.cpp does exactly what you want; you just need to call it from getAsString as well. You'll need to expose it somewhere, of course.

Let me show you the execution sequence to print attribute list as below.

attributes #0 = { ... "counting-function"="\01__gnu_mcount_nc" ...

llvm/lib/IR/AsmWriter.cpp

void AssemblyWriter::writeAllAttributeGroups() {
  std::vector<std::pair<AttributeSet, unsigned> > asVec;
    ...
  for (const auto &I : asVec)
    Out << "attributes #" << I.second << " = { "
        << I.first.getAsString(AttributeSet::FunctionIndex, true) << " }\n";
    ...
}

llvm/lib/IR/AttributeSetNode.h

typedef const Attribute *iterator;

llvm/lib/IR/Attributes.cpp

std::string AttributeSet::getAsString(unsigned Index, bool InAttrGrp) const {
  AttributeSetNode *ASN = getAttributes(Index);
  return ASN ? ASN->getAsString(InAttrGrp) : std::string("");
}

std::string AttributeSetNode::getAsString(bool InAttrGrp) const {
  std::string Str;
  for (iterator I = begin(), E = end(); I != E; ++I) {
    if (I != begin())
      Str += ' ';
    Str += I->getAsString(InAttrGrp);
  }
  return Str;
}

std::string Attribute::getAsString(bool InAttrGrp) const {
    ...
  // Convert target-dependent attributes to strings of the form:
  //
  //   "kind"
  //   "kind" = "value"
  //
  if (isStringAttribute()) {
    std::string Result;
    Result += (Twine('"') + getKindAsString() + Twine('"')).str();

    StringRef Val = pImpl->getValueAsString();
    if (Val.empty()) return Result;

    Result += ("=\"" + Val + Twine('"')).str();
    return Result;
  }

  llvm_unreachable("Unknown attribute");
}

In Attribute::getAsString(), it is expected to return std::string as the name describes. I think getAsString() function should not print the string directly with PrintEscapedString(). If I directly print attribute string here, we have to modify many functions in the execution paths above. The only problem here is printing Val. That's why I added a new function GetEscapedString(). Please correct me if there's a better approach.

honggyu.kim updated this object.Aug 23 2016, 6:48 PM
honggyu.kim edited edge metadata.
honggyu.kim updated this object.

In Attribute::getAsString(), it is expected to return std::string as the name describes. I think getAsString() function should not print the string directly with PrintEscapedString(). If I directly print attribute string here, we have to modify many functions in the execution paths above. The only problem here is printing Val. That's why I added a new function GetEscapedString(). Please correct me if there's a better approach.

PrintEscapedString prints to a raw_ostream that's passed in as a parameter, and there's a raw_string_ostream that will just append to a std::string.

John.

In Attribute::getAsString(), it is expected to return std::string as the name describes. I think getAsString() function should not print the string directly with PrintEscapedString(). If I directly print attribute string here, we have to modify many functions in the execution paths above. The only problem here is printing Val. That's why I added a new function GetEscapedString(). Please correct me if there's a better approach.

PrintEscapedString prints to a raw_ostream that's passed in as a parameter, and there's a raw_string_ostream that will just append to a std::string.

John.

Thanks for pointing out this. But there is one more tiny but tricky issue here. The implementation of PrintEscapedString and GetEscapedString are a bit different. PrintEscapedString just prints an escaped character '\' itself, but GetEscapedString has to put double quoted string "\\" for later usage. For this purpose, PrintEscapedString has to also be modified as below in that case.

--- a/lib/IR/AsmWriter.cpp
+++ b/lib/IR/AsmWriter.cpp
@@ -345,7 +345,7 @@ static void PrintEscapedString(StringRef Name, raw_ostream &Out) {
     if (isprint(C) && C != '\\' && C != '"')
       Out << C;
     else
-      Out << '\\' << hexdigit(C >> 4) << hexdigit(C & 0x0F);
+      Out << "\\" << hexdigit(C >> 4) << hexdigit(C & 0x0F);
   }
 }

But it will change the behaviour of other functions that use PrintEscapedString, that's why I thought it'd be better to add a new function GetEscapedString to make it simple.

In Attribute::getAsString(), it is expected to return std::string as the name describes. I think getAsString() function should not print the string directly with PrintEscapedString(). If I directly print attribute string here, we have to modify many functions in the execution paths above. The only problem here is printing Val. That's why I added a new function GetEscapedString(). Please correct me if there's a better approach.

PrintEscapedString prints to a raw_ostream that's passed in as a parameter, and there's a raw_string_ostream that will just append to a std::string.

John.

Thanks for pointing out this. But there is one more tiny but tricky issue here. The implementation of PrintEscapedString and GetEscapedString are a bit different. PrintEscapedString just prints an escaped character '\' itself, but GetEscapedString has to put double quoted string "\\" for later usage. For this purpose, PrintEscapedString has to also be modified as below in that case.

--- a/lib/IR/AsmWriter.cpp
+++ b/lib/IR/AsmWriter.cpp
@@ -345,7 +345,7 @@ static void PrintEscapedString(StringRef Name, raw_ostream &Out) {
     if (isprint(C) && C != '\\' && C != '"')
       Out << C;
     else
-      Out << '\\' << hexdigit(C >> 4) << hexdigit(C & 0x0F);
+      Out << "\\" << hexdigit(C >> 4) << hexdigit(C & 0x0F);
   }
 }

But it will change the behaviour of other functions that use PrintEscapedString, that's why I thought it'd be better to add a new function GetEscapedString to make it simple.

These should have the same behavior; only, the existing code should be slightly more efficient (without optimization) because it knows that it's only adding a single character rather than possible an array of them. Are you actually seeing different behavior at runtime?

John.

honggyu.kim updated this object.

These should have the same behavior; only, the existing code should be slightly more efficient (without optimization) because it knows that it's only adding a single character rather than possible an array of them. Are you actually seeing different behavior at runtime?

John.

Hi John,

You're right. I've updated it by using existing PrintEscapedString() as you explained. I've tested it with mcount.c and gnu-mcount.c, and they're all passed.
I appreciate your guide!

Honggyu

Thank you, this looks better. A minor suggestions and then this LGTM.

include/llvm/ADT/StringExtras.h
19 ↗(On Diff #69230)

You should be able to just forward-declare raw_ostream in this header:

namespace llvm {
class raw_ostream;
lib/IR/Attributes.cpp
394 ↗(On Diff #69230)

I would suggest just printing this directly into Result, like so:

{
  raw_string_ostream OS(Result);
  OS << "=\"";
  PrintEscapedString(AttrVal, OS);
  OS << "\"";
}
honggyu.kim updated this object.
honggyu.kim marked 2 inline comments as done.Aug 26 2016, 8:29 PM

Thank you, this looks better. A minor suggestions and then this LGTM.

I've just updated as you suggested.
Thanks for your help.

Honggyu

LGTM, thanks!

rjmccall accepted this revision.Aug 26 2016, 8:56 PM
rjmccall edited edge metadata.
This revision is now accepted and ready to land.Aug 26 2016, 8:56 PM
compnerd accepted this revision.Aug 26 2016, 9:30 PM
compnerd edited edge metadata.

Since I don't have write permission, it would be grateful if anyone can give me commit permission on the repository.
Otherwise, can anyone please commit this patch and related ones? - D22825, D22666

Thanks,
Honggyu

This revision was automatically updated to reflect the committed changes.