This is an archive of the discontinued LLVM Phabricator instance.

[SystemZ][z/OS][Windows] Add new OF_TextWithCRLF flag and use this flag instead of OF_Text
ClosedPublic

Authored by abhina.sreeskantharajan on Mar 26 2021, 10:29 AM.

Details

Summary

Problem:
On SystemZ we need to open text files in text mode. On Windows, files opened in text mode adds a CRLF '\r\n' which may not be desirable.

Solution:
This patch adds two new flags

  • OF_CRLF which indicates that CRLF translation is used.
  • OF_TextWithCRLF = OF_Text | OF_CRLF indicates that the file is text and uses CRLF translation.

Developers should now use either the OF_Text or OF_TextWithCRLF for text files and OF_None for binary files. If the developer doesn't want carriage returns on Windows, they should use OF_Text, if they do want carriage returns on Windows, they should use OF_TextWithCRLF.

So this is the behaviour per platform with my patch:

z/OS:
OF_None: open in binary mode
OF_Text : open in text mode
OF_TextWithCRLF: open in text mode

Windows:
OF_None: open file with no carriage return
OF_Text: open file with no carriage return
OF_TextWithCRLF: open file with carriage return

The Major change is in llvm/lib/Support/Windows/Path.inc to only set text mode if the OF_CRLF is set.

if (Flags & OF_CRLF)
  CrtOpenFlags |= _O_TEXT;

These following files are the ones that still use OF_Text which I left unchanged. I modified all these except raw_ostream.cpp in recent patches so I know these were previously in Binary mode on Windows.
./llvm/lib/Support/raw_ostream.cpp
./llvm/lib/TableGen/Main.cpp
./llvm/tools/dsymutil/DwarfLinkerForBinary.cpp
./llvm/unittests/Support/Path.cpp
./clang/lib/StaticAnalyzer/Core/HTMLDiagnostics.cpp
./clang/lib/Frontend/CompilerInstance.cpp
./clang/lib/Driver/Driver.cpp
./clang/lib/Driver/ToolChains/Clang.cpp

Diff Detail

Event Timeline

abhina.sreeskantharajan requested review of this revision.Mar 26 2021, 10:29 AM
Herald added a reviewer: sstefan1. · View Herald Transcript
Herald added a reviewer: baziotis. · View Herald Transcript
Herald added projects: Restricted Project, Restricted Project, Restricted Project, Restricted Project. · View Herald Transcript

This touches a lot of files. I am a bit worried that it would not be easy for a contributor to know OF_TextWithCRLF is needed to make SystemZ happy.

On SystemZ we need to open text files in text mode,

Why can't it be served without CRLF translation?

This touches a lot of files. I am a bit worried that it would not be easy for a contributor to know OF_TextWithCRLF is needed to make SystemZ happy.

On SystemZ we need to open text files in text mode,

Why can't it be served without CRLF translation?

So this is a quick summary of the problem we're facing:
On SystemZ, we can accept both OF_Text or OF_TextWithCRLF, it makes no difference. However, there are many text files that are marked as binary/OF_None to suppress CRLF translation on Windows. (e.g. patches like this one https://reviews.llvm.org/rGe78a7a0ecddc747129512fabf4836e22d1805f00). On SystemZ we need the text files to be marked as text and have made patches in the past to do so (e.g. https://reviews.llvm.org/D67696). These efforts are conflicting with each other, so Reid suggested this solution where we create a new flag that will set text mode and turn on CRLF translation on Windows platform. Then we will be able to mark all remaining text files with OF_Text and not turn on CRLF translation for Windows.

I agree this patch is touching a lot of files, I can maybe do the opposite and create a flag called OF_TextWithoutCRLF. I'm open to any new solutions to this problem.

This touches a lot of files. I am a bit worried that it would not be easy for a contributor to know OF_TextWithCRLF is needed to make SystemZ happy.

On SystemZ we need to open text files in text mode,

Why can't it be served without CRLF translation?

So this is a quick summary of the problem we're facing:
On SystemZ, we can accept both OF_Text or OF_TextWithCRLF, it makes no difference. However, there are many text files that are marked as binary/OF_None to suppress CRLF translation on Windows. (e.g. patches like this one https://reviews.llvm.org/rGe78a7a0ecddc747129512fabf4836e22d1805f00). On SystemZ we need the text files to be marked as text and have made patches in the past to do so (e.g. https://reviews.llvm.org/D67696). These efforts are conflicting with each other, so Reid suggested this solution where we create a new flag that will set text mode and turn on CRLF translation on Windows platform. Then we will be able to mark all remaining text files with OF_Text and not turn on CRLF translation for Windows.

I agree this patch is touching a lot of files, I can maybe do the opposite and create a flag called OF_TextWithoutCRLF. I'm open to any new solutions to this problem.

Hmm. I am still confused after seeting rGe78a7a0ecddc747129512fabf4836e22d1805f00 (does it imply that using OF_None on Windows is fine?)
If OF_None is used on SystemZ, does that regress functionality?

This touches a lot of files. I am a bit worried that it would not be easy for a contributor to know OF_TextWithCRLF is needed to make SystemZ happy.

On SystemZ we need to open text files in text mode,

Why can't it be served without CRLF translation?

So this is a quick summary of the problem we're facing:
On SystemZ, we can accept both OF_Text or OF_TextWithCRLF, it makes no difference. However, there are many text files that are marked as binary/OF_None to suppress CRLF translation on Windows. (e.g. patches like this one https://reviews.llvm.org/rGe78a7a0ecddc747129512fabf4836e22d1805f00). On SystemZ we need the text files to be marked as text and have made patches in the past to do so (e.g. https://reviews.llvm.org/D67696). These efforts are conflicting with each other, so Reid suggested this solution where we create a new flag that will set text mode and turn on CRLF translation on Windows platform. Then we will be able to mark all remaining text files with OF_Text and not turn on CRLF translation for Windows.

I agree this patch is touching a lot of files, I can maybe do the opposite and create a flag called OF_TextWithoutCRLF. I'm open to any new solutions to this problem.

Hmm. I am still confused after seeting rGe78a7a0ecddc747129512fabf4836e22d1805f00 (does it imply that using OF_None on Windows is fine?)
If OF_None is used on SystemZ, does that regress functionality?

On Windows, OF_None suppresses CRLF translation and OF_Text will allow CRLF translation. On SystemZ it does regress functionality by causing file encoding issues. SystemZ uses a file's text/binary mode to determine whether to perform translation to the system encoding EBCDIC if needed. (On binary files, no translation should be performed)

rnk added a comment.Mar 29 2021, 9:52 AM

This touches a lot of files. I am a bit worried that it would not be easy for a contributor to know OF_TextWithCRLF is needed to make SystemZ happy.

Right, we need to separate text-ness for System/Z EBCDIC re-encoding from Windows CRLF translation. I think it's best to have a separate, positive CRLF bit, and to make that spelling longer than OF_Text. I think, in general, more problems are caused by extra unintended CRLF than by missing carriage returns.

In D99426#2656217, @rnk wrote:

This touches a lot of files. I am a bit worried that it would not be easy for a contributor to know OF_TextWithCRLF is needed to make SystemZ happy.

Right, we need to separate text-ness for System/Z EBCDIC re-encoding from Windows CRLF translation. I think it's best to have a separate, positive CRLF bit, and to make that spelling longer than OF_Text. I think, in general, more problems are caused by extra unintended CRLF than by missing carriage returns.

OK.

OF_CRLF which indicates that CRLF translation is used.
OF_TextWithCRLF = OF_Text | OF_CRLF indicates that the file is text and uses CRLF translation.

My confusion came from I do not know what "CRLF translation" in the description refers to. So it seems like EBCDIC->ASCII translation for CRLF? I thought it was related to Windows.

The current comment in the source code should be clarified too:

/// The file should be opened with CRLF translation on platforms that
/// make this distinction.
OF_CRLF = 2,
OF_TextWithCRLF = 3,
abhina.sreeskantharajan edited the summary of this revision. (Show Details)

Rebase + I updated the comments in FileSystem.h to be a bit more descriptive.

In D99426#2656217, @rnk wrote:

This touches a lot of files. I am a bit worried that it would not be easy for a contributor to know OF_TextWithCRLF is needed to make SystemZ happy.

Right, we need to separate text-ness for System/Z EBCDIC re-encoding from Windows CRLF translation. I think it's best to have a separate, positive CRLF bit, and to make that spelling longer than OF_Text. I think, in general, more problems are caused by extra unintended CRLF than by missing carriage returns.

OK.

OF_CRLF which indicates that CRLF translation is used.
OF_TextWithCRLF = OF_Text | OF_CRLF indicates that the file is text and uses CRLF translation.

My confusion came from I do not know what "CRLF translation" in the description refers to. So it seems like EBCDIC->ASCII translation for CRLF? I thought it was related to Windows.

The current comment in the source code should be clarified too:

/// The file should be opened with CRLF translation on platforms that
/// make this distinction.
OF_CRLF = 2,
OF_TextWithCRLF = 3,

Sorry for causing confusion. CRLF in text files is a Windows-only problem, it doesn't affect z/OS. On z/OS we only need text files to be marked accurately as text. I updated the description and comments in FileSystem.h to be more clear.

abhina.sreeskantharajan edited the summary of this revision. (Show Details)

mention OF_CRLF is Windows-only

abhina.sreeskantharajan edited the summary of this revision. (Show Details)

Set OF_Text for llvm/tools/dsymutil/DwarfLinkerForBinary.cpp instead of OF_TextWithCRLF. This was also another file I recently changed to text from binary.

I like the composability enabled by making OF_Text and OF_CRLF independent. I wonder, though, if there's a chokepoint where we could/should assert if the caller uses OF_CRLF without OF_Text, which would be suspicious.

I'm not concerned by the number of files touched, since it's mostly a mechanical refactor. If anyone's worried that the mechanical changes obscure the behavior change, this _could_ be broken into an NFC refactor patch for the renaming followed by a patch that implements the behavioral distinction. But I'm not going to insist on that.

llvm/include/llvm/Support/FileSystem.h
745–746

Don't be shy to give examples in the comments, as they can illuminate the motivation for the design.

... on platforms, like SystemZ, that distinguish between text and binary files.
757

Nit: If it were me, I'd put the OF_Text | OF_CRLF directly in the code (in place of the 3) instead of in the comment.

llvm/lib/Support/Windows/Path.inc
1086–1087

I assume it would be weird to set OF_CRLF without also setting OF_Text, so this change seems right for correctness. But maybe this Windows-only code would better express the intent if it were written:

if (Flags & (OF_CRLF | OF_Text))

Another idea would be to assert if OF_CRLF is set but OF_Text is not.

Or maybe it's fine as it is.

llvm/lib/Support/Windows/Path.inc
1086–1087

I chose to create OF_CRLF flag because I only want Windows to turn on text mode for OF_TextWithCRLF and not OF_Text.
This solution would return true even if Flags was just OF_Text.

if (Flags & (OF_CRLF | OF_Text))

I think adding an assertion here would be a good idea.

Specified z/OS in FileSystem.h comments, added an assertion in Window's Path.inc

abhina.sreeskantharajan marked 2 inline comments as done.Mar 30 2021, 10:39 AM
abhina.sreeskantharajan added inline comments.
llvm/include/llvm/Support/FileSystem.h
757

Thanks, your solution will also prevent future errors if there is any renumbering

abhina.sreeskantharajan marked an inline comment as done.Mar 30 2021, 10:39 AM

@MaskRay is there still any confusion about the problem this patch is trying to solve and concerns about the renaming?

MaskRay added a comment.EditedApr 1 2021, 11:49 AM
/// The file should be opened in text mode on platforms like z/OS that make
/// this distinction.
OF_Text = 1,
F_Text = 1, // For compatibility

/// The file should use a carriage linefeed '\r\n'.
/// Only makes a difference on windows.
OF_CRLF = 2,

/// The file should be opened in text mode and use a carriage linefeed '\r\n'
/// on platforms that make this distinction.
OF_TextWithCRLF = OF_Text | OF_CRLF,

OF_TextWithCRLF needs to say what platforms make a difference.

Can you mention in the description for Windows and z/OS, how these flags make a difference, and how developers should use these flags for portability?
It's still a bit unclear to me.

e.g. when I need to use OF_CRLR instead of OF_Text?

Update comment in FileSystem.h for OF_TextWithCRLF

/// The file should be opened in text mode on platforms like z/OS that make
/// this distinction.
OF_Text = 1,
F_Text = 1, // For compatibility

/// The file should use a carriage linefeed '\r\n'.
/// Only makes a difference on windows.
OF_CRLF = 2,

/// The file should be opened in text mode and use a carriage linefeed '\r\n'
/// on platforms that make this distinction.
OF_TextWithCRLF = OF_Text | OF_CRLF,

OF_TextWithCRLF needs to say what platforms make a difference.

Can you mention in the description for Windows and z/OS, how these flags make a difference, and how developers should use these flags for portability?
It's still a bit unclear to me.

e.g. when I need to use OF_CRLR instead of OF_Text?

So developers should use either the OF_Text or OF_TextWithCRLF for text files and OF_None for binary files. If the developer doesn't want carriage returns on Windows, they should use OF_Text, if they do want carriage returns on Windows, they should use OF_TextWithCRLF.

So this is the behaviour per platform with my patch:

z/OS:
OF_None: open in binary mode
OF_Text : open in text mode
OF_TextWithCRLF: open in text mode

Windows:
OF_None: open file with no carriage return
OF_Text: open file with no carriage return
OF_TextWithCRLF: open file with carriage return

This is the old behaviour for comparison:
z/OS:
OF_None: open in binary mode
OF_Text: open in text mode

Windows:
OF_None: open file with no carriage return
OF_Text: open file with carriage return

MaskRay accepted this revision.Apr 1 2021, 2:49 PM
/// The file should be opened in text mode on platforms like z/OS that make
/// this distinction.
OF_Text = 1,
F_Text = 1, // For compatibility

/// The file should use a carriage linefeed '\r\n'.
/// Only makes a difference on windows.
OF_CRLF = 2,

/// The file should be opened in text mode and use a carriage linefeed '\r\n'
/// on platforms that make this distinction.
OF_TextWithCRLF = OF_Text | OF_CRLF,

OF_TextWithCRLF needs to say what platforms make a difference.

Can you mention in the description for Windows and z/OS, how these flags make a difference, and how developers should use these flags for portability?
It's still a bit unclear to me.

e.g. when I need to use OF_CRLR instead of OF_Text?

So developers should use either the OF_Text or OF_TextWithCRLF for text files and OF_None for binary files. If the developer doesn't want carriage returns on Windows, they should use OF_Text, if they do want carriage returns on Windows, they should use OF_TextWithCRLF.

So this is the behaviour per platform with my patch:

z/OS:
OF_None: open in binary mode
OF_Text : open in text mode
OF_TextWithCRLF: open in text mode

Windows:
OF_None: open file with no carriage return
OF_Text: open file with no carriage return
OF_TextWithCRLF: open file with carriage return

This is the old behaviour for comparison:
z/OS:
OF_None: open in binary mode
OF_Text: open in text mode

Windows:
OF_None: open file with no carriage return
OF_Text: open file with carriage return

Thanks for the information. LG.
Please update the summary to include the information.

If the patch also affects SystemZ, please adjust the [Windows] tag in the subject.

llvm/include/llvm/Support/FileSystem.h
752

Am I right that this is an implementation detail and should not be used directly by users (use OF_TextWithCRLF instead)?
If so, please add a comment.

771

windows -> Windows

This revision is now accepted and ready to land.Apr 1 2021, 2:49 PM
aganea added a comment.Apr 1 2021, 3:37 PM

I am still concerned by the fact that this patch doesn't fix the issue mentionned in https://reviews.llvm.org/D96363#2650460
Was the intention to fix that issue? Will the fix be done in a subsequent patch?

llvm/lib/Support/Windows/Path.inc
1087

s/,/ &&/

I am still concerned by the fact that this patch doesn't fix the issue mentionned in https://reviews.llvm.org/D96363#2650460
Was the intention to fix that issue? Will the fix be done in a subsequent patch?

I was fairly confident that if https://reviews.llvm.org/D96363 was the patch that was causing the issue for you, then this patch would have fixed the problem. I've already reverted the change in clang/lib/Frontend/Rewrite/FrontendActions.cpp and you mentioned that reverting the change in clang/lib/Driver/Driver.cpp didn't resolve the issue either. This patch should turn off CRLF translation in the remaining 2 files modified in that patch (Driver.cpp and DwarfLinkerForBinary.cpp). Since this doesn't seem to fix your problem, I'll need to do further investigation into which patch actually does cause the issue.

abhina.sreeskantharajan retitled this revision from [Windows] Add new OF_TextWithCRLF flag and use this flag instead of OF_Text to [SystemZ][z/OS][Windows] Add new OF_TextWithCRLF flag and use this flag instead of OF_Text.
abhina.sreeskantharajan edited the summary of this revision. (Show Details)

Updated comments and assertion

abhina.sreeskantharajan marked 3 inline comments as done.Apr 2 2021, 6:14 AM
aganea added a comment.Apr 2 2021, 9:05 AM

I am still concerned by the fact that this patch doesn't fix the issue mentionned in https://reviews.llvm.org/D96363#2650460
Was the intention to fix that issue? Will the fix be done in a subsequent patch?

I was fairly confident that if https://reviews.llvm.org/D96363 was the patch that was causing the issue for you

That is the case! I've confirmed that git checkout fdb640ea30d416368b76b68b106deda580c6aced~1 && ninja clang -C build generates a clang-cl.exe that works with my above test case. git revert fdb640ea30d416368b76b68b106deda580c6aced locally over ToT fixes the issue.

I am still concerned by the fact that this patch doesn't fix the issue mentionned in https://reviews.llvm.org/D96363#2650460
Was the intention to fix that issue? Will the fix be done in a subsequent patch?

I was fairly confident that if https://reviews.llvm.org/D96363 was the patch that was causing the issue for you

That is the case! I've confirmed that git checkout fdb640ea30d416368b76b68b106deda580c6aced~1 && ninja clang -C build generates a clang-cl.exe that works with my above test case. git revert fdb640ea30d416368b76b68b106deda580c6aced locally over ToT fixes the issue.

After scratching my head, I realized that I accidentally changed another function to Binary in clang/lib/Frontend/Rewrite/FrontendActions.cpp so that change was never properly reverted. I fixed this in https://reviews.llvm.org/D99837. Hopefully, this will fix your issue. Apologies for the confusion on my side.

Is there any more feedback on this patch?

yroux added a subscriber: yroux.Apr 6 2021, 2:13 AM

Hi,

Sorry I'm bit lost in the various patches proposed to fix the issue introduced by https://reviews.llvm.org/D97785
My understanding is that this is missing one to fix our Windows on ARM bots (broken for more than 2 weeks now)
So it'd be great to have it applied

This revision was landed with ongoing or failed builds.Apr 6 2021, 4:23 AM
This revision was automatically updated to reflect the committed changes.

Hi,

Sorry I'm bit lost in the various patches proposed to fix the issue introduced by https://reviews.llvm.org/D97785
My understanding is that this is missing one to fix our Windows on ARM bots (broken for more than 2 weeks now)
So it'd be great to have it applied

I've committed this and it looks like the bot is back to green :) https://lab.llvm.org/buildbot/#/builders/65/builds/1400. Sorry for the delayed fix

yroux added a comment.Apr 6 2021, 7:47 AM

I've committed this and it looks like the bot is back to green :) https://lab.llvm.org/buildbot/#/builders/65/builds/1400. Sorry for the delayed fix

Great, thanks a lot