This is an archive of the discontinued LLVM Phabricator instance.

[LTO][codegen] Add TargetLibraryInfoWrapperPass initially
ClosedPublic

Authored by FreddyYe on Dec 15 2021, 10:48 PM.

Details

Summary

Many codegen pass require this pass with useful triple info. Legacy pass manager need to
add a TargetLibraryInfo with the module info before run passes. Or the TargetLibraryInfo
will be initialized too conservative.

Diff Detail

Event Timeline

FreddyYe created this revision.Dec 15 2021, 10:48 PM
FreddyYe requested review of this revision.Dec 15 2021, 10:48 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 15 2021, 10:48 PM

CodeGenPasses is a legacy pass manager here. For now. llc.cpp has not switched to the new pass manager. llc.cpp does add this pass. And many passes in codegen need a right TargetLibraryInfo. If not adding it here with the module's triple info. TargetLibraryInfo will be initialized with a null string. That results in setting many library calls unavailable, as conservative as possible. I think it's not reasonable. Welcome discuss! Welcome adding more reviewers.

pengfei added a subscriber: nikic.Dec 17 2021, 5:48 AM

Not sure if it affects the complie time. +@nikic to have a look.

nikic added a comment.Dec 17 2021, 5:55 AM

@pengfei I don't think this will affect compile-time, the TLI pass should be "free".

@pengfei I don't think this will affect compile-time, the TLI pass should be "free".

Got it. Thank you @nikic !

pengfei accepted this revision.Dec 17 2021, 6:09 AM

I think this should be good to go.

This revision is now accepted and ready to land.Dec 17 2021, 6:09 AM

the summary needs more description, why is this change needed? is it fixing a bug? a test case would be nice

FreddyYe edited the summary of this revision. (Show Details)Dec 19 2021, 7:42 PM
FreddyYe added a comment.EditedDec 19 2021, 7:46 PM

For example this patch can fix such performance drop on Linux:

$ cat foo.h                                                                                                                                                                                                            
double foo(double a);
$ cat foo.c                                                                                                                                                                                                            
#include <math.h>
#include "foo.h"

double foo(double a){
    return sqrt(a);
}
$ cat main.c
#include <stdio.h>
#include "foo.h"
int main() { printf("%lf\n", foo(200));}
$ cat foo.sh                                                                                                                                                                                                           
clang -c -o foo.o -Ofast -flto foo.c                                                                                                                                                                                                                                    
clang -c -o main.o -Ofast -flto main.c                                                                                                                                                                                                                                  
clang -Ofast -flto foo.o main.o -lm                                                                                                                                                                                                                                     
objdump -d a.out | less
$ sh foo.sh

Without this patch, when enable -flto, it will generate call __sqrt_finite, when disable -flto , it will generate sqrtsd %xmm0,%xmm0. The later version is faster. I don't know how to add lto dependent lit test. Hi @aeubanks, do you know how?

For example this patch can fix such performance drop on Linux:

$ cat foo.h                                                                                                                                                                                                            
double foo(double a);
$ cat foo.c                                                                                                                                                                                                            
#include <math.h>
#include "foo.h"

double foo(double a){
    return sqrt(a);
}
$ cat main.c
#include <stdio.h>
#include "foo.h"
int main() { printf("%lf\n", foo(200));}
$ cat foo.sh                                                                                                                                                                                                           
clang -c -o foo.o -Ofast -flto foo.c                                                                                                                                                                                                                                    
clang -c -o main.o -Ofast -flto main.c                                                                                                                                                                                                                                  
clang -Ofast -flto foo.o main.o -lm                                                                                                                                                                                                                                     
objdump -d a.out | less
$ sh foo.sh

Without this patch, when enable -flto, it will generate call __sqrt_finite, when disable -flto , it will generate sqrtsd %xmm0,%xmm0. The later version is faster. I don't know how to add lto dependent lit test. Hi @aeubanks, do you know how?

Why would TLI be involved in that test. Isn't the sqrt call converted to llvm.sqrt by the frontend?

Why would TLI be involved in that test. Isn't the sqrt call converted to llvm.sqrt by the frontend?

Sorry for the late response. BTW, thanks for review! This maybe not a good example. I realized it depends on certain version of math.h. Here's the source code pre-processed on my system which can generate call __sqrt_finite: https://gcc.godbolt.org/z/947nhffv1

I'd expect most/all the TLI optimizations to have happened before codegen, although some IR passes can be added to the codegen pipeline, and having the right TLI for those does make sense.
it'd be nice to have an actual test for this in tree, but that might be hard to come up with. can you use --print-after-all to see which pass is doing something different when this patch is applied, just to convince me that this actually works?

I'd expect most/all the TLI optimizations to have happened before codegen, although some IR passes can be added to the codegen pipeline, and having the right TLI for those does make sense.
it'd be nice to have an actual test for this in tree, but that might be hard to come up with. can you use --print-after-all to see which pass is doing something different when this patch is applied, just to convince me that this actually works?

Speculating a little. For the sqrt_finite example given previously, it's probably SelectionDAGBuilder that is doing something different. There is code in there to convert LibFunc_sqrt_finite to ISD::FSQRT. I guess we don't turn finite calls into intrinsics anywhere before that. TargetLibraryInfo needs the Triple in order to know if sqrt_finite is supported. It's marked unavailable in TargetLibraryInfo.cpp's initialize function unless the Triple has a Linux OS and a GNU environment.

I'd expect most/all the TLI optimizations to have happened before codegen, although some IR passes can be added to the codegen pipeline, and having the right TLI for those does make sense.
it'd be nice to have an actual test for this in tree, but that might be hard to come up with. can you use --print-after-all to see which pass is doing something different when this patch is applied, just to convince me that this actually works?

Speculating a little. For the sqrt_finite example given previously, it's probably SelectionDAGBuilder that is doing something different. There is code in there to convert LibFunc_sqrt_finite to ISD::FSQRT. I guess we don't turn finite calls into intrinsics anywhere before that. TargetLibraryInfo needs the Triple in order to know if sqrt_finite is supported. It's marked unavailable in TargetLibraryInfo.cpp's initialize function unless the Triple has a Linux OS and a GNU environment.

in that case we should be able to put some IR with some target triple through the LTO pipeline and see that we call the proper function as a test

craig.topper commandeered this revision.Jan 5 2022, 1:44 PM
craig.topper added a reviewer: FreddyYe.

Commandeering to add test case.

Add LTO test.

craig.topper requested review of this revision.Jan 5 2022, 2:03 PM
aeubanks accepted this revision.Jan 5 2022, 2:08 PM

awesome, thanks!

This revision is now accepted and ready to land.Jan 5 2022, 2:08 PM
FreddyYe accepted this revision.Jan 5 2022, 4:25 PM

Thank you both! Your speculating is right! That's the original motivation for this patch.

@FreddyYe do you mind taking this back to commit since the code change is yours.

@FreddyYe do you mind taking this back to commit since the code change is yours.

Thanks for asking. I don't mind! pls commit!

@FreddyYe do you mind taking this back to commit since the code change is yours.

Thanks for asking. I don't mind! pls commit!

Sorry. I was asking you to take ownership back and commit it yourself.

FreddyYe commandeered this revision.Jan 5 2022, 4:57 PM
FreddyYe edited reviewers, added: craig.topper; removed: FreddyYe.

All right. I'll commit then.

This revision was landed with ongoing or failed builds.Jan 5 2022, 5:25 PM
This revision was automatically updated to reflect the committed changes.