Page MenuHomePhabricator

Part 2 to fix x86_64 fp128 calling convention.
ClosedPublic

Authored by chh on Jul 22 2015, 4:12 PM.

Details

Summary

Part 1 was submitted in http://reviews.llvm.org/D15134.
Bugs:

Changes:

  • X86RegisterInfo.td, X86RecognizableInstr.cpp: Add FR128 register class.
  • X86CallingConv.td: Pass f128 values in XMM registers or on stack.
  • X86InstrCompiler.td, X86InstrInfo.td, X86InstrSSE.td: Add instruction selection patterns for f128.
  • X86ISelLowering.cpp: When target has MMX registers, configure MVT::f128 in FR128RegClass, with TypeSoftenFloat action, and custom actions for some opcodes. Add missed cases of MVT::f128 in places that handle f32, f64, or vector types. Add TODO comment to support f128 type in inline assembly code.
  • Add unit tests for x86_64 fp128 type.

Diff Detail

Repository
rL LLVM

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
chh updated this revision to Diff 41962.Dec 4 2015, 4:55 PM
chh retitled this revision from Fix x86_64 fp128 calling convention to Part 2 to fix x86_64 fp128 calling convention. .
chh updated this object.
chh added a reviewer: davidxl.
chh removed a subscriber: davidxl.

Now the diff does not contain changes in submitted part1.

davidxl added inline comments.Dec 5 2015, 6:46 PM
lib/Target/X86/X86ISelLowering.cpp
14628 ↗(On Diff #41962)

This needs some explanation. Why can the Op1's value type be i128?

27776 ↗(On Diff #41962)

Why TODO here? 'x' constraint should work.

27889 ↗(On Diff #41962)

Explain TODO here.

lib/Target/X86/X86InstrInfo.td
957 ↗(On Diff #41962)

Unrelated format change here.

davidxl added inline comments.Dec 5 2015, 9:32 PM
lib/Target/X86/X86InstrSSE.td
8862 ↗(On Diff #41962)

Move the comment above the pattern def.

  1. movaps is shorter, not 'should be'
  2. regarding 'faster' part -- put a reference there. In fact, f128 operations should be considered in integer domain so movdqa should be used to avoid domain bypass penalty.
8868 ↗(On Diff #41962)

pand is for SIMD integer. andps is shorter though.

chh updated this revision to Diff 42090.Dec 7 2015, 12:13 PM
chh added inline comments.
lib/Target/X86/X86ISelLowering.cpp
14628 ↗(On Diff #41962)

Removed it. It was an condition only triggered by my older hacks.
Not it should not happen.

27776 ↗(On Diff #41962)

f128 and i128 were not in SSE_REG before.
So I think this is a new feature that can be added later.

27889 ↗(On Diff #41962)

Same as above. f128 and i128 not in SSE_REG before this change.

lib/Target/X86/X86InstrInfo.td
957 ↗(On Diff #41962)

They are changed to align up with new line 962.

lib/Target/X86/X86InstrSSE.td
8862 ↗(On Diff #41962)

I updated comment to keep only the shorter and SSE reasons.
I was not sure about 'faster with movaps',
which seems to be used more in clang than gcc.
My main reasons are shorter and available in SSE for Android's applications.

8868 ↗(On Diff #41962)

I also choose andps over pand for shorter and availability in SSE,
not sure about performance difference when combined with other instructions.
I think by not splitting f128 into two registers, we already saved more code and execution time.

davidxl added inline comments.Dec 8 2015, 5:39 PM
lib/Target/X86/X86InstrInfo.td
957–962 ↗(On Diff #42090)

It seems you are adding extra space before :

lib/Target/X86/X86InstrSSE.td
8871 ↗(On Diff #42090)

Is there any coding style guidelines for table gen code? There are long lines that are wrapped ..

test/CodeGen/X86/fp128-calling-conv.ll
15 ↗(On Diff #42090)

Is this a relevant test? non-f128 related tests can be submitted in a different patch.

test/CodeGen/X86/fp128-cast.ll
12 ↗(On Diff #42090)

There are also lots of irrelevant tests added in this file .

chh added inline comments.Dec 9 2015, 4:37 PM
lib/Target/X86/X86InstrInfo.td
957–962 ↗(On Diff #42090)

Yes, it seems to be the style at line 964-969 too.

lib/Target/X86/X86InstrSSE.td
8871 ↗(On Diff #42090)

Note sure if there is special rule for table gen code.
http://llvm.org/docs/CodingStandards.html says 80 columns.
There are few exceptions in this file,
but now I wrapped all my new lines to less than 80 characters.

test/CodeGen/X86/fp128-calling-conv.ll
15 ↗(On Diff #42090)

Okay, i will put non-fp128 type tests to another patch.

test/CodeGen/X86/fp128-cast.ll
12 ↗(On Diff #42090)

Okay, i will put non-fp128 type tests to another patch.

chh updated this revision to Diff 42354.Dec 9 2015, 4:38 PM
davidxl added inline comments.Dec 10 2015, 11:32 AM
test/CodeGen/X86/fp128-cast.ll
154 ↗(On Diff #42354)

Please move this test case to a different patch.

test/CodeGen/X86/fp128-compare.ll
4 ↗(On Diff #42354)

Move this test case out of the patch -- -there are more than 20 or so test cases below that need to be separated out.

chh added a comment.Dec 10 2015, 11:39 AM

Will reduce fp128-compare.ll in the next diff.

test/CodeGen/X86/fp128-cast.ll
154 ↗(On Diff #42354)

This is about f128 calling convention to return f128 in SSE.
Are you sure about removing this one?

davidxl added inline comments.Dec 10 2015, 11:42 AM
test/CodeGen/X86/fp128-cast.ll
154 ↗(On Diff #42354)

you are right -- this one is relevant.

Please check the rest.

chh updated this revision to Diff 42455.Dec 10 2015, 12:38 PM

Removed test cases without f128 type.

chh marked an inline comment as done.Dec 10 2015, 12:41 PM
davidxl edited edge metadata.Dec 10 2015, 12:54 PM

great. I will look at the tests in more details.

David

davidxl added inline comments.Dec 10 2015, 2:56 PM
test/CodeGen/X86/fp128-calling-conv.ll
5 ↗(On Diff #42455)

Is this variable used?

8 ↗(On Diff #42455)

Is it used?

11 ↗(On Diff #42455)

Are the two parts swapped? GCC seems to generates: 3FFF0000000000000000000000000000

test/CodeGen/X86/fp128-cast.ll
12 ↗(On Diff #42455)

TestFPExtF32_F128

25 ↗(On Diff #42455)

TestFPExt ..

38 ↗(On Diff #42455)

Missing a test for conversion to unsigned I32

126 ↗(On Diff #42455)

might be better to relax the check a little : testq %rax, %rax should be fine too.

135 ↗(On Diff #42455)

Can you simplify the variable names ?

166 ↗(On Diff #42455)

There is no guarantee adcq will be after movq ...

test/CodeGen/X86/fp128-compare.ll
23 ↗(On Diff #42455)

missing check of 'set<cc>'

33 ↗(On Diff #42455)

is this correct?

45 ↗(On Diff #42455)

Missing check

test/CodeGen/X86/fp128-i128.ll
4 ↗(On Diff #42455)

Better comment?

92 ↗(On Diff #42455)

var names can be cleaned up to be shorter.

125 ↗(On Diff #42455)

seems irrelevant.

137 ↗(On Diff #42455)

Seems irrelevant.

205 ↗(On Diff #42455)

Can this test case be simplified more?

chh marked 7 inline comments as done.Dec 10 2015, 7:42 PM
chh added inline comments.
test/CodeGen/X86/fp128-calling-conv.ll
5 ↗(On Diff #42455)

No, Removed now.

8 ↗(On Diff #42455)

No. Removed now.

11 ↗(On Diff #42455)

That looked strange but correct. I copied that from clang's dump, and the llvm output assembly code is the same as gcc's. http://llvm.org/docs/LangRef.html says big-endian is used for hexadecimal floating point constants.

test/CodeGen/X86/fp128-cast.ll
38 ↗(On Diff #42455)

Added conversion to uint32 and uint64.

135 ↗(On Diff #42455)

These were generated by clang for my simplified C code from libm.
They are useful to show the clang transformations.
I will add the C code example as comments.

166 ↗(On Diff #42455)

Okay, relaxed the check patterns.

test/CodeGen/X86/fp128-compare.ll
33 ↗(On Diff #42455)

Yes. It's a strange optimization, which returns 1 if %cmp is negative as __lttf2 will return when %d1 < %d2.

test/CodeGen/X86/fp128-i128.ll
92 ↗(On Diff #42455)

These IL were copied from libm compiled code. Clang has its way to convert C union structure references. I will add original C code as comment.

125 ↗(On Diff #42455)

Okay, removed the i64 test.

137 ↗(On Diff #42455)

We need to test i128 too, since this patch put also i128 into the FR128 register class. The i128 instruction was generated from f128 C code.

205 ↗(On Diff #42455)

This was from libm C code, which triggered one error related to f128 without a complete patch. So I added it. I tried to reduce the original C code but then that won't trigger the problem.

chh updated this revision to Diff 42494.Dec 10 2015, 7:42 PM
chh edited edge metadata.
chh marked an inline comment as done.
davidxl added inline comments.Dec 11 2015, 10:59 AM
test/CodeGen/X86/fp128-cast.ll
136 ↗(On Diff #42494)

Why do we care what transformations have been done to get the IR? The IR code should by itself readable -- so while the C example is useful, I still prefer the naming in IR simplified.

test/CodeGen/X86/fp128-compare.ll
34 ↗(On Diff #42494)

ok. But it is possible with test + sets, right? may be adding a comment so that people know how to fix the test if it breaks in the future?

test/CodeGen/X86/fp128-i128.ll
7 ↗(On Diff #42494)

__float128

23 ↗(On Diff #42494)

long double --> __float128?

44 ↗(On Diff #42494)

The pattern checked is pretty long -- I worry it may break in the future. Is it possible to relax it some how?

57 ↗(On Diff #42494)

long double --> __float128

81 ↗(On Diff #42494)

long double --> __float128

106 ↗(On Diff #42494)

long double --> __float128

chh added inline comments.Dec 11 2015, 3:38 PM
test/CodeGen/X86/fp128-cast.ll
136 ↗(On Diff #42494)

The comment now seems to be showing at wrong place in this code diff.
The biggest confusing name is u.sroa.0.4.extract.shift in TestBits128.
So I will shorten long names in this function for now.

The C code is important for anyone in the future to test, if not having time to rebuild and test the whole AOSP with libm.
Any simplification of the IR might work, but clang could generate different bit operators for those long double union types and trigger problems with any ad-hoc f128 optimization.

test/CodeGen/X86/fp128-compare.ll
34 ↗(On Diff #42494)

Sure, added comment.
If it is broken in the future, maybe it would be easier to continue using this trick. :-)

test/CodeGen/X86/fp128-i128.ll
7 ↗(On Diff #42494)

Unfortunately, clang does not accept __float128 keyword, although it can emit f128 for llvm.

23 ↗(On Diff #42494)

no __float128 in clang.

44 ↗(On Diff #42494)

I tried to reduce the C code, but any reduction won't trigger the complicated IL that reached a point that my partial fix core dumped. Maybe we can take out a few CHECK-NEXT requirements.

On the other hard, I was terrified by so many ad-hoc optimizations of floating points for the usage patterns in libm.
I guess llvm tried to match or do better then gcc and libm tried to use every possible bit operations.
So maybe it is better for Android or anyone depends on f128 type to have more check rules here.
Whoever changes float optimization in the future has better fully understand and update these tests.

chh updated this revision to Diff 42595.Dec 11 2015, 3:39 PM
davidxl added inline comments.Dec 11 2015, 3:49 PM
test/CodeGen/X86/fp128-i128.ll
8 ↗(On Diff #42595)

This should be fixed in clang FE. By default, long double is extended FP, not quadFP --- so do fix the comment to avoid confusion.

chh updated this revision to Diff 42634.Dec 11 2015, 10:57 PM
chh marked an inline comment as done.
chh added inline comments.
test/CodeGen/X86/fp128-i128.ll
8 ↗(On Diff #42595)

Used __float128 here and added more comments at the top of file.

davidxl accepted this revision.Dec 12 2015, 7:04 PM
davidxl edited edge metadata.

LGTM.

This revision is now accepted and ready to land.Dec 12 2015, 7:04 PM
chh updated this revision to Diff 42766.Dec 14 2015, 1:51 PM
chh edited edge metadata.

Fix infinite loop in SelectionDAGBuilder.cpp, caught by new regression tests in this patch.
It happens only with new f128 type, whose VT == TLI.getTypeToTransformTo(Ctx, VT).

chh added a comment.Dec 14 2015, 1:54 PM

David, thanks for the review suggestions and approval.
There is one recent regression caught by two of my unit tests,
so I need to fix it too in the updated diff.

This revision was automatically updated to reflect the committed changes.