Details

Reviewers

ruiu
• rafael

Commits

rG77d1cb1ddf35: [ELF2] - Optimization for R_X86_64_GOTTPOFF relocation.
rLLD253966: [ELF2] - Optimization for R_X86_64_GOTTPOFF relocation.
rL253966: [ELF2] - Optimization for R_X86_64_GOTTPOFF relocation.

Summary

R_X86_64_GOTTPOFF does not always require GOT entries. Some of these relocations can be converted to local ones.
This patch is a port of gold optimizations for this relocation:
https://android.googlesource.com/toolchain/binutils/+/53b6ed3bceea971857c996b6dcb96de96b99335f/binutils-2.19/gold/x86_64.cc#2551

Diff Detail

Repository: rL LLVM

Event Timeline

grimar updated this revision to Diff 40303. Nov 16 2015, 9:30 AM

grimar retitled this revision to [ELF2] - Optimization for R_X86_64_GOTTPOFF relocation.

grimar updated this object.

grimar added reviewers: ruiu, • rafael.

grimar added subscribers: llvm-commits, grimar.

Herald added subscribers: danalbert, tberghammer. Nov 16 2015, 9:30 AM

grimar added inline comments. Nov 16 2015, 9:33 AM

ELF/Target.cpp
362 ↗	(On Diff #40303)	I'm not sure what to do here: how should I check for out-of-bounds access with a negative index? Should I add a check (which I guess will need a BufBegin argument), or can we assume it never goes out of bounds?

ruiu added inline comments. Nov 16 2015, 10:06 AM

ELF/InputSection.cpp
150–154 ↗	(On Diff #40303)	Can you move this above line 137 just like "if (Target->isTlsGlobalDynamicReloc(Type)) { ... }"? Also, do you think you can merge getTlsOptimization and relocateTlsToLe into one function? I'd probably write it like this:
  // Some TLS relocations are always compiled to use GOT, but the linker
  // is sometimes able to rewrite it so that it doesn't use GOT. This may not
  // only apply relocations but also modify preceding instructions.
  if (applyOptimizeTls(BufLoc, Type, AddrLoc, Body))
    continue;
ELF/Target.cpp
356 ↗	(On Diff #40303)	Please add a reference to Ulrich's document.
362–368 ↗	(On Diff #40303)	Is that actually a possible scenario that one wants to add a TLS value to SP register? Do you have to take care of that?
ELF/Target.h
61 ↗	(On Diff #40303)	Currently it only returns None or ToLE, so can you change the signature of the function so that it returns a boolean value?

grimar added inline comments. Nov 16 2015, 11:53 AM

ELF/InputSection.cpp
150–154 ↗	(On Diff #40303)	About merging into one: getTlsOptimization() is now also used in another place, bool X86_64TargetInfo::relocNeedsGot(), so such a merge is probably not possible.
ELF/Target.cpp
362–368 ↗	(On Diff #40303)	I did not analyse how often people do that :) gold and bfd do the same, and it is still a possible scenario even if the chance is low. I just want to be on the safe side with this single line.
ELF/Target.h
61 ↗	(On Diff #40303)	Will do.

ruiu added inline comments. Nov 16 2015, 12:26 PM

ELF/InputSection.cpp
150–154 ↗	(On Diff #40303)	Fair. Then please rename it to relocateTlsOptimize. (Readers don't really want to know how it is going to be optimized in this context, so "ToLe" is too detailed.)
ELF/Target.cpp
362–368 ↗	(On Diff #40303)	As long as it can be written simply enough, I'm fine, but currently it looks a bit too cryptic. There are six instructions this function may emit. Could you explain what each instruction means?
  0x49 0x81 0xc?
  0x49 0xc7 0xc?
  0x?? 0x81 0xc?
  0x?? 0xc7 0xc?
  0x4d 0x8d 0x8?
  0x?? 0x8d 0x8?

Review comments addressed.
Simplified.
Test updated.

ELF/InputSection.cpp
150–154 ↗	(On Diff #40303)	Renaming is done, but I did not move it because I need the SymVA variable. This part of the code is also shorter now, and it uses continue like Target->relocNeedsCopy, so it looks to be in the most consistent place now, no?
ELF/Target.cpp
362–368 ↗	(On Diff #40303)	Here they are (but I simplified the code, so the actual table is in the comments below):
  0x49 0x81 0xc?: this is 0x49 0x81 0xC? XX XX XX XX, add $0xXXXXXXXX, %r?. Ex: 49 81 C0 00 00 00 00 = add $0x0, %r8
  0x49 0xc7 0xc?: this is 0x49 0xC7 0xC? XX XX XX XX, mov $0xXXXXXXXX, %r?. Ex: 49 C7 C0 00 00 00 00 = mov $0x0, %r8
  0x?? 0x81 0xc?: I have never seen a prefix other than 0x48 in these cases, so this is 0x48 0x81 0xC? XX XX XX XX, add $0xXXXXXXXX, %r??. Ex: 48 81 C0 00 00 00 00 = add $0x0, %rax
  0x?? 0xc7 0xc?: I have never seen a prefix other than 0x48 in these cases, so this is 0x48 0xc7 0xc? XX XX XX XX, mov $0xXXXXXXXX, %r??. Ex: 48 C7 C0 00 00 00 00 = mov $0x0, %rax
  0x4d 0x8d 0x8?: this is 0x4D 0x8D 0x8? XX XX XX XX, lea 0xXXXXXXXX(%r?),%r?. Ex: 4D 8D 89 00 00 00 00 = lea 0x0(%r9),%r9
  0x?? 0x8d 0x8?: I have never seen a prefix other than 0x48 in these cases, so this is 0x48 0x8d 0x8? XX XX XX XX, lea 0xXXXXXXXX(%r??),%r??. Ex: 48 8D 89 00 00 00 00 = lea 0x0(%rcx),%rcx
362–368 ↗	(On Diff #40303)	gold and bfd contain a misleading comment about that special rsp handling. In fact it also affects r12 for them:
  asm:
  addq tls0@GOTTPOFF(%rip), %rsp
  addq tls0@GOTTPOFF(%rip), %r8
  ...
  addq tls0@GOTTPOFF(%rip), %r12
  ...
  addq tls0@GOTTPOFF(%rip), %r15
  00000000004000f0 <main>:
  400105: 48 81 c4 f8 ff ff ff add $0xfffffffffffffff8,%rsp
  40010c: 4d 8d 80 f8 ff ff ff lea -0x8(%r8),%r8
  ...
  400128: 49 81 c4 f8 ff ff ff add $0xfffffffffffffff8,%r12
  ...
  40013d: 4d 8d bf f8 ff ff ff lea -0x8(%r15),%r15
That looks like a bug in gold/bfd to me. I rewrote the code to simplify it and removed the special handling for rsp for now. I think we can add it later if it is needed. The updated table of possible emitted patterns is:
  0x49 0xC7 0xC?
  0x?? 0xC7 0xC?
  0x4D 0x8D 0x8?
  0x?? 0x8D 0x8?
ELF/Target.h
61 ↗	(On Diff #40303)	Done.

In D14713#291058, @grimar wrote:

I rewrote the code to simplify it and removed the special handling for rsp for now. I think we can add it later if it is needed.

Or I can insert a check that is fixed (compared with gold/bfd) to affect only rsp, but that will enlarge void X86_64TargetInfo::relocateTlsOptimize().

ruiu added inline comments. Nov 17 2015, 10:00 AM

ELF/Target.cpp
367 ↗	(On Diff #40406)	Add BufStart and check for the bound.
370–374 ↗	(On Diff #40406)	Instruct -> Inst, IsMovOp -> IsMov
375–378 ↗	(On Diff #40406)	Does this handle SP register?
379 ↗	(On Diff #40406)	I think the original code was better: write32le(Loc, SA - Out<ELF64LE>::TlsPhdr->p_memsz);

grimar added inline comments. Nov 17 2015, 10:20 AM

ELF/Target.cpp
375–378 ↗	(On Diff #40406)	No. As I wrote in the comments, it will enlarge that code, but I can do that. Should I?
379 ↗	(On Diff #40406)	According to the manual, it is optimized exactly to R_X86_64_TPOFF32. I am also not sure how important the out-of-range check is that would be skipped in that case:
  case R_X86_64_TPOFF32:
    if (!isInt<32>(Val))
      error("R_X86_64_TPOFF32 out of range");
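The out-of-range check quoted above can be tried in isolation. This is a minimal sketch, not lld code: fitsInt32, writeLE32 and writeTpoff32Checked are hypothetical stand-ins (the first two mimic what llvm::isInt<32> and write32le do), and the sketch throws where lld would call error().

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical stand-in for llvm::isInt<32>: true if V fits in a signed
// 32-bit field.
inline bool fitsInt32(int64_t V) {
  return V >= std::numeric_limits<int32_t>::min() &&
         V <= std::numeric_limits<int32_t>::max();
}

// Hypothetical stand-in for write32le: stores V in little-endian order.
inline void writeLE32(uint8_t *Loc, uint32_t V) {
  assert(Loc != nullptr);
  for (int I = 0; I < 4; ++I)
    Loc[I] = static_cast<uint8_t>(V >> (8 * I));
}

// Writes a TP-relative offset, rejecting values that do not fit in the
// 32-bit immediate. Throwing is an assumption made for this sketch only.
inline void writeTpoff32Checked(uint8_t *Loc, int64_t Val) {
  if (!fitsInt32(Val))
    throw std::out_of_range("R_X86_64_TPOFF32 out of range");
  writeLE32(Loc, static_cast<uint32_t>(Val));
}
```

Routing the optimized relocation through the regular R_X86_64_TPOFF32 path keeps this check in one place, which is the trade-off being discussed in the two comments above.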

grimar added inline comments. Nov 17 2015, 12:11 PM

ELF/Target.cpp
375–378 ↗	(On Diff #40406)	With handling of the SP/r12 registers it will be something like the following, which generates
48 81 c4 f8 ff ff ff add $0xfffffffffffffff8,%rsp for addq tls0@GOTTPOFF(%rip), %rsp
49 81 c4 f8 ff ff ff add $0xfffffffffffffff8,%r12 for addq tls0@GOTTPOFF(%rip), %r12:

void X86_64TargetInfo::relocateTlsOptimize(uint8_t *Loc, uint8_t *BufEnd,
                                           uint32_t Type, uint64_t P,
                                           uint64_t SA) const {
  uint8_t *Prefix = &Loc[-3];
  uint8_t *Inst = &Loc[-2];
  uint8_t *RegSlot = &Loc[-1];
  uint8_t Reg = (Loc[-1]) >> 3;
  bool IsAdd = !(*Inst == 0x8b || Reg == 4);
  if (Reg == 4 && !IsAdd)
    *Inst = 0x81;
  else
    *Inst = IsAdd ? 0x8d : 0xc7;
  if (*Prefix == 0x4c)
    *Prefix = IsAdd ? 0x4d : 0x49;
  *RegSlot = IsAdd ? (0x80 | Reg | (Reg << 3)) : (0xc0 | Reg);

  relocateOne(Loc, BufEnd, R_X86_64_TPOFF32, P, SA);
}

Review comments addressed
Added handling of rsp/r12, added comment why it is needed.
Added buffer overrun check.
Test updated (added rsp/r12 cases).
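Why rsp and r12 need their own case can be seen by encoding the lea form by hand. The original instruction is 7 bytes (REX prefix, opcode, ModRM, 32-bit displacement), and the rewrite happens in place, so the replacement must also be 7 bytes. The sketch below is illustrative only; encodeLeaSelf is a hypothetical helper, not something from lld, gold or bfd.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encodes "leaq disp32(%reg), %reg" for a 64-bit register number 0..15
// (4 = rsp, 11 = r11, 12 = r12). Hypothetical helper for illustration.
inline std::vector<uint8_t> encodeLeaSelf(unsigned Reg, int32_t Disp) {
  assert(Reg < 16 && "x86-64 has 16 general-purpose registers");
  unsigned Low = Reg & 7;  // low three bits go into ModRM
  unsigned Ext = Reg >> 3; // high bit goes into the REX prefix
  std::vector<uint8_t> Out;
  // REX.W, plus REX.R (reg field) and REX.B (base field) when Reg >= 8.
  Out.push_back(static_cast<uint8_t>(0x48 | (Ext << 2) | Ext));
  Out.push_back(0x8d); // LEA opcode
  // ModRM: mod=10 (disp32 follows), reg=Low, rm=Low.
  Out.push_back(static_cast<uint8_t>(0x80 | (Low << 3) | Low));
  // rm=100 does not mean "base is register 4"; it announces a SIB byte,
  // so rsp and r12 cost one extra byte.
  if (Low == 4)
    Out.push_back(0x24); // SIB: no index, base=100
  const auto U = static_cast<uint32_t>(Disp);
  for (int I = 0; I < 4; ++I)
    Out.push_back(static_cast<uint8_t>(U >> (8 * I)));
  return Out;
}
```

For r11 this yields 7 bytes starting with 4d 8d 9b, while for r12 it yields 8 bytes starting with 4d 8d a4 24. The add-immediate form (opcode 0x81 with ModRM 0xc4) has no memory operand and therefore no SIB byte, which is why it fits.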

ELF/Target.cpp
367 ↗	(On Diff #40406)	Done.
370–374 ↗	(On Diff #40406)	Done.

I also have a patch implementing optimizations for the R_X86_64_TLSLD relocation. It relies on some code changes in this one, so I will post it right after this one.

ruiu added inline comments. Nov 18 2015, 1:05 PM

ELF/InputSection.cpp
150 ↗	(On Diff #40484)	getTlsOptimization -> isTlsOptimized (since it returns a boolean value.) Can you move this before uintX_t SymVA = getSymVA<ELFT>(Body); ?
151 ↗	(On Diff #40484)	Currently you are not using BufEnd, so please remove that parameter.
ELF/Target.cpp
354 ↗	(On Diff #40484)	Remove outermost ().
371–372 ↗	(On Diff #40484)	I'm wondering if this is correct. One should never use a R_X86_64_GOTTPOFF relocation with instructions other than MOV or ADD?

Review comments addressed.

ELF/InputSection.cpp
150 ↗	(On Diff #40484)	Done.
151 ↗	(On Diff #40484)	It is used because relocateTlsOptimize() calls relocateOne() which accepts it. I am not using Type now, so removed it.
ELF/Target.cpp
354 ↗	(On Diff #40484)	Done.
371–372 ↗	(On Diff #40484)	As stated in the comment above the method, @gottpoff(%rip) must be used in movq or addq instructions only (it is written in Ulrich's manual), so yes, this relocation optimization can only have mov and add cases, and that is definitely correct. Also, just in case: gold and bfd have the same logic (but Ulrich's document is the preferable information source, I think).
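The patch itself treats every opcode other than movq's 0x8b as addq. For readers who want to see what an explicit guard could look like, here is a minimal sketch. It is not part of the patch: isRelaxableGotTpoff and the two constant names are hypothetical, and the only facts assumed are the standard opcode bytes 0x8b (movq m64, r64) and 0x03 (addq m64, r64).

```cpp
#include <cassert>
#include <cstdint>

// Opcode bytes (the byte after the REX prefix) of the two instructions
// that may carry @gottpoff(%rip).
constexpr uint8_t MovLoadOpcode = 0x8b; // movq m64, r64
constexpr uint8_t AddLoadOpcode = 0x03; // addq m64, r64

// Hypothetical check: Loc points at the 4-byte relocated field, so the
// opcode sits two bytes before it (prefix, opcode, ModRM, then the field).
inline bool isRelaxableGotTpoff(const uint8_t *Loc) {
  assert(Loc != nullptr);
  const uint8_t Opcode = Loc[-2];
  return Opcode == MovLoadOpcode || Opcode == AddLoadOpcode;
}
```

An instruction such as xchgq tls0@gottpoff(%rip), %rax (opcode 0x87) would be rejected by this check instead of being rewritten as if it were an addq. The sketch does not verify the REX prefix or that the ModRM byte is RIP-relative.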

ruiu added inline comments. Nov 18 2015, 3:24 PM

ELF/Target.cpp
392 ↗	(On Diff #40551)	I applied your patch to my local repository to see if there's room to make this simpler, only to find that it couldn't be simplified significantly. I slightly updated the comments and code, though. Please take a look.

// In some conditions, R_X86_64_GOTTPOFF relocation can be optimized to
// R_X86_64_TPOFF32 so that R_X86_64_TPOFF32 so that it does not use GOT.
// This function does that. Read "ELF Handling For Thread-Local Storage,
// 5.5 x86-x64 linker optimizations" (http://www.akkadia.org/drepper/tls.pdf)
// by Ulrich Drepper for details.
void X86_64TargetInfo::relocateTlsOptimize(uint8_t *Loc, uint8_t *BufStart,
                                           uint8_t *BufEnd, uint64_t P,
                                           uint64_t SA) const {
  // Ulrich's document section 6.5 says that @gottpoff(%rip) must be
  // used in MOVQ or ADDQ instructions only.
  // "MOVQ foo@GOTTPOFF(%RIP), %REG" is transformed to "MOVQ $foo, %REG".
  // "ADDQ foo@GOTTPOFF(%RIP), %REG" is transformed to "LEAQ foo(%REG), %REG"
  // (if the register is not RSP) or "ADDQ $foo, %RSP".
  // Opcodes info can be found at http://ref.x86asm.net/coder64.html#x48.
  if (Loc - 3 < BufStart)
    error("TLS relocation optimization failed. Buffer overrun!");
  uint8_t *Prefix = Loc - 3;
  uint8_t *Inst = Loc - 2;
  uint8_t *RegSlot = Loc - 1;
  uint8_t Reg = Loc[-1] >> 3;
  bool IsMov = *Inst == 0x8b;
  bool RspAdd = !IsMov && Reg == 4;
  // r12 and rsp registers requires special handling.
  // Problem is that for other registers, for example leaq 0xXXXXXXXX(%r11),%r11
  // result out is 7 bytes: 4d 8d 9b XX XX XX XX,
  // but leaq 0xXXXXXXXX(%r12),%r12 is 8 bytes: 4d 8d a4 24 XX XX XX XX.
  // The same true for rsp. So we convert to addq for them, saving 1 byte that
  // we dont have.
  if (RspAdd)
    *Inst = 0x81;
  else
    *Inst = IsMov ? 0xc7 : 0x8d;
  if (*Prefix == 0x4c)
    *Prefix = (IsMov || RspAdd) ? 0x49 : 0x4d;
  *RegSlot = (IsMov || RspAdd) ? (0xc0 | Reg) : (0x80 | Reg | (Reg << 3));
  relocateOne(Loc, BufEnd, R_X86_64_TPOFF32, P, SA);
}
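The byte rewriting in this function can be exercised outside the linker. The following is a standalone re-statement for experimentation, under stated assumptions: relaxGotTpoff is a hypothetical name, the final TP-relative offset is passed in directly instead of being computed by relocateOne, and there is no bounds or range checking.

```cpp
#include <cassert>
#include <cstdint>

// Loc points at the 4-byte displacement of "movq/addq foo@GOTTPOFF(%rip), %reg".
// The three bytes before it are the REX prefix, the opcode and the ModRM byte.
inline void relaxGotTpoff(uint8_t *Loc, int32_t TpOff) {
  assert(Loc != nullptr && "Loc must point at the displacement field");
  uint8_t &Prefix = Loc[-3];
  uint8_t &Inst = Loc[-2];
  uint8_t &ModRM = Loc[-1];
  // mod is 00 for a RIP-relative operand, so the shift leaves the reg field.
  const uint8_t Reg = ModRM >> 3;
  const bool IsMov = Inst == 0x8b;
  const bool AddImm = !IsMov && Reg == 4; // rsp or r12: keep it an addq
  const bool ImmForm = IsMov || AddImm;
  Inst = AddImm ? 0x81 : (IsMov ? 0xc7 : 0x8d);
  // 0x4c is REX.W+R. The register moves from the reg field to the rm field,
  // so REX.R becomes REX.B (0x49), or REX.R+B for lea (0x4d).
  if (Prefix == 0x4c)
    Prefix = ImmForm ? 0x49 : 0x4d;
  ModRM = static_cast<uint8_t>(ImmForm ? (0xc0 | Reg)
                                       : (0x80 | Reg | (Reg << 3)));
  // Store the offset as a little-endian 32-bit immediate or displacement.
  const auto U = static_cast<uint32_t>(TpOff);
  for (int I = 0; I < 4; ++I)
    Loc[I] = static_cast<uint8_t>(U >> (8 * I));
}
```

Feeding it 48 8b 05 (movq to %rax) with an offset of -8 gives 48 c7 c0 f8 ff ff ff, and 4c 03 25 (addq to %r12) gives 49 81 c4 f8 ff ff ff, matching the byte sequences checked in test/ELF/tls-opt.s.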

• rafael added inline comments. Nov 18 2015, 4:46 PM

test/elf2/tls-opt.s
2 ↗	(On Diff #40551)	Just create a _start and drop the "-e main"
13 ↗	(On Diff #40551)	Do you really need the binary content? The text should be sufficient, no? I.E.: // DISASM-NEXT: 11000: {{.*}} movq $-8, %rax

• rafael added inline comments. Nov 18 2015, 4:47 PM

ELF/Target.cpp
371 ↗	(On Diff #40551)	Please add a test for this. It is important to show that we can actually get here with broken input.

Review comments addressed.

ELF/Target.cpp
371 ↗	(On Diff #40551)	I doubt a bit that we should check such things here. Broken input at that place will lead to broken output with or without this patch, whether the optimization is applied or not. The only difference is in what way it is broken (broken + optimization == broken x 2, but still broken). Broken input is a compiler bug and should be covered by the compiler's tests, or by our own separate tests for broken inputs. The second problem is that if, according to the manual, this relocation must be used only with mov or add, then in the future llvm-mc, for example, may add a check for that and stop generating the broken output. That would make this test fail. We can guard against that only by using precompiled corrupted binaries as inputs, but again that is not for this patch, I think. But anyway, just in case, I added what you asked for to the test. So if you really think we should do that for some reason, I see nothing too bad in it as long as it touches only the test. It is easy to remove from the test at any time.
392 ↗	(On Diff #40551)	Looks good. Applied your changes. Thanks! I only added r12 to the comment: // (if the register is not RSP/R12) or "ADDQ $foo, %RSP".
test/elf2/tls-opt.s
2 ↗	(On Diff #40551)	Done.
13 ↗	(On Diff #40551)	Either of them is sufficient, I think, but we are emitting binary output and not text, so we need to keep and check the binary. At the same time the text helps to read the test, so it should be there at least as comments for the binary, I think. But there is no point in moving it to comments, I believe. I would leave it as is because of all that.

grimar added inline comments. Nov 19 2015, 7:42 AM

test/elf2/tls-opt.s
24 ↗	(On Diff #40608)	The last one is also part of the corrupted output; I'll move it below // Corrupred output:

Rebased.

grimar added inline comments. Nov 20 2015, 4:53 AM

test/ELF/tls-opt.s
2 ↗	(On Diff #40747)	Forgot to switch to ld.lld here. Will be fixed.

Rebased
Removed buffer overrun check and *BeginBuf argument.
Test: ld.lld2->ld.lld

LGTM with a nit.

ELF/InputSection.cpp
137 ↗	(On Diff #40903)	Can you move Body.isTLS() into isTlsOptimized function?

This revision is now accepted and ready to land. Nov 23 2015, 11:16 AM

In D14713#295215, @ruiu wrote:

LGTM with a nit.

Will do, thanks !

Closed by commit rL253966: [ELF2] - Optimization for R_X86_64_GOTTPOFF relocation. (authored by grimar). Nov 24 2015, 1:02 AM

This revision was automatically updated to reflect the committed changes.

jevinskie added a subscriber: jevinskie. Nov 24 2015, 12:24 PM

Diff 41013

lld/trunk/ELF/InputSection.cpp

@@ for (const RelType &RI : Rels) {

     if (Target->isTlsGlobalDynamicReloc(Type)) {
       Target->relocateOne(BufLoc, BufEnd, Type, AddrLoc,
                           Out<ELFT>::Got->getEntryAddr(Body) +
                               getAddend<ELFT>(RI));
       continue;
     }

+    if (Target->isTlsOptimized(Type, Body)) {
+      Target->relocateTlsOptimize(BufLoc, BufEnd, AddrLoc,
+                                  getSymVA<ELFT>(Body));
+      continue;
+    }
+
     uintX_t SymVA = getSymVA<ELFT>(Body);
     if (Target->relocNeedsPlt(Type, Body)) {
       SymVA = Out<ELFT>::Plt->getEntryAddr(Body);
       Type = Target->getPltRefReloc(Type);
     } else if (Target->relocNeedsGot(Type, Body)) {
       SymVA = Out<ELFT>::Got->getEntryAddr(Body);
       Type = Body.isTLS() ? Target->getTlsGotReloc()
                           : Target->getGotRefReloc(Type);

lld/trunk/ELF/Target.h

@@ virtual void writePltEntry(uint8_t *Buf, uint64_t GotEntryAddr,
                              uint64_t PltEntryAddr, int32_t Index) const = 0;
   virtual bool isRelRelative(uint32_t Type) const;
   virtual bool relocNeedsCopy(uint32_t Type, const SymbolBody &S) const;
   virtual bool relocNeedsGot(uint32_t Type, const SymbolBody &S) const = 0;
   virtual bool relocPointsToGot(uint32_t Type) const;
   virtual bool relocNeedsPlt(uint32_t Type, const SymbolBody &S) const = 0;
   virtual void relocateOne(uint8_t *Loc, uint8_t *BufEnd, uint32_t Type,
                            uint64_t P, uint64_t SA) const = 0;
+  virtual bool isTlsOptimized(unsigned Type, const SymbolBody &S) const;
+  virtual void relocateTlsOptimize(uint8_t *Loc, uint8_t *BufEnd, uint64_t P,
+                                   uint64_t SA) const;
   virtual ~TargetInfo();

 protected:
   unsigned PageSize = 4096;

   // On freebsd x86_64 the first page cannot be mmaped.
   // On linux that is controled by vm.mmap_min_addr. At least on some x86_64
   // installs that is 65536, so the first 15 pages cannot be used.

lld/trunk/ELF/Target.cpp

@@ public:
   void writePltEntry(uint8_t *Buf, uint64_t GotEntryAddr,
                      uint64_t PltEntryAddr, int32_t Index) const override;
   bool relocNeedsCopy(uint32_t Type, const SymbolBody &S) const override;
   bool relocNeedsGot(uint32_t Type, const SymbolBody &S) const override;
   bool relocNeedsPlt(uint32_t Type, const SymbolBody &S) const override;
   void relocateOne(uint8_t *Loc, uint8_t *BufEnd, uint32_t Type, uint64_t P,
                    uint64_t SA) const override;
   bool isRelRelative(uint32_t Type) const override;
+  bool isTlsOptimized(unsigned Type, const SymbolBody &S) const override;
+  void relocateTlsOptimize(uint8_t *Loc, uint8_t *BufEnd, uint64_t P,
+                           uint64_t SA) const override;
 };

 class PPC64TargetInfo final : public TargetInfo {
 public:
   PPC64TargetInfo();
   void writeGotPltEntry(uint8_t *Buf, uint64_t Plt) const override;
   void writePltZeroEntry(uint8_t *Buf, uint64_t GotEntryAddr,
                          uint64_t PltEntryAddr) const override;
@@ TargetInfo *createTarget() {
   case EM_X86_64:
     return new X86_64TargetInfo();
   }
   error("Unknown target machine");
 }

 TargetInfo::~TargetInfo() {}

+bool TargetInfo::isTlsOptimized(unsigned Type, const SymbolBody &S) const {
+  return false;
+}
+
 uint64_t TargetInfo::getVAStart() const { return Config->Shared ? 0 : VAStart; }

 bool TargetInfo::relocNeedsCopy(uint32_t Type, const SymbolBody &S) const {
   return false;
 }

 unsigned TargetInfo::getGotRefReloc(unsigned Type) const { return GotRefReloc; }

 unsigned TargetInfo::getPltRefReloc(unsigned Type) const { return PCRelReloc; }

 bool TargetInfo::relocPointsToGot(uint32_t Type) const { return false; }

 bool TargetInfo::isRelRelative(uint32_t Type) const { return true; }

+void TargetInfo::relocateTlsOptimize(uint8_t *Loc, uint8_t *BufEnd, uint64_t P,
+                                     uint64_t SA) const {}
+
 void TargetInfo::writeGotHeaderEntries(uint8_t *Buf) const {}

 void TargetInfo::writeGotPltHeaderEntries(uint8_t *Buf) const {}

 X86TargetInfo::X86TargetInfo() {
   PCRelReloc = R_386_PC32;
   GotReloc = R_386_GLOB_DAT;
   GotRefReloc = R_386_GOT32;
@@ bool X86_64TargetInfo::relocNeedsCopy(uint32_t Type,
   if (Type == R_X86_64_32S || Type == R_X86_64_32 || Type == R_X86_64_PC32 ||
       Type == R_X86_64_64)
     if (auto *SS = dyn_cast<SharedSymbol<ELF64LE>>(&S))
       return SS->Sym.getType() == STT_OBJECT;
   return false;
 }

 bool X86_64TargetInfo::relocNeedsGot(uint32_t Type, const SymbolBody &S) const {
+  if (Type == R_X86_64_GOTTPOFF)
+    return !isTlsOptimized(Type, S);
   return Type == R_X86_64_GOTTPOFF || Type == R_X86_64_GOTPCREL ||
          relocNeedsPlt(Type, S);
 }

 unsigned X86_64TargetInfo::getPltRefReloc(unsigned Type) const {
   if (Type == R_X86_64_PLT32)
     return R_X86_64_PC32;
   return Type;
@@ bool X86_64TargetInfo::isRelRelative(uint32_t Type) const {
   case R_X86_64_PC8:
   case R_X86_64_PLT32:
   case R_X86_64_DTPOFF32:
   case R_X86_64_DTPOFF64:
     return true;
   }
 }

+bool X86_64TargetInfo::isTlsOptimized(unsigned Type,
+                                      const SymbolBody &S) const {
+  if (Config->Shared || !S.isTLS())
+    return false;
+  return Type == R_X86_64_GOTTPOFF && !canBePreempted(&S, true);
+}
+
+// In some conditions, R_X86_64_GOTTPOFF relocation can be optimized to
+// R_X86_64_TPOFF32 so that R_X86_64_TPOFF32 so that it does not use GOT.
+// This function does that. Read "ELF Handling For Thread-Local Storage,
+// 5.5 x86-x64 linker optimizations" (http://www.akkadia.org/drepper/tls.pdf)
+// by Ulrich Drepper for details.
+void X86_64TargetInfo::relocateTlsOptimize(uint8_t *Loc, uint8_t *BufEnd,
+                                           uint64_t P, uint64_t SA) const {
+  // Ulrich's document section 6.5 says that @gottpoff(%rip) must be
+  // used in MOVQ or ADDQ instructions only.
+  // "MOVQ foo@GOTTPOFF(%RIP), %REG" is transformed to "MOVQ $foo, %REG".
+  // "ADDQ foo@GOTTPOFF(%RIP), %REG" is transformed to "LEAQ foo(%REG), %REG"
+  // (if the register is not RSP/R12) or "ADDQ $foo, %RSP".
+  // Opcodes info can be found at http://ref.x86asm.net/coder64.html#x48.
+  uint8_t *Prefix = Loc - 3;
+  uint8_t *Inst = Loc - 2;
+  uint8_t *RegSlot = Loc - 1;
+  uint8_t Reg = Loc[-1] >> 3;
+  bool IsMov = *Inst == 0x8b;
+  bool RspAdd = !IsMov && Reg == 4;
+  // r12 and rsp registers requires special handling.
+  // Problem is that for other registers, for example leaq 0xXXXXXXXX(%r11),%r11
+  // result out is 7 bytes: 4d 8d 9b XX XX XX XX,
+  // but leaq 0xXXXXXXXX(%r12),%r12 is 8 bytes: 4d 8d a4 24 XX XX XX XX.
+  // The same true for rsp. So we convert to addq for them, saving 1 byte that
+  // we dont have.
+  if (RspAdd)
+    *Inst = 0x81;
+  else
+    *Inst = IsMov ? 0xc7 : 0x8d;
+  if (*Prefix == 0x4c)
+    *Prefix = (IsMov || RspAdd) ? 0x49 : 0x4d;
+  *RegSlot = (IsMov || RspAdd) ? (0xc0 | Reg) : (0x80 | Reg | (Reg << 3));
+  relocateOne(Loc, BufEnd, R_X86_64_TPOFF32, P, SA);
+}
+
 void X86_64TargetInfo::relocateOne(uint8_t *Loc, uint8_t *BufEnd, uint32_t Type,
                                    uint64_t P, uint64_t SA) const {
   switch (Type) {
   case R_X86_64_PC32:
   case R_X86_64_GOTPCREL:
   case R_X86_64_PLT32:
   case R_X86_64_TLSLD:
   case R_X86_64_TLSGD:

lld/trunk/test/ELF/tls-opt.s

+// RUN: llvm-mc -filetype=obj -triple=x86_64-unknown-linux %s -o %t.o
+// RUN: ld.lld %t.o -o %t1
+// RUN: llvm-readobj -r %t1 | FileCheck --check-prefix=NORELOC %s
+// RUN: llvm-objdump -d %t1 | FileCheck --check-prefix=DISASM %s
+
+// NORELOC: Relocations [
+// NORELOC-NEXT: ]
+
+// DISASM: Disassembly of section .text:
+// DISASM-NEXT: _start:
+// DISASM-NEXT: 11000: 48 c7 c0 f8 ff ff ff movq $-8, %rax
+// DISASM-NEXT: 11007: 49 c7 c7 f8 ff ff ff movq $-8, %r15
+// DISASM-NEXT: 1100e: 48 8d 80 f8 ff ff ff leaq -8(%rax), %rax
+// DISASM-NEXT: 11015: 4d 8d bf f8 ff ff ff leaq -8(%r15), %r15
+// DISASM-NEXT: 1101c: 48 81 c4 f8 ff ff ff addq $-8, %rsp
+// DISASM-NEXT: 11023: 49 81 c4 f8 ff ff ff addq $-8, %r12
+// DISASM-NEXT: 1102a: 48 c7 c0 fc ff ff ff movq $-4, %rax
+// DISASM-NEXT: 11031: 49 c7 c7 fc ff ff ff movq $-4, %r15
+// DISASM-NEXT: 11038: 48 8d 80 fc ff ff ff leaq -4(%rax), %rax
+// DISASM-NEXT: 1103f: 4d 8d bf fc ff ff ff leaq -4(%r15), %r15
+// DISASM-NEXT: 11046: 48 81 c4 fc ff ff ff addq $-4, %rsp
+// DISASM-NEXT: 1104d: 49 81 c4 fc ff ff ff addq $-4, %r12
+
+// Corrupred output:
+// DISASM-NEXT: 11054: 48 8d 80 f8 ff ff ff leaq -8(%rax), %rax
+// DISASM-NEXT: 1105b: 48 d1 81 c4 f8 ff ff rolq -1852(%rcx)
+// DISASM-NEXT: 11062: ff 48 d1 decl -47(%rax)
+// DISASM-NEXT: 11065: 81 c4 f8 ff ff ff addl $4294967288, %esp
+
+.type tls0,@object
+.section .tbss,"awT",@nobits
+.globl tls0
+.align 4
+tls0:
+.long 0
+.size tls0, 4
+
+.type tls1,@object
+.globl tls1
+.align 4
+tls1:
+.long 0
+.size tls1, 4
+
+.section .text
+.globl _start
+_start:
+movq tls0@GOTTPOFF(%rip), %rax
+movq tls0@GOTTPOFF(%rip), %r15
+addq tls0@GOTTPOFF(%rip), %rax
+addq tls0@GOTTPOFF(%rip), %r15
+addq tls0@GOTTPOFF(%rip), %rsp
+addq tls0@GOTTPOFF(%rip), %r12
+movq tls1@GOTTPOFF(%rip), %rax
+movq tls1@GOTTPOFF(%rip), %r15
+addq tls1@GOTTPOFF(%rip), %rax
+addq tls1@GOTTPOFF(%rip), %r15
+addq tls1@GOTTPOFF(%rip), %rsp
+addq tls1@GOTTPOFF(%rip), %r12
+
+//Invalid input case:
+xchgq tls0@gottpoff(%rip),%rax
+shlq tls0@gottpoff
+rolq tls0@gottpoff
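
The immediates -8 and -4 in the expected disassembly follow from the x86-64 TLS layout, where the thread pointer sits just past the end of the executable's TLS block. The sketch below illustrates that arithmetic in the spirit of the "SA - Out<ELF64LE>::TlsPhdr->p_memsz" expression quoted in the review. tpOffset is a hypothetical helper, and it assumes the symbol offset is relative to the start of the TLS segment and that the segment size is already a multiple of its alignment.

```cpp
#include <cassert>
#include <cstdint>

// SymOffset: offset of the symbol inside the TLS segment.
// TlsMemSize: in-memory size of the TLS segment (p_memsz), assumed to be
// already rounded up to the segment alignment.
inline int64_t tpOffset(uint64_t SymOffset, uint64_t TlsMemSize) {
  assert(SymOffset <= TlsMemSize && "symbol must lie inside the TLS segment");
  // The thread pointer points at the end of the block, so offsets are
  // zero or negative.
  return static_cast<int64_t>(SymOffset) - static_cast<int64_t>(TlsMemSize);
}
```

In the test, tls0 and tls1 are two 4-byte objects in .tbss, so the segment is 8 bytes: tls0 at offset 0 maps to -8 and tls1 at offset 4 maps to -4.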

This is an archive of the discontinued LLVM Phabricator instance.

[ELF2] - Optimization for R_X86_64_GOTTPOFF relocation.
Closed, Public
