ELF Vs Hunk

bloodline · 05 March 2017, 14:07

I'm quite familiar with the Hunk format. It's nice and straightforward to parse from a file into memory segments:

1. Check for a hunk signature 1011.
2. Scan the hunk table and allocate required memory.
3. Identify hunk type (only care about CODE, DATA or BSS).
4. If hunk type is BSS, skip to step 7.
5. Copy hunk data into the preallocated memory.
6. Check for RELOC information at the end of the hunk:
If present, add the target hunk's load address to the longword at each stated offset.
7. Advance to the next hunk and jump back to step 3, until there are no more hunks.
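
The steps above can be sketched roughly as follows. This is only a minimal Python illustration, not a real loader: the function name `load_hunks` and the `bases` list (pretend load addresses, one per hunk) are made up for the example, it assumes the first hunk number in the header is 0, and it ignores hunk types such as SYMBOL and DEBUG.

```python
import struct

HUNK_CODE, HUNK_DATA, HUNK_BSS = 0x3E9, 0x3EA, 0x3EB
HUNK_RELOC32, HUNK_END, HUNK_HEADER = 0x3EC, 0x3F2, 0x3F3  # 0x3F3 == 1011
SIZE_MASK = 0x3FFFFFFF  # the top two bits carry memory-type flags


def load_hunks(image, bases):
    """Return one bytearray per hunk; bases[i] is the assumed load address of hunk i."""
    pos = 0

    def u32():
        nonlocal pos
        (value,) = struct.unpack_from(">I", image, pos)  # hunk files are big-endian
        pos += 4
        return value

    # Step 1: signature.
    if u32() != HUNK_HEADER:
        raise ValueError("not a hunk executable")
    # Resident library name list: length-prefixed strings, ended by a 0 length.
    while (name_longs := u32()) != 0:
        pos += 4 * name_longs
    # Step 2: hunk table; allocate zero-filled memory (like MEMF_CLEAR).
    _table_size, first, last = u32(), u32(), u32()
    segments = [bytearray((u32() & SIZE_MASK) * 4) for _ in range(last - first + 1)]

    current = 0
    while current < len(segments) and pos < len(image):
        hunk_type = u32() & SIZE_MASK  # Step 3
        segment = segments[current]
        if hunk_type in (HUNK_CODE, HUNK_DATA):
            nbytes = (u32() & SIZE_MASK) * 4
            if nbytes > len(segment) or pos + nbytes > len(image):
                raise ValueError("hunk data does not fit")
            segment[:nbytes] = image[pos:pos + nbytes]  # Step 5
            pos += nbytes
        elif hunk_type == HUNK_BSS:
            u32()  # Step 4: only a size follows, the memory is already zeroed
        elif hunk_type == HUNK_RELOC32:  # Step 6
            while (count := u32()) != 0:
                target = u32()  # hunk number; assumes the first hunk is number 0
                for _ in range(count):
                    offset = u32()
                    (value,) = struct.unpack_from(">I", segment, offset)
                    patched = (value + bases[target]) & 0xFFFFFFFF
                    struct.pack_into(">I", segment, offset, patched)
        elif hunk_type == HUNK_END:
            current += 1  # Step 7
        else:
            raise ValueError(f"unhandled hunk type {hunk_type:#x}")
    return segments
```

Note that the relocation step adds the target hunk's base to whatever is already stored at the offset, so the value in the file acts as the addend.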

It's a nice executable file format.

Now I am exploring the ELF file format. It seems to be well structured, with plenty of extra information in the header for supporting different systems. But I have a few questions which I can't find decent answers to:

1. There appear to be two distinct types of file: an executable object and a relocatable object. It is my understanding that "Executable Objects" assume the code will be loaded to a fixed address, which is common on UNIX machines with virtual memory, and that Relocatable objects are far more like Amiga Hunk files, where the code is supplied with relocation information.

If the above assumption is true then the "Executable Object" is of no interest to me, and I need to spend my time learning how the ELF stores its relocation information.

2. Are there any good resources for learning about how ELF relocation is done?
I'm currently reading:
http://wiki.osdev.org/ELF_Tutorial

But any other resource would be good!
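
On the first question, the two kinds of file can be told apart from the `e_type` field of the ELF header. A small Python sketch, assuming only the standard header layout (the 16-byte `e_ident` followed by the 16-bit `e_type`); the function name `elf_file_type` is invented for this example:

```python
import struct

# e_type values defined by the ELF specification.
ET_NAMES = {0: "ET_NONE", 1: "ET_REL", 2: "ET_EXEC", 3: "ET_DYN", 4: "ET_CORE"}


def elf_file_type(header):
    """Return the name of the e_type field from the first bytes of an ELF file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[5] (EI_DATA): 1 = little-endian, 2 = big-endian.
    endian = "<" if header[5] == 1 else ">"
    # e_type is a 16-bit field directly after the 16-byte e_ident array.
    (e_type,) = struct.unpack_from(endian + "H", header, 16)
    return ET_NAMES.get(e_type, hex(e_type))
```

Relocatable objects report `ET_REL` and fixed-address executables report `ET_EXEC`.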

Thanks

ross · 05 March 2017, 15:42

Quote:

Originally Posted by bloodline

I'm quite familiar with the Hunk format. It's nice and straightforward to parse from a file into memory segments:

1. Check for a hunk signature 1011.
2. Scan the hunk table and allocate required memory.
3. Identify hunk type (only care about CODE, DATA or BSS).
4. If hunk type is BSS, skip to step 7.
5. Copy hunk data into the preallocated memory.
6. Check for RELOC information at the end of the hunk:
If present, add the target hunk's load address to the longword at each stated offset.
7. Advance to the next hunk and jump back to step 3, until there are no more hunks.

There is an extension to point 2 or 5, depending on whether you allocate with calloc or malloc (on the Amiga, with or without MEMF_CLEAR):
5b. Fill the remaining allocated memory with 0.
I'm making an offload system loader and am facing the problem of extra space in the CODE or DATA section.

No help for the ELF, sorry.

bloodline · 05 March 2017, 16:07

Quote:

Originally Posted by ross

There is an extension to point 2 or 5, depending on whether you allocate with calloc or malloc (on the Amiga, with or without MEMF_CLEAR):
5b. Fill the remaining allocated memory with 0.
I'm making an offload system loader and am facing the problem of extra space in the CODE or DATA section.

Since we are working on mega fast CPUs now, I always allocate with MEMF_CLEAR... it really doesn't take any extra time at all to just zero the memory.

Quote:

No help for the ELF, sorry.

No problem, it's going to take me a while to figure this all out. I just want to be able to load precompiled code to any arbitrary memory location... and ELF is a format supported by pretty much every linker available now, so that's the obvious format to use.

ross · 05 March 2017, 16:33

Quote:

Originally Posted by bloodline

Since we are working on mega fast CPUs now, I always allocate with MEMF_CLEAR... it really doesn't take any extra time at all to just zero the memory.

Well, I'm an old man from the 80s; zeroing 400KB of ChipMem on an A500 wasn't fast.
But you are right, that fact is of little use today.

phx · 05 March 2017, 20:14

Quote:

Originally Posted by bloodline

1. There appear to be two distinct types of file: an executable object and a relocatable object. It is my understanding that "Executable Objects" assume the code will be loaded to a fixed address, which is common on UNIX machines with virtual memory, and that Relocatable objects are far more like Amiga Hunk files, where the code is supplied with relocation information.

Correct. All common operating systems nowadays use the MMU to give each process its private address space with fixed addresses, including memory protection. There are no relocations needed anymore in this case.

Quote:

If the above assumption is true then the "Executable Object" is of no interest to me, and I need to spend my time learning how the ELF stores its relocation information.

MorphOS, PowerUp (and AFAIK also AROS) use these ELF object files as executables, for the reasons you already mentioned. Only OS4 has real ELF executables, but with relocations still included. You can tell the linker to leave relocations in the output, even when generating an ELF executable. This is the -q option with GNU-ld and with vlink.

Quote:

2. Are there any good resources for learning about how ELF relocation is done?

When writing vlink, I was just using the ELF header files, and experimenting a lot with linking and analysing ELF programs under Unix.

alpine9000 · 05 March 2017, 22:33

Quote:

Originally Posted by bloodline

Since we are working on mega fast CPUs now, I always allocate with MEMF_CLEAR... it really doesn't take any extra time at all to just zero the memory.

No problem, it's going to take me a while to figure this all out. I just want to be able to load precompiled code to any arbitrary memory location... and ELF is a format supported by pretty much every linker available now, so that's the obvious format to use.

This is JavaScript code I wrote that loads and relocates an executable to any specified location. It's pretty simple. There is also an ELF spec document somewhere.

https://github.com/alpine9000/BitMac...h-pages/elf.js

bloodline · 05 March 2017, 22:57

Quote:

Originally Posted by phx

Correct. All common operating systems nowadays use the MMU to give each process its private address space with fixed addresses, including memory protection. There are no relocations needed anymore in this case.

I'm currently playing around with bare metal programming on the Raspberry Pi; I'm very far from even looking at the MMU.

Quote:

MorphOS, PowerUp (and AFAIK also AROS) use these ELF object files as executables, for the reasons you already mentioned. Only OS4 has real ELF executables, but with relocations still included. You can tell the linker to leave relocations in the output, even when generating an ELF executable. This is the -q option with GNU-ld and with vlink.

I'm not sure I see the advantage of executables?

Quote:

When writing vlink, I was just using the ELF header files, and experimenting a lot with linking and analysing ELF programs under Unix.

I am also compiling small test programs (using arm-none-eabi-gcc), running them through objdump to see what I get, and then trying to get my parser to find the same.

bloodline · 05 March 2017, 23:01

Quote:

Originally Posted by alpine9000

This is JavaScript code I wrote that loads and relocates an executable to any specified location. It's pretty simple. There is also an ELF spec document somewhere.

https://github.com/alpine9000/BitMac...h-pages/elf.js

Many thanks for the link, BitMachine looks like a fun project! I have a Synthesizer which uses the SuperH CPU.

phx · 07 March 2017, 15:41

Quote:

Originally Posted by bloodline

I'm not sure I see the advantage of executables?

The sections in ELF executables are grouped as segments, which are usually page-aligned for the MMU to support memory protection (the 1st segment is usually code and read-only data, which is write-protected), although this is probably irrelevant for you.

Segments also have file size and memory size in their PHDR, which allows techniques like stripping uninitialized data from the end of your code or data sections. This is not possible in ELF objects.
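
To illustrate the file size versus memory size point, here is a rough Python sketch of loading one segment. It assumes the 32-bit program header layout (eight 32-bit fields) and a little-endian file by default; `read_phdr32` and `load_segment` are names invented for this example.

```python
import struct
from collections import namedtuple

PT_LOAD = 1  # program header type for a loadable segment

# Field order of a 32-bit ELF program header (Elf32_Phdr).
Phdr = namedtuple(
    "Phdr", "p_type p_offset p_vaddr p_paddr p_filesz p_memsz p_flags p_align"
)


def read_phdr32(image, offset, endian="<"):
    """Decode one 32-byte program header starting at the given file offset."""
    return Phdr(*struct.unpack_from(endian + "8I", image, offset))


def load_segment(image, phdr):
    """Return p_memsz bytes: p_filesz bytes from the file, the rest zero."""
    if phdr.p_type != PT_LOAD:
        raise ValueError("not a loadable segment")
    if phdr.p_filesz > phdr.p_memsz:
        raise ValueError("file size exceeds memory size")
    end = phdr.p_offset + phdr.p_filesz
    if end > len(image):
        raise ValueError("segment extends past end of file")
    memory = bytearray(phdr.p_memsz)  # zero-filled, which covers the bss part
    memory[:phdr.p_filesz] = image[phdr.p_offset:end]
    return memory
```

The gap between `p_filesz` and `p_memsz` is where stripped uninitialized data (typically bss) ends up, so it never has to be stored in the file.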

Also, with a standard GNU-ld linker you cannot detect unresolved symbol references when generating another object file as output (-r option), because they are allowed in objects.

bloodline · 22 March 2017, 11:42

So, after some playing with compiler settings, I have found that (as phx said) the segments are to be found in the Program header. My ELFs seem to always have two segments:

1. A text (code) and rodata (constants) segment.
2. A data and bss segment.

I can allocate memory and load these as I would with a hunk executable... But still I was unable to work out how to relocate the code/data to point to the correct memory...

After much playing around with gcc settings, I found that I needed to pass --emit-relocs to the linker (via the -Wl option) to get it to generate relocation information.

This still wasn't very useful as the code was still using absolute references... finally passing the -fPIC option to gcc forced it to generate position independent code (yes, I appreciate that seems obvious now...) with a table of pointers at the end of the text section to point to the variables in the data segment... finally I'm getting somewhere!

This also adds a Global Offset Table... which I'm not sure what to do with yet...

A positive side effect: I'm becoming increasingly familiar with ARM assembler.

phx · 22 March 2017, 23:51

The combination of ELF and ARM makes this increasingly off-topic...

Quote:

Originally Posted by bloodline

After much playing around with gcc settings, I found that I needed to pass --emit-relocs to the linker (via the -Wl option) to get it to generate relocation information.

I mentioned that already two replies earlier (-q and --emit-relocs are the same options).

Quote:

This still wasn't very useful as the code was still using absolute references...

Sure. ELF executables are always absolute, even when you preserve the reloc information.

Quote:

finally passing the -fPIC option to gcc forced it to generate position independent code (yes, I appreciate that seems obvious now...) with a table of pointers at the end of the text section to point to the variables in the data segment... finally I'm getting somewhere!

This also adds a Global Offset Table... which I'm not sure what to do with yet...

I'm not that familiar with ARM, but the "table of pointers" you see is probably the GOT. The GOT is usually referenced via a fixed base register (similar to small-data mode on the Amiga). The GOT works nearly the same as the pointer table in PowerOpen (called TOC-section), which is used for WarpOS. But PowerOpen can also store data objects in the TOC, when the object is not bigger than a pointer (saves an indirection).

Coming back to your relocation problem. With ELF there are two types of relocation entries: with and without addends. Reloc tables named ".rela.secname" have the addend included; those named ".rel.secname" do not.

In the latter case the addend must be extracted from the location in the section before the relocation can take place. This is the same method as with the Amiga hunk format.

Easier for you (and common for PPC, but not for x86, not sure about ARM) would be when the addend is included. Then you just have to take the new load address of the target section, add the addend and overwrite whatever there was in the location to relocate.

Otherwise you have to read the location first and subtract the relocation section's absolute base address, then add the real address again.
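
The two cases can be sketched in Python as below. This is a simplified illustration for a plain 32-bit absolute relocation only: it skips decoding `r_info` and the symbol table entirely, and the function names and the `linked_base` / `load_base` / `target_base` parameters are assumptions made for the example, not anything defined by ELF.

```python
import struct


def apply_rela(section, r_offset, r_addend, target_base, endian="<"):
    """With addend: new target base plus addend overwrites the location."""
    value = (target_base + r_addend) & 0xFFFFFFFF
    struct.pack_into(endian + "I", section, r_offset, value)


def apply_rel(section, r_offset, linked_base, load_base, endian="<"):
    """Without addend: rebase the absolute address already stored in place."""
    (stored,) = struct.unpack_from(endian + "I", section, r_offset)
    # Subtract the address the target was linked at, add where it really is.
    value = (stored - linked_base + load_base) & 0xFFFFFFFF
    struct.pack_into(endian + "I", section, r_offset, value)
```

For example, a word holding 0x8010 that refers to a section linked at 0x8000 but actually loaded at 0x40000 becomes 0x40010 with `apply_rel`; `apply_rela` reaches the same value from an addend of 0x10 without reading the old contents.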

bloodline · 23 March 2017, 16:32

Quote:

Originally Posted by phx

The combination of ELF and ARM makes this increasingly off-topic...

Indeed, I would be happy to move the topic if anyone is annoyed, but there is no other forum where I have the opportunity to converse with coders like yourself, who are happy to indulge my naive understanding of C compilers/linkers.

I would be happy to simply use the HUNK format, but ld only wants to produce ELF files... and since I want to support both ARM and x86 with my code, ELF seems like a good choice.

Quote:

I mentioned that already two replies earlier (-q and --emit-relocs are the same options).

Yes, I was confused by the way gcc rejected the -q option, only to (eventually) discover that I need to pass the option through to the linker with -Wl.

Quote:

Sure. ELF executables are always absolute, even when you preserve the reloc information.

I'm not that familiar with ARM, but the "table of pointers" you see is probably the GOT. The GOT is usually referenced via a fixed base register (similar to small-data mode on the Amiga). The GOT works nearly the same as the pointer table in PowerOpen (called TOC-section), which is used for WarpOS. But PowerOpen can also store data objects in the TOC, when the object is not bigger than a pointer (saves an indirection).

The table of pointers includes a pointer to the GOT... either way, it is referenced by the reloc table, so I can fix the pointers after loading.

Quote:

Coming back to your relocation problem. With ELF there are two types of relocation entries: with and without addends. Reloc tables named ".rela.secname" have the addend included; those named ".rel.secname" do not.

I haven't met .rela sections yet, but I see them in the ELF spec, so I appreciate the heads up!

Quote:

In the latter case the addend must be extracted from the location in the section before the relocation can take place. This is the same method as with the Amiga hunk format.

Easier for you (and common for PPC, but not for x86, not sure about ARM) would be when the addend is included. Then you just have to take the new load address of the target section, add the addend and overwrite whatever there was in the location to relocate.

Otherwise you have to read the location first and subtract the relocation section's absolute base address, then add the real address again.

Good call!
