How is a program born?

This article is a hands-on walk through building an embedded program with AVR Studio 4.
I’ll start with a slightly sneaky but educational example. Suppose we have a project that contains main.c, delay.c and delay.h:
main.c

 

#include <avr/io.h>
#include <stdint.h>
#include "delay.h"
#define OUTPUT1 (1U << 1)
#define OUTPUT2 (1U << 2)
#define OUTPUT3 (1U << 3)
int16_t g_var;
uint32_t g_var2 = 5;
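int main(void)
{
    DDRB = OUTPUT1 | OUTPUT2 | OUTPUT3;       // configure the three pins as outputs
    PORTB &= ~(OUTPUT1 | OUTPUT2 | OUTPUT3);  // start with all three pins low
    while (1)
    {
        PORTB ^= OUTPUT1 | OUTPUT2 | OUTPUT3; // toggle the three pins
        delay(30000); // must fit in AVR's 16-bit int (max 32767); 500000 would not
    }
}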

 delay.h

#ifndef __DELAY_H__
#define __DELAY_H__
void delay(int volatile counts);
#endif // __DELAY_H__

delay.c

#include "delay.h"
void delay(int volatile counts)
{
    // busy-wait: volatile keeps the compiler from optimizing the loop away
    while (counts > 0)
    {
        --counts;
    }
}
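
A side note on the delay count, since it is easy to trip over on this target: avr-gcc uses a 16-bit int, so an int parameter can hold at most 32767, and a larger constant is reduced modulo 2^16 when it is converted. The sketch below is only an illustration that runs on any host and is not part of the project; the helper name wrap_to_int16 is made up for it.

```c
#include <stdint.h>

/* Hypothetical helper: returns the value that a 16-bit two's-complement
   int (as used by avr-gcc) would hold after being assigned `value`. */
int32_t wrap_to_int16(int32_t value)
{
    /* Keep only the low 16 bits; unsigned conversion is well defined. */
    uint16_t low = (uint16_t)(uint32_t)value;

    /* Reinterpret the 16-bit pattern as a signed two's-complement number. */
    if (low >= 0x8000u) {
        return (int32_t)low - 65536;
    }
    return (int32_t)low;
}
```

With it, wrap_to_int16(500000) comes out as -24288, so a loop guarded by counts > 0 would fall through immediately, while wrap_to_int16(30000) stays 30000.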

Now, you need to take a deeper look at the embedded software build process.

Cross-development and native development

[Figure: the main steps of building an embedded project]

The diagram above shows the main steps of building your embedded project. First, however, I’d like to make sure you understand that all these steps are performed by tools such as AVR Studio 4 on a desktop computer, called the host machine, even though the produced program is for a completely different computer, such as your AVR kit or other specific embedded hardware, called the target machine.
This aspect, called cross-development, is very characteristic of embedded systems. It just doesn’t make sense to run the compiler and linker on the small embedded target machine.
This is in stark contrast with software development for desktop computers, where you typically both develop and run the software on the same machine (so the host is also the target). This type of software development is often called “native development”.

But going back to the embedded build process: the source files, such as main.c and delay.c, are fed to the C-language compiler, which turns them into the so-called object files main.o and delay.o.

Next, all object files from the project, together with any standard and other libraries, as well as the linker script, are fed to the linker, which combines them into the final program.

What is an object file?

Simply put, an object file contains “relocatable” machine code that is not directly executable, because it is not yet committed to any specific address in memory. It is the job of the linker to combine all the objects, resolve the cross-module references, and fix the addresses. I could leave it at that, but this is a hands-on article, so I think it is worthwhile to see how object files are organized and what “relocatable code” really means.
In your project, object files are located in the sub-directory Debug\Obj. There you can find delay.o and main.o.
If you open one of them in a text editor, you will see mostly garbage, because it is a binary file.


However, even viewed as text, you should recognize that the file appears to contain distinct sections. Also, the first few printable characters of the file (right after a leading 0x7F byte) spell out ‘E’, ‘L’, ‘F’. This is the indicator of the ELF file format, which stands for “Executable and Linkable Format”, also known as “Extensible Linking Format”. ELF is not the only format for object files, but it is one of the most popular formats used by modern development tools. So, I think it is a good idea for you to get somewhat familiar with ELF files and the tools that allow you to inspect them.
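
If you would rather check the signature in code than by eye, a minimal sketch could look like this. It relies only on the fact that an ELF file starts with the four bytes 0x7F, ‘E’, ‘L’, ‘F’; the function names has_elf_magic and file_is_elf are made up for this illustration and do not come from any toolchain.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if the buffer starts with the ELF magic number
   (0x7F followed by 'E', 'L', 'F'), 0 otherwise. */
int has_elf_magic(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7F, 'E', 'L', 'F' };

    return len >= sizeof magic && memcmp(buf, magic, sizeof magic) == 0;
}

/* Reads the first four bytes of `path` and checks them.
   Returns 1 for an ELF file, 0 for anything else,
   and -1 if the file cannot be opened. */
int file_is_elf(const char *path)
{
    unsigned char head[4];
    size_t got;
    FILE *f = fopen(path, "rb");

    if (f == NULL) {
        return -1;
    }
    got = fread(head, 1, sizeof head, f);
    fclose(f);
    return has_elf_magic(head, got);
}
```

Calling file_is_elf on main.o from a small main function should then report 1, and the same goes for the final image produced by the linker.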
The WinAVR toolchain, for example, comes with a command-line tool called avr-objdump.exe.
How do you use it?
Go to your $project_file/debug directory, open the command prompt there by typing cmd,
and run this command:
avr-objdump -x main.o > main.txt
This writes the contents of the binary ELF file to main.txt as human-readable text.
What are we waiting for? Let’s open it.

As you can see, the ELF file indeed contains several sections, such as .data, .bss and .text, for initialized data, uninitialized data, and code, respectively.
The remaining sections hold the symbolic information for the linker, and a lot of sections contain debug information for the debugger. At this point, it should become clear that you should never use the size of the object file to assess the code size generated from a given .c source file, because the actual machine code is only a small part among many other parts in an object file. The only reliable source of information about the code size of various modules is the linker map file (main.map), which you will find under Other Files in AVR Studio 4.
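
To connect the section names to source code, here is a small sketch that reuses the two globals from main.c. The function sum_globals is made up for this illustration, and the comments describe where such definitions typically end up in the final image with a GCC-style toolchain.

```c
#include <stdint.h>

int16_t  g_var;       /* no initializer: ends up in .bss and reads as 0 */
uint32_t g_var2 = 5;  /* explicit initializer: ends up in .data */

/* The machine code of a function ends up in .text. */
uint32_t sum_globals(void)
{
    return (uint32_t)g_var + g_var2;
}
```

The snippet compiles on the host as well as for AVR. Because g_var starts out as 0, sum_globals() returns 5 until one of the variables is changed.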

Now, let’s take a look at the final image main.elf produced by the linker. Again, if you quickly open it as text, you can see the ‘ELF’ signature at the top, so this is also an ELF file. This means that you can also dump the content of the main.elf file using avr-objdump.exe.
So now let’s open the human-readable dump of the final image and compare it with the dump of the main.o object file by putting them side by side.


Scroll through the dump of the final image until you find main under SYMBOL TABLE.


When you compare the ELF dumps, you can see that most instructions are exactly the same in main.o as they are in the final image main.elf. However, interestingly, some instructions have a different encoding. For example, the instruction by which main calls the delay function is encoded as 0x00000000 in the object file and 0x0000002a in the final image. What’s going on here?
Well, a call instruction has to encode where to branch to. Depending on the instruction the compiler picks, that is either an absolute address (AVR’s call) or a signed offset that is added to the Program Counter (AVR’s PC-relative rcall). Either way, the object file does not know where the delay function will end up in memory, so the instruction in the object file contains a placeholder value of 0. This value is then fixed by the linker, after the linker decides where the delay function will be in memory.
But it does not end here. The linker needs to fix not just the addresses of functions, but of variables as well. For example, the actual addresses of the variables g_var and g_var2 are not known at compile time either. The linker fills them in after figuring out where to put g_var, g_var2 and so on in the data section.
You can understand now that the linker must be specific to the target processor, because it must “know” the instructions and how to fix them at the binary opcode level. This means, for example, that a linker designed for the x86 processor of your PC cannot be used to link programs for AVR MCUs, even though all these tools might be using the ELF file format. You need both a compiler and a linker for the same processor.
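
To make “fixing the addresses” a bit more concrete, here is a toy model of what the linker does with such a placeholder. It is a sketch under heavy assumptions: the struct reloc and struct symbol types and the apply_relocs function are invented for illustration, code is treated as a plain array of 16-bit words, and the real ELF relocation records and AVR instruction encodings are more involved.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy relocation record: "the word at index `where` must receive the
   final address of `symbol`". Real ELF relocations carry more detail. */
struct reloc {
    size_t      where;
    const char *symbol;
};

/* Toy symbol together with the address the linker has chosen for it. */
struct symbol {
    const char *name;
    uint16_t    address;
};

/* Patches every relocated word in `code`. Returns the number of
   relocations whose symbol could not be found. */
size_t apply_relocs(uint16_t *code,
                    const struct reloc *relocs, size_t n_relocs,
                    const struct symbol *syms, size_t n_syms)
{
    size_t unresolved = 0;

    for (size_t i = 0; i < n_relocs; ++i) {
        size_t j;

        for (j = 0; j < n_syms; ++j) {
            if (strcmp(relocs[i].symbol, syms[j].name) == 0) {
                /* Overwrite the placeholder with the real address. */
                code[relocs[i].where] = syms[j].address;
                break;
            }
        }
        if (j == n_syms) {
            ++unresolved; /* nobody defines this symbol */
        }
    }
    return unresolved;
}
```

In this model, an object whose word 1 holds the placeholder 0, together with a relocation { 1, "delay" }, ends up with 0x002A in that word once the symbol table says that delay lives at 0x002A. All other words stay untouched, which matches what you see when comparing the two dumps.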

The Linker

Let’s talk a bit about how the linker resolves the cross-module references to functions and global variables.
First, you need to realize that every object provides symbols that it exports, meaning that they are defined in the object and can be used by other objects. For example, main.o exports the symbols g_var, g_var2, and main. An object might also have imported symbols, that is, symbols that it needs but does not define. For example, main.o imports the function delay, because it calls it, yet does not define it.
Now, resolving inter-dependencies means that the linker must match every imported reference to an exported symbol. As the linker works on one object file at a time, it internally uses two lists: a list of exported symbols, and a list of undefined symbols from all the objects encountered so far. At the very beginning of the linking process of your project, the exported list is empty and the undefined list contains only the symbols the toolchain itself asks for before any of your objects are processed.
The general rule is that all object files directly included in the project, such as main.o and delay.o, are always linked into the final image. Because of this, the order of linking does not matter for those files, but let’s assume that the first will be main.o. As the linker processes this object file, it adds all the exported symbols to the exported list. It also takes every symbol imported by the object file, such as delay, and tries to find it in the exported list. If the symbol is not found, as in this case, the linker adds it to the undefined list. So after processing the main.o object, the undefined list contains the __vector_* symbols (where * is a number, e.g. __vector_1, __vector_2 and so on) and delay.
Next is the delay.o object file. This file exports delay, which is added to the exported list and at the same time removed from the undefined list, because it is now known and resolved. The delay.o file does not import anything else, so the undefined list contains only the __vector_* symbols.
At this point there are no more object files in your project, yet the undefined list is still not empty, so the linker proceeds to look through the standard libraries. Libraries are simply bundled collections of object files. However, the linking rules are different for libraries than for objects included directly. The critical difference is that objects from a library are added to the final image only if they contain symbols in the undefined list. Otherwise they are not added at all.
It turns out that the __vector_* symbols are exported by the object file crtm32.o in the WinAVR folder avr5.
How do I know this? Well, you can find it inside the linker map file by searching for “vector”. You find that symbol in the Entry List section, where you can see that it comes from the crtm32.o object.
So at this point, the linker applies another rule for linking libraries, which is to keep searching all object files in the current library for the undefined symbols.
This object has some more imported symbols of its own, such as .init0 through .init9, and some others.

This fine granularity of objects ensures that only the stuff actually needed is taken from the library. If, on the other hand, the objects contained multiple functions and variables, all of it would be linked in, and you would bloat the final image unnecessarily.

So, if you ever develop your own libraries, remember to make objects small and nimble. Ideally, you should define only one function or one global variable per module.
But going back to your project, the linker eventually resolves all the references from the standard libraries, the undefined list becomes empty and the linking process ends.

Another possible outcome is that the linker runs out of objects and libraries, yet the undefined list still contains some symbols. In this case, you get a linker error and a dump of all the unresolved references still present in the undefined list. To fix such errors you need to add an object or a library that defines the missing symbols.
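
The bookkeeping described in this section can be sketched as a small toy model. Everything in it is an assumption made for illustration: the struct object and struct linker types, the fixed list sizes, and the functions link_object and link_library are invented here, and a real linker tracks far more than symbol names.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SYMS    32 /* capacity of each list in this toy model */
#define MAX_PER_OBJ 4  /* symbols per list in one toy object */

/* Toy object file: only the names it exports and imports.
   Unused slots are NULL. */
struct object {
    const char *name;
    const char *exports[MAX_PER_OBJ];
    const char *imports[MAX_PER_OBJ];
};

/* Linker state: the two lists described in the text. */
struct linker {
    const char *exported[MAX_SYMS];
    size_t      n_exported;
    const char *undefined[MAX_SYMS];
    size_t      n_undefined;
};

/* Returns the index of `sym` in `list`, or -1 if it is not there. */
static int find(const char *const *list, size_t n, const char *sym)
{
    for (size_t i = 0; i < n; ++i) {
        if (strcmp(list[i], sym) == 0) {
            return (int)i;
        }
    }
    return -1;
}

/* Links one object unconditionally, as for files listed in the project. */
void link_object(struct linker *ld, const struct object *obj)
{
    for (size_t i = 0; i < MAX_PER_OBJ && obj->exports[i] != NULL; ++i) {
        int u = find(ld->undefined, ld->n_undefined, obj->exports[i]);
        if (u >= 0) {
            /* Resolved: overwrite the entry with the last one. */
            ld->n_undefined--;
            ld->undefined[u] = ld->undefined[ld->n_undefined];
        }
        if (ld->n_exported < MAX_SYMS) {
            ld->exported[ld->n_exported++] = obj->exports[i];
        }
    }
    for (size_t i = 0; i < MAX_PER_OBJ && obj->imports[i] != NULL; ++i) {
        const char *sym = obj->imports[i];
        if (find(ld->exported, ld->n_exported, sym) < 0 &&
            find(ld->undefined, ld->n_undefined, sym) < 0 &&
            ld->n_undefined < MAX_SYMS) {
            ld->undefined[ld->n_undefined++] = sym; /* not defined yet */
        }
    }
}

/* Library rule: a member is linked only if it exports a symbol that is
   currently undefined. The library is rescanned until a full pass adds
   nothing. Returns the number of members pulled in. */
size_t link_library(struct linker *ld, const struct object *lib, size_t n)
{
    size_t pulled = 0;
    int progress = 1;

    while (progress) {
        progress = 0;
        for (size_t m = 0; m < n; ++m) {
            for (size_t i = 0; i < MAX_PER_OBJ && lib[m].exports[i] != NULL; ++i) {
                if (find(ld->undefined, ld->n_undefined, lib[m].exports[i]) >= 0) {
                    link_object(ld, &lib[m]);
                    ++pulled;
                    progress = 1;
                    break;
                }
            }
        }
    }
    return pulled;
}
```

Feeding a zero-initialized struct linker first a main.o that exports g_var, g_var2 and main and imports delay, and then a delay.o that exports delay, leaves the undefined list empty. A library member that defines nothing on the undefined list is never pulled in, and if n_undefined is still greater than zero after all libraries have been tried, you are in the linker error case described above.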
