The hello world program is one of the first programs we learn to write in any programming language. It's nice and short. Here is a hello world program written in C.
Since it's so short, it should be a piece of cake to explain what is going on under the hood. The first thing to look at is what happens when we compile and link it.
gcc --save-temps hello.c -o hello
I pass --save-temps to keep the hello.s file containing the assembler code. This is, more or less, the assembler code I get.
Looking at the assembler code, we can see that it's not actually calling a function called printf; it's calling puts instead. The puts function is also declared in the stdio.h header and prints a string followed by a trailing newline. So now we know which function our program is actually calling. But where is this function implemented?
To find out which library provides this function, I will use ldd, which prints shared library dependencies, and nm, which lists symbols from an object file.
$ ldd hello
libc.so.6 => /lib64/libc.so.6 (0x0000003e4da00000)
$ nm /lib64/libc.so.6 | grep " puts"
0000003e4da6dd50 W puts
The function is provided by the C library, libc, located at /lib64/libc.so.6 on my system (Fedora 19). The /lib64 directory is a symbolic link to /usr/lib64, and /usr/lib64/libc.so.6 is a symbolic link to /usr/lib64/libc-2.17.so, so that's the actual file containing all the functions. We can check the version by running the libc .so file as a program.
$ /usr/lib64/libc-2.17.so
GNU C Library (GNU libc) stable release version 2.17, by Roland McGrath et al.
...
To summarize, our hello program is calling the puts function from glibc version 2.17. The next step is to see what the puts function in glibc-2.17 actually does.
The glibc codebase is quite hard to navigate due to the massive use of preprocessing macros and code generation scripts. Looking at the codebase we find this entry in the file
weak_alias (_IO_puts, puts)
In glibc speak, weak_alias (_IO_puts, puts) means that anyone who calls puts will actually call _IO_puts instead. So we need to find a function called _IO_puts, which happens to be located in the same file. The guts of the function look like this.
I have cut away all the nastiness in the rest of the function that is not important to our analysis. This _IO_sputn is the next step in the hello world chain. Its definition is a macro in libio/libioP.h, which in turn uses another macro, and so on. In the end we get a tree of macros like this:
What on earth is going on here? It might be clearer when we flatten all this macro stuff into the actual code that is being used.
This is so unclear that it hurts my eyes, so I'm just going to explain it. glibc is using a jump table to invoke a function. In this case the jump table is in the object called _IO_2_1_stdout_ and the function is called __xsputn. If we look in libio/libio.h there are declarations that match our analysis.
In libio/libioP.h we find the declaration of the table and its entries.
If we dig a bit deeper we can also find that the jump table of _IO_2_1_stdout_ is initialized in the file libio/stdfiles.c and the actual jump table which is used is declared in
So this means that when we use the jump table of the io object representing the stdout file, we will eventually call the function _IO_new_file_xsputn. I think we're getting somewhere here, don't you? This function moves memory around into buffers and calls a function, new_do_write, when it's ready to write something from the buffer. This is what new_do_write looks like.
As expected, new_do_write uses a macro to do the actual function call. It calls a function through the jump table, as we have already seen with __xsputn, and this time we are calling the __write function. When we look at the jump table for a file, we can see that a call to __write will actually call _IO_new_file_write, so this is the function that should receive the call. Let's take a look at that function.
Finally, a function that calls something that does not start with underscores. The write function is something we recognize from the unistd.h header. It is the basic way of writing bytes to a file descriptor. The write function is something that glibc itself implements, so the code that receives the function call must live somewhere in the glibc source code.
After searching the glibc source for the write function, I found it in sysdeps/unix/syscalls.list. Most of the system calls that glibc implements are generated from files like this. The file contains the name of the function and the arguments it expects. The actual body of the function is generated from system call templates.
# File name  Caller  Syscall name  Args    Strong name   Weak names
...
write        -       write         Ci:ibn  __libc_write  __write write
...
So when the glibc code calls write (or __libc_write or __write), it makes a write syscall and we enter kernel space. The kernel code is quite nice compared to glibc. The entry point for the write system call is in `fs/read_write.c`.
The write system call first looks up a structure that describes the file, and then calls another function in the Linux virtual filesystem (VFS) called vfs_write. What is the file we are using in this case? Well, since we are writing to stdout, this is the file representing stdout.
We can see that this function delegates the work to the write function for the specific file. In Linux this is often implemented in the driver code, so we need to backtrack to find out which driver we are contacting in our case.
For my experiments I am using the Fedora 19 distro with the Gnome 3 desktop. This means that my default terminal is gnome-terminal. So I fire up gnome-terminal and do this.
~$ tty
/dev/pts/0
~$ ls -l /proc/self/fd
total 0
lrwx------ 1 kos kos 64 okt. 15 06:37 0 -> /dev/pts/0
lrwx------ 1 kos kos 64 okt. 15 06:37 1 -> /dev/pts/0
lrwx------ 1 kos kos 64 okt. 15 06:37 2 -> /dev/pts/0
~$ ls -la /dev/pts
total 0
drwxr-xr-x  2 root root    0 okt. 10 10:14 .
drwxr-xr-x 21 root root 3580 okt. 15 06:21 ..
crw--w----  1 kos  tty  136, 0 okt. 15 06:43 0
c---------  1 root root   5, 2 okt. 10 10:14 ptmx
The tty command prints the filename of the terminal connected to standard input, and as we can see from the proc files, the same terminal is connected to standard output and standard error. The devices in /dev/pts are called pseudoterminals, or more precisely slave pseudoterminals. Whenever a process writes to a slave pseudoterminal, the data ends up in the master pseudoterminal. The master pseudoterminal device is called /dev/ptmx.
We find the device driver for the pseudoterminal in the linux kernel at
Whenever we write to a pts device we will end up in the pty_write function which looks like this.
So, as the comment says, the data will end up in the input queue of the master pseudoterminal. But who is reading from that device?
~$ lsof | grep ptmx
gnome-ter 13177       kos 11u CHR 5,2 0t0 1133 /dev/ptmx
gdbus     13177 13178 kos 11u CHR 5,2 0t0 1133 /dev/ptmx
dconf     13177 13179 kos 11u CHR 5,2 0t0 1133 /dev/ptmx
gmain     13177 13182 kos 11u CHR 5,2 0t0 1133 /dev/ptmx
~$ ps 13177
  PID TTY      STAT   TIME COMMAND
13177 ?        Sl     0:04 /usr/libexec/gnome-terminal-server
gnome-terminal-server is the process that spawns all the gnome-terminals and creates the new pseudoterminals, so it is the one sitting on the master side of the pseudoterminal and will receive our data, which in our case is "Hello World". The gnome-terminal server receives the string and draws it on the screen. I have not analyzed the gnome-terminal side completely due to time constraints :)
The actual path of a "Hello World" printout is:
0. hello: printf("Hello World")
1. glibc: puts()
2. glibc: _IO_puts()
3. glibc: _IO_new_file_xsputn()
4. glibc: new_do_write()
5. glibc: _IO_new_file_write()
6. glibc: syscall write
7. kernel: vfs_write()
8. kernel: pty_write()
9. gnome_terminal: read()
10. gnome_terminal: show to user
This seems a little too much for something as simple as printing a string to the screen. It is a good thing that all of this is only exposed to those who really want to see it.