Introduction to C on GNU/Linux
When working with GNU-based free operating systems, most implementations are based on the original work of earlier UNIX systems. Unix was originally developed at AT&T. One of the key developments that made Unix an early success was the co-development of the C programming language. This article hopes to introduce C to beginning users of GNU/Linux and BSD.
When C was developed, part of its goal was to provide a highly portable syntax which still gave low-level access to the memory and the CPU. The result was a three-tier development system. All C programs are compiled from some text into a binary program. It is the binary which runs on your computer. The program which creates the binary is called a compiler. The compiler on GNU systems is gcc, originally written by Richard Stallman. The compiler parses a text file, or a series of text files, processes all the instructions, and builds a binary program from the instructions. It does this by combining binary code. Some of the code it produces is imported from external libraries. Some of it is new binary code. Patched together, this most often produces a single binary application.
The three-tier system of C includes libraries, source code and header files. Header files tell the compiler where to find code definitions. Sometimes we also need to tell the compiler where to find the libraries that the source code or the headers refer to. While this can seem confusing, as you become familiar with C, it will become more natural. Let's look at a simple example to see how these three tiers interact with each other.
We start by opening a simple text file called prog1.c with the VI editor:
#include <stdio.h>

int main(int argc, char **argv){
    printf("Welcome to NYLXS\n");
    return(1);
}

Exit the file and now run the compiler with the following command:

ruben$: gcc prog1.c
This command starts the compiler and creates a new file called a.out. a.out is the executable program. Run it from the command line:
ruben$: ./a.out
Welcome to NYLXS
ruben$:
We can now examine all three components of our C program.
The first line in our program tells the compiler to look for a file called stdio.h and to bring it into our program. stdio.h is the header file for the main C input and output library. It declares many functions in C, including the printf function. Without this file our compiler cannot find printf. After this line we drop into our own code. In our case we begin with the definition of the main subroutine. All C programs have a subroutine 'main'. Main has a defined prototype:
int main (int argc, char* argv[]);
This should never change. Main is the launcher of all activity within your C program. Lastly, our compiler accessed libraries on the system in order to build your binary. Despite the fact that our command to gcc did not explicitly name any libraries, our C program was built from them anyway. Sometimes the compiler needs libraries it cannot find on its own. Under these conditions our gcc command needs an option to tell it which library to link. For example, if we need to use an advanced math function, we need to tell gcc to link with the math library. The option for this is -lm, placed after the source file, like this:
gcc program1.c -lm
Let's examine the nature of C more closely by looking at a slightly more complex program:
#include <stdio.h>
#include <string.h>

char name[255] = {'\0'};

int main(int argc, char **argv){
    printf("Welcome to NYLXS\n");
    printf("Enter your name-->\n");
    fgets(name, sizeof(name), stdin);

    while(strcmp("\n", name) != 0){
        /* sizeof yields a size_t, which printf prints with %zu */
        printf("value ->%s size->%zu\n", name, sizeof(name));
        fgets(name, sizeof(name), stdin);
    }
    return(1);
}
This program includes two external header files to declare library functions. The first one we saw before, stdio.h. The second include file, string.h, declares the string functions of the standard C library. The function strcmp is used to test each string we receive from standard input.
Before we define main(), we define and initialize a symbol called 'name'. C is a statically typed language. Every variable in C needs to be declared in advance as one which stores a particular kind of data. If we try to assign to the variable data which is different from its declared type, the gcc compiler will complain and probably not create a binary file.
In this case, the symbol 'name' is declared as a variable of type char. The words int, char, double and float are examples of keywords in C which define data types. In our editor they are marked in green. char name means that this variable is marked as a character type variable. It stores only characters. In the example of 'name' the declaration also declares this variable as an array. An array is a group of data accessible through an index.
Let's look at this line more closely:
char name[255] = {'\0'};
name is declared as a char data type through the keyword char.
name is declared as an array because of the square brackets to the right of the symbol.
name is declared as an array with 255 chars because of the number in the square brackets in the declaration.

Different data types are stored in different sized memory locations. A char is defined as exactly 1 byte, which is 8 bits on virtually every system you will meet. By declaring name to be an array of 255 characters in length, we essentially tell the computer to please allocate a space in memory with 255 bytes. We will look at this closer in a minute.

When we declare the array, we can fill it with data. This is done through the curly braces {}. The array is initially filled with the 'zero' byte: 00000000. We do this by initializing the array with the character constant null, '\0'. Character constants are written using single quotes. The \0 is a special character which means 00000000. When we initialize the array with fewer entries than the array has elements, C fills the rest of the array with zero bytes, which for a char array are null characters.

It is not necessary to initialize an array in a declaration. It is usually necessary to define the size of the array when you declare it, with a few exceptions as will be noted. One such exception would be if we initialize the array and declare it together like this:

int numbers[]={1,2,3,4,5,6,7,8};
In this case, the array is declared with 8 elements, even without the number in the square bracket.
The next line is where we define our main function. As we said before, all C programs require a main function. Main is the jumping off point for all C programs. However, in most regards, main looks like any other function in C. Let's look more closely at the main declaration:
int main(int argc, char **argv){
The int in green before the symbol 'main' tells C that main returns an integer. In fact, this integer is returned to the shell when you run a program on the command line. You can check its value after your program is finished by entering echo $? on the command line of a bash shell.
All functions are defined with a symbol followed by parentheses: symbol(). The parentheses tell C this symbol is a function, just as the square brackets tell C a symbol is an array. Within the parentheses we put the parameters which are expected to be passed to our function. Unlike other languages, such as Perl, the parameters defined in our function must be supplied when the function is called. In the case of main, the function is called by the operating system or shell, and our two arguments (argc and argv) are automatically filled in by the operating system or shell when the program is run.
argc represents the number of arguments with which the program is called; the program name itself counts as the first one. argv holds the arguments themselves, represented as arrays of chars. Hence, argc is declared as an int data type and argv is declared as a pointer to pointers to char (char **), one pointer for each argument.
Inside of main, our program begins to work. Our program now processes these lines from top to bottom in order. The first line prints the greeting, "Welcome to NYLXS", and adds a line feed. The \n combination is a special character, in some ways like \0, which means add a line feed and start at the new line. We will look at the printf function in more detail later. The next line prints to standard out a prompt for user input: "Enter your name-->". The next line retrieves information from the Standard Input Device, most often a keyboard, and stores that information into the array of characters which we previously allocated with the symbol 'name'. Our array holds 255 bytes, so we can store up to 254 characters plus the terminating '\0'.
Let's look at the fgets function. Like most C functions, fgets is documented in the manual pages of your GNU/Linux system. Let's look at the manual page:
ruben$: man fgets
GETS(3) Linux Programmer's Manual GETS(3)
NAME fgetc, fgets, getc, getchar, gets, ungetc - input of characters and strings
SYNOPSIS #include <stdio.h>
int fgetc(FILE *stream);
char *fgets(char *s, int size, FILE *stream);
int getc(FILE *stream);
int getchar(void);
char *gets(char *s);
int ungetc(int c, FILE *stream);
DESCRIPTION fgetc() reads the next character from stream and returns it as an unsigned char cast to an int, or EOF on end of file or error.
getc() is equivalent to fgetc() except that it may be implemented as a macro which evaluates stream more than once.
getchar() is equivalent to getc(stdin).
gets() reads a line from stdin into the buffer pointed to by s until either a terminating newline or EOF, which it replaces with '\0'. No check for buffer overrun is performed (see BUGS below).
fgets() reads in at most one less than size characters from stream and stores them into the buffer pointed to by s. Reading stops after an EOF or a newline. If a newline is read, it is stored into the buffer. A '\0' is stored after the last character in the buffer.
ungetc() pushes c back to stream, cast to unsigned char, where it is available for subsequent read operations. Pushed-back characters will be returned in reverse order; only one pushback is guaranteed.
Calls to the functions described here can be mixed with each other and with calls to other input functions from the stdio library for the same input stream.
For non-locking counterparts, see unlocked_stdio(3).
RETURN VALUE fgetc(), getc() and getchar() return the character read as an unsigned char cast to an int or EOF on end of file or error.
gets() and fgets() return s on success, and NULL on error or when end of file occurs while no characters have been read.
ungetc() returns c on success, or EOF on error.
CONFORMING TO ANSI-C, POSIX.1
BUGS Never use gets(). Because it is impossible to tell without knowing the data in advance how many characters gets() will read, and because gets() will continue to store characters past the end of the buffer, it is extremely dangerous to use. It has been used to break computer security. Use fgets() instead.
It is not advisable to mix calls to input functions from the stdio library with low-level calls to read() for the file descriptor associated with the input stream; the results will be undefined and very probably not what you want.
SEE ALSO read(2), write(2), ferror(3), fopen(3), fread(3), fseek(3), puts(3), scanf(3), unlocked_stdio(3)
The man page tells us several important things about this function and its use in C. All functions (in all programming languages) represent a process. A process has 3 components: input, output and side effect.
[Figure: diagram of a process]
The inputs of functions are the parameters. The output is the return value, which for main is an int. The side effects are all the work the function does which is not its return value.
From the man page, we can see that fgets is one of a group of C functions which include gets, getc and others. In addition, the man page tells us that fgets is in the stdio library. It tells us to put #include <stdio.h> into our code to gain access to the function. It defines the function for us as follows:
char *fgets(char *s, int size, FILE *stream);
fgets takes 3 parameters for input: a pointer to character data, an integer, and a file stream. Let's look at all three definitions:

char *s: A pointer to character data. A pointer is a symbol which has a memory address as its value. In this case, the memory address has to be an allocated area in memory which is typed as char data. In our example, we have a char array called 'name'. With arrays, C will convert the symbol of an array to a pointer to the address where the array is located. C does this for us automatically. This is a specific property of arrays and cannot be depended upon to happen with other kinds of data constructions unless specified in the C programming specification.

int size: An integer which gives the size of the buffer. We normally compute it with sizeof, which yields a value of type size_t. size_t is a special data type in C which is used to store and describe the size of data constructions in our programs.

FILE *stream: File streams are pointers to devices and/or other programming constructions which provide a stream of data into and out of our program. All programs in Unix inherit three streams:

Standard In (stdin) - usually the keyboard
Standard Out (stdout) - most normally the screen
Standard Error (stderr) - another output, most normally to the screen, but used only for error messages and the like
Because C has strict data typing, a function definition is very clear and specific about the use of a function. Other information which is described in the man page mostly concerns the side effect of the function. In the case of fgets we are told it reads in at most one less than size characters from stream and stores them into the buffer pointed to by s. In addition, we are told it stops reading when it receives an End of File marker (EOF) or a new line (line feed) character. We are told the line feed character is stored in the buffer, and then fgets adds an additional character '\0'.
Let's now see how we used it in our program:
fgets(name, sizeof(name), stdin);
We call fgets with the parameter 'name', which is the symbol which defines our array of chars. It is automatically converted for us to a pointer to the first char of our array. The second argument is sizeof(name). sizeof is an operator in C (it looks similar to a function call) which returns the size of a data construction. In this case, that data construction is name, which is of size 255 (which means it has 255 bytes). The third parameter is stdin. stdin is the default symbol for our Standard Input File Stream pointer. We inherit it from the environment.
Finally, you might notice that we disregard the return value of fgets. Since the function stores the input into 'name', we can do this. However, it is often prudent to test the return value of a function to ensure that it worked properly. If fgets returns NULL, it means that an error occurred or that the end of the input was reached before any characters were read.
It is critical that fgets cannot try to put more characters into our array than is allocated for it in memory. If it did, it would write into memory which does not belong to the array. This can corrupt other data in our program and create a security problem. This is bad. Therefore, we limit the input ability of fgets by the size of our array. This is good and proper programming practice which you must adopt.
The next section of our program introduces looping and flow control. Much of our time programming involves working on conditional actions (do this if you hear a click) or loops (do this over and over until the user says uncle). The while keyword in C creates a conditional loop. The expression inside the parentheses is tested. If it evaluates to any non-zero value, the program enters the loop. The actions within the loop are inside the curly braces. When the last action within the braces is evaluated, the program returns to the top and tests the expression in the parentheses again.
Inside the parentheses of our while loop, we call a function called strcmp (do a man strcmp now). strcmp looks at two strings and compares them. It then returns a positive number, a negative number or a 0 (zero) depending upon whether the first string is greater than, less than or equal to the second.
Characters in a string are represented by integer numbers which are one byte in size. Since there are eight bits in a byte, you can represent at most 256 different values in a char. There is a standard integer which represents each character on the keyboard. This standard association of characters to integers is called the ASCII table. In this table, the letter A is 65 and Z is 90. All the rest of the capital letters fall in between in order. The letter 'a' is 97, and 'z' is 122. Again, all the lower case letters fall in between in order. In this manner, strcmp can compare the strings by their ASCII representation. It is important to note at this point that there is a very tight relationship between small integers (integers stored in a single byte) and characters in C. It should also be noted that strcmp reads the arrays of chars until it reaches a '\0' (nul) character. Anything stored after the nul is ignored.
Our program checks if our input buffer (name) is equal to "\n". "\n" is a string constant. All string constants have a '\0' added to the end of their allocated array. So the comparison is actually to the two characters '\n' and '\0'. Since fgets adds the null to the end of the string, everyone is happy with this comparison.
Our program now repeats all the steps in the curly braces until the user presses Enter on an empty line, which leaves only "\n" in the buffer.
This sample program and its explanation are a good introduction to C for a beginner. But there is far more to learn, even for a beginner, which I hope to explore in the coming months of the NYLXS journal. In the meantime, I challenge you to try a few things with this program.
First, change the size integer in fgets to 5 and try to enter 10 characters at your keyboard. What happens with your program?
Second, try rewriting this program so that you fill the char array with 255 characters and NO NULL value at the end. How does this affect the strcmp function?
(hint: try adding this code to your program, with i declared as char *i, and comment out the fgets:

for(i = name; i < (name + sizeof(name)); i++){
    *i = getchar();
    printf("Char entered->%c\n", *i);
}
)
Third - Try changing the size of the name array to 5 and enter 10 characters on the prompt.
What happens?
____________________________ NYLXS: New Yorker Free Software Users Scene Fair Use - because it's either fair use or useless.... NYLXS is a trademark of NYLXS, Inc