This introduction to C examines the contradictions.
by Tim Cochran
Less is more -- a classic paradox. Eloquently concise and disturbingly ambiguous. As a computational expression, it evaluates to false. Run it through your computer and the bits turn to zero as fast as current through silicon. Yet it is quite suitable for describing the philosophy of the C language.
Developed in the early seventies by Dennis Ritchie, the coauthor of UNIX, C can be characterized as efficient, flexible, cryptic, and portable. It is an amazingly small language -- much of its functionality must be obtained from the hardware on which it is implemented.
It seems appropriate to describe C in terms of a paradox since throughout its existence it has presented inherent contradictions. Consider these examples.
1) C is a low-level systems language suitable for writing utilities, compilers, and operating systems; yet C has become one of the most widely used general purpose languages for writing end-user applications such as word processors, data bases, and spreadsheets. 2) C is devoid of support in essential areas such as file I/O, strings, and dynamic memory management; yet C provides a standard run-time library of over 100 functions supporting file I/O, strings, and dynamic memory management. 3) C's data types and operators have direct analogs in hardware and allow you to manipulate machine-dependent bits, registers, and memory locations; yet C's pervasiveness is due in large part to its portability via machine-independence.
So what are we to believe? Let's get right to the heart of the matter and exami ne C's fundamental elements to see how their implementation and use might differ from that in another language. I don't want to be accused of writing "spaghetti-code" articles so I'll use C's characteristic terms, buzzwords if you like, as a means of imposing structure. But first let's dispense with the unfamiliar.
Take a look at 1 which shows C's data types, operators, and control flow statements. There should be only a few unfamiliar items. The basic data type pointer and the aggregate data type union are the only items without similar representation in RPG.
Take a look at Figure 1 which shows C's data types, operators, and control flow statements. There should be only a few unfamiliar items. The basic data type pointer and the aggregate data type union are the only items without similar representation in RPG.
I'll be brief about pointers. They are variables which contain the addresses of other variables (they "point" to other variables). They are used extensively in C; however, they are known to be a conceptual stumbling block. Entire article s have been devoted to interpreting pointers in C (see "Suggested Reading" at the end of this article).
Unions allow you to declare a variable whose type is variable over a set of defined types. For example you could declare a variable char_or_float to be a union of the types char and float. Once declared, char_or_float can be used as either a variable of type char or a variable of type float. Conceptually, unions are similar to overlapping subfields in RPG data structures. In other languages, most notably Pascal, unions are called variant records.
Two other items, both control flow statements, may be unfamiliar, but each has a similar representation in RPG. The switch() statement is analogous to RPG's CABxx (compare and branch) and CASxx (case) statements. C's for() statement is nearly identical to RPG's DO statement.
Efficient
A compiler is a compiler. Right? Regardless of what language you are working with, two compilers producing output for the same target machine must produce similar results. How can it be that one language is necessarily more efficient than another? A primary area in which C achieves efficiency is through its representation and use of data.
Each of C's basic data types (char, int, and float) are directly supported by typical hardware. That is, for each of these types, the hardware provides a fundamental unit of storage to accommodate a variable of these types and arithmetic a nd logical operations for manipulating the variable. Consider a variable of typ e int in C. The amount of storage allocated to an integer is typically equal to the machine's word size. Word size is typically equal to the width of the machine's internal registers. Adding two integer variables is simple. Load the variables into registers and invoke the operator.
In contrast, RPG's decimal formats, both packed and zoned, introduce an additional level of abstraction between typical hardware and data representation. Unless the machine has hardware support for decimal format, decimal numbers must be converted to binary before an operation, then converted back to decimal after the operation. The benefit of decimal format is arbitrary precision. RPG allows you to match storage size with the size of data. This is useful when working with decimal base numbers, but it is not particularly efficient for array indexes a nd loop counters.
Although less obvious, C's aggregate data types also have representations which provide a close fit with hardware. Consider C's array type. Its representation is particularly suitable for demonstrating C's tendency toward simplicity. An array of n elements is represented as a sequence of n contiguous memory locations. Notice the lack of abstraction. The physical representation is identical to the conceptual. The rules for accessing arrays are also few. An index into an array is analogous to an offset from the beginning of the array. Take for example an array of five elements of type char. If an element of type char is repre sented by one byte, the array will occupy five contiguous bytes of storage. Accessing the third element of the array is equivalent to accessing the third byte from the beginning of the array. Since we are discussing array indexing in the context of efficiency, it's worth pointing out that C, unlike RPG, does not perform "bounds checking" when accessing array elements.
In addition to representation, the use of data can also affect performance. A common technique of compiler optimization is efficient use of machine registers. You have probably heard the expression "ninety percent of the time is spent in ten percent of the code." There are certain "hot-spots" that pay high rewards in terms of optimization. C allows the programmer some control over register allocation by providing a register storage class specifier. The keyword register can be included with a variable declaration to indicate to the compiler that, if possible, the variable should exist in a register. The benefit is that register variables are much faster than memory variables. Also, some machine instructio ns require registers as operands, and in these cases a move from memory to register is saved. A typical use is with a variable used as a counter within a deeply nested loop.
C is not without fault in the area of efficiency. To find an example you need l ook no further than C's use of data. (I know, another contradiction.) Specifically, look at the way in which C provides functionality for manipulating objects of greater complexity than the basic types. For consistency, let's stay with arrays as an example. Other than accessing an array element, the C language provides no other operations for manipulating an object of type array. However, th e C run-time library (RTL) does. In fact, the RTL includes functions which are the equivalent of RPG's LOKUP and SORTA operations. There is, however, a key difference in the way the functionality is implemented. RPG's LOKUP and SORTA operations are built into the language and are often called direct or in-line opera tors. C's equivalents to LOKUP and SORTA are implemented as function calls which are similar to RPG's subroutines. The key difference between in-line operators and subroutine calls is that subroutines involve more overhead -- primarily in the form of dynamic memory management. Upon entry of a subroutine, temporary s torage must be allocated for parameters and local variables. Upon exit, the storage must be de-allocated. Storage management is typically implemented through a run-time an abstract type called data structure. Now let's determine what operations can be performed on our object. Earlier I stated that C provided no operators for manipulating arrays as a unit, but that a function exists in the RTL. This function is equivalent to RPG's SORTA operation in that it is used to sort an array. Since we have an array and we have a function capable of sorting an array, there should be no problem in getting our array into order.
Hopefully you see by now that we have a potential problem. We are trying to sort an array of abstract types. What is the criterion of the sort? How is our sort function to know what comparisons to make between individual elements? C handles this situation by allowing you to pass a user-defined function as an argument to the sort function -- what is actually passed is the address of the function . The function passed as a parameter is the compare function. The sort function that exists in the RTL is really a framework of a sort function, and it requires the additional function to supply the conditions of the sort, as well the type of object being sorted. In effect, the passed function communicates to the library function the type of data and the operations to be performed.
This case illustrates two examples of C's handling of abstract data. The less obvious example can be seen if you consider that an array is an abstract data type. Consider this example in RPG. It can't be done because you cannot place abstract data (data structures) into an array. Why? There's the rub. There is no good reason why not, other than that it would break RPG's built-in operators. So now we've seen two sides of in-line operators. On one hand, they are more efficient than subroutines. On the other hand, because they require a fixed domain of types, they impose restrictions on the type of objects which can be represented. C's position, whether by design or coincidence, is that "the most flexible support for abstract types is minimal support". In other words, since abstract types are an unknown, the operations to be performed on them are also unknown and must be user-defined.
There is one other example involving arrays and flexibility that is worthy of consideration. Recall that, in C, an array of n elements is represented as a sequence of n contiguous memory locations. Recall also that C does not perform bounds checking when accessing an array. A consequence is that arrays can be alloca ted and de-allocated dynamically. That is, the number of elements in an array can be varied during run-time. This should be relatively obvious for arrays of dimension one but it also extends to arrays of multiple dimension in a similar manner. This is an excellent example of "less is more." You gain additional flexibility by imposing minimal restrictions on what qualifies as type array.
Cryptic
There's no denying it: to the uninitiated, C can be very difficult to read. The main culprit is no doubt the number of operators -- there are a lot of them. And not an English-like mnemonic among them. You might observe that of all of the symbols on a standard QWERTY keyboard, all except two (@ $) are used as operator mnemonics or in specifying precedence and associativity of operators and expressions.
Once you become familiar with the way the symbols are used you'll want to investigate other uses for them. One such use is "compound assignment". Any of the arithmetic or bitwise operators can be used in compound assignment expressions of the form a += 1 which is equivalent to a = a + 1. The assignment operator '=' can be used in "multiple assignments" of the form a = b = c = 0 which sets a, b, and c to zero.
C includes two unary operators (versus binary or ternary), the sole purpose of which is to increment and decrement a variable by one. These operators, incremen t '++' and decrement 'D', are typically used with loop counters. For example, + +i forms an expression which increments the value of i by one. The idea is simple but becomes somewhat more complex when you learn that each of these can be used as either prefix or postfix operators. The prefix form is ++i and the postfi x form is i++. To see how they differ, consider these two examples: 1) i = 5; x = ++i; 2) i = 5; x = i++. In the first example x gets the value 6; in the second example x gets the value 5. In the second example the effect is to take the value of i (which is 5), assign its value to x, then increment i.
Portable
C is considered portable primarily because it is a small language. This makes implementing a C compiler on most machines relatively easy. To demonstrate the n otion of small, consider once again C's data types and operators. The basic data types are in effect implemented by the hardware. The aggregate types are built from the basic types (they contain basic types) and their implementation is straightforward and unrestricted by in-line operators. The operators are in effect implemented by the hardware's arithmetic and logic. The fact that there are so many of them can be misleading. Many of them exist only because they provide a direct correspondence to machine instructions. For example, the operator += g enerates an instruction of the form "INCR address, value".
It is useful to view portability from two standpoints. The availability of a compiler is a separate element from its behavior. Specifically, how consistent is the compiler's behavior from one machine to another? In the case of C, availability and consistency prove to have an interesting, even contradictory, relationship. The very aspect that makes it available can also make its behavior inconsistent -- namely, there is a conspicuous dependence on the underlying hardware. C's basic data types are primarily affected because they are in effect implemented by the hardware. Consequently, the determination of storage size for each type is machine- and implementation-dependent. That is, the implementor of a compiler for a given machine is free to choose sizes which provide a good match with the underlying hardware. Type int is a good example. The most natural size for int is the machine word or register size. A machine with 16 bit registers will typically have 16 bit ints. A machine with 32 bit registers will have 32 bit ints. If a program requiring integer precision between 16 and 32 bits is written on the 32 bit machine, the program will not behave consistently when ported to the 16 bit machine. Consider the example given in the next paragraph. The example is relevant in three ways: 1) It draws an analogy to a feature of RPG. 2) It demonstrates reliance on type sizes, which are machine-dependent. 3) It demonstrates flexibility in that it makes use of an aggregate data type in a way that is contrary to the data type's intended use. You could argue that using a union in this manner is just an abuse of the language but, if it serves your purpose, you may view it as a feature.
Recall that C's aggregate-type union is conceptually similar to overlapping subfields in RPG data structures. The similarity is that both of these provide a form of aliasing by allowing two or more variables to refer to the same storage space. In the previous example I showed a union consisting of type char and type float. Let's extend this example slightly to form a union consisting of a group of four individual char types and one float type. Let's determine what the union looks like in memory. Remember that the size of C's basic data types are machine- and implementation-dependent. Assuming our machine represents chars with one byte and float with four bytes, our union will be four bytes in length -- not eight bytes. This could be similarly represented in RPG as a data structure with four char subfields -- of length one -- occupying positions one through four, and a fifth subfield overlapping positions one through four.
Hopefully you are convinced that the C union is sufficiently similar to the RPG data structure and can be operated upon in a similar manner. We can access and store to each individual byte through the char subfields, or access and store all four bytes at once through the float field.
The portability problem should be obvious. We've created a dependent relationsh ip between the size of an occurrence of type char and the size of an occurrence of type float. The relationship holds as long as chars are one byte and floats are four bytes. If on another machine floats are eight bytes our program will n ot behave consistently.
The problem of type sizes is actually less significant a problem than I've illustrated, and can be avoided by adhering to standards. However, it may be your fundamental concern if you are programming for portability. And I broke at least one rule in the example. Unions are not intended to be used as I have shown. It is fine to have a union of type char and float, but the rule of thumb is that "you should take out whatever you put in" to a union, meaning that if you put in chars you should take out chars -- not floats. But this is a characteristic of C -- it allows you to do what you want.
Suggested Reading
If you are interested in reading more about C, there are two books that you might consider required reading. "The C Programming Language" by Brian W. Kernighan and Dennis M. Ritchie (Prentice-Hall, 1978) contains the classic description of the language by its creator. "C: A Reference Manual by Samuel P. Harbison and Guy L. Steele, Jr. (Prentice-Hall, 1987) is a detailed and extensive source in reference form. Finally, since I've managed to avoid explaining pointers, it would be useful to read Greg Comeau's article "A Guide to Understanding Even the Most Complex C Declarations", which appeared in "Microsoft Systems Journal," September 1988.
C... Is Less More?
Figure 1 The fundamental elements of C language
Figure 1: The Fundamental Elements of C Language A. Data Types 1. Basic a. char b. int c. float d. pointer 2. Aggregate a. array b. structure c. union B. Operators 1. Arithmetic a. + addition b. - subtraction c. * multiplication d. / division e. % remainder 2. Bitwise a. >> shift right b. << shift left c. & AND d. || O e. XOR 3. Logical a. && AND b. |||| 4. Relational a. -- equal b. !- not equal c. < less than d. <- less than or equal e. > greater than f. >- greater than or equal C. Control Flow Statements 1. Decision/Selection a. if (expression) ... else ... b. switch (expression) 2. Loop a. for (...; (expression); ...;) b. while (expression) ... c. do...while (expression) 3. Branch goto
LATEST COMMENTS
MC Press Online