A Comparison of B and FORTRAN

 

Introduction

B fits firmly in the traditional procedural family typified by FORTRAN. Both are oriented towards system programming, are small and compactly described, and are amenable to translation by simple compilers. B was designed specifically for non-numeric computations involving many complex logical decisions and computations on integers, whereas FORTRAN was developed for scientific and engineering computation. Both languages were originally 'close to the machine' in that the abstractions they introduce are readily grounded in the concrete data types and operations supplied by conventional computers, and both rely on library routines for input-output and other interactions with an operating system. Both use library procedures to specify interesting control constructs such as co-routines and procedure closures, B with less success. At the same time, their abstractions lie at a sufficiently high level that, with care, programs can be made portable between machines. B programs consist of a sequence of global declarations and function (procedure) declarations, and procedures cannot be nested. B recognizes separate compilation and provides a means for including text from named files.

Data Objects

One major difference between B and FORTRAN is that B is a typeless language. It uses a single data type, the 'word' or 'cell', which is a fixed-length bit pattern; on a machine with 36-bit words, for example, every variable is a 36-bit quantity. Arithmetic operations are integer operations unless special functions are written. B has nothing resembling the FORTRAN IJKLMN convention (where a name beginning with I, J, K, L, M, or N is implicitly of integer type). In B, any number that begins with the digit 0 (zero) is an octal number and therefore cannot contain the digits 8 or 9; FORTRAN has no such convention. Unlike FORTRAN, B has no floating point, no data types, no type conversions (wanted or unwanted), and no type checking (FORTRAN uses static type checking, but it is incomplete).
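The octal rule can be illustrated in C, which also treats a constant written with a leading zero as octal. The following is a minimal sketch, not B's actual lexer: the function name parse_octal and its error convention (returning -1) are assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical helper: convert a string of octal digits to its value.
 * Returns -1 if any character is outside '0'..'7', which is why an
 * octal constant cannot contain an 8 or a 9. */
long parse_octal(const char *digits)
{
    long value = 0;

    assert(digits != 0);            /* caller must pass a real string */
    for (; *digits != '\0'; digits++) {
        if (*digits < '0' || *digits > '7')
            return -1;              /* 8, 9, or a non-digit: not octal */
        value = value * 8 + (*digits - '0');
    }
    return value;
}
```

Under this sketch, parse_octal("017") yields fifteen, while parse_octal("08") is rejected.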

FORTRAN additionally provides a restricted set of data types: four types of numeric data (integer, real, complex, and double-precision real), Boolean data (called LOGICAL), arrays, character strings, and files. FORTRAN also provides an extensive set of arithmetic operations and mathematical functions, which reflects the orientation of the language toward engineering and scientific computation. 

Both languages supply the usual basic arithmetic operations (+, -, *, /). FORTRAN, however, also includes a large set of predefined intrinsic functions, including trigonometric and logarithmic operations (SIN, COS, TAN, LOG), square root (SQRT), maximum (MAX) and minimum (MIN), as well as explicit type-conversion functions for the various numeric types. Both languages include the usual relational operations on numeric values: .EQ., .LT., .LE., .NE., .GT., and .GE. in FORTRAN are equivalent to ==, <, <=, !=, >, and >= in B. In B, all comparisons are arithmetic and not logical. In FORTRAN, the relational operations are also defined for character strings, using lexicographic ordering. Since B is a typeless language, arithmetic on characters is legal and can be used, for example, to convert the case of characters (from lower to upper case, or vice versa).
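Case conversion by character arithmetic can be sketched in C, where characters are likewise small integers. This sketch assumes the ASCII character set, in which the upper-case and lower-case letters each occupy a contiguous range; the function names to_upper and to_lower are illustrative and are not taken from any B library.

```c
#include <assert.h>

/* Convert a lower-case ASCII letter to upper case by plain arithmetic;
 * any other character is returned unchanged. */
int to_upper(int c)
{
    assert(c >= 0 && c <= 127);     /* sketch assumes 7-bit ASCII input */
    if (c >= 'a' && c <= 'z')
        return c - 'a' + 'A';
    return c;
}

/* Convert an upper-case ASCII letter to lower case the same way. */
int to_lower(int c)
{
    assert(c >= 0 && c <= 127);
    if (c >= 'A' && c <= 'Z')
        return c - 'A' + 'a';
    return c;
}
```

The comparisons and the subtraction treat the character purely as a number, which is the only view of it that a typeless language has.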

In B, variable names have one to eight ASCII characters, chosen from A-Z, a-z, ., _, 0-9, whereas FORTRAN uses from one to six characters (31 in FORTRAN 90), and both languages require starting with a non-digit. All keywords in B are reserved and only recognized in lower case. Both languages are otherwise case insensitive. 

B does not support character data strongly. The language treats strings much like vectors of integers and supplements general rules by a few conventions. A string literal denotes the address of a static area initialized with the characters of the string, packed into cells. In B, there is no count of the number of characters in a string; instead, strings are terminated by a special character, which B spelled '*e'. This convention, a change from B's predecessor BCPL (which stored a count), was adopted partly to avoid the limitation on the length of a string caused by holding the count in an 8- or 9-bit slot, and partly because maintaining the count seemed less convenient than using a terminator. Individual characters in a string could be manipulated by spreading the string out into another array, one character per cell, and then repacking it later, but people more often used other library functions that accessed or replaced individual characters in a string.
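The packing of characters into cells, and access to one character at a time, might be sketched in C as follows. This is an illustration under stated assumptions rather than B's library: four 8-bit characters per cell, a terminator value of 4 standing in for '*e', and the names cell, get_ch, put_ch, and packed_length are all hypothetical.

```c
#include <assert.h>

#define CHARS_PER_CELL 4    /* assumption: four 8-bit characters per cell */
#define TERMINATOR     4    /* assumption: stand-in value for B's '*e' */

typedef unsigned long cell; /* at least 32 bits in standard C */

/* Fetch character number i from a packed string. */
int get_ch(const cell *s, int i)
{
    assert(i >= 0);
    return (int)((s[i / CHARS_PER_CELL] >> (8 * (i % CHARS_PER_CELL))) & 0xFF);
}

/* Replace character number i of a packed string with c. */
void put_ch(cell *s, int i, int c)
{
    int shift = 8 * (i % CHARS_PER_CELL);

    assert(i >= 0);
    s[i / CHARS_PER_CELL] &= ~(0xFFUL << shift);          /* clear old byte */
    s[i / CHARS_PER_CELL] |= ((cell)(c & 0xFF)) << shift; /* insert new one */
}

/* Count the characters that precede the terminator. */
int packed_length(const cell *s)
{
    int n = 0;

    while (get_ch(s, n) != TERMINATOR)
        n++;
    return n;
}
```

Because the length is found by scanning for the terminator, nothing in the representation limits how long a string may be, which is the advantage the paragraph above describes.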

FORTRAN supports declaration of fixed-length character-string variables. The IMPLICIT declaration may also be used to provide a default length and CHARACTER type for variables not explicitly declared. Arrays of character strings and functions that return character strings as their values may also be defined. 

In both B and FORTRAN, memory consists of a linear array of cells, and the meaning of the contents of a cell depends on the operation applied. The + operator, for example, simply adds its operands using the machine's integer add instruction, and the other arithmetic operations are equally unconscious of the actual meaning of their operands. Because memory is a linear array, it is possible to interpret the value in a cell as an index in this array, and B supplies an operator for this purpose, the unary *. Thus, if p is a cell containing the index of (or address of, or pointer to) another cell, *p refers to the contents of the pointed-to cell, either as a value in an expression or as the target of an assignment. 

Because pointers in B are merely integer indices in the memory array, arithmetic on them is meaningful: if p is the address of a cell, then p+1 is the address of the next cell. FORTRAN 77 has no pointer type; FORTRAN 90 adds pointers as a new typed data object. 
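The view of memory as a linear array of cells, with a pointer being nothing more than an index, can be modeled in C. The sketch below is a simulation for illustration only: the array mem, its size, and the functions load and store are assumptions, with load(p) playing the role of *p in an expression and store(p, v) the role of *p as the target of an assignment.

```c
#include <assert.h>

#define MEM_SIZE 16         /* assumption: a tiny memory for illustration */

typedef long cell;

static cell mem[MEM_SIZE];  /* memory modeled as a linear array of cells */

/* The value of *p: the contents of the cell whose index is p. */
cell load(cell p)
{
    assert(p >= 0 && p < MEM_SIZE);
    return mem[p];
}

/* Assignment to *p: replace the contents of the cell whose index is p. */
void store(cell p, cell v)
{
    assert(p >= 0 && p < MEM_SIZE);
    mem[p] = v;
}
```

With this model, if p holds 3 then load(p + 1) reads cell 4, and a cell may itself hold the index of another cell, so load(load(q)) follows a pointer stored in memory.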

Sequence Control

FORTRAN has a limited set of control structures which include expressions, assignment statements, conditional statements, iteration statements (FORTRAN 90), and null statements. Subprogram control is achieved by the use of the CALL statement, which is used to invoke a subprogram. In B, most statements contain expressions, which in turn are made up of operators, names, and constants, with parentheses to alter the order of evaluation. B has a particularly rich set of operators and expressions, and relatively few statement types.

In B, the statement "auto ..." is a declaration, that is, it defines the variables to be used within the function. auto in particular is used to declare local variables, variables which exist only within their own function. Variables may be distributed among auto declarations arbitrarily, but all declarations must precede executable statements. All variables must be declared, as one of auto, extrn, or implicitly as function arguments. auto variables are different from FORTRAN local variables in one important respect - they appear when the function is entered, and disappear when it is left. They cannot be initialized at compile time, and they have no memory from one call to another.

The arithmetic and the assignment statements of B are much the same as in FORTRAN, except for the semicolons.

Both languages have equivalent assignment statement syntax, wherein the rvalue of the expression on the right is stored in the location given by the lvalue of the variable on the left. An rvalue is a bit pattern representing a value; an lvalue is a bit pattern representing a storage location containing an rvalue. B makes the distinction explicit: the unary operator * can be used to interpret an rvalue as an lvalue (indirection), and the unary operator & can be used to interpret an lvalue as an rvalue (the address function).

B insists that the entire program be presented all at once to the compiler. Later implementations of B use a conventional linker to resolve external names occurring in files compiled separately, instead of placing the burden of assigning offsets on the programmer. Like FORTRAN, B uses the single character = for assignment rather than :=. B encloses comments between /* and */. FORTRAN influenced the syntax of declarations: B declarations begin with a specifier like auto or static, followed by a list of names.

Subprograms and Storage Management

Referencing in FORTRAN is either local or global; no provision for intermediate levels of non-local referencing is made prior to FORTRAN 90. FORTRAN 90 introduces the concept of nested subroutines. The global environment may be partitioned into separate common environments that are shared among sets of subprograms, but only data objects may be shared in this way. 

Local Referencing

As FORTRAN is ordinarily implemented, the local environment is retained between calls, because the activation record is allocated storage statically as part of the code segment. However, the language definition does not require the retention of local environments unless the programmer explicitly includes the statement "SAVE" within the subprogram. SAVE indicates that the complete local environment is to be retained in between calls. Alternatively, only specified variables may be saved. 
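The difference between a retained local environment and one that is created afresh on each call can be sketched in C, where a static local behaves roughly like a SAVEd FORTRAN local and an ordinary local behaves like a B auto variable. The function names count_saved and count_auto are illustrative only.

```c
#include <assert.h>
#include <limits.h>

/* Retained local: 'calls' keeps its value from one invocation to the
 * next, much as a SAVEd local does in a FORTRAN subprogram. */
int count_saved(void)
{
    static int calls = 0;

    assert(calls < INT_MAX);    /* guard against overflow */
    calls = calls + 1;
    return calls;
}

/* Automatic local: 'calls' is created on entry and discarded on exit,
 * so it has no memory of earlier invocations. (C permits the
 * initializer; it is applied on every entry, not at compile time.) */
int count_auto(void)
{
    int calls = 0;

    calls = calls + 1;
    return calls;
}
```

Successive calls of count_saved return 1, 2, 3, and so on, while every call of count_auto returns 1.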

Common Environments

If simple variables or arrays are to be shared between subprograms, they must be explicitly declared as part of the global referencing environment. This global environment is not set up in terms of single variables and arrays, but rather in terms of sets of variables and arrays, which are termed COMMON blocks.  FORTRAN 90 introduces the concept of nested procedures with the CONTAINS statement. 

All B programs consist of one or more "functions", which are similar to the functions and subroutines of a FORTRAN program. main is such a function, and in fact all B and FORTRAN programs must have a main. Execution of the program begins at the first statement of main, and usually ends at the last. main will usually invoke other functions to perform its job, some coming from the same program, and others from libraries. As in FORTRAN, one method of communicating data between functions is by arguments. In both B and FORTRAN, parentheses following the function name surround the argument list; main is a function of no arguments. B requires the use of parentheses after main [written main( )] and { } enclose the statements of all  functions (including main), while no such requirements exist in FORTRAN.

Function definitions in B have the following form:

name ( arguments) statement

The name is initialized to the rvalue of the function. The arguments consist of a list of names separated by commas. Each name is defined as an automatic variable; the statement (usually compound) defines the execution of the function. When the function is invoked, each dummy argument is initialized to the value of the corresponding actual argument in the call; there are no side effects on the actual arguments in the function invocation (argument passing is call-by-value). 

In FORTRAN, however, parameter transmission is uniformly call-by-reference. Actual parameters may be simple variables, literals, array names, subscripted variables, subprogram names, or arithmetic or logical expressions. 
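The contrast between the two transmission methods can be sketched in C, which passes arguments by value as B does and can imitate call-by-reference by passing an address explicitly. The function names below are hypothetical.

```c
#include <assert.h>

/* Call-by-value: n is a private copy, so the caller's variable is
 * untouched. This is the behavior described above for B. */
void bump_by_value(int n)
{
    n = n + 1;      /* changes only the local copy */
}

/* Imitation of call-by-reference: the callee receives the location of
 * the caller's variable and updates it in place, which is the effect a
 * FORTRAN subprogram has on its actual parameter. */
void bump_by_reference(int *n)
{
    assert(n != 0); /* a location must actually be supplied */
    *n = *n + 1;
}
```

After int x = 5, a call of bump_by_value(x) leaves x equal to 5, whereas bump_by_reference(&x) leaves it equal to 6.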

Results of FORTRAN function subprograms are transmitted by assignment within the subprogram to the name of the subprogram. The name of the subprogram acts as a local variable within the subprogram, and the last assignment made before return to the calling program is the value returned. Functions may return only single numbers, character strings, or logical values as results.  

Functions are totally independent in B as far as the compiler is concerned, and main need not be the first, although execution starts with it. Each function consists of one or more statements which express operations on variables and transfer of control. Functions may be used recursively at little cost in time or space.
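Recursion works because each invocation receives its own fresh copy of its arguments and automatic variables. A minimal sketch in C follows; the factorial function is chosen only as a familiar example and does not come from the source.

```c
#include <assert.h>

/* Recursive factorial: every active call has its own private n, so the
 * nested invocations do not interfere with one another. */
long factorial(int n)
{
    assert(n >= 0);             /* defined here only for non-negative n */
    if (n <= 1)
        return 1;
    return n * factorial(n - 1);
}
```

A FORTRAN 77 subprogram with a statically allocated activation record could not be written this way, since the inner call would overwrite the outer call's local environment.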

Abstraction and Encapsulation

Subprograms are the only abstraction mechanism in FORTRAN 77, although FORTRAN 90 does add modules and data types. All built-in data types of B, however, are abstract data types. Also, all B subprograms are a form of process abstraction. The programmer may create variables of a given type. Basically, input-output in B is character-oriented and based on the library functions getchar and putchar. By default, these functions refer to the terminal, and input-output goes there unless explicit steps are taken to divert it. The diversion is easy and systematic, so switching between I/O units is quite feasible.

COMMON blocks may be used in FORTRAN to isolate global data to only a few subprograms needing that data. Any other subprogram may gain access to the same COMMON block by inclusion of the same COMMON declaration in the subprogram definition. It is only the name of a COMMON block that is actually a global identifier; the variable and array names in the COMMON statement list are local to the subprogram in which the statement appears. 

The effect of the COMMON declaration is to allow the global referencing environment to be partitioned into blocks so that every subprogram need not have access to all the global environment. The COMMON declaration allows efficient run-time processing because any reference to a global variable in a subprogram may be immediately compiled into a base-address-plus-offset computation at run time, where the base address is the address of the beginning of the COMMON block. 
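The base-address-plus-offset idea can be sketched in C by modeling a COMMON block as a single shared structure. This is an analogy under stated assumptions, not a description of any FORTRAN compiler: the structure common_block, its members, and the functions below are all invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* A stand-in for one COMMON block: a fixed layout of shared data. */
struct common_block {
    int    count;
    double scale;
    int    table[4];
};

static struct common_block shared;  /* the block's single copy */

/* Resolve a reference as base address plus a known offset. */
void *common_ref(size_t offset)
{
    assert(offset < sizeof shared);
    return (char *)&shared + offset;
}

/* One "subprogram" writes the shared datum through base + offset ... */
void set_scale(double s)
{
    *(double *)common_ref(offsetof(struct common_block, scale)) = s;
}

/* ... and another reads the same storage by member name. */
double get_scale(void)
{
    return shared.scale;
}
```

Because the offset of each member is fixed by the layout, every reference reduces to one addition to the block's base address, whichever subprogram makes it.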

B does not provide good encapsulation constructs. Because procedures cannot be nested, programs cannot be organized by nesting subprogram definitions inside larger subprograms using static scoping. The format of B programs is, however, quite free. Statements can be broken into more than one line at any reasonable point, i.e., between names, operators, and constants. Conversely, more than one statement can occur on one line. Statements are separated by semicolons. A group of statements may be grouped together and treated as a single statement by enclosing them in { }; in this case, no semicolon is used after the '}'. This convention is used to lump together the statements of a function.