The Programming Languages B and Fortran

With Comparisons

By:

Natalie Alleman, Brock Boyd, Coy van Bui, and Pattie G. Dickerson

The Programming Language B

History

B is a descendant of the programming language BCPL. B was first designed and implemented by D.M. Ritchie and K.L. Thompson of Bell Telephone Laboratories, Inc., Murray Hill, N.J. S.C. Johnson, also of Bell Labs, did the original implementation of the run-time package.

The development of B began in 1970 when K.L. Thompson decided that his group could not pretend to offer real computing service without FORTRAN, so he sat down to write a FORTRAN in TMG. Thompson's intent to handle FORTRAN lasted about a week. What he produced instead was a new language, B. One influence on the design of B was the small space in which the compiler had to fit.

B is good for recursive, non-numeric, and machine-independent applications, such as system and language work. B, compared to BCPL, is syntactically rich in expressions and syntactically poor in statements.

B was soon implemented on the PDP-11. A few years' experience with B showed that it was not satisfactory, and C was developed. During the transition from B to C there was also a short-lived language, NB (new B).

There are several differences between this version of B and the Bell Laboratories versions. The switch statement has been extended. Floating-point operators and proper logical operators have also been added. Finally, the order in which operators are evaluated has been changed.

Type checking was not a part of the language B. There was only one type, the machine word. B is a "typeless" language. This means that you can perform any operation you like on the 36-bit word used as the basic unit of computation. Besides the usual arithmetic operations, B lets you perform lower level shifting as well as bitwise and complement operations. Floating point operators are available also, but the code generated for them is not particularly efficient.

Overview

B is a computer language designed in 1970, directly descending from BCPL. B is good for recursive, non-numeric, machine independent applications, such as system and language work.

B is a simple procedural language and a typeless language. A typeless language can be thought of as having a single data type, the `word,' or `cell,' a fixed-length bit pattern. This means that the compiler does not keep track of whether variables refer to integers, characters, octal numbers, and so on. You can subtract the letter ‘a’ from 2.0 without getting an error. B can be thought of as C without types. The typeless language gives the programmer a lot of freedom. However, the programmer must make sure the operations they are asking B to do make sense.

In B, an identifier is formed from the characters a-z, A-Z, 0-9, the underscore (_), and the dot (.). The first character cannot be a digit. Identifiers in B can be long, but only the first eight characters are significant. This causes the compiler to believe that function1 and function3 are the same. B (except for the earliest versions) recognizes separate compilation, and provides a means for including text from named files. Storage limitations on the B compiler demanded a one-pass technique in which output was generated as soon as possible, and the syntactic redesign that made this possible was carried forward into C. When you compile a program, B ignores case distinctions in external names.

B allows the programmer to define octal, decimal, floating point, ASCII character, and BCD character and string constants. All but the last of these are stored internally in a single machine word. A machine word is 36 bits. External variables are the only form of “global” variables in B.

A program written in B can contain three kinds of components:

Manifest constant definitions;

External variable definitions; and

Function body definitions

After B was working, Thompson rewrote B in itself (a bootstrapping step). During development, he continually struggled against memory limitations: each language addition inflated the compiler so it could barely fit, but each rewrite taking advantage of the feature reduced its size. For example, B introduced generalized assignment operators, using x=+y to add y to x. The notation came from Algol 68. (In B and early C, the operator was spelled =+ instead of +=; this mistake, repaired in 1976, was induced by a seductively easy way of handling the first form in B's lexical analyzer.)

Thompson went a step further by inventing the ++ and -- operators, which increment or decrement; their prefix or postfix position determines whether the alteration occurs before or after noting the value of the operand. They were not in the earliest versions of B, but appeared along the way. Indeed, the auto-increment cells were not used directly in implementation of the operators, and a stronger motivation for the innovation was probably his observation that the translation of ++x was smaller than that of x=x+1.

The B compiler on the PDP-7 did not generate machine instructions, but instead `threaded code' [Bell 72], an interpretive scheme in which the compiler's output consists of a sequence of addresses of code fragments that perform the elementary operations. On the PDP-7 Unix system, only a few things were written in B except B itself, because the machine was too small and too slow to do more than experiment; rewriting the operating system and the utilities wholly into B was too expensive a step to seem feasible. At some point Thompson relieved the address-space crunch by offering a `virtual B' compiler that allowed the interpreted program to occupy more than 8K bytes by paging the code and data within the interpreter, but it was too slow to be practical for the common utilities. Still, some utilities written in B appeared, including an early version of the variable-precision calculator dc familiar to Unix users.

By 1971, people wanted to create interesting software more easily. Using assembler was dreary enough that B, despite its performance problems, had been supplemented by a small library of useful service routines and was being used for more and more new programs. In the end, B led to the development of the language C. It is worth summarizing compactly the roles of the direct contributors to today's C language. Ken Thompson created the B language in 1969-70; it was derived directly from Martin Richards's BCPL. Dennis Ritchie turned B into C during 1971-73, keeping most of B's syntax while adding types and many other changes, and writing the first compiler.

The Fortran Programming Language

History

The development of FORTRAN dates back to the 1950's, the first FORTRAN system being released in 1957, for the IBM 704. In 1954, John Backus and his group at IBM had produced the report entitled "The IBM Mathematical FORmula Translating System: FORTRAN". FORTRAN is the oldest of the established "high level" languages. The programming language Fortran was originally designed for the solution of problems involving numerical computation.

FORTRAN became so popular in the 1960's that other vendors started to produce their own versions, and this led to a growing divergence of dialects (by 1963 there were 40 different compilers). This divergence led to FORTRAN 66, the first language to be officially standardized. Unfortunately, the standard did not give a clear, precise definition of FORTRAN. Therefore, in the 70's a new standard was published. The new standard became known as ANSI X3.9-1978, which was published by the American National Standards Institute. This standard was then adopted by the International Organization for Standardization (ISO) as an International Standard. The language is commonly known as FORTRAN 77. However, FORTRAN 77 had a number of old-fashioned facilities that might be termed deficiencies.

Due to FORTRAN's inability to represent data structures sufficiently and its lack of dynamic storage, it became clear that a new version of the language needed to be developed. These insufficiencies led to the development of FORTRAN 8x. The work took 12 years because the developers wanted to keep the efficiency of FORTRAN 77. Other languages came about; however, none could match the efficiency of FORTRAN. The standards preceding FORTRAN 90 attempted mainly to standardize existing extensions and practices. The reason for this was that there are many programs written in FORTRAN 77 and FORTRAN 66 which, although old, are still very reliable. Therefore, each addition allowed older programs to keep working. The approach taken with FORTRAN 90 was to modernize the language, so that programs could be written as portable, efficient, safe, and maintainable code.

In the last couple of years the FORTRAN 90 based language known as High Performance Fortran (HPF) has been developed. This language contains the whole of FORTRAN 90 and also includes other desirable extensions. FORTRAN 95 will include many of the new features from HPF.

In summary, FORTRAN was developed under the following conditions:

Computers were small and unreliable;

Computers were used primarily for scientific computations;

There was no efficient way to program a computer;

Computers were of high cost compared to programmers, so the speed of the generated object code was the primary goal.

Overview

All B programs consist of one or more "functions", which are similar to the functions and subroutines of a Fortran program. main is such a function, and in fact all B programs must have a main. Execution of the program begins at the first statement of main, and usually ends at the last. main will usually invoke other functions to perform its job, some coming from the same program, and others from libraries. As in Fortran, one method of communicating data between functions is by arguments. The parentheses following the function name surround the argument list; here main is a function of no arguments, indicated by ( ). The { } enclose the statements of the function.

Functions are totally independent as far as the compiler is concerned, and main need not be the first, although execution starts with it. This program has two other functions: newfunc has two arguments, and fun3 has one. Each function consists of one or more statements which express operations on variables and transfer of control. Functions may be used recursively at little cost in time or space.

Most statements contain expressions, which in turn are made up of operators, names, and constants, with parentheses to alter the order of evaluation. B has a particularly rich set of operators and expressions, and relatively few statement types.

The format of B programs is quite free, although a uniform style is good programming practice. Statements can be broken into more than one line at any reasonable point, i.e., between names, operators, and constants. Conversely, more than one statement can occur on one line. Statements are separated by semicolons. A group of statements may be grouped together and treated as a single statement by enclosing them in { }; in this case, no semicolon is used after the '}'. This convention is used to lump together the statements of a function.

One major difference between B and Fortran is that B is a typeless language: there is no analog in B of the Fortran IJKLMN convention. Thus a, b, c, and sum are all 36-bit quantities, and arithmetic operations are integer. This is discussed at length in the next section (->5).

Variable names have one to eight ASCII characters, chosen from A-Z, a-z, ., _, 0-9, and start with a non-digit. Stylistically, it's much better to use only a single case (upper or lower) and give functions and external variables (->7) names that are unique in the first six characters. (Function and external variable names are used by batch GMAP, which is limited to six-character single-case identifiers.)

The statement "auto ..." is a declaration, that is, it defines the variables to be used within the function. auto in particular is used to declare local variables, variables which exist only within their own function (main in this case). Variables may be distributed among auto declarations arbitrarily, but all declarations must precede executable statements. All variables must be declared, as one of auto, extrn (->7), or implicitly as function arguments (->8).

auto variables are different from Fortran local variables in one important respect: they appear when the function is entered, and disappear when it is left. They cannot be initialized at compile time, and they have no memory from one call to the next.

A Comparison of B and Fortran

B fits firmly in the traditional procedural family typified by FORTRAN. B is particularly oriented towards system programming, is small and compactly described, and is amenable to translation by simple compilers. B was designed specifically for non-numeric computations involving many complex logical decisions and computations on integers, as typified by system programming, while FORTRAN is a language developed for scientific and engineering computation. B is 'close to the machine' in that the abstractions it introduces are readily grounded in the concrete data types and operations supplied by conventional computers, and it relies on library routines for input-output and other interactions with an operating system. With less success, B also uses library procedures to specify interesting control constructs such as co-routines and procedure closures. At the same time, its abstractions lie at a sufficiently high level that, with care, portability between machines can be achieved. B programs consist of a sequence of global declarations and function (procedure) declarations, and procedures cannot be nested. B recognizes separate compilation, and provides a means for including text from named files.

Data Objects:

One major difference between B and Fortran is that B is a typeless language, or rather it uses a single data type, the `word,' or `cell,' a fixed-length bit pattern. Memory consists of a linear array of such cells, and the meaning of the contents of a cell depends on the operation applied. The + operator, for example, simply adds its operands using the machine's integer add instruction, and the other arithmetic operations are equally unconscious of the actual meaning of their operands. Because memory is a linear array, it is possible to interpret the value in a cell as an index in this array, and B supplies an operator for this purpose, the unary *. Thus, if p is a cell containing the index of (or address of, or pointer to) another cell, *p refers to the contents of the pointed-to cell, either as a value in an expression or as the target of an assignment.

Because pointers in B are merely integer indices in the memory array, arithmetic on them is meaningful: if p is the address of a cell, then p+1 is the address of the next cell. This convention is the basis for the semantics of arrays in B. When in B one writes

auto V[10];

a cell named V is allocated, then another group of 10 contiguous cells is set aside, and the memory index of the first of these is placed into V. By a general rule, in B the expression

*(V+i)

adds V and i, and refers to the ith location after V. B adds special notation to sweeten such array accesses; an equivalent expression is

V[i]

B does not support character data strongly. The language treats strings much like vectors of integers and supplements general rules by a few conventions. A string literal denotes the address of a static area initialized with the characters of the string, packed into cells. In B, there is no count of the number of characters in a string; instead, strings are terminated by a special character, which B spelled `*e'. This change from BCPL's counted strings was made partly to avoid the limitation on the length of a string caused by holding the count in an 8- or 9-bit slot, and partly because maintaining the count seemed less convenient than using a terminator. Individual characters in a B string were usually manipulated by spreading the string out into another array, one character per cell, and then repacking it later, but people more often used other library functions that accessed or replaced individual characters in a string.

The earliest versions of B insisted that the entire program be presented all at once to the compiler. Later implementations of B use a conventional linker to resolve external names occurring in files compiled separately, instead of placing the burden of assigning offsets on the programmer. B uses the single character = for assignment instead of :=, and uses /**/ to enclose comments. Fortran influenced the syntax of declarations: B declarations begin with a specifier like auto or static, followed by a list of names.