All automata referred to from this point on may be understood to be essentially Turing machines classified in terms of the number, length, and movement of tapes and of the reading and writing operations used. The term discrete state automaton is sometimes used to emphasize the discrete nature of the internal states. The principal classes are transducers and acceptors. In automata theory, a transducer is an automaton with input and output; any Turing machine for computing a partial recursive function, as previously described, can stand as an example. An acceptor is an automaton without output that, in a special sense, recognizes or accepts words on the machine alphabet. The input of an acceptor is written on tape in the usual way, but the tape is blank at the end of the computation, and acceptance of the input word is represented by a special state called a final state. Thus, a word x, or sequence of symbols from an alphabet denoted by the letter S, is said to be accepted by an acceptor A if A computes, beginning in an initial state q0 with x on tape, and halts in a final state with tape being entirely blank. A subset designated U of the set of words S* on an alphabet S is called an accepted set if there is an automaton A that accepts any word xU.

Acceptors

An elementary result of automata theory is that every recursively enumerable set, or range of a partial recursive function, is an accepted set. In general the acceptors are two-way unbounded tape automata.

A useful classification of acceptors has been introduced in conjunction with a theory of generative grammars developed in the United States by a linguist, Noam Chomsky. A generative grammar is a system of analysis usually identified with linguistics. By its means a language can be viewed as a set of rules, finite in number, that can produce sentences. The use of a generative grammar, in the context of either linguistics or automata theory, is to generate and demarcate the totality of grammatical constructions of a language, natural or automata oriented. A simple grammar for a fragment of English, determined by 12 rules (see 7), can serve to introduce the main ideas.

In this simple grammar, each rule is of the form gg′ (read, “g′ replaces g”) and has the meaning that g′ may be rewritten for g within strings of symbols. The symbol S that appears in the rules may be understood as standing for the grammatical category “sentence,” Pr for “pronoun,” VP for “verb phrase,” NP for “noun phrase,” and so forth. Symbols marked with a vinculum (-) constitute the set VN of nonterminal symbols. The English expressions “she,” etc., occurring in the rules constitute the set VT of terminal symbols. S is the initial symbol.

Beginning with S, sentences of English may be derived by applications of the rules. The derivation begins with S; the first rule allows Pr VP to be rewritten for S, yielding Pr VP; the fourth rule allows V NP to be rewritten for VP, yielding Pr V NP; and so forth (see 8). A last step yields a terminal string or sentence; it consists solely of elements of the terminal vocabulary VT. None of the rules apply to it; so no further steps are possible.

The set of sentences thus generated by a grammar is called a language. Aside from trivial examples, grammars generate denumerably infinite languages.

Recursively enumerable grammars and Turing acceptors

As noted above, an elementary result of automata theory is that every recursively enumerable set constitutes an accepted set. Generally speaking, acceptors are two-way unbounded tape automata. On the other hand, a grammar consisting of rules gg′, in which g and g′ are arbitrary words of (VTVN)*, is an unrestricted rewriting system, and any recursively enumerable set of words—i.e., language in the present sense—is generated by some such system. These very general grammars thus correspond to two-way acceptors, called Turing acceptors, that accept precisely the recursively enumerable sets.

Finite-state grammars and finite-state acceptors

Acceptors that move tape left only, reading symbol by symbol and erasing the while, are the simplest possible, the finite-state acceptors. These automata have exactly the same capability as McCulloch-Pitts automata and accept sets called regular sets. The corresponding grammars in the classification being discussed are the finite-state grammars. In these systems the rules gg′ are restricted so that g is a nonterminal v of VN (as exemplified above) and g′ is of the form us, uVN and sVT. The languages generated by finite-state grammars, owing to this correspondence, are called regular languages.

Although these simple grammars and acceptors are of some interest in information theory and in neural network modelling, they are not descriptively adequate for English or for such standard computer languages as Algol because they are not able to account for phrase structure. In particular, finite-state grammars cannot generate self-embedded sentences such as “the man the dog bit ran away,” nor can they produce sentences with several readings such as “she is a pretty little girl.”

Context-free grammars and pushdown acceptors

Context-free, or phrase-structure, grammars, although apparently not affording completely adequate descriptions of vernacular languages, do have the desirable properties just noted. For this family, the rules gg′ contain single nonterminals on the left, as in the case of the finite-state grammars, but allow g′ to be any word of (VTVN)*. The example discussed above is a context-free grammar. Grammars of this kind can account for phrase structure and ambiguity (see 9).

Pushdown acceptors, which play a key role in computer-programming theory, are automata corresponding to context-free grammars. A pushdown acceptor is a finite-state acceptor equipped with an added two-way storage tape, the so-called pushdown store. At the beginning of operation this tape is blank. As the automaton computes, the store is used to analyze the syntactical structure of the sentence being read. The store moves left when printed, and only the last symbol printed may be read, then the next to the last, and so forth. The input is accepted if both the (one-way) input and storage tapes are blank when the automaton halts in a final state.

The representation of Turing machines in quadruple form may be replaced here by a somewhat clearer list of rules that simulate tape action in their application. Rules can be formulated for a pushdown acceptor P for a context-free language L of items xcx-1, in which x is a word on an abstract alphabet {a, b} and x-1 is x written in reverse. A first such rule can be formulated to mean that, if P is in state q0 scanning a on input and any (defined) symbol on the pushdown store, it moves tape left, erases a from the input, prints a on the store, and goes into state q1. A symbolic expression for the rule might be: q0aaq1. Another rule might be of the form: if P is in state q1 scanning c on input and anything on store, it moves input left, erases c, and does nothing with respect to the store—briefly, q1cq2. Another requires that, if P is in q2 scanning a on input and a on store, then it moves input left, erases a, moves store right, and erases a (see 10). An example is easily constructed to show that under certain rules a set, say, abcba is accepted (see 11). If q0abcba indicates the outset of a computation with P in the initial state q0 scanning the first a in abcba on input tape and blank on store tape, and if q2 is a final state, then the computation is determined by the rules given above (see 10). At the end of the computation the automaton is in a final state q2, both tapes are blank, and there is no rule with q2 alone on the left; P halts and hence abcba is accepted.

Reflection on the example and on others easily constructed shows that a pushdown acceptor is able, in effect, to parse sentences of context-free languages.

Context-sensitive grammars and linear-bounded acceptors

A fourth type of acceptor, which is mainly of mathematical rather than applied interest, is the two-way acceptor with bounded tape—i.e., tape the length of which never exceeds a linear function of the input length. These are the linear-bounded acceptors. They correspond in the present classificatory scheme to context-sensitive grammars. Unlike the context-free grammars, these latter systems use rules gg′, in which the nonterminal symbol ν ∊ VN in g may be rewritten only in a context xwy; thus gg′ is of the form xvyxwy, x, y, w ∊ (VT ∪ VN)*. An example of a context-sensitive language accepted by a linear-bounded automaton is the copy language xcx.

The family of recursively enumerable languages includes the context-sensitive languages, which in turn includes the context-free, which finally includes the regular, or finite-state, languages. No other hierarchy of corresponding acceptors has been intensively investigated.