SGML (standard generalized markup language) is an international standard for the definition of markup languages; that is, it is a metalanguage. Markup consists of notations called tags that specify the function of a piece of text or how it is to be displayed. SGML emphasizes descriptive markup, in which a tag might be “<emphasis>.” Such a markup denotes the document function, and it could be interpreted as reverse video on a computer screen, underlining by a typewriter, or italics in typeset text.

SGML is used to specify DTDs (document type definitions). A DTD defines a kind of document, such as a report, by specifying what elements must appear in the document—e.g., <Title>—and giving rules for the use of document elements, such as that a paragraph may appear within a table entry but a table may not appear within a paragraph. A marked-up text may be analyzed by a parsing program to determine if it conforms to a DTD. Another program may read the markups to prepare an index or to translate the document into PostScript for printing. Yet another might generate large type or audio for readers with visual or hearing disabilities.
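The kind of conformance check a parsing program performs can be sketched in Python. This is only an illustration under stated assumptions: the rule table `ALLOWED_CHILDREN`, the element names, and the function `check` are invented here, and the sample text uses XML syntax (which Python's standard library can parse) rather than full SGML or a real DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical content rules, loosely mimicking what a DTD would declare:
# each element name maps to the set of child elements it may contain.
ALLOWED_CHILDREN = {
    "report": {"title", "paragraph", "table"},
    "title": set(),
    "paragraph": set(),        # a table may not appear within a paragraph
    "table": {"entry"},
    "entry": {"paragraph"},    # a paragraph may appear within a table entry
}
REQUIRED_IN_REPORT = {"title"}  # elements that must appear in every report


def check(document_text):
    """Return a list of rule violations (an empty list means the text conforms)."""
    root = ET.fromstring(document_text)
    problems = []
    if root.tag == "report":
        present = {child.tag for child in root}
        for name in sorted(REQUIRED_IN_REPORT - present):
            problems.append(f"<report> is missing required <{name}>")
    # Walk every element and compare its children with the rule table.
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag)
        if allowed is None:
            problems.append(f"<{parent.tag}> is not declared")
            continue
        for child in parent:
            if child.tag not in allowed:
                problems.append(f"<{child.tag}> may not appear within <{parent.tag}>")
    return problems
```

Calling `check` on a report whose paragraph contains a table returns one violation, while a report with a paragraph inside a table entry returns an empty list.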

World Wide Web display languages

HTML

The World Wide Web is a system for retrieving text, graphics, and audio over the Internet and presenting them on a computer. Each retrieval unit is known as a Web page, and such pages frequently contain “links” that allow related pages to be retrieved. HTML (hypertext markup language) is the markup language for encoding Web pages. It was designed by Tim Berners-Lee at the CERN nuclear physics laboratory in Switzerland around 1990 and is defined by an SGML DTD. HTML markup tags specify document elements such as headings, paragraphs, and tables. They mark up a document for display by a computer program known as a Web browser. The browser interprets the tags, displaying the headings, paragraphs, and tables in a layout that is adapted to the screen size and fonts available to it.

HTML documents also contain anchors, which are tags that specify links to other Web pages. An anchor has the form <A HREF="http://www.britannica.com">Encyclopædia Britannica</A>, where the quoted string is the URL (uniform resource locator) to which the link points (the Web “address”) and the text between the opening and closing tags is what appears in a Web browser, typically underlined to show that it is a link to another page. What is displayed as a single page may also be formed from multiple URLs, some containing text and others graphics.
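A program can pick out anchors in the same way a browser does. The following Python sketch uses the standard library's `html.parser` module; the class name `AnchorCollector` and the helper `extract_links` are invented for this illustration and are not part of any browser.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect (URL, link text) pairs from the anchors in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # URL of the anchor currently open, if any
        self._text = []      # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> matches.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html_text):
    """Return every (URL, link text) pair found in html_text."""
    collector = AnchorCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links
```

For the anchor shown above, `extract_links` yields the pair of the Britannica URL and the visible text "Encyclopædia Britannica".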

XML

HTML does not allow one to define new text elements; that is, it is not extensible. XML (extensible markup language) is a simplified form of SGML intended for documents that are published on the Web. Like SGML, XML uses DTDs to define document types and the meanings of tags used in them. XML adopts conventions that make it easy to parse, such as that document elements are marked by both a beginning and an ending tag, such as <BEGIN>…</BEGIN>. XML provides more kinds of hypertext links than HTML, such as bidirectional links and links relative to a document subsection.
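The effect of these conventions can be seen in a short Python sketch using the standard `xml.etree.ElementTree` module. The recipe document and its tag names are hypothetical, invented here to show author-defined elements; the helper `is_well_formed` is likewise only an illustration.

```python
import xml.etree.ElementTree as ET

# A hypothetical document type: the tag names below are invented for this
# sketch, which is the freedom that XML's extensibility gives an author.
RECIPE = """
<recipe>
  <name>Tomato sauce</name>
  <ingredient quantity="4">tomatoes</ingredient>
  <ingredient quantity="1">onion</ingredient>
</recipe>
""".strip()

root = ET.fromstring(RECIPE)
name = root.findtext("name")
ingredients = [(item.text, int(item.get("quantity"))) for item in root.iter("ingredient")]


def is_well_formed(text):
    """Report whether every element is properly opened and closed."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False
```

Because every element has both a beginning and an ending tag, the parser needs no knowledge of the recipe vocabulary to build the tree; a text such as `<a><b>x</a>`, which omits an ending tag, is rejected.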

Because an author may define new tags, an XML DTD must also contain rules that instruct a Web browser how to interpret them: how an element is to be displayed or how it is to generate an action such as preparing an e-mail message.

Web scripting

Web pages marked up with HTML or XML are largely static documents. Web scripting can add information to a page as a reader uses it or let the reader enter information that may, for example, be passed on to the order department of an online business. CGI (common gateway interface) provides one mechanism; it transmits requests and responses between the reader’s Web browser and the Web server that provides the page. The CGI component on the server contains small programs called scripts that take information from the browser system or provide it for display. A simple script might ask the reader’s name, determine the Internet address of the system that the reader uses, and print a greeting. Scripts may be written in any programming language, but, because they are generally simple text-processing routines, scripting languages like Perl are particularly appropriate.
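The greeting script described above might look roughly like the following Python sketch. It relies on the standard CGI environment variables `QUERY_STRING` and `REMOTE_ADDR`; the form field `name` and the function `greeting_page` are assumptions made for this example, and a production script would need more careful input handling.

```python
import os
from html import escape
from urllib.parse import parse_qs


def greeting_page(environ):
    """Build the complete CGI response (header, blank line, body) as a string."""
    # The server passes request details to the script in environment variables.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["stranger"])[0]   # "name" is a hypothetical form field
    address = environ.get("REMOTE_ADDR", "an unknown address")
    # Escape both values so that reader-supplied text cannot inject markup.
    body = f"<p>Hello, {escape(name)}! You are visiting from {escape(address)}.</p>"
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # When run by a Web server as a CGI script, the response goes to standard output.
    print(greeting_page(os.environ), end="")
```

Keeping the response in a function that takes the environment as a parameter makes the script easy to exercise without a Web server.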

Another approach is to use a language designed for Web scripts to be executed by the browser. JavaScript is one such language, designed by the Netscape Communications Corp., which may be used with both Netscape’s and Microsoft’s browsers. JavaScript is a simple language, quite different from Java. A JavaScript program may be embedded in a Web page with the HTML tag <script language="JavaScript">. JavaScript instructions following that tag will be executed by the browser when the page is selected. In order to speed up display of dynamic (interactive) pages, JavaScript is often combined with XML or some other language for exchanging information between the server and the client’s browser. In particular, the XMLHttpRequest object enables asynchronous data requests from the server without requiring the server to resend the entire Web page. This approach, or “philosophy,” of programming is called Ajax (asynchronous JavaScript and XML).

VBScript is a subset of Visual Basic. Originally developed for Microsoft’s Office suite of programs, it was later used for Web scripting as well. Its capabilities are similar to those of JavaScript, and it may be embedded in HTML in the same fashion.

Behind the use of such scripting languages for Web programming lies the idea of component programming, in which programs are constructed by combining independent, previously written components without any further language processing. JavaScript and VBScript programs were designed as components that may be attached to Web browsers to control how they display information.

Elements of programming

Despite notational differences, contemporary computer languages provide many of the same programming structures. These include basic control structures and data structures. The former provide the means to express algorithms, and the latter provide ways to organize information.

Control structures

Programs written in procedural languages, the most common kind, are like recipes, having lists of ingredients and step-by-step instructions for using them. The three basic control structures in virtually every procedural language are:

  1. Sequence: combine the liquid ingredients, and next add the dry ones.
  2. Conditional: if the tomatoes are fresh then simmer them, but if canned, skip this step.
  3. Iterative: beat the egg whites until they form soft peaks.

Sequence is the default control structure; instructions are executed one after another. They might, for example, carry out a series of arithmetic operations, assigning results to variables, to find the roots of a quadratic equation ax² + bx + c = 0. The conditional IF-THEN or IF-THEN-ELSE control structure allows a program to follow alternative paths of execution. Iteration, or looping, gives computers much of their power. They can repeat a sequence of steps as often as necessary, and appropriate repetitions of quite simple steps can solve complex problems.
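As a small illustration in Python, the three structures might appear together as follows. The task (summing the squares of the even numbers up to a limit) and the function name are invented for this sketch.

```python
def sum_of_even_squares(limit):
    """Add up the squares of the even numbers from 1 through limit."""
    total = 0                      # sequence: statements run one after another
    number = 1
    while number <= limit:         # iteration: repeat while the condition holds
        if number % 2 == 0:        # conditional: take this path only for even numbers
            total = total + number * number
        number = number + 1
    return total
```

Here a conditional is nested inside a loop, which in turn sits inside a sequence of statements; `sum_of_even_squares(6)` adds 4, 16, and 36 to give 56.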

These control structures can be combined. A sequence may contain several loops; a loop may contain a loop nested within it, or the two branches of a conditional may each contain sequences with loops and more conditionals. In the “pseudocode” used in this article, “*” indicates multiplication and “←” is used to assign values to variables. The following programming fragment employs the IF-THEN structure for finding one root of the quadratic equation, using the quadratic formula:

x = (−b ± √(b² − 4ac))/(2a)

The quadratic formula assumes that a is nonzero and that the discriminant (the portion within the square root sign) is not negative (in order to obtain a real number root). Conditionals check those assumptions:

  • IF a = 0 THEN
  •   ROOT ← −c/b
  • ELSE
  •   DISCRIMINANT ← b*b − 4*a*c
  •   IF DISCRIMINANT ≥ 0 THEN
  •     ROOT ← (−b + SQUARE_ROOT(DISCRIMINANT))/(2*a)
  •   ENDIF
  • ENDIF
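For comparison, the same logic can be written in Python. This is a sketch rather than a definitive routine: the function name `one_root` is invented, the library function `math.sqrt` stands in for SQUARE_ROOT, and the handling of b = 0 and of a negative discriminant (returning `None`) is an assumption, since the pseudocode leaves ROOT unassigned in those cases.

```python
import math


def one_root(a, b, c):
    """Return one real root of a*x**2 + b*x + c = 0, or None if none is found."""
    if a == 0:
        if b == 0:
            return None        # assumption: the pseudocode does not cover b = 0
        return -c / b          # the equation is linear: b*x + c = 0
    discriminant = b * b - 4 * a * c
    if discriminant >= 0:
        # The parentheses around 2 * a matter: dividing by 2 and then
        # multiplying by a would give a different (wrong) result.
        return (-b + math.sqrt(discriminant)) / (2 * a)
    return None                # negative discriminant: no real root
```

For x² − 3x + 2 = 0, `one_root(1, -3, 2)` returns 2.0, the larger of the two roots.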

The SQUARE_ROOT function used in the above fragment is an example of a subprogram (also called a procedure, subroutine, or function). A subprogram is like a sauce recipe given once and used as part of many other recipes. Subprograms take inputs (the quantity needed) and produce results (the sauce). Commonly used subprograms are generally in a collection or library provided with a language. Subprograms may call other subprograms in their definitions, as shown by the following routine (where ABS is the absolute-value function). SQUARE_ROOT is implemented by using a WHILE (indefinite) loop that produces a good approximation for the square root of real numbers unless x is very small or very large. A subprogram is written by declaring its name, the type of input data, and the output:

  • FUNCTION SQUARE_ROOT(REAL x) RETURNS REAL
  •   ROOT ← 1.0
  •   WHILE ABS(ROOT*ROOT − x) ≥ 0.000001
  •     ROOT ← (x/ROOT + ROOT)/2
  •   END WHILE
  •   RETURN ROOT
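A Python rendering of this routine is sketched below. The `tolerance` parameter mirrors the constant in the pseudocode; the `max_steps` limit is an addition not present in the original, included on the assumption that a guard is wanted for inputs (such as very large or negative x) on which the loop might otherwise never stop.

```python
def square_root(x, tolerance=0.000001, max_steps=100):
    """Approximate the square root of x by repeated averaging."""
    root = 1.0
    steps = 0
    # Keep refining the estimate until its square is close enough to x.
    while abs(root * root - x) >= tolerance and steps < max_steps:
        root = (x / root + root) / 2   # average the estimate with x / estimate
        steps += 1
    return root
```

Each pass replaces the estimate with the average of itself and x divided by itself, so an estimate that is too large is pulled down and one that is too small is pulled up.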

Subprograms can break a problem into smaller, more tractable subproblems. Sometimes a problem may be solved by reducing it to a subproblem that is a smaller version of the original. In that case the routine is known as a recursive subprogram because it solves the problem by repeatedly calling itself. For example, the factorial function in mathematics (n! = n∙(n−1)⋯3∙2∙1, i.e., the product of the first n positive integers) can be programmed as a recursive routine:

  • FUNCTION FACTORIAL(INTEGER n) RETURNS INTEGER
  • IF n = 0 THEN RETURN 1
  • ELSE RETURN n * FACTORIAL(n−1)

The advantage of recursion is that it is often a simple restatement of a precise definition, one that avoids the bookkeeping details of an iterative solution.
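The contrast can be seen by writing the factorial both ways in Python. The function names are chosen for this sketch, and both versions assume that n is a nonnegative integer.

```python
def factorial(n):
    """Recursive version: a direct restatement of the definition."""
    if n == 0:
        return 1
    return n * factorial(n - 1)


def factorial_iterative(n):
    """Iterative version: the running product must be tracked explicitly."""
    result = 1
    for k in range(2, n + 1):
        result = result * k
    return result
```

Both functions return 120 for n = 5; the iterative one needs the extra variables `result` and `k`, which is the bookkeeping that the recursive form avoids.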

At the machine-language level, loops and conditionals are implemented with branch instructions that say “jump to” a new point in the program. The “goto” statement in higher-level languages expresses the same operation but is rarely used because it makes it difficult for humans to follow the “flow” of a program. Some languages, such as Java, do not allow it.