Information about http://www.cl.cam.ac.uk/~mgk25/iso-14977.pdf

ISO…

Tags: definitions, normative references, scope, syntactic element, syntax rule,
Pages: 19
Language: english
Created: Thu Sep 17 18:01:03 1998
                                                                   ISO/IEC 14977 : 1996(E)




Contents

Foreword

Introduction

1 Scope

2 Normative references

3 Definitions

4 The form of each syntactic element of Extended BNF
  4.1       General
  4.2       Syntax
  4.3       Syntax-rule
  4.4       Definitions-list
  4.5       Single-definition
  4.6       Syntactic-term
  4.7       Syntactic exception
  4.8       Syntactic-factor
  4.9       Integer
  4.10      Syntactic-primary
  4.11      Optional-sequence
  4.12      Repeated sequence
  4.13      Grouped sequence
  4.14      Meta-identifier
  4.15      Meta-identifier-character
  4.16      Terminal-string
  4.17      First-terminal-character
  4.18      Second-terminal-character
  4.19      Special-sequence
  4.20      Special-sequence-character
  4.21      Empty-sequence
  4.22      Further examples

5 The symbols represented by each syntactic element
  5.1       General
  5.2       Terminal-string
  5.3       Meta-identifier
  5.4       Grouped-sequence
  5.5       Optional-sequence
  5.6       Repeated-sequence
  5.7       Syntactic-factor
  5.8       Syntactic-term
  5.9       Single-definition
  5.10      Definitions-list
  5.11      Special-sequence
  5.12      Empty-sequence

6 Layout and Comments
  6.1       General
  6.2       Terminal-character
  6.3       Gap-free-symbol
  6.4       Gap-separator
  6.5       Commentless-symbol
  6.6       Comment-symbol
  6.7       Bracketed-textual-comment

7 The representation of each terminal-character in Extended BNF
  7.1       General
  7.2       Letters and digits
  7.3       Other terminal characters
  7.4       Alternative representations
  7.5       Other-character
  7.6       Gap-separator
  7.7       Terminal-characters represented by a pair of characters
  7.8       Invalid character sequences

8 Examples
  8.1       The syntax of Extended BNF
  8.2       Extended BNF used to define itself informally
  8.3       Extended BNF defined informally

Annexes
A Two-level grammars

B Bibliography




Foreword
ISO (the International Organization for Standardization) and IEC (the International
Electrotechnical Commission) form the specialized system for worldwide
standardization. National bodies that are members of ISO or IEC participate
in the development of International Standards through technical committees
established by the respective organization to deal with particular fields of
technical activity. ISO and IEC technical committees collaborate in fields
of mutual interest. Other international organizations, governmental and non-
governmental, in liaison with ISO and IEC, also take part in the work.

In the field of information technology, ISO and IEC have established a joint
technical committee ISO/IEC JTC 1. Draft International Standards adopted
by the joint technical committee are circulated to national bodies for voting.
Publication as an International Standard requires approval by at least 75% of
the national bodies casting a vote.

International Standard ISO/IEC 14977 was prepared by BSI (as BS 6154)
and was adopted, under a special "fast-track procedure", by Joint Technical
Committee ISO/IEC JTC 1, Information technology, in parallel with its approval
by national bodies of ISO and IEC.

Annexes A and B of this International Standard are for information only.




Introduction
A syntactic metalanguage is an important tool of computer science. The
concepts are well known, but many slightly different notations are in use. As a
result syntactic metalanguages are still not widely used and understood, and the
advantages of rigorous notations are unappreciated by many people.

Extended BNF brings some order to the formal definition of a syntax and will
be useful not just for the definition of programming languages, but for many
other formal definitions.

Since the definition of the programming language Algol 60 (Naur, 1960) the
custom has been to define the syntax of a programming language formally. Algol
60 was defined with a notation now known as BNF or Backus-Naur Form. This
notation has proved a suitable basis for subsequent languages but has frequently
been extended or slightly altered. The many different notations are confusing
and have prevented the advantages of formal unambiguous definitions from
being widely appreciated. The syntactic metalanguage Extended BNF described
in this standard is based on Backus-Naur Form and includes the most widely
adopted extensions.

Syntactic metalanguages

A syntactic metalanguage is a notation for defining the syntax of a language
by use of a number of rules. Each rule names part of the language (called a
non-terminal symbol of the language) and then defines its possible forms. A
terminal symbol of the language is an atom that cannot be split into smaller
components of the language. A syntactic metalanguage is useful whenever a
clear formal description and definition is required, e.g. the format for references
in papers submitted to a journal, or the instructions for performing a complicated
task.

A formal syntax definition has three distinct uses:

     a) it names the various syntactic parts (i.e. non-terminal symbols) of the
     language;

     b) it shows which sequences of symbols are valid sentences of the language;

     c) it shows the syntactic structure of any sentence of the language.

The need for a standard syntactic metalanguage

Without a standard syntactic metalanguage every programming language definition
starts by specifying the metalanguage used to define its syntax. This causes
various problems:


  Many different notations -- It is unusual for two different programming
  languages to use the same metalanguage. Thus human readers are handicapped
  by having to learn a new metalanguage before they can study a new language.

  Concepts not widely understood -- The lack of a standard notation hinders
  the use of rigorous unambiguous definitions.

  Imperfect notations -- Because a metalanguage needs to be defined for
  every programming language, almost inevitably, the metalanguage contains
  defects. For example, errors occurred in the drafting of RTL/2 (BS 5904) and
  CORAL 66 (BS 5905) because the metalanguages could not be typed easily.

  Special purpose notations -- A metalanguage defined for a particular
  programming language is often simplified by taking advantage of special
  features in the language to be defined. However, the metalanguage is then
  unsuitable for other programming languages.

  Few general syntax processors -- The multiplicity of syntactic
  metalanguages has limited the availability of computer programs to analyse
  and process syntaxes, e.g. to list a syntax neatly, to make an index of the
  symbols used in the syntax, to produce a syntax-checker for programs written
  in the language.

In practice experienced readers have little difficulty in picking up and learning
a new notation, but even so the differences obscure mutual understanding
and hinder communication. A standard metalanguage enables more people to
crystallize vague ideas into an unambiguous definition. It is also useful because
other people needing to provide formal definitions no longer need to reinvent
similar concepts.

The objectives to be satisfied

It is desirable that a standard syntactic metalanguage should be:

  a) concise, so that languages can be defined briefly and thus be more easily
  understood;

  b) precise, so that the rules are unambiguous;

  c) formal, so that the rules can be parsed, or otherwise processed, by a
  computer when required;

  d) natural, so that the notation and format are relatively simple to learn and
  understand, even for those who are not themselves language designers; (The
  meaning of a symbol should not be surprising. It should also be possible to
  define the syntax of a language in a way that helps to indicate the meaning
  of the constructions.)

  e) general, so that the notation is suitable for many purposes including the
  description of many different languages;

  f) simple in its character set and with a notation that avoids, as far as
  is practicable, using characters that are not generally available on standard
  keyboards (both typewriters and computer terminals) so that the rules can be
  typed and can be processed by computer programs;

  g) self-describing, so that the notation is able to describe itself;


     h) linear, so that the syntax can be expressed as a single stream of characters.
     (This simplifies printing a syntax. Computer processing of a syntax is also
     simpler.)

Some common syntactic metalanguages

Unfortunately none of the existing syntactic metalanguages was suitable for
adoption as the standard, for example:

     a) COBOL (ISO 1989:1985) lists alternatives vertically and uses brackets
     spreading over many lines. This is inconvenient for computer processing and
     cannot be prepared on typewriters.

     b) Backus-Naur Form (used in ALGOL 60) has problems if the metasymbols
     < > | ::= occur in the language being defined. Some common forms of
     construction (e.g. comments) cannot be expressed naturally, and other
     constructions (e.g. repetition) are long-winded.

     c) The obsolete FORTRAN 77 (ISO 1539:1980) had 'railroad tracks'. These
     are easy to understand but difficult to prepare and to process on a computer
     or typewriter. The current version, FORTRAN 90 (ISO/IEC 1539:1991), no
     longer uses this notation.

Most other languages use a variant of one of these metalanguages. Most of
them cannot be candidates for standardization because they use characters not in
the language being defined as metasymbols of the metalanguage. This simplifies
the metalanguage but prevents it from being used generally.

POSIX (ISO/IEC 9945-2:1993) includes two complementary facilities which
both assume an ISO/IEC 646:1991 character set is applicable: LEX permits
the definition and lexical analysis of regular expressions, but is inadequate for
the description of an arbitrary context-free grammar, and YACC (Yet Another
Compiler Compiler) is a parser generator for an LALR(1) grammar.

The standard metalanguage Extended BNF

Extended BNF, the metalanguage defined in this International Standard, is based
on a suggestion by Niklaus Wirth (Wirth, 1977) that is based on Backus-Naur
Form and that contains the most common extensions, i.e.:

     a) Terminal symbols of the language are quoted so that any character,
     including one used in Extended BNF, can be defined as a terminal symbol of
     the language being defined.

     b) [ and ] indicate optional symbols.

     c) { and } indicate repetition.

     d) Each rule has an explicit final character so that there is never any
     ambiguity about where a rule ends.

     e) Brackets group items together. It is an obvious convenience to use ( and
     ) in their ordinary mathematical sense.
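
The conventions listed above can be illustrated with a small sketch. The
grammar text and the helper name split_rules are hypothetical and are not
taken from the standard. The sketch assumes a syntax without comments or
special-sequences, and shows why quoting terminal symbols together with an
explicit final character leaves no doubt about where a rule ends, even when
the terminator itself is a terminal symbol of the language being defined.

```python
def split_rules(syntax):
    """Split Extended BNF text into rules at each unquoted ';'.

    Simplifying assumption: the text contains no comments and no
    special-sequences, so only quoted terminal strings need skipping.
    """
    rules, current, quote = [], [], None
    for ch in syntax:
        if quote is not None:
            # Inside a terminal string: only the matching quote ends it.
            current.append(ch)
            if ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif ch == ";":
            # An unquoted terminator closes the current rule.
            rules.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    return rules


# Hypothetical syntax using quoting, [ ], { }, ( ) and the final ';'.
example = '''
digit = "0" | "1" ;
sign = [ "-" ] ;
number = sign, digit, { digit } ;
statement = ( number | "skip" ), ";" ;
'''
rules = split_rules(example)
for rule in rules:
    print(rule)
```

The last rule defines ";" as a terminal symbol of the language, and the
quoted occurrence is not mistaken for the end of the rule.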

The main differences in Extended BNF are further features that experience has
shown are often required when providing a formal definition:


  a) Defining an explicit number of items. Fortran contains a rule that a label
  field contains exactly five characters; an identifier in PL/I or COBOL has up
  to 32 characters: rules such as these can be expressed only with difficulty in
  Backus-Naur Form. In practice, such definitions are often left incomplete and
  the rules qualified informally in English.

  b) Defining something by specifying the few exceptional cases. An Algol
  end-comment ends at the first end, else or semicolon. A rule like this cannot
  be expressed concisely or clearly in Backus-Naur Form and is also usually
  specified informally in English.

  c) Including comments. Programming languages and other structures with
  a complicated syntax need many rules to define them. The syntax will
  be clearer if explanations and cross-references can be provided; accordingly
  Extended BNF contains a comment facility so that ordinary text can be added
  to a syntax for the benefit of a human reader without affecting the formal
  meaning of the syntax.

  d) Meta-identifier. A meta-identifier (the name of a non-terminal symbol
  in the language) need not be a single word or enclosed in brackets because
  there is an explicit concatenate symbol. This also ensures that the layout of
  a syntax (except in a terminal symbol) does not affect the language being
  defined.

  e) Extensions. A user may wish to extend Extended BNF. A special-sequence
  is provided for this purpose, the format and meaning of which are not defined
  in the standard except to ensure that the start and end of an extension
  can always be seen easily. Various possible extensions are outlined in the
  following paragraphs.

Limitations and extensions

The main limitation of Extended BNF is that the language being defined needs
to be linear, i.e. the symbols in a sentence of the language can be placed in
an ordered sequence. For example knitting patterns and recipes in cooking are
linear languages, but electric circuit diagrams are not.

A further limitation is that Extended BNF is inadequate for defining more
complex forms of grammars. Such facilities were not provided because it was
thought the main need was to define a notation sufficient for the simpler and
commoner requirements.

Instead Extended BNF has been designed so that various extensions can be
made in a natural way. There are two simple ways of extending the standard
metalanguage. Firstly, the special-sequence concept provides a basic framework
for any extension, the format between the special-sequence-characters being
almost completely arbitrary. This method would be suitable for an action
grammar, i.e. one specifying actions that are to take place as a sentence is
parsed. Secondly, a meta-identifier can never be followed immediately by a left
parenthesis in the standard metalanguage; thus another method of extending the
metalanguage is to define the syntax and meaning of a meta-identifier followed
by a sequence of parameters enclosed in parentheses. This would be reasonable
in an attribute grammar where the rules ensure consistency between different
parts of a sentence in the language being defined.

More complicated extensions are also possible. Annex A suggests how Extended
BNF might be extended to define a two-level grammar.


INTERNATIONAL STANDARD

Information technology -- Syntactic metalanguage -- Extended BNF
1     Scope

This International Standard defines a notation, Extended BNF, for specifying
the syntax of a linear sequence of symbols. It defines both the logical
structure of the notation and its graphical representation.

Extended BNF has applications in the definition of programming and other
languages, as well as in other formal definitions, for example the commands to
an operating system, or the precise format of data and results.

Examples of Extended BNF are given in clause 8.

NOTE -- Like many other notations, Extended BNF can still be misused; thus it
does not prevent someone from trying to define an unparsable or ambiguous
language.


2     Normative references

The following standards contain provisions which, through reference in this
text, constitute provisions of this International Standard. At the time of
publication, the editions indicated were valid. All standards are subject to
revision, and parties to agreements based on this International Standard are
encouraged to investigate the possibility of applying the most recent editions
of the standards listed below. Members of IEC and ISO maintain registers of
currently valid International Standards.

ISO 2382-15 : 1985, Data processing -- Vocabulary -- Part 15: Programming
languages.

ISO/IEC 646 : 1991, Information technology -- ISO 7-bit coded character set
for information interchange.

ISO/IEC 6429 : 1992, Information technology -- Control functions for 7-bit and
8-bit coded character sets.

BS 6154 : 1981, Method of defining -- Syntactic metalanguage.


3     Definitions

For the purposes of this International Standard, the definitions given in
ISO 2382-15 and the following definitions apply:

3.1 sequence: An ordered list of zero or more items.

3.2 subsequence: A sequence within a sequence.

3.3 non-terminal symbol: A syntactic part of the language being defined.

3.4 meta-identifier: The name of a non-terminal symbol.

3.5 start symbol: A non-terminal symbol that is defined by one or more syntax
rules but does not occur in any other syntax rule.

3.6 sentence: A sequence of symbols that represents the start symbol.

3.7 terminal symbol: A sequence of one or more characters forming an
irreducible element of a language.

NOTE -- In this International Standard a terminal symbol of Extended BNF is
called a terminal-character, and a terminal symbol of a language being defined
by a syntax is represented by a terminal-string.


4     The form of each syntactic element of Extended BNF

NOTES

1 The following conventions are used:

    a) Each meta-identifier of Extended BNF is written as one or more words
    joined together by hyphens;

    b) A meta-identifier ending with "-symbol" is the name of a terminal
    symbol of Extended BNF.

2 The normal character representing each operator of Extended BNF and its
implied precedence is (highest precedence at the top):

    *   repetition-symbol
    -   except-symbol
    ,   concatenate-symbol
    |   definition-separator-symbol
    =   defining-symbol
    ;   terminator-symbol

3 The normal precedence is over-ridden by the following bracket pairs:


    '      first-quote-symbol         first-quote-symbol          '
    "      second-quote-symbol        second-quote-symbol         "
    (*     start-comment-symbol       end-comment-symbol         *)
    (      start-group-symbol         end-group-symbol            )
    [      start-option-symbol        end-option-symbol           ]
    {      start-repeat-symbol        end-repeat-symbol           }
    ?      special-sequence-symbol    special-sequence-symbol     ?


4.1      General

The logical structure of Extended BNF is defined in 4.2 to 4.21.


4.2      Syntax

The syntax of a language consists of one or more syntax-rules.


4.3      Syntax-rule

A syntax-rule consists of a meta-identifier (the name of the non-terminal
symbol being defined) followed by a defining-symbol followed by a
definitions-list followed by a terminator-symbol.


4.4      Definitions-list

A definitions-list consists of an ordered list of one or more
single-definitions separated from each other by a definition-separator-symbol.


4.5      Single-definition

A single-definition consists of an ordered list of one or more syntactic-terms
separated from each other by a concatenate-symbol.


4.6      Syntactic-term

A syntactic-term consists of either:

    a) a syntactic-factor, or

    b) a syntactic-factor followed by an except-symbol followed by a
    syntactic-exception.


4.7      Syntactic exception

A syntactic-exception consists of a syntactic-factor subject to the
restriction that the sequences of symbols represented by the
syntactic-exception could equally be represented by a syntactic-factor
containing no meta-identifiers.

NOTE -- If a syntactic-exception is permitted to be an arbitrary
syntactic-factor, Extended BNF could define a wider class of languages than
the context-free grammars, including attempts which lead to Russell-like
paradoxes, e.g.
   xx = "A" - xx;
Is "A" an example of xx? Such licence is undesirable and the form of a
syntactic-exception is therefore restricted to cases that can be proved to be
safe. Thus whereas a syntactic-factor is in general equivalent to some
context-free grammar, a syntactic-exception is always equivalent to some
regular grammar. It may be shown that the difference between a context-free
grammar and a regular grammar is always another context-free grammar; hence a
syntactic-term (and hence any grammar defined according to this standard) is
equivalent to some context-free grammar.


4.8      Syntactic-factor

A syntactic-factor consists of either:

    a) an integer followed by a repetition-symbol followed by a
    syntactic-primary, or

    b) a syntactic-primary.


4.9      Integer

An integer consists of an ordered list of one or more decimal-digits.


4.10     Syntactic-primary

A syntactic-primary consists of one of the following:

    a) an optional-sequence;

    b) a repeated-sequence;

    c) a grouped-sequence;

    d) a meta-identifier;

    e) a terminal-string;

    f) a special-sequence;

    g) an empty-sequence.
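
The forms in 4.5, 4.6 and 4.8 can be made concrete with a small sketch. It
assumes that every operand stands for a finite set of strings, which is far
narrower than what Extended BNF can define. The function names and the rule
letter are hypothetical. The operators are read as the examples in 4.22 use
them: an integer followed by a repetition-symbol means exactly that many
occurrences, and the except-symbol removes the sequences named by the
syntactic-exception.

```python
from itertools import product


def repeat(n, strings):
    """n * X: every concatenation of exactly n strings taken from X."""
    return {"".join(parts) for parts in product(sorted(strings), repeat=n)}


def concatenate(left, right):
    """X , Y: each string of X followed by each string of Y."""
    return {a + b for a in left for b in right}


def except_(factor, exception):
    """X - Y: the strings of X that are not also strings of Y."""
    return set(factor) - set(exception)


# Hypothetical rule: letter = "a" | "b" | "c" ;
letter = {"a", "b", "c"}

not_a = except_(letter, {"a"})    # letter - "a"
pair = repeat(2, not_a)           # 2 * (letter - "a")
word = concatenate({"a"}, pair)   # "a", 2 * (letter - "a")

print(sorted(word))
```

Here the exception is the single terminal-string "a", so it contains no
meta-identifiers and satisfies the restriction in 4.7.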


4.11   Optional-sequence

An optional-sequence consists of a start-option-symbol followed by a
definitions-list followed by an end-option-symbol.


4.12   Repeated sequence

A repeated-sequence consists of a start-repeat-symbol followed by a
definitions-list followed by an end-repeat-symbol.


4.13   Grouped sequence

A grouped-sequence consists of a start-group-symbol followed by a
definitions-list followed by an end-group-symbol.


4.14   Meta-identifier

A meta-identifier consists of an ordered list of one or more
meta-identifier-characters subject to the condition that the first
meta-identifier-character is a letter.


4.15   Meta-identifier-character

A meta-identifier-character is a letter or a decimal-digit.


4.19   Special-sequence

A special-sequence consists of a special-sequence-symbol followed by a
(possibly empty) sequence of special-sequence-characters followed by a
special-sequence-symbol.


4.20   Special-sequence-character

A special-sequence-character is any terminal-character except a
special-sequence-symbol.


4.21   Empty-sequence

An empty-sequence consists of the empty sequence of terminal-characters.


4.22   Further examples

The following example is a syntax-rule that states that a Fortran 77
continuation line starts with 5 blanks, the sixth character must not be a
blank or zero, and there must not be more than 72 (= 5+1+66) characters
altogether.

Fortran 77 continuation line = 5 * " ",
  (character - (" " | "0")), 66 * [character] ;

                                                               In Fortran 66, the definition of a continuation line is more
4.16   Terminal-string                                         complicated. The following example is a syntax-rule that
                                                               states that a continuation line must not start with C, there
A terminal-string consists of either:                          must be at least 6 characters, the sixth character must not
                                                               be a blank or zero, and there must not be more than 72
  a) A first-quote-symbol followed by a sequence of            (= 1+4+1+66) characters altogether.
  one or more first-terminal-characters followed by a
  first-quote-symbol, or                                       Fortran 66 continuation line = character - "C",
                                                                 4 * character, character - (" " | "0"),
                                                                 66 * [character] ;
  b) A second-quote-symbol, followed by a sequence of
  one or more second-terminal-characters followed by a
  second-quote-symbol.
                                                               5 The symbols represented by each syntactic
4.17   First-terminal-character                                  element

A first-terminal-character is any terminal-character except    5.1 General
a first-quote-symbol.
                                                               Each syntax-rule is a syntax rule that defines (possibly
                                                               empty) sequences of terminal and non-terminal symbols.
4.18   Second-terminal-character                               Each of these sequences of symbols is represented by the
                                                               non-terminal symbol named by the meta-identifier at the
A second-terminal-character is any terminal-character ex-      start of the syntax-rule. 5.2 to 5.12 define the sequences
cept a second-quote-symbol.                                    of symbols that are represented by any definitions-list.
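
As an illustration of the Fortran 77 continuation-line rule in 4.22, the rule can be hand-translated into a predicate. The sketch below is not generated from the syntax-rule; the function name is hypothetical, and since the example leaves "character" undefined it is assumed here to mean any single character of a Python string.

```python
def is_fortran77_continuation_line(line: str) -> bool:
    """Check: 5 * " ", (character - (" " | "0")), 66 * [character]."""
    # 5 blanks + 1 marker character + at most 66 optional characters
    if not 6 <= len(line) <= 72:
        return False
    # 5 * " "
    if line[:5] != " " * 5:
        return False
    # character - (" " | "0")
    return line[5] not in (" ", "0")
```

For instance, `is_fortran77_continuation_line("     +X = 1")` holds, while a line whose sixth character is a blank or "0" is rejected.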


NOTES

1 When the syntax of a complete language is defined there is:

    a) a start symbol, and

    b) at least one syntax-rule starting with each meta-identifier
    used as a syntactic-primary.

2 It is more difficult to understand a language if there are
several syntax-rules defining a meta-identifier and no indication
that each definition only partly defines the non-terminal symbol.


5.2    Terminal-string

A terminal-string represents either:

    a) the sequence of first-terminal-characters between its
    first-quote-symbols, or

    b) the sequence of second-terminal-characters between
    its second-quote-symbols.


5.3    Meta-identifier

A meta-identifier used as a syntactic-primary represents
any sequence of symbols defined by the definitions-list of
any syntax-rule that starts with that meta-identifier.


5.4    Grouped-sequence

A grouped-sequence represents any sequence of symbols
defined by the definitions-list enclosed by its
start-group-symbol and end-group-symbol.


5.5    Optional-sequence

An optional-sequence represents either:

    a) the empty sequence of symbols, or

    b) any sequence of symbols defined by the
    definitions-list enclosed by its start-option-symbol and
    end-option-symbol.


5.6    Repeated-sequence

A repeated-sequence represents a (possibly empty) sequence
of subsequences where each subsequence is any
sequence of symbols defined by the definitions-list enclosed
by the start-repeat-symbol and end-repeat-symbol.


5.7    Syntactic-factor

A syntactic-factor represents an explicit number of
subsequences where each subsequence is a sequence of symbols
represented by the syntactic-primary that is part of that
syntactic-factor. The required number of subsequences
equals one when no integer is given and otherwise is equal
to the value of the integer.

As examples the following syntax-rules illustrate the
facilities for expressing repetition.

  aa   =   "A";
  bb   =   3 * aa, "B";
  cc   =   3 * [aa], "C";
  dd   =   {aa}, "D";
  ee   =   aa, {aa}, "E";
  ff   =   3 * aa, 3 * [aa], "F";
  gg   =   3 * {aa}, "D";

Terminal-strings defined by these rules are as follows:

  aa:      A
  bb:      AAAB
  cc:      C AC AAC AAAC
  dd:      D AD AAD AAAD AAAAD etc.
  ee:      AE AAE AAAE AAAAE AAAAAE etc.
  ff:      AAAF AAAAF AAAAAF AAAAAAF

NOTE -- The definition for gg, although syntactically valid,
is not sensible. The sequences of symbols represented by gg
are identical with those given by dd but cannot be parsed
unambiguously.


5.8    Syntactic-term

When a syntactic-term is a single syntactic-factor it
represents any sequence of symbols represented by that
syntactic-factor.

When a syntactic-term is a syntactic-factor followed by
an except-symbol followed by a syntactic-exception it
represents any sequence of symbols that satisfies both of
the conditions:

  a) it is a sequence of symbols represented by the
  syntactic-factor,

  b) it is not a sequence of symbols represented by the
  syntactic-exception.

As examples the following syntax-rules illustrate the
facilities provided by the except-symbol.

  letter = "A" | "B" | "C" | "D" | "E" | "F"
    | "G" | "H" | "I" | "J" | "K" | "L" | "M"
    | "N" | "O" | "P" | "Q" | "R" | "S" | "T"
    | "U" | "V" | "W" | "X" | "Y" | "Z";
  vowel = "A" | "E" | "I" | "O" | "U";
  consonant = letter - vowel;
  ee = {"A"}-, "E";
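
The meanings given in 5.5, 5.7, 5.8 and 5.9 can be checked mechanically for rules whose languages are finite. The following Python sketch treats each construct as an operation on finite sets of strings. All names are hypothetical, and the repeated-sequence of 5.6 is deliberately left out because it generally represents infinitely many strings (as in the rules dd and ee of 5.7).

```python
from itertools import product
from typing import Set

Language = Set[str]  # a finite set of terminal strings (hypothetical alias)

def concat(*parts: Language) -> Language:
    """Single-definition (5.9): one string from each part, in order."""
    return {"".join(combo) for combo in product(*parts)}

def optional(lang: Language) -> Language:
    """Optional-sequence (5.5): the empty sequence or any string of lang."""
    return {""} | lang

def times(n: int, lang: Language) -> Language:
    """Syntactic-factor (5.7): exactly n subsequences taken from lang."""
    return concat(*([lang] * n))

def except_(lang: Language, exception: Language) -> Language:
    """Syntactic-term (5.8): strings of lang that are not in exception."""
    return lang - exception

# Rules from 5.7
aa = {"A"}
bb = concat(times(3, aa), {"B"})
cc = concat(times(3, optional(aa)), {"C"})
ff = concat(times(3, aa), times(3, optional(aa)), {"F"})

# Rules from 5.8
letter = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
vowel = set("AEIOU")
consonant = except_(letter, vowel)
```

Modelling the except-symbol as plain set difference works here only because both operands are finite sets that are known in full.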


Terminal-strings defined by these rules are as follows:

    letter:    A B C D E F G H I J etc.
    vowel:     A E I O U
    consonant: B C D F G H J K L M etc.
    ee:        AE AAE AAAE AAAAE AAAAAE etc.

NOTE -- {"A"}- represents a sequence of one or more A's
because it is a syntactic-term with an empty syntactic-exception.


5.9    Single-definition

A single-definition represents a sequence of one or more
subsequences where each subsequence is a sequence of
symbols represented by the corresponding syntactic-term
in that single-definition.


5.10    Definitions-list

A definitions-list represents any sequence of symbols that
is represented by any one of the single-definitions forming
that definitions-list.


5.11    Special-sequence

The sequence of symbols represented by a special-sequence
is outside the scope of this International Standard. Only the
format of a special-sequence is defined in this International
Standard. A special-sequence provides a notation for
extensions which a user may require.


5.12    Empty-sequence

An empty-sequence represents the empty sequence of
symbols.


6     Layout and Comments

6.1    General

The layout of the syntax on a page is almost completely
arbitrary. 6.2 to 6.4 define that a non-printing character
such as space or new-line has no formal effect on a syntax
if the character is outside a terminal-string or pair of
characters forming a single terminal-character. 6.5 to 6.7
define where arbitrary text may be inserted as a comment
in a syntax.

NOTES

1 It is much easier for a person to read and understand a
syntax if each syntax-rule starts on a new line and the various
metalanguage symbols are sensibly spaced.

2 A language defined by Extended BNF may have completely
different lexical rules from Extended BNF itself.

3 Comments enable explanatory text to be added to a syntax
and thus help a human to understand a syntax. For example,
syntax-rules can be numbered and each meta-identifier followed
by a comment identifying the position of the syntax-rule that
defines it. It is recommended that any comment concerning a
syntax-rule should appear before the terminator-symbol of the
rule.

4 Comments have no formal effect on the language defined
by a syntax.


6.2 Terminal-character

A terminal-character of Extended BNF is one of the
following:

  a) a letter;

  b) a decimal-digit;

  c) a concatenate-symbol;

  d) a defining-symbol;

  e) a definition-separator-symbol;

  f) an end-comment-symbol;

  g) an end-group-symbol;

  h) an end-option-symbol;

  i) an end-repeat-symbol;

  j) an except-symbol;

  k) a first-quote-symbol;

  l) a repetition-symbol;

  m) a second-quote-symbol;

  n) a special-sequence-symbol;

  o) a start-comment-symbol;

  p) a start-group-symbol;

  q) a start-option-symbol;

  r) a start-repeat-symbol;

  s) a terminator-symbol;

  t) an other-character.
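
The statement in 6.1 that a non-printing character has no formal effect outside a terminal-string can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the function name is hypothetical, comments and special-sequences are not handled, and a Carriage Return is treated as part of a new-line.

```python
# space, horizontal tab, line feed, vertical tab, form feed, carriage return
NON_PRINTING = " \t\n\v\f\r"

def strip_non_printing(syntax: str) -> str:
    """Drop non-printing characters that lie outside terminal-strings."""
    out = []
    quote = None  # quote character of the terminal-string we are inside, if any
    for ch in syntax:
        if quote is not None:
            out.append(ch)          # keep everything inside a terminal-string
            if ch == quote:
                quote = None        # the matching quote closes the string
        elif ch in ("'", '"'):
            quote = ch              # a quote symbol opens a terminal-string
            out.append(ch)
        elif ch not in NON_PRINTING:
            out.append(ch)
    return "".join(out)
```

With this sketch, `strip_non_printing('aa = "A B" ,\n bb ;')` yields `aa="A B",bb;`: the space inside the terminal-string survives and the layout around it does not. A meta-identifier written with spaces, such as `decimal digit`, collapses to `decimaldigit`.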


6.3    Gap-free-symbol

A gap-free-symbol is either:

    a) a terminal-character that is neither a first-quote-symbol
    nor a second-quote-symbol, or

    b) a terminal-string.


6.4    Gap-separator

A gap-separator is one of the non-printing characters:
space, horizontal-tabulation, new-line, vertical-tabulation,
or form-feed.

One or more gap-separators may be placed:

    a) before a syntax, and

    b) between any two gap-free-symbols of a syntax, and

    c) after a syntax

without affecting the language defined by the syntax.


6.5    Commentless-symbol

A commentless-symbol is one of the following:

    a) a terminal-character that is neither a letter nor a
    decimal-digit nor a first-quote-symbol nor a
    second-quote-symbol nor a start-comment-symbol nor an
    end-comment-symbol nor a special-sequence-symbol nor an
    other-character;

    b) a meta-identifier;

    c) an integer;

    d) a terminal-string;

    e) a special-sequence.


6.6    Comment-symbol

A comment-symbol is one of the following:

    a) a bracketed-textual-comment;

    b) a commentless-symbol;

    c) an other-character.


6.7    Bracketed-textual-comment

A bracketed-textual-comment is a start-comment-symbol
followed by a (possibly empty) sequence of
comment-symbols followed by an end-comment-symbol.

One or more bracketed-textual-comments may be placed:

    a) before a syntax, and

    b) between any two commentless-symbols of a syntax,
    and

    c) after a syntax

without affecting the language defined by the syntax.

NOTE -- 6.5 to 6.7 imply that bracketed-textual-comments
cannot appear in any of the following:

    a)    a meta-identifier;

    b)    an integer;

    c)    a special-sequence;

    d)    a terminal-string.


7      The representation of each terminal-character
       in Extended BNF

7.1    General

The representation of each terminal-character and
gap-separator in Extended BNF using the characters in the 7-bit
character set (ISO/IEC 646:1991 International Reference
Version) is defined in 7.2 to 7.8.


7.2    Letters and digits

Each letter and decimal-digit is represented by the
corresponding character.


7.3    Other terminal characters

Table 1 defines the character representation for each
terminal-character that is neither a letter, nor a
decimal-digit nor an other-character.


7.4    Alternative representations

Table 2 defines alternative character representations for
some terminal-characters.
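
The relationship between the two tables can be expressed as a lookup from each alternative representation in table 2 to the normal representation in table 1. The Python sketch below uses hypothetical names and assumes that the text has already been split into metalanguage symbols; applying it to the contents of a terminal-string or special-sequence would be wrong.

```python
# Table 2 entry -> normal representation from table 1
ALTERNATIVE_TO_NORMAL = {
    "/": "|",    # definition-separator-symbol
    "!": "|",    # definition-separator-symbol
    "/)": "]",   # end-option-symbol
    ":)": "}",   # end-repeat-symbol
    "(/": "[",   # start-option-symbol
    "(:": "{",   # start-repeat-symbol
    ".": ";",    # terminator-symbol
}

def normalise(symbol: str) -> str:
    """Map one already-separated metalanguage symbol to its table 1 form."""
    return ALTERNATIVE_TO_NORMAL.get(symbol, symbol)
```

For example, the symbol sequence `aa = (/ bb /) .` normalises to `aa = [ bb ] ;`.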


Table 1 -- Representation of terminal-characters

 Metalanguage symbol           Normal representation
 concatenate-symbol            ,    comma
 defining-symbol               =    equals sign
 definition-separator-symbol   |    vertical line
 end-comment-symbol            *)   asterisk, right parenthesis
 end-group-symbol              )    right parenthesis
 end-option-symbol             ]    right square bracket
 end-repeat-symbol             }    right curly bracket
 except-symbol                 -    hyphen-minus
 first-quote-symbol            '    apostrophe
 repetition-symbol             *    asterisk
 second-quote-symbol           "    quotation mark
 special-sequence-symbol       ?    question mark
 start-comment-symbol          (*   left parenthesis, asterisk
 start-group-symbol            (    left parenthesis
 start-option-symbol           [    left square bracket
 start-repeat-symbol           {    left curly bracket
 terminator-symbol             ;    semicolon


Table 2 -- Alternative representation of terminal-characters

 Metalanguage symbol           Alternative representation
 definition-separator-symbol   /    solidus
 definition-separator-symbol   !    exclamation mark
 end-option-symbol             /)   solidus, right parenthesis
 end-repeat-symbol             :)   colon, right parenthesis
 start-option-symbol           (/   left parenthesis, solidus
 start-repeat-symbol           (:   left parenthesis, colon
 terminator-symbol             .    full stop


NOTES

1 The main reason for specifying alternative representations is
that not all computers and typewriters have the characters listed
in table 1.

2 To avoid confusion, the representation of a terminal-character
in any one document should be consistent.

3 7.2 to 7.4 imply that the characters required for Extended
BNF are:

     letters digits = , - * ( ) ?
     | or / or !
     / or both of [ ]
     : or both of { }
     ' or " (Both characters are needed if either is
     a terminal symbol of the language being defined)


7.5 Other-character

An other-character is any other character in the ISO/IEC
646:1991 character set that is neither:

  a) a control character, nor

  b) required to represent any other terminal-character.

NOTE -- When the terminal-characters are represented as
specified in table 1, the other-characters are:

         space
     .   full stop
     :   colon
     !   exclamation mark
     +   plus sign
     _   lowline
     %   percent sign
     @   commercial at
     &   ampersand
     #   number sign
     $   dollar sign
     <   less-than sign
     >   greater-than sign
     /   solidus
     \   reverse solidus
     ^   circumflex accent
     `   grave accent
     ~   tilde


7.6 Gap-separator

A gap-separator is represented as follows:

  a) a space is represented by a Space character,

  b) a horizontal-tabulation is represented by a Horizontal
  Tabulation character,


    c) a new-line is represented by a (possibly empty)
    sequence of Carriage Return characters, a Line Feed
    character, and a (possibly empty) sequence of Carriage
    Return characters,

    d) a vertical-tabulation is represented by a Vertical
    Tabulation character,

    e) a form-feed is represented by a Form Feed character.


Table 3 -- Character pairs that represent a single
terminal-character

                             (*
                             *)
                             (:
                             :)
                             (/
                             /)


Table 4 -- Invalid sequences of characters

                             (*)
                             (:)
                             (/)


7.7    Terminal-characters represented by a pair of
       characters

Each pair of characters in table 3 always represents a
single terminal-character in a syntax-rule except inside a
terminal-string or special-sequence.

NOTE -- This restriction is necessary because these character
sequences are ambiguous, for example /) could be a
definition-separator-symbol followed by an end-group-symbol, or an
end-option-symbol.


7.8    Invalid character sequences

Each line of table 4 specifies a character sequence that
does not appear in a syntax-rule outside a terminal-string
or special-sequence.

NOTE -- This restriction is necessary because these character
sequences are ambiguous, for example (*) could be a
start-comment-symbol followed by an end-group-symbol, or a
start-group-symbol followed by an end-comment-symbol.

Inserting a gap-separator allows either meaning, for example
(* ) is a start-comment-symbol followed by an
end-group-symbol, and ( *) is a start-group-symbol followed by an
end-comment-symbol.


8     Examples

8.1    The syntax of Extended BNF

(*
     The syntax of Extended BNF can be defined using
     itself. There are four parts in this example,
     the first part names the characters, the second
     part defines the removal of unnecessary non-
     printing characters, the third part defines the
     removal of textual comments, and the final part
     defines the structure of Extended BNF itself.
     Each syntax rule in this example starts with a
     comment that identifies the corresponding clause
     in the standard.

     The meaning of special-sequences is not defined
     in the standard. In this example (see the
     reference to 7.6) they represent control
     functions defined by ISO/IEC 6429:1992.
     Another special-sequence defines a
     syntactic-exception (see the reference to 4.7).
*)

(*
     The first part of the lexical syntax defines the
     characters in the 7-bit character set (ISO/IEC
     646:1991) that represent each terminal-character
     and gap-separator in Extended BNF.
*)
(* see 7.2 *) letter
   = 'a' | 'b' | 'c' | 'd' | 'e' | 'f' | 'g' | 'h'
   | 'i' | 'j' | 'k' | 'l' | 'm' | 'n' | 'o' | 'p'
   | 'q' | 'r' | 's' | 't' | 'u' | 'v' | 'w' | 'x'
   | 'y' | 'z'
   | 'A' | 'B' | 'C' | 'D' | 'E' | 'F' | 'G' | 'H'
   | 'I' | 'J' | 'K' | 'L' | 'M' | 'N' | 'O' | 'P'
   | 'Q' | 'R' | 'S' | 'T' | 'U' | 'V' | 'W' | 'X'
   | 'Y' | 'Z';
(* see 7.2 *) decimal digit
   = '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7'
   | '8' | '9';
(*
   The representation of the following
   terminal-characters is defined in clauses 7.3,
   7.4 and tables 1, 2.
*)
concatenate symbol = ',';
defining symbol = '=';
definition separator symbol = '|' | '/' | '!';
end comment symbol = '*)';
end group symbol = ')';
end option symbol = ']' | '/)';
end repeat symbol = '}' | ':)';
except symbol = '-';
first quote symbol = "'";
repetition symbol = '*';
second quote symbol = '"';
special sequence symbol = '?';
start comment symbol = '(*';
start group symbol = '(';
start option symbol = '[' | '(/';
start repeat symbol = '{' | '(:';
terminator symbol = ';' | '.';
(* see 7.5 *) other character
   = ' ' | ':' | '+' | '_' | '%' | '@'
   | '&' | '#' | '$' | '<' | '>' | '\'
   | '^' | '`' | '~';
(* see 7.6 *) space character = ' ';


horizontal tabulation character
  = ? ISO 6429 character Horizontal Tabulation ? ;
new line
  = { ? ISO 6429 character Carriage Return ? },
  ? ISO 6429 character Line Feed ?,
  { ? ISO 6429 character Carriage Return ? };
vertical tabulation character
  = ? ISO 6429 character Vertical Tabulation ? ;
form feed
  = ? ISO 6429 character Form Feed ? ;

(*
     The second part of the syntax defines the
     removal of unnecessary non-printing characters
     from a syntax.
*)
(* see 6.2 *) terminal character
   = letter
   | decimal digit
   | concatenate symbol
   | defining symbol
   | definition separator symbol
   | end comment symbol
   | end group symbol
   | end option symbol
   | end repeat symbol
   | except symbol
   | first quote symbol
   | repetition symbol
   | second quote symbol
   | special sequence symbol
   | start comment symbol
   | start group symbol
   | start option symbol
   | start repeat symbol
   | terminator symbol
   | other character;
(* see 6.3 *) gap free symbol
   = terminal character
     - (first quote symbol | second quote symbol)
   | terminal string;
(* see 4.16 *) terminal string
   = first quote symbol, first terminal character,
     {first terminal character},
     first quote symbol
   | second quote symbol, second terminal character,
     {second terminal character},
     second quote symbol;

       | decimal digit
       | first quote symbol
       | second quote symbol
       | start comment symbol
       | end comment symbol
       | special sequence symbol
       | other character)
   | meta identifier
   | integer
   | terminal string
   | special sequence;
(* see 4.9 *) integer
   = decimal digit, {decimal digit};
(* see 4.14 *) meta identifier
   = letter, {meta identifier character};
(* see 4.15 *) meta identifier character
   = letter
   | decimal digit;
(* see 4.19 *) special sequence
   = special sequence symbol,
     {special sequence character},
     special sequence symbol;
(* see 4.20 *) special sequence character
   = terminal character - special sequence symbol;
(* see 6.6 *) comment symbol
   = bracketed textual comment
   | other character
   | commentless symbol;
(* see 6.7 *) bracketed textual comment
   = start comment symbol, {comment symbol},
     end comment symbol;
(* see 6.7 *) syntax
   = {bracketed textual comment},
     commentless symbol,
     {bracketed textual comment},
     {commentless symbol,
       {bracketed textual comment}};

(*
     The final part of the syntax defines the
     abstract syntax of Extended BNF, i.e. the
     structure in terms of the commentless symbols.
*)

(* see 4.2 *) syntax
   = syntax rule, {syntax rule};
(* see 4.3 *) syntax rule