@q Copyright 2012-2024, Alexander Shibakov@>
@q This file is part of SPLinT@>
@q SPLinT is free software: you can redistribute it and/or modify@>
@q it under the terms of the GNU General Public License as published by@>
@q the Free Software Foundation, either version 3 of the License, or@>
@q (at your option) any later version.@>
@q SPLinT is distributed in the hope that it will be useful,@>
@q but WITHOUT ANY WARRANTY; without even the implied warranty of@>
@q MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the@>
@q GNU General Public License for more details.@>
@q You should have received a copy of the GNU General Public License@>
@q along with SPLinT. If not, see .@>
% The scheme for extracting token equivalences below does not use the
% bootstrap parser, although doing so would be easier.
% To use a different parser (the `prologue' parser, \.{dyytab.tex} in
% this case), some extra steps have to be inserted in
% \.{yybootstrap.sty}. First, the token equivalence table for the `main'
% parser (rather, for the `main' scanner) has to be loaded
% (\.{yybootstrap.sty} usually relies on the tokens that are
% `hard-coded' in the bootstrap parser). Second, it is necessary to
% define \.{\\let\\yylexreturn\\yylexreturnregular} to use the scanner. One
% advantage of using a different parser is the ability to intermix token
% definitions with grammar productions (the bootstrap mode macros in
% \.{\\yyunion} will simply ignore the extra definitions).
% Note also that the `grammar rule' parser cannot be used in this
% case, since the token definitions as they are used in this file fit
% the `prologue' parser syntax only (there are no semicolons at the
% end of the definitions). A more elaborate scheme (similar to how the
% typesetting of rules is set up) using several parsers can be used
% instead.
\input limbo.sty
\def\optimization{5}
\newread\testeof
\immediate\openin\testeof=\jobname.tok
\ifeof\testeof % make the local token equivalence table
\let\nx\noexpand
\edef\tokendeffile{\jobname.tok} % where to put the token equivalence table
\def\bstrapparser{dyytab.tex}
\def\bstraptokens{bo.tok}% use token equivalence table to set the values of non-string tokens
% this has to be added if a non-bootstrap parser is used to
% extract token information (see the comments above)
\def\bootstraplexersetup{%
\let\yylexreturn\yylexreturnregular
\bootstrapmodetrue
}
\toks0{%
\let\fin\finmod % this is necessary since the original modifies \output
% in a way that conflicts with the scheme in dcols.sty
\input trt1.sty % \TeX\ `runtime': temporary register definitions
\input yycommon.sty % general routines for stack and array access
\input yymisc.sty % helper macros (stack manipulation, table processing, value stack pointers)
% parser initialization, optimization
\input yyinput.sty % input functions
\input yyparse.sty % parser machinery
\input flex.sty % lexer functions
\input yyfaststack.sty
\input yystype.sty % scanner auxiliary types and functions
\input yyunion.sty % parser data structures
% the main parser
\let\parsernamespace\empty
% create token equivalence table (making, say, \tokenID the same as \csname token"identifier"\endcsname)
\input yybootstrap.sty
\input yytexlex.sty
\expandafter\def %/* adjust the \.{\\yyinput} to recognize \.{\\yyendgame} */
\expandafter\multicharswitch\expandafter
{\multicharswitch\yyendgame{\yyinput\yyeof\yyeof\endparseinput\removefinalvb}}%
}
\else
\toks0{%
\input yy.sty
\modenormal
\let\currentparsernamespace\parsernamespace
\def\parsernamespace{[xxdisplay]}% for \pretty... commands to work
\def\hostparsernamespace{[xxdisplay]}% for the \nameproc macro
\input xtoks.sty
\let\parsernamespace\currentparsernamespace % does not really matter
% the \hostparsernamespace stays `[xxdisplay]' which should cause the
% \nameproc macro to correct the typesetting of terminals accordingly
}
\fi
\immediate\closein\testeof
\the\toks0
\input dcols.sty
\initauxstream
@**Parser file.
This is an enhanced parser for expressions. It takes
advantage of the `symbolic term name' mechanism and extends the basic
expression syntax.
The top-level structure of the input file is an exact copy of the one
for the expression parser.
@s TeX_ TeX
@(xxpp.yy@>=
@G Switch to generic mode.
%{@> @ @=%}
@> @ @=
%union {@> @ @=}
%{@> @ @=%}
@> @ @=
%%
@> @ @=
%%
@g
@ The following is reproduced from the simple expression example.
The \prodstyle{\%token-table} option is not merely a debugging aid,
as it is in the case of the `real' \bison\ parsers, and cannot be
omitted. The name table it sets up is used as
a set of keys for various associative arrays. Token declarations are
parsed by a bootstrap parser during the \TeX\ processing stage to
establish equivalences between the names kept in |yytname| and the
macro names used internally by the parsers built by \bison. The reason
this is necessary is simple: either version of the token
name can be used in the grammar, while the `driver' program
(\.{mkeparser.c}) only has access to the names in |yytname|. In
general, this is important whenever the grammar uses a different set of
token names from the lexer or when diagnostic messages are output. An
important case is the symbolic name switch: before the rules can be
listed to create the switch, the numerical values of the tokens must be
known. If the parser is only aware of the names listed in |yytname| and the
grammar being parsed uses the `internal' names, the listing macros
will fail. The array |yytname| is also used in a few functions inside the
`driver', so omitting this option would make building the
parser impossible.
@=
@G
%token-table
%debug
%start value
@g
@ To continue the token name discussion, this parser uses internal
names only, but the |yytname| array contains a string equivalent of
\prodstyle{IDENTIFIER}. Thus, bootstrapping is necessary\footnote{This
was done as a demonstration; changing the definition of
\prodstyle{IDENTIFIER} would easily remove this requirement.}. The beginning
of this file contains a simple scheme for producing a token
equivalence table.
The typesetting of the tokens can be adjusted using \.{\\prettywordpair}
macros (see the included \.{xtoks.sty} file for examples and the way
\prodstyle{IDENTIFIER} is typeset).
@=
@G
%token IDENTIFIER "identifier"
%token INTEGER
@g
@ Here is the whole grammar: additive and multiplicative expressions with two
levels of precedence. We have added `divide' and `subtract' operations.
The use of \prodstyle{IDENTIFIER} instead of \.{"identifier"} below
necessitates `harvesting' of token equivalences in \.{xxpression.tok}
at the beginning of this file.
\showlastactiontrue
\input yynested.sty
@=
@G
value:
expression[exp] {@> TeX_( "/yy0{/the/yy]exp[}" ); @=}
;
expression:
term {@> TeX_( "/yy0{/the/yy]term[}" ); @=}
| expression[exp] add_op term {@> @ @=}
;
term:
atom {@> TeX_( "/yy0{/the/yy]atom[}" ); @=}
| term mult_op atom {@> @ @=}
;
@t}\vb{\inline\flatten}{@>
mult_op:
'*' {@> TeX_( "/yy0{/multiply}" ); @=}
| '/' {@> TeX_( "/yy0{/divide}" ); @=}
;
add_op:
'+' {@> TeX_( "/yy0{}" ); @=}
| '-' {@> TeX_( "/yy0{-}" ); @=}
;
@t}\vb{\resetf}{@>
atom:
@t}\vb{\inputboundary{\boundarylower}}{@>
IDENTIFIER[id] {@> @ @=}
| INTEGER[int] {@> @ @=}
| '(' expression[exp] ')' {@> TeX_( "/yy0{/the/yy]exp[}" ); @=}
;
@t}\vb{\inputboundary{\boundaryupper}}{@>
@g
@ @=
@[TeX_( "/tempca/the/yy]exp[/relax" );@]@;
@[TeX_( "/tempcb/the/yy]term[/relax" );@]@;
@[TeX_( "/advance/tempca by /the/yy]add_op[/tempcb" );@]@;
@[TeX_( "/yy0{/the/tempca}" );@]@;
@ @=
@[TeX_( "/tempca/the/yy]term[/relax" );@]@;
@[TeX_( "/tempcb/the/yy]atom[/relax" );@]@;
@[TeX_( "/the/yy]mult_op[/tempca by /tempcb" );@]@;
@[TeX_( "/yy0{/the/tempca}" );@]@;
@ @=
@[TeX_( "/getsecond{/yy]id[}/to/toksa" );@]@;
@[TeX_( "/toksb/expandafter/expandafter/expandafter{/expandafter" );@]@;
@[TeX_( " /number/csname/the/toksa/endcsname}" );@]@;
@[TeX_( "/yy0{/the/toksb}" );@]@;
@ @=
@[TeX_( "/getfirst{/yy]int[}/to/toksa" );@]@;
@[TeX_( "/yy0{/the/toksa}" );@]@;
@ \Cee\ preamble. In this case, there are no `real' actions that our
grammar performs, only \TeX\ output, so this section is empty.
@=
@ \Cee\ postamble. It is tricky to insert function definitions that use \bison's internal types,
as they have to be inserted in a place that is aware of the internal definitions but before said
definitions are used.
@=
@ Union of types. Empty as well.
@=
@**The lexer file. The scanner for the grammar above is the same as
the one for the simple expression parser. Identifiers are interpreted as
variable names, i.e.\ names of \TeX\ macros that expand to the appropriate values.
%\checktabletrue
@(xxpl.ll@>=
@G
@> @@=
%{@> @ @=%}
@> @ @=
%%
@> @ @=
%%
@g
@ @=
@G(fs1)
letter [_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ]
id {letter}({letter}|[-0-9])*
int [0-9]+
@g
@ @=
#include
#include
void define_all_states( void ){}
@ @=
@G(fs1)
%option bison-bridge
%option noyywrap nounput noinput reentrant
%option noyy_top_state
%option debug
%option stack
%option outfile="xxpl.c"
@g
@ @=
@@;
@@;
@ White space skipping.
\traceparserstatestrue
\tracestackstrue
\tracerulestrue
\traceactionstrue
\tracelookaheadtrue
\traceparseresultstrue
\tracebadcharstrue
\yyflexdebugtrue
%
\traceparserstatesfalse
\tracestacksfalse
\tracerulesfalse
\traceactionsfalse
\tracelookaheadfalse
\traceparseresultsfalse
\tracebadcharsfalse
\yyflexdebugfalse
@=
@G(fs2)
[ \f\n\t\v] {@> @[TeX_( "/yylexnext" );@]@=}
@g
@ @=
@G(fs2)
{id} {@> @[TeX_( "/yylexreturnval{IDENTIFIER}" );@]@=}
{int} {@> @[TeX_( "/yylexreturnval{INTEGER}" );@]@=}
[-+*/()] {@> @[TeX_( "/yylexreturnchar" );@]@=}
. {@> @[@@]@=}
@g
@ @=
@[TeX_( "/iftracebadchars" );@]@;
@[TeX_( " /yycomplain{invalid character(s): /the/yytext}" );@]@;
@[TeX_( "/fi" );@]@;
@[TeX_( "/yyerrterminate" );@]@;
@**Generating symbols. This is the routine that creates symbolic name
assignments for the grammar. The internal mechanics of creating such
assignments are implemented in \.{xymmap.sty}, which should be consulted if
any adjustments are needed.
@(xymbols.txx@>=
@G
\def\optimization{5} % this can be omitted
\input cwebmac.tex
\input limbo.sty
\input yy.sty
\modenormal
\input xymmap.sty
\end
@g
@**Test file. The test file includes a handy list of debugging options
that can be activated to see the inner workings of the parser and
scanner routines.
@(test.txx@>=
@G
\chardef\other=12 % needed for some macros to work
\input xxpression.sty
\iftrue
\tracedfatrue
\traceparserstatestrue
\tracestackstrue
\tracerulestrue
\traceactionstrue
\tracelookaheadtrue
\traceparseresultstrue
\tracebadcharstrue
\yyflexdebugtrue
\yyinputdebugtrue
\traceactioncodetrue
\fi
\newread\ssw
\immediate\openin\ssw = xymbols.sns
\ifeof\ssw
\else
\immediate\closein\ssw
\input xymbols.sns
\let\yysymswitch\symswitch
\let\yysymcleanup\symswitchoff
\fi
\def\varone{10}
\def\expression{1 + 3 * ( 5 + 7 ) + varone - 10}
\basicparserinit\expandafter\yyparse \expression \yyeof\yyeof\endparseinput\endparse
{
\newlinechar`^^J
\immediate\write16{^^Jexpression: \expression^^Jthe value: \the\yyval^^J^^J}
}
\bye
@g
@q Include the list of index section markers; this is a hack to get around @>
@q the lack of control over the generation of \CWEB's index; the correct order @>
@q of index entries depends on the placement of this inclusion @>
@i alphas.hx
@**Index.\global\let\secrangedisplay\empty% do not show the current section range anymore
\global\topskip=9pt
\def\Tex{\TeX\ output}
\def\TeXx{\TeX\ output}