@q Copyright 2012-2022 Alexander Shibakov@>
@q Copyright 2002-2014 Free Software Foundation, Inc.@>
@q This file is part of SPLinT@>
@q SPLinT is free software: you can redistribute it and/or modify@>
@q it under the terms of the GNU General Public License as published by@>
@q the Free Software Foundation, either version 3 of the License, or@>
@q (at your option) any later version.@>
@q SPLinT is distributed in the hope that it will be useful,@>
@q but WITHOUT ANY WARRANTY; without even the implied warranty of@>
@q MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the@>
@q GNU General Public License for more details.@>
@q You should have received a copy of the GNU General Public License@>
@q along with SPLinT. If not, see <http://www.gnu.org/licenses/>.@>
@** The parser.
\ifbootstrapmode
\def\tokendeffile{ldp.tok}%
\input ldman.sty
\def\bstrapparser{dyytab.tex}% only the full or preamble parser would know how to
% parse grammar sections that mix \prodstyle{\%token}
% declarations with other \bison\ syntax
\def\bstraptokens{bo.tok}% token equivalences for the full lexer
\modebootstrap
\def\MRI{}
\def\ld{}
\fi
\immediate\openout\exampletable=\jobname.exl\relax%
The outline of the grammar file below does not reveal anything unusual
in the general layout of the \ld\ grammar. The first section lists all the
token definitions, the \prodstyle{\%union} types, and some \Cee\
code. The original comments that come with the grammar file of the
linker have mostly been left intact. They are typeset in {\it
italics\/} to make them easy to recognize.
@s TeX_ TeX
@s TeXa TeX
@s TeXb TeX
@s TeXf TeX
@s TeXfo TeX
@s TeXao TeX
@(ldp.yy@>=
@G Switch to generic mode.
%{@> @<\ld\ parser \Cee\ preamble@> @=%}
@> @<\ld\ parser \bison\ options@> @=
%union {@> @<Union of parser types@> @=}
%{@> @<\ld\ parser \Cee\ postamble@> @=%}
@> @<Token and precedence declarations@> @=
%%
@> @<\ld\ parser productions@> @=
%%
@g
@ Among the options listed in this section, \prodstyle{\%token-table}
is the most critical for the proper operation of the parser and must be enabled
to supply the token information to the lexer (the traditional way
of passing this information along is to use a \Cee\ header file with
the appropriate definitions). The start symbol does not have to be
given explicitly and can be indicated by listing the appropriate rules
at the beginning.
Most other sections of the grammar file, with the exception of the
rules, are either empty or hold placeholder values. The functionality
provided by the code in these sections in the case of a \Cee\ parser
is supplied by the \TeX\ macros in \.{ldman.sty}.
@<\ld\ parser \bison\ options@>=
@G
%token-table
%debug
%start script_file
@g
@ @<\ld\ parser \Cee\ preamble@>=
@ @<Union of parser types@>=
@ @<\ld\ parser \Cee\ postamble@>=
@ @<\ld\ parser productions@>=
@<\GNU\ \ld\ script rules@>@;
@@;
@ The tokens are declared first. This section is also used by the
original parser, as well as by the bootstrapping phase of the
typesetting parser, to supply numerical token values to the
lexer. Unlike the native (\Cee) parser for \ld, the typesetting parser
has no need for the type of each token (instead, type consistency is
maintained by the weak dynamic type system coded in \.{yyunion.sty} and
\.{ldunion.sty}). Thus all the tokens used by the \ld\ parser are put
in a single list.
@<Token and precedence declarations@>=
@G
%token INT
%token NAME LNAME
%token END
%token ALIGN_K BLOCK BIND QUAD SQUAD LONG SHORT BYTE
%token SECTIONS PHDRS INSERT_K AFTER BEFORE
%token DATA_SEGMENT_ALIGN DATA_SEGMENT_RELRO_END DATA_SEGMENT_END
%token SORT_BY_NAME SORT_BY_ALIGNMENT SORT_NONE
%token SORT_BY_INIT_PRIORITY
%token '{' '}'
%token SIZEOF_HEADERS OUTPUT_FORMAT FORCE_COMMON_ALLOCATION OUTPUT_ARCH
%token INHIBIT_COMMON_ALLOCATION
%token SEGMENT_START
%token INCLUDE
%token MEMORY
%token REGION_ALIAS
%token LD_FEATURE
%token NOLOAD DSECT COPY INFO OVERLAY
%token DEFINED TARGET_K SEARCH_DIR MAP ENTRY
%token NEXT
%token SIZEOF ALIGNOF ADDR LOADADDR MAX_K MIN_K
%token STARTUP HLL SYSLIB FLOAT NOFLOAT NOCROSSREFS
%token ORIGIN FILL
%token LENGTH CREATE_OBJECT_SYMBOLS INPUT GROUP OUTPUT CONSTRUCTORS
%token ALIGNMOD AT SUBALIGN HIDDEN PROVIDE PROVIDE_HIDDEN AS_NEEDED
%token CHIP LIST SECT ABSOLUTE LOAD NEWLINE ENDWORD ORDER NAMEWORD ASSERT_K
%token LOG2CEIL FORMAT PUBLIC DEFSYMEND BASE ALIAS TRUNCATE REL
%token INPUT_SCRIPT INPUT_MRI_SCRIPT INPUT_DEFSYM CASE EXTERN START
%token VERS_TAG VERS_IDENTIFIER
%token GLOBAL LOCAL VERSIONK INPUT_VERSION_SCRIPT
%token KEEP ONLY_IF_RO ONLY_IF_RW SPECIAL INPUT_SECTION_FLAGS ALIGN_WITH_INPUT
%token EXCLUDE_FILE
%token CONSTANT
%token INPUT_DYNAMIC_LIST
%right PLUSEQ MINUSEQ MULTEQ DIVEQ '=' LSHIFTEQ RSHIFTEQ ANDEQ OREQ '?' ':' UNARY
%left OROR ANDAND '|' '^' '&' EQ NE '<' '>' LE GE LSHIFT RSHIFT '+' '-' '*' '/' '%' '('
@g
@*1 Grammar rules, an overview.
The first natural step in transforming an existing parser into a
`parser stack' for pretty printing is to understand the `anatomy' of
the grammar. Not every grammar is suitable for such a transformation
and in almost every case, some modifications are needed. The
parser and lexer implementation for \ld\ is not terrible, although it does
have some idiosyncrasies that could have been eliminated by a careful
grammar redesign. Instead of invasive rewriting of significant
portions of the grammar, the approach taken here merely omits some
rules and partitions the grammar into several subsets, each of which
is meant to handle a well-defined logical section of an \ld\ script
file.
One example of a trick used by the \ld\ parser that is not appropriate for a
pretty printing grammar is the way the original parser handles the choice of the
format of an input file. After a command line option that selects the
input format has been read (or the format has been determined using
some other method), the first token output by the lexer branches the
parser to the appropriate portion of the full grammar.
Since the token never appears as part of the input file, there is no
need to include this part of the main grammar for the purposes of
typesetting.
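To make this concrete, here are a few (purely illustrative, with made
up file names) ways of invoking the linker that would steer the
original parser into different branches of the \prodstyle{file}
production below:
\medskip
\noindent\hskip2em\.{ld -T script.ld main.o}\par
\noindent\hskip2em\.{ld --defsym stack\_top=0x80000 main.o}\par
\noindent\hskip2em\.{ld --shared --version-script=vers.map main.o}\par
\medskip
\noindent Roughly speaking, the first command makes the lexer open the
token stream with \prodstyle{INPUT\_SCRIPT}, the second with
\prodstyle{INPUT\_DEFSYM}, and the third with
\prodstyle{INPUT\_VERSION\_SCRIPT}.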
\traceparserstatestrue
\tracestackstrue
\tracerulestrue
\traceactionstrue
\tracelookaheadtrue
\traceparseresultstrue
\tracebadcharstrue
\yyflexdebugtrue
%\checktabletrue
%
\traceparserstatesfalse
\tracestacksfalse
\tracerulesfalse
\traceactionsfalse
\tracelookaheadfalse
\traceparseresultsfalse
\tracebadcharsfalse
\yyflexdebugfalse
\checktablefalse
%
\saveparseoutputfalse
@<Input format selection rules@>=
@G
file:
INPUT_SCRIPT script_file
| INPUT_MRI_SCRIPT mri_script_file
| INPUT_VERSION_SCRIPT version_script_file
| INPUT_DYNAMIC_LIST dynamic_list_file
| INPUT_DEFSYM defsym_expr
;
@g
@ @<Rules for file names@>=
@G
@t}\vb{\inline\flatten}{@>
filename:
NAME {@>@[TeX_( "/yy0{/noexpand/ldfilename{/the/yy(1)}}" );@]@=}
;
@g
@ The simplest parser subset is intended to parse symbol definitions
given on the command line that invokes the linker. Creating a parser
for it involves almost no extra effort so we leave it in.
Note that the simplicity is somewhat deceptive, as the syntax of
\prodstyle{exp} is rather complex. That part of the grammar is needed
elsewhere, however, so parsing symbol definitions costs almost nothing
on top of the already required effort. The only practical use for this
part of the \ld\ grammar is presenting examples in the text.
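For instance, a (made up) symbol definition passed to the linker on
its command line, such as
\medskip
\noindent\hskip2em\.{--defsym stack\_top=0x80000}\par
\medskip
\noindent is exactly what the \prodstyle{defsym\_expr} production
below describes: a \prodstyle{NAME}, an equals sign, and an
\prodstyle{exp} on the right hand side.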
The\namedspot{pingpong} \TeX\ macro \.{\\ldlex@@defsym} switches the lexer state to
\.{DEFSYMEXP} (see \locallink{stateswitchers}all the state switching
macros\endlink\ in the chapter about the lexer implementation
below). Switching lexer states from the parser presents some
difficulties which can be overcome by careful design. For example, the
state switching macros can be invoked before the lexer is called and
initialized (when the parser performs a {\it default action\/}).
@<Rules for command line symbol definitions@>=
@G
defsym_expr:
{@>@[TeX_( "/ldlex@@defsym" );@]@=}
NAME '=' exp {@>@[TeX_( "/ldlex@@popstate" );@]@=}
;
@g
@ {\it Syntax within an \MRI\ script file}\footnote{As explained at the
beginning of this chapter, the text in {\it italics\/} was taken from
the original comments by the \ld\ parser and lexer programmers.}. The parser for typesetting
is only intended to process \GNU\ \ld\ scripts and does not concern
itself with any additional compatibility modes. For this reason, all
support for \MRI\ style scripts has been omitted. One use for the
section below is to provide a small demonstration of the formatting
tools that change the output of the \bison\ parser.
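For orientation, a short (and purely hypothetical) fragment in the
\MRI\ style that the rules below describe might look as follows:
\medskip
\noindent\hskip2em\.{CHIP 68000}\par
\noindent\hskip2em\.{LOAD main.o}\par
\noindent\hskip2em\.{END}\par
\medskip
\noindent Every command occupies a line of its own, which is why
\prodstyle{NEWLINE} appears as an explicit token in the productions
below.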
%\checktabletrue
%\tracingliststrue
@<\MRI\ style script rules@>=
@G
@t}\vb{\inline\flatten}{@>
mri_script_file:
{@>@[TeX_( "/ldlex@@mri@@script" );@]@=}
mri_script_lines {@>@[TeX_( "/ldlex@@popstate" );@]@=}
;
mri_script_lines:
mri_script_lines mri_script_command NEWLINE
|
;
@t}\vb{\resetf}{@>
mri_script_command:
CHIP exp
| CHIP exp ',' exp
| NAME {}
| LIST {}
| ORDER ordernamelist {}
| ENDWORD {}
@t}\vb{\flatten}{@>
| PUBLIC NAME '=' exp {}
| PUBLIC NAME ',' exp {}
| PUBLIC NAME exp
@t}\vb{\resetf}{@>
{}
| FORMAT NAME {}
@t}\vb{\flatten}{@>
| SECT NAME ',' exp {}
| SECT NAME exp {}
| SECT NAME '=' exp
@t}\vb{\resetf}{@>
{}
@t}\vb{\flatten}{@>
| ALIGN_K NAME '=' exp {}
| ALIGN_K NAME ',' exp
@t}\vb{\resetf}{@>
{}
@t}\vb{\flatten}{@>
| ALIGNMOD NAME '=' exp {}
| ALIGNMOD NAME ',' exp {}
@t}\vb{\resetf}{@>
{}
| ABSOLUTE mri_abs_name_list
| LOAD mri_load_name_list
| NAMEWORD NAME {}
@t}\vb{\flatten}{@>
| ALIAS NAME ',' NAME {}
| ALIAS NAME ',' INT {}
@t}\vb{\resetf}{@>
{}
| BASE exp {}
| TRUNCATE INT {}
| CASE casesymlist
| EXTERN extern_name_list
@t}\vb{\flatten}{@>
| INCLUDE filename {@>@@=}
mri_script_lines END
@t}\vb{\resetf}{@>
{@>@@=}
| START NAME {}
|
;
@t}\vb{\inline\flatten}{@>
ordernamelist:
ordernamelist ',' NAME {}
| ordernamelist NAME {}
|
;
mri_load_name_list:
NAME {}
| mri_load_name_list ',' NAME {}
;
mri_abs_name_list:
NAME {}
| mri_abs_name_list ',' NAME {}
;
casesymlist:
{}
| NAME
| casesymlist ',' NAME
;
@g
@ {\it Parsed as expressions so that commas separate entries.} The
core of the parser consists of productions describing \GNU\ \ld\ linker
scripts. The first rule is common to both \MRI\ and \GNU\ formats.
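In a \GNU\ style script such a list typically appears inside the
\prodstyle{EXTERN} command, as in the (made up) example
\medskip
\noindent\hskip2em\.{EXTERN(\_start \_\_bss\_start)}\par
\medskip
\noindent where, as the rules below show, the names may be separated
by commas or by white space alone.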
\checktablefalse
\tracinglistsfalse
@<\GNU\ \ld\ script rules@>=
@G
extern_name_list:
{@>@[TeX_( "/ldlex@@expression" );@]@=}
extern_name_list_body {@>@[TeX_( "/ldlex@@popstate" );@]@=}
;
extern_name_list_body:
NAME {}
| extern_name_list_body NAME {}
| extern_name_list_body ',' NAME {}
;
@ The top level productions simply define a script file as a list of
script commands.
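A (made up) script as simple as
\medskip
\noindent\hskip2em\.{OUTPUT\_FORMAT(elf64-x86-64)}\par
\noindent\hskip2em\.{ENTRY(\_start)}\par
\noindent\hskip2em\.{SECTIONS \{ .text : \{ *(.text) \} \}}\par
\medskip
\noindent is thus parsed as an \prodstyle{ifile\_list} of three
commands, each produced by one of the \prodstyle{ifile\_p1}
alternatives presented later.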
@<\GNU\ \ld\ script rules@>=
@G
script_file:
{@>@[TeX_( "/ldlex@@both" );@]@=}
ifile_list {@>@[TeX_( "/getfifth{/yy(2)}/to/ldcmds/ldlex@@popstate" );@]@=}
;
ifile_list:
ifile_list ifile_p1 {@>@<Attach a new command to the script file@>@=}
| {@>@[TeX_( "/yy0{/nx/ldinsertcweb{}{}{}{}}" );@]@=}
;
@g
@ @<Attach a new command to the script file@>=
@[TeX_( "/getsecond{/yy(1)}/to/toksa/getthird{/yy(1)}/to/toksb" );@]@;
@[TeX_( "/getfourth{/yy(1)}/to/toksc/getfifth{/yy(1)}/to/toksd" );@]@;
@[TeX_( "/getsecond{/yy(2)}/to/tokse/getthird{/yy(2)}/to/toksf" );@]@;
@[TeX_( "/getfourth{/yy(2)}/to/toksg/getfifth{/yy(2)}/to/toksh" );@]@;
@[TeXb( "/yytoksempty{/toksh}{/yy0{/the/yy(1)}}{/yytoksempty{/toksd}{/yy0{/the/yy(2)}}" );@]@;
@[TeXf( " {/yy0{/nx/ldinsertcweb{/the/toksa}{/the/toksb}{/the/toksg}{/the/toksd" );@]@;
@[TeXfo( " /nx/ldcommandseparator{/the/tokse}{/the/toksf}{/the/toksc}{/the/toksg}/the/toksh}}}}" );@]@;
@*1 Script internals.
There are a number of different commands. For typesetting purposes,
the handling of most of these can be significantly
simplified. In the \prodstyle{GROUP} command there is no need to
perform any actions upon entering the group, for
instance. \prodstyle{INCLUDE} presents a special challenge. In
the original grammar this command is followed by a general list of script
commands (the contents of the included file) terminated by
\prodstyle{END}. The `magic' of opening the file and inserting its
contents into the stream being parsed is performed by the lexer and
the parser in the
background. The typesetting parser, on the other hand, only has to
typeset the \prodstyle{INCLUDE} command itself and has no need for
opening and parsing the file being included. We could simply change the
grammar rule to omit the follow-up script commands, but that would
require altering the existing grammar. \namedspot{pretendbuffersw}Since the command list
(\prodstyle{ifile\_list}) is allowed to be empty,
we simply \locallink{pretendbufferswlex}fake\endlink\ the
inclusion of the file in the lexer by immediately outputting
\prodstyle{END} upon entering the appropriate lexer state. One
advantage of this approach is the ability, when desired, to
examine the included file for possible cross-referencing information.
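To make the discussion concrete, the command being typeset is simply
something like
\medskip
\noindent\hskip2em\.{INCLUDE common.ld}\par
\medskip
\noindent (the file name here is made up): the contents of
\.{common.ld} never reach the typesetting parser, which is handed an
\prodstyle{END} token right after the file name, as described above.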
Each command is packaged with a qualifier that records its type for
the rule that adds the fragment to the script file.
@<\GNU\ \ld\ script rules@>=
@G
ifile_p1:
memory {@>@@=}
| sections {@>@@=}
| phdrs
| startup
| high_level_library
| low_level_library
| floating_point_support
| statement_anywhere {@>@@=}
| version
| ';' {@>@[TeX_( "/yy0{/nx/ldinsertcweb/the/yy(1){none}{}}" );@]@=}
| TARGET_K '(' NAME ')' {}
| SEARCH_DIR '(' filename ')' {}
| OUTPUT '(' filename ')' {}
| OUTPUT_FORMAT '(' NAME ')' {}
| OUTPUT_FORMAT '(' NAME ','
NAME ',' NAME ')' {}
| OUTPUT_ARCH '(' NAME ')' {}
| FORCE_COMMON_ALLOCATION {}
| INHIBIT_COMMON_ALLOCATION {}
| INPUT '(' input_list ')'
| GROUP {}
'(' input_list ')' {}
| MAP '(' filename ')' {}
| INCLUDE filename {@>@@=}
ifile_list END {@>@@=}
| NOCROSSREFS '('
nocrossref_list ')' {}
| EXTERN '(' extern_name_list ')'
| INSERT_K AFTER NAME {}
| INSERT_K BEFORE NAME {}
| REGION_ALIAS '(' NAME ',' NAME ')' {}
| LD_FEATURE '(' NAME ')' {}
;
input_list:
NAME {}
| input_list ',' NAME {}
| input_list NAME {}
| LNAME {}
| input_list ',' LNAME {}
| input_list LNAME {}
| AS_NEEDED '(' {}
input_list ')' {}
| input_list ',' AS_NEEDED '(' {}
input_list ')' {}
| input_list AS_NEEDED '(' {}
input_list ')' {}
;
sections:
SECTIONS '{' sec_or_group_p1 '}' {@>@