To be accepted by Wisent, a context-free grammar must follow a particular format: it must be represented as an Emacs Lisp list of the form:
(terminals assocs . non-terminals)
terminals
Is the list of terminal symbols used in the grammar.
assocs
Specifies the associativity of terminals. It is nil when no associativity is defined, or an alist of
(assoc-type . assoc-value)
elements.
assoc-type must be one of the symbols default-prec, nonassoc, left or right. When assoc-type is default-prec, assoc-value must be nil or t (the default). Otherwise it is a list of tokens which must have been previously declared in terminals.
For details, see (bison)Contextual Precedence, in the Bison manual.
non-terminals
Is the list of nonterminal definitions. Each definition has the form:
(nonterm . rules)
Where nonterm is the nonterminal symbol being defined and rules is the list of rules that describe this nonterminal. Each rule is a list:
(components [precedence] [action])
Where:
components
Is a list of terminals and nonterminals that are put together by this rule.
For example,
(exp ((exp ?+ exp))          ;; exp: exp '+' exp
     )                       ;;    ;
Says that two groupings of type ‘exp’, with a ‘+’ token in between, can be combined into a larger grouping of type ‘exp’.
By convention, a nonterminal symbol should be in lower case, such as ‘exp’, ‘stmt’ or ‘declaration’. Terminal symbols should be upper case to distinguish them from nonterminals: for example, ‘INTEGER’, ‘IDENTIFIER’, ‘IF’ or ‘RETURN’. A terminal symbol that represents a particular keyword in the language is conventionally the same as that keyword converted to upper case. The terminal symbol error is reserved for error recovery.
Scattered among the components can be middle-rule actions. Usually only action is provided (see action).
If components in a rule is nil, it means that the rule can match the empty string. For example, here is how to define a comma-separated sequence of zero or more ‘exp’ groupings:
(expseq  (nil)                          ;; expseq: ;; empty
         ((expseq1))                    ;;       | expseq1
         )                              ;;       ;

(expseq1 ((exp))                        ;; expseq1: exp
         ((expseq1 ?, exp))             ;;        | expseq1 ',' exp
         )                              ;;        ;
precedence
Assigns the rule the precedence of the given terminal item, overriding the precedence that would otherwise be deduced for it, namely that of the last terminal in the rule. Note that only terminals declared in assocs have a precedence level. The altered rule precedence then affects how conflicts involving that rule are resolved.
precedence is an optional vector of one terminal item.
Here is how precedence solves the problem of unary minus. First, declare a precedence for a fictitious terminal symbol named UMINUS. There are no tokens of this type, but the symbol serves to stand for its precedence:
…
((default-prec t) ;; This is the default
 (left ?+ ?-)
 (left ?*)
 (left UMINUS))
Now the precedence of UMINUS can be used in specific rules:
(exp …                       ;; exp:    …
     ((exp ?- exp))          ;;       | exp '-' exp
     …                       ;;       …
     ((?- exp) [UMINUS])     ;;       | '-' exp %prec UMINUS
     …                       ;;       …
     )                       ;;       ;
If you forget to append [UMINUS] to the rule for unary minus, Wisent silently assumes that minus has its usual precedence. This kind of problem can be tricky to debug, since one typically discovers the mistake only by testing the code.
Using the (default-prec nil) declaration makes it easier to discover this kind of problem systematically. It causes rules that lack a precedence modifier to have no precedence, even if the last terminal symbol mentioned in their components has a declared precedence.
If (default-prec nil) is in effect, you must specify precedence for all rules that participate in precedence conflict resolution. Wisent will then report every shift/reduce conflict until you tell it how to resolve the conflict, either by changing your grammar or by adding an explicit precedence. This will probably add declarations to the grammar, but it helps to protect against incorrect rule precedences.
The effect of (default-prec nil) can be reversed by giving (default-prec t), which is the default.
For more details, see (bison)Contextual Precedence, in the Bison manual.
It is important to understand that assocs declarations define associativity but also assign a precedence level to terminals. All terminals declared in the same left, right or nonassoc association get the same precedence level. The precedence level is increased at each new association.
On the other hand, precedence explicitly assigns the precedence level of the given terminal to a rule.
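The level-numbering rule described above can be sketched in a few lines of Emacs Lisp. This is only an illustration of the rule, not Wisent's actual implementation: the function name my-example-precedence-levels is hypothetical, and the absolute numbers it produces are arbitrary (only their relative order matters).

```elisp
;; Hypothetical helper, for illustration only: walk an ASSOCS alist and
;; give every terminal of one association the same level, increasing the
;; level at each new left/right/nonassoc association.
(defun my-example-precedence-levels (assocs)
  "Return an alist of (TERMINAL . LEVEL) pairs derived from ASSOCS."
  (let ((level 0)
        (result nil))
    (dolist (entry assocs (nreverse result))
      ;; `default-prec' is a setting, not an association, so it is skipped.
      (unless (eq (car entry) 'default-prec)
        (setq level (1+ level))
        (dolist (terminal (cdr entry))
          (push (cons terminal level) result))))))

;; Character terminals print as integers: ?+ is 43, ?- is 45, ?* is 42.
(my-example-precedence-levels
 '((default-prec t)
   (left ?+ ?-)
   (left ?*)
   (left UMINUS)))
;; => ((43 . 1) (45 . 1) (42 . 2) (UMINUS . 3))
```

In this sketch UMINUS ends up with the highest level, which is why a rule tagged with [UMINUS] binds more tightly than the binary operators.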
action
An action is an optional Emacs Lisp function call, like this:
(identity $1)
The result of an action determines the semantic value of a rule.
From an implementation standpoint, the function call will be embedded in a lambda expression, and several useful local variables will be defined:
$n
Where n is a positive integer. As in Bison, the value of $n is the semantic value of the nth element of components, starting from 1. It can be of any Lisp data type.
$regionn
Where n is a positive integer. For each $n variable defined there is a corresponding $regionn variable. Its value is a pair (start-pos . end-pos) that represents the start and end positions (in the lexical input stream) of the $n value. It can be nil when the component positions are not available, for example for an empty string component.
$region
Its value is the pair (leftmost-pos . rightmost-pos) giving the leftmost and rightmost positions of the input data matched by all components in the rule. It can be nil when component positions are not available.
$nterm
This variable is initialized with the nonterminal symbol (nonterm) the rule belongs to. It could be useful to improve error reporting or debugging. It is also used to automatically provide incremental re-parse entry points for Semantic tags (see How to use Wisent with Semantic).
$action
The value of $action is the symbolic name of the current semantic action (see Debugging semantic actions).
When an action is not specified, a default one is supplied: (identity $1). This means that the default semantic value of a rule is the value of its first component. The exception is a rule matching the empty string, for which the default action is to return nil.
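To tie the pieces together, here is a minimal sketch of a complete grammar list for a tiny calculator, written as plain quoted data. It is an assumption-laden illustration rather than an example taken from Wisent itself: the variable name my-example-grammar and the terminal NUM are hypothetical, and the list is only defined here, not compiled.

```elisp
;; Hypothetical grammar, for illustration only.  Its shape follows the
;; (terminals assocs . non-terminals) format described above.
(defvar my-example-grammar
  '(;; terminals: every token used below, including the fictitious UMINUS
    (NUM ?+ ?- ?* UMINUS)
    ;; assocs: each new association gets a higher precedence level
    ((left ?+ ?-)
     (left ?*)
     (left UMINUS))
    ;; non-terminals: each definition is (nonterm . rules)
    (exp ((NUM))                        ; no action: defaults to (identity $1)
         ((exp ?+ exp) (+ $1 $3))       ; $1 and $3 are the values of both exp
         ((exp ?- exp) (- $1 $3))
         ((exp ?* exp) (* $1 $3))
         ((?- exp) [UMINUS] (- $2))))   ; explicit precedence, then the action
  "Illustrative grammar list; the names used are assumptions.")

;; Because of the dotted form, the three parts can be taken apart with:
(car  my-example-grammar)   ; the terminals list
(cadr my-example-grammar)   ; the assocs alist
(cddr my-example-grammar)   ; the list of nonterminal definitions
```

The last rule shows all three parts of a rule at once: components, an optional precedence vector, and an optional action.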