\input texinfo  @c -*-texinfo-*-
@c %**start of header
@setfilename ../../info/bovine.info
@set TITLE  Bovine parser development
@set AUTHOR Eric M. Ludlam, David Ponce, and Richard Y. Kim
@settitle @value{TITLE}
@include docstyle.texi

@c *************************************************************************
@c @ Header
@c *************************************************************************

@c Merge all indexes into a single index for now.
@c We can always separate them later into two or more as needed.
@syncodeindex vr cp
@syncodeindex fn cp
@syncodeindex ky cp
@syncodeindex pg cp
@syncodeindex tp cp

@c @footnotestyle separate
@c @paragraphindent 2
@c @@smallbook
@c %**end of header

@copying
Copyright @copyright{} 1999--2004, 2012--2021 Free Software Foundation,
Inc.

@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, with the Front-Cover Texts being ``A GNU Manual,''
and with the Back-Cover Texts as in (a) below.  A copy of the license
is included in the section entitled ``GNU Free Documentation License''.

(a) The FSF's Back-Cover Text is: ``You have the freedom to copy and
modify this GNU manual.''
@end quotation
@end copying

@dircategory Emacs misc features
@direntry
* Bovine: (bovine).             Semantic bovine parser development.
@end direntry

@iftex
@finalout
@end iftex

@c @setchapternewpage odd
@c @setchapternewpage off

@titlepage
@sp 10
@title @value{TITLE}
@author by @value{AUTHOR}
@page
@vskip 0pt plus 1 fill
@insertcopying
@end titlepage
@page

@macro semantic{}
@i{Semantic}
@end macro

@c *************************************************************************
@c @ Document
@c *************************************************************************
@contents

@node top
@top @value{TITLE}

The @dfn{bovine} parser is the original @semantic{} parser, and is an
implementation of an @acronym{LL} parser.  It is well suited to simple
languages and offers many conveniences that make grammars easy to
write, but those conveniences make it less powerful than a Bison-like
@acronym{LALR} parser.  For more information, @inforef{Top, The Wisent
Parser Manual, wisent}.

Bovine @acronym{LL} grammars are stored in files with a @file{.by}
extension.  When compiled, the contents are converted into a file of
the form @file{NAME-by.el}.  This file is, in turn, byte-compiled.
@inforef{top, Grammar Framework Manual, grammar-fw}.

@ifnottex
@insertcopying
@end ifnottex

@menu
* Starting Rules::              The starting rules for the grammar.
* Bovine Grammar Rules::        Rules used to parse a language.
* Optional Lambda Expression::  Actions to take when a rule is matched.
* Bovine Examples::             Simple Samples.
* GNU Free Documentation License::  The license for this documentation.
@c * Index::
@end menu

@node Starting Rules
@chapter Starting Rules

In Bison, one and only one nonterminal is designated as the ``start''
symbol.  In @semantic{}, one or more nonterminals can be designated as
the ``start'' symbol.  They are declared following the @code{%start}
keyword separated by spaces.  @inforef{start Decl, ,grammar-fw}.

If no @code{%start} keyword is used in a grammar, then the very first
nonterminal is used.  Internally the first start nonterminal is targeted by the
reserved symbol @code{bovine-toplevel}, so it can be found by the
parser harness.

To find locally defined variables, the local context handler needs to
parse the body of functional code.  The @code{scopestart} declaration
specifies the name of a nonterminal used as the goal to parse a local
context, @inforef{scopestart Decl, ,grammar-fw}.  Internally the
scopestart nonterminal is targeted by the reserved symbol
@code{bovine-inner-scope}, so it can be found by the parser harness.
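
As a minimal sketch, the two declarations might appear together in the
prologue of a grammar as shown here.  The nonterminal names
@code{declaration}, @code{expression} and @code{codeblock} are
hypothetical and stand for rules defined elsewhere in the grammar.

@example
;; Either nonterminal may serve as the parser's goal; the first
;; one listed is the one targeted by bovine-toplevel.
%start      declaration expression
;; Goal used when parsing a local context.
%scopestart codeblock
@end example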

@node Bovine Grammar Rules
@chapter Bovine Grammar Rules

The rules are what allow the compiler to create tags from a language
file.  Once the setup is done in the prologue, you can start writing
rules.  @inforef{Grammar Rules, ,grammar-fw}.

@example
@var{result} : @var{components1} @var{optional-semantic-action1}
       | @var{components2} @var{optional-semantic-action2}
       ;
@end example

@var{result} is a nonterminal, that is, a symbol synthesized in your grammar.
@var{components} is a list of elements that are to be matched if @var{result}
is to be made.  @var{optional-semantic-action} is an optional sequence
of simplified Emacs Lisp expressions for concocting the parse tree.

In Bison, each time an element of @var{components} is found, it is
@dfn{shifted} onto the parser stack (the stack of matched elements).
When all @var{components}' elements have been matched, it is
@dfn{reduced} to @var{result}.  @xref{Algorithm,,, bison, The GNU Bison Manual}.

A particular @var{result} written into your grammar becomes
the parser's goal.  It is designated by a @code{%start} statement
(@pxref{Starting Rules}).  The value returned by the associated
@var{optional-semantic-action} is the parser's result.  It should be
a tree of @semantic{} @dfn{tags}, @inforef{Semantic Tags, ,
semantic-appdev}.

@var{components} is made up of symbols.  A symbol such as @code{FOO}
means that a syntactic token of class @code{FOO} must be matched.

@menu
* How Lexical Tokens Match::
* Grammar-to-Lisp Details::
* Order of components in rules::
@end menu

@node How Lexical Tokens Match
@section How Lexical Tokens Match

A lexical rule must be used to define how to match a lexical token.

For instance:

@example
%keyword FOO "foo"
@end example

@noindent
means that @code{FOO} is a reserved language keyword, matched as such
by looking it up in a keyword table, @inforef{keyword Decl,
,grammar-fw}.  This is because @code{"foo"} will be converted to
@code{FOO} in the lexical analysis stage.  Thus the symbol @code{FOO}
won't be available any other way.

If we specify our token in this way:

@example
%token <symbol> FOO "foo"
@end example

then @code{FOO} will match the string @code{"foo"} explicitly, but it
won't do so at the lexical level, allowing use of the text
@code{"foo"} in other forms of regular expressions.

In that case, @code{FOO} is a @code{symbol}-type token.  To match, a
@code{symbol} must first be encountered, and then it must
@code{string-match "foo"}.

@table @strong
@item Caution:
Be especially careful to remember that @code{"foo"}, and more
generally the %token's match-value string, is a regular expression!
@end table
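
The following Emacs Lisp evaluations sketch why this matters, assuming
the match value is applied with an unanchored @code{string-match} as
described above.  The sample strings are made up for illustration.

@example
;; Unanchored: "foo" also matches inside a longer symbol.
(string-match "foo" "foobar")
     @result{} 0
;; Anchored at both ends, only the exact text matches.
(string-match "\\`foo\\'" "foobar")
     @result{} nil
;; A bare "." matches any character, not just a period.
(string-match "." "+")
     @result{} 0
(string-match "[.]" "+")
     @result{} nil
@end example

Anchoring a match value with @samp{\\`} and @samp{\\'}, or using a
bracket expression such as @samp{[.]}, is one way to avoid these
accidental matches.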

Non-symbol tokens are also allowed.  For example:

@example
%token <punctuation> PERIOD "[.]"

filename : symbol PERIOD symbol
         ;
@end example

@code{PERIOD} is a @code{punctuation}-type token that will explicitly
match one period when used in the above rule.

@table @strong
@item Please Note:
@code{symbol}, @code{punctuation}, etc., are predefined lexical token
types, based on the @dfn{syntax class}-character associations
currently in effect.
@end table

@node Grammar-to-Lisp Details
@section Grammar-to-Lisp Details

For the bovinator, lexical token matching patterns are @emph{inlined}.
When the grammar-to-Lisp converter encounters a lexical token
declaration of the form:

@example
%token <@var{type}> @var{token-name} @var{match-value}
@end example

It substitutes every occurrence of @var{token-name} in rules with its
expanded form:

@example
@var{type} @var{match-value}
@end example

For example:

@example
%token <symbol> MOOSE "moose"

find_a_moose: MOOSE
            ;
@end example

@noindent
will generate this pseudo-equivalent rule:

@example
find_a_moose: symbol "moose"   ;; invalid syntax!
            ;
@end example

Thus, from the bovinator's point of view, the @var{components} part of a
rule is made up of symbols and strings.  A string in the mix means
that the previous symbol must have the additional constraint of
exactly matching it, as described in @ref{How Lexical Tokens Match}.

@table @strong
@item Please Note:
For the bovinator, this task was mixed into the language definition to
simplify implementation, though Bison's technique is more efficient.
@end table

@node Order of components in rules
@section Order of components in rules

If a rule has multiple components, order is important.  For example,

@example
headerfile : symbol PERIOD symbol
           | symbol
           ;
@end example

would match @samp{foo.h} or the @acronym{C++} header @samp{foo}.
The bovine parser will first attempt to match the long form, and then
the short form.  If they were in reverse order, then the long form
would never be tested.
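
For contrast, here is a sketch of the reversed ordering, which
illustrates the problem just described.  It assumes the same
@code{PERIOD} token declared earlier.

@example
;; Short form first: it matches the leading symbol of foo.h,
;; so the long form below is never tested.
headerfile : symbol
           | symbol PERIOD symbol
           ;
@end example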

@c @xref{Default syntactic tokens}.

@node Optional Lambda Expression
@chapter Optional Lambda Expressions

The @acronym{OLE} (@dfn{Optional Lambda Expression}) is converted into
a bovine lambda.  This lambda has special short-cuts to simplify
reading the semantic action definition.  An @acronym{OLE} like this:

@example
( $1 )
@end example

results in a lambda return which consists entirely of the string
or object found by matching the first (zeroth) element of match.
An @acronym{OLE} like this:

@example
( ,(foo $1) )
@end example

executes @code{foo} on the first argument, and then splices its return
value into the return list, whereas:

@example
( (foo $1) )
@end example

executes @code{foo} and places its return value in the return list.

Here are other things that can appear inline:

@table @code
@item $1
The first object matched.

@item ,$1
The first object spliced into the list (assuming it is a list from a
non-terminal).

@item '$1
The first object matched, placed in a list.  I.e., @code{( $1 )}.

@item foo
The symbol @code{foo} (exactly as displayed).

@item (foo)
A function call to @code{foo} which is stuck into the return list.

@item ,(foo)
A function call to @code{foo} which is spliced into the return list.

@item '(foo)
A function call to @code{foo} which is stuck into the return list in a list.

@item (EXPAND @var{$1} @var{nonterminal} @var{depth})
A list starting with @code{EXPAND} performs a recursive parse on the
token passed to it (represented by @samp{$1} above).  The
@dfn{semantic list} is a common token to expand, as there are often
interesting things in the list.  The @var{nonterminal} is a symbol in
your table which the bovinator will start with when parsing.
@var{nonterminal}'s definition is the same as any other nonterminal.
@var{depth} should be at least @samp{1} when descending into a
semantic list.

@item (EXPANDFULL @var{$1} @var{nonterminal} @var{depth})
Is like @code{EXPAND}, except that the parser will iterate over
@var{nonterminal} until there are no more matches, the same way it
iterates over the starting rule (@pxref{Starting Rules}).  This
lets you have much simpler rules in this specific case, and also gives
you positional information in the returned tokens, and error
skipping.

@item (ASSOC @var{symbol1} @var{value1} @var{symbol2} @var{value2} @dots{})
This is used for creating an association list.  Each @var{symbol} is
included in the list if the associated @var{value} is non-@code{nil}.
While the items are all listed explicitly, the created structure is an
association list of the form:

@example
((@var{symbol1} . @var{value1}) (@var{symbol2} . @var{value2}) @dots{})
@end example

@item (TAG @var{name} @var{class} [@var{attributes}])
This creates one tag in the current buffer.

@table @var
@item name
Is a string that represents the tag in the language.

@item class
Is the kind of tag being created, such as @code{function}, or
@code{variable}, though any symbol will work.

@item attributes
Is an optional set of labeled values such as @code{:constant-flag t :parent
"parenttype"}.
@end table

@item  (TAG-VARIABLE @var{name} @var{type} @var{default-value} [@var{attributes}])
@itemx (TAG-FUNCTION @var{name} @var{type} @var{arg-list} [@var{attributes}])
@itemx (TAG-TYPE @var{name} @var{type} @var{members} @var{parents} [@var{attributes}])
@itemx (TAG-INCLUDE @var{name} @var{system-flag} [@var{attributes}])
@itemx (TAG-PACKAGE @var{name} @var{detail} [@var{attributes}])
@itemx (TAG-CODE @var{name} @var{detail} [@var{attributes}])
Create a tag with @var{name} of respectively the class
@code{variable}, @code{function}, @code{type}, @code{include},
@code{package}, and @code{code}.
@inforef{Creating Tags, , semantic-appdev}, for the Lisp
functions these translate into.
@end table
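
As an illustrative sketch, a rule that builds a variable tag from a
simple assignment might look like the following.  The rule name
@code{variable-def} and the choice of passing @code{nil} as the type
are assumptions made for this example; they are not taken from an
existing grammar.

@example
%token <punctuation> EQUAL "="

;; name is $1, no type is recorded, default value is $3
variable-def : symbol EQUAL symbol
               (TAG-VARIABLE $1 nil $3)
             ;
@end example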

If the declaration @code{%quotemode backquote} is specified, then use
@code{,@@} to splice a list in, and @code{,} to evaluate the expression.
This lets you send @code{$1} as a symbol into a list instead of having
it expanded inline.

@node Bovine Examples
@chapter Examples

The rule:

@example
any-symbol: symbol
          ;
@end example

is equivalent to

@example
any-symbol: symbol
            ( $1 )
          ;
@end example

which, if it matched the string @samp{"A"}, would return

@example
( "A" )
@end example

If this rule were used like this:

@example
%token <punctuation> EQUAL "="
@dots{}
assign: any-symbol EQUAL any-symbol
        ( $1 $3 )
      ;
@end example

it would match @samp{"A=B"}, and return

@example
( ("A") ("B") )
@end example

The letters @samp{A} and @samp{B} come back in lists because
@samp{any-symbol} is a nonterminal, not an actual lexical element.

To get a better result with nonterminals, use @samp{,} to splice lists
in like this:

@example
%token <punctuation> EQUAL "="
@dots{}
assign: any-symbol EQUAL any-symbol
        ( ,$1 ,$3 )
      ;
@end example

which would return

@example
( "A" "B" )
@end example

@node GNU Free Documentation License
@appendix GNU Free Documentation License

@include doclicense.texi

@c There is nothing to index at the moment.
@ignore
@node Index
@unnumbered Index
@printindex cp
@end ignore

@iftex
@contents
@summarycontents
@end iftex

@bye

@c Following comments are for the benefit of ispell.

@c  LocalWords:  bovinator inlined