path: root/doc/lispref/modes.texi
Diffstat (limited to 'doc/lispref/modes.texi')
-rw-r--r-- doc/lispref/modes.texi 459
1 files changed, 453 insertions, 6 deletions
diff --git a/doc/lispref/modes.texi b/doc/lispref/modes.texi
index 9527df33b82..c472f9b4411 100644
--- a/doc/lispref/modes.texi
+++ b/doc/lispref/modes.texi
@@ -2853,11 +2853,14 @@ mode; most major modes define syntactic criteria for which faces to use
in which contexts. This section explains how to customize Font Lock for
a particular major mode.
- Font Lock mode finds text to highlight in two ways: through
-syntactic parsing based on the syntax table, and through searching
-(usually for regular expressions). Syntactic fontification happens
-first; it finds comments and string constants and highlights them.
-Search-based fontification happens second.
+ Font Lock mode finds text to highlight in three ways: through
+parsing based on a full-blown parser (usually, via an external library
+or program), through syntactic parsing based on Emacs's built-in
+syntax table, and through searching (usually for regular expressions).
+If enabled, parser-based fontification happens first
+(@pxref{Parser-based Font Lock}). Syntactic fontification happens
+next; it finds comments and string constants and highlights them.
+Search-based fontification happens last.
@menu
* Font Lock Basics:: Overview of customizing Font Lock.
@@ -2872,6 +2875,7 @@ Search-based fontification happens second.
* Syntactic Font Lock:: Fontification based on syntax tables.
* Multiline Font Lock:: How to coerce Font Lock into properly
highlighting multiline constructs.
+* Parser-based Font Lock:: Use parse data for fontification.
@end menu
@node Font Lock Basics
@@ -3652,6 +3656,71 @@ This face inherits, by default, from @code{font-lock-constant-face}.
@item font-lock-negation-char-face
@vindex font-lock-negation-char-face
for easily-overlooked negation characters.
+
+@item font-lock-escape-face
+@vindex font-lock-escape-face
+for escape sequences in strings.
+This face inherits, by default, from @code{font-lock-regexp-grouping-backslash}.
+
+Here is an example in Python, where the escape sequence @code{\n} is used:
+
+@smallexample
+@group
+print('Hello world!\n')
+@end group
+@end smallexample
+
+@item font-lock-number-face
+@vindex font-lock-number-face
+for numbers.
+
+@item font-lock-operator-face
+@vindex font-lock-operator-face
+for operators.
+
+@item font-lock-property-face
+@vindex font-lock-property-face
+for properties of an object, such as the declaration and use of fields
+in a struct.
+This face inherits, by default, from @code{font-lock-variable-name-face}.
+
+For example,
+
+@smallexample
+@group
+typedef struct
+@{
+ int prop;
+// ^ property
+@} obj;
+
+int main()
+@{
+ obj o;
+ o.prop = 3;
+// ^ property
+@}
+@end group
+@end smallexample
+
+@item font-lock-punctuation-face
+@vindex font-lock-punctuation-face
+for punctuation such as brackets and delimiters.
+
+@item font-lock-bracket-face
+@vindex font-lock-bracket-face
+for brackets (e.g., @code{()}, @code{[]}, @code{@{@}}).
+This face inherits, by default, from @code{font-lock-punctuation-face}.
+
+@item font-lock-delimiter-face
+@vindex font-lock-delimiter-face
+for delimiters (e.g., @code{;}, @code{:}, @code{,}).
+This face inherits, by default, from @code{font-lock-punctuation-face}.
+
+@item font-lock-misc-punctuation-face
+@vindex font-lock-misc-punctuation-face
+for punctuation that is not a bracket or delimiter.
+This face inherits, by default, from @code{font-lock-punctuation-face}.
@end table
@node Syntactic Font Lock
@@ -3876,6 +3945,191 @@ Since this function is called after every buffer change, it should be
reasonably fast.
@end defvar
+@node Parser-based Font Lock
+@subsection Parser-based Font Lock
+@cindex parser-based font-lock
+
+@c This node is written when the only parser Emacs has is tree-sitter;
+@c if in the future more parsers are supported, this should be
+@c reorganized and rewritten to describe multiple parsers in parallel.
+
+Besides simple syntactic font lock and regexp-based font lock, Emacs
+also provides complete syntactic font lock with the help of a parser.
+Currently, Emacs uses the tree-sitter library (@pxref{Parsing Program
+Source}) for this purpose.
+
+Parser-based font lock and other font lock mechanisms are not mutually
+exclusive. By default, if enabled, parser-based font lock runs first
+and replaces syntactic font lock; regexp-based font lock runs after it.
+
+Although parser-based font lock doesn't share the same customization
+variables as regexp-based font lock, it uses similar customization
+schemes. The tree-sitter counterpart of @code{font-lock-keywords} is
+@code{treesit-font-lock-settings}.
+
+@cindex tree-sitter fontifications, overview
+@cindex fontifications with tree-sitter, overview
+In general, tree-sitter fontification works as follows:
+
+@itemize @bullet
+@item
+A Lisp program (usually, part of a major mode) provides a @dfn{query}
+consisting of @dfn{patterns}, each pattern associated with a
+@dfn{capture name}.
+
+@item
+The tree-sitter library finds the nodes in the parse tree
+that match these patterns, tags the nodes with the corresponding
+capture names, and returns them to the Lisp program.
+
+@item
+The Lisp program uses the returned nodes to highlight the portions of
+buffer text corresponding to each node as appropriate, using the
+tagged capture names of the nodes to determine the correct
+fontification. For example, a node tagged @code{font-lock-keyword-face}
+would be highlighted in the @code{font-lock-keyword-face} face.
+@end itemize
+
+For more information about queries, patterns, and capture names, see
+@ref{Pattern Matching}.
+
+To set up tree-sitter fontification, a major mode should first set
+@code{treesit-font-lock-settings} with the output of
+@code{treesit-font-lock-rules}, then call
+@code{treesit-major-mode-setup}.
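+
+As a rough sketch of these steps, a hypothetical major mode (the name
+@code{my-js-mode} is made up for illustration, and the
+@code{javascript} grammar is assumed to be installed) might do the
+following:
+
+@example
+@group
+(define-derived-mode my-js-mode prog-mode "MyJS"
+  "Hypothetical mode illustrating tree-sitter fontification."
+  ;; Do nothing tree-sitter related if the grammar is missing.
+  (when (treesit-ready-p 'javascript)
+    (treesit-parser-create 'javascript)
+    (setq-local treesit-font-lock-settings
+                (treesit-font-lock-rules
+                 :language 'javascript
+                 :feature 'constant
+                 '((true) @@font-lock-constant-face
+                   (false) @@font-lock-constant-face)))
+    ;; Features must be listed here to be enabled.
+    (setq-local treesit-font-lock-feature-list '((constant)))
+    (treesit-major-mode-setup)))
+@end group
+@end example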
+
+@defun treesit-font-lock-rules &rest query-specs
+This function is used to set @code{treesit-font-lock-settings}. It
+takes care of compiling queries and other post-processing, and outputs
+a value that @code{treesit-font-lock-settings} accepts. Here's an
+example:
+
+@example
+@group
+(treesit-font-lock-rules
+ :language 'javascript
+ :feature 'constant
+ :override t
+ '((true) @@font-lock-constant-face
+ (false) @@font-lock-constant-face)
+ :language 'html
+ :feature 'script
+ "(script_element) @@font-lock-builtin-face")
+@end group
+@end example
+
+This function takes a series of @var{query-spec}s, where each
+@var{query-spec} is a @var{query} preceded by one or more
+@var{:keyword}/@var{value} pairs. Each @var{query} is a
+tree-sitter query in string, s-expression, or compiled form.
+
+For each @var{query}, the @var{:keyword}/@var{value} pairs that
+precede it add meta information to it. The @code{:language} keyword
+declares @var{query}'s language. The @code{:feature} keyword sets the
+feature name of @var{query}. Users can control which features are
+enabled with @code{font-lock-maximum-decoration} and
+@code{treesit-font-lock-feature-list} (described below). These two
+keywords are mandatory.
+
+Other keywords are optional:
+
+@multitable @columnfractions .15 .15 .6
+@headitem Keyword @tab Value @tab Description
+@item @code{:override} @tab @code{nil}
+@tab If the region already has a face, discard the new face
+@item @tab @code{t} @tab Always apply the new face
+@item @tab @code{append} @tab Append the new face to existing ones
+@item @tab @code{prepend} @tab Prepend the new face to existing ones
+@item @tab @code{keep} @tab Fill in regions without an existing face
+@end multitable
+
+Lisp programs mark patterns in @var{query} with capture names (names
+that start with @code{@@}), and tree-sitter will return matched nodes
+tagged with those same capture names. For the purpose of
+fontification, capture names in @var{query} should be face names like
+@code{font-lock-keyword-face}. The captured node will be fontified
+with that face.
+
+@findex treesit-fontify-with-override
+Capture names can also be function names, in which case the function
+is called with 4 arguments: @var{node}, @var{override}, @var{start},
+and @var{end}, where @var{node} is the node itself, @var{override} is
+the override property of the rule which captured this node, and
+@var{start} and @var{end} limit the region in which this function
+should fontify. (If this function wants to respect the @var{override}
+argument, it can use @code{treesit-fontify-with-override}.)
+
+Beyond the 4 arguments presented, this function should accept more
+arguments as optional arguments for future extensibility.
+
+If a capture name is both a face and a function, the face takes
+priority. If a capture name is neither a face nor a function, it is
+ignored.
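+
+As a rough sketch of a function used as a capture name (the name
+@code{my-fontify-comment} is hypothetical, and the @code{javascript}
+grammar is assumed to be installed), the following applies
+@code{font-lock-warning-face} to the captured node, clamped to the
+region between @var{start} and @var{end}, while respecting
+@var{override}:
+
+@example
+@group
+(defun my-fontify-comment (node override start end &rest _)
+  "Hypothetical capture function, for illustration only."
+  (treesit-fontify-with-override
+   (max start (treesit-node-start node))
+   (min end (treesit-node-end node))
+   'font-lock-warning-face override))
+
+(treesit-font-lock-rules
+ :language 'javascript
+ :feature 'comment
+ '((comment) @@my-fontify-comment))
+@end group
+@end example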
+@end defun
+
+@defvar treesit-font-lock-feature-list
+This is a list of lists of feature symbols. Each element of the list
+is a list that represents a decoration level.
+@code{font-lock-maximum-decoration} controls which levels are
+activated.
+
+Each element of the list is a list of the form @w{@code{(@var{feature}
+@dots{})}}, where each @var{feature} corresponds to the
+@code{:feature} value of a query defined in
+@code{treesit-font-lock-rules}. Removing a feature symbol from this
+list disables the corresponding query during font-lock.
+
+Common feature names, for many programming languages, include
+@code{definition}, @code{type}, @code{assignment}, @code{builtin},
+@code{constant}, @code{keyword}, @code{string-interpolation},
+@code{comment}, @code{doc}, @code{string}, @code{operator},
+@code{preprocessor}, @code{escape-sequence}, and @code{key}. Major
+modes are free to subdivide or extend these common features.
+
+Some of these features warrant some explanation: @code{definition}
+highlights whatever is being defined, e.g., the function name in a
+function definition, the struct name in a struct definition, the
+variable name in a variable definition; @code{assignment} highlights
+whatever is being assigned to, e.g., the variable or field in an
+assignment statement; @code{key} highlights keys in key-value pairs,
+e.g., keys in a JSON object, or a Python dictionary; @code{doc}
+highlights docstrings or doc-comments.
+
+For example, the value of this variable could be:
+@example
+@group
+((comment string doc) ; level 1
+ (function-name keyword type builtin constant) ; level 2
+ (variable-name string-interpolation key)) ; level 3
+@end group
+@end example
+
+Major modes should set this variable before calling
+@code{treesit-major-mode-setup}.
+
+@findex treesit-font-lock-recompute-features
+For this variable to take effect, a Lisp program should call
+@code{treesit-font-lock-recompute-features} (which resets
+@code{treesit-font-lock-settings} accordingly), or
+@code{treesit-major-mode-setup} (which calls
+@code{treesit-font-lock-recompute-features}).
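+
+For instance, assuming that
+@code{treesit-font-lock-recompute-features} accepts optional lists
+of features to add and to remove (an assumption about its calling
+convention, not something described above), a Lisp program might
+toggle individual features in the current buffer roughly like this:
+
+@example
+@group
+;; Enable the `operator' feature and disable `doc'; the feature
+;; names are taken from the list of common features above.
+(treesit-font-lock-recompute-features '(operator) '(doc))
+@end group
+@end example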
+@end defvar
+
+@defvar treesit-font-lock-settings
+A list of settings for tree-sitter based font lock. The exact format
+of each setting is considered internal. One should always use
+@code{treesit-font-lock-rules} to set this variable.
+
+@c Because the format is internal, we don't document them here. Though
+@c we do have it explained in the docstring. We also expose the fact
+@c that it is a list of settings, so one could combine two of them with
+@c append.
+@end defvar
+
+Multi-language major modes should provide range functions in
+@code{treesit-range-functions}, and Emacs will set the ranges
+accordingly before fontifying a region (@pxref{Multiple Languages}).
+
@node Auto-Indentation
@section Automatic Indentation of code
@@ -3932,10 +4186,12 @@ and a few other such modes) has been made more generic over the years,
so if your language seems somewhat similar to one of those languages,
you might try to use that engine. @c FIXME: documentation?
Another one is SMIE which takes an approach in the spirit
-of Lisp sexps and adapts it to non-Lisp languages.
+of Lisp sexps and adapts it to non-Lisp languages. Yet another one is
+to rely on a full-blown parser, for example, the tree-sitter library.
@menu
* SMIE:: A simple minded indentation engine.
+* Parser-based Indentation:: Parser-based indentation engine.
@end menu
@node SMIE
@@ -4595,6 +4851,197 @@ to the file's local variables of the form:
@code{eval: (smie-config-local '(@var{rules}))}.
@end defun
+@node Parser-based Indentation
+@subsection Parser-based Indentation
+@cindex parser-based indentation
+
+@c This node is written when the only parser Emacs has is tree-sitter;
+@c if in the future more parsers are supported, this should be
+@c reorganized and rewritten to describe multiple parsers in parallel.
+
+When built with the tree-sitter library (@pxref{Parsing Program
+Source}), Emacs is capable of parsing the program source and producing
+a syntax tree. This syntax tree can be used for guiding the program
+source indentation commands. For maximum flexibility, it is possible
+to write a custom indentation function that queries the syntax tree
+and indents accordingly for each language, but that is a lot of work.
+It is more convenient to use the simple indentation engine described
+below: then the major mode needs only to write some indentation rules
+and the engine takes care of the rest.
+
+To enable the parser-based indentation engine, either set
+@code{treesit-simple-indent-rules} and call
+@code{treesit-major-mode-setup}, or equivalently, set the value of
+@code{indent-line-function} to @code{treesit-indent}.
+
+@defvar treesit-indent-function
+This variable stores the actual function called by
+@code{treesit-indent}. By default, its value is
+@code{treesit-simple-indent}. In the future we might add other,
+more complex indentation engines.
+@end defvar
+
+@heading Writing indentation rules
+@cindex indentation rules, for parser-based indentation
+
+@defvar treesit-simple-indent-rules
+This local variable stores indentation rules for every language. It is
+a list whose elements have the form @w{@code{(@var{language} . @var{rules})}},
+where @var{language} is a language symbol, and @var{rules} is a list of
+rules, each of the form @w{@code{(@var{matcher} @var{anchor} @var{offset})}}.
+
+First, Emacs passes the smallest tree-sitter node at the beginning of
+the current line to @var{matcher}; if it returns non-@code{nil}, this
+rule is applicable. Then Emacs passes the node to @var{anchor}, which
+returns a buffer position. Emacs takes the column number of that
+position, adds @var{offset} to it, and the result is the indentation
+column for the current line. @var{offset} can be an integer or a
+variable whose value is an integer.
+
+The @var{matcher} and @var{anchor} are functions, and Emacs provides
+convenient defaults for them.
+
+Each @var{matcher} or @var{anchor} is a function that takes three
+arguments: @var{node}, @var{parent}, and @var{bol}. The argument
+@var{bol} is the buffer position whose indentation is required: the
+position of the first non-whitespace character after the beginning of
+the line. The argument @var{node} is the largest (highest-in-tree)
+node that starts at that position; and @var{parent} is the parent of
+@var{node}. However, when that position is in whitespace or inside
+a multi-line string, no node can start at that position, so
+@var{node} is @code{nil}. In that case, @var{parent} would be the
+smallest node that spans that position.
+
+Emacs finds @var{bol}, @var{node} and @var{parent} and
+passes them to each @var{matcher} and @var{anchor}. @var{matcher}
+should return non-@code{nil} if the rule is applicable, and
+@var{anchor} should return a buffer position.
+@end defvar
+
+@defvar treesit-simple-indent-presets
+This is a list of defaults for @var{matcher}s and @var{anchor}s in
+@code{treesit-simple-indent-rules}. Each of them represents a function
+that takes 3 arguments: @var{node}, @var{parent} and @var{bol}. The
+available default functions are:
+
+@ftable @code
+@item no-node
+This matcher is a function that is called with 3 arguments:
+@var{node}, @var{parent}, and @var{bol}, and returns non-@code{nil},
+indicating a match, if @var{node} is @code{nil}, i.e., there is no
+node that starts at @var{bol}. This is the case when @var{bol} is on
+an empty line or inside a multi-line string, etc.
+
+@item parent-is
+This matcher is a function of one argument, @var{type}; it returns a
+function that is called with 3 arguments: @var{node}, @var{parent},
+and @var{bol}, and returns non-@code{nil} (i.e., a match) if
+@var{parent}'s type matches regexp @var{type}.
+
+@item node-is
+This matcher is a function of one argument, @var{type}; it returns a
+function that is called with 3 arguments: @var{node}, @var{parent},
+and @var{bol}, and returns non-@code{nil} if @var{node}'s type matches
+regexp @var{type}.
+
+@item query
+This matcher is a function of one argument, @var{query}; it returns a
+function that is called with 3 arguments: @var{node}, @var{parent},
+and @var{bol}, and returns non-@code{nil} if querying @var{parent}
+with @var{query} captures @var{node} (@pxref{Pattern Matching}).
+
+@item match
+This matcher is a function of 5 arguments: @var{node-type},
+@var{parent-type}, @var{node-field}, @var{node-index-min}, and
+@var{node-index-max}. It returns a function that is called with 3
+arguments: @var{node}, @var{parent}, and @var{bol}, and returns
+non-@code{nil} if @var{node}'s type matches regexp @var{node-type},
+@var{parent}'s type matches regexp @var{parent-type}, @var{node}'s
+field name in @var{parent} matches regexp @var{node-field}, and
+@var{node}'s index among its siblings is between @var{node-index-min}
+and @var{node-index-max}. If the value of an argument is @code{nil},
+this matcher doesn't check that argument. For example, to match the
+first child where parent is @code{argument_list}, use
+
+@example
+(match nil "argument_list" nil 0 0)
+@end example
+
+@item comment-end
+This matcher is a function that is called with 3 arguments:
+@var{node}, @var{parent}, and @var{bol}, and returns non-@code{nil} if
+point is before a comment ending token. Comment ending tokens are
+defined by regular expression @code{treesit-comment-end}
+(@pxref{Tree-sitter major modes, treesit-comment-end}).
+
+@item first-sibling
+This anchor is a function that is called with 3 arguments: @var{node},
+@var{parent}, and @var{bol}, and returns the start of the first child
+of @var{parent}.
+
+@item parent
+This anchor is a function that is called with 3 arguments: @var{node},
+@var{parent}, and @var{bol}, and returns the start of @var{parent}.
+
+@item parent-bol
+This anchor is a function that is called with 3 arguments: @var{node},
+@var{parent}, and @var{bol}, and returns the first non-space character
+on the line of @var{parent}.
+
+@item prev-sibling
+This anchor is a function that is called with 3 arguments: @var{node},
+@var{parent}, and @var{bol}, and returns the start of the previous
+sibling of @var{node}.
+
+@item no-indent
+This anchor is a function that is called with 3 arguments: @var{node},
+@var{parent}, and @var{bol}, and returns the start of @var{node}.
+
+@item prev-line
+This anchor is a function that is called with 3 arguments: @var{node},
+@var{parent}, and @var{bol}, and returns the first non-whitespace
+character on the previous line.
+
+@item point-min
+This anchor is a function that is called with 3 arguments: @var{node},
+@var{parent}, and @var{bol}, and returns the beginning of the buffer.
+This is useful as the beginning of the buffer is always at column 0.
+
+@item comment-start
+This anchor is a function that is called with 3 arguments: @var{node},
+@var{parent}, and @var{bol}, and returns the position right after the
+comment-start token. Comment-start tokens are defined by regular
+expression @code{treesit-comment-start} (@pxref{Tree-sitter major
+modes, treesit-comment-start}). This function assumes @var{parent} is
+the comment node.
+
+@item comment-start-skip
+This anchor is a function that is called with 3 arguments: @var{node},
+@var{parent}, and @var{bol}, and returns the position after the
+comment-start token and any whitespace characters following that
+token. Comment-start tokens are defined by regular expression
+@code{treesit-comment-start}. This function assumes @var{parent} is
+the comment node.
+@end ftable
+@end defvar
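+
+As a rough sketch of how these presets combine into rules, a
+hypothetical mode for C might set up indentation as follows. The
+node type names (@code{translation_unit},
+@code{compound_statement}) and the offset of 2 are assumptions about
+the grammar and the desired style, not values prescribed by Emacs:
+
+@example
+@group
+(setq-local treesit-simple-indent-rules
+            '((c
+               ;; Top-level constructs start at column 0.
+               ((parent-is "translation_unit") point-min 0)
+               ;; A closing brace aligns with its parent's line.
+               ((node-is "@}") parent-bol 0)
+               ;; Statements in a block indent by 2 columns.
+               ((parent-is "compound_statement") parent-bol 2)
+               ;; Empty lines follow the parent's line.
+               (no-node parent-bol 0))))
+(treesit-major-mode-setup)
+@end group
+@end example
+
+Rules are tried in order, so more specific matchers are best placed
+before more general ones.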
+
+@heading Indentation utilities
+@cindex utility functions for parser-based indentation
+
+Here are some utility functions that can help in writing parser-based
+indentation rules.
+
+@defun treesit-check-indent mode
+This function checks the current buffer's indentation against major
+mode @var{mode}. It indents the current buffer according to
+@var{mode} and compares the results with the current indentation.
+Then it pops up a buffer showing the differences. Correct
+indentation (target) is shown in green, current indentation is
+shown in red. @c Are colors customizable? faces?
+@end defun
+
+It is also helpful to use @code{treesit-inspect-mode} (@pxref{Language
+Definitions}) when writing indentation rules.
@node Desktop Save Mode
@section Desktop Save Mode