Supporting a new language (classic)
Note: match-up offers utility functions to make managing b:match_words easier.
When placed in an autocommand or in a file after/ftplugin/{&filetype}.vim, they can be used to customize the matching
regular expressions for a particular file type.
matchup#util#append_match_words:
call matchup#util#append_match_words('some:words')
Adds a set of patterns to b:match_words, adding a comma if necessary. Use this instead of concatenating directly.
matchup#util#patch_match_words:
call matchup#util#patch_match_words(before, after)
Replaces the literal string before, wherever it occurs in b:match_words, with the literal string after.
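As a rough sketch of how the two helpers might be combined, the file below assumes a hypothetical filetype named mylang that uses begin/end blocks. The file path, the keywords, and the g:loaded_matchup guard are illustrative assumptions, not something this page specifies.

```vim
" ~/.vim/after/ftplugin/mylang.vim  (hypothetical filetype and path)

" Assumption: match-up sets g:loaded_matchup when its plugin file is loaded.
" Skip everything if it is not available.
if !exists('g:loaded_matchup')
  finish
endif

" Add a begin/end pair to whatever b:match_words already holds.
" The helper inserts the separating comma when one is needed.
call matchup#util#append_match_words('\<begin\>:\<end\>')

" Swap one literal fragment of b:match_words for another, here so that
" 'start' is accepted as an alternative opening keyword.
call matchup#util#patch_match_words('\<begin\>', '\<\%(begin\|start\)\>')
```

Because patch_match_words works on literal strings, the first argument has to appear in b:match_words exactly as written.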
In order for match-up to support a new language, you must define a suitable
pattern for b:match_words. If your language has a
complicated syntax, or many keywords, you will need to know something about
vim's regular expressions.
The format for b:match_words is similar to that of the 'matchpairs' option:
it is a comma (,)-separated list of groups; each group is a colon (:)-separated
list of patterns (regular expressions). Colons and commas that are part
of a pattern should be escaped with backslashes ('\:' and '\,'). It is OK to
have only one group; the effect is undefined if a group has only one pattern.
A simple example is
:let b:match_words = '\<if\>:\<endif\>,'
\ . '\<while\>:\<continue\>:\<break\>:\<endwhile\>'
(In vim regular expressions, \< and \> denote word boundaries. Thus "if"
matches the end of "endif" but "\<if\>" does not.) Then banging on the "%"
key will bounce the cursor between "if" and the matching "endif"; and from
"while" to any matching "continue" or "break", then to the matching "endwhile"
and back to the "while". It is almost always easier to use literal strings
(single quotes) as above: '\<if\>' rather than "\\<if\\>" and so on.
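The two points in the paragraph above (word boundaries and quoting) can be checked directly. This is a small sketch meant to be saved to a scratch file and run with :source; the test strings are made up for illustration.

```vim
" A bare 'if' is found inside "endif", so this prints 1.
echo 'endif' =~# 'if'

" With word boundaries, "endif" no longer matches, so this prints 0.
echo 'endif' =~# '\<if\>'

" Single-quoted strings keep backslashes as-is; double-quoted strings
" need every backslash doubled. Both sides are the same pattern, so
" this prints 1.
echo '\<if\>:\<endif\>' ==# "\\<if\\>:\\<endif\\>"
```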
Exception: If the ":" character does not appear in b:match_words, then it is treated as an expression to be evaluated. For example,
:let b:match_words = 'GetMatchWords()'
allows you to define a function. This can return a different string depending on the current syntax, for example. Note: this is deprecated in match-up, try not to use it if possible.
Once you have defined the appropriate value of b:match_words, you will
probably want to have this set automatically each time you edit the
appropriate file type. The recommended way to do this is by adding the
definition to a filetype plugin file.
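A filetype plugin along these lines might look like the following sketch. The filetype name mylang and the file location are assumptions; the pattern is the simple example from earlier on this page.

```vim
" ~/.vim/after/ftplugin/mylang.vim  (hypothetical filetype and path)

" Pair if/endif, and while/endwhile with continue and break in between.
let b:match_words = '\<if\>:\<endif\>,'
      \ . '\<while\>:\<continue\>:\<break\>:\<endwhile\>'

" Remove the setting again if the buffer's filetype changes.
let b:undo_ftplugin = (exists('b:undo_ftplugin') ? b:undo_ftplugin . ' | ' : '')
      \ . 'unlet! b:match_words'
```

Setting b:undo_ftplugin is a general Vim filetype-plugin convention rather than something match-up requires.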
Tips: Be careful that your initial pattern does not match your final pattern.
See the example above for the use of word-boundary expressions. It is usually
better to use ".\{-}" (as few as possible) instead of ".*" (as many as
possible). See :help \{-. For example, in the string "<tag>label</tag>", "<.*>"
matches the whole string whereas "<.\{-}>" and "<[^>]*>" match "<tag>" and "</tag>".
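The difference between the greedy and non-greedy forms can be seen with matchstr(), which returns the first match of a pattern in a string. This is only an illustration using the string from the paragraph above.

```vim
" Greedy: .* runs to the last '>', so the whole string is returned.
echo matchstr('<tag>label</tag>', '<.*>')

" Non-greedy: .\{-} stops at the first '>', so only <tag> is returned.
echo matchstr('<tag>label</tag>', '<.\{-}>')

" A negated character class gives the same result: <tag>
echo matchstr('<tag>label</tag>', '<[^>]*>')
```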
If "if" is to be paired with "end if" (Note the space!) then word boundaries are not enough. Instead, define a regular expression s:notend that will match anything but "end" and use it as follows:
:let s:notend = '\%(\<end\s\+\)\@<!'
:let b:match_words = s:notend . '\<if\>:\<end\s\+if\>'
This is a simplified version of what is done for Ada. The s:notend is a
script variable. Similarly, you may want to define a start-of-line regular
expression
:let s:sol = '\%(^\|;\)\s*'
if keywords are only recognized after the start of a line or after a semicolon (;), with optional white space.
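To see what these two helper patterns accept and reject, they can be tried against a few sample strings. This is a sketch to be run with :source from a script file (s: variables are only available inside a script); the sample strings are invented.

```vim
let s:notend = '\%(\<end\s\+\)\@<!'
let s:sol = '\%(^\|;\)\s*'

" A plain "if" is accepted: prints 1.
echo 'if x then' =~# s:notend . '\<if\>'

" The "if" in "end if" is rejected by the negative look-behind: prints 0.
echo 'end if' =~# s:notend . '\<if\>'

" "if" after a semicolon and optional white space is accepted: prints 1.
echo 'x = 1; if y' =~# s:sol . '\<if\>'

" "if" in the middle of a statement is rejected: prints 0.
echo 'x if y' =~# s:sol . '\<if\>'
```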
In any group, the expressions \1, \2, ..., \9 refer to parts of the
INITIAL pattern enclosed in \(escaped parentheses\). These are referred
to as back references, or backrefs. For example,
:let b:match_words = '\<b\(o\+\)\>:\(h\)\1\>'
means that "bo" pairs with "ho" and "boo" pairs with "hoo" and so on. Note
that "\1" does not refer to the "\(h\)" in this example. If you have
"\(nested \(parentheses\)\)" then "\d" refers to the d-th "\(" and everything
up to and including the matching "\)": in "\(nested\(parentheses\)\)", "\1"
refers to everything and "\2" refers to "\(parentheses\)". If you use a
variable such as s:notend or s:sol in the previous paragraph then remember
to count any "\(" patterns in this variable. You do not have to count groups
defined by \%(\).
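As an illustration of counting groups, the sketch below combines a helper variable with a back reference. The for/while keywords and their endfor/endwhile counterparts are hypothetical; the point is only that the helper uses \%(\), so the keyword group is still number 1.

```vim
" The helper contains only a non-capturing \%(...\) group,
" so it contributes nothing to the group count.
let s:sol = '\%(^\|;\)\s*'

" \(for\|while\) is therefore the first counted group of the initial
" pattern, and \1 in the final pattern refers to it:
" "for" pairs with "endfor" and "while" pairs with "endwhile".
let b:match_words = s:sol . '\<\(for\|while\)\>:\<end\1\>'
```

Had the helper been written with \(^\|;\) instead, the keyword group would be number 2 and the final pattern would need \2.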
It should be possible to resolve back references from any pattern in the group. For example,
:let b:match_words = '\(foo\)\(bar\):more\1:and\2:end\1\2'
would not work because "\2" cannot be determined from "morefoo" and "\1" cannot be determined from "andbar". On the other hand,
:let b:match_words = '\(\(foo\)\(bar\)\):\3\2:end\1'
should work (and have the same effect as "foobar:barfoo:endfoobar"), although this has not been thoroughly tested.
You can use zero-width patterns such as \@<= and \zs.
For example, if the keyword "if"
must occur at the start of the line, with optional white space, you might use
the pattern "\(^\s*\)\@<=if" so that the cursor will end on the "i" instead of
at the start of the line. For another example, if HTML had only one tag then
one could
:let b:match_words = '<:>,<\@<=tag>:<\@<=/tag>'
so that "%" can bounce between matching "<" and ">" pairs or (starting on
"tag" or "/tag") between matching tags. Without the \@<=, the script would
bounce from "tag" to the "<" in "</tag>", and another "%" would not take you
back to where you started.
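The effect of a zero-width prefix on where a match starts can be checked with match(), which returns the byte index of the start of the match. This is a sketch with an invented sample line that has two leading spaces.

```vim
" Without a zero-width prefix the match starts at the line start: prints 0.
echo match('  if x', '^\s*if')

" With a look-behind the match starts on the "i": prints 2.
echo match('  if x', '\(^\s*\)\@<=if')

" \zs moves the start of the match in the same way: prints 2.
echo match('  if x', '^\s*\zsif')
```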
On top of matchit compatibility, match-up provides a few extensions to support additional languages.
In your regular expressions, you can use the \g{} pseudo-atom to give special handling. This is entirely a match-up extension; vim's regex engine does not define \g in regular expressions. The syntax is as follows:
/\g{tag;arg1;arg2}/
Currently, two such tags are possible:
- \g{hlend} terminates highlighting at this place in the regex. This is similar to but distinct from \ze, since \ze would also terminate the match for the purposes of motions and text objects. I.e., hlend only applies to highlighting.
- \g{syn;+offset;group} and \g{syn;-offset;!group}. This is experimental. When matching, disambiguate two matches by the syntax group under the match position. The offset is how many bytes from the match position to grab the syntax. In the first alternative, the group must match the regular expression group. In the second, with !, the group must not match the regular expression group.
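Going only by the description of \g{hlend} above, a pattern using it might look like the following sketch. The function/endfunction keywords are hypothetical, and the exact behavior should be checked against match-up itself.

```vim
" Hypothetical language: "function name ... endfunction".
" The opening pattern covers the keyword and the function name, but
" \g{hlend} is placed right after the keyword, so (per the description
" above) only "function" would be highlighted while the match used by
" motions and text objects still extends over the name.
let b:match_words = '\<function\>\g{hlend}\s\+\k\+:\<endfunction\>'
```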
Some languages have blocks that mids can be in but are not distinguishable by the end marker. As an example, consider a language with function:return:end and if:end and the following snippet:
function foo(x)
  if x
    return -1
  end
end
In matchit, the return will be incorrectly matched with if/end, since matchit simply takes the nearest block. In match-up, however, the special option b:match_midmap fixes this. It is specified as a list of pairs as follows (for example, in ruby):
let b:match_midmap = [
\ ['rubyRepeat', 'next'],
\ ['rubyDefine', 'return'],
\]
The first element of each pair is the syntax group which must be present on the block for the return to be considered a match for it. Suppose if were to match with return without the midmap, but if does not have the group rubyDefine. Then that candidate would be rejected, and match-up would instead match the next outer block (repeating this process as many times as necessary).
As it is syntax group based, this mechanism only works and is only required in classic matching.
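Applied to the function/if snippet above, a configuration might look like the following sketch. The syntax group name mylangFunction is an assumption: it stands for whatever group the language's syntax file assigns to the function keyword.

```vim
" Hypothetical language with function:return:end and if:end blocks.
" Whether "return" needs to appear in both groups depends on the language;
" here it is listed as a mid of function only.
let b:match_words = '\<function\>:\<return\>:\<end\>,\<if\>:\<end\>'

" Only let "return" pair with a block whose opening word carries the
" (assumed) syntax group mylangFunction.
let b:match_midmap = [
      \ ['mylangFunction', 'return'],
      \]
```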
Adapted from matchit.txt.