Upgrade Textual's tree-sitter version to 0.22.x #4845

Closed
prurph opened this issue Aug 5, 2024 · 11 comments

Comments

@prurph

prurph commented Aug 5, 2024

I discussed this briefly with @darrenburns on Discord: Textual currently uses tree-sitter 0.20.4, and it would be nice to upgrade it, notably for the matches method, which returns a dict of matches and makes navigation easier when a query has several matching instances with sub-matches.
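To illustrate why grouped matches are easier to navigate, here is a minimal sketch that uses stand-in data only. It assumes the newer bindings return a list of (pattern_index, captures) pairs, where captures maps capture names to nodes; plain strings play the role of nodes, and the capture names and the helper `first_param_by_function` are hypothetical.

```python
from typing import Dict, List, Tuple

# Stand-in data: strings take the place of tree-sitter Node objects.
# A flat capture list gives no direct hint about which param belongs
# to which function; the caller has to re-pair entries by position.
flat_captures: List[Tuple[str, str]] = [
    ("greet", "function.name"),
    ("name", "function.param"),
    ("add", "function.name"),
    ("a", "function.param"),
]

# Assumed shape of a grouped result: one dict per matching instance,
# so sub-matches stay attached to the match they came from.
grouped_matches: List[Tuple[int, Dict[str, str]]] = [
    (0, {"function.name": "greet", "function.param": "name"}),
    (0, {"function.name": "add", "function.param": "a"}),
]


def first_param_by_function(
    matches: List[Tuple[int, Dict[str, str]]],
) -> Dict[str, str]:
    """Map each function name to its captured parameter (hypothetical helper)."""
    return {
        captures["function.name"]: captures["function.param"]
        for _pattern_index, captures in matches
    }


result = first_param_by_function(grouped_matches)
# result == {"greet": "name", "add": "a"}
```

With real nodes the dict values would be node objects rather than strings, and the exact value type may differ between tree-sitter releases, so check the installed version before relying on this shape.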

There are two main concerns with upgrading to the latest version, 0.22.x:

  1. It requires Python >=3.9, but Textual is >=3.8. tree-sitter 0.21.x does support 3.8, but since the source of the other issue is a breaking API change in 0.22.x, it's probably better to just jump to the latest version if at all possible. Since tree-sitter is an optional dependency of Textual, I think pyproject.toml lets you specify that an optional dependency has a subset of supported Python versions so maybe that's an option too.

  2. Textual uses tree-sitter-languages to add grammars for use. It is unmaintained, and is incompatible with tree-sitter 0.22; you can no longer instantiate a language by path to the compiled grammar, and that's how tree-sitter-languages get_language works.

    Further, tree-sitter recommends that grammar authors release their grammars directly and individually to PyPI, npm, and cargo, rather than relying on other projects that bundle the binaries for many languages. Tree-sitter offers GitHub workflows to facilitate this, but AFAICT these are newer introductions, and a few grammars do not use them or do not release Python versions.
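The first concern can be sketched as a runtime guard. This is a sketch only: the constant and function names are hypothetical, and the requirement string in the comment is an assumed example of a standard environment marker, not Textual's actual pyproject.toml entry.

```python
import sys
from typing import Sequence

# An optional dependency can carry an environment marker so that it is
# only installed on interpreters the dependency supports, for example:
#   tree-sitter>=0.22; python_version >= "3.9"
# (assumed requirement string, shown for illustration only)
TREE_SITTER_MIN_PYTHON = (3, 9)


def syntax_extra_supported(version_info: Sequence[int] = sys.version_info) -> bool:
    """Return True if this interpreter is new enough for tree-sitter 0.22.x."""
    return tuple(version_info[:2]) >= TREE_SITTER_MIN_PYTHON
```

On Python 3.8 the guard returns False, so an application could skip syntax highlighting there while keeping the rest of TextArea usable.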

There are two paths to upgrading that I see:

  1. Use the unofficial replacement for py-tree-sitter-languages, tree-sitter-language-pack. This attempts to provide grammars in bulk, but loads them in a way compatible with the newer tree-sitter API.

    • This could be a benefit as it can provide grammars that aren't released on PyPI, for example Kotlin and SQL (see below)
    • The downside is reliance on a single, new package, and one that aims to duplicate/circumvent the "official" way to release a grammar
  2. Use languages with grammars that are installable by pip. Here's a list of Textual's built-in languages and their status in that regard:

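| Language | Can pip install? | Other Notes |
|---|---|---|
| Bash | | |
| CSS | | |
| Go | | |
| HTML | | |
| Java | | |
| Javascript | | |
| JSON | | |
| Kotlin | | Recent open issue to add |
| Markdown | | Installable via git url; release may be inflight. See this comment |
| Python | | |
| Regex | | |
| Rust | | |
| SQL | | I opened an issue and maintainers responded very quickly! |
| TOML | | |
| YAML | | |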
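For the second path, a quick way to see which grammars are installed in a given environment is to probe for their modules. This is a minimal sketch: it assumes each PyPI package named tree-sitter-<lang> imports as tree_sitter_<lang>, which individual grammars may not follow, and the function names are hypothetical.

```python
import importlib.util
from typing import Dict, Iterable

# Textual's built-in languages, as listed in the table above.
BUILTIN_LANGUAGES = [
    "bash", "css", "go", "html", "java", "javascript", "json", "kotlin",
    "markdown", "python", "regex", "rust", "sql", "toml", "yaml",
]


def grammar_module_name(language: str) -> str:
    """Assumed naming convention: tree-sitter-<lang> imports as tree_sitter_<lang>."""
    return f"tree_sitter_{language}"


def installed_grammars(languages: Iterable[str] = BUILTIN_LANGUAGES) -> Dict[str, bool]:
    """Report, per language, whether its grammar module can be found."""
    return {
        lang: importlib.util.find_spec(grammar_module_name(lang)) is not None
        for lang in languages
    }
```

Calling `installed_grammars()` returns a dict such as `{"bash": False, "python": True, ...}` depending on what has been pip installed; nothing is imported, so the check is cheap.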

I'm hoping to find some time to see what issues crop up trying to upgrade tree-sitter but thought I would share my findings so far and more importantly ask what the Python version support plan/philosophy is for Textual. Thanks!


github-actions bot commented Aug 5, 2024

Thank you for your issue. Give us a little time to review it.

PS. You might want to check the FAQ if you haven't done so already.

This is an automated reply, generated by FAQtory

@merriam
Contributor

merriam commented Aug 7, 2024

I have noticed these issues:

  • Backwards compatibility to Python 3.8.1 does cause problems. For example, running the documentation server also requires a later version of Python.
  • Tree Sitter lacks an amazing amount of documentation. Specifically, there are three main parts of tree-sitter: an executable 'tree-sitter' command and its core libraries, Python bindings to call the libraries, and language (.scm) grammar files. Each is maintained separately, with grammars being split over many sources. The python bindings, somewhat half-heartedly maintained, are extremely non-Pythonic. So far, I ended up writing a few utilities (like walk) to make them useful.
  • Deployments on Tree-sitter often avoid PyPi, and its scrutiny, and often provide only 'trust us' precompiled binaries.
  • Tree Sitter is used only for the TextEdit widget. Pygments, a mature regular expression syntax coloring system, is used for the CodeBrowser demonstration.
  • I have been unable to get the current tree sitter installations to allow a .tccs language definition.

My usage of TreeSitter in Textual has been limited to self-inspection, e.g., I pull out the DEFAULT_CSS values, comments, and some other items. I'm still working on getting it into the documentation build system. Even for this, I cheat and use the Documents code to actually parse the tree as it worked first and has not been worth making it work.

So far the use cases for TreeSitter are niche: tools and TextArea. It might be better to remove TreeSitter entirely and have an example or documentation note on how to call it when needed. This is only my opinion.

@prurph
Author

prurph commented Aug 7, 2024

Hmmm, I think the syntax highlighting in text areas is nice, and, moreover, the ability to parse the content into an arbitrary AST is extremely useful for interacting with the text the user types.

I do agree the availability of the grammars varies by language (since they are independently maintained), but have found the major ones to be well-maintained, and I haven't had issues with the Python bindings when using them with my own parser inside of Textual.

I definitely think keeping tree-sitter as an optional dependency of Textual is the way to go, but removing it entirely would shut out a lot of the TextArea functionality, both natively with out of the box syntax highlighting, and for Textual projects using TextArea. Perhaps it should be "bring your own parser" with instructions on how to pip install the available ones?

My understanding of the recent API changes is that they are motivated at least in part by moving towards grammars installed as dependencies and away from the past of loading the binary .so file directly, hence why languages can no longer be loaded from a file and instead are expected to be imported as modules/crates/etc.
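A "bring your own parser" loader along these lines could look roughly like the sketch below. It assumes the grammar is installed as a module named tree_sitter_<lang> that exposes a `language()` function; `load_grammar` is a hypothetical helper, not part of Textual or tree-sitter.

```python
import importlib
from typing import Any, Optional


def load_grammar(language: str) -> Optional[Any]:
    """Return the raw grammar handle from an installed grammar module, or None.

    Assumes the module is named tree_sitter_<language> and exposes a
    language() function; grammars that deviate from this are not handled.
    """
    try:
        module = importlib.import_module(f"tree_sitter_{language}")
    except ImportError:
        # Grammar not installed: the caller can fall back to plain text.
        return None
    factory = getattr(module, "language", None)
    return factory() if callable(factory) else None
```

The returned handle would then be wrapped with `tree_sitter.Language`; the constructor arguments have changed between releases, so check the installed version rather than taking a signature from this sketch.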

@merriam
Contributor

merriam commented Aug 8, 2024

Do you believe TextArea could be equally useful without the built-in TreeSitter, but with detailed instructions?

Can you see a point where Tree-sitter is not a niche capability of Textual? Some killer app?

@darrenburns
Member

Thanks for investigating @prurph, this is a super helpful write up. I think it answers all of my questions before I asked them! I think the 3.9+ requirement is going to be a deal breaker though, at least for now. We generally support Python versions a little beyond end-of-life.

I think the "language pack" may be the way to go in the future too, as a replacement for the current py-tree-sitter-languages module.


@merriam

> Do you believe TextArea could be equally useful without the built-in TreeSitter, but with detailed instructions?

Tree-sitter is an optional extra that's only required if you want syntax highlighting in the TextArea. You don't need to install it to use TextArea without highlighting.

> Deployments on Tree-sitter often avoid PyPi, and its scrutiny

Maybe I'm misunderstanding this but Textual is pinned to use 0.20.* which comes from PyPI. All of Textual's dependencies, including tree-sitter are available on PyPI.

> Tree Sitter is used only for the TextEdit widget. Pygments, a mature regular expression syntax coloring system, is used for the CodeBrowser demonstration.

We don't use pygments in the TextArea widget because it's too slow.

> Can you see a point where Tree-sitter is not a niche capability of Textual? Some killer app?

Is syntax highlighting "niche"? I can say with a pretty high degree of certainty if we removed it, we'd have a lot of disappointed users. I use it in a couple of my own apps and know that if it wasn't already integrated with Textual it'd be a real pain.

> I have been unable to get the current tree sitter installations to allow a .tccs language definition.

Textual is using an older tree-sitter version. You may have been reading docs for a newer version of tree-sitter.

@prurph
Author

prurph commented Aug 8, 2024

> Thanks for investigating @prurph, this is a super helpful write up. I think it answers all of my questions before I asked them! I think the 3.9+ requirement is going to be a deal breaker though, at least for now. We generally support Python versions a little beyond end-of-life.

Sure thing @darrenburns. That sounds reasonable--3.8 is EOL in October so it makes sense to revisit then. This will also give some time to see if the "replacement" for tree-sitter-languages is still going then, and/or if it makes sense to instead let users bring their own parsers (typically just pip installing them).

@prurph
Author

prurph commented Aug 9, 2024

Looks like SQL is now available on PyPI! DerekStride/tree-sitter-sql#269

Huge thanks to them for accommodating my request very quickly! 🎉

@darrenburns
Member

tree-sitter-markdown is available on PyPI now too it appears.

@darrenburns
Member

tree-sitter-language-pack appears to be ~700MB, so that's probably not an option 😆

@TomJGooding
Contributor

tree-sitter was upgraded in Textual v0.89.0, so this should probably be closed.

