Question: Parsing Vue-Template Syntax (HTML flavor) #2524
I am trying to build some tooling for the Vue JavaScript framework (to extract i18n strings). Because Vue single-file components use a "kind of" HTML, I am trying to use Nokogiri to parse them, extract the strings, and keep the structure and formatting mostly intact. Unfortunately, all parsers have some problems, e.g. when parsing this snippet:
<template>
<div :class="foo" v-bind:foo="Foobar Foo" :fooBar="fooBar" v-if="bla" @click.prevent="foo" stacked>
<BModal></BModal>
</div>
</template>
<script setup>
</script>
So the HTML5 parser works best for now (because 1. is critical): the case sensitivity can be worked around with some replacing/unreplacing steps, but the handling of boolean attributes is not so pretty (and there may be more cases that I will find later on). Nokogiri::HTML5 has no obvious configuration options, so my question is whether there is any way to customize the parsing, or to combine the parsers somehow (I guess not?).
Replies: 1 comment 1 reply
Hi, thanks for asking this question! Nokogiri is a wrapper around other parsing engines (libgumbo for HTML5, libxml2 for HTML4 and XML), and so this parsing behavior can't easily be changed by Nokogiri. I'm curious if other folks have tips or tricks they use to parse not-quite-well-formed markup?
According to the HTML living standard, attributes without values implicitly have a value of the empty string. So one idea is that you write your own serialization code that does not include the empty string values for attributes.
The serialization algorithm is more or less straightforward. You could start with this code and modify the code around line 326.
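To make the idea concrete, here is a minimal sketch of such a serializer in Ruby. It is not Nokogiri's own serialization code and it simplifies the algorithm from the standard: it only handles text, comments, elements and container nodes, and the module name `BareAttributeSerializer` is made up for this example. It only calls methods that `Nokogiri::XML::Node` provides (`text?`, `cdata?`, `comment?`, `element?`, `name`, `content`, `children`, `attribute_nodes`), so it should work on a parsed tree, but treat it as a starting point rather than a finished implementation.

```ruby
# Simplified sketch: serialize a node tree as HTML, printing attributes whose
# value is the empty string in bare form (`stacked` instead of `stacked=""`).
# Hypothetical module name; doctype and namespace handling are left out.
module BareAttributeSerializer
  # Elements that never get an end tag.
  VOID_ELEMENTS = %w[area base br col embed hr img input link meta source track wbr].freeze
  # Elements whose text content is written out without escaping.
  RAW_TEXT_ELEMENTS = %w[script style].freeze

  module_function

  def serialize(node, raw_text: false)
    if node.text? || node.cdata?
      raw_text ? node.content : escape_text(node.content)
    elsif node.comment?
      "<!--#{node.content}-->"
    elsif node.element?
      serialize_element(node)
    else
      # Documents and fragments: serialize only the children.
      node.children.map { |child| serialize(child) }.join
    end
  end

  def serialize_element(element)
    attrs = element.attribute_nodes.map do |attr|
      # This is the one deliberate change: skip `=""` for empty values.
      attr.value.empty? ? " #{attr.name}" : %( #{attr.name}="#{escape_attribute(attr.value)}")
    end.join
    open_tag = "<#{element.name}#{attrs}>"
    return open_tag if VOID_ELEMENTS.include?(element.name)

    raw = RAW_TEXT_ELEMENTS.include?(element.name)
    inner = element.children.map { |child| serialize(child, raw_text: raw) }.join
    "#{open_tag}#{inner}</#{element.name}>"
  end

  def escape_text(text)
    text.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
  end

  def escape_attribute(value)
    value.gsub("&", "&amp;").gsub('"', "&quot;")
  end
end
```

With Nokogiri loaded, the intended call would be something like `BareAttributeSerializer.serialize(Nokogiri::HTML5.fragment(source))`. Note that this treats an attribute written as `foo=""` in the source the same as a bare `foo`, since the parsed tree does not distinguish the two.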
Dealing with the case of attributes and tags is harder. The standard says that during parsing all attribute and tag names become lowercase. This is true even for attributes in foreign elements (MathML and SVG). (There are separate parsing steps for converting the foreign attributes and tags to their appropriate case, but the specific attr…