XML Entities, individual and grouped #183
Added support for namespace-correct, bundled XML entity files. So a line like this on <entity name="link.composer"><link xlink:href="&url.pecl;">Composer</link></entity> becomes this on <!ENTITY link.composer '<link xmlns="http://docbook.org/ns/docbook" xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="&url.pecl;">Composer</link>'> and a file named <methodsynopsis>
<type>int</type><methodname><replaceable>callback</replaceable></methodname>
<methodparam><type>mixed</type><parameter>a</parameter></methodparam>
<methodparam><type>mixed</type><parameter>b</parameter></methodparam>
</methodsynopsis> becomes <!ENTITY callback.cmp '<methodsynopsis xmlns="http://docbook.org/ns/docbook"><type>int</type><methodname><replaceable>callback</replaceable></methodname>
<methodparam><type>mixed</type><parameter>a</parameter></methodparam>
<methodparam><type>mixed</type><parameter>b</parameter></methodparam>
</methodsynopsis>'>
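The conversion described above can be sketched roughly as follows. This is a minimal illustration in Python, not the PR's actual PHP code: the function name `to_dtd_entity`, the helper `_xmlns`, and the rule "always declare the DocBook namespace, declare `xlink` only when the prefix is used" are assumptions inferred from the two examples.

```python
import re

# Namespaces taken from the examples above; the "" key is the default namespace.
NAMESPACES = {
    "": "http://docbook.org/ns/docbook",
    "xlink": "http://www.w3.org/1999/xlink",
}


def _xmlns(prefix, uri):
    """Build one namespace declaration attribute (hypothetical helper)."""
    attr = "xmlns:" + prefix if prefix else "xmlns"
    return f' {attr}="{uri}"'


def to_dtd_entity(name, fragment):
    """Wrap an XML fragment as a DTD entity with namespaces on its root element."""
    fragment = fragment.strip()
    # Assumption: default namespace always, prefixed ones only if the prefix appears.
    decls = "".join(
        _xmlns(prefix, uri)
        for prefix, uri in NAMESPACES.items()
        if not prefix or f"{prefix}:" in fragment
    )
    # Insert the declarations right after the root element name.
    body = re.sub(r"^<([^\s/>]+)", lambda m: m.group(0) + decls, fragment, count=1)
    # The entity value is single-quoted, so literal apostrophes must be escaped.
    body = body.replace("'", "&apos;")
    return f"<!ENTITY {name} '{body}'>"


link = to_dtd_entity(
    "link.composer", '<link xlink:href="&url.pecl;">Composer</link>'
)
print(link)
```

Entity references inside the fragment, such as `&url.pecl;`, are left untouched, so they are resolved only when the generated DTD entity is used.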
I had a very cursory glance at this, but you might want to rebase and squash some commits together?
Yes to squash and rebase. I just found out that manual building is not idempotent, and because of that, my regression testing so far was inadequate. Back to draft, until I can address all points above. I will ping
Some observations that I gathered while working on making the manual build idempotent. At a size of ~41 MB and with its actual entity usage, the PHP manual is dangerously close to hitting some hard-coded libxml2 limits. Twice I changed the code to avoid the misleading error Specifically, it was necessary to keep entities listings, generated by
Would changing the entities generated by Yes, this would require patching all of the docs, but I am wondering if this is a way forward, as I was thinking of this already.
Yes. Yet, a better solution would be having an option to change whatever metric libxml2 uses to identify high entity usage. The entity usage is huge in manuals, and this will bite someday.
In the end, I think directly changing entity files for But, priorities. For now, I focused on XInclude/fallback, then XInclude by xml:id, then this infrastructure, then
Yes, obviously this is lower priority.
I'm thinking of opening an issue to keep track of these projects: a road map of the projects mentioned above, and a place to document some bottlenecks found along the way, like the entities limit.
Feel free to do that :) It can be a meta issue, like the doc tracking one, which you can update over time and split into individual issues if needed.
I will change my answer, after this comment.
No. XInclude only "runs" by calling
Let me be clear about this. The PHP manuals are at the breaking point as far as libxml2 is concerned. There are files it loads, and there are files it rejects. Full stop. DITA DTDs have been unusable on libxml2 for several months, and there are other reports of files being rejected starting at ~40 MB in size. Looking ahead, the PHP community may need to ask for, contribute, or fund an "unlimited" option in libxml2, on a libxml2 version that it could use, compile and distribute (or building the manual outside the servers becomes impossible). The linked fix for DITA only fixes half the problem (the size amplification one), but the PHP manual is already triggering another limit, the entity recursion level. This is the
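To make the two limits mentioned here more concrete, the sketch below estimates, for a set of entity definitions, how large one entity becomes once fully expanded and how deeply its references nest. It is only an illustration in Python: it does not reproduce libxml2's actual accounting, and the function name `expansion_stats` and the sample entities are made up.

```python
import re

ENTITY_REF = re.compile(r"&([A-Za-z_][\w.\-]*);")


def expansion_stats(entities, name, _stack=()):
    """Return (expanded_length, nesting_depth) for one entity definition."""
    if name in _stack:
        raise ValueError("recursive entity: " + name)
    text = entities[name]
    # Literal characters of this entity, with the references taken out.
    size = len(ENTITY_REF.sub("", text))
    depth = 1
    for ref in ENTITY_REF.findall(text):
        if ref in entities:
            sub_size, sub_depth = expansion_stats(entities, ref, _stack + (name,))
            size += sub_size
            depth = max(depth, 1 + sub_depth)
        else:
            # Unknown or predefined reference: counted as written, e.g. "&amp;".
            size += len(ref) + 2
    return size, depth


# Made-up entities: "a" uses "b" twice, and "b" uses "c".
sample = {"a": "xx&b;&b;", "b": "yyy&c;", "c": "z"}
size_a, depth_a = expansion_stats(sample, "a")
print(size_a, depth_a)
```

Running something like this over the generated entity files could show which entities dominate the expansion, though whether those are the ones that trip the parser is an open question.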
While doing rebase (and tests) efforts, I found an entity collision. Entity
Please remove the one in
I will do it Monday morning, after merging the small PRs.
Some other notes. I discovered only yesterday that there is a whole W3C recommendation for XML Fragments, and I'm surprised to see that the solution adopted here is the same as in said recommendation. About replacing file entities with Processing Instructions and/or XPointer, this might be possible. The problem is the bad interactions between entities and XInclude, so replacing file entities with PI/XI would need to exist as one of the two possible stacks below: Without XInclude 1.1 native support
With XInclude 1.1 native support
I think it is possible to create a userland XInclude by href, but I have not created a prototype yet to test whether the bad interaction can be overcome. The risk of succeeding here is that we may paint ourselves into an ugly corner of XML tooling in the end. So the answer might be: if possible, change to an XML processor that does XInclude 1.1.
Pushed a change to detect duplicated entity names on the first language loaded (so translations can detect internal duplications), and finally tested inter-repository debug mode. Found two more duplicated entities between doc-base and doc-en, so there are three in total:
Remove the three duplicated entities from Meanwhile, this is waiting for the idempotent build to get merged (so regression tests get a little less random), but it's in good enough shape to be merged, if there is demand for experimentation while 8.4 changes are still high.
Yes, I think this is the best approach, those shouldn't be translated.
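For reference, the duplicate detection mentioned above amounts to something like the following. This is a rough Python sketch, not the code pushed to this PR; the function name, the regular expression, and the file labels in the example are all hypothetical.

```python
import re
from collections import defaultdict

# Matches general entity declarations and skips parameter entities ("<!ENTITY % ...").
ENTITY_DECL = re.compile(r"<!ENTITY\s+(?!%)([^\s]+)\s")


def find_duplicate_entities(sources):
    """Map each entity name declared more than once to the sources declaring it.

    `sources` maps a label (for example a file path) to the DTD text it contains.
    """
    seen = defaultdict(list)
    for label, text in sources.items():
        for name in ENTITY_DECL.findall(text):
            seen[name].append(label)
    return {name: labels for name, labels in seen.items() if len(labels) > 1}


# Hypothetical inputs standing in for entity files from two repositories.
example_sources = {
    "doc-base/example.ent": '<!ENTITY x "1">\n<!ENTITY y "2">',
    "doc-en/example.ent": '<!ENTITY y "3">\n<!ENTITY % p "q">',
}
duplicates = find_duplicate_entities(example_sources)
print(duplicates)
```

Because the result keeps the label of every declaration, the same check covers both duplicates inside one repository and collisions between doc-base and a language repository.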
And
After some tests today, my answer is that replacing file entities with anything else will only be possible by changing to an XML processor that does XInclude 1.1 and propagates entities between files, something that is not mandated by standards. The test. <!DOCTYPE a [<!ENTITY c "CC">]>
<a>
<b>&c;</b>
<b><xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="b.xml"/></b>
</a> And <b>&c;</b> That is,
Complete parsed infosets. As
In the end, a simple
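The same experiment can be reproduced outside libxml2. The sketch below uses Python's standard `xml.etree` (expat based), which is an assumption of this example and not the toolchain used by doc-base; the file names `a.xml` and `b.xml` are taken from the `href` in the test, and the in-memory `FILES` mapping and `load` function are stand-ins for real files.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# The two documents from the test above, kept in memory instead of on disk.
FILES = {
    "a.xml": (
        '<!DOCTYPE a [<!ENTITY c "CC">]>\n'
        "<a>\n"
        "<b>&c;</b>\n"
        '<b><xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="b.xml"/></b>\n'
        "</a>"
    ),
    "b.xml": "<b>&c;</b>",
}


def load(href, parse, encoding=None):
    # Each included file is parsed on its own, without the DOCTYPE of a.xml.
    return ET.fromstring(FILES[href])


root = ET.fromstring(FILES["a.xml"])
# The entity declared in a.xml is expanded inside a.xml itself.
local_text = root.find("b").text

try:
    # XInclude is a separate, explicit step after parsing.
    ElementInclude.include(root, loader=load)
    include_error = None
except ET.ParseError as exc:
    # b.xml refers to an entity that only a.xml declares.
    include_error = str(exc)

print(local_text, include_error)
```

In this parser too, the included file is a complete document parsed in isolation, so an entity declared only in the including file is undefined there.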
I found hacky ways to do controlled file loading/inclusion in userland code, by entity and XInclude, so in theory it is possible to get rid of
Conflict resolved. But the main question remains. Do the language manuals want to split
This PR creates a new doc-base/scripts/entities.ent file, that is called from configure.php but can also be called from the command line.

The new script starts looking for global.ent, manual.ent and remove.ent in each doc-lang repository. Besides the .ent extension, these are normal XML files that use the same namespaces as the manual, so small entities placed here can be namespace clean(er).

The new script also starts looking for an entities/ dir in each doc-lang repository, and loads any .xml file found there as an individual entity file, so bigger entities get easier to edit and can now be revchecked individually.

Included are two other scripts, dtdent-conv.php and dtdent-split.php, that bulk convert (or split) big files of DTD Entities into XML Entities. These tools are not necessary for implementation.

This will make entity experimentation a lot easier, and is the enabling step towards splitting the language-entities.ent file. This works well, but is another possibly big change, so I do not plan to push for this until 2025, or until the PHP 8.4 doc changes slow down, or if there is some demand for early experimentation.
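The file discovery described above can be pictured with a short sketch. It is written in Python for illustration only, while the real script is PHP; the function name `collect_entity_sources`, the non-recursive scan of entities/, and the sample file `callback.cmp.xml` with its placeholder content are assumptions.

```python
import tempfile
from pathlib import Path

# Bundled entity files looked up in each doc-lang repository.
BUNDLED = ("global.ent", "manual.ent", "remove.ent")


def collect_entity_sources(lang_dir):
    """Return (bundled_files, individual_files) found in one doc-lang checkout."""
    lang_dir = Path(lang_dir)
    bundled = [lang_dir / name for name in BUNDLED if (lang_dir / name).is_file()]
    entities_dir = lang_dir / "entities"
    # Assumption: only files directly inside entities/ are considered.
    individual = sorted(entities_dir.glob("*.xml")) if entities_dir.is_dir() else []
    return bundled, individual


# Build a throwaway layout with hypothetical files to show the result.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "en"
    (repo / "entities").mkdir(parents=True)
    (repo / "global.ent").write_text("<placeholder/>")
    (repo / "entities" / "callback.cmp.xml").write_text("<methodsynopsis/>")
    bundled, individual = collect_entity_sources(repo)
    bundled_names = [path.name for path in bundled]
    individual_names = [path.name for path in individual]

print(bundled_names, individual_names)
```

Missing files are simply skipped, which matches the idea that a language repository can adopt the bundled files and the entities/ dir independently.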