Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XInclude by xml:id #198

Merged
merged 4 commits into from
Dec 13, 2024
Merged

XInclude by xml:id #198

merged 4 commits into from
Dec 13, 2024

Conversation

alfsb
Copy link
Member

@alfsb alfsb commented Dec 4, 2024

This PR enhances configure.php to add XInclude 1.1 partial implementation. That is, XInclude by xml:id. Also, it moves the various sections of XInclude functionality into functions, to avoid polluting the global namespace. Points of interest:

  • xinclude_run_byid(): where the new magic occurs;
  • the new code, starting after if ( $explain ).

With this PR, would be possible to change manual like this:

diff --git a/reference/filesystem/functions/fgetcsv.xml b/reference/filesystem/functions/fgetcsv.xml
index 7e1ba4046d..9e313d2630 100644
--- a/reference/filesystem/functions/fgetcsv.xml
+++ b/reference/filesystem/functions/fgetcsv.xml
@@ -62,7 +62,7 @@
       </para>
      </listitem>
     </varlistentry>
-    <varlistentry>
+    <varlistentry xml:id="function.fgetcsv..parameters.separator">
      <term><parameter>separator</parameter></term>
      <listitem>
       <para>
@@ -71,7 +71,7 @@
       </para>
      </listitem>
     </varlistentry>
-    <varlistentry>
+    <varlistentry xml:id="function.fgetcsv..parameters.enclosure">
      <term><parameter>enclosure</parameter></term>
      <listitem>
       <para>
@@ -80,7 +80,7 @@
       </para>
      </listitem>
     </varlistentry>
-    <varlistentry>
+    <varlistentry xml:id="function.fgetcsv..parameters.escape">
      <term><parameter>escape</parameter></term>
      <listitem>
       <para>
diff --git a/reference/filesystem/functions/fputcsv.xml b/reference/filesystem/functions/fputcsv.xml
index f3a94fb1b1..ccaf4a3622 100644
--- a/reference/filesystem/functions/fputcsv.xml
+++ b/reference/filesystem/functions/fputcsv.xml
@@ -43,15 +43,9 @@
       </para>
      </listitem>
     </varlistentry>
-    <xi:include xpointer="xmlns(db=http://docbook.org/ns/docbook) xpointer(id('function.fgetcsv')/db:refsect1[@role='parameters']//db:varlistentry[db:term[db:parameter[text()='separator']]]/.)">
-     <xi:fallback/>
-    </xi:include>
-    <xi:include xpointer="xmlns(db=http://docbook.org/ns/docbook) xpointer(id('function.fgetcsv')/db:refsect1[@role='parameters']//db:varlistentry[db:term[db:parameter[text()='enclosure']]]/.)">
-     <xi:fallback/>
-    </xi:include>
-    <xi:include xpointer="xmlns(db=http://docbook.org/ns/docbook) xpointer(id('function.fgetcsv')/db:refsect1[@role='parameters']//db:varlistentry[db:term[db:parameter[text()='escape']]]/.)">
-     <xi:fallback/>
-    </xi:include>
+    <xi:include xpointer="function.fgetcsv..parameters.separator"/>
+    <xi:include xpointer="function.fgetcsv..parameters.enclosure"/>
+    <xi:include xpointer="function.fgetcsv..parameters.escape"/>
     <varlistentry>
      <term><parameter>eol</parameter></term>
      <listitem>

The format for XInclude ID targets, id.on.file dot dot local.path is not special. It is only a convention, but I think it is a good one, because:

  • It helps differentiate between "structural" IDs and these XInclude target IDs;
  • It helps indicate what parts of manual are expected to be copied elsewhere;
  • It helps debug some unexpected, but possibly duplicated xml:id, in some cases.

That last point warrants a detailed explanation. Running this PR with the patch above also illustrates the point. Until now, XIncludes by XPath/XPointer was not duplicating IDs because it would generate a validation error. So XInclude by path, targeting elements with IDs are impossible to pull off. XInclude by xml:id never duplicates any IDs. But mixed includes by path and ID probably will duplicate IDs, at least temporary, in doc-en and translations, because there will be more xml:ids around, positioned exactly where parts of manual are targeted for duplication by paths.

The patch above generates this warnings:

  Random removing duplicated xml:id: function.fgetcsv..parameters.separator
  Random removing duplicated xml:id: function.fgetcsv..parameters.enclosure
  Random removing duplicated xml:id: function.fgetcsv..parameters.escape
  Random removing duplicated xml:id: function.fgetcsv..parameters.separator
  Random removing duplicated xml:id: function.fgetcsv..parameters.enclosure
  Random removing duplicated xml:id: function.fgetcsv..parameters.escape

That is, there are six XIndluces by path that pull parts of the manual with xml:ids. This could occur in translations, each time an ID targeted XInclude is added. I could not find a reliable way to delete IDs, only from duplicated nodes, so this is what I came up with.

This PR has the potential to really shrink language-snippets.ent, by placing a lot of related entities in their related extensions, and XIncluding them elsewhere.

And... that's it. Reviews and comments welcome. HT to a humble comment, that started this all.

Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the naming convention for the IDs makes sense, but this should be documented somewhere, @jimwins probably could say where.

If the policy going forward is to remove fallbacks, then gen_stub.php in the php/php-src repo needs to be updated. But it might also make sense to trial this whole system on the big doc PR for the new DOM classes by @nielsdos. The rationale being this is brand-new content that does need to concern itself with translations and we can work out all the issues.

@alfsb
Copy link
Member Author

alfsb commented Dec 4, 2024

About the new version and docs, there is a way to run one XPointer query, akin to run one XPath query via DOMXPath class?

@jimwins
Copy link
Member

jimwins commented Dec 4, 2024

I think the naming convention for the IDs makes sense, but this should be documented somewhere, @jimwins probably could say where.

Maybe we could add a section to https://doc.php.net/guide/structure.md about conventions around xml:id and using XInclude.

@nielsdos
Copy link
Member

nielsdos commented Dec 4, 2024

there is a way to run one XPointer query, akin to run one XPath query via DOMXPath class?

Sadly not

@alfsb
Copy link
Member Author

alfsb commented Dec 5, 2024

Sadly not

I worried that this would be the answer. So there is no simple way to recreate (and control) XInclude by XPoint in user land code. There possibly is hack ways to do this, but I can imagine that will be brittle and slow, doubling memory usage on configure.php (from ~2 Gb to ~4 Gb),, or requiring several loads/saves of manual.xml, if not running into exponential slowness.

Another question: I barely found about xmlCtxtSetMaxAmplification. From memory, do you know about a similar option to control entity recursion level, @nielsdos ?

@Girgias Girgias mentioned this pull request Dec 5, 2024
27 tasks
@alfsb
Copy link
Member Author

alfsb commented Dec 5, 2024

Pushed a bit of documentation. It's more an skeleton, really. This parte will need to be heavy reviewed into idiomatic English.

@nielsdos
Copy link
Member

nielsdos commented Dec 5, 2024

Another question: I barely found about xmlCtxtSetMaxAmplification. From memory, do you know about a similar option to control entity recursion level, @nielsdos ?

No I don't know of another. Also this option can't be set from within PHP unfortunately :/ And since this is a new feature we can only add it in 8.5.
Using FFI would work probably to set the value higher, but then manual editors need that extension loaded and it's not even a default-enabled extension...

@alfsb
Copy link
Member Author

alfsb commented Dec 5, 2024

No I don't know of another.

Thanks for the answers. And by so, I will not remove the workaround on file-entities script, for the limit that all languages already exceed.

Also this option can't be set from within PHP unfortunately :/ And since this is a new feature we can only add it in 8.5. Using FFI would work probably to set the value higher, but then manual editors need that extension loaded and it's not even a default-enabled extension...

C'est la vie, I think. Let's hope for the best, and that none of languages start being locked out (as DITA DTD guys gone for years, if I'm reading bug dates correctly).

@alfsb
Copy link
Member Author

alfsb commented Dec 6, 2024

Rebased and retested. Produces identical results on doc-en compared to ´master` (from pristine checkouts), and produces expected results when the example diff above is applied (only differences are automatic namespace cleanups). So it's a op-in feature of the sorts, that can be tried up as suggested.

The only thing left is to test it in all languages, which I will do during the day.

@alfsb alfsb reopened this Dec 6, 2024
@alfsb
Copy link
Member Author

alfsb commented Dec 6, 2024

Tested on all languages, without regressions. Pushed changes are mostly related to error reporting.

CI builds fails because this depends on PR 200.

@alfsb
Copy link
Member Author

alfsb commented Dec 9, 2024

Merged 200, CI now passes. @nielsdos , let me know if you are interested in testing this for the new DOM classes by, and in what time frame. 202 may conflict with this, but opened it separately because non idempotency and non contextual error messages are eroding my sanity.

@nielsdos
Copy link
Member

nielsdos commented Dec 9, 2024

@nielsdos , let me know if you are interested in testing this for the new DOM classes by, and in what time frame.

The initial batch of the new DOM docs were merged a few days ago, but there is still more documentation to write. The reason we merged that already is because that PR already was huge. (See php/doc-en#4212). I plan on documenting some more this weekend and I could try to play with this new mechanism.

@alfsb
Copy link
Member Author

alfsb commented Dec 10, 2024 via email

@alfsb
Copy link
Member Author

alfsb commented Dec 12, 2024

All translations failing in CI. So I'll postpone this.

Incidentally, this would be a good use for remove.ent, to deal with changed/removed entities, even automatically generated ones.

@alfsb alfsb reopened this Dec 12, 2024
@alfsb
Copy link
Member Author

alfsb commented Dec 12, 2024

Rebased again. Will do more testing before merging this tomorrow. This incidentally fixes doc-pl, as it has duplicated IDs.

@alfsb alfsb merged commit 4776d12 into php:master Dec 13, 2024
11 of 12 checks passed
@alfsb
Copy link
Member Author

alfsb commented Dec 13, 2024

Merged. Let me know if there are problems. Tested on CI, local and Windows.

Only doc-it fails, as for now, and yet it may help doc-it in the end, after removing some <xi:fallback> on doc-en, the new code will start patching up failed XIncluide with non-empty fixups.

@alfsb alfsb deleted the xinclude11 branch December 13, 2024 13:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants