Add LIBXML_NOBLANKS to dom_load() XML parse options #304
As I commented on Discord, I know little to nothing about PhD in this respect. Why PhD? Because some DocBook elements specify that whitespace should be preserved in the rendering process, so some whitespace may end up being relevant in PhD or in the final output. For better performance and lower memory use, removing XML comments may have a similar impact, and DocBook gives them no specified meaning.
Philip O gives an example where this causes a problem. The rendering of The problem is more general. Whitespace in DocBook is complicated. In some elements it is completely irrelevant and could be trimmed, but in other contexts whitespace should be coalesced (as in HTML), and in others it should be fully preserved. There is a hint in the old libxml docs:
So... it may be possible to change libxml to the "correct" DocBook behaviour if it is called in validating mode.
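To illustrate the mixed-content concern, here is a minimal sketch. The markup is a hypothetical sample, not the example from the discussion, and it calls `DOMDocument::loadXML()` directly rather than going through `dom_load()`. Without a DTD, libxml decides heuristically which whitespace is ignorable, so a single space between two inline elements may be treated as a blank node and dropped.

```php
<?php
// Hypothetical inline markup (an assumption for illustration only).
$xml = '<para><emphasis>foo</emphasis> <literal>bar</literal></para>';

// Default parse: whitespace-only text nodes are kept.
$keep = new DOMDocument();
$keep->loadXML($xml);

// Same input, parsed with LIBXML_NOBLANKS.
$strip = new DOMDocument();
$strip->loadXML($xml, LIBXML_NOBLANKS);

// Compare the text content a renderer would see in each case.
$withBlanks = $keep->documentElement->textContent;
$withoutBlanks = $strip->documentElement->textContent;

var_dump($withBlanks, $withoutBlanks);
```

If the second string comes back without the space, the two words would run together in the rendered output, which is the kind of breakage to look for before enabling the flag everywhere.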
Drops whitespace-only text nodes, which shouldn't be harmful. This results in a smaller DOM that is faster to traverse and requires less memory.
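A rough way to see the effect on DOM size is to count whitespace-only text nodes with and without the flag. This is only a sketch: the sample XML and the `countBlankTextNodes()` helper are hypothetical, and it uses `DOMDocument::loadXML()` directly instead of the `dom_load()` function this patch touches.

```php
<?php
// Hypothetical indented sample; not taken from the manual sources.
$xml = <<<XML
<para>
  <emphasis>foo</emphasis>
  <literal>bar</literal>
</para>
XML;

/** Count text nodes that contain only whitespace (hypothetical helper). */
function countBlankTextNodes(DOMDocument $doc): int
{
    $xpath = new DOMXPath($doc);
    // normalize-space() trims and collapses whitespace, so an empty
    // result means the node held nothing but whitespace.
    return $xpath->query('//text()[normalize-space(.) = ""]')->length;
}

$default = new DOMDocument();
$default->loadXML($xml);

$noBlanks = new DOMDocument();
$noBlanks->loadXML($xml, LIBXML_NOBLANKS);

$blankDefault = countBlankTextNodes($default);
$blankNoBlanks = countBlankTextNodes($noBlanks);

printf("default: %d blank nodes, LIBXML_NOBLANKS: %d\n", $blankDefault, $blankNoBlanks);
```

On heavily indented sources every level of indentation contributes such nodes, which is where the smaller tree would come from.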
Benchmarked
configure.php --with-lang=en, 5 runs, PHP 8.5:
Benchmark:
Raw files:
Before patch (baseline)
After patch