bpo-14465: xml.etree.ElementTree pretty printing#4016
Conversation
| is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``). | ||
| *short_empty_elements* has the same meaning as in :meth:`ElementTree.write`. | ||
| *pretty* enables pretty printing with default indent. Keyword-only *indent* | ||
| allows to set custom indent. |
There was a problem hiding this comment.
allows to set → allows setting (or allows one to set, etc)
| emitted as a single self-closed tag, otherwise they are emitted as a pair | ||
| of start/end tags. | ||
| of start/end tags. *indent* enables and set indentation level for the | ||
| human readable output. |
There was a problem hiding this comment.
enables and sets
human-readable
|
@vadmium Thanks for the doc review - I've fixed the mistakes. |
| generate a Unicode string (otherwise, a bytestring is generated). *method* | ||
| is either ``"xml"``, ``"html"`` or ``"text"`` (default is ``"xml"``). | ||
| *short_empty_elements* has the same meaning as in :meth:`ElementTree.write`. | ||
| *pretty* enables pretty printing with default indent. Keyword-only *indent* |
There was a problem hiding this comment.
Please name it pretty_print instead for lxml compatibility.
|
Does this slow down the normal serialisation in any noticeable way? |
|
As for serialization performance - here are some simple tests. I've tested with current master and my branch with and without pretty printing. I did tostring serialization on 207KB xml file. So the new version is 14% slower. Pretty printing is 17% slower than non pretty printing. Overall, pretty printing enabled code is 34% slower than original version from master. I'm not sure if it's really scary. Do we have such comparison for lxml? |
I suggest you to use perf timeit, timeit is not reliable. |
|
Here are the benchmarks using This one shows that new version is 12% slower. Pretty printing adds 10%. I don't think that slowing down because of pretty printing is a big deal because if you need fast serialization you don't need a pretty output. I'm gonna see how lxml behaves with and without pretty printing, cause I think I'll see the same percentage. What I'm worried about is 12% slowing against master version. Should we really stress it? Because I think if you really need to serialize a lot of XML and need it fast you'll gonna use lxml anyway. |
97783fb to
1e9149c
Compare
|
@dzeban I think it's unfair to assume users will always use
|
I don't see how a rare and "mostly for debugging" feature like pretty printing could possibly justify such a big slow-down for a time critical feature like serialisation.
"Many" seems highly exaggerated.
A surprising argument, given that almost all of the documentation (and tutorials and books and ...) for
You shouldn't believe everything that random people write on the Internet. Why do you think there is a dedicated subclassing API in lxml? |
|
@scoder Valid counterpoints. Though, on your last point:
I was referencing an answer I wrote myself based on the documentation for
Upon further review, it appears |
|
While I'm in favor of adding an option for pretty printing, I'm not sure it's worth a 12% decrease in performance. Could the performance impact be lessened if the |
|
@Stevoisiak do you mean to remove |
@Stevoisiak This feels like a serious ungrateful bikeshedding to me. Could we first add this feature, which will finally remove for me the last and only reason why to use |
That's true for many use cases but not for all. ElementTree is actually very memory efficient and very fast on certain API operations. Building trees is something that can be done very fast in ElementTree but is not quite as fast in lxml. If the serialisation then slows it down so much afterwards, that is a problem. I would rather like to see the serialisation performance of ElementTree improved, definitely not further slowed down. |
The point I wanted to make is, that the question of performance of the ElementTree as found in the standard library is quite orthogonal to whether I would still have to pepper my code with silliness like: # from http://infix.se/2007/02/06/gentlemen-indent-your-xml
def __xmlIndent(self,elem,level=0):
i = "\n" + level*" "
if len(elem):
if not elem.text or not elem.text.strip():
elem.text = i + " "
for e in elem:
self.__xmlIndent(e, level+1)
if not e.tail or not e.tail.strip():
e.tail = i + " "
if not e.tail or not e.tail.strip():
e.tail = i
else:
if level and (not elem.tail or not elem.tail.strip()):
elem.tail = iPerformance is the never ending story (https://bugs.python.org/issue14465#msg323690 promises more holy wars over |
Well, you can't just drop a performance regression on everyone just to get a feature implemented that many people won't use, and that is particularly uninteresting for performance relevant cases (where the amount of data is large and you wouldn't want to inflate it further). This is simply the wrong tradeoff.
What about providing a pull request that takes that code and either a) adds it as an |
Well, of course, the
It wouldn't solve all problems. I would really prefer having compatible API (of course, again, defaulting to |
Ah, but the performance regression of this PR was measured to be 12% when it the option is disabled and another 10% when enabling the option. I wouldn't mind the pretty printing case being slower, but I am arguing against making the normal case so much slower. |
|
Rejecting this PR since it slows down the normal (non pretty-printed) serialisation. |
This PR adds pretty printing to the xml.etree.ElementTree module, namely to the
tostring,tostringlistfunctions andwritemethod of the ElementTree object.Pretty printing feature saves backward compatibility in a sense that old output won't be changed.
https://bugs.python.org/issue14465