obj[key] values need to be htmlEncoded to produce valid XML. Encoder clas… #3
devrim wants to merge 1 commit into …
…class should be moved to its own file.

hey Devrim, thanks for your pull request. I'd like to discuss this a little more, though. The official way to tell XML parsers not to parse some text is to use CDATA sections. As I said in the email I sent you, if you wrap your text in a CDATA section you will be fine. The downside is that you will have to strip the CDATA markers from your JSON manually. The other option is to encode the text yourself with some sort of HTML encoder, like the one you're proposing, but having that in this project would affect every single conversion, and I'm not sure that's the desired behavior for all our users.
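
Camilo's CDATA route can be sketched with a pair of helpers. wrapInCdata and unwrapCdata are hypothetical names for illustration, not part of xml2json. The one subtlety is that a CDATA section ends at the first "]]>", so the wrapper splits that sequence across two adjacent sections; the unwrapper is the manual cleanup step he mentions for the JSON side.

```javascript
// Hypothetical helper (not part of xml2json): wrap a text value in a CDATA
// section so an XML parser treats <, > and & inside it as plain characters.
function wrapInCdata(text) {
  // "]]>" would close the section early, so split it: "]]" stays in the
  // first section and ">" starts the next one.
  return "<![CDATA[" + String(text).split("]]>").join("]]]]><![CDATA[>") + "]]>";
}

// Hypothetical cleanup for the JSON side: remove CDATA markers that were
// carried over into a string value, keeping only the text inside them.
function unwrapCdata(value) {
  return String(value).replace(/<!\[CDATA\[([\s\S]*?)\]\]>/g, "$1");
}
```

For example, wrapInCdata("a < b") yields <![CDATA[a < b]]>, which any conforming XML parser reads back as the text "a < b".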

hey camilo, you're right. It may not be the best for everyone, but you could make it optional, like parser.toXml({encodeValues:"html"}); then people like me will thank you :) and others won't notice. By the way, the XML I'm parsing is not mine, so we can't always dictate how it is produced (whether to use CDATA is not up to me). One weird thing I noticed: when I convert my XML to JSON and back to XML, it throws an error. But if I run xmllint --format my.xml >my1.xml, my1.xml is not HTML-encoded, yet it works perfectly (without CDATA). That makes me think there may be a solution other than HTML encoding.
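
devrim's opt-in idea can be sketched as a toy serializer in which encoding runs only when the caller passes a flag, so the default output stays the same for everyone else. toXmlSketch and encodeValue are hypothetical names, and this is not how xml2json's toXml is implemented. Despite the "html" value in the suggested option, what the sketch applies is XML escaping of &, < and > in element text.

```javascript
// Hypothetical escaper for element text (not xml2json's code).
function encodeValue(text) {
  return String(text)
    .replace(/&/g, "&amp;") // must run first, or "&lt;" would become "&amp;lt;"
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical mini serializer: keys become elements, leaves become text.
// Arrays and attributes are not handled in this sketch.
function toXmlSketch(obj, options) {
  const opts = options || {};
  let xml = "";
  for (const key of Object.keys(obj)) {
    const value = obj[key];
    if (value !== null && typeof value === "object") {
      // Nested object: recurse, passing the same options down.
      xml += "<" + key + ">" + toXmlSketch(value, opts) + "</" + key + ">";
    } else {
      // Leaf: encode only when the caller opted in.
      const text = opts.encodeValues ? encodeValue(value) : String(value);
      xml += "<" + key + ">" + text + "</" + key + ">";
    }
  }
  return xml;
}
```

With no options, toXmlSketch({rule: "a<b"}) returns <rule>a<b</rule>, which is not well-formed XML; with {encodeValues: "html"} it returns <rule>a&lt;b</rule>.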

Hey Devrim, I haven't forgotten this one. I'll push HTML encoding as soon as I have some time.

We are talking about XML, not HTML. Why should we HTML-encode or -decode anything?

@fb55, it could be an optional sanitization of element values, through parser.toXml({sanitize_values: true}). It would also be a workaround for people who have to deal with XML containing unencoded characters that are not enclosed in CDATA sections, like @devrim parsing mod_security rules. So, to be specific, the following characters are going to be encoded when they're found in element values, if …
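
The list of characters in the comment above is cut off. Assuming it matches the five entities XML predefines (&amp;, &lt;, &gt;, &quot;, &apos;), a sanitize_values pass over element values could be a single replace with a lookup table. sanitizeValue and XML_ESCAPES are hypothetical names used only for this sketch.

```javascript
// Assumption: the characters to encode are the five XML predefined entities.
const XML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&apos;",
};

// Hypothetical sanitizer: one regex pass, each matched character replaced
// by its entity. Replacement output is never rescanned.
function sanitizeValue(text) {
  return String(text).replace(/[&<>"']/g, (ch) => XML_ESCAPES[ch]);
}
```

Because every character is looked up once in a single pass, the & introduced by an entity is never escaped again; chaining separate replace calls needs & to go first for the same reason.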

thanks Camilo, looking forward to it!

@c4milo Those are the valid chars in XML ;)

I see, @fb55. Feel free to read the "Escaping" section of http://en.m.wikipedia.org/wiki/XML :)
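
In XML, escaping means the five predefined entities plus numeric character references (&#60; or &#x3C;); HTML named entities such as &nbsp; are not defined unless a DTD declares them, which is the heart of fb55's XML-versus-HTML point. For the XML to JSON direction, a decoder limited to those forms might look like the sketch below. unescapeXml is a hypothetical name, and unknown named entities are deliberately left untouched.

```javascript
// The five entities XML predefines; other named entities need a DTD.
const PREDEFINED_ENTITIES = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'" };

// Hypothetical decoder for text read out of XML (e.g. on the way to JSON).
function unescapeXml(text) {
  return String(text).replace(
    /&(#x[0-9a-fA-F]+|#[0-9]+|[A-Za-z]+);/g,
    (match, ref) => {
      if (ref[0] === "#") {
        // Numeric character reference: hex after "#x", decimal after "#".
        const code = ref[1] === "x" ? parseInt(ref.slice(2), 16) : parseInt(ref.slice(1), 10);
        // Leave out-of-range references as-is instead of throwing.
        return code <= 0x10ffff ? String.fromCodePoint(code) : match;
      }
      // Named reference: decode only the predefined five; keep e.g. &nbsp;.
      return Object.prototype.hasOwnProperty.call(PREDEFINED_ENTITIES, ref)
        ? PREDEFINED_ENTITIES[ref]
        : match;
    }
  );
}
```

Like the encoder direction, this runs as a single pass, so "&amp;lt;" decodes to the literal text "&lt;" rather than to "<".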

This is the part of the XML that requires encoding: http://pastie.org/2747404. I'm sure there are numerous others that require it as well; as to why they do, I have no clue.

Bumping. I just encountered this issue as well and was wondering when this pull request could be merged. Thanks for the work!

Closed in 8a92e8b and released in xml2json v0.3.0.