Skip to content
This repository was archived by the owner on Jan 7, 2021. It is now read-only.

obj[key] values need to be htmlEncoded to produce valid XML. Encoder clas#3

Closed
devrim wants to merge 1 commit into
buglabs:masterfrom
devrim:patch-1
Closed

obj[key] values need to be htmlEncoded to produce valid XML. Encoder clas#3
devrim wants to merge 1 commit into
buglabs:masterfrom
devrim:patch-1

Conversation

@devrim

@devrim devrim commented Aug 14, 2011

Copy link
Copy Markdown

obj[key] values need to be htmlEncoded to produce valid XML. Encoder class better be moved to it's own file.

@c4milo

c4milo commented Aug 14, 2011

Copy link
Copy Markdown
Contributor

hey Devrim, thanks for your pull request. I'd like to discuss a little bit more about this, though.

The thing is that the official way to signalize xml parsers to not consider some text during the parsing is by using CDATA sections. As I said in the email I sent to you, if you wrap your text within a CDATA section you will be fine. The downside is that you will have to get rid of the CDATA stuff manually from your JSON. Indeed, the other way is by encoding yourself the text section using some sort of html encoder, like the one you're proposing, but having that in this project will affect every single conversion and I'm not sure if that's the desired behavior for all our users.

@devrim

devrim commented Aug 14, 2011

Copy link
Copy Markdown
Author

hey camilo - you're right.. it may not be the best for everyone, but you could make it optional like,

parser.toXml({encodeValues:"html"}) then people who are like me, will thank you :) others won't notice.. by the way xml that im parsing is not mine, so we can't always dictate how xml comes about (so CDATA is not up to me to decide)

one weird thing i noticed, i convert my xml to json, and back to xml, it throws an error. but if i do xmllint --format my.xml >my1.xml then my1.xml is not html encoded, however it works perfectly (without CDATA) - so it makes me think that you can find another solution to this other than html encoding..

@c4milo

c4milo commented Oct 23, 2011

Copy link
Copy Markdown
Contributor

Hey Devrim, I haven't forget this one. I'll push html encoding as soon as I have some time.

@fb55

fb55 commented Oct 23, 2011

Copy link
Copy Markdown

We are talking about XML, not HTML. Why should we HTML de-/encode anything?

@c4milo

c4milo commented Oct 23, 2011

Copy link
Copy Markdown
Contributor

@fb55, It could be an optional element value sanitization through:

parser.toXml({sanitize_values: true});

And also a workaround for people that has to deal with xml containing unencoded characters and those not enclosed in CDATA sections. Like @devrim parsing mod_security rules.

So to be specific, the following characters are going to be encoded, when they're found in element values, if sanitize_values is true:

<       &lt;
>       &gt;
(       &#40;
)       &#41;
#       &#35;
&       &amp;
"       &quot;

@devrim

devrim commented Oct 23, 2011

Copy link
Copy Markdown
Author

thanks Camilo - looking forward!

@fb55

fb55 commented Oct 23, 2011

Copy link
Copy Markdown

@c4milo That are the valid chars in XML ;)

@c4milo

c4milo commented Oct 23, 2011

Copy link
Copy Markdown
Contributor

I see, @fb55, feel free to read http://en.m.wikipedia.org/wiki/XML, "Escaping" section :)

@devrim

devrim commented Oct 23, 2011

Copy link
Copy Markdown
Author

this is the part of xml that requires encoding, http://pastie.org/2747404 i'm sure there are numerous others that do require it as well. as to why they do, i have no clue.

@adebree

adebree commented Apr 18, 2012

Copy link
Copy Markdown

Bumping.. I just encountered this issue as well, was wondering when this pull request could be authored? thx for the work!

@c4milo

c4milo commented Sep 8, 2012

Copy link
Copy Markdown
Contributor

closed in 8a92e8b and released in xml2json v0.3.0

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants