fix weird boto docstrings #656
    elif isinstance(docstring, str if py3 else unicode):
        pass
    else:
        return []
sebastinas
Nov 18, 2016
Contributor
Is the elif and else really necessary? Or in other words: does the elif really cover all valid cases?
thomasballinger
Nov 18, 2016
Author
Member
The cases to cover:
Py2 bytes -> decode
Py2 unicode -> nop
Py2 something else (integer etc) -> abort
Py3 bytes -> shouldn't happen, but decode
Py3 unicode (str) -> nop
Py3 something else -> abort
Might be nicer to:
    if unicode:
        pass
    else:
        try:
            docstring = docstring.decode
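For reference, the six cases listed above could be handled along these lines. This is only a minimal sketch, not the code in this PR: the name `normalize_docstring`, the `encoding` parameter, and returning `None` for the "abort" cases are assumptions made for illustration.

```python
import sys

py3 = sys.version_info[0] >= 3
# `unicode` is only evaluated on Python 2, so this line is safe on Python 3.
text_type = str if py3 else unicode  # noqa: F821


def normalize_docstring(docstring, encoding="utf-8"):
    """Return the docstring as text, or None if it is not a string at all.

    Hypothetical helper: the name and the None return value are assumptions.
    """
    if isinstance(docstring, text_type):
        # Py2 unicode / Py3 str: nothing to do.
        return docstring
    if isinstance(docstring, bytes):
        # Py2 str / Py3 bytes: decode, assuming utf-8 unless told otherwise.
        return docstring.decode(encoding)
    # Integer, None, or anything else: signal the caller to abort.
    return None
```

Checking for text first and bytes second keeps the "something else" case as the single fall-through, which avoids needing a separate `elif` and `else`.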
thomasballinger
Nov 18, 2016
Author
Member
To answer your question: docstrings should always be unicode in Python 3, and in Python 2 they should always be bytestrings, since we're getting them from pydoc.getdoc, which does this normalization. If we somehow got a unicode string in Python 2 that would be ok, but I don't know how that would happen. If we got a bytestring in Python 3, which shouldn't happen, we would try to decode it. So this does cover all valid cases, but it covers some extra ones too.
Now that I see where docstring comes from (pydoc.getdoc) I agree that the else isn't necessary.
The correct thing to do here is to find out the encoding of the source file the docstring comes from, since it doesn't have to be utf8, or at least catch errors here so a bad docstring doesn't crash bpython.
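A rough sketch of both options mentioned above, for Python 3 only. The function names are hypothetical and this is not what the PR implements; `tokenize.detect_encoding` is the standard-library call that reads a BOM or PEP 263 coding cookie from the first lines of a source file.

```python
import io
import tokenize


def detect_source_encoding(source_bytes):
    """Return the encoding declared by a BOM or coding cookie.

    Falls back to 'utf-8' when the source declares nothing.
    Hypothetical helper name, for illustration only.
    """
    readline = io.BytesIO(source_bytes).readline
    encoding, _ = tokenize.detect_encoding(readline)
    return encoding


def decode_docstring_safely(raw, encoding="utf-8"):
    """Decode bytes without raising on undecodable input.

    Invalid bytes become U+FFFD, so a bad docstring is displayed
    imperfectly instead of crashing the caller.
    """
    return raw.decode(encoding, errors="replace")
```

Finding the source file for an arbitrary object is the harder part and is left out here; the replace-on-error path is the cheaper safeguard.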
thomasballinger commented Nov 18, 2016 • edited
Boto is doing something pretty weird: in Python 3, it makes it possible to end up with bytestring docstrings. We fix this here by always assuming utf8 in this case. Previously we assumed ascii, and did it implicitly by letting string.split(u'\n') turn it into unicode, which was no good.