fix weird boto docstrings #656

Merged
merged 1 commit into from Nov 19, 2016

Conversation

@thomasballinger
Member

thomasballinger commented Nov 18, 2016

Boto is doing something pretty weird: in Python 3, it makes it possible to end up with bytestring docstrings. We fix this here by always decoding as utf8 in that case. Previously we assumed ascii, and did so implicitly by letting string.split(u'\n') coerce the bytestring to unicode, which was no good.
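
To illustrate why the ascii assumption was a problem, here is a minimal sketch. The sample docstring is made up for illustration and is not taken from boto. It shows a bytestring with one non-ASCII character failing to decode as ascii but decoding cleanly as utf8, after which splitting on u'\n' is safe:

```python
# Hypothetical bytestring docstring containing one non-ASCII character.
raw = u'Caf\u00e9 client.\nSecond line.'.encode('utf8')

# Decoding as ascii fails as soon as a non-ASCII byte is encountered.
try:
    raw.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding explicitly as utf8 works, and the result can be split safely.
text = raw.decode('utf8')
lines = text.split(u'\n')
```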

elif isinstance(docstring, str if py3 else unicode):
    pass
else:
    return []

@sebastinas

sebastinas Nov 18, 2016

Contributor

Is the elif and else really necessary? Or in other words: does the elif really cover all valid cases?

@thomasballinger

thomasballinger Nov 18, 2016

Author Member

The cases to cover:

Py2 bytes -> decode
Py2 unicode -> nop
Py2 something else (integer etc) -> abort

Py3 bytes -> shouldn't happen, but decode
Py3 str -> nop
Py3 something else -> abort

Might be nicer to:

if unicode:
    pass
else:
    try:
        docstring = docstring.decode
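
One way that idea could be finished, as a rough Python 3 sketch only. The helper name to_text and the choice to return None for non-string input are assumptions for illustration, not the code merged here:

```python
def to_text(docstring):
    """Return docstring as text, or None if it is not string-like.

    Hypothetical helper (Python 3 only); not the code from this PR.
    """
    if isinstance(docstring, str):
        # Already text, the normal Python 3 case: nothing to do.
        return docstring
    try:
        # Bytestring docstring (the boto case): assume utf8.
        return docstring.decode('utf8')
    except AttributeError:
        # No .decode method (integer, None, ...): not a usable docstring.
        return None
```

Note that a bytestring that is not valid utf8 would still raise UnicodeDecodeError in this sketch.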

@thomasballinger

thomasballinger Nov 18, 2016

Author Member

To answer your question: docstrings should always be unicode in Python 3, and in Python 2 they should always be bytestrings, since we're getting them from pydoc.getdoc, which does this normalization. If we somehow got a unicode string in Python 2 that would be OK, but I don't know how that would happen. If we got a bytestring in Python 3, which shouldn't happen, we would try to decode it. So this does cover all valid cases, but it covers some extra ones too.

Now that I see where docstring comes from (pydoc.getdoc) I agree that the else isn't necessary.

The correct thing to do here is to find out the encoding of the source file the docstring comes from, since it doesn't have to be utf8, or at least catch errors here so a bad docstring doesn't crash bpython.
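
A rough sketch of both options, for Python 3. The helper names source_encoding and decode_docstring are hypothetical and this is not what bpython does. The first uses the standard library's tokenize.detect_encoding to read the coding declaration from raw source bytes, and the second falls back to replacement characters rather than raising:

```python
import io
import tokenize


def source_encoding(source_bytes):
    """Return the encoding declared by a Python source file's raw bytes.

    Hypothetical helper; tokenize.detect_encoding defaults to 'utf-8'
    when no coding declaration or BOM is present.
    """
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


def decode_docstring(raw, encoding='utf8'):
    """Decode a bytestring docstring without ever raising on bad bytes.

    Hypothetical helper; undecodable bytes become U+FFFD.
    """
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return raw.decode(encoding, errors='replace')
```

With something like this, a latin-1 source file would be decoded using its declared encoding, and a docstring with stray bytes would show a replacement character instead of crashing.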

@sebastinas sebastinas merged commit f4f05b2 into master Nov 19, 2016
2 checks passed
continuous-integration/travis-ci/pr The Travis CI build passed
continuous-integration/travis-ci/push The Travis CI build passed