Describe the bug
The issue concerns the `charset_normalizer.detect()` function, which misdetects the encoding of a mostly-ASCII byte string that contains a single non-ASCII byte (`\xba`). It reports the sequence as Big5 (language: Chinese), so decoding with the detected encoding turns `\xbaC` into a Chinese character instead of the expected `ºC`.
To Reproduce
Run the following snippet. `charset_normalizer.detect()` misidentifies the encoding of the byte string:
```python
import charset_normalizer

text = b"What Actions Will Keep Us at 1.5-2\xbaC?"
# detect() guesses Big5 for this input, so the decoded string is wrong
print(text.decode(charset_normalizer.detect(text)["encoding"]))
```
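To illustrate why the Big5 guess produces a Chinese character, the sketch below decodes the same bytes with Python's built-in `big5` and `latin-1` codecs, with no charset-normalizer involved. In Big5, `0xBA` is a lead byte that combines with the following `C` (`0x43`, a valid trail byte) into one double-byte character, so the `º` and the `C` disappear together. This only shows the codec behavior; it does not reflect how charset-normalizer scores its candidates internally.

```python
# Decode the reported bytes with two stdlib codecs to compare the results.
text = b"What Actions Will Keep Us at 1.5-2\xbaC?"

as_big5 = text.decode("big5")       # \xba + C are read as one two-byte character
as_latin1 = text.decode("latin-1")  # every byte maps to exactly one character

print(as_big5)
print(as_latin1)
# Big5 yields one character fewer, because two bytes collapsed into one.
print(len(text), len(as_big5), len(as_latin1))
```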
Expected behavior
The expected behavior can be seen with the chardet library, which detects the encoding as ISO-8859-1 and therefore decodes `\xba` to the intended `º` character (used in this text as a degree sign):
```python
import chardet

text = b"What Actions Will Keep Us at 1.5-2\xbaC?"
# chardet reports ISO-8859-1, which decodes \xba to a single character
print(text.decode(chardet.detect(text)["encoding"]))
```
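Under ISO-8859-1 each byte decodes to the code point with the same value, so `\xba` becomes U+00BA. Strictly speaking, that is the masculine ordinal indicator `º`, not the degree sign `°` (U+00B0, byte `0xB0`). A short check with the standard library's `unicodedata` module:

```python
import unicodedata

# In ISO-8859-1 (latin-1), byte value 0xNN decodes to code point U+00NN.
ordinal = b"\xba".decode("latin-1")
degree = b"\xb0".decode("latin-1")

print(hex(ord(ordinal)), unicodedata.name(ordinal))
print(hex(ord(degree)), unicodedata.name(degree))
```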
Logs
charset-normalizer:
```
What Actions Will Keep Us at 1.5-2慢?
{'encoding': 'Big5', 'language': 'Chinese', 'confidence': 1.0}
```
chardet:
```
What Actions Will Keep Us at 1.5-2ºC?
{'encoding': 'ISO-8859-1', 'confidence': 0.73, 'language': ''}
```
Desktop:
- OS: Linux
- Python version: 3.10
- Package version: 3.3.2 / 2.1.1