Skip to content
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 0 additions & 2 deletions Lib/test/test_cmd_line.py
Original file line number Diff line number Diff line change
Expand Up @@ -75,8 +75,6 @@ def test_verbose(self):
rc, out, err = assert_python_ok('-vv')
self.assertNotIn(b'stack overflow', err)

# TODO: RUSTPYTHON
@unittest.expectedFailure
@unittest.skipIf(interpreter_requires_environment(),
'Cannot run -E tests when PYTHON env vars are required.')
def test_xoptions(self):
Expand Down
2 changes: 0 additions & 2 deletions Lib/test/test_subprocess.py
Original file line number Diff line number Diff line change
Expand Up @@ -761,8 +761,6 @@ def test_env(self):
stdout, stderr = p.communicate()
self.assertEqual(stdout, b"orange")

# TODO: RUSTPYTHON
@unittest.expectedFailure
# Windows requires at least the SYSTEMROOT environment variable to start
# Python
@unittest.skipIf(sys.platform == 'win32',
Expand Down
4 changes: 1 addition & 3 deletions extra_tests/snippets/syntax_non_utf8.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,6 @@

dir_path = os.path.dirname(os.path.realpath(__file__))

# TODO: RUSTPYTHON, RustPython raises a SyntaxError here, but cpython raise a ValueError
error = SyntaxError if platform.python_implementation() == 'RustPython' else ValueError
with assert_raises(error):
with assert_raises(ValueError):
with open(os.path.join(dir_path , "non_utf8.txt")) as f:
eval(f.read())
28 changes: 26 additions & 2 deletions vm/src/stdlib/builtins.rs
Original file line number Diff line number Diff line change
Expand Up @@ -248,11 +248,35 @@ mod builtins {
#[cfg(feature = "rustpython-compiler")]
#[pyfunction]
fn eval(
source: Either<PyStrRef, PyRef<crate::builtins::PyCode>>,
source: Either<
Either<PyStrRef, crate::builtins::PyBytesRef>,
PyRef<crate::builtins::PyCode>,
>,
Copy link
Copy Markdown
Member

@youknowone youknowone Aug 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this exactly (str|bytes) or ArgStrOrBytesLike?
ArgStrOrBytesLike accepts more types like bytearray. Some functions take exactly bytes, while other functions take also bytearray

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I run rustpython with ArgStrOrBytesLike, it seems very suit for this case. string, bytes or code is just error message and the eval also receives bytearray Buffer 🤔 . So I'll push a commit apply ArgStrOrBytesLike. Thanks! 🙏🏻

scope: ScopeArgs,
vm: &VirtualMachine,
) -> PyResult {
run_code(vm, source, scope, compile::Mode::Eval, "eval")
// source as string
let code = match source {
Either::A(either) => {
let source: &[u8] = match either {
Either::A(ref a) => a.as_str().as_bytes(),
Either::B(ref b) => b.as_bytes(),
};
for x in source {
if *x == 0 {
return Err(vm.new_value_error(
"source code string cannot contain null bytes".to_owned(),
));
}
}
Comment thread
moreal marked this conversation as resolved.
Outdated

Ok(Either::A(
vm.ctx.new_str(std::str::from_utf8(source).unwrap()),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens if bytes include non-ascii bytes?
what happens if bytes include invalid utf8 character?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens if bytes include invalid utf8 character?

It causes panic. I tested with the below python code:

eval(b'\xff')

And it prints:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Utf8Error { valid_up_to: 0, error_len: Some(1) }', vm/src/stdlib/builtins.rs:268:64
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

Because the C language const char * type for string and also byte array, it can be represented as a single type with a compiler option. So I'll see more the compiler, parser section. If you have useful documentations to recommend, please leave them as comments 🙇🏻‍♂️

// _PyPegen_run_parser_from_string
    if (flags != NULL && flags->cf_flags & PyCF_IGNORE_COOKIE) {
        tok = PyTokenizer_FromUTF8(str, exec_input);
    } else {
        tok = PyTokenizer_FromString(str, exec_input);
    }

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I thought was .unwrap() need to be avoided because CPython would not panic but raise a SyntaxError.
The major difference between FromUTF8 and FromString is using decoder or not. The decode error seems to be wrapped by SyntaxError somewhere.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I agree with your thought. 🙏🏻 I'll work for wrapping the decode error.

Copy link
Copy Markdown
Contributor Author

@moreal moreal Aug 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you execute eval(b'\xff') in CPython implementation, you will see the below output:

>>> eval(b"\xff")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1
    �
    ^
SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

I set SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte message as format and I tried to apply it in 714ce4d.

After the commit, in RustPython:

>>>>> eval(b'\xff')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

))
}
Either::B(code) => Ok(Either::B(code)),
}?;
run_code(vm, code, scope, compile::Mode::Eval, "eval")
}

/// Implements `exec`
Expand Down