support unicode literal by yanganto · Pull Request #1043 · RustPython/RustPython

yanganto · 2019-06-18T16:30:53Z

support Unicode literal as same as CPython #1025

support Unicode literal \x with 2 digits
support Unicode literal \u with 4 digits
support Unicode literal \U with 8 digits

codecov-io · 2019-06-18T18:11:00Z

Codecov Report

Merging #1043 into master will decrease coverage by 0.03%.
The diff coverage is 38.46%.

@@            Coverage Diff             @@
##           master    #1043      +/-   ##
==========================================
- Coverage   65.17%   65.14%   -0.04%     
==========================================
  Files          99       99              
  Lines       17946    17972      +26     
  Branches     3994     4004      +10     
==========================================
+ Hits        11697    11708      +11     
- Misses       3515     3524       +9     
- Partials     2734     2740       +6

Impacted Files	Coverage Δ
parser/src/lexer.rs	`78.6% <38.46%> (-1.16%)`	⬇️
vm/src/format.rs	`70.52% <0%> (+0.22%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0f73c74...7f2d014. Read the comment docs.

yanganto · 2019-06-19T00:59:48Z

The error handler needs to check.

windelbouwman · 2019-06-19T11:42:07Z


+    fn uncicode_literal(&mut self, literal_number: usize) -> Result<char, LexicalError> {
+        let location = self.get_pos();
+        let mut it: Vec<u32> = Vec::new();


I suggest to pick a better name for the variable it. Perhaps digits would be better?

Thanks for the suggestion, I remove the it and just put digit into p.

windelbouwman · 2019-06-19T11:44:26Z

+                            },
+                            Some('x') if !is_bytes => match self.uncicode_literal(2) {
+                                Ok(c) => string_content.push(c),
+                                Err(e) => return Err(e),


The pattern Err(e) => return Err(e) can be short circuit like this:

Some('x') if !is_bytes => string_content.push(self.unicode_literal(2)?),

Notice the lack match statement for the explicit error handling.

Thanks. I refactor this with ?.

windelbouwman · 2019-06-19T11:47:26Z

+            }
+        }
+        let mut p: u32 = 0u32;
+        for i in 1..=literal_number {


In this case, I suggest to use enumerate to iterate over the parsed digits and construct the numerical value.

for (i, d) in it.iter().enumerate() { p += d << i*4; }

See also: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.enumerate

Will the compiler optimize it out though?

Anyway, we can even do something like this

let p = it.iter().enumerate().fold(|acc, (i, d)| acc + d << i *4)

But only if this all line ends up being optimized to a loop

The it is removed, but I learn the powerful fold method, I will try to use it next time. Thank you. @michelhe

windelbouwman · 2019-06-19T11:48:52Z

@yanganto great job! I left some comments for improvements, but the change look good!

- support unicode literal \x with 2 digits - support unicode literal \u with 4 digits - support unicode literal \U with 8 digits - avoid to parse \x as unicode literal in bytes

yanganto force-pushed the feat/unicode-iterals branch 2 times, most recently from d857e46 to 7f2d014 Compare June 18, 2019 17:15

yanganto changed the title ~~support unicode literal~~ [WIP] support unicode literal Jun 19, 2019

windelbouwman reviewed Jun 19, 2019

View reviewed changes

yanganto force-pushed the feat/unicode-iterals branch 2 times, most recently from 75bc12e to 4c672aa Compare June 19, 2019 14:48

yanganto changed the title ~~[WIP] support unicode literal~~ support unicode literal Jun 19, 2019

support unicode literal

974dc68

- support unicode literal \x with 2 digits - support unicode literal \u with 4 digits - support unicode literal \U with 8 digits - avoid to parse \x as unicode literal in bytes

yanganto force-pushed the feat/unicode-iterals branch from 4c672aa to 974dc68 Compare June 19, 2019 16:30

windelbouwman merged commit 30979d9 into RustPython:master Jun 19, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

support unicode literal#1043

support unicode literal#1043
windelbouwman merged 1 commit intoRustPython:masterfrom
yanganto:feat/unicode-iterals

yanganto commented Jun 18, 2019

Uh oh!

codecov-io commented Jun 18, 2019

Uh oh!

yanganto commented Jun 19, 2019

Uh oh!

windelbouwman Jun 19, 2019

Uh oh!

yanganto Jun 19, 2019

Uh oh!

windelbouwman Jun 19, 2019 •

edited

Loading

Uh oh!

yanganto Jun 19, 2019

Uh oh!

windelbouwman Jun 19, 2019 •

edited

Loading

Uh oh!

michelhe Jun 19, 2019

Uh oh!

michelhe Jun 19, 2019 •

edited

Loading

Uh oh!

yanganto Jun 19, 2019

Uh oh!

windelbouwman commented Jun 19, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

yanganto commented Jun 18, 2019

Uh oh!

codecov-io commented Jun 18, 2019

Codecov Report

Uh oh!

yanganto commented Jun 19, 2019

Uh oh!

windelbouwman Jun 19, 2019

Choose a reason for hiding this comment

Uh oh!

yanganto Jun 19, 2019

Choose a reason for hiding this comment

Uh oh!

windelbouwman Jun 19, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

yanganto Jun 19, 2019

Choose a reason for hiding this comment

Uh oh!

windelbouwman Jun 19, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

michelhe Jun 19, 2019

Choose a reason for hiding this comment

Uh oh!

michelhe Jun 19, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

yanganto Jun 19, 2019

Choose a reason for hiding this comment

Uh oh!

windelbouwman commented Jun 19, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

windelbouwman Jun 19, 2019 •

edited

Loading

windelbouwman Jun 19, 2019 •

edited

Loading

michelhe Jun 19, 2019 •

edited

Loading