Skip to content

support unicode literal#1043

Merged
windelbouwman merged 1 commit intoRustPython:masterfrom
yanganto:feat/unicode-iterals
Jun 19, 2019
Merged

support unicode literal#1043
windelbouwman merged 1 commit intoRustPython:masterfrom
yanganto:feat/unicode-iterals

Conversation

@yanganto
Copy link
Copy Markdown
Contributor

support Unicode literal as same as CPython #1025

  • support Unicode literal \x with 2 digits
  • support Unicode literal \u with 4 digits
  • support Unicode literal \U with 8 digits

@yanganto yanganto force-pushed the feat/unicode-iterals branch 2 times, most recently from d857e46 to 7f2d014 Compare June 18, 2019 17:15
@codecov-io
Copy link
Copy Markdown

Codecov Report

Merging #1043 into master will decrease coverage by 0.03%.
The diff coverage is 38.46%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1043      +/-   ##
==========================================
- Coverage   65.17%   65.14%   -0.04%     
==========================================
  Files          99       99              
  Lines       17946    17972      +26     
  Branches     3994     4004      +10     
==========================================
+ Hits        11697    11708      +11     
- Misses       3515     3524       +9     
- Partials     2734     2740       +6
Impacted Files Coverage Δ
parser/src/lexer.rs 78.6% <38.46%> (-1.16%) ⬇️
vm/src/format.rs 70.52% <0%> (+0.22%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0f73c74...7f2d014. Read the comment docs.

@yanganto yanganto changed the title support unicode literal [WIP] support unicode literal Jun 19, 2019
@yanganto
Copy link
Copy Markdown
Contributor Author

The error handler needs to check.

Comment thread parser/src/lexer.rs Outdated

fn uncicode_literal(&mut self, literal_number: usize) -> Result<char, LexicalError> {
let location = self.get_pos();
let mut it: Vec<u32> = Vec::new();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to pick a better name for the variable it. Perhaps digits would be better?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion, I remove the it and just put digit into p.

Comment thread parser/src/lexer.rs Outdated
},
Some('x') if !is_bytes => match self.uncicode_literal(2) {
Ok(c) => string_content.push(c),
Err(e) => return Err(e),
Copy link
Copy Markdown
Contributor

@windelbouwman windelbouwman Jun 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pattern Err(e) => return Err(e) can be short circuit like this:

Some('x') if !is_bytes => string_content.push(self.unicode_literal(2)?),

Notice the lack match statement for the explicit error handling.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I refactor this with ?.

Comment thread parser/src/lexer.rs Outdated
}
}
let mut p: u32 = 0u32;
for i in 1..=literal_number {
Copy link
Copy Markdown
Contributor

@windelbouwman windelbouwman Jun 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, I suggest to use enumerate to iterate over the parsed digits and construct the numerical value.

for (i, d) in it.iter().enumerate() {
   p += d << i*4;
}

See also: https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.enumerate

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the compiler optimize it out though?

Copy link
Copy Markdown
Contributor

@michelhe michelhe Jun 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway, we can even do something like this

let p = it.iter().enumerate().fold(|acc, (i, d)| acc + d << i *4)

But only if this all line ends up being optimized to a loop

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The it is removed, but I learn the powerful fold method, I will try to use it next time. Thank you. @michelhe

@windelbouwman
Copy link
Copy Markdown
Contributor

@yanganto great job! I left some comments for improvements, but the change look good!

@yanganto yanganto force-pushed the feat/unicode-iterals branch 2 times, most recently from 75bc12e to 4c672aa Compare June 19, 2019 14:48
@yanganto yanganto changed the title [WIP] support unicode literal support unicode literal Jun 19, 2019
- support unicode literal \x with 2 digits
- support unicode literal \u with 4 digits
- support unicode literal \U with 8 digits
- avoid to parse \x as unicode literal in bytes
@yanganto yanganto force-pushed the feat/unicode-iterals branch from 4c672aa to 974dc68 Compare June 19, 2019 16:30
@windelbouwman windelbouwman merged commit 30979d9 into RustPython:master Jun 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants