Optimize double-width safe_substr when all double-width. #114
|
The test table at #111 uses the Japanese version of the theme unit test data. The following is a result from my blog. Almost all posts on my blog are about programming, so it has many single-width chars. Anyway, I found a new problem: if a string contains an odd number of single-width chars, the table column is shifted out of alignment. 😓 |
|
I wrote an example solution for it. It does not use a regex, so I expect the performance problem to improve.
The following lines are a unit test, and I could see that it works. What do you think of this idea? |
|
Another excellent catch. I think the problem with the current code is that it's rounding up instead of rounding down. I'm going to push (what I think is) a fix for that now, incorporating your test. If you could check it over and perhaps add more tests it would be great.
I think it works great, though doing it that way is going to be slow. |
Sure! I'll send a PR. |
Add single-width char to test
|
Thanks @miya0001 for extra test (and for everything else!). Based on your feedback I'm leaning towards not bothering with the "optimization" but just keeping the fix, particularly as I didn't realize that half-width characters such as |
|
"カ" is a halfwidth Katakana character in UTF-8. I am worried about the regex because it looks very complex, but it seems to work fine, so I think we can merge this PR. 👍 Thanks! 😄 |

Related #111
Looking at @miya0001's test table at #111 (review), it struck me that most of the time a column with double-width chars will have a number of entries with double-width content only, which suggests this simple optimization: check for this using a preg_match_all() and just halve the length if so.
Crude benchmarking suggests a performance win if the percentage of such entries is above 10% or so, only a small (2%) penalty if it's less than that, and a major win (100s of %) if it's anything above 50%, which I'm guessing is most likely to be the case. @miya0001, could you comment on whether this is likely in your experience for real data?
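
For readers who want to see the idea in isolation, here is a minimal Python sketch of the all-double-width shortcut. It is not the PHP code in this PR: the names char_width and truncate_to_width are hypothetical, and it assumes display width can be approximated with unicodedata.east_asian_width (Wide and Fullwidth count as 2 columns, everything else as 1), ignoring zero-width and combining characters. The fast path halves the requested width, rounding down; the fallback walks the string so that an odd number of single-width characters does not push the result past the column.

```python
import unicodedata


def char_width(ch: str) -> int:
    """Approximate display width: 2 for East Asian Wide/Fullwidth, else 1.

    Assumption for illustration only: zero-width and combining
    characters are not handled and are counted as width 1.
    """
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def truncate_to_width(text: str, width: int) -> str:
    """Return the longest prefix of text that fits in `width` columns."""
    width = max(width, 0)

    # Fast path: if every character is double-width, the number of
    # characters that fit is simply the width halved, rounded down.
    if text and all(char_width(ch) == 2 for ch in text):
        return text[: width // 2]

    # General path: accumulate widths one character at a time and stop
    # before the first character that would overflow the column.
    used = 0
    kept = []
    for ch in text:
        w = char_width(ch)
        if used + w > width:
            break
        kept.append(ch)
        used += w
    return "".join(kept)
```

In this sketch a halfwidth Katakana character such as U+FF76 is classified as single-width, so a string containing it takes the general path rather than the shortcut.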