This repository was archived by the owner on Feb 2, 2024. It is now read-only.
Series str contains #793
Merged: AlexanderKalistratov merged 6 commits into IntelPython:master from Rubtsowa:series_str_contains on Apr 7, 2020
Changes from 5 commits
Commits (6)
f96bcd3 impl Series.str.contains (Rubtsowa)
aa8a556 Merge branch 'master' of https://github.com/IntelPython/hpat into ser… (Rubtsowa)
19aabbd fix problem with PEP8 (Rubtsowa)
cb0f58e Merge branch 'master' of https://github.com/IntelPython/hpat into ser… (Rubtsowa)
2cd87a1 add test (Rubtsowa)
0b2af26 change message error (Rubtsowa)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,38 @@ | ||
| # ***************************************************************************** | ||
| # Copyright (c) 2019-2020, Intel Corporation All rights reserved. | ||
| # | ||
| # Redistribution and use in source and binary forms, with or without | ||
| # modification, are permitted provided that the following conditions are met: | ||
| # | ||
| # Redistributions of source code must retain the above copyright notice, | ||
| # this list of conditions and the following disclaimer. | ||
| # | ||
| # Redistributions in binary form must reproduce the above copyright notice, | ||
| # this list of conditions and the following disclaimer in the documentation | ||
| # and/or other materials provided with the distribution. | ||
| # | ||
| # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" | ||
| # AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, | ||
| # THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR | ||
| # PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR | ||
| # CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, | ||
| # EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, | ||
| # PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; | ||
| # OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, | ||
| # WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR | ||
| # OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, | ||
| # EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | ||
| # ***************************************************************************** | ||
|
|
||
| import pandas as pd | ||
| from numba import njit | ||
|
|
||
|
|
||
| @njit | ||
| def series_str_contains(): | ||
| series = pd.Series(['dog', 'foo', 'bar']) | ||
|
|
||
| return series.str.contains('o') # Expect series of True, True, False | ||
|
|
||
|
|
||
| print(series_str_contains()) |
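The expected result noted in the example's comment can be checked without Numba or pandas. The snippet below is a minimal sketch that uses a plain list as a stand-in for the Series; `contains_literal` is a hypothetical helper, not part of the PR, and it assumes the pattern has no regex metacharacters.

```python
def contains_literal(values, pat):
    # Plain substring test per element; a stand-in for Series.str.contains
    # when the pattern contains no regex metacharacters.
    return [pat in value for value in values]


result = contains_literal(['dog', 'foo', 'bar'], 'o')
print(result)  # [True, True, False]
```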
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -82,7 +82,7 @@ def hpat_pandas_stringmethods_upper_impl(self): | |||||
|
|
||||||
| import numba | ||||||
| from numba.types import (Boolean, Integer, NoneType, | ||||||
| - Omitted, StringLiteral, UnicodeType) | ||||||
| + Omitted, StringLiteral, UnicodeType, Number, Set) | ||||||
|
|
||||||
| from sdc.utilities.sdc_typing_utils import TypeChecker | ||||||
| from sdc.datatypes.hpat_pandas_stringmethods_types import StringMethodsType | ||||||
|
|
@@ -151,6 +151,87 @@ def hpat_pandas_stringmethods_center_impl(self, width, fillchar=' '): | |||||
| return hpat_pandas_stringmethods_center_impl | ||||||
|
|
||||||
|
|
||||||
| @sdc_overload_method(StringMethodsType, 'contains') | ||||||
| def hpat_pandas_stringmethods_contains(self, pat, case=True, flags=0, na=None, regex=True): | ||||||
| """ | ||||||
| Intel Scalable Dataframe Compiler User Guide | ||||||
| ******************************************** | ||||||
| Pandas API: pandas.Series.str.contains | ||||||
|
|
||||||
| Limitations | ||||||
| ----------- | ||||||
| - Series elements are expected to be Unicode strings. Elements cannot be `NaNs`. | ||||||
| - Parameter ``na`` is supported only with default value ``None``. | ||||||
| - Parameter ``flags`` is supported only with default value ``0``. | ||||||
| - Parameter ``regex`` is supported only with default value ``True``. | ||||||
|
|
||||||
| Examples | ||||||
| -------- | ||||||
| .. literalinclude:: ../../../examples/series/str/series_str_contains.py | ||||||
| :language: python | ||||||
| :lines: 27- | ||||||
| :caption: Tests if string element contains a pattern. | ||||||
| :name: ex_series_str_contains | ||||||
|
|
||||||
| .. command-output:: python ./series/str/series_str_contains.py | ||||||
| :cwd: ../../../examples | ||||||
|
|
||||||
| .. seealso:: | ||||||
| :ref:`Series.str.startswith <pandas.Series.str.startswith>` | ||||||
| Same as endswith, but tests the start of string. | ||||||
| :ref:`Series.str.endswith <pandas.Series.str.endswith>` | ||||||
| Same as startswith, but tests the end of string. | ||||||
|
|
||||||
| Intel Scalable Dataframe Compiler Developer Guide | ||||||
| ************************************************* | ||||||
|
|
||||||
| Pandas Series method :meth:`pandas.core.strings.StringMethods.contains()` implementation. | ||||||
|
|
||||||
| .. only:: developer | ||||||
|
|
||||||
| Test: python -m sdc.runtests -k sdc.tests.test_series.TestSeries.test_series_contains | ||||||
| """ | ||||||
|
|
||||||
| ty_checker = TypeChecker('Method contains().') | ||||||
| ty_checker.check(self, StringMethodsType) | ||||||
|
|
||||||
| if not isinstance(pat, (StringLiteral, UnicodeType)): | ||||||
| ty_checker.raise_exc(pat, 'str', 'pat') | ||||||
|
|
||||||
| if not isinstance(na, (Omitted, NoneType)) and na is not None: | ||||||
| ty_checker.raise_exc(na, 'none', 'na') | ||||||
|
|
||||||
| if not isinstance(case, (Boolean, Omitted)) and case is not True: | ||||||
| ty_checker.raise_exc(case, 'bool', 'case') | ||||||
|
|
||||||
| if not isinstance(flags, (Omitted, Integer)) and flags != 0: | ||||||
| ty_checker.raise_exc(flags, 'int64', 'flags') | ||||||
|
|
||||||
| if not isinstance(regex, (Omitted, Boolean)) and regex is not True: | ||||||
| ty_checker.raise_exc(regex, 'bool', 'regex') | ||||||
|
|
||||||
| def hpat_pandas_stringmethods_contains_impl(self, pat, case=True, flags=0, na=None, regex=True): | ||||||
| if flags != 0: | ||||||
| raise ValueError('Parameter flags can be only 0.') | ||||||
|
Review comment (Contributor): I propose to do something like that. Suggested change:
|
||||||
|
|
||||||
| if not regex: | ||||||
| raise ValueError('Parameter regex can be only True.') | ||||||
|
|
||||||
| if not case: | ||||||
| _pat = pat.lower() | ||||||
| else: | ||||||
| _pat = pat | ||||||
|
|
||||||
| len_data = len(self._data) | ||||||
| res_list = numpy.empty(len_data, numba.types.boolean) | ||||||
| for idx in numba.prange(len_data): | ||||||
| res_list[idx] = _pat in self._data._data[idx] | ||||||
|
|
||||||
| return pandas.Series(res_list, self._data._index, name=self._data._name) | ||||||
|
|
||||||
| return hpat_pandas_stringmethods_contains_impl | ||||||
|
|
||||||
|
|
||||||
| @sdc_overload_method(StringMethodsType, 'endswith') | ||||||
| def hpat_pandas_stringmethods_endswith(self, pat, na=None): | ||||||
| """ | ||||||
|
|
||||||
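Reading the `contains` implementation as shown in this diff, `case=False` lowercases only the pattern, and the match itself is a literal substring test even though `regex=True` is the only accepted value. The sketch below mirrors that loop in plain Python and compares it with an assumed reference behaviour (a regex search with `re.IGNORECASE` when `case=False`). Both helper names are hypothetical and nothing here is taken from the SDC code base beyond the logic visible above.

```python
import re


def contains_as_in_diff(values, pat, case=True):
    """Mirror the loop in the diff: only the pattern is lowercased."""
    _pat = pat if case else pat.lower()
    return [_pat in value for value in values]


def contains_ignorecase(values, pat, case=True):
    """Assumed reference behaviour: regex search, IGNORECASE when case=False."""
    flags = 0 if case else re.IGNORECASE
    compiled = re.compile(pat, flags)
    return [compiled.search(value) is not None for value in values]


# Same data as the tests in this PR.
data = ['Mouse', 'dog', 'house and parrot', '23']

# All-lowercase data around the match: both versions agree.
same = contains_as_in_diff(data, 'OG', case=False)
ref_same = contains_ignorecase(data, 'OG', case=False)

# Uppercase character in the data: the two versions diverge on 'Mouse'.
diverging = contains_as_in_diff(data, 'm', case=False)
ref_diverging = contains_ignorecase(data, 'm', case=False)
```

The patterns used in `test_series_contains` (`'og'`, `'Og'`, `'OG'`, `'o'`) never need to match an uppercase character in the data, so a case like `'m'` against `'Mouse'` is not exercised by those tests.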
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -297,6 +297,10 @@ def rstrip_usecase(series, to_strip=None): | |
| return series.str.rstrip(to_strip) | ||
|
|
||
|
|
||
| def contains_usecase(series, pat, case=True, flags=0, na=None, regex=True): | ||
| return series.str.contains(pat, case, flags, na, regex) | ||
|
|
||
|
|
||
| class TestSeries( | ||
| TestSeries_apply, | ||
| TestSeries_map, | ||
|
|
@@ -6093,6 +6097,41 @@ def test_series_isupper_str(self): | |
| s = pd.Series(data) | ||
| pd.testing.assert_series_equal(cfunc(s), isupper_usecase(s)) | ||
|
|
||
| def test_series_contains(self): | ||
| hpat_func = self.jit(contains_usecase) | ||
| s = pd.Series(['Mouse', 'dog', 'house and parrot', '23']) | ||
| for pat in ['og', 'Og', 'OG', 'o']: | ||
| for case in [True, False]: | ||
| with self.subTest(pat=pat, case=case): | ||
| pd.testing.assert_series_equal(hpat_func(s, pat, case), contains_usecase(s, pat, case)) | ||
|
|
||
| def test_series_contains_with_na_flags_regex(self): | ||
| hpat_func = self.jit(contains_usecase) | ||
| s = pd.Series(['Mouse', 'dog', 'house and parrot', '23']) | ||
| pat = 'og' | ||
| pd.testing.assert_series_equal(hpat_func(s, pat, flags=0, na=None, regex=True), | ||
| contains_usecase(s, pat, flags=0, na=None, regex=True)) | ||
|
|
||
| def test_series_contains_unsupported(self): | ||
| hpat_func = self.jit(contains_usecase) | ||
| s = pd.Series(['Mouse', 'dog', 'house and parrot', '23']) | ||
| pat = 'og' | ||
|
|
||
| with self.assertRaises(ValueError) as raises: | ||
| hpat_func(s, pat, flags=1) | ||
| msg = 'Parameter flags can be only 0' | ||
| self.assertIn(msg, str(raises.exception)) | ||
|
|
||
| with self.assertRaises(TypingError) as raises: | ||
| hpat_func(s, pat, na=0) | ||
| msg = 'Method contains(). The object na' | ||
|
Review comment (Contributor): Please provide full error message. |
||
| self.assertIn(msg, str(raises.exception)) | ||
|
|
||
| with self.assertRaises(ValueError) as raises: | ||
| hpat_func(s, pat, regex=False) | ||
| msg = 'Parameter regex can be only True' | ||
| self.assertIn(msg, str(raises.exception)) | ||
|
|
||
| @skip_sdc_jit('Old-style implementation returns string, but not series') | ||
| def test_series_describe_numeric(self): | ||
| def test_impl(A): | ||
|
|
||
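The review request for a full error message can be illustrated with a self-contained test. This is only a sketch: `check_flags` is a hypothetical stand-in for the runtime check in the diff, reusing the `ValueError` text that appears there. The full `TypingError` text produced by `TypeChecker` is not shown in this PR, so it is not reproduced here.

```python
import unittest


def check_flags(flags):
    # Hypothetical stand-in for the runtime check in the diff.
    if flags != 0:
        raise ValueError('Parameter flags can be only 0.')


class FullMessageExample(unittest.TestCase):
    def test_flags_message(self):
        with self.assertRaises(ValueError) as raises:
            check_flags(1)
        # Compare the whole message rather than a fragment of it.
        self.assertEqual(str(raises.exception), 'Parameter flags can be only 0.')


suite = unittest.defaultTestLoader.loadTestsFromTestCase(FullMessageExample)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Comparing the complete string makes the test fail if the message wording changes, whereas `assertIn` with a prefix would still pass.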
Review comment: I don't see usage of `Number` and `Set`.
Review comment: Again, I don't see usage of `Number` and `Set`.