[Fix](pyudf) Fix some pyudf issues by linrrzqqq · Pull Request #62249 · apache/doris

linrrzqqq · 2026-04-08T19:00:55Z

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

After deleting the cache file for the corresponding UDF, using it again will cause an error.

</failure>
  </testcase>
  <testcase classname="stress.python_udf" name="python_udf_module_cache_isolation" time="100.051">
    <failure message="flowId=stress/python_udf/udf_module_cache_isolation.groovy#python_udf_module_cache_isolation" type="SuiteFailure">    logger.info("Scene 9a: function works before cache deletion, result=${r9a[0][0]}")

    // 删除所有 BE 上的 UDF cache 文件（只删文件，保留目录结构）
    aliveBEs.each { be -&gt;
        execSSH(be.Host, "find /mnt/hdd01/PERFORMANCE_ENV/be/lib/udf/ -mindepth 2 -type f -delete 2&gt;/dev/null; true")
    }
    logger.info("Scene 9b: UDF cache files deleted on all BEs")

    // 再次调用，BE 应自动重新下载 zip 并执行
    def r9b = sql "SELECT cache_recover_${runId}(10)"
^^^^^^^^^^^^^^^^^^^^^^^^^^ERROR LINE^^^^^^^^^^^^^^^^^^^^^^^^^^
    assert r9b[0][0] == 787, "cache recover after delete: expected 787, got ${r9b[0][0]}"
    logger.info("Scene 9c: function works after cache deletion, result=${r9b[0][0]}")

    // 批量验证恢复后的一致性
    def r9c = sql "SELECT cache_recover_${runId}(val) FROM t_cache_batch WHERE id &lt;= 5 ORDER BY id"
    for (int i = 0; i &lt; 5; i++) {
        def expected = (i + 1) + 777
        assert r9c[i][0] == expected, "cache recover batch[${i}]: expected ${expected}, got ${r9c[i][0]}"
    }
    logger.info("Scene 9 PASS: cache auto-recovery after manual deletion — all results consistent")

java.sql.SQLException: errCode = 2, detailMessage = (172.20.49.73)[RUNTIME_ERROR]Flight stream finish failed with message: Directory contains no Python (.py) files: /mnt/hdd01/PERFORMANCE_ENV/be/lib/udf/5/1775636895621.2cfe161cc9c6f0b2ca5e7dcb6286e924.cache_recover. Detail: Python exception: Traceback (most recent call last):
  File "pyarrow/_flight.pyx", line 2315, in pyarrow._flight._do_exchange
  File "/mnt/hdd01/PERFORMANCE_ENV/be/plugins/python_udf/python_server.py", line 2493, in do_exchange
    self._handle_exchange_udf(python_udf_meta, reader, writer)
  File "/mnt/hdd01/PERFORMANCE_ENV/be/plugins/python_udf/python_server.py", line 2005, in _handle_exchange_udf
    loader = UDFLoaderFactory.get_loader(python_udf_meta)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/hdd01/PERFORMANCE_ENV/be/plugins/python_udf/python_server.py", line 1161, in get_loader
    if UDFLoaderFactory.check_module(location):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/hdd01/PERFORMANCE_ENV/be/plugins/python_udf/python_server.py", line 1196, in check_module
    raise ValueError(
ValueError: Directory contains no Python (.py) files: /mnt/hdd01/PERFORMANCE_ENV/be/lib/udf/5/1775636895621.2cfe161cc9c6f0b2ca5e7dcb6286e924.cache_recover

The error in py code cannot be correctly passed through

Doris> CREATE FUNCTION py_err_stats_test(INT)
    -> RETURNS INT
    -> PROPERTIES (
    ->     "type"="PYTHON_UDF",
    ->     "symbol"="evaluate",
    ->     "runtime_version"="3.12.11",
    ->     "always_nullable"="true"
    -> ) AS $$
    -> def evaluate(x):
    ->     raise TypeError("consistent_error_42")
    -> $$;
Query OK, 0 rows affected (0.004 sec)

-- Expected Error
Doris> SELECT py_err_stats_test(1);
+----------------------+
| py_err_stats_test(1) |
+----------------------+
|                 NULL |
+----------------------+
1 row in set (0.070 sec)

The python process cannot execute correctly after it hangs up.

 </testcase>
  <testcase classname="stress.python_udf" name="python_udf_cross_feature_import_storage" time="0.430">
    <failure message="flowId=stress/python_udf/cross_feature_import_storage.groovy#python_udf_cross_feature_import_storage" type="SuiteFailure">    def pyVer = System.getenv('python_version') ?: '3.12.11'
    def db = context.config.getDbNameByFile(context.file)
    sql "DROP DATABASE IF EXISTS ${db} FORCE"
    sql "CREATE DATABASE IF NOT EXISTS ${db}"
    sql "USE ${db}"

    if (!python_udf_require_olap_table("python_udf_cross_feature_import_storage.src_t")) {
        return
    }
    if (!python_udf_require_inline_function("python_udf_cross_feature_import_storage.inline_runtime", pyVer)) {
^^^^^^^^^^^^^^^^^^^^^^^^^^ERROR LINE^^^^^^^^^^^^^^^^^^^^^^^^^^
        return
    }

    // ===== 1. INSERT INTO ... SELECT 中使用 UDF =====
    sql "DROP TABLE IF EXISTS src_t"
    sql """
        CREATE TABLE src_t (id INT, v INT)
        DISTRIBUTED BY HASH(id) BUCKETS 1
        PROPERTIES("replication_num"="1")
    """

java.lang.IllegalStateException: PYTHON_UDF_BLOCKED suite=python_udf_cross_feature_import_storage scenario=python_udf_cross_feature_import_storage.inline_runtime reason=inline probe failed. reason=errCode = 2, detailMessage = (172.20.49.73)[INTERNAL_ERROR]IOError: Flight stream finish failed with gRPC code 14, message: failed to connect to all addresses; last error: UNKNOWN: unix:/tmp/doris_python_udf_55799.sock: Connection refused

Support parameterless calls

  </testcase>
  <testcase classname="stress.python_udf" name="python_udf_thirdparty_packages" time="6.084">
    <failure message="flowId=stress/python_udf/udf_thirdparty_packages.groovy#python_udf_thirdparty_packages" type="SuiteFailure">            versions["pandas"] = "not_found"
        try:
            import jieba
            versions["jieba"] = jieba.__version__
        except:
            versions["jieba"] = "not_found"
        return json.dumps(versions)
    \$\$
    """
    def rVer = sql("SELECT py_pkg_versions()")
^^^^^^^^^^^^^^^^^^^^^^^^^^ERROR LINE^^^^^^^^^^^^^^^^^^^^^^^^^^
    assert rVer[0][0] != null
    def verJson = new groovy.json.JsonSlurper().parseText(rVer[0][0].toString())
    logger.info("Package versions: python=${verJson.python}, numpy=${verJson.numpy}, pandas=${verJson.pandas}, jieba=${verJson.jieba}")
    // 验证包确实已安装（非 not_found）
    assert verJson.numpy != 'not_found', "numpy should be installed"
    assert verJson.pandas != 'not_found', "pandas should be installed"
    assert verJson.jieba != 'not_found', "jieba should be installed"
    logger.info("Package version verification: PASS")
    logger.warn("Package version info: skipped — PRODUCT LIMITATION: Python UDF does not support zero-argument functions")


java.sql.SQLException: errCode = 2, detailMessage = (172.20.49.73)[INVALID_ARGUMENT]Python UDF input types is empty

Release note

None

Check List (For Author)

Test
- Regression test
- Unit Test
- Manual test (add detailed scripts or steps below)
- No need to test or manual test. Explain why:
  - This is a refactor/code format and no logic has been changed.
  - Previous test can cover this change.
  - No code files have been changed.
  - Other reason
Behavior changed:
- No.
- Yes.
Does this need documentation?
- No.
- Yes.

Check List (For Reviewer who merge this PR)

Confirm the release note
Confirm test cases
Confirm document
Add branch pick label

hello-stephen · 2026-04-08T19:01:02Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

linrrzqqq · 2026-04-08T19:21:58Z

run buildall

hello-stephen · 2026-04-08T22:23:43Z

BE Regression && UT Coverage Report

Increment line coverage 100.00% (67/67) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	71.85% (26719/37188)
Line Coverage	54.84% (283032/516076)
Region Coverage	51.89% (234450/451828)
Branch Coverage	53.38% (101447/190046)

linrrzqqq · 2026-04-12T19:29:26Z

run buildall

hello-stephen · 2026-04-12T21:42:16Z

BE Regression && UT Coverage Report

Increment line coverage 86.54% (90/104) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	73.73% (27428/37202)
Line Coverage	57.42% (296481/516358)
Region Coverage	54.73% (247442/452095)
Branch Coverage	56.35% (107160/190175)

…ache files also check .zip

fix p0 fix p0 fix p0

linrrzqqq · 2026-04-13T09:56:51Z

run buildall

hello-stephen · 2026-04-13T13:06:48Z

BE Regression && UT Coverage Report

Increment line coverage 84.80% (106/125) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	73.64% (27388/37192)
Line Coverage	57.28% (295713/516250)
Region Coverage	54.36% (245718/452003)
Branch Coverage	56.07% (106590/190113)

linrrzqqq force-pushed the pyudf-test branch from 2d69b21 to a2ee82f Compare April 8, 2026 19:14

linrrzqqq force-pushed the pyudf-test branch from a2ee82f to 6c5b9ce Compare April 12, 2026 19:21

linrrzqqq added 3 commits April 13, 2026 17:42

[Fix](pyudf) Fix functions throw errors after manually deleting udf c…

650d37f

…ache files also check .zip

[Fix](pyudf) Fix python udf error propagation

c8e46d7

fix p0 fix p0 fix p0

[Fix](pyudf) only select alive python servers in get_process

aad73a9

linrrzqqq force-pushed the pyudf-test branch from 6c5b9ce to 25ea1db Compare April 13, 2026 09:51

[Enhancement](pyudf) Support empty arg pyudf && udtf

1f85b7a

linrrzqqq force-pushed the pyudf-test branch from 25ea1db to 1f85b7a Compare April 13, 2026 09:54

linrrzqqq changed the title ~~[Fix](pyudf) Fix functions throw errors after manually deleting udf cache files~~ [Fix](pyudf) Fix some pyudf issues Apr 14, 2026

linrrzqqq added 2 commits April 17, 2026 12:47

[Fix](pyudf) fix udf mixed vector fallback

a94f89c

[Fix](udf) mark udf nondeterministic

8f831c7

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Fix](pyudf) Fix some pyudf issues#62249

[Fix](pyudf) Fix some pyudf issues#62249
linrrzqqq wants to merge 6 commits intoapache:masterfrom
linrrzqqq:pyudf-test

linrrzqqq commented Apr 8, 2026 •

edited

Loading

Uh oh!

hello-stephen commented Apr 8, 2026

Uh oh!

linrrzqqq commented Apr 8, 2026

Uh oh!

hello-stephen commented Apr 8, 2026

Uh oh!

linrrzqqq commented Apr 12, 2026

Uh oh!

hello-stephen commented Apr 12, 2026

Uh oh!

linrrzqqq commented Apr 13, 2026

Uh oh!

hello-stephen commented Apr 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

linrrzqqq commented Apr 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What problem does this PR solve?

Release note

Check List (For Author)

Check List (For Reviewer who merge this PR)

Uh oh!

hello-stephen commented Apr 8, 2026

Uh oh!

linrrzqqq commented Apr 8, 2026

Uh oh!

hello-stephen commented Apr 8, 2026

BE Regression && UT Coverage Report

Uh oh!

linrrzqqq commented Apr 12, 2026

Uh oh!

hello-stephen commented Apr 12, 2026

BE Regression && UT Coverage Report

Uh oh!

linrrzqqq commented Apr 13, 2026

Uh oh!

hello-stephen commented Apr 13, 2026

BE Regression && UT Coverage Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

linrrzqqq commented Apr 8, 2026 •

edited

Loading