Skip to content

Fix TSan data race with FFI tests#7244

Merged
myrrc merged 1 commit intodevelopfrom
myrrc/ffi-sink-race
Apr 9, 2026
Merged

Fix TSan data race with FFI tests#7244
myrrc merged 1 commit intodevelopfrom
myrrc/ffi-sink-race

Conversation

@myrrc
Copy link
Copy Markdown
Contributor

@myrrc myrrc commented Apr 1, 2026

There are TSan data race reports like

https://github.com/vortex-data/vortex/actions/runs/23804465237/job/69373850775?pr=7018

Significant parts of stack trace

Read of size 8 at 0xffff888084a0 by thread T40:

#8 vx_array_sink_open_file::{closure#0}::{closure#0} vortex-ffi/src/sink.rs:75
#9 <vortex_io::runtime::handle::Handle>::spawn::<vx_array_sink_open_file::{closure#0}::{closure#0}, ...> vortex-io/src/runtime/handle.rs:73
#31 vx_array_sink_close::{closure#0} vortex-ffi/src/sink.rs:110
#33 vx_array_sink_close vortex-ffi/src/sink.rs:106
#34 test_sink_basic_workflow vortex-ffi/src/sink.rs:166

Previous write of size 8 at 0xffff888084a0 by thread T42:

#28 <vortex_io::runtime::current::CurrentThreadRuntime>::block_on::<vx_array_sink_close::{closure#0}::{closure#0}, ...> vortex-io/src/runtime/current.rs:102
#29 vx_array_sink_close::{closure#0} vortex-ffi/src/sink.rs:110
#31 vx_array_sink_close vortex-ffi/src/sink.rs:106
#32 test_sink_multiple_arrays

Location is heap block of size 112 at 0xffff88808490 allocated by thread T42:

#13 <vortex_file::writer::VortexWriteOptions>::write::<&mut async_fs::File, ...>::{closure#0} vortex-file/src/writer.rs:137
#14 vx_array_sink_open_file::{closure#0}::{closure#0} vortex-ffi/src/sink.rs:75
#15 <vortex_io::runtime::handle::Handle>::spawn::<vx_array_sink_open_file::{closure#0}::{closure#0}, ...>::{closure#0} vortex-io/src/runtime/handle.rs:73
#37 vx_array_sink_close::{closure#0} vortex-ffi/src/sink.rs:110
#40 test_sink_multiple_arrays vortex-ffi/src/sink.rs:208

which happens in sink FFI tests but can be narrowed down to

fn test(path: String) {
    let session = VortexSession::default().with_handle(RUNTIME.handle());
    let dtype = DType::Primitive(vortex::dtype::PType::I32, false.into());
    let (mut sink, rx) = mpsc::channel(32);
    let array_stream = ArrayStreamAdapter::new(dtype.clone(), rx.into_stream());

    let array = PrimitiveArray::new(buffer![3i32], Validity::NonNullable);
    RUNTIME.block_on(async move {
        sink.send(Ok(array.into())).await.unwrap();
        sink.close_channel();

        let mut file = File::create(path).await.unwrap();
        session
            .write_options()
            .write(&mut file, array_stream)
            .await.unwrap();
        file.sync_all()
            .await
            .map_err(|e| vortex_err!("sync error: {e}")).unwrap();
    });
}
#[test] fn test_f1() { test("1".to_string()); }
#[test] fn test_f2() { test("2".to_string()); }

The underlying issue is that RUNTIME in FFI is a smol::Executor, and as we don't
have a background thread pool, and we use CurrentThreadRuntime, it runs on host
program's threads, in our case, cargo test's. If t1 and t2 are scheduled to
run on different threads, this triggers TSan in vortex-io's spawn() (case 1,
oneshot) and in vortex-file write_internal (case 2, kanal::bounded_async) which
don't protect multiple different thread consumers to read the state.

The solution here is to benchmark replacing oneshot with futures-rs oneshot
channel

@myrrc myrrc added the changelog/fix A bug fix label Apr 1, 2026
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq bot commented Apr 1, 2026

Merging this PR will not alter performance

✅ 1122 untouched benchmarks
⏩ 1530 skipped benchmarks1


Comparing myrrc/ffi-sink-race (13a731b) with develop (24736e1)

Open in CodSpeed

Footnotes

  1. 1530 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@myrrc
Copy link
Copy Markdown
Contributor Author

myrrc commented Apr 1, 2026

@myrrc
Copy link
Copy Markdown
Contributor Author

myrrc commented Apr 1, 2026

The second traceback is about the (possibly) same issue in kanal while writing a file.
I suspect the biggest issue is our usage of smol::Executor without a background thread pool with our FFI.

@myrrc myrrc force-pushed the myrrc/ffi-sink-race branch from 044dfff to ade6f6c Compare April 9, 2026 09:08
@myrrc myrrc requested a review from joseph-isaacs April 9, 2026 09:08
@myrrc myrrc marked this pull request as ready for review April 9, 2026 09:08
@myrrc
Copy link
Copy Markdown
Contributor Author

myrrc commented Apr 9, 2026

I believe it's a false positive in the end, muting because the CI is flaky

Comment thread vortex-ffi/tsan_suppressions.txt Outdated
Comment thread vortex-ffi/README.md
@myrrc myrrc force-pushed the myrrc/ffi-sink-race branch from ade6f6c to 7b6cd59 Compare April 9, 2026 09:19
@0ax1
Copy link
Copy Markdown
Contributor

0ax1 commented Apr 9, 2026

#[sanitize(thread = "off")] would be great but is nightly only.

@myrrc
Copy link
Copy Markdown
Contributor Author

myrrc commented Apr 9, 2026

I've tried ignoring sanitization directly on the function, but it still triggers TSan, let's keep the suppression list:

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 44b70f68e..6ca9d1c7a 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -404,7 +404,7 @@ jobs:
       ASAN_SYMBOLIZER_PATH: "/usr/bin/llvm-symbolizer"
       MSAN_OPTIONS: "symbolize=1"
       MSAN_SYMBOLIZER_PATH: "/usr/bin/llvm-symbolizer"
-      TSAN_OPTIONS: "symbolize=1:suppressions=$GITHUB_WORKSPACE/vortex-ffi/tsan_suppressions.txt"
+      TSAN_OPTIONS: "symbolize=1"
       TSAN_SYMBOLIZER_PATH: "/usr/bin/llvm-symbolizer"
       VORTEX_SKIP_SLOW_TESTS: "1"
       # -Cunsafe-allow-abi-mismatch=sanitizer: libraries like compiler_builtins
diff --git a/vortex-ffi/tsan_suppressions.txt b/vortex-ffi/tsan_suppressions.txt
deleted file mode 100644
index be979b9de..000000000
--- a/vortex-ffi/tsan_suppressions.txt
+++ /dev/null
@@ -1,4 +0,0 @@
-# https://github.com/vortex-data/vortex/pull/7244
-# This is likely a false positive in TSan for oneshot::channel and kanal
-# where ordering is correct but uses an explicit fence and not relaxed load
-race:oneshot-*/src/channel.rs
diff --git a/vortex-io/build.rs b/vortex-io/build.rs
new file mode 100644
index 000000000..59dbbc366
--- /dev/null
+++ b/vortex-io/build.rs
@@ -0,0 +1,13 @@
+use std::process::Command;
+
+fn main() {
+    println!("cargo:rerun-if-changed=build.rs");
+    let is_nightly = Command::new("rustc")
+        .arg("-V")
+        .output()
+        .map(|output| String::from_utf8_lossy(&output.stdout).contains("nightly"))
+        .unwrap_or(false);
+    if is_nightly {
+        println!("cargo::rustc-cfg=vortex_nightly");
+    }
+}
diff --git a/vortex-io/src/lib.rs b/vortex-io/src/lib.rs
index f30f64ff3..8f0a584ff 100644
--- a/vortex-io/src/lib.rs
+++ b/vortex-io/src/lib.rs
@@ -1,5 +1,6 @@
 // SPDX-License-Identifier: Apache-2.0
 // SPDX-FileCopyrightText: Copyright the Vortex contributors
+#![cfg_attr(vortex_nightly, feature(sanitize))]
 
 //! Core traits and implementations for asynchronous IO.
 //!
diff --git a/vortex-io/src/runtime/handle.rs b/vortex-io/src/runtime/handle.rs
index f04b897b2..25707e55c 100644
--- a/vortex-io/src/runtime/handle.rs
+++ b/vortex-io/src/runtime/handle.rs
@@ -1,6 +1,5 @@
 // SPDX-License-Identifier: Apache-2.0
 // SPDX-FileCopyrightText: Copyright the Vortex contributors
-
 use std::pin::Pin;
 use std::sync::Arc;
 use std::sync::Weak;
@@ -165,6 +164,10 @@ impl<T> Task<T> {
 impl<T> Future for Task<T> {
     type Output = T;
 
+    // https://github.com/vortex-data/vortex/pull/7244
+    // This is likely a false positive in TSan for oneshot::channel and kanal
+    // where ordering is correct but uses an explicit fence and not relaxed load
+    #[cfg_attr(vortex_nightly, sanitize(thread = "off"))]
     #[allow(clippy::panic)]
     fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
         match ready!(self.recv.poll_unpin(cx)) {

@myrrc myrrc force-pushed the myrrc/ffi-sink-race branch from 7b6cd59 to f14f0da Compare April 9, 2026 09:40
Signed-off-by: Mikhail Kot <to@myrrc.dev>
@myrrc myrrc force-pushed the myrrc/ffi-sink-race branch from f14f0da to 13a731b Compare April 9, 2026 09:44
@myrrc myrrc enabled auto-merge (squash) April 9, 2026 09:45
@myrrc myrrc merged commit bab7798 into develop Apr 9, 2026
58 checks passed
@myrrc myrrc deleted the myrrc/ffi-sink-race branch April 9, 2026 09:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog/fix A bug fix

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants