
Pub/Sub: enable parallel writes to GCS in Pub/Sub Dataflow example#5547

Merged: anguillanneuf merged 7 commits into master from pubsub-gcs on Mar 23, 2021


@anguillanneuf
Member

@anguillanneuf anguillanneuf commented Mar 19, 2021

Fixes #5441

Used this opportunity to update comments and clean up this example too.

I tried a Splittable DoFn, but the documentation on parallel writes is very slim, and the Python support lags behind Java's.
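The limitation in #5441 comes from keying every element with the same constant before grouping: a GroupByKey emits one group per key per window, and each group is handed to a single processing call, so one worker ends up writing the whole window. Below is a minimal, Beam-free sketch of the difference, assuming a random shard key in `[0, num_shards)` as the alternative. `group_by_key`, `add_fixed_key`, `add_shard_key`, `num_shards`, and the message data are hypothetical stand-ins, not the sample's actual code.

```python
import random
from collections import defaultdict


def group_by_key(pairs):
    """Collect values sharing a key, roughly what GroupByKey does per window."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def add_fixed_key(messages):
    # Every element gets the same constant key, so the window collapses
    # into a single group (and a single downstream write).
    return [(None, message) for message in messages]


def add_shard_key(messages, num_shards, rng):
    # Hypothetical alternative: a random shard id in [0, num_shards), so the
    # window splits into up to num_shards groups that can be written in parallel.
    return [(rng.randint(0, num_shards - 1), message) for message in messages]


# One window's worth of made-up messages.
messages = [f"message-{i}" for i in range(1000)]

fixed = group_by_key(add_fixed_key(messages))
sharded = group_by_key(add_shard_key(messages, num_shards=5, rng=random.Random(0)))

print("groups with a fixed key:", len(fixed))
print("group sizes with shard keys:", {k: len(v) for k, v in sorted(sharded.items())})
```

Raising `num_shards` raises the ceiling on write parallelism but also the number of output files per window; where to set it is a tuning choice, not something this sketch settles.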

@anguillanneuf anguillanneuf requested review from a team and hongalex as code owners March 19, 2021 17:00
@google-cla google-cla Bot added the cla: yes This human has signed the Contributor License Agreement. label Mar 19, 2021
@product-auto-label product-auto-label Bot added the samples Issues that are directly related to samples. label Mar 19, 2021

@davidcavazos davidcavazos left a comment


It's looking great, thank you!

Comment thread pubsub/streaming-analytics/PubSubToGCS.py Outdated
Comment thread pubsub/streaming-analytics/PubSubToGCS.py Outdated
Comment thread pubsub/streaming-analytics/PubSubToGCS.py Outdated
Comment thread pubsub/streaming-analytics/PubSubToGCS.py Outdated
Comment thread pubsub/streaming-analytics/PubSubToGCS.py Outdated
Comment thread pubsub/streaming-analytics/PubSubToGCS.py Outdated
Comment thread pubsub/streaming-analytics/PubSubToGCS.py
Comment thread pubsub/streaming-analytics/PubSubToGCS.py
Comment thread pubsub/streaming-analytics/README.md Outdated
Copy link
Copy Markdown

@davidcavazos davidcavazos left a comment


Thanks for the explanation, I have some minor comments, but overall LGTM!

Comment thread pubsub/streaming-analytics/PubSubToGCS.py Outdated
Comment thread pubsub/streaming-analytics/PubSubToGCS.py
Comment thread pubsub/streaming-analytics/README.md Outdated
Comment thread pubsub/streaming-analytics/README.md Outdated
Comment thread pubsub/streaming-analytics/README.md Outdated
Comment thread pubsub/streaming-analytics/README.md Outdated
Comment thread pubsub/streaming-analytics/README.md Outdated
Comment thread pubsub/streaming-analytics/README.md Outdated
@davidcavazos

Looks great! Thanks for these improvements, LGTM!

@anguillanneuf anguillanneuf merged commit 97e7e82 into master Mar 23, 2021
@anguillanneuf anguillanneuf deleted the pubsub-gcs branch March 23, 2021 16:26

Labels

cla: yes (This human has signed the Contributor License Agreement.); samples (Issues that are directly related to samples.)


Development

Successfully merging this pull request may close these issues.

Limited Parallelism due to fixed key within GroupWindowsIntoBatches
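Removing the fixed key only pays off if each shard writes to its own object; if every parallel writer used the same GCS path, each write would replace the previous object. Below is a minimal sketch of one naming scheme that builds file names from the window bounds and the shard id, assuming a hypothetical `shard_filename` helper and a made-up `gs://my-bucket/output` prefix. The merged sample may name its files differently.

```python
from datetime import datetime, timezone


def shard_filename(output_path, window_start, window_end, shard_id):
    """Return an object name unique to one (window, shard) pair.

    Hypothetical scheme: <output_path>-<HH:MM start>-<HH:MM end>-<shard_id>.
    """
    ts_format = "%H:%M"
    return "-".join(
        [
            output_path,
            window_start.strftime(ts_format),
            window_end.strftime(ts_format),
            str(shard_id),
        ]
    )


# A made-up two-minute window and five shards.
start = datetime(2021, 3, 19, 17, 0, tzinfo=timezone.utc)
end = datetime(2021, 3, 19, 17, 2, tzinfo=timezone.utc)
names = [shard_filename("gs://my-bucket/output", start, end, shard) for shard in range(5)]

for name in names:
    print(name)
```

Because `%H:%M` repeats every day, a long-running pipeline would likely want the date in the name as well; the sketch leaves that choice open.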

4 participants