
[WIP] A mechanism to batch multiple DPL devices into one #8529

Closed

sawenzel wants to merge 1 commit into AliceO2Group:dev from sawenzel:swenzel/dplspecmerger


@sawenzel
Collaborator

@sawenzel sawenzel commented Apr 8, 2022

Prototype showing a mechanism by which multiple DPL specs can be combined into one.

The goal is to use such mechanisms to group multiple
source "data-reader" devices into one service without
code duplication.

The expected benefits are fewer processes, lower memory usage
and fewer system-load spikes in GRID processing.

The new example workflow can be started with or without
the option "--combined-source", which switches between merged and unmerged source
devices.
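
The PR diff is not reproduced here, so the following is only a minimal, self-contained C++ sketch of the general idea, assuming a toy spec made of a name, a list of output labels and a per-cycle callback. `ToySpec` and `combineSpecs` are hypothetical names, not the real DPL spec types or API: the merged spec advertises the union of all outputs and invokes each original callback in turn, so one process stands in for several.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a DPL device spec (NOT the real
// DPL type). Outputs are plain labels such as "ITS/CLUSTERS/0".
struct ToySpec {
  std::string name;
  std::vector<std::string> outputs;
  // Called once per processing cycle; appends what it "produced" to sink.
  std::function<void(std::vector<std::string>&)> process;
};

// Fold several source specs into one: the merged spec declares the union
// of all outputs and, when run, calls each original callback in order.
ToySpec combineSpecs(std::string name, std::vector<ToySpec> specs) {
  assert(!specs.empty() && "nothing to combine");
  ToySpec merged;
  merged.name = std::move(name);
  for (const auto& s : specs) {
    merged.outputs.insert(merged.outputs.end(), s.outputs.begin(), s.outputs.end());
  }
  // Move the original specs into the closure so their callbacks stay alive
  // for as long as the merged spec exists.
  merged.process = [parts = std::move(specs)](std::vector<std::string>& sink) {
    for (const auto& p : parts) {
      p.process(sink);
    }
  };
  return merged;
}
```

With such a helper, a switch like "--combined-source" would only have to decide whether the individual specs or their combination are added to the workflow; the device code itself stays unchanged.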

@sawenzel sawenzel requested a review from a team as a code owner April 8, 2022 13:46
@sawenzel sawenzel marked this pull request as draft April 8, 2022 13:47
@sawenzel
Collaborator Author

sawenzel commented Apr 8, 2022

@shahor02 @ktf @pzhristov @davidrohr: We can use something generic like this to reduce the number of source devices/processes and thereby lower the burden on GRID MC processing. Side benefits would be less ROOT overhead and lower memory usage.

Early feedback, comments welcome.

@ktf
Member

ktf commented Apr 8, 2022

This is fine, but why not simply have a single device do the reading in the first place? That works well in analysis. Why can't we do something similar for reco?

Collaborator

@davidrohr davidrohr left a comment


I like this approach, particularly since it is very lightweight.
One thing that comes to mind is whether we could add checks on inputs / outputs,
e.g. what happens if there are overlapping data descriptions / subspecs?
For inputs it would probably just work; for outputs I am not so sure.
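
One possible check for the output case is to refuse a merge whenever two of the devices declare the same output. The sketch below assumes each output is identified by an (origin, description, subspec) triplet; `OutputKey` and `findOutputClashes` are hypothetical helpers written for illustration, not existing DPL functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical output identifier modelled on the (origin, description,
// subspec) triplet; not the real DPL output type.
struct OutputKey {
  std::string origin;
  std::string description;
  std::uint32_t subSpec;
  bool operator<(const OutputKey& o) const {
    return std::tie(origin, description, subSpec) < std::tie(o.origin, o.description, o.subSpec);
  }
};

// Returns every output that is declared by more than one of the devices to
// be merged. outputsPerDevice[i] lists the outputs of device i.
std::vector<OutputKey> findOutputClashes(const std::vector<std::vector<OutputKey>>& outputsPerDevice) {
  std::map<OutputKey, std::size_t> owner;  // output -> first device declaring it
  std::vector<OutputKey> clashes;
  for (std::size_t dev = 0; dev < outputsPerDevice.size(); ++dev) {
    for (const auto& key : outputsPerDevice[dev]) {
      auto res = owner.emplace(key, dev);
      // Insertion failed and the existing owner is a different device: clash.
      if (!res.second && res.first->second != dev) {
        clashes.push_back(key);
      }
    }
  }
  return clashes;
}
```

A device that lists the same output twice is not reported; only clashes between different devices are, and an empty result would mean the outputs can be merged without ambiguity.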

@ktf
Member

ktf commented Apr 11, 2022

There is also the issue of what happens when there is a dependency between two devices. IMHO, the right place to do this (even automatically) is after the topological sort. Once the topology is known, we can group together all devices which belong to the same executable and do not overlap in inputs / outputs. This would allow further optimisations, e.g. a long chain of devices could also be multiplexed onto a single device.
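
The grouping after the topological sort could look roughly like the following sketch. It uses a layered variant of Kahn's algorithm: each device is placed in the first round in which all of its producers have already been processed, so devices in the same round cannot depend on each other, directly or transitively. `Device` and `groupIndependentDevices` are hypothetical, edges are assumed to be (producer, consumer) index pairs, and only the dependency and same-executable criteria are applied; input/output overlap would need a separate check.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical device record: its name and the executable that defines it.
struct Device {
  std::string name;
  std::string executable;
};

// Layered topological sort (Kahn's algorithm, one layer per round), then
// group devices that share both a layer and an executable.
std::vector<std::vector<std::string>> groupIndependentDevices(
    const std::vector<Device>& devices,
    const std::vector<std::pair<std::size_t, std::size_t>>& edges) {  // producer -> consumer
  const std::size_t n = devices.size();
  std::vector<std::vector<std::size_t>> consumers(n);
  std::vector<std::size_t> inDegree(n, 0);
  for (const auto& e : edges) {
    consumers[e.first].push_back(e.second);
    ++inDegree[e.second];
  }
  std::vector<std::size_t> layer;  // devices with no pending producers
  for (std::size_t i = 0; i < n; ++i) {
    if (inDegree[i] == 0) layer.push_back(i);
  }
  std::vector<std::vector<std::string>> groups;
  std::size_t visited = 0;
  while (!layer.empty()) {
    std::map<std::string, std::vector<std::string>> byExecutable;
    std::vector<std::size_t> next;
    for (std::size_t d : layer) {
      ++visited;
      byExecutable[devices[d].executable].push_back(devices[d].name);
      for (std::size_t c : consumers[d]) {
        if (--inDegree[c] == 0) next.push_back(c);  // all producers done
      }
    }
    for (auto& kv : byExecutable) groups.push_back(std::move(kv.second));
    layer = std::move(next);
  }
  assert(visited == n && "workflow graph has a cycle");
  return groups;
}
```

Grouping per layer is only one simple policy; it does not cover the multiplexing of long device chains, which would need a different grouping rule.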

That said, what really limits this kind of approach is the fact that we have one executable per workflow. In the long term it would be much better if defineDataProcessing() were actually the callback of some plugin, so that we could multiplex things without having to worry about which executable they live in.

@davidrohr
Collaborator

But as for the dependency, I think it would simply be the user's responsibility not to combine dependent devices.

Regarding the topological sort, I agree that it makes most sense with some kind of plugin system for defining the workflows. But I think we are still far from having that, and we have many other important issues to solve.

And since this PR is quite small and straightforward, I would use @sawenzel's approach for now.

@ktf
Member

ktf commented Apr 11, 2022

As I said, this is fine with me, since it simply uses the API that already exists for defining the workflow. However, I would prefer it to be done at a level which does not involve the tasks, e.g. using something like:

#8548

ktf added a commit to ktf/AliceO2 that referenced this pull request Apr 21, 2022
Take the opportunity to acknowledge the idea of @sawenzel in
AliceO2Group#8529 for the
associated development.
@sawenzel
Collaborator Author

Closing this one after #8548. A concrete PR in which this technology is used in digitization and reconstruction will follow.

@sawenzel sawenzel closed this Apr 23, 2022
@sawenzel sawenzel deleted the swenzel/dplspecmerger branch January 13, 2023 12:14