[WIP] A mechanism to batch multiple DPL devices into one #8529
Prototype showing a mechanism for combining multiple DPL specs into one. The goal is to use such a mechanism to group multiple source "data-reader" devices into one service without code duplication. The expected benefit is a reduced number of processes, less memory, and fewer system spikes for GRID processing. The new example workflow can be started with or without the option "--combined-source", which switches between merged and unmerged source devices.
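To illustrate the idea described above, here is a minimal, self-contained sketch. It does not use the real DPL API: `MiniSpec`, `combine`, and `buildWorkflow` are hypothetical stand-ins invented for this example, and the actual prototype in the PR may be structured differently.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a device spec (NOT the O2/DPL type).
struct MiniSpec {
  std::string name;
  std::vector<std::string> outputs;  // labels of the data this device publishes
  std::function<void()> process;     // work done per processing step
};

// Merge several source specs into one: the combined spec advertises the
// union of all outputs and runs every original task in sequence.
MiniSpec combine(const std::string& name, const std::vector<MiniSpec>& specs) {
  MiniSpec combined;
  combined.name = name;
  std::vector<std::function<void()>> tasks;
  for (const auto& spec : specs) {
    combined.outputs.insert(combined.outputs.end(), spec.outputs.begin(), spec.outputs.end());
    if (spec.process) {
      tasks.push_back(spec.process);
    }
  }
  combined.process = [tasks]() {
    for (const auto& task : tasks) {
      task();
    }
  };
  return combined;
}

// Mirrors the on/off switch described for "--combined-source":
// either keep the sources separate or replace them by one merged device.
std::vector<MiniSpec> buildWorkflow(const std::vector<MiniSpec>& sources, bool combinedSource) {
  if (!combinedSource) {
    return sources;
  }
  return {combine("combined-source", sources)};
}
```

With the flag off, the workflow keeps one device per reader; with it on, a single device carries all reader tasks, which is where the reduction in process count would come from.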
@shahor02 @ktf @pzhristov @davidrohr: We can use something generic like this to reduce the number of source devices/processes and thereby the burden for GRID MC processing. Side benefits will be less ROOT and less memory. Early feedback and comments are welcome.
This is fine, but why not simply have a single device do the reading in the first place? That works well in analysis. Why can't we do something similar for reco as well?
davidrohr left a comment
I like this approach, particularly since it is very lightweight.
One comment that comes to my mind is whether we could add any checks for inputs / outputs.
E.g. what happens if there are overlapping data descriptions / subspecs?
For the input it would probably just work, for the output I am not so sure.
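A check of the kind asked about here could look roughly like the following sketch. It is an assumption-laden illustration, not existing framework code: `OutputKey` and `findDuplicateOutputs` are hypothetical names, and an output is modelled simply as an (origin, description, subspec) triple.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical identifier for one output: origin, data description, subspec.
struct OutputKey {
  std::string origin;
  std::string description;
  std::uint32_t subSpec;
};

// Strict ordering so that OutputKey can be stored in a std::set.
inline bool operator<(const OutputKey& a, const OutputKey& b) {
  return std::tie(a.origin, a.description, a.subSpec) <
         std::tie(b.origin, b.description, b.subSpec);
}

// Returns every output that is declared more than once across the devices
// to be combined; an empty result means the outputs do not overlap.
std::vector<OutputKey> findDuplicateOutputs(const std::vector<std::vector<OutputKey>>& perDevice) {
  std::set<OutputKey> seen;
  std::vector<OutputKey> duplicates;
  for (const auto& outputs : perDevice) {
    for (const auto& key : outputs) {
      if (!seen.insert(key).second) {
        duplicates.push_back(key);
      }
    }
  }
  return duplicates;
}
```

A combining helper could refuse to merge, or at least warn, whenever this returns a non-empty list. Overlapping inputs are deliberately not flagged, following the expectation above that they would probably just work.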
There is also the issue of what happens when there is a dependency between two devices. IMHO, actually the right place where this could be done (automatically, actually) is after the topological sort. After the topology is clear, we can group together all the devices which belong to the same executable and which do not overlap in inputs / outputs. This way there would be more optimisations possible, e.g. a long chain of devices could be multiplexed on a single device as well. That said, what is really limiting this kind of approach is the fact that we have one executable per workflow. For the long term it would be much better if the defineDataProcessing() was actually the callback of some plugin, so that we could multiplex things without having to worry about the executable they live in.
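The grouping step suggested here might be sketched as below. This is only a greedy illustration under stated assumptions: `Dev`, `conflicts`, and `groupDevices` are hypothetical names, the input list is assumed to be already topologically sorted, shared inputs are treated as harmless, and only direct producer/consumer links are checked (indirect dependencies through devices of another executable are not handled).

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical, simplified device description (NOT the O2/DPL type).
struct Dev {
  std::string name;
  std::string executable;
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};

// True if the two label lists share at least one entry.
inline bool intersects(const std::vector<std::string>& a, const std::vector<std::string>& b) {
  return std::any_of(a.begin(), a.end(), [&b](const std::string& x) {
    return std::find(b.begin(), b.end(), x) != b.end();
  });
}

// Two devices conflict if they publish the same output or if one
// directly consumes what the other produces.
inline bool conflicts(const Dev& a, const Dev& b) {
  return intersects(a.outputs, b.outputs) ||
         intersects(a.outputs, b.inputs) ||
         intersects(a.inputs, b.outputs);
}

// Greedily place each device into the first group that has the same
// executable and no conflicting member; otherwise open a new group.
std::vector<std::vector<Dev>> groupDevices(const std::vector<Dev>& sorted) {
  std::vector<std::vector<Dev>> groups;
  for (const auto& dev : sorted) {
    auto fits = [&dev](const std::vector<Dev>& group) {
      return group.front().executable == dev.executable &&
             std::none_of(group.begin(), group.end(),
                          [&dev](const Dev& member) { return conflicts(member, dev); });
    };
    auto it = std::find_if(groups.begin(), groups.end(), fits);
    if (it == groups.end()) {
      groups.push_back(std::vector<Dev>{dev});
    } else {
      it->push_back(dev);
    }
  }
  return groups;
}
```

Each resulting group is a candidate for being multiplexed onto one device. Merging a producer/consumer chain into a single device, as mentioned above, would need a different rule than the conflict test used in this sketch.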
But for the dependency, I think it would simply be the responsibility of the user not to combine dependent devices. As for the topological sort, I agree that it makes most sense together with some kind of plugin system to define the workflows. But since this PR is quite small and straightforward, I would use @sawenzel's approach for now.
As I said, this is fine with me, as this simply uses what is already available as API to define the workflow. I would, however, prefer that it be done at a level which does not involve the tasks, e.g. using something like:
Take the opportunity to acknowledge the idea of @sawenzel in AliceO2Group#8529 for the associated development.
Closing this one after #8548. A concrete PR where this technology is used in digitization and reconstruction will follow.