Detectors/Calibration/README.md
Lines changed: 44 additions & 26 deletions
@@ -4,60 +4,68 @@
4
4
5
5
# Time-interval based calibration flow for O2
6
6
7
-
The calibration flow of O2 foresees that every calibration device (expected to all run on one single aggregation node) will receive the TimeFrames with calibration input from every EPN in an asynchronous way. The calibration device will have to process the TFs in time intervals (TimeSlots) which allow to create CCDB entries with the needed granularity and update frequency (defined by the calibration device itself).
7
+
The calibration flow of O2 foresees that all calibration devices are running on dedicated EPN calibration nodes. These nodes are also called aggregator nodes. A particular calibration device can only run on a single calibration node, as it is supposed to receive the complete input data from each EPN processing node.
8
+
The processing nodes can send the data either for every TF or sporadically. In the latter case the data are, for example, accumulated over 10 TFs before being sent.
9
+
In both cases the calibration input reaches the calibration nodes asynchronously. And since there are up to 250 EPNs processing TFs in two different NUMA domains, TFs which reach the calibration node consecutively might in reality be 500 TFs apart in absolute time.
10
+
This is because the processing time of individual TFs can vary quite significantly.
11
+
The calibration devices have to prepare the calibration objects for time intervals (TimeSlots) which are defined by the user.
12
+
They have a specified duration and a minimum statistics requirement.
13
+
For example, a calibration is supposed to aggregate data for 10 minutes, and a meaningful calibration requires a certain number of global tracks within these 10 minutes. If the number of global tracks is not sufficient after 10 minutes, the framework can automatically extend the interval until enough global tracks are available to create the calibration object.
8
14
9
-
## TimeSlotCalibration<Input, Container>
10
-
Each calibration device (to be run in a workflow) has to derive from `o2::calibration::TimeSlotCalibration`, which is a templated class that takes as types the Input type (i.e. the object to be processed, coming from the upstream device) and the Container type (i.e. the object that will contain the calibration data per TimeSlot). Each calibration device has to be configured with the following parameters:
11
15
12
-
`tf-per-slot` : default length of a TiemSlot in TFs (will be widened in case of too little statistics). If this is set to `o2::calibration::INFINITE_TF`, then there will be only 1 slot at a time, valid till infinity. Value 0 is reserved for a special mode: a single slot w/o explicit boundaries is
13
-
filled until the requested statistics is reached. Once `hasEnoughData` return true, the slot will be closed with really seen min/max TFs and new one will be created with lower boundary equal the end of the previous slot.
14
-
The slot duration can be also set via methods `setSlotLengthInSeconds(int s)` or `setSlotLengthInOrbits(int n)`, which will be internally converted (and rounded) to the number of TFs at the 1st TF processing (when the NHBF per orbit will be available from the GRPECS).
16
+
## TimeSlotCalibration<Container>
17
+
Each calibration device which is supposed to run on the aggregator should derive from `o2::calibration::TimeSlotCalibration<Container>`. It is a templated class. The `Container` type is the object in which the calibration data per TimeSlot will be accumulated.
15
18
16
-
`updateInterval` : to be used together with `tf-per-slot = o2::calibration::INFINITE_TF`: it allows to try to finalize the slot (and produce calibration) when the `updateInterval` has passed. Note that this is an approximation (as explained in the code) due to the fact that TFs will come asynchronously (not ordered in time).
19
+
### Configuration of the TimeSlot
17
20
18
-
`max-delay` : maximum arrival delay of a TF with respect to the most recent one processed; units in number of TimeSlots; if beyond this, the TF will be considered too old, and discarded.
19
-
If `tf-per-slot == o2::calibration::INFINITE_TF`, or `updateAtTheEndOfRunOnly == true`, its value is irrelevant.
21
+
Internally the default length of a TimeSlot is calculated in number of TFs. The default TF length is 128 orbits, but theoretically this can change. Therefore it is advised to set the TimeSlot length via the methods `setSlotLengthInSeconds(int s)` or `setSlotLengthInOrbits(int n)`, which will be internally converted (and rounded) to the corresponding number of TFs at the 1st TF processing. At that time we know the TF length from the GRPECS object.
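The conversion from seconds or orbits to a number of TFs can be sketched as follows. This is only an illustration and not the O2 implementation: the helper names `orbitsToTFs` and `secondsToTFs`, the orbit period of about 88.92 µs, and the choice to return at least 1 TF are assumptions made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Assumed LHC orbit period in microseconds (approximately 88.92 us).
constexpr double kOrbitMUS = 88.92;

// Hypothetical helper: slot length in orbits -> rounded number of TFs.
// Returning at least 1 is an assumption of this sketch, so that the result
// never collides with the special value 0.
inline std::uint32_t orbitsToTFs(std::uint32_t nOrbits, std::uint32_t orbitsPerTF)
{
  assert(orbitsPerTF > 0);
  const auto nTF = static_cast<std::uint32_t>(std::lround(static_cast<double>(nOrbits) / orbitsPerTF));
  return nTF > 0 ? nTF : 1;
}

// Hypothetical helper: slot length in seconds -> rounded number of TFs.
inline std::uint32_t secondsToTFs(int seconds, std::uint32_t orbitsPerTF)
{
  assert(seconds > 0 && orbitsPerTF > 0);
  const double tfLengthS = orbitsPerTF * kOrbitMUS * 1e-6; // duration of one TF in seconds
  const auto nTF = static_cast<std::uint32_t>(std::lround(seconds / tfLengthS));
  return nTF > 0 ? nTF : 1;
}
```

With 128 orbits per TF, one TF lasts roughly 11.4 ms under the assumed orbit period, so a 10 minute slot corresponds to a bit more than 52 000 TFs.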
22
+
One can also set the number of TFs per slot directly via `setSlotLength(o2::calibration::TFType n)`. With `setSlotLength(o2::calibration::INFINITE_TF)` there will be only 1 slot at a time, valid till infinity. A special mode is configured with `setSlotLength(0)`, in which case there will only be a single slot without explicit boundaries, which is filled until the required statistics is reached. In case `setSlotLength(0)` is configured we can also use `setCheckIntervalInfiniteSlot(o2::calibration::TFType updateInterval)`, in which case the calibration checks whether the statistics is sufficient only after `updateInterval` TFs. Otherwise it would check after every TF.
20
23
21
-
`updateAtTheEndOfRunOnly` : to tell the TimeCalibration to finalize the slots and prepare the CCDB entries only at the end of the run.
24
+
The TFs arrive asynchronously at the aggregator node. The `TimeSlotCalibration` keeps a `std::deque` of TimeSlots for which it aggregates the input data simultaneously. Whenever a slot has reached its configured duration the statistics requirement is checked. In case it is not fulfilled, the slot can be extended or merged to the previous slot in order to obtain the required statistics.
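The handling of slots with insufficient statistics can be illustrated with a small stand-alone sketch. It does not use the O2 classes: `Slot` and `finalizeReadySlots` are hypothetical names, all slots in the deque are assumed to have already reached their configured duration, and a slot with too few entries is merged into the following one, as described for `hasEnoughData` in this README.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical stand-in for a time slot: TF boundaries plus accumulated statistics.
struct Slot {
  std::uint32_t tfStart;
  std::uint32_t tfEnd;
  std::size_t entries;
};

// Walk through the slots from the oldest one. A slot with enough entries is
// finalized; otherwise its content is merged into the following slot, whose
// lower boundary is extended accordingly. A last slot without enough entries
// stays in the deque and waits for more data.
inline std::vector<Slot> finalizeReadySlots(std::deque<Slot>& slots, std::size_t minEntries)
{
  assert(minEntries > 0);
  std::vector<Slot> finalized;
  while (!slots.empty()) {
    const Slot front = slots.front();
    if (front.entries >= minEntries) {
      finalized.push_back(front);
      slots.pop_front();
    } else if (slots.size() > 1) {
      slots.pop_front();
      Slot& next = slots.front();
      next.tfStart = front.tfStart;
      next.entries += front.entries;
    } else {
      break;
    }
  }
  return finalized;
}
```

For four slots of 30 TFs each with 100, 20, 90 and 10 entries and a requirement of 100 entries, the first slot is finalized as is, the second and third are merged into one slot covering TFs 30 to 89, and the last one is kept for further filling.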
22
25
23
-
Example for the options above:
24
-
`tf-per-slot = 20`
25
-
`max-delay = 3`
26
-
Then if we are processing TF 61 and TF 0 comes, TF 0 will be discarded.
26
+
By default, TFs which are more than `3 * SlotLengthInTF` older than the most recent TF processed are discarded. This maximum delay can be configured via `setMaxSlotsDelay(int nSlots)`. If it is set to 4 and each slot has a length of 30 TFs, then upon processing of TF 121 the input from TF 0 would be discarded if it had not already been processed.
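The arithmetic behind this cut can be written down in a few lines. The function `isTooOld` below is a hypothetical helper for illustration only; it is not part of `TimeSlotCalibration`, and the exact comparison used in O2 may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns true if a TF is more than
// maxSlotsDelay * slotLengthTF behind the most recent TF seen so far.
inline bool isTooOld(std::uint32_t tf, std::uint32_t maxSeenTF, std::uint32_t slotLengthTF, int maxSlotsDelay)
{
  assert(slotLengthTF > 0 && maxSlotsDelay >= 0);
  if (tf >= maxSeenTF) {
    return false; // newer than (or equal to) anything seen so far
  }
  return (maxSeenTF - tf) > static_cast<std::uint32_t>(maxSlotsDelay) * slotLengthTF;
}
```

With a delay of 4 slots of 30 TFs each, `isTooOld(0, 121, 30, 4)` is true because 121 > 120, while TF 1 would still be accepted.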
27
27
28
-
Each calibration device has to implement the following methods:
29
28
30
-
`void initOutput()`: initialization of the output object (typically a vector of calibration objects and another one with the associated CcdbObjectInfo);
29
+
In order to prepare only one CCDB object at the end of the run you can use `setUpdateAtTheEndOfRunOnly()`. In this case all the above settings for the slot duration are irrelevant, and upon the `endOfStream` of your calibration device you should make a call to `checkSlotsToFinalize()`.
31
30
32
-
`bool hasEnoughData(const o2::calibration::TimeSlot<Container>& slot)` : method to determine whether a TimeSlot has enough data to be calibrated; if not, it will be merged to the following (in time) one;
33
31
34
-
`void finalizeSlot(o2::calibration::TimeSlot<Container>& slot)` : method to process the calibration data accumulated in each TimeSlot;
32
+
### Mandatory methods to implement when deriving from `o2::calibration::TimeSlotCalibration<Container>`
35
33
36
-
`o2::calibration::TimeSlot<Container>& slot emplaceNewSlot(bool front, TFType tstart, TFType tend)` : method to creata a new TimeSlot; this is specific to the calibration procedure as it instantiates the detector-calibration-specific object.
34
+
35
+
- `void initOutput()`: initialization of the output object (typically a vector of calibration objects and another one with the associated CcdbObjectInfo);
36
+
37
+
- `bool hasEnoughData(const o2::calibration::TimeSlot<Container>& slot)`: method to determine whether a TimeSlot has enough data to be calibrated; if not, it will be merged to the following (in time) one;
38
+
39
+
- `void finalizeSlot(o2::calibration::TimeSlot<Container>& slot)`: method to process the calibration data accumulated in each TimeSlot;
40
+
41
+
- `o2::calibration::TimeSlot<Container>& emplaceNewSlot(bool front, TFType tstart, TFType tend)`: method to create a new TimeSlot; this is specific to the calibration procedure as it instantiates the detector-calibration-specific object.
37
42
38
43
See e.g. LHCClockCalibrator.h/cxx in AliceO2/Detectors/TOF/calibration/include/TOFCalibration/LHCClockCalibrator.h and AliceO2/Detectors/TOF/calibration/src/LHCClockCalibrator.cxx
39
44
40
45
## TimeSlot<Container>
46
+
41
47
The TimeSlot is a templated class which takes as input type the Container that will hold the calibration data needed to produce the calibration objects (histograms, vectors, array...). Each calibration device could implement its own Container, according to its needs.
42
48
43
49
The Container class needs to implement the following methods:
44
50
45
-
`void fill(const gsl::span<const Input> data)` : method to decide how to use the calibration data within the container (e.g. fill a vector);
46
-
or
47
-
`void fill(o2::dataformats::TFIDInfo& ti, const gsl::span<const Input> data)` : method to decide how to use the calibration data within the container (e.g. fill a vector) and having access to the TFIDInfo struct providing relevant info for current TF (tfCounter, runNumber, creationTime etc.)
48
-
If provided, this latter method will be used.
51
+
- `void merge(const Container* prev)`: method to allow merging of the content of a TimeSlot to the content of the following one, when statistics is limited.
49
52
50
-
`void merge(const Container* prev)` : method to allow merging of the content of a TimeSlot to the content of the following one, when stastics is limited.
53
+
- `void print()`: method to print the content of the Container.
51
54
52
-
`void print()` : method to print the content of the Container
55
+
- `void fill(DATA data, ...)`: method to decide how to use the calibration data within the container (e.g. fill a vector). The type of `DATA` is usually `const gsl::span<your-input-data>`, but can also be anything else. Optionally the `fill` method can accept additional input of arbitrary type;
56
+
57
+
or, alternatively
58
+
59
+
- `void fill(o2::dataformats::TFIDInfo& ti, DATA data, ...)`: method to decide how to use the calibration data within the container (e.g. fill a vector) and having access to the TFIDInfo struct providing relevant info for the current TF (tfCounter, runNumber, creationTime etc.).
60
+
If provided, this latter method will be used.
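A minimal Container providing these three methods could look like the sketch below. It is a stand-alone illustration under stated assumptions: `HistoContainer` is a hypothetical name, the input type is taken to be a plain `std::vector<float>` instead of a `gsl::span` so that the example has no external dependencies, and the histogram layout (10 bins of unit width) is arbitrary.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <vector>

// Hypothetical container: histogram with NBins unit-width bins covering [0, NBins).
struct HistoContainer {
  static constexpr std::size_t NBins = 10;
  std::vector<std::size_t> bins = std::vector<std::size_t>(NBins, 0);
  std::size_t entries = 0;

  // Accumulate the input of one TF; values outside the histogram range are ignored.
  void fill(const std::vector<float>& data)
  {
    for (float v : data) {
      if (!(v >= 0.f && v < static_cast<float>(NBins))) {
        continue;
      }
      ++bins[static_cast<std::size_t>(v)];
      ++entries;
    }
  }

  // Add the content of another container (e.g. of a slot with limited statistics).
  void merge(const HistoContainer* prev)
  {
    assert(prev != nullptr);
    for (std::size_t i = 0; i < NBins; ++i) {
      bins[i] += prev->bins[i];
    }
    entries += prev->entries;
  }

  // Print a short summary of the content.
  void print() const
  {
    std::cout << "HistoContainer with " << entries << " entries\n";
  }
};
```

A calibrator using such a container could base its statistics check on `entries`, for example by requiring a minimum number of entries before a slot is finalized.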
53
61
54
62
See e.g. LHCClockCalibrator.h/cxx in AliceO2/Detectors/TOF/calibration/include/TOFCalibration/LHCClockCalibrator.h and AliceO2/Detectors/TOF/calibration/src/LHCClockCalibrator.cxx
55
63
56
64
The Slot provides generic methods to access its boundaries: `getTFStart()` and `getTFEnd()` in terms of TF counter (as assigned by the DataDistribution) and `getStartTimeMS()`, `getEndTimeMS()` for the absolute time stamp in milliseconds.
57
65
58
66
## detector-specific-calibrator-workflow
59
67
60
-
Each calibration will need to be implemented in the form of a workflow, whose options should include those for the calibration device itself (`tf-per-slot`and `max-delay`, see above).
68
+
Each calibration will need to be implemented in the form of a workflow, whose options should include those for the calibration device itself (for example the slot length and statistics requirement).
61
69
The output to be sent by the calibrator should include:
62
70
63
71
* a vector of the snapshots of the object to be put in the CCDB;
@@ -76,6 +84,16 @@ Note that in order to access the absolute time of the slot boundaries, one shoul
76
84
77
85
See e.g. AliceO2/Detectors/TOF/calibration/testWorkflow/LHCClockCalibratorSpec.h, AliceO2/Detectors/TOF/calibration/testWorkflow/lhc-clockphase-workflow.cxx
78
86
87
+
## Integration of calibration workflows into the global framework
88
+
89
+
For the synchronous processing on the EPN the calibration workflows are grouped according to their origin (BARREL, CALO, MUON and FORWARD) and to the nature of their input (TF for devices expecting input for every TF and SPORADIC for devices expecting input sporadically).
90
+
For each group (e.g. `BARREL_TF`) a pair of `o2-dpl-output-proxy` running on the processing EPNs and `o2-dpl-raw-proxy` running on the calibration nodes is initialized, and these proxies are used to transfer the input from the processing nodes to the calibration node.
91
+
Have a look at the `DATA/common/setenv_calib.sh` script in O2DPG where for each calibration the required data descriptors are added to the proxies.
92
+
In addition there is always some logic to decide whether a specific calibration should be enabled or not.
93
+
94
+
The workflow which is running on the processing nodes should be added in the `prodtests/full-system-test/calib-workflow.sh` script.
95
+
The workflow running on the aggregator should be added to `prodtests/full-system-test/aggregator-workflow.sh`.
96
+
79
97
## Calibrating over multiple runs
80
98
81
99
Some statistics-hungry calibrations define a single time slot which integrates the data of the whole run. If there is a possibility that for a short run the slot will not accumulate enough statistics,