Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
24 commits
Select commit Hold shift + click to select a range
22b0cad
added creation of list of aod outputs
pbuehler Mar 18, 2020
90e2807
Merge branch 'dev'
pbuehler Mar 18, 2020
4aeadca
Merge remote-tracking branch 'pbuehler/dev' into dev
pbuehler Mar 18, 2020
17f9802
corrctions
pbuehler Mar 18, 2020
b6a07a3
added creation of list of aod outputs
pbuehler Mar 18, 2020
959e396
corrctions
pbuehler Mar 18, 2020
310515d
Added functionality which allows to specify which columns of which ta…
pbuehler Mar 20, 2020
4c37bb5
Merge remote-tracking branch 'pbuehler/dev' into dev
pbuehler Mar 20, 2020
fb8aa39
clang-format changes
pbuehler Mar 20, 2020
5708cb9
clang-format changes
pbuehler Mar 20, 2020
e05a1e2
added creation of list of aod outputs
pbuehler Mar 18, 2020
c4af8fb
corrctions
pbuehler Mar 18, 2020
a5e79a4
added creation of list of aod outputs
pbuehler Mar 18, 2020
5d7dbed
corrctions
pbuehler Mar 18, 2020
3201727
Added functionality which allows to specify which columns of which ta…
pbuehler Mar 20, 2020
a7c0aa4
clang-format changes
pbuehler Mar 20, 2020
434bd3c
clang-format changes
pbuehler Mar 20, 2020
8695f5a
Merge remote-tracking branch 'pbuehler/dev' into dev
pbuehler Mar 21, 2020
54c7f52
optimisations, clean-ups
pbuehler Mar 21, 2020
ab307e4
solve Codacy conflict
pbuehler Mar 21, 2020
7c83ff8
clang-formating
pbuehler Mar 22, 2020
676d397
Updated ANALYSIS.md with information about --keep option syntax
pbuehler Mar 23, 2020
f0ad9e0
Removed some comments
pbuehler Mar 23, 2020
7ee15a2
Codacy does not allow : in headings!!
pbuehler Mar 23, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
83 changes: 73 additions & 10 deletions Framework/Core/ANALYSIS.md
Original file line number Diff line number Diff line change
Expand Up @@ -35,7 +35,7 @@ defineDataProcessing() {

> **Implementation details**: `AnalysisTask` is simply a `struct`. Since `struct` default inheritance policy is `public`, we can omit specifying it when declaring MyTask.
>
> `AnalysisTask` will not actually provide any virtual method, as the `adaptAnalysis` helper relyes on template argument matching to discover the properties of the task. It will come clear in the next paragraph how this allow is used to avoid the proliferation of data subscription methods.
> `AnalysisTask` will not actually provide any virtual method, as the `adaptAnalysis` helper relies on template argument matching to discover the properties of the task. It will come clear in the next paragraph how this allow is used to avoid the proliferation of data subscription methods.

## Processing data

Expand All @@ -55,7 +55,7 @@ struct MyTask : AnalysisTask {
};
```

will allow you to get a per timeframe collection of tracks. You can then iterate on the tracks using the syntax:
will allow you to get a per time frame collection of tracks. You can then iterate on the tracks using the syntax:

```cpp
for (auto &track : tracks) {
Expand All @@ -77,7 +77,7 @@ This has the advantage that you might be able to benefit from vectorization / pa

> **Implementation notes**: as mentioned before, the arguments of the process method are inspected using template argument matching. This way the system knows at compile time what data types are requested by a given `process` method and can create the relevant DPL data descriptions.
>
> The distiction between `Tracks` and `Track` above is simply that one refers to the whole collection, while the second is an alias to `Tracks::iterator`. Notice that we assume that each collection is of type `o2::soa::Table` which carries metadata about the dataOrigin and dataDescription to be used by DPL to subscribe to the associated data stream.
> The distinction between `Tracks` and `Track` above is simply that one refers to the whole collection, while the second is an alias to `Tracks::iterator`. Notice that we assume that each collection is of type `o2::soa::Table` which carries meta data about the dataOrigin and dataDescription to be used by DPL to subscribe to the associated data stream.

### Navigating data associations

Expand All @@ -89,7 +89,7 @@ void process(o2::aod::Collision const& collision, o2::aod::Tracks &tracks) {
}
```

the above will be called once per collision found in the timeframe, and `tracks` will allow you to iterate on all the tracks associated to the given collision.
the above will be called once per collision found in the time frame, and `tracks` will allow you to iterate on all the tracks associated to the given collision.

Alternatively, you might not require to have all the tracks at once and you could do with:

Expand Down Expand Up @@ -176,7 +176,7 @@ struct MyTask : AnalysisTask {
};
```

the `etaphi` object is a functor that will effectively act as a cursor which allows to populate the `EtaPhi` table. Each invokation of the functor will create a new row in the table, using the arguments as contents of the given column. By default the arguments must be given in order, but one can give them in any order by using the correct column type. E.g. in the example above:
the `etaphi` object is a functor that will effectively act as a cursor which allows to populate the `EtaPhi` table. Each invocation of the functor will create a new row in the table, using the arguments as contents of the given column. By default the arguments must be given in order, but one can give them in any order by using the correct column type. E.g. in the example above:

```cpp
etaphi(track::Phi(calculatePhi(track), track::Eta(calculateEta(track)));
Expand All @@ -186,7 +186,7 @@ etaphi(track::Phi(calculatePhi(track), track::Eta(calculateEta(track)));

Sometimes columns are not backed by actual persisted data, but they are merely
derived from it. For example you might want to have different representations
(e.g. spherical, cylindrical) for a given persisten representation. You can
(e.g. spherical, cylindrical) for a given persistent representation. You can
do that by using the `DECLARE_SOA_DYNAMIC_COLUMN` macro.

```cpp
Expand All @@ -200,10 +200,10 @@ DECLARE_SOA_DYNAMIC_COLUMN(R2, r2, [](float x, float y) { return x*x + y+y; });
DECLARE_SOA_TABLE(Point, "MISC", "POINT", X, Y, (R2<X,Y>));
```

Notice how the dynamic column is defined as a standalone column and binds to X and Y
Notice how the dynamic column is defined as a stand alone column and binds to X and Y
only when you attach it as part of a table.

### Executing a finalisation method, post run
### Executing a finalization method, post run

Sometimes it's handy to perform an action when all the data has been processed, for example executing a fit on a histogram we filled during the processing. This can be done by implementing the postRun method.

Expand Down Expand Up @@ -335,7 +335,7 @@ struct MyTask : AnalysisTask {
### Getting combinations (pairs, triplets, ...)
To get combinations of distinct tracks, helper functions from `ASoAHelpers.h` can be used. Presently, there are 3 combinations policies available: strictly upper, upper and full.

The number of elements in a combination is deduced from the number of arguments pased to `combinations()` call. For example, to get pairs of tracks from the same source, one must specify `tracks` table twice:
The number of elements in a combination is deduced from the number of arguments passed to `combinations()` call. For example, to get pairs of tracks from the same source, one must specify `tracks` table twice:

```cpp
struct MyTask : AnalysisTask {
Expand All @@ -362,7 +362,7 @@ struct MyTask : AnalysisTask {
};
```

It will be possible to specify a filter for a combination as a whole, and only matching combinations will be then outputed. Currently, the filter is applied to each element separately. Note that for filter version the input tables are mentioned twice, both in policy constructor and in `combinations()` call itself.
It will be possible to specify a filter for a combination as a whole, and only matching combinations will be then output. Currently, the filter is applied to each element separately. Note that for filter version the input tables are mentioned twice, both in policy constructor and in `combinations()` call itself.

```cpp
struct MyTask : AnalysisTask {
Expand All @@ -384,6 +384,69 @@ combinations(tracks, tracks); // equivalent to combinations(CombinationsStrictly
combinations(filter, tracks, covs); // equivalent to combinations(CombinationsFullIndexPolicy(tracks, covs), filter, tracks, covs);
```

### Saving tables to file

Produced tables can be saved to file as TTrees. This process is customized by the command line option `--keep` (of the internal-dpl-AOD-writer). **Please be aware, that the format of the `keep` option as described here is preliminary and might be changed in future.**

`keep` is a comma-separated list of `DataOuputDescriptions`.

`keep`
```csh
DataOuputDescription1,DataOuputDescription2, ...
```

Each `DataOuputDescription` is a semicolon-separated list of 4 items

`DataOuputDescription`
```csh
table:tree:columns:file
```
and instructs the internal-dpl-AOD-writer, to save the columns `columns` of table `table` as TTree `tree` into files `file_x.root`, where `x` is an incremental number. The selected columns are saved as separate TBranches of TTree `tree`.

By default `x` is incremented with every time frame. This behavior can be modified with the command line option `--ntfmerge`. The value of `ntfmerge` specifies the number of time frames to merge into one file.

The first item of a `DataOuputDescription` is mandatory and needs to be specified, otherwise the `DataOuputDescription` is ignored. The other three items are optional and are filled by default values if missing.

The format of `table` is

`table`
```csh
AOD/tablename/0
```
`tablename` is the name of the table as defined in the workflow definition.

The format of `tree` is a simple string which names the TTree the table will be saved to. If `tree` is not specified then `tablename` will be used as TTree name.

`columns` is a slash(/)-separated list of column names., e.g.

`columns`
```csh
col1/col2/col3
```
The column names are expected to match column names of table `tablename` as defined in the respective workflow. Non-matching columns are ignored. The selected table columns are saved as separate TBranches with the same names as the corresponding table columns. If `columns` is not specified then all table columns will be saved.

`file` finally specifies the base name of the files the tables are saved to. The actual file names are composed as `file`_`x`.root, where 'x' is an incremental number. If `file` is not specified the default file name is used. The default file name can be set with the command line option `--res-file`. However, if `res-file` is missing then the default file name is set to `AnalysisResults`.

#### Valid example command line options

```csh
--keep AOD/UNO/0
# save all columns of table 'UNO' to TTree 'UNO' in files 'AnalysisResults'_x.root

--keep AOD/UNO/0::c2/c4:unoresults
# save columns 'c2' and 'c4' of table 'UNO' to TTree 'UNO' in files 'unoresults'_x.root

--res-file myskim --ntfmerge 50 --keep AOD/UNO/0:trsel1:c1/c2,AOD/DUE/0:trsel2:c6/c7/c8
# save columns 'c1' and 'c2' of table 'UNO' to TTree 'trsel1' in files 'myskim'_x.root and
# save columns 'c6', 'c7' and 'c8' of table 'DUE' to TTree 'trsel2' in files 'myskim'_x.root.
# Merge 50 time frames in each file.

```

#### Limitations

If the provided `--keep` option contains two `DataOuputDescriptions` with equal combination of `tree` and `file` then the processing will be stopped! It is not pssible to save two trees with equal name to a given file.

### Possible ideas

We could add a template `<typename C...> reshuffle()` method to the Table class which allows you to reduce the number of columns or attach new dynamic columns. A template wrapper could
Expand Down
1 change: 1 addition & 0 deletions Framework/Core/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -86,6 +86,7 @@ o2_add_library(Framework
src/TMessageSerializer.cxx
src/TableBuilder.cxx
src/TableConsumer.cxx
src/DataOutputDirector.cxx
src/Task.cxx
src/TextControlService.cxx
src/Variant.cxx
Expand Down
9 changes: 4 additions & 5 deletions Framework/Core/include/Framework/CommonDataProcessors.h
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@

#include "Framework/DataProcessorSpec.h"
#include "Framework/InputSpec.h"
#include "Framework/DataOutputDirector.h"
#include "TTree.h"

#include <vector>
Expand All @@ -34,11 +35,9 @@ struct CommonDataProcessors {
/// not going to be used by the returned DataProcessorSpec.
static DataProcessorSpec getGlobalFileSink(std::vector<InputSpec> const& danglingInputs,
std::vector<InputSpec>& unmatched);
/// Helper function to create and write TTree
static void table2tree(TTree* tout,
std::shared_ptr<arrow::Table> table,
bool tupdate);
static DataProcessorSpec getGlobalAODSink(std::vector<InputSpec> const& danglingInputs);
/// writes inputs of kind AOD to file
static DataProcessorSpec getGlobalAODSink(std::vector<InputSpec> const& OutputInputs,
std::vector<bool> const& isdangling);
/// @return a dummy DataProcessorSpec which requires all the passed @a InputSpec
/// and simply discards them.
static DataProcessorSpec getDummySink(std::vector<InputSpec> const& danglingInputs);
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
#include <string>
#include <vector>
#include <memory>
#include <regex>

namespace o2
{
Expand Down Expand Up @@ -66,6 +67,9 @@ struct DataDescriptorQueryBuilder {
///
/// Example for config: TPC/CLUSTER/0;ITS/TRACKS/1
static DataDescriptorQuery buildFromKeepConfig(std::string const& config);
static DataDescriptorQuery buildFromExtendedKeepConfig(std::string const& config);
static std::unique_ptr<data_matcher::DataDescriptorMatcher> buildNode(std::string const& nodeString);
static std::smatch getTokens(std::string const& nodeString);
};

} // namespace framework
Expand Down
175 changes: 175 additions & 0 deletions Framework/Core/include/Framework/DataOutputDirector.h
Original file line number Diff line number Diff line change
@@ -0,0 +1,175 @@
// Copyright CERN and copyright holders of ALICE O2. This software is
// distributed under the terms of the GNU General Public License v3 (GPL
// Version 3), copied verbatim in the file "COPYING".
//
// See http://alice-o2.web.cern.ch/license for full licensing information.
//
// In applying this license CERN does not waive the privileges and immunities
// granted to it by virtue of its status as an Intergovernmental Organization
// or submit itself to any jurisdiction.
#ifndef o2_framework_DataOutputDirector_H_INCLUDED
#define o2_framework_DataOutputDirector_H_INCLUDED

#include "TFile.h"

#include "Framework/DataDescriptorQueryBuilder.h"
#include "Framework/DataDescriptorMatcher.h"
#include "Framework/DataSpecUtils.h"
#include "Framework/InputSpec.h"

#include <regex>

namespace o2
{
namespace framework
{
namespace data_matcher
{

struct DataOutputDescriptor {
/// Holds information concerning the writing of aod tables.
/// The information includes the table specification, treename,
/// columns to save, and the file name

std::string tablename = "";
std::string treename = "";
std::string filename = "";
std::vector<std::string> colnames;
std::unique_ptr<DataDescriptorMatcher> matcher;

DataOutputDescriptor(std::string sin)
{
// sin is an item consisting of 4 parts which are separated by a ':'
// "origin/description/subSpec:treename:col1/col2/col3:filename"
// the 1st part is used to create a DataDescriptorMatcher
// the other parts are used to fill treename, colnames, and filename

// remove all spaces
auto s = remove_ws(sin);

// reset
treename = "";
colnames.clear();
filename = "";

// analyze the parts of the input string
static const std::regex delim1(":");
std::sregex_token_iterator end;
std::sregex_token_iterator iter1(s.begin(),
s.end(),
delim1,
-1);

// create the DataDescriptorMatcher
if (iter1 == end)
return;
auto a = iter1->str();
matcher = DataDescriptorQueryBuilder::buildNode(a);

// get the table name
auto m = DataDescriptorQueryBuilder::getTokens(a);
if (!std::string(m[2]).empty())
tablename = m[2];

// get the tree name
// defaul tree name is the table name
treename = tablename;
++iter1;
if (iter1 == end)
return;
if (!iter1->str().empty())
treename = iter1->str();

// get column names
++iter1;
if (iter1 == end)
return;
if (!iter1->str().empty()) {
auto cns = iter1->str();

static const std::regex delim2("/");
std::sregex_token_iterator iter2(cns.begin(),
cns.end(),
delim2,
-1);
for (; iter2 != end; ++iter2)
if (!iter2->str().empty())
colnames.emplace_back(iter2->str());
}

// get the filename
++iter1;
if (iter1 == end)
return;
if (!iter1->str().empty())
filename = iter1->str();
}

void setFilename(std::string fn) { filename = fn; }

void printOut()
{
LOG(INFO) << "DataOutputDescriptor";
LOG(INFO) << " table name: " << tablename.c_str();
LOG(INFO) << " file name : " << filename.c_str();
LOG(INFO) << " tree name : " << treename.c_str();
LOG(INFO) << " columns : " << colnames.size();
for (auto cn : colnames)
LOG(INFO) << " " << cn.c_str();
}

std::string remove_ws(const std::string& s)
{
std::string s_wns;
for (auto c : s)
if (!std::isspace(c))
s_wns += c;
return s_wns;
}
};

struct DataOutputDirector {

int ndod = 0;
std::string defaultfname;
std::vector<DataOutputDescriptor*> dodescrs;

std::vector<std::string> tnfns;

std::vector<std::string> fnames;
std::vector<int> fcnts;
std::vector<TFile*> fouts;

DataOutputDirector();
void reset();

// fill the DataOutputDirector with information from a
// keep-string
void readString(std::string const& keepString);

// fill the DataOutputDirector with information from a
// list of InputSpec
void readSpecs(std::vector<InputSpec> inputs);

// fill the DataOutputDirector with information from a json file
//readJson (std::string const& fnjson) {};

// get matching DataOutputDescriptors
std::vector<DataOutputDescriptor*> getDataOutputDescriptors(header::DataHeader dh);
std::vector<DataOutputDescriptor*> getDataOutputDescriptors(InputSpec spec);

// get the matching TFile
TFile* getDataOutputFile(DataOutputDescriptor* dod,
int ntf, int ntfmerge, std::string filemode);
void closeDataOutputFiles();

void setDefaultfname(std::string dfn) { defaultfname = dfn; }

void printOut();
};

} // namespace data_matcher
} // namespace framework
} // namespace o2

#endif // o2_framework_DataOutputDirector_H_INCLUDED
Loading