Propagate MC information through ClusterFinder#676
Conversation
KlewinS
commented
Nov 16, 2017
- Propagate MC Truth information
- removed unnecessary Box-/HwCluster classes
- more efficient thread spawning in ClusterFinder (+configurable at runtime)
|
@wiechula could you please have a look, there are quite a lot of changes... |
|
Error while checking build/O2/o2 for 6abacccbbb81584900cf7915332e0cdaa1afe94a: Full log here. |
|
Error while checking build/O2/o2-dev-fairroot for 6abacccbbb81584900cf7915332e0cdaa1afe94a: Full log here. |
|
Error while checking build/o2/macos for 6abacccbbb81584900cf7915332e0cdaa1afe94a: Full log here. |
wiechula
left a comment
There was a problem hiding this comment.
Somehow I don't find the point where the MC truth for the clusters is actually calculated. It looks as if mcDigitTruth and mClusterMcLabelArray (in the HwClusterer) are not used.
Is this only a preparation?
|
HwClusterer.cxx, line 443 to 447 the MC Truth information of the digits is kept in this class and not propagated to the actual CF, but the digit-Index and event number of the digit. With those, the used digits can be identified again and the MC Truth information can be collected and combined at this point. |
|
I see, the code was folded, so I didn't spot it in the first palce. Did you check the results of the cluster finder before and after the changes using the same input digits? |
|
No, I didn't implement a sorting because I couldn't see why this should make a difference. In the end it would only change in which order the labels are inserted . Shouldn't the container take care of something like this? There the data index is used to lookup the header, wich is then incremented. Why makes the order here any difference? I didn't do a detailed analysis of the resulting found clusters, but the histograms in the root file look reasonable. |
|
Maybe I just don't understand how this container really works. I will add a sorting stage. |
|
In the digitisation there is an intermediate stage where per digit we count how often a label occurs. Then, before filling the MC truth output container the labels are sorted according to their occurrence. This is important, because otherwise you don't know which is the most probable label. Like this we know the first label ist the most probable. |
|
@wiechula sorting should now be there |
|
Error while checking build/O2/o2-dev-fairroot for 73f9380721034aa3dff7dcaa4e7d994f3f6cf654: Full log here. |
|
Error while checking build/o2/macos for 73f9380721034aa3dff7dcaa4e7d994f3f6cf654: Full log here. |
|
Error while checking build/O2/o2 for 73f9380721034aa3dff7dcaa4e7d994f3f6cf654: Full log here. |
|
Error while checking build/o2/macos for 44ce376d630392c35ee3cddbd133d4a3c4371dd3: Full log here. |
There was a problem hiding this comment.
@KlewinS it will be much faster to use labelCount[l]++ instead of try/catch
There was a problem hiding this comment.
indeed, using labelCount[l]++ faster
There was a problem hiding this comment.
i'd say "compacter" - nothing about std::map is fast :)
There was a problem hiding this comment.
speaking of which: since you use it in a tight loop - did you profile the code? maybe using a more performant container would help if you have a hotspot here - in many cases a simple sorted vector (or array) performs better with a binary lookup.
There was a problem hiding this comment.
@mkrzewic : don't exclude that sorted array will be faster, but the labelCount.at(l)++ is indeed faster than try/catch: e.g. for (int j=10;j--;) {labelCount[gRandom->Integer(10)]++; } is faster that the try/catch equivalent by factor >30.
There was a problem hiding this comment.
I don't think, this will be the bottleneck and it's done only once per event. But I will keep in mind that this could be improved if we have to speed up this part.
There was a problem hiding this comment.
@shahor02 factor 30 seems a bit much, but indeed, the "proper way" with the direct increment is faster by ~2X if we have mostly duplicated entries to ~7X is we always throw for all unique entries (with keys that are as large as here). (llvm 7.3, -O2, on my laptop).
d486d2e to
9a682cf
Compare
matthiasrichter
left a comment
There was a problem hiding this comment.
I have a few minor comments and questions for my understanding.
There was a problem hiding this comment.
better set these parameters as defaults in the extended constructor and forbid that one (Clusterer() = delete;)
There was a problem hiding this comment.
"pointer to HW Clusterer"
There was a problem hiding this comment.
just out of curiosity, is the clusterization running on simulated and HW clusters in the same process so that it makes sense to carry the two types of arrays? Or is it more a kind of a specialization since the task is running on either the one or the other?
There was a problem hiding this comment.
There is still this BoxClusterer, which could in principle also be used. Although I think the BoxClusterer doesn't work anymore.
There was a problem hiding this comment.
is this some kind of FIFO, or what does the Last mean? If I understand correctly, this is supposed to keep a set of containers for multiple events, correct? I don't see it used in the code though. How does this relate to the above defined mClusterMcLabelArray?
There was a problem hiding this comment.
Yes, it is a kind of FIFO and is used only in HwClusterer.cxx line 460.
It stores the MC Truth information of the last event(s) because the actual ClusterFinder doesn't work only on digits of the "current" event. The digits there are kept also across event-boundaries to enable the continuous simulation.
During this, I also found, that there are some events in the simulation consisting of only very few timebins (even only a single timebin). That is, why it is not enough to store only the information from the "last" event, but from several previous events.
There was a problem hiding this comment.
Is anything missing to accept this pull request?
fix gsl include directory WIP: Cluster MCLabels
WIP: Cluster MCLabels, cleaning up WIP: Cluster MCLabels, fix buffering of MC labels of previous events
WIP: Cluster MCLabels, cleaning up continued finished Cluster MCLabels + cleaning up
also wrong include paths, introduced by to moving Cluster related classes into reconstruction
|
|
||
| // Register MC Truth output container | ||
| mHwClustersMCTruthArray = std::make_unique<MCLabelContainer>(); | ||
| mgr->Register("TPCClusterHWMCTruth", "TPC", mHwClustersMCTruthArray.get(), kTRUE); |
There was a problem hiding this comment.
a tiny thing here: If we register the branch like this, we have the well known problem that the TBrowser cannot draw from the branch (see ROOT issue: https://sft.its.cern.ch/jira/browse/ROOT-8993)
According to the ROOT team, the correct way to do it (in this case) is to add a trailing dot to the branch name like this
TPCClusterHWMCTruth.; I verified that this work but it needs adjustment also in the macro reading the clusters.
Shall we do it for the sake of usabibility? I can add this fix and merge it quickly.
There was a problem hiding this comment.
Yes, this sounds good. For me it is fine if you do this.
Please go ahead
* Add whitespace formatting check * Mark incorrect copyright notices in GitHub UI