Skip to content

Implement first version of EMCAL hit creation#523

Merged
sawenzel merged 5 commits into
AliceO2Group:devfrom
mfasDa:feature-emcalhit
Aug 31, 2017
Merged

Implement first version of EMCAL hit creation#523
sawenzel merged 5 commits into
AliceO2Group:devfrom
mfasDa:feature-emcalhit

Conversation

@mfasDa
Copy link
Copy Markdown
Collaborator

@mfasDa mfasDa commented Aug 30, 2017

  • Bug fix (temporary) in Shishkebab geometry
  • Small fix in EMCAL geometry
  • Remove unused variable (shut) in Hit
  • Fix z-coordinate of Hit
  • Fix add function for hit: All hits from the same track are
    added to 1 hit. For this a lookup table is used to map the hits
  • Handle calculation of the module ID: Module ID needs to be
    obtained from the copy number of the volume and the volume ID
    over various levels in the geometry tree
  • Process Hits add currently hits once a volume is passed

- Bug fix (temporary) in Shishkebab geometry
- Small fix in EMCAL geometry
- Remove unused variable (shut) in Hit
- Fix z-coordinate of Hit
- Fix add function for hit: All hits from the same track are
  added to 1 hit. For this a lookup table is used to map the hits
- Handle calculation of the module ID: Module ID needs to be
  obtained from the copy number of the volume and the volume ID
  over various levels in the geometry tree
- Process Hits add currently hits once a volume is passed
{
class ShishKebabTrd1Module;

using doublevect = std::vector<double>;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typenames should follow the type convention DoubleVect. This will be enforced soon.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should not use "using" in headers. Moreover it is considered bad practice to have such a generic aliases, as they confuse the reader. You read code far more times you write it, so you should not worry about saving a few characters (of course for a specific type it's a different matter).

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I completely agree with you to ban using directives from header files in case they are used for removing namespaces as this is a super dangerous procedure just in order to save a few characters, which is a wide-spread approach in AliRoot/AliPhysics code. Here however the using directive is used as a typedef, which is even declare within the EMCAL namespace. Furthermore I don't really see where it has any impact on readability: The name "DoubleVect" should be self-explaining.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree somewhat with @ktf that "std::vector" is a bit clearer than DoubleVect and the gain in typing is in any case minor.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if doublevect is so clear, can be vector of type double, or a somehow duplicated vector, some vector which is there twice. My first association goes more in the second direction, and that makes a simple thing confusing

Copy link
Copy Markdown
Member

@ktf ktf Aug 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DoubleVect serves no purpose but to save a couple of keystrokes and hide the actual type, possibly generating confusion. If you really need a typedef, you should give it some specific type meaning like "WeightVector", "EnergyVector" or similar. In any case do not keep it at namespace level, but hide it inside the class itself. Having it at namespace level means that everyone, in any namespace, will have to agree on what your type is, which is bad.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I will change this back to std::vector in the commit together with the std::set issue.

@sawenzel
Copy link
Copy Markdown
Collaborator

@dberzano : Is the macos build down?

@dberzano
Copy link
Copy Markdown
Contributor

@sawenzel that's a bug in the continuous builder not appearing on Linux that I believe I have fixed now, hold on.

@dberzano
Copy link
Copy Markdown
Contributor

...aaand it's fixed.

Double_t mBirkC1;
Double_t mBirkC2;

std::set<TrackHit>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::sets are very expensive, both for both per item overhead and per look up. Depending on your vector size, you might want to use a sorted std::vector<TrackHit> and do a binary search or a linear search. For a nice article about binary vs linear search: https://dirtyhandscoding.wordpress.com/2017/08/25/performance-comparison-linear-search-vs-binary-search/. If your vector is very big, you might want to sort a separate index vector and keep std::vector<TrackHit> unsorted.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed the idea is to use a sorted container, and yes, binary search would be the preferred option as search algorithm. However even a linear search should be more efficient than the current implementation which is performing a linear search on an unsorted list.

My worry for the sorted vector is that I have to sort it completely each time I add a new node so I was wondering whether a std::set should be better suited for this task. However the map structure is very lightweight (integer key and pointer to mapped object) so also the sorting shouldn't be too expensive.

The size of the vector is proportional to the amount of particles passing the EMCAL (one hit per particle - also secondary - per EMCAL module).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you should somehow refactor your algorithm to perform all the insertions first, then sort (once) and then do the lookups. Alternatively, you could keep an array of fixed size array where you bin your hits using a lower precision key. In case there is a conflict you create a new fixed size array and use that for the conflicting keys.
In case there are no conflicts (i.e. you only have unique lower precision keys), you end up with a vector sorted by construction. In case there is conflicts you end up with K sorted vectors where K is the maximum number of conflicts you have for a given shorter key.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose is to have a lookup during the stepping I don't see how to perform all insert operations before sorting. As the set is used for search operations it needs to expand in time. As as the amount of entries is a-priori unknown a fixed size array is also not possible.

The second proposal should be nothing else than a hashtable, right? That could work. I think std::unordered_map is the c++ stl implementation for it. I will give it a try.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically std::unordered_map is implemented as a vector of lists, so it's actually slightly different from what I am proposing. Keep in mind that all these structures are optimised for insertion / removal while you most likely want insert once, lookup many in hopefully compact store.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What you propose is a hash table (table where access is handled by an index which is obtained by a hash function of a key - even if you explained it differently). And from what I read std::unordered_map implements a hash table even though the internal implementation is different. And the lookup complexity follows what is to be expected for hashtables.

Indeed lookup operations are expected to be much more abundant then insert / remove operations.

I would prefer sticking to containers which are on the market which show good performance instead of reimplementing the custom solutions. If the std::unordered_map fulfills my needs I see no reason to implement something myself. If you are aware of something better on the market - which is within our software stack - I am open for discussion.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lookup complexity can be the same, but the constants times involved are not the same and neither are per-item overheads. This is real life, not Algorithms 1 course. ;-) Of course the choice should be done based on a cost - benefit analysis, if you are happy with the unordered_map, it's for sure better than an ordered one.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can keep the point in mind and I replace the unordered_map once I have something better. For the moment it is "good enough" and it should be more efficient than the version implemented in AliRoot.

- Remove typedefs for std::vector<int/double>
- Replace std::set by std::unordered_map for the lookup table
  in order to improve on speed
auto hitentry = mEventHits.find(trackID);
if (hitentry != mEventHits.end()) {
myhit = hitentry->mHit;
myhit = hitentry->second;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mfasDa : I believe this mechanism might actually not be necessary. The Geant engine will ( I will confirm this ) finish processing a track until it moves to another track. So the trackID will essentially be the same until we change it to the next track. You can hence easily detect when a new track is seen. When this is the case you stop accumulation of the hit and start a new one. We use this mechanism in TPC: See Detector.cxx:312. In this case, it is not necessary to have this lookup structure.

Instead of lookup table store index for curent track / cell, and add a
pointer to the current hit object.
// - Inside different cell
// - First track of the event
std::cout << "New track / cell started\n";
TLorentzVector pos, mom;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a tiny optimization hint: This will create (construct) TLorentzVector again and again. (I have seen this is the past). Consider making mPosCache, mMomCache private member variables of the detector.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even less important (especially if you then use it as a data member of the class): consider using ROOT::Math::LorentzVector, most of the time it can be used as a drop-in replacement of TLorentzVector, it does not have the TObject overhead and you can use the float version for half the size in memory.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mpuccio : How can this be used if the VMC interface expects TLorentzVector?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, could be done, although I normally prefer to not have local variables as local variables. On the long term, in this case maybe one could discuss with Ivana to add the same member functions with primitive output types (array of doubles) in order to stay minimalistic. All we need is the x and p vector, the functionality of TLorentzVector stays largely unused.

…onstruction of an expensive object multiple times
@sawenzel
Copy link
Copy Markdown
Collaborator

@mfasDa : Thanks for the changes.

@sawenzel sawenzel merged commit fc43839 into AliceO2Group:dev Aug 31, 2017
@mfasDa mfasDa deleted the feature-emcalhit branch August 31, 2017 14:38
mbroz84 pushed a commit to mbroz84/AliceO2 that referenced this pull request Mar 16, 2022
* Add tracking efficiency calculation using custom index
* Improve aliases
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

6 participants