DPL: enable shmem (O2-1434)#3667
Conversation
* If no value is specified, use the default, i.e. 2GB. * If one of the subworkflows in the pipe chain specifies a value, use that value for everyone. * If more than one subworkflow in the pipe chain specifies a value, use the largest value for everyone.
|
@davidrohr this should give you the behaviour you requested. Can you please test it before I merge it? |
|
OK, thanks, I will try it. Before we merge this, we should wait for #3552 and alisw/alidist#2262 to have proper buffer sizes requested, and propper error handling when the buffer size is exceeded. |
| info.cmdLineArgs.erase(it, it + 2); | ||
| } | ||
| if (segmentSize == 0) { | ||
| segmentSize = 2000000000LL; |
There was a problem hiding this comment.
We'll eventually want to back the shm segments with huge pages, where the size become important (multiple of page size). Perhaps this should be (2ULL << 30)
There was a problem hiding this comment.
Do we want GB huge pages or 2/4 MB? GB huge pages need to be allocated at boot time if I remember correctly?
There was a problem hiding this comment.
I think 2M would be fine for us. Don't think there would be much benefit going to 1G. There are several reservation methods with different flexibility: kernel boot parameter, early during boot, etc. We'll have to see what works best for us.
There was a problem hiding this comment.
Notice that that's the current default in FairMQ, but I agree a 1ULL <<31 is probably better in any case. This could should actually be handled at FairMQ level, IMHO: always snap and align to page boundaries...
|
@ironMann do you understand why StfSender dies? |
|
@kft @davidrohr AliceO2Group/DataDistribution@a18de74 ? is this not needed here anymore? |
|
@ironMann : Yes, I assume so. AMD ROCm is still building with CC7 devtoolset 7. Has this workaround been removed, or why is it failing now? |
|
No it's still there. I think this problem spread to other binaries. It's weird, because the boost program_options is done in all binaries... Looks like linking of StfBuilder fails this time. |
|
I cannot do it easily, perhaps @ktf can. But I am quite sure it is the same problem. So I'd just merge the change in DataDistribution and then retry here. |
|
I cannot do it easily because it would require doing so in some sort of alidist / gpu builder, which at the moment we do not have (that's why it trickled through the CI). |
|
Anyways, I think it's unrelated to the PR and more to the 0.7.5 bump I did yesterday. @davidrohr can you confirm this behaves as expected for you? |
|
Then each test would require alidist bump? |
|
Also, I suggest we do the change for 2ULL << 30 in a separate PR once FairMQ changes it's default, otherwise people might be puzzled that the help says one thing and then we do another. @rbx any objection to change the default in FairMQ? |
|
I can build it manually in the gpucibuilder container, but I'll have to build from scratch since I don't have a setup anymore and don't have much time now. I can do it over the weekend. |
|
@ironMann the correct thing would be to have a CI for AliceO2Group/DataDistribution which builds any PR entering there with the current master of alidist. I can set it up, but not in the next few hours. |
Which 0.7.5 bump? Will test it asap. |
|
This already tried to build 0.7.5 of DataDistribution (in alidist). However the changes in between versions broke the gpu builder. |
This would be cool. Let me know if you need anything from me. |
|
We anyway decided in WP3 to merge the builders for checkcode / gpu / and linux o2. |
yeah, probably we should... The problem is that we are juggling too many builds at the moment and I need to do some reshuffling in the CI code before I am confident it will not break everything. For DataDistribution it's easy, because it's a small package at the tip of the dependency chain... |
|
I will let you know if I need anything, but I should have superpowers. |
|
OK, since this is urgent as O2 GPU CI is broken generally, I started a local build in the container now. Will report back in 2 hours when it is hopefully done. |
|
@ironMann : The below patch fixes the DataDistribution compilation in the GPU CI for me (as expected). Could you add it to DataDistribution and bump it in alidist? |
No objection, will change it to |
|
@davidrohr , I added the same for each executable. alisw/alidist#2273 |
|
@ktf : It seems applying the option works, but e.g. fails with Shouldn't I get at some point an error that the memory size was exceeded? |
|
ah, ok.... I think it's unrelated from this change. You get that message when something goes wrong with the merging of the workflow. I will try to see if I can create a better error message. |
|
I suggest we nevertheless merge this, objections? |
|
So this seems to work for me. Just one comment: If I set the shared segment size too small, the workflow fails to initialize. I get errors like |
|
No, the configuration is actually passed via stdout / stdin between the parent process and the children. I suspect when the memory is too low some printout scrambles the json passed between the two. I need to check. |
|
@ktf : The TPC change is done, you can merge this as soon as you want. |
|
With this PR I processingFcn of some devices not called. If I revert a5815c3, everything works. I've rebuild everything with fresh alidist, does not help. Problem (ubuntu18.04) can be reproduced by with a5815c3 reverted there are lot of messages like and @ktf any hint how to debug this? |
|
I've some issues as well on macOS. I suspect it's a regression in FairMQ 1.4.18, but have no further clue, for now. I made #3680 to revert. |
|
Ok, I will have a look. |
|
So, I can now confirm there were two problems:
|
|
I confirm that without |
|
@matthiasrichter does it ring any bell? |
|
@ktf yes, ITSMFT clusterized at the moment sends an empty vector ITS/CLUSTERS, this is a legacy feature which will be suppressed once MFT will be ready to use COMPCLUSTERS |
|
ok, but why does "empty vector" mean nullptr in the shmem case? |
|
Actually I remember there was a similar issue for the TPC. When there was no data in a sector, the event was lost before, and this was fixed by @matthiasrichter |
|
Yes, I remember. Let me find matthias fix. Maybe it was in a separate part of the code. |
Is that only relevant for mac or also for linux? |
|
I think it's relevant for both. What might mitigate it on linux is that maybe |
Fully agree, so what prevents us from using sha2? |
|
Indeed, its a similar problem like I have fixed for the InputRecord in #3578, we should have looked for other places immediately. Empty vector means zero-length message, and such messages have been handled differently in FairMQ. While ZMQ transport provides a valid pointer and message length zero, the SHM transport returns |
|
I also feel crypto hash would be the best. My only hesitation is requiring openssl just for that. Another option is to use crc32, it's included in boost and gives different output for small/similar inputs (just tested on mac and linux). While not as robust as SHA, it may suffice for this use case. It is also short enough that it doesn't need the truncation. |
|
Regarding |
|
Hm, but what exactly is the problem with openssl? I mean, it should be available everywhere, and for just SHA2, there should be no strict version requirement. On the other hand, for SHA2, we could also just use an ad-hoc implementation. Should not be that much code. |
|
@matthiasrichter There is one more place where the payload is checked for |
|
FairMQ v1.4.19 is now available with the fix for the shmem alignment (via MemoryResource), the |
This enables shared memory when on the same host and makes sure that the
--shm-segment-sizeoption is the same and the largest specified value for every device. If no device requires a special size, the default of 2GB is used.