Is your feature request related to a problem? Please describe.
At the moment there are two internal joins in Pregel:
- edges LEFT JOIN states ON id = src -> triplets
- triplets LEFT JOIN states ON id = dst -> full-triplets
It is a right way to generate full triplets and I still think that it is better than GraphX model (materialize and update triplets) because memory is more important that cpu.
But. For some algorithms (LPA, PageRank, K-Core, Rocha-Thatte, etc.) the destination state is not required at all!
Describe the solution you would like
A new optional flag requireDstState with true by default (current behavior). If it is false, skip the second join at all.
Provide this only in Scala, cause it is a kind of "developer API" that require from people to understand what do they do. For example, if user requested sendMsgToDst or any dst columns in message generation and at the same time turn of the described above flag, there will be tricky and complicated runtime errors. I would like to keep it in public JVM API only.
Also I want to update all the standard library (LPA, K-Core, Rocha-Thatte) with this optimization.
Component
Additional context
Are you planning on creating a PR?
Is your feature request related to a problem? Please describe.
At the moment there are two internal joins in Pregel:
It is a right way to generate full triplets and I still think that it is better than GraphX model (materialize and update triplets) because memory is more important that cpu.
But. For some algorithms (LPA, PageRank, K-Core, Rocha-Thatte, etc.) the destination state is not required at all!
Describe the solution you would like
A new optional flag
requireDstStatewithtrueby default (current behavior). If it isfalse, skip the second join at all.Provide this only in Scala, cause it is a kind of "developer API" that require from people to understand what do they do. For example, if user requested
sendMsgToDstor anydstcolumns in message generation and at the same time turn of the described above flag, there will be tricky and complicated runtime errors. I would like to keep it in public JVM API only.Also I want to update all the standard library (LPA, K-Core, Rocha-Thatte) with this optimization.
Component
Additional context
Are you planning on creating a PR?