---
layout: global
displayTitle: GraphFrames Quick-Start Guide
title: Quick-Start Guide
description: GraphFrames GRAPHFRAMES_VERSION guide for getting started quickly
---

This quick-start guide shows how to get started using GraphFrames. After you work through this guide, move on to the User Guide to learn more about the many queries and algorithms supported by GraphFrames.
- Table of contents
{:toc}
If you are new to using Apache Spark, refer to the Apache Spark Documentation and its Quick-Start Guide for more information.
If you are new to using Spark packages, you can find more information in the Spark User Guide on using the interactive shell. You just need to make sure your Spark shell session has the package as a dependency.
The following example shows how to run the Spark shell with the GraphFrames package.
The `--packages` argument downloads the GraphFrames package and its dependencies automatically.
{% highlight bash %}
$ ./bin/spark-shell --packages graphframes:graphframes:0.5.0-spark2.1-s_2.11
{% endhighlight %}

{% highlight bash %}
$ ./bin/pyspark --packages graphframes:graphframes:0.5.0-spark2.1-s_2.11
{% endhighlight %}
The examples above use a specific version of the GraphFrames package. To use a different version, change the last part of the `--packages` argument; for example, to run with version `0.1.0-spark1.6`, pass `--packages graphframes:graphframes:0.1.0-spark1.6`.
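For a standalone PySpark script rather than an interactive shell, the same package coordinate can be supplied through Spark's `spark.jars.packages` setting. The sketch below is a minimal illustration, not part of the GraphFrames distribution: the helper names `package_coordinate` and `session_with_graphframes` and the application name are hypothetical, and the setting is assumed to take effect only when no Spark session is already running in the process.

```python
def package_coordinate(version, group="graphframes", artifact="graphframes"):
    """Return the group:artifact:version string that --packages expects."""
    return f"{group}:{artifact}:{version}"


def session_with_graphframes(version):
    """Create a SparkSession that resolves the GraphFrames package at startup."""
    # Imported lazily so package_coordinate() works without PySpark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName("graphframes-quick-start")  # hypothetical application name
        # Equivalent in intent to passing --packages on the command line.
        .config("spark.jars.packages", package_coordinate(version))
        .getOrCreate()
    )
```

Calling `package_coordinate("0.1.0-spark1.6")` produces the same string used in the `--packages` example above, so switching versions means changing only that one argument.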
The following example shows how to create a GraphFrame, query it, and run the PageRank algorithm.
{% highlight scala %}
// Assumes `g` is a GraphFrame already built from vertex and edge DataFrames.

// Query: Get in-degree of each vertex.
g.inDegrees.show()

// Query: Count the number of "follow" connections in the graph.
g.edges.filter("relationship = 'follow'").count()

// Run PageRank algorithm, and show results.
val results = g.pageRank.resetProbability(0.01).maxIter(20).run()
results.vertices.select("id", "pagerank").show()
{% endhighlight %}
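{% highlight python %}
# Assumes `g` is a GraphFrame already built from vertex and edge DataFrames.

# Query: Get in-degree of each vertex.
g.inDegrees.show()

# Query: Count the number of "follow" connections in the graph.
g.edges.filter("relationship = 'follow'").count()

# Run PageRank algorithm, and show results.
results = g.pageRank(resetProbability=0.01, maxIter=20)
results.vertices.select("id", "pagerank").show()
{% endhighlight %}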
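The queries above operate on a GraphFrame `g` whose construction is not shown. The following is a minimal PySpark sketch of one way to build it, assuming a `SparkSession` named `spark` is available (as it is in the `pyspark` shell on Spark 2.x). The sample rows and the helper name `build_graph` are illustrative assumptions, chosen only so that the `relationship = 'follow'` filter has something to match.

```python
# Illustrative sample data (an assumption, not taken from a real dataset).
VERTICES = [
    ("a", "Alice", 34),
    ("b", "Bob", 36),
    ("c", "Charlie", 30),
]
EDGES = [
    ("a", "b", "friend"),
    ("b", "c", "follow"),
    ("c", "b", "follow"),
]


def build_graph(spark):
    """Build a GraphFrame from the sample rows using an existing SparkSession."""
    # Imported lazily so this module loads even where graphframes is absent.
    from graphframes import GraphFrame

    # GraphFrames requires an "id" column on vertices
    # and "src" and "dst" columns on edges.
    v = spark.createDataFrame(VERTICES, ["id", "name", "age"])
    e = spark.createDataFrame(EDGES, ["src", "dst", "relationship"])
    return GraphFrame(v, e)
```

In a `pyspark` session started with the `--packages` argument, `g = build_graph(spark)` would give the graph used by the queries; with this sample data, two edges carry the `follow` relationship.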