Closed
136 commits
f76668d
Make python/graphframes/examples/data folder exist
rjurney Dec 25, 2024
13c5e74
Update user-guide.md section on motif finding to point at motif findi…
rjurney Dec 25, 2024
1145f6f
Added motif finding tutorial to index.md
rjurney Dec 25, 2024
1319434
Ignore examples data dir, .vscode
rjurney Dec 25, 2024
a50c5a3
Minor grammatical fixes to docs release guide
rjurney Dec 25, 2024
e3a5ca3
We are all grownups here. We can wrap our own text :)
rjurney Dec 25, 2024
76a7ea7
Text wrapping for README.md
rjurney Dec 25, 2024
a013c5f
Ankur Dave's last name is Dave, not Ankur
rjurney Dec 25, 2024
9a8d845
Refer to motif finding tutorial from website header
rjurney Dec 25, 2024
77f0233
Major README overhaul. I am classically bad at Scala
rjurney Dec 25, 2024
333dc1b
Added motif output
rjurney Dec 25, 2024
ee843b3
Added requirements for motif finding tutorial: click, py7zr and reque…
rjurney Dec 25, 2024
058cdfb
Script for motif finding tutorial, to download and uncompress Stack E…
rjurney Dec 25, 2024
8d84baa
Working stackexchange data dump graph building script
rjurney Dec 25, 2024
d1cbfe4
Minor README improvements.
rjurney Dec 25, 2024
0654a3e
In progress motif tutorial. Covered downloading the data and building…
rjurney Dec 25, 2024
d5becef
Added section 'Learn GraphFrames' that links to the motif finding tut…
rjurney Dec 25, 2024
781a13e
install and use
rjurney Dec 25, 2024
e872602
More on entity resolution re: connected components
rjurney Dec 25, 2024
a323879
Shorten line into two lines
rjurney Dec 25, 2024
e87a985
More on entity resolution
rjurney Dec 25, 2024
c18f06f
Split long lines
rjurney Dec 25, 2024
6dd0375
Moved example graph.py to stackexchange.py due to existence of graphs.py
rjurney Dec 25, 2024
fc80c4e
Changed graph.py path in motif tutorial to stackexchange.py
rjurney Dec 25, 2024
f40afe2
Removed memory settings
rjurney Dec 26, 2024
f644db8
Added utils file for motif tutorial
rjurney Dec 26, 2024
8dc6432
Now loading the graph nodes/edges and counting the types
rjurney Dec 26, 2024
d65ff29
Motif finding tutorial script
rjurney Dec 26, 2024
587c79f
Long note explaining there is one node type in a GraphFrame.
rjurney Dec 26, 2024
ea8f4da
Minor work on motif tutorial, removed connected components as it hits…
rjurney Dec 26, 2024
f4de3af
Remove memory settings
rjurney Dec 26, 2024
1fba393
Minor cleanup of motif tutorial code
rjurney Dec 26, 2024
17bb0cf
Added section on what are graphlets and network motifs with an image …
rjurney Dec 26, 2024
6548658
Fixed margins on graphlet image
rjurney Dec 26, 2024
6943029
Caption to motif image
rjurney Dec 26, 2024
fe56706
Added url to directed motif list
rjurney Dec 26, 2024
d49d29f
Braindead commit
rjurney Dec 27, 2024
d42f0bd
More work on motif tutorial. Fun :)
rjurney Jan 2, 2025
71fff1a
Simplified SparkContext code - don't need it.
rjurney Jan 2, 2025
867361b
Trying to fix SparkSession script... about to just pull the initializ…
rjurney Jan 2, 2025
4633fce
Fixed network motif finding description in index.md to be clearer
rjurney Jan 3, 2025
1e503e5
Shortened long lines
rjurney Jan 3, 2025
9d0b761
Remove SparkSession setup to prevent build from breaking
rjurney Jan 3, 2025
dc1c26c
Create uninitialized spark variable
rjurney Jan 3, 2025
21aab04
Now setting SparkSession to None
rjurney Jan 3, 2025
ebbf9d6
Created module python.graphframes.tutorials to separate the stackexch…
rjurney Jan 3, 2025
7ee98e4
Removed __init__.py to do away with graphframes.tutorials module. Now…
rjurney Jan 3, 2025
6376395
Added link to spark packages
rjurney Jan 7, 2025
46ccb66
Doubled RAM for run-tests.sh to make tests pass
rjurney Jan 11, 2025
50c795d
More RAM did not help, back to 2g
rjurney Jan 11, 2025
de5f46e
Trying to return SparkSession setup now that I removed __init__.py fr…
rjurney Jan 11, 2025
37d46c7
Removed earlier options for JDK 11 which broke JDK 8 :(
rjurney Jan 11, 2025
afb95a2
Added links to Google Group and GraphFrames channel on GraphGeeks dis…
rjurney Jan 11, 2025
29e7f33
Merge branch 'master' of github.com:graphframes/graphframes into rjur…
rjurney Jan 18, 2025
4083ea4
Bumped versions on the Dockerfile and made it run again
rjurney Jan 18, 2025
73f14b0
Cleanup of docs code
rjurney Jan 18, 2025
5e64876
python/graphframes/examples/utils.py -> python/graphframes/tutorials/…
rjurney Jan 18, 2025
2be6c95
Double init and type hint cleanup
rjurney Jan 18, 2025
185186a
In progress motif tutorial
rjurney Jan 18, 2025
f074f87
More writing on the motif tutorial
rjurney Jan 18, 2025
64f23aa
Really getting out the bones :)
rjurney Jan 18, 2025
5c6f051
Moved the spiel about knowledge discovery down
rjurney Jan 18, 2025
0c3a36e
Trying to get the big shebang to work
rjurney Jan 18, 2025
5d4fb8c
The motif tutorial code actually runs, dunno about the big result yet
rjurney Jan 18, 2025
4074a16
Some rewording cleanup
rjurney Jan 18, 2025
f1523d5
Trying to upgrade the Python CI version to 3.11 as I upgraded Dockerfile
rjurney Jan 18, 2025
f213805
G11 motif diagram - 4 nodes, edges pointing in.
rjurney Jan 18, 2025
0e66395
Resize G11 graphlet
rjurney Jan 18, 2025
3875292
Bigger G11 to match G10
rjurney Jan 18, 2025
35c6a0c
More work on finale motif
rjurney Jan 18, 2025
61be774
Got the G11 sized right
rjurney Jan 18, 2025
a14d935
Much work on the data table for the raw paths of the motif...
rjurney Jan 18, 2025
d9ced29
Minor cleanup of release guide
rjurney Jan 31, 2025
8cdcdf9
Cleanup
rjurney Jan 31, 2025
8c392bc
Make python/graphframes/tutorials/data exist
rjurney Jan 31, 2025
08ee330
Moved download.py from examples to tutorials folder
rjurney Jan 31, 2025
82fe372
Minor edit to title
rjurney Jan 31, 2025
a27daf2
Ignore tutorials data folder rather than examples
rjurney Jan 31, 2025
8f5ef9a
Moved references to examples to tutorials module
rjurney Jan 31, 2025
b2b3aa8
Fixed stale paths pointing at python/graphframes/examples --> python/…
rjurney Jan 31, 2025
c1dc313
Now comma formatting node and edge type counts
rjurney Jan 31, 2025
1d19683
Was displaying edge counts twice
rjurney Jan 31, 2025
3ec886d
Validating that GraphFrame edges all point at valid node ids
rjurney Jan 31, 2025
1d5a1fa
New h2 Validating GraphFrames
rjurney Jan 31, 2025
9450f5d
Display 3 of our initial path
rjurney Jan 31, 2025
2eed755
Trimmed table again, was too wide in rendered doc
rjurney Jan 31, 2025
9ad573c
Put back the code to visualize the second motif we search for
rjurney Jan 31, 2025
7278ebc
Commas in numbers galore
rjurney Jan 31, 2025
b6bb86d
Swapped out directed motif image with something better. Added commas …
rjurney Feb 1, 2025
76ab1cc
Converted pythonic testing from nose to pytest. Pulled Python 2 refer…
rjurney Feb 1, 2025
5064713
Merge branch 'master' of github.com:graphframes/graphframes into rjur…
rjurney Feb 1, 2025
0910f65
All but 2 tests pass :)
rjurney Feb 1, 2025
9a6b11d
Graphframes' new logo :)
rjurney Feb 1, 2025
1515e7f
Python 3.11.9 in CI to 3.11.11
rjurney Feb 1, 2025
a3d4167
Fix a test
rjurney Feb 1, 2025
0d1964f
Build an actual Python package for once...
rjurney Feb 1, 2025
7353b07
Back to Python 3.9.20
rjurney Feb 1, 2025
31321b5
Back to Python 3.9
rjurney Feb 1, 2025
867839c
Python 3.9.20 --> 3.9.21
rjurney Feb 1, 2025
9196b10
Now loading built wheel in unit tests... doesn't matter
rjurney Feb 1, 2025
a63913d
Require pyspark 2.0.0
rjurney Feb 1, 2025
74e9c6a
Ignore egg
rjurney Feb 1, 2025
f847ebb
Ignore the python files associated with building a package
rjurney Feb 1, 2025
8696b79
Bumped versions for scala 2.12 and Spark 3.5
rjurney Feb 1, 2025
d87d5a0
Moved pytest out from main requirements.txt and put it and Sphinx int…
rjurney Feb 1, 2025
0630b57
Building a Python package now :)
rjurney Feb 1, 2025
3c3ab2e
Back to Python 3.11
rjurney Feb 2, 2025
037da48
Bump version to 0.8.5 since we did release a 0.8.4
rjurney Feb 2, 2025
e172020
Install the dev dependencies to run tests
rjurney Feb 2, 2025
a9ec785
Additional Python requirements
rjurney Feb 3, 2025
c275966
Reworking end because that motif was too expensive to calculate
rjurney Feb 3, 2025
d95e8ee
New 4-node pattern I'm tracking down...
rjurney Feb 3, 2025
ef2e6a1
New Excel images as I sort and label motifs
rjurney Feb 3, 2025
c4f7cfa
Starting to hand label the paths in Excel to explain them to myself :)
rjurney Feb 3, 2025
091ef3e
Fixed out of place CSV save
rjurney Feb 3, 2025
0092b12
Change pre-commit version
rjurney Feb 3, 2025
ffc7a2e
Ignore Python build files and doc build files
rjurney Feb 3, 2025
11530dc
Ignore Python build folders
rjurney Feb 3, 2025
7dd5c15
Build the Python package before running unit tests
rjurney Feb 3, 2025
e38d081
Get version.sbt version
rjurney Feb 3, 2025
99cee06
Now putting version in one place: version.sbt
rjurney Feb 3, 2025
548a848
Merge branch 'master' of github.com:graphframes/graphframes into rjur…
rjurney Feb 4, 2025
1ad3e09
Almost done with the motif search tutorial...
rjurney Feb 4, 2025
b986a86
Merge in latest master
rjurney Feb 4, 2025
6e38579
Split Posts into Questions and Answers. User-[Answers]->Post changed …
rjurney Feb 7, 2025
d9742d1
Total rework of graph and accordingly the motif tutorial
rjurney Feb 7, 2025
f359fef
Rewrite done through simple motifs
rjurney Feb 7, 2025
205985e
+G17 directed motif image
rjurney Feb 7, 2025
f98cdf3
Swapped out G22 image for G17
rjurney Feb 7, 2025
c68e32f
G17 --> G30
rjurney Feb 7, 2025
5211cf7
Wrapped initial network motif tutorial
rjurney Feb 7, 2025
ea89dac
Now using properties in an aggregation of motif paths
rjurney Feb 7, 2025
eb57303
Remove unused images
rjurney Feb 7, 2025
4dc9cc1
More unused images
rjurney Feb 7, 2025
74432f7
Sync'd tutorial with motif.py
rjurney Feb 7, 2025
e78c654
Merge branch 'master' of github.com:graphframes/graphframes into rjur…
rjurney Feb 16, 2025
7 changes: 5 additions & 2 deletions .github/workflows/python-ci.yml
@@ -7,8 +7,8 @@ jobs:
matrix:
include:
- spark-version: 3.5.4
scala-version: 2.12.18
python-version: 3.9.19
scala-version: 2.12.20
python-version: 3.11.11
runs-on: ubuntu-22.04
env:
# define Java options for both official sbt and sbt-extras
@@ -35,8 +35,11 @@ jobs:
run: |
python -m pip install --upgrade pip wheel
pip install -r ./python/requirements.txt
pip install -r ./python/requirements-dev.txt
pip install pyspark==${{ matrix.spark-version }}
- name: Test
run: |
python python/setup.py install
python python/setup.py bdist_wheel
export SPARK_HOME=$(python -c "import os; from importlib.util import find_spec; print(os.path.join(os.path.dirname(find_spec('pyspark').origin)))")
./python/run-tests.sh
10 changes: 10 additions & 0 deletions .gitignore
@@ -26,3 +26,13 @@ project/plugins/project/

# Mac
*.DS_Store
.vscode

# Python specific
python/build
python/dist
build/lib
python/graphframes.egg-info
python/graphframes/tutorials/data
python/docs/_build
python/docs/_site
10 changes: 5 additions & 5 deletions Dockerfile
@@ -1,16 +1,16 @@
FROM ubuntu:22.04

ARG PYTHON_VERSION=3.8
ARG PYTHON_VERSION=3.9
Contributor:

This is for a Dockerfile that we don't use here at GitHub.
Put this in its own PR,
like "Update Dockerfile".

ARG DEBIAN_FRONTEND=noninteractive

RUN apt-get update && \
apt-get install -y wget bzip2 build-essential openjdk-8-jdk ssh sudo && \
apt-get install -y wget bzip2 build-essential openjdk-11-jdk ssh sudo && \
apt-get clean

# Install Spark and update env variables.
ENV SCALA_VERSION 2.12.17
ENV SPARK_VERSION "3.4.1"
ENV SPARK_BUILD "spark-${SPARK_VERSION}-bin-hadoop3.2"
ENV SCALA_VERSION 2.12.20
ENV SPARK_VERSION "3.5.4"
ENV SPARK_BUILD "spark-${SPARK_VERSION}-bin-hadoop3"
ENV SPARK_BUILD_URL "https://dist.apache.org/repos/dist/release/spark/spark-${SPARK_VERSION}/${SPARK_BUILD}.tgz"
RUN wget --quiet "$SPARK_BUILD_URL" -O /tmp/spark.tgz && \
tar -C /opt -xf /tmp/spark.tgz && \
159 changes: 145 additions & 14 deletions README.md
@@ -6,34 +6,165 @@

# GraphFrames: DataFrame-based Graphs

This is a package for DataFrame-based graphs on top of Apache Spark.
Users can write highly expressive queries by leveraging the DataFrame API, combined with a new
API for motif finding. The user also benefits from DataFrame performance optimizations
within the Spark SQL engine.
This is a package for DataFrame-based graphs on top of Apache Spark. Users can write highly expressive queries by leveraging the DataFrame API, combined with a new API for network motif finding. The user also benefits from DataFrame performance optimizations within the Spark SQL engine. GraphFrames works in Java, Scala, and Python.

You can find user guide and API docs at https://graphframes.github.io/graphframes.
You can find the user guide and API docs at https://graphframes.github.io/graphframes

## Installation and Quick-Start

The easiest way to start using GraphFrames is through the [Spark Packages system](https://spark-packages.org/package/graphframes/graphframes). Just run one of the following commands:

```bash
# Interactive Scala/Java
$ spark-shell --packages graphframes:graphframes:0.8.3-spark3.5-s_2.12
Contributor:

graphframes:0.8.3 -> 0.8.4, I believe?


# Interactive Python
$ pyspark --packages graphframes:graphframes:0.8.3-spark3.5-s_2.12

# Submit a script in Scala/Java/Python
$ spark-submit --packages graphframes:graphframes:0.8.3-spark3.5-s_2.12 script.py
```
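The `--packages` arguments above all follow the pattern `graphframes:graphframes:<version>-spark<major.minor>-s_<Scala binary version>`; that pattern is inferred from these commands, not from a published spec. A minimal sketch, assuming the pattern holds for other releases, that derives the coordinate from full Spark and Scala versions (the helper `graphframes_coordinate` is hypothetical, not a GraphFrames API):

```python
def graphframes_coordinate(gf_version: str, spark_version: str, scala_version: str) -> str:
    """Build a coordinate like graphframes:graphframes:0.8.3-spark3.5-s_2.12.

    Assumes the '<version>-spark<major.minor>-s_<scala major.minor>' naming
    seen in the commands above; illustrative only, not part of GraphFrames.
    """
    spark_branch = ".".join(spark_version.split(".")[:2])  # "3.5.4" -> "3.5"
    scala_binary = ".".join(scala_version.split(".")[:2])  # "2.12.20" -> "2.12"
    return f"graphframes:graphframes:{gf_version}-spark{spark_branch}-s_{scala_binary}"


if __name__ == "__main__":
    print(graphframes_coordinate("0.8.3", "3.5.4", "2.12.20"))
    # graphframes:graphframes:0.8.3-spark3.5-s_2.12
```

Instead of `--packages` on the command line, the same string could be passed to the `spark.jars.packages` configuration when building a `SparkSession`.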

Now you can create a GraphFrame as follows.

In Python:

```python
from pyspark.sql import SparkSession
from graphframes import GraphFrame

spark = SparkSession.builder.getOrCreate()

nodes = [
    (1, "Alice", 30),
    (2, "Bob", 25),
    (3, "Charlie", 35),
]
nodes_df = spark.createDataFrame(nodes, ["id", "name", "age"])

edges = [
    (1, 2, "friend"),
    (2, 1, "friend"),
    (2, 3, "friend"),
    (3, 2, "enemy"),  # eek!
]
edges_df = spark.createDataFrame(edges, ["src", "dst", "relationship"])

g = GraphFrame(nodes_df, edges_df)
```

Now let's run some graph algorithms at scale!

```python
g.inDegrees.show()

# +---+--------+
# | id|inDegree|
# +---+--------+
# |  2|       2|
# |  1|       1|
# |  3|       1|
# +---+--------+

g.outDegrees.show()

# +---+---------+
# | id|outDegree|
# +---+---------+
# |  1|        1|
# |  2|        2|
# |  3|        1|
# +---+---------+

g.degrees.show()

# +---+------+
# | id|degree|
# +---+------+
# |  1|     2|
# |  2|     4|
# |  3|     2|
# +---+------+

g2 = g.pageRank(resetProbability=0.15, tol=0.01)
g2.vertices.show()

# +---+-------+---+------------------+
# | id|   name|age|          pagerank|
# +---+-------+---+------------------+
# |  1|  Alice| 30|0.7758750474847483|
# |  2|    Bob| 25|1.4482499050305027|
# |  3|Charlie| 35|0.7758750474847483|
# +---+-------+---+------------------+

# GraphFrames' most used feature...
# Connected components can do big data entity resolution on billions or even trillions of records!
# First connect records with a similarity metric, then run connectedComponents.
# This gives you groups of records that refer to the same entity, which you can then link by same_as edges or merge into list-based master records.
spark.sparkContext.setCheckpointDir("/tmp/graphframes-example-connected-components")  # required by GraphFrames.connectedComponents
g.connectedComponents().show()

# +---+-------+---+---------+
# | id|   name|age|component|
# +---+-------+---+---------+
# |  1|  Alice| 30|        1|
# |  2|    Bob| 25|        1|
# |  3|Charlie| 35|        1|
# +---+-------+---+---------+

# Find frenemies with network motif finding! See how graph and relational queries are combined?
(
    g.find("(a)-[e]->(b); (b)-[e2]->(a)")
    .filter("e.relationship = 'friend' and e2.relationship = 'enemy'")
    .show()
)

# These are paths, which you can aggregate and count to find complex patterns.
# +------------+--------------+----------------+-------------+
# |           a|             e|               b|           e2|
# +------------+--------------+----------------+-------------+
# |{2, Bob, 25}|{2, 3, friend}|{3, Charlie, 35}|{3, 2, enemy}|
# +------------+--------------+----------------+-------------+
```
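To make the last two queries concrete, here is a minimal pure-Python sketch of what they compute on the same toy edge list. It is not how GraphFrames implements them and needs no Spark. `find_frenemies` matches the `(a)-[e]->(b); (b)-[e2]->(a)` motif with the friend/enemy filter, and `connected_components` uses union-find, ignoring edge direction as (weakly) connected components does, and labels each node with the smallest id in its component. Both function names are hypothetical.

```python
from collections import defaultdict

# Same toy data as the GraphFrame example: (id, name, age) and (src, dst, relationship).
nodes = [(1, "Alice", 30), (2, "Bob", 25), (3, "Charlie", 35)]
edges = [(1, 2, "friend"), (2, 1, "friend"), (2, 3, "friend"), (3, 2, "enemy")]


def find_frenemies(edges):
    """Match (a)-[e]->(b); (b)-[e2]->(a) where e is 'friend' and e2 is 'enemy'."""
    # Index relationships by (src, dst) so the reverse edge b->a is a direct lookup.
    by_pair = defaultdict(list)
    for src, dst, rel in edges:
        by_pair[(src, dst)].append(rel)
    matches = []
    for a, b, rel in edges:
        if rel != "friend":
            continue
        for rel2 in by_pair.get((b, a), []):
            if rel2 == "enemy":
                matches.append((a, b))
    return matches


def connected_components(node_ids, edges):
    """Union-find over edges, ignoring direction; label = smallest id in the component."""
    parent = {n: n for n in node_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for src, dst, _ in edges:
        root_a, root_b = find(src), find(dst)
        if root_a != root_b:
            # Attach the larger root under the smaller so each root stays its set's minimum.
            parent[max(root_a, root_b)] = min(root_a, root_b)
    return {n: find(n) for n in node_ids}


print(find_frenemies(edges))                               # [(2, 3)]
print(connected_components([n[0] for n in nodes], edges))  # {1: 1, 2: 1, 3: 1}
```

In the entity-resolution use described in the comments, `edges` would instead link records whose similarity passes some threshold, and each component label would group records believed to describe the same entity.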

## Learn GraphFrames

To learn more about GraphFrames, check out these resources:

* [GraphFrames Network Motif Finding Tutorial](https://graphframes.github.io/graphframes/docs/_site/motif-tutorial.html)
* [Introducing GraphFrames](https://databricks.com/blog/2016/03/03/introducing-graphframes.html)
* [On-Time Flight Performance with GraphFrames for Apache Spark](https://databricks.com/blog/2016/03/16/on-time-flight-performance-with-graphframes-for-apache-spark.html)

## GraphFrames on PyPI is Unofficial

The GraphFrames project does not own, control, or maintain the [graphframes PyPI package](https://pypi.org/project/graphframes/) (installs 0.6.0) or the [graphframes-latest PyPI package](https://pypi.org/project/graphframes-latest/) (installs 0.8.3). We recommend using the Spark Packages system to install the latest version of GraphFrames.

If you are in control of one of these packages, please reach out to us to discuss how we can work together to keep them up to date. Hopefully this situation will be addressed in the near future.

See [Installation and Quick-Start](#installation-and-quick-start) for the best way to install and use GraphFrames.

## GraphFrames Internals

To learn how GraphFrames works internally to combine graph and relational queries, check out the paper [GraphFrames: An Integrated API for Mixing Graph and Relational Queries, Dave et al. 2016](https://people.eecs.berkeley.edu/~matei/papers/2016/grades_graphframes.pdf).

Contributor:

Add a note about the Google user group?

## Building and running unit tests

To compile this project, run `build/sbt assembly` from the project home directory.
This will also run the Scala unit tests.
To compile this project, run `build/sbt assembly` from the project home directory. This will also run the Scala unit tests.

To run the Python unit tests, run the `run-tests.sh` script from the `python/` directory.
You will need to set `SPARK_HOME` to your local Spark installation directory.
To run the Python unit tests, run the `run-tests.sh` script from the `python/` directory. You will need to set `SPARK_HOME` to your local Spark installation directory.
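If PySpark was installed with pip, as the CI workflow in this PR does, `SPARK_HOME` can point at the installed `pyspark` package directory. The workflow computes it inline with `importlib.util.find_spec`; a minimal standalone sketch of the same lookup (the helper `package_dir` is hypothetical) is:

```python
import os
from importlib.util import find_spec


def package_dir(name: str) -> str:
    """Return the directory of an installed package, e.g. pyspark for SPARK_HOME."""
    spec = find_spec(name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(f"package {name!r} is not installed")
    # For a regular package, origin is its __init__.py; the parent directory is the package root.
    return os.path.dirname(spec.origin)
```

With pyspark installed, `package_dir("pyspark")` returns the same directory the CI one-liner exports. The `os.path.join` with a single argument in that one-liner is a no-op, so `os.path.dirname` alone is enough.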

## Release new version

Please see guide `dev/release_guide.md`.

## Spark version compatibility

This project is compatible with Spark 2.4+. However, significant speed improvements have been
made to DataFrames in more recent versions of Spark, so you may see speedups from using the latest
Spark version.
This project is compatible with Spark 2.4+. However, significant speed improvements have been made to DataFrames in more recent versions of Spark, so you may see speedups from using the latest Spark version.
Contributor:

Spark 3.4 or something?


## Contributing

GraphFrames is collaborative effort among UC Berkeley, MIT, and Databricks.
We welcome open source contributions as well!
GraphFrames is a collaborative effort among UC Berkeley, MIT, Databricks, and the open source community. We welcome open source contributions as well!

## Releases:

2 changes: 1 addition & 1 deletion build.sbt
@@ -3,7 +3,7 @@ import ReleaseTransformations._
lazy val sparkVer = sys.props.getOrElse("spark.version", "3.5.4")
lazy val sparkBranch = sparkVer.substring(0, 3)
lazy val defaultScalaVer = sparkBranch match {
case "3.5" => "2.12.18"
case "3.5" => "2.12.20"
case _ => throw new IllegalArgumentException(s"Unsupported Spark version: $sparkVer.")
}
lazy val scalaVer = sys.props.getOrElse("scala.version", defaultScalaVer)
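The `build.sbt` logic above takes the first three characters of the Spark version as the branch and maps only `"3.5"` to a default Scala version, failing for anything else. A minimal Python sketch mirroring that match, for illustration only (the names `DEFAULT_SCALA_BY_SPARK_BRANCH` and `default_scala_version` are hypothetical), is:

```python
# Mirrors the build.sbt match in this PR: Spark branch -> default Scala version.
DEFAULT_SCALA_BY_SPARK_BRANCH = {"3.5": "2.12.20"}


def default_scala_version(spark_version: str) -> str:
    # Like sparkVer.substring(0, 3) in build.sbt: "3.5.4" -> "3.5".
    spark_branch = spark_version[:3]
    try:
        return DEFAULT_SCALA_BY_SPARK_BRANCH[spark_branch]
    except KeyError:
        # Same message as the IllegalArgumentException thrown in build.sbt.
        raise ValueError(f"Unsupported Spark version: {spark_version}.") from None
```

A fixed three-character prefix works for single-digit minor versions such as 3.5; splitting on `.` would be needed if a two-digit minor version ever had to be distinguished.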
11 changes: 5 additions & 6 deletions dev/release_guide.md
@@ -1,8 +1,8 @@
# Guild for releasing a new Graphframe version
# Guide for releasing a new GraphFrames version

## How to build GraphFrame package ?
## How to build the GraphFrames package?

To build a GraphFrame package for releasing, you only need to run the following command:
To build a GraphFrames package for releasing, you only need to run the following command:

```
cd graphframe_repo
Expand Down Expand Up @@ -30,10 +30,9 @@ then upload the zip file generated by instructions in "How to build GraphFrame p

## How to publish the GraphFrame doc ?

GraphFrame doc is hosted in 'https://graphframes.github.io/graphframes/', to publish doc,
you just need to build doc content, then push the doc content to gh-pages branch of https://github.com/graphframes/graphframes project.
GraphFrames docs are hosted at 'https://graphframes.github.io/graphframes/'. To publish the docs, build the doc content, then push it to the gh-pages branch of the https://github.com/graphframes/graphframes project.

Before building doc, you need to install jekyll, please refer to 'docs/README.md' for details.
Before building the docs, you need to install Jekyll; please refer to 'docs/README.md' for details.

The following command is for building and publishing doc:
```
41 changes: 13 additions & 28 deletions docs/README.md
@@ -1,28 +1,18 @@
Welcome to the GraphFrames Spark Package documentation!

This readme will walk you through navigating and building the GraphFrames documentation, which is
included here with the source code.
This readme will walk you through navigating and building the GraphFrames documentation, which is included here with the source code.

Read on to learn more about viewing documentation in plain text (i.e., markdown) or building the
documentation yourself. Why build it yourself? So that you have the docs that correspond to
whichever version of GraphFrames you currently have checked out of revision control.
Read on to learn more about viewing documentation in plain text (i.e., markdown) or building the documentation yourself. Why build it yourself? So that you have the docs that correspond to whichever version of GraphFrames you currently have checked out of revision control.

## Generating the Documentation HTML

We include the GraphFrames documentation as part of the source (as opposed to using a hosted wiki, such as
the github wiki, as the definitive documentation) to enable the documentation to evolve along with
the source code and be captured by revision control (currently git). This way the code automatically
includes the version of the documentation that is relevant regardless of which version or release
you have checked out or downloaded.
We include the GraphFrames documentation as part of the source (as opposed to using a hosted wiki, such as the GitHub wiki, as the definitive documentation) to enable the documentation to evolve along with the source code and be captured by revision control (currently git). This way the code automatically includes the version of the documentation that is relevant regardless of which version or release you have checked out or downloaded.

In this directory you will find textfiles formatted using Markdown, with an ".md" suffix. You can
read those text files directly if you want. Start with index.md.
In this directory you will find text files formatted using Markdown, with an ".md" suffix. You can read those text files directly if you want. Start with index.md.

The markdown code can be compiled to HTML using the [Jekyll tool](http://jekyllrb.com).
`Jekyll` and a few dependencies must be installed for this to work. We recommend
installing via the Ruby Gem dependency manager. Since the exact HTML output
varies between versions of Jekyll and its dependencies, we list specific versions here
in some cases:
`Jekyll` and a few dependencies must be installed for this to work. We recommend installing via the Ruby Gem dependency manager. Since the exact HTML output varies between versions of Jekyll and its dependencies, we list specific versions here in some cases:

$ sudo gem install jekyll
$ sudo gem install jekyll-redirect-from
Contributor:

Remove the use of root (sudo) to install Python packages; people today use some kind of environment for Python.

@@ -32,8 +22,7 @@ On macOS, with the default Ruby, please install Jekyll with Bundler as [instruct
$ sudo gem install jekyll bundler
$ sudo gem install jekyll-redirect-from

Execute `jekyll build` from the `docs/` directory to compile the site. Compiling the site with Jekyll will create a directory
called `_site` containing index.html as well as the rest of the compiled files.
Execute `jekyll build` from the `docs/` directory to compile the site. Compiling the site with Jekyll will create a directory called `_site` containing index.html as well as the rest of the compiled files.

You can modify the default Jekyll build as follows:

@@ -45,27 +34,23 @@ You can modify the default Jekyll build as follows:
$ PRODUCTION=1 jekyll build

Note that `SPARK_HOME` must be set to your local Spark installation in order to generate the docs.

To manually point to a specific `Spark` installation,
$ SPARK_HOME=<your-path-to-spark-home> PRODUCTION=1 jekyll build

## Sphinx

We use Sphinx to generate Python API docs, so you will need to install it by running
`sudo pip install sphinx`.

sudo pip install sphinx

## API Docs (Scaladoc, Sphinx)

You can build just the scaladoc by running `build/sbt unidoc` from the GRAPHFRAMES_PROJECT_ROOT directory.

Similarly, you can build just the Python docs by running `make html` from the
GRAPHFRAMES_PROJECT_ROOT/python/docs directory. Documentation is only generated for classes that are listed as
public in `__init__.py`.
Similarly, you can build just the Python docs by running `make html` from the GRAPHFRAMES_PROJECT_ROOT/python/docs directory. Documentation is only generated for classes that are listed as public in `__init__.py`.

When you run `jekyll` in the `docs` directory, it will also copy over the scaladoc for the various
subprojects into the `docs` directory (and then also into the `_site` directory). We use a
jekyll plugin to run `build/sbt unidoc` before building the site so if you haven't run it (recently) it
may take some time as it generates all of the scaladoc. The jekyll plugin also generates the
When you run `jekyll` in the `docs` directory, it will also copy over the scaladoc for the various subprojects into the `docs` directory (and then also into the `_site` directory). We use a jekyll plugin to run `build/sbt unidoc` before building the site, so if you haven't run it recently, it may take some time as it generates all of the scaladoc. The jekyll plugin also generates the
Python docs using [Sphinx](http://sphinx-doc.org/).

Contributor:

Put this, dev/release_guide.md, and docs/_config.yml in their own PR -> update docs

NOTE: To skip the step of building and copying over the Scala, Python API docs, run `SKIP_API=1
jekyll build`. To skip building Scala API docs, run `SKIP_SCALADOC=1 jekyll build`; to skip building Python API docs, run `SKIP_PYTHONDOC=1 jekyll build`.
NOTE: To skip the step of building and copying over the Scala, Python API docs, run `SKIP_API=1 jekyll build`. To skip building Scala API docs, run `SKIP_SCALADOC=1 jekyll build`; to skip building Python API docs, run `SKIP_PYTHONDOC=1 jekyll build`.
2 changes: 1 addition & 1 deletion docs/_config.yml
@@ -13,7 +13,7 @@ include:

# These allow the documentation to be updated with newer releases
# of Spark, Scala, and Mesos.
GRAPHFRAMES_VERSION: 0.8.4
GRAPHFRAMES_VERSION: 0.8.5
#SCALA_BINARY_VERSION: "2.10"
#SCALA_VERSION: "2.10.4"
#MESOS_VERSION: 0.21.0
1 change: 1 addition & 0 deletions docs/_layouts/global.html
@@ -74,6 +74,7 @@
<ul class="dropdown-menu">
<li><a href="quick-start.html">Quick Start</a></li>
<li><a href="user-guide.html">GraphFrames User Guide</a></li>
<li><a href="motif-tutorial.html">Network Motif Finding Tutorial</a></li>
</ul>
</li>

Binary file added docs/img/4-node-directed-graphlets.png
Binary file added docs/img/Directed-Graphlet-G17.png
Binary file added docs/img/Directed-Graphlet-G22.png
Binary file added docs/img/G11_motif.png
Binary file added docs/img/G4_and_G5_directed_network_motif.png
Binary file added docs/img/GraphFrames-Logo-Dark-Small.png
Binary file added docs/img/GraphFrames-Logo-Large.png
Binary file added docs/img/GraphFrames-Logo-Small.png
Binary file added docs/img/directed_graphlets.webp
9 changes: 8 additions & 1 deletion docs/index.md
@@ -63,14 +63,21 @@ GraphFrames supplied as a package.
* [Quick Start](quick-start.html): a quick introduction to the GraphFrames API; start here!
* [GraphFrames User Guide](user-guide.html): detailed overview of GraphFrames
in all supported languages (Scala, Java, Python)
* [Motif Finding Tutorial](motif-tutorial.html): learn to perform pattern recognition with GraphFrames using a technique called network motif finding over the knowledge graph for the `stackexchange.com` subdomain [data dump](https://archive.org/details/stackexchange)
Contributor:

Have one PR for the Motif Finding Tutorial; this seems to be big, and we need to do it ourselves before it is merged.


**API Docs:**

* [GraphFrames Scala API (Scaladoc)](api/scala/index.html#org.graphframes.package)
* [GraphFrames Python API (Sphinx)](api/python/index.html)

**Community Forums:**

* [GraphFrames Mailing List](https://groups.google.com/g/graphframes/): ask questions about GraphFrames here
* [#graphframes Discord Channel on GraphGeeks](https://discord.com/channels/1162999022819225631/1326257052368113674)

**External Resources:**

* [Apache Spark Homepage](http://spark.apache.org)
* [Apache Spark Wiki](https://cwiki.apache.org/confluence/display/SPARK)
* [Mailing Lists](http://spark.apache.org/mailing-lists.html): Ask questions about Spark here
* [Apache Spark Mailing Lists](http://spark.apache.org/mailing-lists.html)
* [GraphFrames on Stack Overflow](https://stackoverflow.com/questions/tagged/graphframes)