This is a Singer tap that produces JSON-formatted data from the GitHub API following the Singer spec.
This tap:
- Pulls raw data from the GitHub REST API
- Extracts resources from GitHub for each configured repository
- Outputs the schema for each resource
- Incrementally pulls data based on the input state
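
As a rough illustration of what incremental pulling means, the sketch below keeps only records changed since a saved bookmark and then advances the bookmark. It is not this tap's implementation: the record shape, the `updated_at` field as the replication key, and the function names `parse_ts` and `records_since` are assumptions made for the example.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse a UTC timestamp such as 2021-03-01T12:00:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def records_since(records, bookmark):
    """Return (records updated after the bookmark, new bookmark).

    `bookmark` is a timestamp string, or None on the first run,
    in which case every record is kept.
    """
    cutoff = parse_ts(bookmark) if bookmark else None
    fresh = [
        r for r in records
        if cutoff is None or parse_ts(r["updated_at"]) > cutoff
    ]
    # Advance the bookmark to the newest timestamp seen; keep the old
    # bookmark when nothing new arrived.
    newest = max((r["updated_at"] for r in fresh), key=parse_ts, default=bookmark)
    return fresh, newest
```

The returned bookmark would be saved as state and passed back in on the next run, so each run only handles what changed since the previous one.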
Install
We recommend using a virtualenv to install the version in this repo.

    virtualenv -p python3 venv
    source venv/bin/activate
    python -m pip install git+https://github.com/CodeForPhilly/tap-github.git@cfp-main

Create a GitHub access token
Log in to your GitHub account, go to the Personal Access Tokens settings page, and generate a new token with at least the repo scope. Save this access token; you'll need it for the next step.
Create the config file
Create a JSON file containing the access token you just created and the path to one or more repositories that you want to extract data from. Separate multiple repo paths with spaces. Each repo path is relative to https://github.com/. For example, the path for this repository is singer-io/tap-github.

    {"access_token": "your-access-token", "repository": "singer-io/tap-github singer-io/getting-started"}

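
If you would rather generate the config file than write it by hand, a minimal sketch along these lines should work. The helper names `build_config` and `write_config` are hypothetical and not part of the tap; only the `access_token` and `repository` keys come from the example above.

```python
import json


def build_config(access_token, repositories):
    """Return the config dict, joining repo paths with single spaces."""
    return {
        "access_token": access_token,
        "repository": " ".join(repositories),
    }


def write_config(path, access_token, repositories):
    """Write the config as JSON to `path`."""
    with open(path, "w") as fh:
        json.dump(build_config(access_token, repositories), fh, indent=2)


# Example call (the token value is a placeholder):
# write_config("config.json", "your-access-token",
#              ["singer-io/tap-github", "singer-io/getting-started"])
```

Avoid committing the resulting file to version control, since it contains your token.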
Run the tap in discovery mode to get a catalog.json file

    tap-github --config config.json --discover > catalog.json

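
To see which streams discovery found, you can list their IDs. This sketch assumes the catalog follows the usual Singer layout: a top-level `streams` list whose entries each carry a `tap_stream_id`. The function name `stream_ids` is hypothetical.

```python
import json


def stream_ids(catalog):
    """Return the tap_stream_id of every stream in a parsed catalog."""
    # Assumption: streams live under a top-level "streams" key.
    return [stream["tap_stream_id"] for stream in catalog.get("streams", [])]


# Example call:
# with open("catalog.json") as fh:
#     print(stream_ids(json.load(fh)))
```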
In the catalog.json file, select the streams to sync
Each stream in the catalog.json file has a "schema" entry. To select a stream to sync, add "selected": true to that stream's "schema" entry. For example, to sync the pull_requests stream:

    ...
    "tap_stream_id": "pull_requests",
    "schema": {
      "selected": true,
      "properties": {
        "updated_at": {
          "format": "date-time",
          "type": [
            "null",
            "string"
          ]
        }
    ...

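
Editing the catalog by hand gets tedious with several streams, so the selection step can also be scripted. The sketch below is one way to do it; it assumes the catalog has a top-level `streams` list, and the function name `select_streams` is hypothetical.

```python
import json


def select_streams(catalog, wanted):
    """Set "selected": true in the schema of each wanted stream.

    Modifies `catalog` in place and returns it for convenience.
    """
    # Assumption: streams live under a top-level "streams" key.
    for stream in catalog.get("streams", []):
        if stream["tap_stream_id"] in wanted:
            stream["schema"]["selected"] = True
    return catalog


# Example call:
# with open("catalog.json") as fh:
#     catalog = json.load(fh)
# select_streams(catalog, {"pull_requests"})
# with open("catalog.json", "w") as fh:
#     json.dump(catalog, fh, indent=2)
```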
Run the application
tap-github can be run with:

    tap-github --config config.json --catalog catalog.json

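
The tap writes Singer messages to standard output, one JSON object per line. As a quick sanity check on a run, the sketch below counts RECORD messages per stream. It relies on the `type` and `stream` fields defined by the Singer spec; the function name `count_records` is hypothetical.

```python
import json
from collections import Counter


def count_records(lines):
    """Count RECORD messages per stream from an iterable of JSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        message = json.loads(line)
        if message.get("type") == "RECORD":
            counts[message["stream"]] += 1
    return dict(counts)


# Example call, reading the tap's output from a pipe:
# import sys
# print(count_records(sys.stdin))
```

Saved as a script with the example call enabled, it could be placed at the end of a pipe after the tap command above.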
Copyright © 2021 CodeForPhilly
Copyright © 2018 Stitch