Merged
12 changes: 9 additions & 3 deletions CHANGELOG
Original file line number Diff line number Diff line change
@@ -9,12 +9,12 @@ Version 0.3.0
* Bug fixes
* Optimized the normalization
* Unsupervised allows to center size factors by mean
* Unsupervised allows to computed adjuted log normalized counts
* Unsupervised allows to compute adjusted log normalized counts
* Unsupervised allows to compute the number of clusters automatically
* Supervised allows to normalize the data
* Supervised allows to input train/classes with different spots
than in the train/test data and in different order
* st_data_plotter allows to highlith selected spots
* st_data_plotter allows to highlight selected spots
* st_data_plotter only plots the spots where the gene is present
when a gene reg-exp is given
* st_data_plotter allows to normalize the data
@@ -35,4 +35,10 @@ Version 0.4.1
* Fixed a bug in the noise filtering function

Version 0.4.2
* Added compatibility with Python 3
* Added compatibility with Python 3

Version 0.4.5
* Added merge_replicates.py script
* Added slice_regions_matrix.py script
* Optimized and improved differential_analysis.py
* Added compatibility with R 3.4 and rpy2 latest versions
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,5 +1,5 @@
The MIT License (MIT)
Copyright (c) 2016 Jose Fernandez Navarro.
Copyright (c) 2017 Jose Fernandez Navarro, KTH.

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the "Software"),
151 changes: 112 additions & 39 deletions README.md
@@ -1,10 +1,14 @@
# Spatial Transcriptomics Analysis

Different tools for visualization, data processing and analysis (supervised and un-supervised learning, differential expression analysis, etc..) of Spatial Transcriptomics data (can also be used for single cell data).
Different tools for visualization, data processing and analysis (supervised and un-supervised learning,
differential expression analysis, etc.) of Spatial Transcriptomics datasets (can also be used for single cell data).

The package is compatible with the output format of the data generated with the ST Pipeline (https://github.com/SpatialTranscriptomicsResearch/st_pipeline) and give full support to plot the data onto the tissue images but it is compatible with any single cell datasets where the data is stored as a matrix of counts (genes as columns and spot/cells as rows).
The package is compatible with the output format of the data generated with the
ST Pipeline (https://github.com/SpatialTranscriptomicsResearch/st_pipeline) and gives full
support for plotting the data onto the tissue images, but it is also compatible with any single cell dataset
where the data is stored as a matrix of counts (genes as columns and spots/cells as rows).

This package makes use of the following tools:
This package makes use of the following R packages:

t-SNE
https://github.com/lvdmaaten/bhtsne
@@ -27,9 +31,10 @@ See AUTHORS file.
### Contact
For bugs, feedback or help you can contact Jose Fernandez Navarro <jose.fernandez.navarro@scilifelab.se>

### Note
### Input Format
The referred matrix format is the ST data format, a matrix of counts where spot coordinates are row names
and the genes are column names.
and the genes are column names. This matrix format (.TSV) is generated with the
[ST Pipeline](https://github.com/SpatialTranscriptomicsResearch/st_pipeline).
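
As an illustration of the matrix layout described above, the following minimal sketch reads such a file using only the Python standard library. It is not code from the ST Analysis package. The function name `read_st_matrix`, the sample data, the `XxY` form of the row names and the empty first cell in the header row are all assumptions made for the example.

```python
import csv
import io

def read_st_matrix(handle):
    """Read a tab-separated matrix of counts.

    Assumed layout (hypothetical, for illustration only): the header row
    holds an empty corner cell followed by gene names, and every other row
    holds a spot label such as "12x7" followed by one count per gene.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    genes = header[1:]  # skip the empty corner cell
    counts = {}
    for row in reader:
        if not row:
            continue  # ignore empty lines
        x, y = row[0].split("x")
        counts[(float(x), float(y))] = dict(zip(genes, map(float, row[1:])))
    return genes, counts

# Tiny in-memory example standing in for a real .tsv file
example = "\tActb\tGapdh\n12x7\t3\t0\n12x8\t1\t5\n"
genes, counts = read_st_matrix(io.StringIO(example))
print(genes)
print(counts[(12.0, 8.0)]["Gapdh"])
```

For a real dataset the `io.StringIO` object would be replaced by an open file handle.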

The scripts that allow you to pass the tissue HE image can optionally take a 3x3 alignment file.
If the images are cropped to the exact array boundaries the alignment file is not needed
@@ -44,12 +49,51 @@ Where each a corresponds to a cell of the affine transformation matrix.

### Installation

Note that the ST Analysis package requires R (https://cran.r-project.org/) installed in your system.
To install the ST Analsysis packate just clone or download the repository, cd into the cloned folder and type:
We recommend that you install the latest 3.4.x version. Once you have installed R you can open
an R terminal or RStudio and type the following:

python setup.py install
source("https://bioconductor.org/biocLite.R")
biocLite("monocle")
biocLite("scran")
biocLite("DESeq2")
biocLite("Rtsne")
biocLite("edgeR")

A bunch of scripts will then be available in your system.
Before you install the ST Analysis package we recommend that you create a Python 3 virtual
environment. We recommend [Anaconda](https://anaconda.org/anaconda/python).
The latest versions of rpy2 (R binder for Python) are only compatible with Python 3.

#### OSX
The following instructions are for installing the ST Analysis package with Python 3.4 and Anaconda
(should be the same for Python 3.6)
Note: we advise updating Xcode to the latest version.

conda create -n python3.4 python=3.4
source activate python3.4
brew install freetype
brew install gcc
export CC=/usr/local/Cellar/gcc/7.2.0/bin/gcc-7
pip install rpy2
export CC=/usr/bin/clang
conda install matplotlib
conda install pandas
conda install scikit-learn
python setup.py install

#### Linux
The following instructions are for installing the ST Analysis package with Python 3.4 and Anaconda
(should be the same for Python 3.6)
Note: we advise installing and updating the developer tools packages.

conda create -n python3.4 python=3.4
source activate python3.4
pip install rpy2
conda install matplotlib
conda install pandas
conda install scikit-learn
python setup.py install

A bunch of scripts (described below) will then be available in your system.
Note that you can always type script_name.py --help to get more information
about how the script works.
The ST Analysis package is compatible with Python 2 and 3 and we recommend using
@@ -60,50 +104,79 @@ a virtual environment to make the installation of the dependencies easier.
## Analysis tools

### To do un-supervised learning
To see how spots cluster together based on their expression profiles you can run :
To see how spots cluster together based on their expression profiles you can run:

unsupervised.py --counts-table-files matrix_counts.tsv --normalization DESeq2 --num-clusters 5 --clustering KMeans --dimensionality tSNE --image-files tissue_image.JPG --use-log-scale

The script can be given one or serveral datasets (matrices with counts). It will perform dimesionality reduction
and then cluster the spots together based on the dimesionality reduced coordinates.
It generates a scatter plot of the clusters. It also generates an image for
each dataset of the predicted classes on top of the tissue image (tissue image for each dataset must be given and optionally
an alignment file to convert to pixel coordiantes).
It also generate a file with the predicted classes for each spot that can be used in other analysis.

To know more about the parameters you can type --help
The script can be given one or several datasets (matrices of counts). It performs dimensionality reduction
and then clusters the spots based on the dimensionality-reduced coordinates.
It generates a scatter plot of the clusters. It also generates an image for
each dataset of the predicted classes on top of the tissue image (a tissue image must be given for each dataset and, optionally,
an alignment file to convert to pixel coordinates).
It also generates a file with the predicted classes for each spot that can be used in other analyses.
To know more about the parameters you can type --help

### To do supervised learning
You can train a classifier with the expression profiles of a set of spots
where you know the class (cell type) and then predict on a new dataset
of the same tissue. For that you can use the following script :
where you know the class (spot type) and then predict on a new dataset
of the same tissue. For that you can use the following script:

supervised.py --train-data data_matrix.tsv --test-data data_matrix.tsv --train-classes train_classes.txt --test-classes test_classes.txt --image tissue_image.jpg

This will generate some statistics, a file with the predicted classes for each spot and a plot of the predicted spots on top of the tissue image (if the image and the alignment matrix are given).
The script can take several datasets for the training set and it allows to normalize the training and testing data.

To know more about the parameters you can type --help
This will generate some statistics, a file with the predicted classes for each spot and a plot of
the predicted spots on top of the tissue image (if the image and the alignment matrix are given).
The script can take several datasets for the training set and it can normalize the training and testing data.
The test/train classes file should look like:

XxY 1
XxY 1
XxY 2

Where X is the spot X coordinate and Y is the spot Y coordinate and 1,1 and 2 are
spot classes (regions).
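
The classes file layout shown above can be parsed with a few lines of Python. This is a minimal sketch, not the parser used by supervised.py. The function name `read_spot_classes` and the sample coordinates are hypothetical, and splitting on any whitespace is an assumption about the separator.

```python
import io

def read_spot_classes(handle):
    """Parse lines of the form "XxY CLASS" into {(x, y): class_label}."""
    classes = {}
    for line in handle:
        fields = line.split()  # assumption: columns are whitespace separated
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        x, y = fields[0].split("x")
        classes[(float(x), float(y))] = fields[1]
    return classes

# Tiny in-memory example standing in for a real classes file
example = "10x12 1\n10x13 1\n11x12 2\n"
spot_classes = read_spot_classes(io.StringIO(example))
print(spot_classes[(11.0, 12.0)])
```

Class labels are kept as strings here so that non-numeric region names would also work.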
To know more about the parameters you can type --help

### To visualize ST data (output from the ST Pipeline)
Use the script st_data_plotter.py. It can plot ST data, it can use
filters (counts or genes) it can highlight spots with reg. expressions
Use the script st_data_plotter.py to plot ST data. It can use
filters (counts or genes), it can highlight spots with regular expressions
of genes and it can highlight spots by giving a file with spot coordinates
and labels. You need a matrix with the gene counts by spot and optionally
the a tissue image and an alignment matrix. A example run would be :
and labels. You can also normalize the data for visualization.
You need a matrix of gene counts per spot and, optionally,
a tissue image and an alignment matrix. An example run would be:

st_data_plotter.py --cutoff 2 --filter-genes Actb* --image tissue_image.jpg --alignment alignment_file.txt data_matrix.tsv
st_data_plotter.py --cutoff 2 --show-genes Actb* --image tissue_image.jpg data_matrix.tsv

This will generate a scatter plot of the expression of the spots that contain a gene Actb and with higher expression than 2 and it will use the tissue image as background. You could optionally pass a list of spots with their classes (Generated with unsupervised.py) to highlight spots in the scatter plot. More info if you type --help
This will generate a scatter plot of the expression of the spots that contain the gene Actb
with an expression higher than 2, using the tissue image as background.
You can optionally pass a list of spots with their classes (generated with unsupervised.py)
to highlight spots in the scatter plot. Type --help for more information.

### To slice a matrix of counts based on regions of interest
You can slice a dataset based on regions of interest (spots) obtained
manually or with unsupervised.py. You need a file defining classes for each spot
(unsupervised.py generates such files):

XxY 1
XxY 1
XxY 2

Where X is the spot X coordinate and Y is the spot Y coordinate and 1,1 and 2 are
spot classes (regions).
An example run would be:

slice_regions_matrix.py --counts-matrix dataset.tsv --spot-classes classes.txt --regions 1 3
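
Conceptually, slicing by region means keeping only the rows (spots) whose class is one of the requested regions. The sketch below shows that idea on plain dictionaries. It is not the implementation of slice_regions_matrix.py, and the function name `slice_by_regions` and the sample data are hypothetical.

```python
def slice_by_regions(counts, spot_classes, regions):
    """Keep only the spots whose class is one of the requested regions.

    counts: {spot_label: {gene: count}}
    spot_classes: {spot_label: class_label}
    regions: iterable of class labels to keep
    """
    wanted = set(regions)
    return {spot: row for spot, row in counts.items()
            if spot_classes.get(spot) in wanted}

# Hypothetical data: three spots, two regions
counts = {"10x12": {"Actb": 3}, "10x13": {"Actb": 0}, "11x12": {"Actb": 7}}
spot_classes = {"10x12": "1", "10x13": "1", "11x12": "2"}
sliced = slice_by_regions(counts, spot_classes, ["2"])
print(sorted(sliced))
```

Spots that have no entry in the classes file are dropped in this sketch, which is one possible design choice.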

### To perform Differential Expression Analysis (DEA)
You can perform a D.E.A using the output from unsupervised.py and a list of groups to where the D.E.A will be performed.
The scripts generates different plots and the list of D.E genes in a text file. Basically the script
needs one or more matrices of counts with ST data (genes as columns), a tab delimited file with two columns where
the first column is a class and the second is a spot (for each input matrix) and finally the list of comparisions to be made
from the classes present in the data (for example: 0:1-0:2 0:1-0:5). Where 0 refers to the first input dataseet and 1,2,5 refers to
the classes defined the classes file.

differential_analysis.py --input-data stdata.tsv --data-classes spot_classes.txt --condition-tuples 1-2 1-3
You can perform a D.E.A. between ST datasets (most likely regions of interest).
The script generates different plots and the list of D.E. genes in a text file for each comparison.
Basically the script needs one or more matrices of counts with ST data (genes as columns) and a list
of comparisons to make:

DATASET0-DATASET2 DATASET1-DATASET3 ...

Where 0 refers to the first input dataset. The script allows for different normalization methods and
different D.E.A. algorithms (see --help). An example run would be:

differential_analysis.py --input-data stdata_region1.tsv stdata_region2.tsv --comparisons 0-1
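
To make the comparison notation concrete, this minimal sketch turns strings such as `0-1` into pairs of dataset indexes and checks them against the number of input datasets. It is not taken from differential_analysis.py. The function name `parse_comparisons` and the validation rule are assumptions made for illustration.

```python
def parse_comparisons(comparisons, num_datasets):
    """Turn strings such as "0-1" into (0, 1) index pairs, checking that
    every index refers to one of the input datasets (0-based)."""
    pairs = []
    for item in comparisons:
        left, right = item.split("-")
        a, b = int(left), int(right)
        for index in (a, b):
            if not 0 <= index < num_datasets:
                raise ValueError("dataset index %d out of range in %r" % (index, item))
        pairs.append((a, b))
    return pairs

# Three input datasets, compared as first vs second and first vs third
pairs = parse_comparisons(["0-1", "0-2"], num_datasets=3)
print(pairs)
```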

To know more about the parameters you can type --help
To know more about the parameters you can type --help