Release notes

Version 1.10

1.10.0 the future

Features

Docs

Bug fixes

Ecosystem

Version 1.9

1.9.2 the future

Documentation

Bug fixes

Performance

1.9.1 2022-04-05

Bug fixes

  • normalize_total() works when Dask is not installed PR 2209 R Cannoodt

  • Fix embedding plots by bumping matplotlib dependency to version 3.4 PR 2212 I Virshup

1.9.0 2022-04-01

Tutorials

Experimental module

Features

  • filter_rank_genes_groups() now allows filtering by the absolute value of the log fold change PR 1649 S Rybakov

  • _choose_representation now subsets the provided representation to n_pcs, regardless of the name of the provided representation (this should mostly affect neighbors()) PR 2179 I Virshup PG Majev

  • scanpy.external.pp.scrublet() (and related functions) can now be used on AnnData objects containing multiple batches PR 1965 J Manning

  • The number of variables plotted with pca_loadings() can now be controlled with the n_points argument. Additionally, variables are no longer repeated if the AnnData object has fewer than 30 variables PR 2075 Yves33

  • Dask arrays now work with scanpy.pp.normalize_total() PR 1663 G Buckley, I Virshup

  • embedding_density() now allows more than 10 groups PR 1936 A Wolf

  • Embedding plots can now pass colorbar_loc to specify the location of the colorbar legend, or pass None to not show a colorbar PR 1821 A Schaar I Virshup

  • Embedding plots now have a dimensions argument, which lets users select which dimensions of their embedding to plot and uses the same broadcasting rules as other arguments PR 1538 I Virshup

  • print_versions() now uses session_info PR 2089 P Angerer I Virshup
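As a rough illustration of what filtering on the absolute log fold change changes, the sketch below uses plain NumPy. The helper filter_by_fold_change and its use_abs flag are hypothetical and are not the filter_rank_genes_groups() signature; the gene names and values are made up.

```python
import numpy as np


def filter_by_fold_change(names, logfoldchanges, min_fold_change=2.0, use_abs=False):
    """Hypothetical helper: keep genes whose log2 fold change reaches a threshold.

    With use_abs=True the absolute value is compared, so strongly
    down-regulated genes are kept alongside up-regulated ones.
    """
    names = np.asarray(names)
    lfc = np.asarray(logfoldchanges, dtype=float)
    threshold = np.log2(min_fold_change)  # a 2-fold change is a log2 fold change of 1
    values = np.abs(lfc) if use_abs else lfc
    return names[values >= threshold].tolist()


# Made-up marker genes and log2 fold changes for one group
genes = ["CD3E", "MS4A1", "LYZ", "GNLY"]
lfc = [2.5, -3.0, 0.4, -0.2]
print(filter_by_fold_change(genes, lfc))                # ['CD3E']
print(filter_by_fold_change(genes, lfc, use_abs=True))  # ['CD3E', 'MS4A1']
```

Without the absolute value, MS4A1 (a strong down-regulation) is dropped; with it, the gene is kept.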

Ecosystem

Multiple packages have been added to our ecosystem page, including:

Bug fixes

Version 1.8

1.8.2 2021-11-03

Docs

  • Update conda installation instructions PR 1974 L Heumos

Bug fixes

Ecosystem

  • Added PASTE (a tool to align and integrate spatial transcriptomics data) to the scanpy ecosystem page.

1.8.1 2021-07-07

Bug fixes

1.8.0 2021-06-28

Metrics module

Features

Ecosystem

  • Added Cubé to ecosystem page PR 1878 C Lambden

  • Added triku, a feature selection method, to the ecosystem page PR 1722 AM Ascensión

  • Added dorothea and progeny to the ecosystem page PR 1767 P Badia-i-Mompel

Documentation

Bug fixes

Development processes

  • Switched to flit, a simple tool with an easy-to-understand command line interface and metadata, for building and deploying the package PR 1527 P Angerer

  • Use pre-commit for style checks PR 1684 PR 1848 L Heumos I Virshup

Deprecations

Version 1.7

1.7.2 2021-04-07

Bug fixes

Ecosystem

  • Added triku, a feature selection method, to the ecosystem page PR 1722 AM Ascensión

  • Added dorothea and progeny to the ecosystem page PR 1767 P Badia-i-Mompel

1.7.1 2021-02-24

Documentation

  • More twitter handles for core devs PR 1676 G Eraslan

Bug fixes

1.7.0 2021-02-03

Features

  • Add new 10x Visium datasets to visium_sge() PR 1473 G Palla

  • Enable download of the source image for 10x Visium datasets in visium_sge() PR 1506 H Spitzer

  • Refactor of scanpy.pl.spatial(). Better support for plotting without an image, as well as directly providing images PR 1512 G Palla

  • Dict input for scanpy.queries.enrich() PR 1488 G Eraslan

  • rank_genes_groups_df() can now return fraction of cells in a group expressing a gene, and allows retrieving values for multiple groups at once PR 1388 G Eraslan

  • Color annotations for gene sets in heatmap() are now matched to the color of the corresponding cluster PR 1511 L Sikkema

  • PCA plots can now annotate axes with variance explained PR 1470 bfurtwa

  • Plots with groupby arguments can now group by values in the index by passing the index’s name (like pd.DataFrame.groupby). PR 1583 F Ramirez

  • Added na_color and na_in_legend keyword arguments to embedding() plots. Allows specifying color for missing or filtered values in plots like umap() or spatial() PR 1356 I Virshup

  • embedding() plots now support passing dict of {cluster_name: cluster_color, ...} for palette argument PR 1392 I Virshup
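The fraction of cells in a group expressing a gene (now available from rank_genes_groups_df()) can be sketched in a few lines of NumPy, assuming a gene counts as expressed in a cell when its value is above zero. The helper fraction_expressing is hypothetical and works on dense arrays only.

```python
import numpy as np


def fraction_expressing(X, groups):
    """Hypothetical helper: per group, the fraction of cells with a value > 0 for each gene.

    X is a dense cells x genes array and groups holds one label per cell.
    """
    X = np.asarray(X)
    groups = np.asarray(groups)
    return {g: (X[groups == g] > 0).mean(axis=0) for g in np.unique(groups)}


X = np.array([[0, 1], [2, 1], [3, 0], [0, 0]])  # 4 cells, 2 genes
fracs = fraction_expressing(X, ["a", "a", "b", "b"])
print(fracs["a"].tolist())  # [0.5, 1.0]
print(fracs["b"].tolist())  # [0.5, 0.0]
```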

External tools (new)

External tools (changes)

Documentation

Performance

Bug fixes

  • Consistent fold-change, fractions calculation for filter_rank_genes_groups PR 1391 S Rybakov

  • Fixed bug where score_genes would error if one gene was passed PR 1398 I Virshup

  • Fixed log1p inplace on integer dense arrays PR 1400 I Virshup

  • Fix docstring formatting for rank_genes_groups() PR 1417 P Weiler

  • Removed PendingDeprecationWarnings from use of np.matrix PR 1424 P Weiler

  • Fixed indexing bug in highly_variable_genes() PR 1456 V Bergen

  • Fix default number of genes for marker_gene_overlap() PR 1464 MD Luecken

  • Fixed passing groupby and dendrogram_key to dendrogram() PR 1465 M Varma

  • Fixed download path of pbmc3k_processed PR 1472 D Strobl

  • Better error message when computing DE with a group of size 1 PR 1490 J Manning

  • Update cugraph API usage for v0.16 PR 1494 R Ilango

  • Fixed marker_gene_overlap default value for top_n_markers PR 1464 MD Luecken

  • Pass random_state to RAPIDS UMAP PR 1474 C Nolet

  • Fixed anndata version requirement for concat() (re-exported from scanpy as sc.concat) PR 1491 I Virshup

  • Fixed the width of the progress bar when downloading data PR 1507 M Klein

  • Updated link for moignard15 dataset PR 1542 I Virshup

  • Fixed bug where calling set_figure_params could block if IPython was installed, but not used. PR 1547 I Virshup

  • violin() no longer fails if .raw not present PR 1548 I Virshup

  • spatial() refactoring and better handling of spatial data PR 1512 G Palla

  • pca() works with chunked=True again PR 1592 I Virshup

  • ingest() now works with umap-learn 0.5.0 PR 1601 S Rybakov
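The log1p fix for integer dense arrays (PR 1400) touches a NumPy constraint: an in-place ufunc cannot write float results into an integer array. The sketch below shows that failure and one possible workaround; log1p_float is a hypothetical helper, and we assume, without having checked the PR, that the original bug resembled this casting error.

```python
import numpy as np


def log1p_float(X):
    """Hypothetical helper: log(1 + X), in place when X already has a float dtype.

    Integer arrays cannot store the float results, so they are first converted
    to float64, which allocates a new array.
    """
    if not np.issubdtype(X.dtype, np.floating):
        X = X.astype(np.float64)
    np.log1p(X, out=X)
    return X


counts = np.array([[0, 1], [3, 7]])
try:
    np.log1p(counts, out=counts)  # float64 output cannot be cast into an int array
except TypeError as err:  # NumPy raises a TypeError subclass here
    print("in-place log1p on integers failed:", type(err).__name__)

result = log1p_float(counts)
print(result.dtype)  # float64
```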

Version 1.6

1.6.0 2020-08-15

This release includes an overhaul of dotplot(), matrixplot(), and stacked_violin() (PR 1210 F Ramirez), and of the internals of rank_genes_groups() (PR 1156 S Rybakov).

Overhaul of dotplot(), matrixplot(), and stacked_violin() PR 1210 F Ramirez

  • An overhauled tutorial: plotting/core.

  • New plotting classes can be accessed directly (e.g., DotPlot) or using the return_fig param.

  • It is possible to plot log fold change and p-values in the rank_genes_groups_dotplot() family of functions.

  • Added ax parameter which allows embedding the plot in other images.

  • Added an option to show a bar plot of the cell/observation totals per category instead of the dendrogram.

  • Return a dictionary of axes for further manipulation. This includes the main plot, the legend, and the dendrogram or totals.

  • Legends can be removed.

  • The groupby param can take a list of categories, e.g., groupby=['tissue', 'cell type'].

  • Added padding parameter to dotplot and stacked_violin. PR 1270

  • matrixplot() now has a colorbar title, and the colorbar is positioned as in dotplot().

  • dotplot() changes:

    • Improved the colorbar and size legend for dotplots. Now the colorbar and size have titles, which can be modified using the colorbar_title and size_title params. They also align at the bottom of the image and do not shrink if the dotplot image is smaller.

    • Allow plotting genes in rows and categories in columns (swap_axes).

    • Using DotPlot, the dot_edge_color and line width can be modified, a grid can be added, and other modifications are enabled.

    • A new style was added in which the dots are replaced by an empty circle and the square behind the circle is colored (like in matrixplots).

  • stacked_violin() changes:

    • Violins can be colored based on average gene expression, as in dotplots.

    • The linewidth of the violin plots is thinner.

    • Removed the ticks for the y-axis as they tend to overlap with each other. They can be displayed if needed using the style method.

Additions

Bug fixes

Version 1.5

1.5.1 2020-05-21

Bug fixes

  • Fixed a bug in pca(), where random_state did not have an effect for sparse input PR 1240 I Virshup

  • Fixed docstring in pca() which included an unused argument PR 1240 I Virshup

1.5.0 2020-05-15

The 1.5.0 release adds a lot of new functionality, much of which takes advantage of anndata updates 0.7.0 - 0.7.2. Highlights of this release include support for spatial data, dedicated handling of graphs in AnnData, sparse PCA, an interface with scvi, and others.

Spatial data support

New functionality

External tools

Performance

  • pca() now uses efficient implicit centering for sparse matrices. This can lead to significantly improved performance for large datasets PR 1066 A Tarashansky

  • score_genes() now has an efficient implementation for sparse matrices with missing values PR 1196 redst4r.
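Implicit centering rests on the identity (X - 1μᵀ)v = Xv - 1(μᵀv), where μ holds the column means: products with the centered matrix can be computed from the original sparse X, so the dense centered matrix never has to be built. A minimal NumPy sketch of the two matrix-vector products an iterative solver such as ARPACK needs; dense arrays stand in for sparse ones to avoid a SciPy dependency, and this illustrates the idea rather than the code merged in PR 1066.

```python
import numpy as np


def centered_matvec(X, mu, v):
    """(X - 1 mu^T) @ v, computed without forming the centered matrix."""
    return X @ v - mu @ v  # mu @ v is a scalar, broadcast to every row


def centered_rmatvec(X, mu, u):
    """(X - 1 mu^T)^T @ u, computed without forming the centered matrix."""
    return X.T @ u - mu * u.sum()


rng = np.random.default_rng(0)
X = rng.poisson(0.3, size=(6, 4)).astype(float)  # mostly zeros, like count data
mu = X.mean(axis=0)  # column (gene) means
v = rng.normal(size=4)
u = rng.normal(size=6)

Xc = X - mu  # explicit centering, built here only to check the identities
assert np.allclose(centered_matvec(X, mu, v), Xc @ v)
assert np.allclose(centered_rmatvec(X, mu, u), Xc.T @ u)
```

With SciPy, these two functions could be passed as matvec and rmatvec to scipy.sparse.linalg.LinearOperator, which scipy.sparse.linalg.svds accepts in place of a matrix.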

Warning

The new pca() implementation can give slightly different results for sparse matrices. See the PR (PR 1066) and the documentation for more info.

Code design

Bug fixes

Version 1.4

1.4.6 2020-03-17

Functionality in external

Code design

Bug fixes

1.4.5 2019-12-30

Please install scanpy==1.4.5.post3 instead of scanpy==1.4.5.

New functionality

Code design

Warning

  • changed default solver in pca() from auto to arpack

  • changed default use_raw in score_genes() from False to None

1.4.4 2019-07-20

New functionality

  • scanpy.get adds helper functions for extracting data in convenient formats PR 619 I Virshup

Bug fixes

  • Stopped deprecation warnings from AnnData 0.6.22 I Virshup

Code design

  • normalize_total() gains param exclude_highly_expressed, and fraction is renamed to max_fraction with better docs A Wolf
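The exclude_highly_expressed option changes how per-cell size factors are computed. A NumPy sketch of our reading of it: genes that make up more than max_fraction of the counts in at least one cell are ignored when summing each cell's counts, and every cell is then rescaled to target_sum (taken here as the median of those totals when not given). normalize_total_sketch is hypothetical and dense-only; see the normalize_total() documentation for the exact behavior.

```python
import numpy as np


def normalize_total_sketch(X, target_sum=None, exclude_highly_expressed=False, max_fraction=0.05):
    """Hypothetical sketch: rescale each cell (row) so its counts sum to target_sum.

    With exclude_highly_expressed=True, any gene exceeding max_fraction of a
    cell's total in at least one cell is left out when computing the per-cell
    totals used for scaling; those genes are still rescaled.
    """
    X = np.asarray(X, dtype=float)
    totals = X.sum(axis=1)
    if exclude_highly_expressed:
        too_high = (X > totals[:, None] * max_fraction).any(axis=0)
        totals = X[:, ~too_high].sum(axis=1)
    if target_sum is None:
        target_sum = np.median(totals[totals > 0])  # assumed default: median total
    totals = np.where(totals == 0, 1.0, totals)  # avoid division by zero for empty cells
    return X / totals[:, None] * target_sum


X = np.array([[10, 10, 80], [20, 20, 10]])
print(normalize_total_sketch(X))
# first call -> [[7.5, 7.5, 60], [30, 30, 15]]: gene 2 dominates cell 0's total
print(normalize_total_sketch(X, exclude_highly_expressed=True, max_fraction=0.5))
# second call -> [[15, 15, 120], [15, 15, 7.5]]: genes 0 and 1 now agree across cells
```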

1.4.3 2019-05-14

Bug fixes

  • neighbors() correctly infers n_neighbors again from params, which was temporarily broken in v1.4.2 I Virshup

Code design

1.4.2 2019-05-06

New functionality

  • combat() supports additional covariates which may include adjustment variables or biological condition PR 618 G Eraslan

  • highly_variable_genes() has a batch_key option which performs HVG selection in each batch separately to avoid selecting genes that vary strongly across batches PR 622 G Eraslan
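A toy example shows why per-batch selection helps: a gene that is constant within each batch but shifted between batches looks highly variable once all cells are pooled. The sketch ranks genes by a plain dispersion (variance / mean) within each batch and counts how many batches select each gene. hvg_per_batch and this simple dispersion are assumptions of the sketch, not the normalized-dispersion procedure highly_variable_genes() uses.

```python
import numpy as np


def hvg_per_batch(X, batches, n_top):
    """Hypothetical sketch: in how many batches is each gene among the n_top most dispersed?

    Dispersion here is variance / mean within a batch (0 where the mean is 0).
    Each batch needs at least two cells for the sample variance.
    """
    X = np.asarray(X, dtype=float)
    batches = np.asarray(batches)
    n_selected = np.zeros(X.shape[1], dtype=int)
    for b in np.unique(batches):
        Xb = X[batches == b]
        mean = Xb.mean(axis=0)
        var = Xb.var(axis=0, ddof=1)
        disp = np.divide(var, mean, out=np.zeros_like(mean), where=mean > 0)
        top = np.argsort(-disp, kind="stable")[:n_top]
        n_selected[top] += 1
    return n_selected


# 3 genes, 2 batches of 3 cells; gene 1 is constant within each batch but
# shifted between batches (a pure batch effect)
X = np.array([[1, 2, 1], [5, 2, 3], [9, 2, 2],
              [2, 20, 2], [6, 20, 2], [10, 20, 2]])
batches = ["A", "A", "A", "B", "B", "B"]
print(hvg_per_batch(X, batches, n_top=1).tolist())      # [2, 0, 0]
print(hvg_per_batch(X, ["all"] * 6, n_top=1).tolist())  # pooled: [0, 1, 0]
```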

Bug fixes

  • rank_genes_groups() t-test implementation no longer returns NaN when the variance is 0, and now uses scipy’s implementation PR 621 I Virshup

  • umap() with init_pos='paga' detects correct dtype A Wolf

  • louvain() and leiden() auto-generate key_added=louvain_R upon passing restrict_to, which was temporarily changed in 1.4.1 A Wolf
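With zero variance in both groups and equal means, Welch's t statistic is 0 / 0. The sketch below shows one way to keep such genes from yielding NaN; welch_t_scores is hypothetical, and mapping undefined scores to 0 is our assumption about the intended behavior, not a description of PR 621. For the full test with p-values, SciPy's scipy.stats.ttest_ind_from_stats computes the same statistic from summary statistics.

```python
import numpy as np


def welch_t_scores(mean_a, var_a, n_a, mean_b, var_b, n_b):
    """Welch's t statistic per gene; undefined scores (0 / 0) are set to 0."""
    mean_a, var_a, mean_b, var_b = (
        np.atleast_1d(np.asarray(x, dtype=float)) for x in (mean_a, var_a, mean_b, var_b)
    )
    denom = np.sqrt(var_a / n_a + var_b / n_b)
    with np.errstate(divide="ignore", invalid="ignore"):
        t = (mean_a - mean_b) / denom
    t[np.isnan(t)] = 0.0  # zero variance and equal means: no evidence of a difference
    return t


# gene 0: ordinary case; gene 1: zero variance in both groups and equal means
t = welch_t_scores([2.0, 3.0], [1.0, 0.0], 10, [1.0, 3.0], [1.0, 0.0], 10)
print(t)  # gene 0: sqrt(5), about 2.236; gene 1: 0.0
```

A nonzero mean difference with zero variance still gives an infinite score here; this sketch leaves that case unchanged.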

Code design

1.4.1 2019-04-26

New functionality

Code design

  • .layers support for scatter plots F Ramirez

  • fix double-logarithmization in the computation of log fold change in rank_genes_groups() A Muñoz-Rojas

  • fix return sections of docs P Angerer
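A sketch of computing a log fold change from log1p-transformed data without logarithmizing twice: average on the log1p scale, undo the log with expm1, then take log2 of the ratio. We believe rank_genes_groups() follows this general approach, but the helper log2_fold_change, the eps offset, and averaging on the log scale are assumptions of this sketch.

```python
import numpy as np


def log2_fold_change(X_log1p, in_group, eps=1e-9):
    """Hypothetical helper: log2 fold change of a group vs. the rest, from log1p data.

    Means are taken on the log1p scale and mapped back with expm1 before the
    ratio is formed, so the data are not log-transformed a second time.
    """
    X_log1p = np.asarray(X_log1p, dtype=float)
    in_group = np.asarray(in_group, dtype=bool)
    mean_group = X_log1p[in_group].mean(axis=0)
    mean_rest = X_log1p[~in_group].mean(axis=0)
    return np.log2((np.expm1(mean_group) + eps) / (np.expm1(mean_rest) + eps))


counts = np.array([[3.0], [1.0]])  # one gene; cell 0 is in the group, cell 1 is not
X = np.log1p(counts)
print(log2_fold_change(X, [True, False]))  # about 1.585, i.e. log2(3)
# Taking log2 of the ratio of the log-scale values logarithmizes twice:
print(np.log2(X[0] / X[1]))  # log2(log(4) / log(2)), about 1.0
```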

Version 1.3

1.3.6 2018-12-11

Major updates

Interactive exploration of analysis results through manifold viewers

Code design

1.3.5 2018-12-09

  • numerous figure improvements PR 369 F Ramirez

1.3.4 2018-11-24

  • leiden() wraps the recent graph clustering package by [^cite_traag18] K Polanski

  • bbknn() wraps the recent batch correction package [^cite_polanski19] K Polanski

  • calculate_qc_metrics() calculates a number of quality control metrics, similar to calculateQCMetrics from Scater [^cite_mccarthy17] I Virshup

1.3.3 2018-11-05

Major updates

  • a fully distributed preprocessing backend T White and the Laserson Lab

Code design

Note

Also see changes in anndata 0.6.

  • changed default compression to None in write_h5ad() to speed up read and write, disk space use is usually less critical

  • performance gains in write_h5ad() due to better handling of strings and categories S Rybakov


1.3.1 2018-09-03

RNA velocity in single cells [^cite_manno18]

  • Scanpy and AnnData support loom’s layers so that computations for single-cell RNA velocity [^cite_manno18] become feasible S Rybakov and V Bergen

  • scvelo harmonizes with Scanpy and is able to process loom files with splicing information produced by Velocyto [^cite_manno18]; it runs a lot faster than the count matrix analysis of Velocyto and provides several conceptual developments

Plotting (Generic)

There is now a section on imputation in external:

  • magic() for imputation using data diffusion [^cite_vandijk18] PR 187 S Gigante

  • dca() for imputation and latent space construction using an autoencoder [^cite_eraslan18] PR 186 G Eraslan

Version 1.2

1.2.1 2018-06-08

Plotting of Generic marker genes and quality control.

  • highest_expr_genes() for quality control; plot genes with highest mean fraction of cells, similar to plotQC of Scater [^cite_mccarthy17] PR 169 F Ramirez

1.2.0 2018-06-08

  • paga() improved, see PAGA; the default model changed, restore the previous default model by passing model='v1.0'

Version 1.1

1.1.0 2018-06-01

  • set_figure_params() by default passes vector_friendly=True and allows you to produce reasonably sized PDFs by rasterizing large scatter plots A Wolf

  • draw_graph() defaults to the ForceAtlas2 layout [^cite_jacomy14] [^cite_chippada18], which is often more visually appealing and whose computation is much faster S Wollock

  • scatter() also plots along variables axis MD Luecken

  • pca() and log1p() support chunk processing S Rybakov

  • regress_out() is back to multiprocessing F Ramirez

  • read() reads compressed text files G Eraslan

  • mitochondrial_genes() for querying mito genes FG Brundu

  • mnn_correct() for batch correction [^cite_haghverdi18] [^cite_kang18]

  • phate() for low-dimensional embedding [^cite_moon17] S Gigante

  • sandbag(), cyclone() for scoring genes [^cite_scialdone15] [^cite_fechtner18]

Version 1.0

1.0.0 2018-03-30

Major updates

  • Scanpy is much faster and more memory efficient: preprocess, cluster and visualize 1.3M cells in 6h, 130K cells in 14min, and 68K cells in 3min A Wolf

  • the API gained a preprocessing function neighbors() and a class Neighbors() to which all basic graph computations are delegated A Wolf

Warning

Upgrading to 1.0 isn’t fully backwards compatible due to the following changes:

  • the graph-based tools louvain(), dpt(), draw_graph(), umap(), diffmap() and paga() require prior computation of the graph: sc.pp.neighbors(adata, n_neighbors=5); sc.tl.louvain(adata) instead of the previous sc.tl.louvain(adata, n_neighbors=5)

  • install numba via conda install numba, which replaces cython

  • the default connectivity measure changed (dpt will look different using default settings). Setting method='gauss' in sc.pp.neighbors uses Gauss kernel connectivities and reproduces the previous behavior; see, for instance, the example paul15.

  • namings of returned annotation have changed for less bloated AnnData objects, which means that some of the unstructured annotation of old AnnData files is not recognized anymore

  • replace occurrences of group_by with groupby (for consistency with pandas)

  • it is worth checking out the notebook examples to see changes, e.g. the seurat example.

  • upgrading scikit-learn from 0.18 to 0.19 changed the implementation of PCA, some results might therefore look slightly different


Further updates
  • UMAP [^cite_mcinnes18] can serve as a first visualization of the data, just as tSNE; in contrast to tSNE, UMAP directly embeds the single-cell graph and is faster. UMAP is also used for measuring connectivities and computing neighbors, see neighbors() A Wolf

  • graph abstraction: AGA is renamed to PAGA: paga(); now, it only measures connectivities between partitions of the single-cell graph, pseudotime and clustering need to be computed separately via louvain() and dpt(), the connectivity measure has been improved A Wolf

  • logistic regression for finding marker genes rank_genes_groups() with parameter method='logreg' A Wolf

  • louvain() provides a better implementation for reclustering via restrict_to A Wolf

  • scanpy no longer modifies rcParams upon import, call settings.set_figure_params to set the ‘scanpy style’ A Wolf

  • default cache directory is ./cache/, set settings.cachedir to change this; nested directories in this are avoided A Wolf

  • show edges in scatter plots based on graph visualization draw_graph() and umap() by passing edges=True A Wolf

  • downsample_counts() for downsampling counts MD Luecken

  • default 'louvain_groups' are called 'louvain' A Wolf

  • 'X_diffmap' contains the zero component, plotting remains unchanged A Wolf
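Downsampling counts can be pictured as drawing reads without replacement from a cell until a target total is reached. A per-cell NumPy sketch of that idea; downsample_cell and its parameters are hypothetical and not the signature of downsample_counts(). Expanding counts into one entry per read is simple but uses memory proportional to the total count, which a library implementation would likely avoid.

```python
import numpy as np


def downsample_cell(counts, target, rng):
    """Hypothetical helper: reduce one cell's integer counts to `target` total.

    Reads are drawn without replacement, so each gene keeps at most its
    original count.
    """
    counts = np.asarray(counts, dtype=np.int64)
    if counts.sum() <= target:
        return counts.copy()  # already at or below the target
    reads = np.repeat(np.arange(counts.size), counts)  # one gene index per read
    kept = rng.choice(reads, size=target, replace=False)
    return np.bincount(kept, minlength=counts.size)


rng = np.random.default_rng(42)
cell = np.array([50, 0, 30, 20])  # 100 counts over 4 genes
small = downsample_cell(cell, 10, rng)
print(small, small.sum())  # a random split of 10 counts; the sum is always 10
```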

Version 0.4

0.4.3 2018-02-09

0.4.2 2018-01-07

  • amendments in PAGA and its plotting functions A Wolf

0.4.0 2017-12-23

Version 0.3

0.3.2 2017-11-29

0.3.0 2017-11-16

Version 0.2

0.2.9 2017-10-25

Initial release of the new trajectory inference method PAGA

  • paga() computes an abstracted, coarse-grained (PAGA) graph of the neighborhood graph A Wolf

  • paga_compare() plots this graph next to an embedding A Wolf

  • paga_path() plots a heatmap through a node sequence in the PAGA graph A Wolf
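paga() builds a graph whose nodes are groups of cells. The simplified sketch below covers only the coarse-graining step, summing the edges of the cell-level neighborhood graph between partitions. As we understand it, PAGA's connectivity measure additionally compares these edge counts with what would be expected by chance, which this sketch does not attempt; coarse_grain is a hypothetical helper.

```python
import numpy as np


def coarse_grain(adjacency, labels):
    """Hypothetical helper: sum a cell-level graph's edge weights into a group-level graph.

    adjacency: symmetric cells x cells array; labels: one group label per cell.
    Returns the sorted group names and a groups x groups matrix of summed edge
    weights (diagonal: edges within a group, each counted twice).
    """
    A = np.asarray(adjacency, dtype=float)
    names, idx = np.unique(labels, return_inverse=True)
    M = np.zeros((names.size, A.shape[0]))  # membership matrix: groups x cells
    M[idx, np.arange(A.shape[0])] = 1.0
    return names, M @ A @ M.T


# Toy graph on 4 cells: edges 0-1 and 2-3 inside groups, 1-2 between them
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
names, C = coarse_grain(A, ["a", "a", "b", "b"])
print(names.tolist())  # ['a', 'b']
print(C.tolist())      # [[2.0, 1.0], [1.0, 2.0]]
```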

0.2.1 2017-07-24

Scanpy includes preprocessing, visualization, clustering, pseudotime and trajectory inference, differential expression testing and simulation of gene regulatory networks. The implementation efficiently deals with datasets of more than one million cells. A Wolf, P Angerer

Version 0.1

0.1.0 2017-05-17

Scanpy computationally outperforms and allows reproducing both the Cell Ranger R kit’s and most of Seurat’s clustering workflows. A Wolf, P Angerer