SCARR-Vis - Single Cell Ambient RNA Removal and Visualization

SCARR-Vis removes ambient RNA from 10x Genomics single-cell data using SoupX, DecontX, scCDC, or FastCAR, and lets you compare the data before and after cleanup across QC metrics, clustering, UMAP, heatmaps, and per-cell tables.

Data upload

Provide both the raw (droplet) matrix and the filtered (cell) matrix. You can upload each as:

  • a .zip archive of a folder containing matrix.mtx.gz, barcodes.tsv.gz, and features.tsv.gz; or
  • an HDF5 (.h5) file from Cell Ranger or other tools.

Estimate contamination

    Go to Step 2 (Estimate Contamination) and choose a method:

  • SoupX: estimates sample-level ambient contamination (global ρ) from empty droplets; supports parameter tuning and cluster-aware count adjustment.
  • DecontX: infers per-cell contamination with a Bayesian model; uses user-provided clusters or re-clusters the data; outputs decontaminated counts.
  • scCDC: gene-specific contamination detection and correction (with doublet-aware adjustments).
  • FastCAR: fast ambient RNA correction using an empty-droplet UMI cutoff and gene-level contamination probability threshold, with optional ambient profiling to suggest a cutoff.
Compare results

  • Check QC (Pre) vs QC (Post), Estimation, Cluster Counts (Pre vs Post), UMAP (Pre vs Post), Top Genes, and the Cells table.
Outputs and Visualization

  • Each figure and table can be downloaded individually at the selected dimensions. Alternatively, click Bulk Download to download all publication-quality plots in JPG, TIFF, PDF, SVG, PNG, BMP, EPS, or PS format, along with summary tables in .csv format. You can also download cleaned Seurat/SCE objects, summaries, .h5 files, or matrix/barcode/feature files (matching the input format), or a complete ZIP bundle containing all available outputs.

Tips

    • If SoupX fails on the empty droplets, widen soupRange or use the automatic global soup profile fallback.
    • If your dataset contains erythroid/testis cells, adjust 'Optional Markers'.

    Use SCARR-Vis online

    SCARR-Vis is deployed at: https://www.gudalab-rtools.net/SCARR-Vis


    Launch SCARR-Vis using R and GitHub

    SCARR-Vis is deposited in the GitHub repository: https://github.com/GudaLab/SCARR-Vis
    Before running the app, install the following: R (>= 4.5.2), RStudio (>= 2025.09.2), Bioconductor (>= 3.22), and Shiny (>= 1.11.1).
    Note: SCARR-Vis has been tested with these versions. Users running an older version of R may encounter errors during package installation, so we recommend updating R to the latest version first.
    Once R is open in the command line or in RStudio, run the following commands to install and load the shiny package:

    install.packages('shiny')
    library(shiny)
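Before launching, you may want to confirm that your installation meets the tested versions listed above. The helper below is a hypothetical sketch (check_versions is not part of SCARR-Vis); it uses only base R and utils, and its default minimums are the ones stated in this section.

```r
# Hypothetical helper (not part of SCARR-Vis): report whether the installed
# R and package versions meet the tested minimums stated above.
check_versions <- function(min_r = "4.5.2", min_pkgs = c(shiny = "1.11.1")) {
  r_ok <- getRversion() >= min_r
  pkg_ok <- vapply(names(min_pkgs), function(pkg) {
    if (!requireNamespace(pkg, quietly = TRUE)) return(FALSE)  # not installed
    utils::packageVersion(pkg) >= min_pkgs[[pkg]]
  }, logical(1))
  c(R = r_ok, pkg_ok)  # named logical vector, one entry per check
}

check_versions()
```

Any FALSE entry points to the component to update before running the app.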

    Start the app

    Start the R session using RStudio and run these lines:

    shiny::runGitHub('SCARR-Vis','GudaLab')
    Alternatively, download the source code from GitHub and run the following commands in an R session (e.g., in RStudio):
    library(shiny)
    runApp('/path/to/the/SCARR-Vis-master', launch.browser=TRUE)

    Usage

    Please refer to the Manual tab.


    Developed and maintained by

    SCARR-Vis was developed by Sankarasubramanian Jagadesan and Babu Guda. We share a passion for developing a user-friendly tool for biologists, particularly those who do not have access to bioinformaticians or programming expertise.





    Instructions for Uploading Sample Files

    1. H5 Files (Cell Ranger Output)
      • Cell Ranger files: raw_feature_bc_matrix.h5 and filtered_feature_bc_matrix.h5.
    2. Cell Ranger Matrix Files
      • Cell Ranger folders: raw_feature_bc_matrix and filtered_feature_bc_matrix, each containing matrix.mtx.gz, features.tsv.gz, and barcodes.tsv.gz.
      • Compress each folder to .zip format.
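To catch a common upload problem before starting the app, you can list a zip archive's contents and check for the three 10x files. This is a minimal base R sketch; missing_10x_files is a hypothetical helper, and it assumes the Cell Ranger v3+ file names given above (Cell Ranger v2 writes genes.tsv instead of features.tsv.gz).

```r
# Hypothetical pre-upload check: does a zipped matrix folder contain the
# three 10x files? Files may sit inside a subfolder, so only base names count.
required_10x_files <- c("matrix.mtx.gz", "barcodes.tsv.gz", "features.tsv.gz")

missing_10x_files <- function(file_names) {
  setdiff(required_10x_files, basename(file_names))
}

# For a real archive, list its contents without extracting, e.g.:
# missing_10x_files(utils::unzip("filtered_feature_bc_matrix.zip", list = TRUE)$Name)
missing_10x_files(c("filtered_feature_bc_matrix/matrix.mtx.gz",
                    "filtered_feature_bc_matrix/barcodes.tsv.gz"))
```

An empty result means all three files are present.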

    Example file format

    Users can download this example dataset to better understand the required structure. Following this reference will help ensure that your files are correctly prepared and fully compatible with our tool.
    H5 File (Cell Ranger Output)
    Raw_feature_bc_matrix H5 File
    Filtered_feature_bc_matrix H5 File
    or
    Cell Ranger Matrix Files
    Raw_feature_bc_matrix Matrix Files
    Filtered_feature_bc_matrix Matrix Files
    Output panels

    • QC (Pre): detected genes per cell, UMIs per cell, mitochondrial percentage.
    • Estimation: ρ density, ρ vs nUMIs, auto-estimation contamination diagnostic, ambient profile plot, UMIs removed per cell.
    • QC (Post): detected genes per cell, UMIs per cell, mitochondrial percentage.
    • Cell counts table and plot before and after ambient RNA removal.
    • Cell stats table.
    • Top genes.
    • UMAP and tSNE plots before and after ambient RNA removal.
    • Heatmap before or after ambient RNA removal.
    • Feature plot before ambient RNA removal.

    SCARR-Vis: Manual & Parameter Reference

    SCARR-Vis is an R/Shiny application for interactive assessment and correction of ambient RNA contamination in single-cell and single-nucleus RNA-seq data. The interface follows a typical workflow: upload 10x matrices, choose a decontamination method, inspect pre- and post-correction QC, and explore clustering and gene expression.

    Overview of the Example Dataset (GSM7681687)

    Throughout this manual we use the single-cell RNA-seq sample GSM7681687 from NCBI GEO. FASTQ files were processed with Cell Ranger to generate both raw and filtered feature-barcode matrices. Because raw matrices are typically not submitted to GEO, SCARR-Vis bundles this dataset as example data so users can try the full pipeline, including ambient RNA estimation, which relies on the raw matrix.

    To quickly test SCARR-Vis, select “Use example data” in the upload panel and run the pipeline with the default parameters.


    Upload 10x Matrices

    In step 1, SCARR-Vis expects both a raw and a filtered 10x feature-barcode matrix. You can either use the bundled GSM7681687 example or upload your own matrices.

    • Species / genome build: choose the appropriate reference.
    • Raw matrix: a 10x HDF5 (.h5) file.
    • Filtered matrix: a zipped directory containing the barcodes, features, and matrix files (see Figure 1a).
    • Optional: collapse duplicate gene names; convert Ensembl IDs to gene symbols.
    Figure 1. Upload panel and estimation method selector. Sub-panels show: (a) upload matrices, (b) SoupX, (c) DecontX, (d) scCDC, (e) FastCAR parameter panels.

    Note: when starting from FASTQ files, you must first run Cell Ranger (or a similar pipeline) to generate the raw and filtered matrices required by SCARR-Vis.


    Choose Estimation Method and Parameters

    In step 2 (Estimate Contamination), SCARR-Vis provides adapters for four methods:

    • SoupX: models background 'soup' RNA and adjusts counts.
    • DecontX (from the celda package): infers cell-specific contamination fractions.
    • scCDC: identifies global contamination-causing genes (GCGs) and optionally corrects them.
    • FastCAR: profiles ambient RNA using empty droplets and estimates per-cell contamination.

    Each method has its own parameter panel (see Figure 1b–1e). After setting parameters, click Run to start estimation. Progress and status messages are shown in the log.


    1. QC (Pre)

    The QC (Pre) tab summarizes per-cell metrics before any correction:

    • Detected genes per cell.
    • UMIs per cell.
    • Mitochondrial percentage (species-aware MT gene pattern).
    Figure 2. QC (Pre) for GSM7681687: (a) detected genes per cell, (b) UMIs per cell, (c) mitochondrial %.
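The three QC metrics can be reproduced outside the app for spot checks. This sketch assumes a plain genes x cells count matrix with gene symbols as row names; qc_metrics is a hypothetical helper, and the mitochondrial patterns are the ones listed in the Seurat processing table below.

```r
# Hypothetical helper: per-cell QC metrics from a genes x cells count matrix
# whose row names are gene symbols. Mito regex is species-aware.
qc_metrics <- function(counts, species = c("human", "mouse")) {
  species <- match.arg(species)
  mt_pattern <- if (species == "human") "^MT-" else "^mt-"
  is_mt <- grepl(mt_pattern, rownames(counts))
  n_umi <- colSums(counts)
  data.frame(
    nFeature = colSums(counts > 0),                       # detected genes
    nCount = n_umi,                                       # UMIs per cell
    percent_mt = 100 * colSums(counts[is_mt, , drop = FALSE]) / n_umi
  )
}
```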

    2. Estimation Diagnostics

    The Estimation tab displays method-specific diagnostic plots.

    2.1 SoupX

    SoupX output includes the distribution of contamination fractions (ρ), their relationship with UMI counts, and other model diagnostics.

    Figure 3.1. SoupX diagnostics: (a) ρ density, (b) ρ vs nUMIs, (c) auto-estimation contamination diagnostic.
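The core idea behind SoupX can be illustrated in a few lines: estimate the soup profile from droplets whose UMI totals fall inside soupRange, then remove an expected ρ-scaled share of each cell's counts. This is a deliberately simplified sketch, not the SoupX implementation (SoupX's adjustCounts redistributes removals more carefully, and autoEstCont estimates ρ using cluster marker genes). estimate_soup and subtract_soup are hypothetical names, and both matrices are assumed to share the same gene order.

```r
# Simplified illustration of the SoupX idea (NOT the SoupX implementation).
# 'raw' and 'cells' are genes x droplets count matrices with the same genes.
estimate_soup <- function(raw, soup_range = c(0, 100)) {
  n_umi <- colSums(raw)
  # droplets strictly inside soup_range are treated as empty
  empty <- n_umi > soup_range[1] & n_umi < soup_range[2]
  soup_counts <- rowSums(raw[, empty, drop = FALSE])
  soup_counts / sum(soup_counts)  # fraction of ambient UMIs per gene
}

subtract_soup <- function(cells, soup_profile, rho) {
  # expected ambient UMIs per gene and cell: rho * nUMI * soup fraction
  expected <- outer(soup_profile, colSums(cells)) * rho
  pmax(cells - round(expected), 0)  # rounded and floored at zero for simplicity
}
```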

    2.2 DecontX

    DecontX produces a contamination density plot, ρ vs nUMIs, and a UMAP colored by estimated contamination.

    Figure 3.2. DecontX diagnostics: contamination density, ρ vs nUMIs, and contamination on UMAP.

    2.3 scCDC

    scCDC focuses on genes driving contamination. SCARR-Vis shows UMIs per cell post vs pre, mean counts of top GCGs, and entropy vs mean expression to highlight putative contamination genes.

    Figure 3.3. scCDC diagnostics: (a) UMIs per cell (post vs pre), (b) top GCGs (pre vs post), (c) entropy vs mean expression.
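To build intuition for the entropy axis, the sketch below computes, for each gene, the Shannon entropy of its mean expression across clusters alongside its overall mean; a gene spread evenly across clusters scores high. This is a generic calculation offered as an assumption, not scCDC's exact statistic, and gene_entropy is a hypothetical helper.

```r
# Illustrative only: Shannon entropy (bits) of each gene's mean expression
# across clusters, plus its overall mean. Not scCDC's exact statistic.
gene_entropy <- function(counts, clusters) {
  clusters <- factor(clusters)
  # genes x clusters matrix of mean counts per cluster
  means <- sapply(levels(clusters), function(cl)
    rowMeans(counts[, clusters == cl, drop = FALSE]))
  means <- matrix(means, nrow = nrow(counts))  # stay a matrix even for one gene
  p <- means / rowSums(means)                  # each gene's share per cluster
  p[is.nan(p)] <- 0                            # genes with no counts at all
  h <- -rowSums(ifelse(p > 0, p * log2(p), 0)) # treat 0 * log2(0) as 0
  data.frame(mean = rowMeans(counts), entropy = h)
}
```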

    2.4 FastCAR

    FastCAR scans a grid of empty-droplet UMI cutoffs to profile ambient RNA and identifies an appropriate threshold for empty droplets and contamination. SCARR-Vis displays the number of empty droplets and genes in ambient RNA at each cutoff, as well as the distribution of UMIs removed per cell.

    Figure 3.4. FastCAR diagnostics: (a) empty-droplet profile, (b) UMIs removed per cell.
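The cutoff scan can be approximated as follows: for each empty-droplet UMI cutoff in a grid (mirroring fastcar_profile_start, fastcar_profile_stop, and fastcar_profile_by), count the droplets treated as empty and the genes whose contamination chance exceeds fastcar_contam_cutoff. This is a simplified sketch, not FastCAR's describe.ambient.RNA.sequence; profile_ambient_grid is hypothetical, and defining contamination chance as the fraction of empty droplets in which the gene has at least one UMI is an assumption.

```r
# Simplified sketch (NOT FastCAR's code). 'raw' is a genes x droplets matrix.
profile_ambient_grid <- function(raw, start = 10, stop = 500, by = 10,
                                 contam_cutoff = 0.05) {
  n_umi <- colSums(raw)
  cutoffs <- seq(start, stop, by = by)
  rows <- lapply(cutoffs, function(cut) {
    empty <- n_umi > 0 & n_umi <= cut  # assumed definition of "empty"
    n_empty <- sum(empty)
    # per-gene fraction of empty droplets with >= 1 UMI ("contamination chance")
    chance <- if (n_empty > 0) rowMeans(raw[, empty, drop = FALSE] > 0) else 0
    data.frame(cutoff = cut, n_empty = n_empty,
               n_ambient_genes = sum(chance > contam_cutoff))
  })
  do.call(rbind, rows)
}
```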

    3. QC (Post)

    After decontamination, the QC (Post) tab repeats the same metrics as QC (Pre) but on the corrected counts, allowing a direct comparison.

    Figure 4. QC (Post) histograms for GSM7681687: detected genes, UMIs, and mitochondrial %.

    4. Cluster counts (Pre vs Post)

    The Cluster counts (Pre vs Post) tab compares the number of cells in each cluster before and after correction, making it easy to see whether ambient RNA disproportionately affected particular clusters.

    Figure 5. Cell counts per cluster (pre vs post).
    Figure 6. Bar plot of cell counts per cluster (pre vs post).
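A side-by-side count table like Figure 5 can be rebuilt from the per-cell cluster labels of the two runs. In this sketch (cluster_counts is a hypothetical helper), note that Pre and Post clusters come from separate clustering runs, so the same cluster ID does not necessarily denote the same population.

```r
# Hypothetical helper: cells per cluster for the Pre and Post runs.
# Clusters present in only one run get a count of 0 in the other.
cluster_counts <- function(pre, post) {
  pre <- as.character(pre)
  post <- as.character(post)
  lv <- sort(union(pre, post))
  data.frame(cluster = lv,
             pre = as.vector(table(factor(pre, levels = lv))),
             post = as.vector(table(factor(post, levels = lv))))
}
```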

    5. Cell table

    The Cells Table tab lists per-cell metrics such as cluster ID, UMIs, QC statistics, and contamination estimates. Users can sort and filter rows for detailed inspection.

    Figure 7. Cell-level summary table.

    6. Top genes

    The Top Genes tab highlights the genes most affected by ambient RNA and compares their pre- and post-correction expression.

    Figure 8. Top genes affected by ambient RNA (pre vs post).
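One way to rank affected genes is by the total UMIs removed across all cells. The sketch below assumes Pre and Post matrices with identical gene and cell order; top_removed_genes is hypothetical, and SCARR-Vis may use a different ranking.

```r
# Hypothetical helper: rank genes by total UMIs removed (Pre minus Post).
top_removed_genes <- function(pre, post, n = 10) {
  tot_pre <- rowSums(pre)
  removed <- tot_pre - rowSums(post)
  frac <- ifelse(tot_pre > 0, removed / tot_pre, 0)  # share of each gene removed
  res <- data.frame(gene = rownames(pre), removed = removed, frac_removed = frac)
  head(res[order(-res$removed), ], n)
}
```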

    7. UMAP / tSNE (Pre vs Post)

    Global structure is visualized in the UMAP/TSNE (Pre vs Post) tab. SCARR-Vis recomputes embeddings on both the original and corrected counts using Seurat.

    • UMAP (Pre) and UMAP (Post).
    • tSNE (Pre) and tSNE (Post).
    Figure 9. UMAP and tSNE embeddings for GSM7681687 before and after correction.

    8. Heatmap (Pre vs Post)

    The Heatmap tab visualizes expression of top variable genes across clusters. Users can switch between pre- and post-correction matrices to see how ambient removal changes gene-level patterns.

    Figure 10. Heatmaps of top variable genes for pre- and post-correction data.

    9. Feature plots (Pre vs Post)

    The Feature Plot tab displays per-gene expression over UMAP/tSNE. Users provide one or more comma-separated gene symbols (e.g., FTH1), and SCARR-Vis plots paired pre- and post-correction feature maps.

    Figure 11. Example feature plots for FTH1 (Pre and Post).
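The comma-separated gene box can be parsed defensively before plotting. parse_gene_input is a hypothetical helper; the commented lines show how the cleaned list could be passed to Seurat::FeaturePlot for the paired Pre and Post objects.

```r
# Hypothetical helper: clean a comma-separated gene list, dropping blanks,
# duplicates, and symbols absent from the data.
parse_gene_input <- function(text, available) {
  genes <- unique(trimws(strsplit(text, ",", fixed = TRUE)[[1]]))
  genes <- genes[nzchar(genes)]
  list(use = intersect(genes, available), missing = setdiff(genes, available))
}

# With Seurat objects 'pre' and 'post' (assumed to exist):
# g <- parse_gene_input("FTH1", rownames(pre))
# Seurat::FeaturePlot(pre, features = g$use)
# Seurat::FeaturePlot(post, features = g$use)
```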

    10. Reproducibility and Session Info

    SCARR-Vis provides a reproducibility summary and full R session information. The reproducibility table records the selected method, key parameter values, and dataset-level statistics. The Session Info tab shows R version, platform, and package versions. These should be included in reports or manuscripts so that analyses can be fully reproduced.

    Figure 12. Reproducibility summary table with selected parameters.
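If you run analyses outside the app, a similar record can be written with base R. save_repro_info and the example params list are hypothetical; the function writes the chosen parameters to a CSV file and the output of sessionInfo() to a text file.

```r
# Hypothetical helper: save chosen parameters and R session info to 'dir'.
save_repro_info <- function(params, dir = tempdir()) {
  params_file <- file.path(dir, "parameters.csv")
  values <- vapply(params, paste, character(1), collapse = ",")  # flatten vectors
  utils::write.csv(data.frame(parameter = names(params), value = values,
                              row.names = NULL),
                   params_file, row.names = FALSE)
  session_file <- file.path(dir, "session_info.txt")
  writeLines(utils::capture.output(utils::sessionInfo()), session_file)
  c(params_file, session_file)  # paths of the files written
}

save_repro_info(list(method = "SoupX", do_auto = TRUE, soupRange = c(0, 100)))
```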

    Pipeline summary: The Seurat-based processing (normalization, variable feature selection, PCA, neighbors, clustering, and UMAP) is run twice: first on the uploaded filtered counts (Pre) and again on the decontaminated counts (Post). The downstream visualization tabs always reflect this paired design.


    Parameter reference

    General

    Name | Default | Min | Max | Notes
    min_cells | 3 | 1 | - | Minimum cells per gene to retain when creating Seurat objects.

    SoupX

    Name | Default | Min | Max | Notes
    do_auto | TRUE | FALSE | TRUE | If TRUE, uses autoEstCont to estimate a global ρ.
    manual_rho | 0.05 | 0 | 1 | Used only if do_auto = FALSE; applies a uniform ρ.
    soupRange | c(0, 100) | 0 | 2000 | UMI range of empty droplets used to build the soup profile.
    keepDroplets | FALSE | FALSE | TRUE | Keeps the droplet table in memory; uses more RAM.

    DecontX

    Name | Default | Min | Max | Notes
    decontx_use_clusters | TRUE | FALSE | TRUE | If TRUE, uses Seurat clusters as priors.
    decontx_maxiter (maxIter) | 500 | 50 | 10000 | Maximum EM iterations.
    decontx_delta | 10,10 | >0,>0 | - | Dirichlet prior hyperparameters, given as two numbers.
    decontx_estimateDelta | TRUE | FALSE | TRUE | Estimate delta during fitting.
    decontx_convergence | 0.001 | 1e-6 | 0.1 | EM tolerance for convergence.
    decontx_iterLogLik | 10 | 1 | 1000 | Iterations between log-likelihood checks.
    decontx_varGenes | 5000 | 100 | 30000 | Number of variable genes used by decontX.
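For readers who want to reproduce a DecontX run in R, the table's settings correspond to arguments of celda::decontX(). The wrapper below is a sketch under that assumption (run_decontx is hypothetical and requires the celda package); when given a count matrix, decontX returns a list that includes decontXcounts and contamination.

```r
# Sketch (assumption): map the DecontX panel settings onto celda::decontX()
# for a genes x cells count matrix. clusters = NULL lets decontX cluster itself.
run_decontx <- function(counts, clusters = NULL,
                        maxIter = 500, delta = c(10, 10), estimateDelta = TRUE,
                        convergence = 0.001, iterLogLik = 10, varGenes = 5000) {
  res <- celda::decontX(counts, z = clusters, maxIter = maxIter, delta = delta,
                        estimateDelta = estimateDelta, convergence = convergence,
                        iterLogLik = iterLogLik, varGenes = varGenes)
  list(counts = res$decontXcounts,        # decontaminated counts
       contamination = res$contamination) # per-cell contamination fraction
}
```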

    scCDC

    Name | Default / Option | Notes
    restriction_factor | 0.5 (dropdown) | Controls aggressiveness of GCG detection.
    min.cell | 100 (dropdown) | Minimum cells per gene for estimation.
    percent.cutoff | 0.2 (dropdown) | Threshold for ambient fraction filtering.

    FastCAR

    Name | Default | Min | Max | Notes
    fastcar_empty_cutoff | 100 | 10 | 5000 | Maximum UMIs to call a droplet 'empty'. Higher values can over-correct lowly expressed genes.
    fastcar_contam_cutoff | 0.05 | 0 | 0.5 | Contamination chance cutoff used for background detection; lower is more conservative.
    fastcar_do_profile | TRUE | FALSE | TRUE | If TRUE, runs describe.ambient.RNA.sequence to profile ambient RNA over a grid of empty-droplet cutoffs.
    fastcar_profile_start | 10 | 1 | 2000 | Lower bound of UMI cutoff grid for ambient profiling.
    fastcar_profile_stop | 500 | 50 | 10000 | Upper bound of UMI cutoff grid for ambient profiling.
    fastcar_profile_by | 10 | 1 | 100 | Step size of UMI cutoff grid for ambient profiling.
    fastcar_use_recommended | TRUE | FALSE | TRUE | If TRUE, uses FastCAR's recommended empty-droplet cutoff based on the ambient profile.

    Seurat processing (defaults used in app)

    Step | Key parameters (value) | Notes
    Mito % | PercentageFeatureSet: pattern = ^MT- (human) / ^mt- (mouse) | Species-aware mitochondrial regex.
    NormalizeData | normalization.method="LogNormalize", scale.factor=10000 | Standard log-normalization.
    FindVariableFeatures | selection.method="vst", nfeatures=2000 | Top 2,000 HVGs (Seurat default).
    ScaleData | do.center=TRUE, do.scale=TRUE, verbose=FALSE | Centers and scales features before PCA.
    RunPCA | features=VariableFeatures(object), npcs=30, verbose=FALSE | PCA on HVGs; 30 PCs kept.
    FindNeighbors | reduction='pca', dims=1:20, k.param=20 | SNN graph on the first 20 PCs; k = 20.
    FindClusters | resolution=0.5, algorithm=1 | Louvain (algorithm 1) at resolution 0.5.
    RunUMAP | reduction='pca', dims=1:20, n.neighbors=30, min.dist=0.3, umap.method="uwot", metric="cosine" | UMAP via uwot on the first 20 PCs.
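The table above can be read as a single Seurat call chain. This sketch (run_seurat_pipeline is hypothetical and requires Seurat) wraps it in a function so the same steps can be applied to the Pre and Post matrices, matching the paired design described in the pipeline summary; tSNE is omitted because its parameters are not listed.

```r
# Sketch of the paired Seurat run with the defaults listed above.
# Apply it once to the filtered counts (Pre) and once to the corrected counts (Post).
run_seurat_pipeline <- function(counts, species = c("human", "mouse")) {
  species <- match.arg(species)
  obj <- Seurat::CreateSeuratObject(counts = counts, min.cells = 3)
  obj[["percent.mt"]] <- Seurat::PercentageFeatureSet(
    obj, pattern = if (species == "human") "^MT-" else "^mt-")
  obj <- Seurat::NormalizeData(obj, normalization.method = "LogNormalize",
                               scale.factor = 10000)
  obj <- Seurat::FindVariableFeatures(obj, selection.method = "vst", nfeatures = 2000)
  obj <- Seurat::ScaleData(obj, do.center = TRUE, do.scale = TRUE, verbose = FALSE)
  obj <- Seurat::RunPCA(obj, features = Seurat::VariableFeatures(obj),
                        npcs = 30, verbose = FALSE)
  obj <- Seurat::FindNeighbors(obj, reduction = "pca", dims = 1:20, k.param = 20)
  obj <- Seurat::FindClusters(obj, resolution = 0.5, algorithm = 1)
  Seurat::RunUMAP(obj, reduction = "pca", dims = 1:20, n.neighbors = 30,
                  min.dist = 0.3, umap.method = "uwot", metric = "cosine")
}
# pre  <- run_seurat_pipeline(filtered_counts)        # placeholder inputs
# post <- run_seurat_pipeline(decontaminated_counts)
```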

    See the Estimation, QC, and visualization tabs for diagnostics and plots after you run the pipeline.


    Expected runtime

    Runtime depends on dataset size, selected method, and hardware. On a typical modern laptop (4–8 CPU cores, 16 GB RAM), running the full SCARR-Vis pipeline (SoupX, DecontX, scCDC, or FastCAR, plus QC, clustering, UMAP/tSNE, and plots) on the bundled GSM7681687 example data (< 5,000 cells) usually completes in a few minutes per method (roughly 2–5 minutes).

    R Session info

