SCARR-Vis: Manual & Parameter Reference
SCARR-Vis is an R/Shiny application for interactive assessment and correction of ambient RNA contamination in single-cell and single-nucleus RNA-seq data. The interface follows a typical workflow: upload 10x matrices, choose a decontamination method, inspect pre- and post-correction QC, and explore clustering and gene expression.
Overview of the Example Dataset (GSM7681687)
Throughout this manual we use the single-cell RNA-seq sample GSM7681687 from NCBI GEO. FASTQ files were processed with Cell Ranger to generate both raw and filtered feature-barcode matrices.
Because raw matrices are typically not submitted to GEO, SCARR-Vis bundles this dataset as example data so users can try the full pipeline, including the ambient RNA estimation steps that rely on the raw matrix.
To quickly test SCARR-Vis, select “Use example data” in the upload panel and run the pipeline with the default parameters.
Upload 10x Matrices
In step 1, SCARR-Vis expects both a raw and a filtered 10x feature-barcode matrix.
You can either use the bundled GSM7681687 example or upload your own matrices.
- Species / genome build: choose the appropriate reference.
- Raw matrix: a 10x HDF5 file (.h5).
- Filtered matrix: a zipped directory containing the barcodes, features, and matrix files (see Figure 1a).
- Optional: collapse duplicate gene names; convert Ensembl IDs to gene symbols.
Figure 1. Upload panel and estimation method selector.
Sub-panels show: (a) upload matrices,
(b) SoupX, (c) DecontX, (d) scCDC, (e) FastCAR parameter panels.
Note: when starting from FASTQ files, you must first run Cell Ranger (or a similar
pipeline) to generate the raw and filtered matrices required by SCARR-Vis.
Choose Estimation Method and Parameters
In step 2 (Estimate Contamination), SCARR-Vis provides adapters for four methods:
- SoupX: models background 'soup' RNA and adjusts counts.
- DecontX: from the celda package; infers cell-specific contamination fractions.
- scCDC: identifies global contamination-causing genes (GCGs) and optionally corrects them.
- FastCAR: profiles ambient RNA using empty droplets and estimates per-cell contamination.
Each method has its own parameter panel (see Figure 1b–1e).
After setting parameters, click Run to start estimation.
Progress and status messages are shown in the log.
1. QC (Pre)
The QC (Pre) tab summarizes per-cell metrics before any correction:
- Detected genes per cell.
- UMIs per cell.
- Mitochondrial percentage (species-aware MT gene pattern).
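The three QC metrics can be computed directly from a gene-by-cell count matrix. The sketch below is a plain-Python illustration, not SCARR-Vis code (the app does this in R with Seurat). The dictionary keys mirror Seurat's nFeature_RNA / nCount_RNA metadata columns and the common percent.mt convention, and the mitochondrial regex follows the species-aware patterns listed in the Seurat table of the parameter reference.

```python
import re

# Species-aware mitochondrial gene patterns (as in the Seurat table).
MT_PATTERN = {"human": r"^MT-", "mouse": r"^mt-"}


def qc_metrics(genes, counts, species="human"):
    """Per-cell QC metrics from a gene x cell matrix given as a list of rows."""
    mt = re.compile(MT_PATTERN[species])
    is_mt = [bool(mt.search(g)) for g in genes]
    n_cells = len(counts[0]) if counts else 0
    metrics = []
    for j in range(n_cells):
        col = [row[j] for row in counts]  # counts for cell j
        total = sum(col)
        mt_total = sum(c for c, m in zip(col, is_mt) if m)
        metrics.append({
            "nFeature_RNA": sum(1 for c in col if c > 0),  # detected genes
            "nCount_RNA": total,                           # UMIs per cell
            "percent.mt": 100.0 * mt_total / total if total else 0.0,
        })
    return metrics


genes = ["MT-CO1", "ACTB", "FTH1"]
counts = [[5, 0], [10, 2], [5, 0]]  # rows = genes, columns = cells
for cell in qc_metrics(genes, counts):
    print(cell)
```

Cells with zero UMIs are given 0.0 mitochondrial %, a choice made in this sketch to avoid dividing by zero.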
Figure 2. QC (Pre) for GSM7681687: (a) detected genes per cell,
(b) UMIs per cell, (c) mitochondrial %.
2. Estimation Diagnostics
The Estimation tab displays method-specific diagnostic plots.
2.1 SoupX
SoupX output includes the distribution of contamination fractions (ρ),
their relationship with UMI counts, and other model diagnostics.
Figure 3.1. SoupX diagnostics: (a) ρ density, (b) ρ vs. nUMIs, (c) automatic contamination-estimation diagnostic.
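SoupX treats each cell's counts as a mix of its own expression and the soup. A simplified illustration of the correction step, assuming ρ and the soup profile are already known: the expected ambient count for a gene is ρ × nUMI × (the gene's soup fraction), which is subtracted from the observed count and clipped at zero. This only sketches the idea; SoupX's own adjustCounts is more careful, and subtract_soup is a hypothetical name.

```python
def subtract_soup(cell_counts, soup_profile, rho):
    """Simplified ambient removal for one cell (illustration, not adjustCounts).

    cell_counts:  dict gene -> observed UMIs in this cell
    soup_profile: dict gene -> fraction of ambient UMIs (fractions sum to 1)
    rho:          contamination fraction, between 0 and 1
    """
    n_umi = sum(cell_counts.values())
    corrected = {}
    for gene, observed in cell_counts.items():
        # Expected ambient UMIs for this gene in this cell.
        expected = rho * n_umi * soup_profile.get(gene, 0.0)
        # Counts cannot be negative, so clip at zero.
        corrected[gene] = max(0.0, observed - expected)
    return corrected


print(subtract_soup({"HBB": 8, "ACTB": 92}, {"HBB": 0.5, "ACTB": 0.5}, rho=0.25))
# {'HBB': 0.0, 'ACTB': 79.5}
```

Because HBB is clipped at zero, only 20.5 of the 25 expected ambient UMIs are removed in this example; the sketch does not try to redistribute that shortfall.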
2.2 DecontX
DecontX produces a contamination density plot, ρ vs nUMIs, and a UMAP colored
by estimated contamination.
Figure 3.2. DecontX diagnostics: contamination density, ρ vs nUMIs,
and contamination on UMAP.
2.3 scCDC
scCDC focuses on genes driving contamination. SCARR-Vis shows UMIs per cell
post vs pre, mean counts of top GCGs, and entropy vs mean expression to highlight
putative contamination genes.
Figure 3.3. scCDC diagnostics: (a) UMIs per cell (post vs pre),
(b) top GCGs (pre vs post), (c) entropy vs mean expression.
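To see why entropy versus mean expression can flag contamination, the sketch below computes the Shannon entropy (in bits) of a gene's mean expression across clusters. A gene present at similar levels in every cluster, as ambient transcripts tend to be, scores high; a cluster-specific marker scores low. This illustrates the quantity only: the exact entropy definition and GCG-calling rules used by scCDC are not described in this manual, and cluster_entropy is a hypothetical helper.

```python
import math


def cluster_entropy(mean_by_cluster):
    """Shannon entropy (bits) of a gene's mean expression spread over clusters."""
    total = sum(mean_by_cluster)
    if total == 0:
        return 0.0
    probs = [m / total for m in mean_by_cluster if m > 0]
    # p * log2(1/p) avoids returning -0.0 for a single-cluster gene.
    return sum(p * math.log2(1 / p) for p in probs)


marker = [9.0, 0.0, 0.0, 0.0]      # expressed in one cluster only
ubiquitous = [2.0, 2.0, 2.0, 2.0]  # even across four clusters
print(cluster_entropy(marker))      # 0.0
print(cluster_entropy(ubiquitous))  # 2.0
```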
2.4 FastCAR
FastCAR scans a grid of empty-droplet UMI cutoffs to profile ambient RNA and
identifies an appropriate threshold for empty droplets and contamination.
SCARR-Vis displays the number of empty droplets and genes in ambient RNA at each
cutoff, as well as the distribution of UMIs removed per cell.
Figure 3.4. FastCAR diagnostics: (a) empty-droplet profile,
(b) UMIs removed per cell.
3. QC (Post)
After decontamination, the QC (Post) tab repeats the same metrics
as QC (Pre) but on the corrected counts, allowing a direct comparison.
Figure 4. QC (Post) histograms for GSM7681687: detected genes, UMIs, and mitochondrial %.
4. Cluster counts (Pre vs Post)
The Cluster counts (Pre vs Post) tab compares the number of cells
in each cluster before and after correction, making it easy to see whether
ambient RNA disproportionately affected particular clusters.
Figure 5. Number of cells per cluster (pre vs post).
Figure 6. Bar plot of cluster counts (pre vs post).
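Conceptually, this comparison tabulates the cluster labels from the Pre and Post runs. A minimal Python sketch with made-up labels follows (cluster_count_table is a hypothetical name). Because the Pre and Post clusterings come from separate Seurat runs, the same label in both does not necessarily mark the same cell population, so label-wise differences are a rough guide.

```python
from collections import Counter


def cluster_count_table(pre_labels, post_labels):
    """Cells per cluster label before and after correction, plus the change."""
    pre, post = Counter(pre_labels), Counter(post_labels)
    table = []
    for cl in sorted(set(pre) | set(post)):
        # Counter returns 0 for labels absent from one of the runs.
        table.append((cl, pre[cl], post[cl], post[cl] - pre[cl]))
    return table


pre = ["0", "0", "0", "1", "1", "2"]
post = ["0", "0", "1", "1", "1", "2"]
for row in cluster_count_table(pre, post):
    print(row)
# ('0', 3, 2, -1)
# ('1', 2, 3, 1)
# ('2', 1, 1, 0)
```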
5. Cell table
The Cells Table tab lists per-cell metrics such as cluster ID,
UMIs, QC statistics, and contamination estimates. Users can sort and filter
rows for detailed inspection.
Figure 7. Cell-level summary table.
6. Top genes
The Top Genes tab highlights the genes most affected by ambient RNA
and compares their pre- and post-correction expression.
Figure 8. Top genes affected by ambient RNA (pre vs post).
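The manual does not state how the Top Genes tab ranks genes. One plausible criterion, assumed here purely for illustration, is the total number of UMIs removed per gene (Pre total minus Post total). The gene totals below are made-up example values, and top_affected_genes is a hypothetical name.

```python
def top_affected_genes(pre_totals, post_totals, n=3):
    """Rank genes by UMIs removed (Pre total minus Post total), largest first."""
    removed = {g: pre_totals[g] - post_totals.get(g, 0) for g in pre_totals}
    return sorted(removed.items(), key=lambda kv: kv[1], reverse=True)[:n]


pre = {"HBB": 500, "FTH1": 900, "ACTB": 1200, "CD3E": 80}
post = {"HBB": 120, "FTH1": 700, "ACTB": 1150, "CD3E": 78}
print(top_affected_genes(pre, post))
# [('HBB', 380), ('FTH1', 200), ('ACTB', 50)]
```

Ranking by absolute UMIs favours highly expressed genes; a relative measure (fraction removed) would favour lowly expressed ones, so the choice changes which genes appear.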
7. UMAP / tSNE (Pre vs Post)
Global structure is visualized in the UMAP/TSNE (Pre vs Post) tab. SCARR-Vis recomputes embeddings on both the original and corrected counts using Seurat.
- UMAP (Pre) and UMAP (Post).
- tSNE (Pre) and tSNE (Post).
Figure 9. UMAP and tSNE embeddings for GSM7681687 before and after correction.
8. Heatmap (Pre vs Post)
The Heatmap tab visualizes expression of top variable genes across clusters.
Users can switch between pre- and post-correction matrices to see how ambient removal
changes gene-level patterns.
Figure 10. Heatmaps of top variable genes for pre- and post-correction data.
9. Feature plots (Pre vs Post)
The Feature Plot tab displays per-gene expression over UMAP/tSNE.
Users provide one or more comma-separated gene symbols (e.g. FTH1), and
SCARR-Vis plots paired pre- and post-correction feature maps.
Figure 11. Example feature plots for FTH1 (Pre and Post).
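Gene input for the Feature Plot tab is a comma-separated string. The sketch below shows one way such input can be normalized (trim spaces, drop empty entries and duplicates) and checked against the genes present in the matrix. The helper names and the placeholder symbol XYZ1 are hypothetical. Symbols are compared case-sensitively because human and mouse symbols differ in case (FTH1 vs. Fth1).

```python
def parse_gene_list(text):
    """Split comma-separated gene symbols, trimming spaces and dropping duplicates."""
    genes = []
    for token in text.split(","):
        symbol = token.strip()
        if symbol and symbol not in genes:
            genes.append(symbol)
    return genes


def split_known(genes, available):
    """Separate requested genes into those present in the matrix and those missing."""
    found = [g for g in genes if g in available]
    missing = [g for g in genes if g not in available]
    return found, missing


print(parse_gene_list(" FTH1, ACTB,,FTH1 "))           # ['FTH1', 'ACTB']
print(split_known(["FTH1", "XYZ1"], {"FTH1", "ACTB"}))  # (['FTH1'], ['XYZ1'])
```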
10. Reproducibility and Session Info
SCARR-Vis provides a reproducibility summary and full R session information.
The reproducibility table records the selected method, key parameter values,
and dataset-level statistics. The Session Info tab shows R version, platform,
and package versions. These should be included in reports or manuscripts so
that analyses can be fully reproduced.
Figure 12. Reproducibility summary table with selected parameters.
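A reproducibility record amounts to the method, its parameters, dataset statistics, and environment details in a machine-readable form. The Python sketch below serializes such a record to JSON. The field names are hypothetical and do not describe SCARR-Vis's actual export, the Python version and platform stand in for the R session information the app reports, and the cell and gene counts are placeholders.

```python
import json
import platform


def reproducibility_record(method, params, n_cells, n_genes):
    """Serialize method, parameters, dataset stats and environment to JSON."""
    record = {
        "method": method,
        "parameters": params,
        "dataset": {"n_cells": n_cells, "n_genes": n_genes},
        # Environment details, analogous to an R session-info summary.
        "python_version": platform.python_version(),
        "platform": platform.platform(),
    }
    return json.dumps(record, indent=2, sort_keys=True)


# Placeholder values for illustration only.
print(reproducibility_record("SoupX", {"do_auto": True, "soupRange": [0, 100]},
                             n_cells=1000, n_genes=20000))
```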
Pipeline summary: The Seurat-based processing (normalization, variable feature selection, PCA,
neighbors, clustering, and UMAP) is run twice: first on the uploaded filtered
counts (Pre) and again on the decontaminated counts (Post). The downstream
visualization tabs always reflect this paired design.
Parameter reference
General
| Name | Default | Min | Max | Notes |
| --- | --- | --- | --- | --- |
| min_cells | 3 | 1 | none | Minimum number of cells in which a gene must be detected to be retained when creating Seurat objects. |
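min_cells acts as a gene filter: a gene (row) is kept only if it is detected, i.e. has a nonzero count, in at least min_cells cells, which is how Seurat documents the min.cells argument of CreateSeuratObject. A plain-Python sketch with made-up counts (GENE_X is a placeholder name):

```python
def filter_genes_by_min_cells(genes, counts, min_cells=3):
    """Keep genes (rows) detected (count > 0) in at least min_cells cells."""
    kept_genes, kept_rows = [], []
    for gene, row in zip(genes, counts):
        if sum(1 for c in row if c > 0) >= min_cells:
            kept_genes.append(gene)
            kept_rows.append(row)
    return kept_genes, kept_rows


genes = ["ACTB", "GENE_X", "FTH1"]
counts = [[3, 1, 2, 5], [0, 0, 4, 0], [1, 0, 1, 1]]  # rows = genes, columns = cells
print(filter_genes_by_min_cells(genes, counts)[0])  # ['ACTB', 'FTH1']
```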
SoupX
| Name | Default | Min | Max | Notes |
| --- | --- | --- | --- | --- |
| do_auto | TRUE | FALSE | TRUE | If TRUE, uses autoEstCont to estimate a single, global ρ for the sample. |
| manual_rho | 0.05 | 0 | 1 | Used only if do_auto = FALSE; applies the same ρ to every cell. |
| soupRange | c(0, 100) | 0 | 2000 | Total-UMI range of droplets (from the raw matrix) used to build the soup profile. |
| keepDroplets | FALSE | FALSE | TRUE | Keeps the full droplet table in memory; uses more RAM. |
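soupRange selects which droplets from the raw matrix are pooled into the soup profile. The sketch below assumes SoupX's documented convention that droplets whose total UMIs lie strictly inside the range (endpoints excluded) are used, and that each gene's soup fraction is its share of the pooled UMIs. It is a simplified re-implementation for illustration, not SoupX code; soup_profile is a hypothetical name.

```python
def soup_profile(droplets, soup_range=(0, 100)):
    """Ambient ('soup') expression profile from low-UMI droplets.

    droplets: list of dicts gene -> UMIs, one per barcode in the raw matrix.
    Droplets whose total UMIs fall strictly inside soup_range are pooled,
    and each gene's share of the pooled UMIs is returned.
    """
    lo, hi = soup_range
    pooled = {}
    for d in droplets:
        total = sum(d.values())
        if lo < total < hi:  # endpoints excluded
            for gene, c in d.items():
                pooled[gene] = pooled.get(gene, 0) + c
    grand = sum(pooled.values())
    return {g: c / grand for g, c in pooled.items()} if grand else {}


raw = [
    {"HBB": 3, "ACTB": 1},      # empty droplet, 4 UMIs
    {"HBB": 2, "ACTB": 2},      # empty droplet, 4 UMIs
    {"HBB": 50, "ACTB": 4000},  # real cell, outside soupRange
]
print(soup_profile(raw))  # {'HBB': 0.625, 'ACTB': 0.375}
```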
DecontX
| Name | Default | Min | Max | Notes |
| --- | --- | --- | --- | --- |
| decontx_use_clusters | TRUE | FALSE | TRUE | If TRUE, uses Seurat clusters as priors. |
| decontx_maxiter (maxIter) | 500 | 50 | 10000 | Maximum EM iterations. |
| decontx_delta | 10, 10 | > 0, > 0 | none | Dirichlet prior hyperparameters, given as two numbers. |
| decontx_estimateDelta | TRUE | FALSE | TRUE | Estimate delta during fitting. |
| decontx_convergence | 0.001 | 1e-6 | 0.1 | EM tolerance for convergence. |
| decontx_iterLogLik | 10 | 1 | 1000 | Iterations between log-likelihood checks. |
| decontx_varGenes | 5000 | 100 | 30000 | Number of variable genes used by decontX. |
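The iteration controls interact as in a standard EM loop. The sketch below assumes, as we read the celda decontX documentation, that convergence bounds the largest per-cell change in contamination between iterations, and that iterLogLik only controls how often the log-likelihood is evaluated. The update function is a made-up placeholder, not decontX's EM step, and run_em is a hypothetical name.

```python
def run_em(update, theta0, max_iter=500, convergence=0.001, iter_loglik=10,
           loglik=None):
    """Toy EM driver using decontX-style stopping parameters.

    update(theta) returns new per-cell contamination estimates (list of floats).
    Stops when the largest per-cell change is below `convergence`, or after
    `max_iter` iterations. If `loglik` is given, it is evaluated every
    `iter_loglik` iterations and recorded in `history`.
    """
    theta, history = list(theta0), []
    for it in range(1, max_iter + 1):
        new = update(theta)
        change = max(abs(a - b) for a, b in zip(new, theta))
        theta = new
        if loglik is not None and it % iter_loglik == 0:
            history.append(loglik(theta))
        if change < convergence:
            return theta, it, history
    return theta, max_iter, history


def toward_point_one(theta):
    """Placeholder update: move each estimate halfway toward 0.1."""
    return [t + 0.5 * (0.1 - t) for t in theta]


theta, n_iter, _ = run_em(toward_point_one, [0.5, 0.9])
print(n_iter, [round(t, 3) for t in theta])  # 10 [0.1, 0.101]
```

A tighter convergence value or a larger maxIter trades runtime for a more settled estimate; here the largest change halves each step, so the loop stops once it drops below 0.001.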
scCDC
| Name | Default / Option | Notes |
| --- | --- | --- |
| restriction_factor | 0.5 (dropdown) | Controls the aggressiveness of GCG detection. |
| min.cell | 100 (dropdown) | Minimum cells per gene for estimation. |
| percent.cutoff | 0.2 (dropdown) | Threshold for ambient fraction filtering. |
FastCAR
| Name | Default | Min | Max | Notes |
| --- | --- | --- | --- | --- |
| fastcar_empty_cutoff | 100 | 10 | 5000 | Maximum UMIs for a droplet to be called 'empty'. Higher values can over-correct lowly expressed genes. |
| fastcar_contam_cutoff | 0.05 | 0 | 0.5 | Contamination-chance cutoff used for background detection; lower is more conservative. |
| fastcar_do_profile | TRUE | FALSE | TRUE | If TRUE, runs describe.ambient.RNA.sequence to profile ambient RNA over a grid of empty-droplet cutoffs. |
| fastcar_profile_start | 10 | 1 | 2000 | Lower bound of the UMI cutoff grid for ambient profiling. |
| fastcar_profile_stop | 500 | 50 | 10000 | Upper bound of the UMI cutoff grid for ambient profiling. |
| fastcar_profile_by | 10 | 1 | 100 | Step size of the UMI cutoff grid for ambient profiling. |
| fastcar_use_recommended | TRUE | FALSE | TRUE | If TRUE, uses FastCAR's recommended empty-droplet cutoff based on the ambient profile. |
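The three profiling parameters define a cutoff grid running from fastcar_profile_start to fastcar_profile_stop in steps of fastcar_profile_by. The sketch below scans such a grid and, at each cutoff, reports the number of empty droplets and the number of genes called ambient. It is a hypothetical re-implementation of the idea behind describe.ambient.RNA.sequence, not FastCAR code. In particular, it assumes a droplet is empty when 0 < total UMIs ≤ cutoff and a gene is ambient when it is detected in more than fastcar_contam_cutoff of the empty droplets.

```python
def ambient_profile_grid(droplet_totals, droplet_genes, start=10, stop=500, by=10,
                         contam_cutoff=0.05):
    """Scan empty-droplet UMI cutoffs (hypothetical sketch, not FastCAR).

    droplet_totals: total UMIs per barcode.
    droplet_genes: set of detected genes per barcode (same order).
    Returns (cutoff, n_empty_droplets, n_ambient_genes) for each cutoff.
    """
    rows = []
    for cutoff in range(start, stop + 1, by):
        # Barcodes with no UMIs carry no gene information, so they are skipped.
        empty = [g for t, g in zip(droplet_totals, droplet_genes) if 0 < t <= cutoff]
        n_empty = len(empty)
        ambient = set()
        if n_empty:
            seen = {}
            for genes in empty:
                for gene in genes:
                    seen[gene] = seen.get(gene, 0) + 1
            ambient = {g for g, k in seen.items() if k / n_empty > contam_cutoff}
        rows.append((cutoff, n_empty, len(ambient)))
    return rows


totals = [5, 8, 15, 3000]
genes = [{"HBB"}, {"HBB", "MALAT1"}, {"HBB", "MALAT1", "ACTB"}, {"ACTB", "HBB"}]
print(ambient_profile_grid(totals, genes, start=10, stop=20, by=10, contam_cutoff=0.5))
# [(10, 2, 1), (20, 3, 2)]
```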
Seurat processing (defaults used in app)
| Step | Key parameters (value) | Notes |
| --- | --- | --- |
| Mito % | PercentageFeatureSet pattern = ^MT- (human) / ^mt- (mouse) | Species-aware mitochondrial regex. |
| NormalizeData | normalization.method="LogNormalize", scale.factor=10000 | Standard log-normalization. |
| FindVariableFeatures | selection.method="vst", nfeatures=2000 | Top 2,000 HVGs (Seurat default). |
| ScaleData | do.center=TRUE, do.scale=TRUE, verbose=FALSE | Centers and scales features before PCA. |
| RunPCA | features=VariableFeatures(object), npcs=30, verbose=FALSE | PCA on HVGs; 30 PCs kept. |
| FindNeighbors | reduction='pca', dims=1:20, k.param=20 | SNN graph on first 20 PCs; k=20. |
| FindClusters | resolution=0.5, algorithm=1 | Louvain (algorithm 1) at res=0.5. |
| RunUMAP | reduction='pca', dims=1:20, n.neighbors=30, min.dist=0.3, umap.method="uwot", metric="cosine" | UMAP via uwot; first 20 PCs. |
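For reference, Seurat's LogNormalize divides each count by the cell's total, multiplies by scale.factor, and applies the natural-log log1p. A one-cell Python sketch of that formula (log_normalize is a hypothetical name; the zero-total guard is a choice made here):

```python
import math


def log_normalize(cell_counts, scale_factor=10000):
    """LogNormalize for one cell: log1p(count / total * scale_factor)."""
    total = sum(cell_counts)
    if total == 0:
        # No UMIs: nothing to normalize (guard added in this sketch).
        return [0.0 for _ in cell_counts]
    return [math.log1p(c / total * scale_factor) for c in cell_counts]


values = log_normalize([0, 10, 90])
print([round(v, 3) for v in values])
```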
See the Estimation, QC, and visualization tabs for diagnostics and plots after you run the pipeline.
Expected runtime
Runtime depends on dataset size, selected method, and hardware.
On a typical modern laptop (4–8 CPU cores, 16 GB RAM), running the full
SCARR-Vis pipeline (estimation with SoupX, DecontX, scCDC, or FastCAR, plus QC, clustering, UMAP/tSNE, and plots)
on the bundled GSM7681687 example data (fewer than 5,000 cells) usually takes roughly 2–5 minutes per method.