Single Cell RNA Data Analysis and Visualization (ScRDAVis)

Introduction

ScRDAVis is a browser-based, user-friendly R Shiny application that lets researchers without programming experience analyze and visualize single-cell RNA (scRNA) data. It supports single- and multiple-sample analyses as well as group comparisons. The application includes the following key functional analyses:

1. Single or Multiple Samples Analysis

This section offers various tabs to analyze one or more samples, which can be grouped into up to six groups.

1.1 Stats

Displays the QC plot and cell summary of the uploaded sample(s).

1.2 Sample Groups and QC Filtering

Helps filter the cells of the sample(s) by QC metrics before further analysis.

1.3 Normalization and PCA Analysis

Allows normalization of samples using multiple methods and generates PCA plots.

1.4 Clustering

Uses the Seurat clustering algorithm to group cells into clusters and visualizes them with UMAP or tSNE.

1.5 Remove Doublets

Employs DoubletFinder to classify cells as singlets or doublets, allowing users to keep or remove the doublets.

1.6 Marker Identification

Identifies markers for all clusters, a specific cluster, or between clusters and supports the identification of conserved markers.

1.7 Cell Type Prediction

Offers multiple options for cell type identification, including ScType, SingleR, GPTCelltype, or custom user-provided labels.

1.8 Cluster-Based Plots

Displays expressed genes in each cluster using Dot, Violin, Ridge, or Feature plots.

1.9 Condition-Based Analysis

Identifies differentially expressed genes between two groups, with visualization options including Dot, Violin, Ridge, Feature, or Volcano plots.

2. Subclustering

Allows sub-clustering of one or more clusters from the single or multiple sample analysis, or of cells selected by positive or negative expression of a gene of interest. It follows steps similar to those in the primary analysis.

3. Correlation Network Analysis

Uses the genesorteR package to identify the correlation between cell clusters. Provides correlation summary tables and visualizations of correlation matrix and network plots.

4. Gene Ontology (GO) Terms

Uses the clusterProfiler package to identify biological processes, molecular functions, and cellular components for marker genes. Provides GO summary tables and visualizations in Dot, Bar, Net, and UpSet plots.

5. Pathway Analysis

Employs the clusterProfiler and ReactomePA packages to identify pathways in single or multiple clusters, with results displayed in Dot, Bar, Net, and UpSet plots.

6. GSEA Analysis

Performs Gene Set Enrichment Analysis (GSEA) using the fgsea and msigdb packages to identify enriched gene sets. Results are displayed in GSEA plots, Bar plots, and PlotGseaTables.

7. Cell-Cell Communication

Uses the CellChat package to identify signaling communication between clusters, with receptor-ligand interactions visualized in Circular, Chord, Heatmap, Bubble, Bar, and Violin plots.

8. Trajectory and Pseudotime Analysis

Utilizes the Monocle3 package to order clusters in pseudotime and analyze gene function changes over time. Visualizations include trajectory and pseudotime plots, bar plots, and gene functional changes in pseudotime.

9. Co-Expression and TF analysis

9.1 Co-Expression Network Analysis

Uses the hdWGCNA package to identify co-expression networks as undirected, weighted gene networks. These are visualized through co-expression networks with modules, soft power plots, module relationship plots, module network plots, and module UMAP plots.

9.2 Transcription factor regulatory network analysis

Uses the hdWGCNA package to identify transcription factors (TFs) within co-expression modules. These TFs play a key role in regulating gene expression networks in single-cell data. They are visualized through bar plots, network plots, and module UMAP plots.

Outputs and Visualization

ScRDAVis provides publication-quality plots in seven formats: JPG, TIFF, PDF, SVG, BMP, EPS, and PS. Summary tables are also generated in .csv format for easy visualization and download.

Use ScRDAVis online

ScRDAVis is deployed at: https://www.gudalab-rtools.net/ScRDAVis

Launch ScRDAVis using R and GitHub

ScRDAVis is deposited in the GitHub repository: https://github.com/GudaLab/ScRDAVis
Before running the app, users must have the following installed: R (>= 4.5.1), RStudio (>= 2025.05.1), Bioconductor (>= 3.21), and Shiny (>= 1.11.1).
Note: ScRDAVis has been tested with these versions. Users running an older version of R may encounter errors during package installation, so updating R to the latest version first is recommended.
Once R is open in the command line or in RStudio, run the following commands to install and load the shiny package.

install.packages('shiny')

library(shiny)
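
Before launching the app, it can help to confirm that the local setup meets the minimum versions listed above. The following is a minimal base-R sketch; the helper name meets_minimum is hypothetical and is not part of ScRDAVis.

```r
# Hypothetical helper: TRUE when version string `current` is at least `required`.
meets_minimum <- function(current, required) {
  package_version(current) >= package_version(required)
}

# Compare the running R against the minimum stated above (4.5.1).
r_ok <- meets_minimum(as.character(getRversion()), "4.5.1")
message("R >= 4.5.1: ", r_ok)

# Check Shiny only if it is already installed.
if (requireNamespace("shiny", quietly = TRUE)) {
  shiny_ok <- meets_minimum(as.character(packageVersion("shiny")), "1.11.1")
  message("shiny >= 1.11.1: ", shiny_ok)
}
```

Comparing with package_version() rather than plain strings avoids ordering mistakes such as treating "1.9.0" as newer than "1.11.1".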

Start the app

Start an R session in RStudio and run this line:

shiny::runGitHub('ScRDAVis','GudaLab')

Alternatively, download the source code from GitHub and run the following commands in an R session in RStudio:

library(shiny)
runApp('/path/to/the/ScRDAVis-master', launch.browser=TRUE)

Usage

Please refer to the Manual tab.

Developed and maintained by

ScRDAVis was developed by Sankarasubramanian Jagadesan and Babu Guda. We share a passion for developing a user-friendly tool for biologists, particularly those who do not have access to bioinformaticians or programming expertise.

Sample names of uploaded file(s)

Instructions for Uploading Sample Files

H5 Files (Cell Ranger Output)

Cell Ranger file: filtered_feature_bc_matrix.h5.
Rename it to SAMPLE_NAME.h5 for proper identification.

Cell Ranger Matrix Files

Cell Ranger files: matrix.mtx.gz, features.tsv.gz, barcodes.tsv.gz.
Rename as SAMPLE_NAME_matrix.mtx.gz, SAMPLE_NAME_features.tsv.gz, and SAMPLE_NAME_barcodes.tsv.gz, and upload together as a set.

Seurat Objects

Format: filename.rds (Seurat object). The orig.ident attribute should match the sample name(s).

Matrix count file

Format: Filename.txt with rows as genes and columns as sample_cellID.

Clarification example file format

Users can download these example files to better understand the required structure. Following this reference will help ensure that your files are correctly prepared and fully compatible with the tool.

H5 File (Cell Ranger Output)
H5 File
Cell Ranger Matrix Files
Barcodes file
Features File
Matrix File
Seurat Object file
Seurat Object File
Matrix count file

Matrix count file

Note: Please extract the .gz file and upload the resulting .txt file, which is approximately 715 MB. Rows represent genes, and columns are labeled by sample and cell ID in the format sample_cellID. If you have multiple samples (e.g., 4 samples), please combine them into a single .txt file.

Number of cells in the given sample(s)

QC Plot before filtering

Feature-Feature relationships plot

Select number of sample group(s)

Type Group1 name

Type Group2 name

Type Group3 name

Type Group4 name

Type Group5 name

Type Group6 name

Define filtering parameters

Exclude cells based on their number of expressed genes and the percentage of reads that map to the mitochondrial genome.

Keep cells that express at least this number of genes

Exclude cells that express more than this number of genes

Filter out cells with more than this percentage of mitochondrial counts

Number of cells after QC

Sample(s) based

Group(s) based

QC plot after filtering

Sample(s) based

Group(s) based

Bar plots

Sample(s) based

Group(s) based

Normalization method

Scale factor

variable genes detection

Number of top variable features

Number of dimensions (PCA)

PC significance (JackStraw)

Max PCs to test (ScoreJackStraw dims)

JackStraw num.replicate

Plot PCs up to (for JackStrawPlot)

Dimension reduction heatmap for PCA data

Elbow plot

PC significance (JackStraw)

PCA sample(s) based

PCA group(s) based

Nearest-neighbour graph construction

Number of dimensions

k.param

n.trees

Clustering parameters and integration method

Resolution

Clustering algorithm

Integration method (except none, minimum two samples are required)

Dimension reduction

Plot Options

UMAP t-SNE

UMAP parameters

Number of dimensions

k-nearest-neighbours

min.dist

Show label

t-SNE parameters

Number of dimensions

Show label

UMAP / t-SNE cluster plot

Cluster based count bar plot

UMAP / t-SNE condition(s) based plot

Condition(s) based count bar plot

UMAP / t-SNE sample(s) based plot

Sample(s) based count Bar plot

Number of cells in clusters

Number of cells in clusters based on condition(s)

Number of cells in clusters based on sample(s)

Parameters to detect doublets

Estimated percentage of doublets in dataset

Dim plots label

Singlet / doublets plot

Singlet / doublets plot based on condition(s)

Singlet / doublets plot based on sample(s)

Singlet / doublets plot based on clusters

Number of singlet / doublets in sample(s)

Parameters to keep or remove doublets

Keep or remove doublets

Remove doublets Keep doublets

Number of cell counts used for further analysis

Singlet/doublet after keeping or removal

Based on condition(s)

Based on sample(s)

Based on clusters

Clusters split by condition(s) and sample(s)

Bar plots after keeping or removing doublet

Number of cells in clusters

Number of cells in clusters based on condition(s)

Number of cells in clusters based on sample(s)

Markers identification or Differential expression analysis

Select the analysis type

Gene expression markers parameters

Minimal percentage of cells

log fold change threshold

Statistical test

Return only positive markers

group.var

Identified markers / differentially expressed genes

Conserved Markers genes

Heatmap for top 5 marker genes in cluster(s)

Predict Cell Type

Please make sure 'Identify markers in all clusters' was run in the previous step if you are using GPTCelltype.

Cell type prediction method

Select reference data

Select tissue

DE.method

Select model

Top gene numbers to predict cell type

Dim plots label

Dimplot of annotated clusters

ScType scores

SingleR Scores

SingleR score heatmap

SingleR Delta distribution

Select the plot type to display

Please make sure 'Identify markers in all clusters' and the same 'cell prediction method' were run in the previous steps.

No. of features to display

Enter your genes for plotting (e.g., gene names separated by commas)

Plot type

group.by

split.by

Dot / Violin / Ridge / Feature plot

Top or selected genes, cell counts and proportion

Differential expression analysis between two groups

Parameters to find the DEGs

Minimal percentage of cells

log fold change threshold

Statistical test

Return only positive markers

Parameters for plotting

Plot type

group.by

No. of features to display

Enter your genes for plotting (e.g., gene names separated by commas)

Dot / Violin / Ridge / Feature / Volcano plot

Differentially expressed genes

Number of cells in the sample(s)

QC Stats for selected subclusters

Please use the same normalization method used in single or multiple samples analysis

Normalization method

Scale factor

variable genes detection

Number of top variable features

Number of dimensions (PCA)

Dimension reduction heatmap for PCA data

Elbow plot

PCA sample(s) based

PCA group(s) based

Nearest-neighbour graph construction

Number of dimensions

k.param

n.trees

Clustering parameters and integration method

Resolution

Clustering algorithm

Integration method (except none, minimum two samples are required)

Dimension reduction

Plot Options

UMAP t-SNE

UMAP parameters

Number of dimensions

k-nearest-neighbours

min.dist

Show label

t-SNE parameters

Number of dimensions

Show label

UMAP / t-SNE cluster plot

Cluster based count bar plot

UMAP / t-SNE condition(s) based plot

Condition(s) based count bar plot

UMAP / t-SNE sample(s) based plot

Sample(s) based count Bar plot

Clusters split by condition(s) and sample(s)

Number of cells in clusters

Number of cells in clusters based on condition(s)

Number of cells in clusters based on sample(s)

Markers identification or Differential expression analysis

Select the analysis type

Gene expression markers parameters

Minimal percentage of cells

log fold change threshold

Statistical test

Return only positive markers

group.var

Identified markers / differentially expressed genes

Conserved Markers genes

Heatmap for top 5 marker genes in cluster(s)

Predict Cell Type

Please make sure 'Identify markers in all clusters' was run in the previous step if you are using GPTCelltype.

Cell type prediction method

Select reference data

Select tissue

DE.method

Select model

Top gene numbers to predict cell type

Dim plots label

Dimplot of annotated clusters

ScType scores

SingleR Scores

SingleR score heatmap

SingleR Delta distribution

Select the plot type to display

Please make sure 'Identify markers in all clusters' and the same 'cell prediction method' were run in the previous steps.

No. of features to display

Enter your genes for plotting (e.g., gene names separated by commas)

Plot type

group.by

split.by

Dot / Violin / Ridge / Feature plot

Top or selected genes, cell counts and proportion

Differential expression analysis between two groups

Parameters to find the DEGs

Minimal percentage of cells

log fold change threshold

Statistical test

Return only positive markers

Parameters for plotting

Plot type

group.by

No. of features to display

Enter your genes for plotting (e.g., gene names separated by commas)

Dot / Violin / Ridge / Feature / Volcano plot

Differentially expressed genes

To begin this analysis, please complete the Single or Multiple Samples analysis or the Subclustering analysis through the Cell Type Prediction and Marker Identification steps.

Cell cluster correlation network analysis

Select the input data and celltype method for analysis

Input data

Select the celltype method

Correlation method

Cluster-based correlation matrix plot

Cluster-based Correlation Network plot

Cluster-based correlation table

To begin this analysis, please complete the Single or Multiple Samples analysis or the Subclustering analysis through the Cell Type Prediction and Marker Identification steps.

Select the input data and cluster(s) for analysis

Input data

Enter your genes (e.g., gene names separated by commas)

Select the celltype method

p_val_adj

GO term parameters

Organism

Ontology

pAdjustMethod

pvalueCutoff

qvalueCutoff

Minimal size of genes

Maximal size of genes

Plot type

No. of category to plot

GO term plot

Summary table

To begin this analysis, please complete the Single or Multiple Samples analysis or the Subclustering analysis through the Cell Type Prediction and Marker Identification steps.

Select the input data and cluster(s) for analysis

Pathway analysis type

Input data

Enter your genes (e.g., gene names separated by commas)

Select the celltype method

p_val_adj

Pathway parameters

Organism

pAdjustMethod

pvalueCutoff

qvalueCutoff

Minimal size of genes

Maximal size of genes

Plot type

No. of category to plot

Pathway plot

Summary table

To begin this analysis, please complete the Single or Multiple Samples analysis or the Subclustering analysis through the Cell Type Prediction and Marker Identification steps.

Select the input data and cluster(s) for analysis

Input data

Select the celltype method

p_val_adj

GSEA parameters

Organism

Category (from MSigDB)

ScoreType

Minimal size of genes

Maximal size of genes

Number of permutations

Plot type

No. of significance to plot

GSEA plot

Summary table

To begin this analysis, please complete the Single or Multiple Samples analysis or the Subclustering analysis through the Cell Type Prediction and Marker Identification steps.

Select the input data and celltype method for analysis

Input data

Select the celltype method

Cell-cell communication parameters (CellChat)

Organism

Threshold of the percent of cells expressed

Threshold of Log Fold Change

Threshold of p-values

Methods for computing the average gene expression per cell group

Minimum number of cells required in each cell group for cell-cell communication

Communication pattern k-value

Show label

Interactions plot with counts

Interactions plot with weights/strength

Interaction heatmap

Incoming and outgoing signaling patterns

Incoming and Outgoing communication pattern of target and secreting cells

Interaction table

Show all the significant interactions associated with certain signaling pathways

Show label

Interactions plot (Circle)

Interactions plot (Chord)

Interaction heatmap

Hierarchy plot

Bubble plot

Network analysis contribution bar plot

Gene expression plot

Interaction table

To begin this analysis, please complete the Single or Multiple Samples analysis or the Subclustering analysis through the Cell Type Prediction and Marker Identification steps.

Select the input data and annotation method

Input data

Select the celltype method

Parameters to Learn Trajectory

Please make sure you have used UMAP in the clustering steps

use_partition

close_loop

label_groups_by_cluster

label_branch_points

label_roots

label_leaves

Trajectory plot

Order cells in pseudotime

label_groups_by_cluster

label_branch_points

label_roots

label_leaves

Cells plotted in pseudotime

Cells ordered by Seurat cluster and Monocle3 pseudotime

Find genes that change function over pseudotime

neighbor_graph

label_groups_by_cluster

Pseudotime plot

List of genes that change function over pseudotime

Plot the top or user listed genes to see the changes in pseudotime

No. of genes to display

Enter your genes for plotting (e.g., gene names separated by commas)

Pseudotime plot for the top selected genes

Co-expression network analysis
Transcription Factor Regulatory Network Analysis

To begin this analysis, please complete the Single or Multiple Samples analysis or the Subclustering analysis through the Cell Type Prediction step.

Co-expression network analysis using hdWGCNA

Select the input data and cluster(s) for analysis

Input data

Select the celltype method

select the reduction type

Construct metacells

Nearest-neighbors parameter (k)

Minimum number of cells in a particular grouping to construct metacells

Maximum number of shared cells between two metacells

Maximum target number of metacells to construct

Select soft-power

Network Type

Module eigengenes and connectivity

Scale model

Harmonized module eigengenes

Show top N hub genes

Module based UMAP Plot

No. of hub genes to label in each module

Show edges between genes in different modules (grey edges)

UMAP plot to check that the loaded data are correct

Soft power threshold plots

Co-expression network plot

Module ranked by eigengene-based connectivity kME

Module feature plots

Module correlogram plot

Module with Seurat’s dot plot

Individual module network plots

UMAP plot for co-expression networks

Soft-power threshold table

Module assignment table

Top N hub genes

Transcription factor regulatory network analysis

Organism

Identify TFs in promoter regions (uses JASPAR 2024 database, Motif scan and XGBoost)

max_depth

eta

alpha

Define TF Regulons

Threshold for regulatory score

The number of top TFs to keep for each gene

Calculate regulon expression signatures

Positive regulon score threshold

Negative regulon score threshold

Module regulatory network plot (Positive)

Module regulatory network plot (Negative)

Module regulatory network plot (Both)

Module regulatory network plot (Module UMAP)

TF network table

Select a TF of interest

Bar plot parameter

Number of top and bottom target genes

Network plot parameter

Attribute to color the network edges

Number of layers to extend the TF network

Feature plot of selected TF

Top target genes within TF regulons

TF network plot (Positive)

TF network plot (Negative)

TF network plot (Both)

Single Cell RNA Data Analysis and Visualization

This section explains how to prepare input files:

Supported Input Formats:

H5 Files (Cell Ranger Output)

Cell Ranger file: filtered_feature_bc_matrix.h5.
Rename it to SAMPLE_NAME.h5 for proper identification.

Cell Ranger Matrix Files

Cell Ranger files: matrix.mtx.gz, features.tsv.gz, barcodes.tsv.gz.
Rename as SAMPLE_NAME_matrix.mtx.gz, SAMPLE_NAME_features.tsv.gz, and SAMPLE_NAME_barcodes.tsv.gz, and upload together as a set.
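
The renaming step can be scripted when many samples are involved. The following is a minimal base-R sketch, not part of ScRDAVis; the helper name prefix_cellranger_files and the example paths are hypothetical. It copies rather than renames, so the original Cell Ranger output stays untouched.

```r
# Hypothetical helper: copy the three Cell Ranger matrix files from `src_dir`
# into `dest_dir`, prefixing each file name with the sample name.
prefix_cellranger_files <- function(src_dir, dest_dir, sample_name) {
  files <- c("matrix.mtx.gz", "features.tsv.gz", "barcodes.tsv.gz")
  src <- file.path(src_dir, files)

  # Fail early if the set is incomplete, since the files are uploaded together.
  missing <- files[!file.exists(src)]
  if (length(missing) > 0) {
    stop("Missing file(s): ", paste(missing, collapse = ", "))
  }

  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  dest <- file.path(dest_dir, paste0(sample_name, "_", files))
  ok <- file.copy(src, dest, overwrite = TRUE)
  stopifnot(all(ok))
  invisible(dest)
}

# Example usage (placeholder paths):
# prefix_cellranger_files("run1/outs/filtered_feature_bc_matrix", "upload", "SAMPLE1")
```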

Seurat Objects

Format: filename.rds (Seurat object). The orig.ident attribute should match the sample name(s).

Matrix count file

Format: Filename.txt with rows as genes and columns as sample_cellID.
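
To illustrate the matrix count layout, the sketch below writes a toy genes-by-cells table and recovers the sample name from each column label. It assumes a tab-delimited file (the delimiter is not stated above) and that the sample name is everything before the last underscore; the gene, sample, and barcode names are made up.

```r
# Toy counts: 3 genes x 4 cells from two hypothetical samples (S1, S2).
counts <- matrix(
  c(5, 0, 2, 1,
    0, 3, 0, 4,
    7, 1, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("GeneA", "GeneB", "GeneC"),
    c("S1_AAAC", "S1_AAAG", "S2_AAAC", "S2_TTTG")
  )
)

# Write a tab-delimited text file; gene names become the first column.
out_file <- file.path(tempdir(), "counts_matrix.txt")
write.table(counts, out_file, sep = "\t", quote = FALSE, col.names = NA)

# Read it back and split each column label into its sample part.
m <- as.matrix(read.delim(out_file, row.names = 1, check.names = FALSE))
sample_of_cell <- sub("_[^_]*$", "", colnames(m))
print(table(sample_of_cell))
```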

Data Size and Handling:

The tool can handle scRNA-seq data up to 3 GB in the specified formats.
It supports analysis of single or multiple samples, including up to six sample groups.
After data upload, users proceed through a step-by-step workflow for the first module, with the 'Next Step' button guiding them through each tab.
Once the single or multiple sample analysis is complete, users can run the remaining modules as needed; no further fixed sequence of steps is required.

Output and Visualizations:

High-Quality Plot Download: Users can download plots in seven formats: JPG, TIFF, PDF, SVG, BMP, EPS, and PS. However, a few specific plots, such as those requiring exceptionally high detail or complex rendering (e.g., network graphs or high-resolution heatmaps), are only available as PDF files to preserve their quality and detail.
Summary Tables: Tables are displayed using the DT package. Users can visualize up to 100 rows (default is 10) and download the entire table as a CSV file.
Download Seurat Object: In single or multiple sample analyses, users can download the processed results as an RDS file (Seurat Object).

Example Datasets:

To ensure seamless analysis and reproducibility, ScRDAVis includes one reference dataset for each input format, sourced from NCBI, which has been pre-tested with the tool. These datasets allow users to explore the tool's functionalities and understand the analysis workflow effectively.

H5 File: GSE271107
Matrix Files: GSE266873
Seurat Object: GSE250488
Matrix count file: GSE155953
Example data to test the tool (C2_vs_P2) from : GSE277476

Estimated Runtime for Analysis Tabs

Tab Name	Estimated Time	Notes
Stats	1–2 minutes	Upload & initial QC plots. H5 is faster; RDS or raw matrix takes longer.
Sample Groups & QC Filtering	1–2 minutes	Depends on number of samples and filtering thresholds.
Normalization & PCA	2–5 minutes	SCTransform takes longer than LogNormalize.
JackStraw Analysis	10–30 minutes	Depends on number of PCs (e.g., 20–50) and resampling (e.g., 100 reps).
Clustering & UMAP/tSNE	1–3 minutes	Slightly longer for large datasets or high resolution.
Doublet Detection	3–30 minutes	Depends on dataset size and expected doublet rate.
Marker Identification	1–3 minutes	Multiple clusters increase runtime (e.g., 10+ clusters).
Cell Type Prediction	2–15 minutes	ScType & SingleR are fast; GPTCelltype depends on OpenAI API latency.
Cluster-Based Plots	<1 minute	Faster for fewer genes and features.
Condition-Based DEG Analysis	1–2 minutes	Similar to marker detection; volcano plot adds a few seconds.
Subclustering	2–30 minutes	Includes filtering + reclustering a subset of cells. (Whole analysis)
Correlation Network	2–4 minutes	Larger clusters or using Kendall correlation may take longer.
GO Term Enrichment	1–3 minutes	Depends on number of DE genes and ontology selected.
Pathway Analysis	1–3 minutes	KEGG & Reactome databases processed similarly.
GSEA Analysis	1–3 minutes	MSigDB categories vary in size; more permutations = longer time.
Cell-Cell Communication	5–30 minutes	One of the longest steps. Time depends on the number of groups & PPI size.
Trajectory & Pseudotime	3–30 minutes	UMAP-based; Monocle3 processing varies with complexity.
Co-expression Network (hdWGCNA)	15 minutes to 1 hour	Metacell and soft-thresholding steps are the most time-consuming.
TF Regulatory Network	30 minutes to 2 hours	Motif scanning + XGBoost modeling can be moderately slow.

Additional Notes:

Smaller datasets (<5k cells): Most steps complete in under 2–5 minutes.
Larger datasets (>100k cells): Some modules may take from 10 minutes to 2 hours.
Most time-consuming modules:
- JackStraw
- Doublet Detection
- SingleR
- CellChat (Cell-Cell Communication)
- Trajectory & Pseudotime
- hdWGCNA
- TF Regulatory Network

Step-by-Step Approach for User Interaction using GSE266873 consisting of 9 samples across three groups Group1 (n=3, 0-6 hours post-ICH, G1), Group2 (n=3, 6-24 hours post-ICH, G2), and Group3 (n=3, 24-48 hours post-ICH, G3)

1. Single or Multiple samples analysis

1.1. Stats

Upload and Adjust Parameters:
- Minimum cell expression per gene: Define the minimum number of cells that should express each gene (Default: 0, Min: 0, Max: number of cells).
- Minimum gene expression per cell: Set the minimum number of genes each cell should express (Default: 0, Min: 0, Max: number of cells).
Execution:
- Click the Submit button to run the analysis based on selected parameters (Fig. 1.1a).
Output:
- QC plots: Quality metrics before filtering (Fig. 1.1b).
- Feature-Feature Relationships Plot (Fig. 1.1c)
- Sample(s) cell counts table (Fig. 1.1d)
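
The two upload thresholds can be illustrated on a toy genes-by-cells matrix. This is a minimal base-R sketch, not the app's code: the helper name filter_counts is hypothetical, and the order of filtering (cells first, then genes) is an assumption modeled on how Seurat applies min.features and min.cells when an object is created.

```r
# Hypothetical helper applying both thresholds to a genes x cells count matrix.
filter_counts <- function(counts, min_cells = 0, min_features = 0) {
  # Keep cells that express at least `min_features` genes.
  genes_per_cell <- colSums(counts > 0)
  counts <- counts[, genes_per_cell >= min_features, drop = FALSE]

  # Keep genes detected in at least `min_cells` of the remaining cells.
  cells_per_gene <- rowSums(counts > 0)
  counts[cells_per_gene >= min_cells, , drop = FALSE]
}

# Toy data: 4 genes x 3 cells; cell c3 expresses nothing.
toy <- matrix(
  c(1, 0, 0,
    2, 3, 0,
    0, 4, 0,
    5, 6, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("g", 1:4), paste0("c", 1:3))
)

filtered <- filter_counts(toy, min_cells = 2, min_features = 1)
print(dim(filtered))
```

With the defaults of 0 for both parameters, nothing is removed.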

1.2. Sample Groups and QC Filtering

Assign sample(s) group(s):
Choose the number of group(s) based on sample grouping
- If only one sample: Select 1 group.
- For multiple samples in a single condition: Select 1 group.
- For multiple samples in different conditions: Select up to 6 groups.
Define Filtering Parameters:
- Min gene count per cell (Default: 0) – Filters out cells with fewer than this number of genes expressed. [Recommended: 200 to 500].
- Max gene count per cell (Default: 7500) – Filters out cells with more than this number of genes expressed. [Recommended: 5000 to 7500].
- Max mitochondrial % (Default: 5) – Removes cells with excessive mitochondrial gene expression, often indicating low-quality or dying cells. [Recommended: <10%].
Execution:
- Update Filtered Data: Click to apply filters and update the dataset (Fig. 1.2a).
Outputs after Filtering:
- QC metrics and Bar Plot (Fig. 1.2b-e)
- Summary Table: Filtered cell count data (Fig. 1.2f,g)
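
As a rough illustration of how the three filtering parameters combine, the sketch below applies them to a toy per-cell table. The column names nFeature_RNA and percent.mt follow the usual Seurat convention; the threshold variable names and the data are hypothetical, and treating the bounds as inclusive is an assumption.

```r
# Toy per-cell QC table (values are made up).
meta <- data.frame(
  cell = paste0("cell", 1:5),
  nFeature_RNA = c(150, 800, 2500, 9000, 3000),
  percent.mt = c(2.0, 4.5, 12.0, 3.0, 1.0)
)

min_genes <- 200   # Min gene count per cell
max_genes <- 7500  # Max gene count per cell
max_mt <- 5        # Max mitochondrial %

# A cell is kept only if it passes all three criteria.
keep <- meta$nFeature_RNA >= min_genes &
  meta$nFeature_RNA <= max_genes &
  meta$percent.mt <= max_mt

filtered <- meta[keep, ]
print(filtered$cell)
```

Here cell1 fails the minimum gene count, cell3 fails the mitochondrial cutoff, and cell4 exceeds the maximum gene count.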

1.3. Normalization and PCA Analysis

Normalization Methods:
- LogNormalize: Adjusts for sequencing depth or read count differences.
  - Scale factor (Default: 10000, Min: 1, Max: 1e6) – Scale factor used in LogNormalize method for total expression normalization.
  - Variable gene method (Default: vst) – Method for selecting variable features: vst (default), mean.var.plot, or dispersion.
  - Number of variable genes (Default: 2000, Min: 100, Max: 10000) – Number of top variable genes to retain for downstream analysis.
- SCT (SCTransform): Uses regularized negative binomial regression for clustering and differential expression.
PCA Settings:
- PCA dimensions (Default: 50, Min: 2, Max: 100) – Number of principal components computed for dimensionality reduction.
- JackStraw max dims (Default: 20, Min: 1, Max: 100) – Maximum number of PCs tested for significance in JackStraw analysis.
- JackStraw num.replicate (Default: 100, Min: 10, Max: 1000) – Number of permutations used in JackStraw resampling.
- JackStraw plot max PCs (Default: 20, Min: 1, Max: 100) – Maximum PCs to display in JackStraw significance plot.
Execution:
- Click the Submit button to start the analysis (Fig. 1.3a).
Outputs:
- PCA Heatmap (Fig. 1.3b)
- Elbow Plot (Fig. 1.3c)
- Jackstraw Plot (Fig. 1.3d)
- PCA Plot (sample-wise or group-wise) (Fig. 1.3e,f)
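
For readers who want to see what the scale factor does, the sketch below reproduces the LogNormalize transformation as Seurat documents it: each count is divided by the total counts of its cell, multiplied by the scale factor, and transformed with log1p. It is a base-R illustration on a dense toy matrix, not the app's code; the helper name log_normalize is hypothetical, and it assumes every cell has a non-zero total.

```r
# Hypothetical helper: LogNormalize-style transform of a genes x cells matrix.
log_normalize <- function(counts, scale_factor = 10000) {
  totals <- colSums(counts)
  # sweep() divides each column (cell) by its own total count.
  log1p(sweep(counts, 2, totals, "/") * scale_factor)
}

# Toy data: cell c1 has counts (1, 3), cell c2 has counts (0, 10).
counts <- matrix(
  c(1, 3, 0, 10),
  nrow = 2,
  dimnames = list(c("g1", "g2"), c("c1", "c2"))
)

norm <- log_normalize(counts)
print(round(norm, 3))
```

After back-transforming with expm1(), every cell sums to the scale factor, which is what makes cells with different sequencing depths comparable.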

1.4. Clustering

Clustering Step:
- Find Neighbors (Default: 20, Min: 2, Max: 50): Users select the dimensions to use (PCA, integrated dimensions, etc.) and the number of nearest neighbors (k).
- Clustering Algorithm: Select between Louvain, SLM or Leiden algorithms for clustering.
- Resolution Control: Users can adjust the resolution parameter (0.1 to 1) to control the granularity of clusters.
Dimension Reduction:
- Choose between UMAP or t-SNE for dimensionality reduction.
  - For UMAP: Users can adjust parameters like min.dist, k-nearest-neighbours, and the number of dimensions (Default: 30, Min: 2, Max: 100).
  - For t-SNE: Users can adjust the number of dimensions (Default: 30, Min: 2, Max: 100).
Integration Method:
- Select integration method(s): CCA, RPCA, Harmony, or JointPCA. These methods will allow users to handle dataset complexities and integrate data from multiple samples.
- Integration method (Default: None) – Data integration method. If 'None', no integration is performed.
- HarmonyIntegration (Default: Reduction = harmony; Distance = Cosine) – Batch correction using Harmony with cosine distance.
- CCAIntegration (Default: Reduction = cca; Distance = Euclidean) – Canonical correlation analysis for dataset integration.
- RPCAIntegration (Default: Reduction = rpca; Distance = Euclidean) – Faster, scalable variant of CCA.
- JointPCAIntegration (Default: Reduction = jointpca; Distance = Euclidean) – Joint PCA embedding for multi-dataset integration.
Execution:
- Click the Submit button to start the analysis (Fig. 1.4a).
Visualize and Compare:
- Display UMAP or t-SNE plots with clustering labels and sample/condition overlays (Fig. 1.4b,d,f).
- Bar charts (Fig. 1.4c,e,g) and tables show cell counts per cluster and per sample/condition (Fig. 1.4h-j).
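
The count tables described above are simple cross-tabulations of per-cell labels. The following base-R sketch shows the idea on made-up metadata; the column names sample, group, and cluster are hypothetical and do not necessarily match the names used inside the app.

```r
# Toy per-cell metadata: six cells from two samples in two groups.
meta <- data.frame(
  sample = c("S1", "S1", "S1", "S2", "S2", "S2"),
  group = c("G1", "G1", "G1", "G2", "G2", "G2"),
  cluster = factor(c(0, 0, 1, 0, 1, 1))
)

# Cells per cluster, and cells per cluster within each sample.
per_cluster <- table(cluster = meta$cluster)
per_cluster_sample <- table(cluster = meta$cluster, sample = meta$sample)

# Proportion of each sample's cells that falls in each cluster.
prop <- prop.table(per_cluster_sample, margin = 2)

print(per_cluster_sample)
print(round(prop, 2))
```

Comparing proportions rather than raw counts is usually more informative when samples differ in total cell number.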

1.5. Remove Doublets

Doublet Detection:
- Uses DoubletFinder to identify potential doublets, helping identify cells that may contain RNA from more than one original cell.
- Start with an estimated doublet rate of 7.5%-10% (0.075 to 0.1) of total cell count.
Execution:
- Click the Detect Doublet button to start the analysis (Fig. 1.5a).
Outputs:
- UMAP or t-SNE Plot: Shows singlet and doublet cells, color-coded by cluster, sample, or group (Fig. 1.5b-e).
- Summary Table: Counts of singlet and doublet cells (Fig. 1.5f).
Remove or keep doublets:
- Users can choose to Keep or Remove Doublets (Fig. 1.5g). Updated plots and tables reflect the selection (Fig. 1.5h-q).
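
The estimated doublet rate translates into an expected number of doublets by simple multiplication with the cell count. The sketch below shows that arithmetic only; the helper name expected_doublets is hypothetical, and how ScRDAVis passes this value to DoubletFinder is not shown here.

```r
# Hypothetical helper: expected doublet count for a given rate and cell number.
expected_doublets <- function(n_cells, doublet_rate = 0.075) {
  stopifnot(doublet_rate >= 0, doublet_rate <= 1)
  round(doublet_rate * n_cells)
}

# Range implied by the suggested 7.5%-10% starting rates for 8,000 cells.
low <- expected_doublets(8000, 0.075)
high <- expected_doublets(8000, 0.10)
message("Expected doublets: ", low, " to ", high)
```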

1.6. Marker Identification

Identify markers in all clusters (FindAllMarkers):
- Customizable parameters: (Fig. 1.6a)
  - Minimum cell percentage (min.pct) to specify the minimum fraction of cells in which a gene is expressed (Default: 0.25, Min: 0.01, Max: 1.0).
  - Log fold-change threshold (logfc.threshold) to filter markers based on expression magnitude (Default: 0.25, Min: 0.01, Max: ∞).
  - Statistical test options (test.use), including Wilcoxon rank sum (wilcox), the limma implementation of the Wilcoxon rank sum test (wilcox_limma), likelihood-ratio test for single-cell expression (bimod), ROC analysis (roc), Student's t-test (t), logistic regression (LR), and MAST.
  - Positive markers only (only.pos), with options for yes or no, to focus on upregulated genes in the target cluster.
  - When SCTransform is used for normalization, the tool runs PrepSCTFindMarkers, which adjusts the SCT assay so that differential testing with FindMarkers and FindAllMarkers is more reliable.
- Output: Heatmap of the top 5 genes per cluster (Fig. 1.6b), helping users visualize the distinguishing genes for each cluster, and a summary table of markers or expressed genes (Fig. 1.6c).
Marker identification in one specific cluster or between two clusters (FindMarkers):
- Identifies markers for one cluster against another or against all other clusters.
- Includes all the customizable parameters noted above, enabling targeted cluster comparison with refined criteria.
- Output: A table format displaying the expressed genes for the specified clusters, ideal for in-depth comparisons.
Conserved marker identification for one vs. all cluster or between two clusters (FindConservedMarkers):
- Finds markers conserved across groups (e.g., conditions) for a cluster, or conserved markers between two specific clusters.
- Utilizes the same customizable parameters for consistency across comparisons.
- Output: A table format with expressed genes, providing insights into markers consistently expressed across groups or clusters.
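
To make the min.pct and logfc.threshold parameters concrete, the sketch below computes the corresponding statistics for a single gene in two clusters using base R. It is a simplified illustration and not Seurat's implementation: the helper name marker_stats, the pseudocount of 1, and the toy expression values are assumptions, and only the default Wilcoxon test is shown.

```r
# Toy log-normalized expression of one gene in two clusters of 20 cells each.
expr_a <- c(rep(0, 5), seq(1.5, 2.5, length.out = 15))
expr_b <- c(rep(0, 16), 0.8, 1.0, 1.2, 0.9)

# Hypothetical helper: detection rates, log2 fold change, and Wilcoxon p-value.
marker_stats <- function(x, y, pseudocount = 1) {
  pct1 <- mean(x > 0)  # fraction of cells expressing the gene in cluster 1
  pct2 <- mean(y > 0)  # fraction of cells expressing the gene in cluster 2
  # Average on the back-transformed scale, then take the log2 ratio.
  log2fc <- log2(mean(expm1(x)) + pseudocount) - log2(mean(expm1(y)) + pseudocount)
  p <- wilcox.test(x, y, exact = FALSE)$p.value
  data.frame(pct.1 = pct1, pct.2 = pct2, avg_log2FC = log2fc, p_val = p)
}

stats <- marker_stats(expr_a, expr_b)

# Apply the default thresholds: detected in >= 25% of cells in either cluster,
# and up-regulated by at least 0.25 on the log2 scale (as with only.pos = yes).
passes <- max(stats$pct.1, stats$pct.2) >= 0.25 && stats$avg_log2FC >= 0.25
print(stats)
print(passes)
```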

1.7. Cell Type Prediction

ScType:

Predefined Tissue Types: Users can select from 15 tissue types, including: Adrenal, Brain, Eye, Heart, Immune, Intestine, Kidney, Liver, Lung, Muscle, Pancreas, Placenta, Spleen, Stomach, Thymus.
Tissue Classification: Automatically classifies the cells based on the selected tissue type.

SingleR:

Reference Datasets: Users can use reference datasets such as: Human Primary Cell Atlas, Blueprint/ENCODE, Mouse RNA-seq, Immunological Genome Project, Database of Immune Cell Expression/eQTLs/Epigenomics, Novershtern Hematopoietic data, Monaco immune data.
Prediction: Predicts the cell types based on these well-known reference datasets.

GPTCelltype:

GPT Models: Utilizes various GPT models, including: GPT-5, GPT-5-mini, GPT-5-nano, GPT-4, GPT-4-turbo, GPT-4o-mini, GPT-4o, ChatGPT-4o-latest, and GPT-3.5-turbo.
Gene Requirements: Requires a minimum number of top genes for accurate prediction.
Availability: Available via the web platform. To use it locally, users need to update their API key by setting Sys.setenv(OPENAI_API_KEY = 'your_openai_API_key') in the global.R file.

Own Cell Labels:

User-Defined Labels: Users can manually input their own cell type labels for each cluster.
Cluster Grouping: If multiple clusters need the same label, users should provide the same label name for those clusters.

UMAP/t-SNE Labels:
- Label display options: Users can choose to show or hide cell type labels in the UMAP or t-SNE plots.
Execution:
- Click the Detect cell type button to start the analysis (Fig. 1.7a).
Output:

Plot: Generates an image plot showing the predicted cell types (Fig. 1.7b,c).
Summary Table: Provides a summary table with the predicted cell types and associated scores (Fig. 1.7d).

1.8. Cluster-Based Plots

Gene Selection:
- Top Genes: Users can select the top features or genes (from 2 to 10).
- Custom Genes: Users may also input custom gene names by selecting them from a drop-down menu (list of genes) and entering the desired gene names as a comma-separated list.
Plot Types:
- Multiple visualization formats are available, including Dot Plot, Violin Plot, Ridge Plot, and Feature Plot. For Dot Plot, Violin Plot, and Ridge Plot, users can adjust parameters to visualize the plots for either all Seurat clusters or selected specific clusters.
Grouping and Splitting:
- Group by: Users can organize the data by Seurat clusters or labels generated from previous cell type prediction steps.
- Split by: If multiple samples are present, plots can be split by condition or sample to compare expression patterns across groups.
Execution:
- Click the Generate plots button to start the analysis (Fig. 1.8a).
Output:
- Plot: The user receives one of the chosen plot formats (violin plot, dot plot, feature plot or ridge plot) (Fig. 1.8b-e).
- Summary Tables: The tool generates tables showing marker gene cell counts and cell proportions, providing an additional layer of quantitative insight (Fig. 1.8f).
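
The cell counts and proportions in the summary tables can be understood with a small Python sketch. It assumes that a cell "expresses" a gene when its value is above zero and that proportions are taken within each cluster. The function name and the data layout are hypothetical and only serve the illustration.

```python
from collections import defaultdict

def expression_summary(cells, gene):
    """Per cluster: (number of cells expressing `gene`, proportion).

    `cells` is a list of (cluster, {gene: expression}) pairs.
    """
    total = defaultdict(int)
    expressing = defaultdict(int)
    for cluster, expr in cells:
        total[cluster] += 1
        if expr.get(gene, 0) > 0:
            expressing[cluster] += 1
    return {c: (expressing[c], expressing[c] / total[c]) for c in total}

# Toy data: two clusters, five cells.
toy_cells = [
    ("0", {"FCN1": 2.1}),
    ("0", {"FCN1": 0.0}),
    ("1", {"PSAP": 1.0}),
    ("1", {"FCN1": 0.5}),
    ("1", {}),
]
for cluster, (count, prop) in sorted(expression_summary(toy_cells, "FCN1").items()):
    print(cluster, count, round(prop, 2))
```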

1.9. Condition-Based Analysis

Group Selection:
- Users can compare gene expression between two conditions by selecting one group per dropdown menu.
Customizable Parameters:
- Minimum Cell Percentage (min.pct): Sets the minimum fraction of cells in which a gene must be expressed (Default: 0.25, Min: 0.01, Max: 1.0).
- Log Fold-Change Threshold (logfc.threshold): Filters markers by expression magnitude (Default: 0.25, Min: 0.01, Max: ∞).
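- Statistical Tests (test.use): Users can choose from various methods, including Wilcoxon rank sum (wilcox), the limma implementation of the Wilcoxon rank sum test (wilcox_limma), likelihood-ratio test (bimod), ROC analysis (roc), Student's t-test (t), logistic regression (LR), and MAST.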
- Positive Markers Only (only.pos): Option to display only upregulated genes in the target cluster.
Visualization Options:
- Multiple formats are available, including: Dot Plot, Violin Plot, Ridge Plot, Feature Plot, Volcano Plot
Grouping:
- Group By: Users can group data by Seurat clusters or predicted cell type labels.
- Number of Features: Allows display of a specific number of up- and down-regulated genes (e.g., 15). Users may also input custom gene names by selecting them from a drop-down menu (list of genes).
Execution:
- Click the Submit button to start the analysis (Fig. 1.9a).
Output:
- Plot: The user receives the chosen plot type, providing a visual comparison (Fig. 1.9b-f).
- Summary Tables: The table contains the differentially expressed genes between the selected groups (Fig. 1.9g).
- This setup enables users to conduct detailed comparisons between conditions, facilitating insights into differential gene expression and cellular responses.

2. Subclustering

In ScRDAVis, users can further explore specific clusters of interest by performing subclustering analysis. This feature allows for a more granular examination of cell populations within one or multiple clusters, based on the user's selection. The outputs generated for the selected cluster(s) or cell type(s) are similar to those described above.

2.1. Cluster Selection:

Users can choose one or multiple clusters for subclustering.
Clusters can be selected based on Seurat clusters or previously predicted annotation labels.
Users can select genes of interest to extract cells for reclustering (positive selection); for example, FCN1 or multiple genes like FCN1,PSAP. When specifying multiple genes, separate each gene name with a comma.
Users can also exclude cells that express specific genes and run the analysis on the remaining cells (negative selection); for example, FCN1 or multiple genes like FCN1,PSAP. When specifying multiple genes, separate each gene name with a comma.
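
Positive and negative selection by gene can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and it assumes a cell matches the list when ANY listed gene has non-zero expression (the tool may instead require all genes).

```python
def parse_genes(text):
    """Split a comma-separated gene string such as 'FCN1,PSAP'."""
    return [g.strip() for g in text.split(",") if g.strip()]

def select_cells(cells, gene_text, mode="positive"):
    """Return the IDs of cells kept for reclustering.

    `cells` maps a cell ID to a {gene: expression} dict.
    mode="positive" keeps cells expressing the genes;
    mode="negative" keeps the cells that do not.
    """
    if mode not in ("positive", "negative"):
        raise ValueError("mode must be 'positive' or 'negative'")
    genes = parse_genes(gene_text)
    kept = []
    for cell_id, expr in cells.items():
        # Assumption: expression of any one listed gene is enough.
        expressed = any(expr.get(g, 0) > 0 for g in genes)
        if (mode == "positive") == expressed:
            kept.append(cell_id)
    return kept

toy = {"c1": {"FCN1": 2}, "c2": {"PSAP": 1}, "c3": {"CD3E": 4}}
print(select_cells(toy, "FCN1,PSAP", mode="positive"))
print(select_cells(toy, "FCN1,PSAP", mode="negative"))
```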

2.2. Subclustering Analysis Steps:

The subclustering process mirrors the main workflow, with dedicated tabs for each stage, allowing users to perform the following analyses on the selected cluster(s):
- Cell Stats: Overview of cell metrics within the selected clusters, including minimum gene and cell expression thresholds.
- Normalization and PCA Analysis: Options to normalize the data and visualize the PCA results for the specific cluster(s), using the same normalization method as in the main analysis.
- Clustering: Allows users to re-cluster cells within the subclusters, providing insights into finer subpopulations.
- Marker Identification: Users can identify markers specific to subclusters, with options to customize parameters for marker detection.
- Cell Type Prediction: Provides options to predict cell types within the selected subclusters using ScType, SingleR, GPTCelltype, or custom labels.
- Cluster-Based Plots: Users can visualize gene expression within subclusters through Dot, Violin, Ridge, or Feature plots.
- Condition-Based Analysis: Enables differential expression comparisons within subclusters, providing insights into condition-specific gene expression patterns.

3. Correlation Network Analysis

ScRDAVis includes Cluster-Based Correlation Analysis using the genesorteR package. This feature helps users explore relationships and interactions among genes within specific clusters by calculating pairwise correlations.

Prerequisites:
- Correlation Network Analysis becomes available after completing single or multiple samples analysis or subclustering analysis up to cell type prediction.
- Users can choose to conduct analysis on: Seurat clusters or predicted cell type labels from single, multiple, or subcluster analyses.
Correlation Methods:
- Pearson
- Spearman
- Kendall
Execution:
- Click the Cluster correlation network button to run the analysis based on selected parameters (Fig. 3a).
Output:
- Correlation Heatmap: Displays the correlation values between genes within clusters in a matrix format (Fig. 3b).
- Correlation Network Plot: Depicts the relationships between genes as a network, highlighting strongly correlated pairs (Fig. 3c).
- Summary Table: Provides the complete correlation matrix for detailed analysis (Fig. 3d).

This analysis provides a deeper understanding of gene co-expression and interaction patterns within clusters, aiding in the identification of significant biological relationships.
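
The three correlation methods differ in what they measure: Pearson captures linear association, while Spearman and Kendall work on ranks and capture monotonic association. The Python sketch below compares them on two toy expression profiles. It is a plain-stdlib illustration, not the genesorteR implementation, and the Kendall version is the simple tau-a without tie correction.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def kendall(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            score += (s > 0) - (s < 0)
    return score / (n * (n - 1) / 2)

# Toy profiles with a monotonic but non-linear relationship.
a = [1, 2, 3, 4, 5]
b = [1, 4, 9, 16, 25]
print(round(pearson(a, b), 3), spearman(a, b), kendall(a, b))
```

For a monotonic but curved relationship like this one, the rank-based methods report a perfect association while Pearson stays below 1, which is one reason to try more than one method.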

4. GO Term Analysis

ScRDAVis provides integrated Gene Ontology (GO) term analysis using the clusterProfiler package, enabling users to explore biological functions, molecular mechanisms, and cellular components related to gene expression patterns in single, multiple, or subcluster analyses. The steps for conducting GO analysis are described below.

Prerequisites:
- GO analysis becomes available after completing single or multiple samples analysis or subclustering analysis up to cell type prediction.
Input Options:
Users can choose to conduct GO analysis on:
- Seurat clusters or predicted cell type labels from single, multiple, or subcluster analyses.
- Users can use one or multiple clusters at a time, with an adjustable parameter (p_val_adj < 0.05) for significant results.
- A custom list of genes: Users can manually enter gene names (comma-separated) to investigate GO terms for genes of specific interest.
Organisms Supported for GO Term Mapping:
ScRDAVis supports GO analysis for five organisms, mapping gene IDs to gene symbols:
- Human: org.Hs.eg.db
- Mouse: org.Mm.eg.db
- Rat: org.Rn.eg.db
- Pig: org.Ss.eg.db
- Rhesus: org.Mmu.eg.db
GO Term Analysis Parameters:
Ontology Method: Users can choose to focus on specific biological aspects or all three:
- Biological Process (BP)
- Molecular Function (MF)
- Cellular Component (CC)
- All: To analyze across all three categories.
Adjustable Parameters:
- pAdjustMethod: Select the method to adjust for multiple testing (Default: BH).
- pvalueCutoff: Set a cutoff for p-values (Default: 0.05, Min: 0, Max: 1).
- qvalueCutoff: Define a q-value threshold for significance (Default: 0.2, Min: 0, Max: 1).
- Minimum Size of Genes: Minimum number of genes required in a GO term (Default: 10, Min: 1, Max: 500).
- Maximum Size of Genes: Maximum number of genes in a GO term (Default: 500, Min: 10, Max: 5000).
- Plot Type: Choose a visualization format (Dot Plot, Bar Plot, Network Plot, or UpSet Plot).
- Number of Categories to Plot: Select the number of categories to display (Default: 10, Min: 1, Max: 50).
Execution:
- Click the GO Term button to run the analysis based on selected parameters (Fig. 4a).
Output
- Plots: Dot plot, Bar plot, UpSet plot and Network plot for the selected ontology categories (Fig. 4b-e).
- Summary Table: A downloadable table summarizing the GO terms, adjusted p-values, and other relevant metrics, allowing users to interpret and visualize biological insights (Fig. 4f).

This GO term analysis feature in ScRDAVis provides users with an accessible, visually informative, and comprehensive view of gene functionality across clusters and conditions, enabling enhanced biological interpretation of scRNA-seq data.
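
To make pvalueCutoff and pAdjustMethod more concrete, the sketch below shows the two calculations that over-representation analysis of this kind typically relies on: a hypergeometric test per GO term, followed by Benjamini-Hochberg (BH) adjustment. It is a stdlib-only Python illustration with made-up counts, and the function names are hypothetical; it is not the clusterProfiler code.

```python
from math import comb

def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): probability of at least k term genes among n input
    genes when K of the N background genes are annotated to the term."""
    upper = min(n, K)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    # Walk from the largest p-value down, enforcing monotonicity.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, idx in enumerate(order):
        rank = m - pos  # rank of this p-value in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Made-up example: 40 input genes, 18000 background genes, two GO terms.
pvals = [
    hypergeom_pvalue(8, 40, 100, 18000),  # 8 of 40 genes hit a 100-gene term
    hypergeom_pvalue(2, 40, 300, 18000),  # 2 of 40 genes hit a 300-gene term
]
print(bh_adjust(pvals))
```

Terms whose values fall below the chosen cutoffs are the ones that appear in the plots and the summary table.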

5. Pathway Analysis

ScRDAVis offers pathway analysis through KEGG and Reactome databases using the clusterProfiler and ReactomePA packages. Users can gain insights into biological pathways associated with specific gene expression profiles from single, multiple, or subcluster analyses.

Prerequisites:
- Pathway analysis becomes available after completing single or multiple samples analysis or subclustering analysis up to cell type prediction.
Input Options:
Pathway analysis can be performed on:
- Seurat clusters or predicted cell type labels from single, multiple, or subcluster analyses.
- One or multiple clusters simultaneously, with results filtered by an adjusted p-value (p_val_adj < 0.05).
- A custom list of genes: Users can input specific gene names (comma-separated) to focus on pathways for genes of interest.
Organisms Supported for Pathway Mapping:
ScRDAVis enables pathway mapping for multiple organisms:
- KEGG Pathways: Supports human (org.Hs.eg.db), mouse (org.Mm.eg.db), and rat (org.Rn.eg.db) for mapping gene IDs to symbols.
- Reactome Pathways: Available for human, mouse, and rat.
Pathway Analysis Parameters:
- pAdjustMethod: Choose a method for multiple testing correction (Default: BH).
- pvalueCutoff: Set a threshold for p-values (Default: 0.05, Min: 0, Max: 1).
- qvalueCutoff: Define a q-value cutoff for pathway significance.
- Minimum Size of Genes: Minimum gene count per pathway (Default: 10, Min: 1, Max: 500).
- Maximum Size of Genes: Maximum gene count per pathway (Default: 500, Min: 10, Max: 5000).
- Plot Type: Choose a visualization format (Dot Plot, Bar Plot, Network Plot, or UpSet Plot).
- Number of Pathways to Plot: Select the number of pathways to display (1 to 50).
Execution:
- Click the Pathway Analysis button to run the analysis with the selected parameters (Fig. 5a).
Output
Pathway analysis results include:
- Visualizations: Dot plot, Bar plot, UpSet plot and Network plot, showcasing significant pathways (Fig. 5b-e).
- Summary Table: A downloadable table with pathway details, adjusted p-values, and other metrics for further exploration and interpretation (Fig. 5f).

The pathway analysis functionality in ScRDAVis helps users understand the biological processes and signaling pathways linked to gene expression profiles across clusters and conditions, providing a deep functional understanding of their scRNA-seq data.

6. GSEA Analysis

The Gene Set Enrichment Analysis (GSEA) feature in ScRDAVis leverages the fgsea package to identify enriched pathways using ranked gene lists, such as those generated from differential expression analysis. This allows users to assess pathway-level expression changes and gain insights into functional changes across clusters or conditions.

Prerequisites
- GSEA analysis can be conducted following single or multiple samples analysis or subclustering analysis up to cell type prediction.
Input Options
GSEA analysis can be performed on:
- Seurat clusters or predicted cell type labels derived from single, multiple, or subcluster analyses.
- Single or multiple clusters simultaneously, with results filtered by an adjusted p-value (p_val_adj < 0.05).
Organisms and Gene Sets Supported
- Organisms: Human and mouse gene ID mapping.
- Gene Set Categories: Using the msigdbr package, which provides gene sets compatible with fgsea from the Molecular Signatures Database (MSigDB). Available categories include:
  - Hallmark Gene Sets (H)
  - Positional Gene Sets (C1)
  - Curated Gene Sets (C2)
  - Regulatory Target Gene Sets (C3)
  - Computational Gene Sets (C4)
  - Ontology Gene Sets (C5)
  - Oncogenic Signature Gene Sets (C6)
  - Immunologic Signature Gene Sets (C7)
  - Cell Type Signature Gene Sets (C8)
GSEA Analysis Parameters:
- scoreType: Define the scoring method for pathway enrichment (Default: std).
- Minimal Size of Genes: Minimum number of genes in a gene set (Default: 15, Min: 5, Max: 500).
- Maximal Size of Genes: Maximum number of genes in a gene set (Default: 50, Min: 15, Max: 5000).
- Number of Permutations: Control the precision of p-value calculations (Default: 100, Min: 10, Max: 10000).
- Plot Type: Choose visualization format (GSEA Plot, PlotGseaTable, Bar Plot).
- Number of Significant Pathways to Plot: Select the number of pathways to display (Default: 10, Min: 1, Max: 50).
Execution:
- Click the GSEA Analysis button to run the analysis with the selected parameters (Fig. 6a).
Output
The GSEA analysis provides:
- Visualizations:
  - GSEA Plot: Displays the enrichment score curve (Fig. 6b).
  - PlotGseaTable: Shows enriched pathways and their enrichment scores (Fig. 6c).
  - Bar Plot: Highlights top significant pathways (Fig. 6d).
- Summary Table: A downloadable table of enriched pathways, adjusted p-values, and scores. If the user selects the top 10 significant pathways, the tool displays the top 5 upregulated and top 5 downregulated pathways (Fig. 6e).

GSEA analysis in ScRDAVis offers a powerful method for understanding pathway-level dynamics, supporting biological interpretation of scRNA-seq data through visual and quantitative assessments of enriched pathways.
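
The enrichment score curve shown in the GSEA plot is a running sum over the ranked gene list. The Python sketch below illustrates the classic weighted running-sum statistic on a toy ranking. It is a simplified illustration with a hypothetical function name, not the fgsea implementation, and it corresponds to the two-sided behaviour (the largest deviation from zero in either direction).

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """Running-sum enrichment score for one gene set.

    `ranked` is a list of (gene, statistic) pairs sorted from the most
    upregulated to the most downregulated gene.
    """
    in_set = [gene in gene_set for gene, _ in ranked]
    hit_total = sum(abs(stat) ** weight
                    for (_, stat), hit in zip(ranked, in_set) if hit)
    n_miss = in_set.count(False)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("gene set must partially overlap the ranked list")
    running, best = 0.0, 0.0
    for (_, stat), hit in zip(ranked, in_set):
        if hit:
            # Step up, weighted by the gene's ranking statistic.
            running += abs(stat) ** weight / hit_total
        else:
            # Step down by a constant amount for genes outside the set.
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

toy_ranking = [("A", 3.0), ("B", 2.0), ("C", 1.0),
               ("D", -1.0), ("E", -2.0), ("F", -3.0)]
print(enrichment_score(toy_ranking, {"A", "B"}))  # set at the top of the list
print(enrichment_score(toy_ranking, {"E", "F"}))  # set at the bottom
```

A positive score means the gene set is concentrated among upregulated genes, a negative score among downregulated genes. Significance is then assessed by permutation, which is what the Number of Permutations parameter controls.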

7. Cell-Cell Communication Analysis

ScRDAVis integrates CellChat to enable users to analyze cell-cell communication within single or multiple samples, as well as for subclusters. This analysis identifies potential ligand-receptor interactions, allowing users to explore how different cell types or clusters communicate based on gene expression patterns.

Input Options
- Source of Input: Users can analyze cell-cell communication using Seurat clusters or predicted cell type labels generated from single, multiple, or subcluster analysis.
- Organisms Supported: Human and mouse datasets are available for ligand-receptor interaction mapping.

7.1. Parameters for Cell-Cell Communication

Identify Over-Expressed Genes:
- Threshold of Cell Expression Percentage: Minimum percentage of cells expressing the genes (Default: 0, Min: 0, Max: 100).
- Log Fold Change Threshold: Minimum log fold-change required for genes to be considered over-expressed (Default: 0, Min: 0, Max: 10).
- p-Value Threshold: Statistical significance threshold (Default: 0.05, Min: 0.0001, Max: 1).
Compute Communication Probability:
- Expression Method: Choose how to compute the average expression per cell group (options: triMean, truncatedMean, thresholdedMean, median).
Filter Communication:
- Minimum Cell Requirement: Minimum number of cells needed in each cell group to analyze cell-cell communication (Default: 0.05, Min: 0.0001, Max: 1).
Communication Pattern Identification:
- Pattern k-Value: Defines the number of communication patterns to identify (Default: 2, Min: 2, Max: 20).
Label Option:
- Show or hide labels in plots.
Execution:
- Click the Cell-Cell communication analysis button to start the analysis (Fig. 7.1a).
Output for Cell-Cell Communication Analysis
The analysis generates the following visual outputs:
- Interaction Plots:
  - Counts and Weights/Strength: Displays the frequency and intensity of interactions among cell groups (Fig. 7.1b,c).
  - Interaction Heatmap: Shows interaction strengths across all clusters or cell types (Fig. 7.1d).
  - Incoming and Outgoing Signaling Patterns: Visualizes communication patterns for target and secreting cells (Fig. 7.1e,f).
- Interaction Table: Includes source and target cell types, ligand-receptor pairs, and interaction scores (Fig. 7.1g).
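
The Expression Method option changes how a gene's expression is summarized within a cell group, which in turn affects how many interactions are inferred. The Python sketch below illustrates two of the options, triMean and truncatedMean, on toy values. The function names and the trim default are illustrative assumptions, and this is not the CellChat code.

```python
import statistics

def tri_mean(values):
    """Tukey's trimean: (Q1 + 2 * median + Q3) / 4."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q1 + 2 * q2 + q3) / 4

def truncated_mean(values, trim=0.1):
    """Mean after dropping the lowest and highest `trim` fraction of cells.

    Assumes 0 <= trim < 0.5.
    """
    data = sorted(values)
    cut = int(len(data) * trim)
    kept = data[cut:len(data) - cut] if cut else data
    return sum(kept) / len(kept)

# A gene detected in only one of five cells of a group.
sparse = [0, 0, 0, 0, 10]
print(sum(sparse) / len(sparse))         # plain mean
print(tri_mean(sparse))                  # trimean
print(truncated_mean(sparse, trim=0.2))  # 20% truncated mean
```

Because the trimean is built from quartiles, a gene expressed in fewer than about a quarter of a group's cells is summarized as zero. This makes triMean the stricter choice, while a truncated mean with a small trim value retains more weakly expressed ligands and receptors.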

7.2. Analyzing Specific Signaling Pathways

For a more focused analysis, users can select a specific signaling pathway from a drop-down menu, enabling detailed visualization of the chosen pathway (Fig. 7.2a).

Outputs for Specific Signaling Pathway:
- Circle Plot: Visualizes interactions among cell groups by counts (Fig. 7.2b).
- Chord Plot: Depicts connections between cell types via ligand-receptor pairs (Fig. 7.2c).
- Interaction Heatmap: Interaction strengths among clusters for the specific pathway (Fig. 7.2d).
- Bubble Plot and Bar Plot: Display interaction intensity for the selected pathway (Fig. 7.2e).
- Hierarchy Plot: Shows the hierarchical organization of cell types and their interactions (Fig. 7.2f).
- Bar Plot: Shows the network analysis contribution in bar plot (Fig. 7.2g).
- Violin Plot: Shows expression of pathway-associated genes (Fig. 7.2h).
- Signaling Pathway Table: Contains source, target, ligand, receptor, and interaction details for the specific pathway (Fig. 7.2i).

This suite of tools and visualizations enables detailed exploration of cell communication, allowing users to interpret inter-cellular signaling dynamics in scRNA-seq datasets with biological relevance.

8. Trajectory and Pseudotime Analysis

ScRDAVis integrates Monocle3 for trajectory and pseudotime analysis, allowing users to study the dynamic progression of cells over pseudotime and identify genes with functional changes along this trajectory.

Preparing for Trajectory and Pseudotime Analysis
- Prerequisites: Users must complete analysis up to the cell type prediction step in either single or multiple sample analysis, or subclustering analysis.
- Input Format: The tool automatically converts the Seurat object to Monocle3 format, and users can choose between Seurat clusters or predicted cell type labels as input.
- UMAP Requirement: UMAP should be used in clustering steps for compatibility with Monocle3.

8.1. Parameters for Learning Trajectory

Partitioning Options:
- use_partition: Toggle to specify partitions for different groups.
- close_loop: Set to close or open the trajectory loop.
- label_groups_by_cluster: Labels cell groups by cluster.
- label_branch_points, label_roots, label_leaves: Allows labeling of key points on the trajectory (branches, roots, leaves).
Execution:
- Once parameters are set, users can click the Learn Trajectory button to generate the trajectory plot (Fig. 8a).
Output:
- Trajectory Plot: Displays cell progression in trajectory space, providing insight into the cellular development path (Fig. 8b).

8.2. Pseudotime Ordering of Cells

Parameters:
- Root Cluster Selection: Users must select one cluster to serve as the root cluster, marking the starting point of pseudotime.
- Labeling Options: Parameters include options to label groups by clusters, as well as marking branch points, roots, and leaves.
Execution:
- Click the Submit button to start the analysis (Fig. 8c).
Output:
- Pseudotime Plot: Cells are arranged by pseudotime, showing the developmental trajectory (Fig. 8d).
- Bar Chart: Cells are ordered based on both Seurat clusters and Monocle3 pseudotime (Fig. 8e).
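
The role of the root cluster can be illustrated with a small sketch: pseudotime can be thought of as the distance travelled along the learned trajectory graph from the root. The Python example below computes shortest-path distances on a toy graph. The node names, edge lengths, and function name are hypothetical, and Monocle3's actual procedure works on its own principal graph and then projects cells onto it.

```python
import heapq

def pseudotime(graph, root):
    """Shortest-path distance from `root` to every reachable node.

    `graph` maps a node to a list of (neighbor, edge_length) pairs.
    """
    dist = {root: 0.0}
    queue = [(0.0, root)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, length in graph.get(node, []):
            nd = d + length
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(queue, (nd, nxt))
    return dist

# Toy trajectory: root -> a, then a branches to b and c.
toy_graph = {
    "root": [("a", 1.0)],
    "a": [("root", 1.0), ("b", 2.0), ("c", 0.5)],
    "b": [("a", 2.0)],
    "c": [("a", 0.5)],
}
print(pseudotime(toy_graph, "root"))
```

Choosing a different root changes every value, which is why the root cluster should reflect the biologically earliest state. Nodes that cannot be reached from the root receive no value in this sketch.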

8.3. Identifying Genes with Functional Changes in Pseudotime

To explore gene expression dynamics along the pseudotime trajectory, users can analyze gene expression changes:

Parameters:
- Neighbor Graph Selection: Users can select between Principal Graph or K-Nearest Neighbor (KNN) to model gene expression changes.
Execution:
- Click the Find Genes button to begin identifying genes whose functions vary along pseudotime (Fig. 8f).
Output:
- Pseudotime Plot of Cells: Visual representation of cells in pseudotime with associated gene expression (Fig. 8g).
- Summary Table: Lists genes with dynamic functional changes along pseudotime (Fig. 8h).

8.4. Plotting Gene Expression in Pseudotime

Users can visualize specific genes to observe their expression patterns over pseudotime (Fig. 8i).

Gene Selection:
- Top Genes: By default, the tool plots the top 5 genes with dynamic changes; this number is adjustable from 1 to 10.
- Custom Genes: Users can specify a custom list of genes (comma-separated) to plot in pseudotime.
Output:
- Creates a feature plot to display gene expression across cells in pseudotime (Fig. 8j).

This functionality helps users analyze and visualize gene dynamics, offering insights into cellular progression and identifying key genes in developmental pathways.

9. Co-Expression and TF Analysis

9.1. Co-Expression Network Analysis

ScRDAVis incorporates co-expression network analysis for scRNA-seq data using the hdWGCNA package. This feature enables users to identify gene modules and their relationships in Seurat clusters or predicted cell type labels.

Prerequisites:
- Co-expression network analysis becomes available after completing single or multiple samples analysis or subclustering analysis up to cell type prediction.
- Users can analyze one cluster at a time.
Metacell Construction:
Aggregates small groups of similar cells from the same biological sample. Uses the k-Nearest Neighbors (KNN) algorithm to group similar cells and compute a metacell gene expression matrix.
- Parameters:
  - k: Number of nearest neighbors for aggregation (Default: 10, Min: 1, Max: 100).
  - min_cells: Minimum number of cells in a group to construct metacells (Default: 10, Min: 5, Max: 100).
  - max_shared: Maximum number of cells shared across two metacells (Default: 15, Min: 1, Max: 100).
  - target_metacells: Maximum number of target metacells to construct (Default: 1000, Min: 50, Max: 5000).
Co-Expression Network Construction:
Builds networks with customizable parameters:
- softpower: Determines the scale-free topology for constructing networks.
- networkType: Options include signed, unsigned, or signed hybrid.
Module Eigengenes and Connectivity:
- Scales data using selectable models: linear, poisson, or negbinom.
- Allows optional Harmony batch correction to produce harmonized module eigengenes (hMEs).
Hub Gene Extraction:
- Extracts the top N hub genes for selected modules, aiding in the identification of key regulators (Default: 5, Min: 1, Max: 50).
Execution:
- Click the WGCNA Analysis button to initiate co-expression network analysis (Fig. 9a).
Outputs:
A few plots are not available as image files, so they are provided as PDF files.
- Soft Power Plots: Visualizes the selection of the optimal soft power parameter for network construction (Fig. 9.1b).
- Co-Expression Network Visualization: Displays modules with distinct colors representing gene clusters (Fig. 9.1c).
- Ranked Genes in Modules: Provides a list of genes ranked by module membership (kME) (Fig. 9.1d).
- Feature Plots: Highlights the expression of modules or specific genes (Fig. 9.1e).
- Module Relationships Plots: Correlation between modules based on harmonized module eigengenes (hMEs) (Fig. 9.1f).
- Seurat DotPlot with Modules: Displays module-specific gene expression across clusters (Fig. 9.1g).
- Individual Module Network Plots: Visualizes the gene network for specific modules (Fig. 9.1h).
- Module UMAP Plots: Maps modules onto UMAP visualizations for spatial context (Fig. 9.1i).
- Summary Tables:
  - Soft Power Table: Lists optimal soft power values (Fig. 9.1j).
  - Module Assignment Table: Details gene-module relationships with colors (Fig. 9.1k).
  - Hub Genes Table: Identifies top hub genes per module (Fig. 9.1l).

This functionality provides a robust framework for uncovering intricate co-expression patterns and identifying key drivers in single-cell datasets.
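
The softpower and networkType parameters together determine how a gene-gene correlation is turned into a network connection strength. The Python sketch below shows the standard WGCNA adjacency formulas for the three network types. The function name is hypothetical and the snippet is an illustration, not the hdWGCNA code.

```python
def adjacency(cor, softpower, network_type="signed"):
    """Connection strength for one gene pair with correlation `cor`."""
    if network_type == "unsigned":
        # Strong negative and strong positive correlations both connect.
        return abs(cor) ** softpower
    if network_type == "signed":
        # Correlation is rescaled to [0, 1]; negative values end up near 0.
        return ((1 + cor) / 2) ** softpower
    if network_type == "signed hybrid":
        # Negative correlations are dropped entirely.
        return cor ** softpower if cor > 0 else 0.0
    raise ValueError("unknown network type: " + network_type)

for kind in ("unsigned", "signed", "signed hybrid"):
    print(kind, adjacency(-0.8, 6, kind), adjacency(0.8, 6, kind))
```

Raising the correlation to a power suppresses weak correlations much more than strong ones. The soft power plots help in choosing a value at which the resulting network is approximately scale-free.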

9.2. Transcription Factor Regulatory Network Analysis

Transcription Factor (TF) Regulatory Network Analysis in ScRDAVis employs the hdWGCNA package to construct and analyze TF regulatory networks based on scRNA-seq data. This feature allows users to identify gene modules and investigate TF-mediated regulation within clusters or predicted cell type labels.

Prerequisites:
- Complete single or multiple sample analysis or subclustering analysis, including cell type prediction.
- Analysis is performed one cluster at a time.
TF Regulatory Network Construction:
- TF Binding Motif Information:
  - Human: EnsDb.Hsapiens.v86, BSgenome.Hsapiens.UCSC.hg38.
  - Mouse: EnsDb.Mmusculus.v79, BSgenome.Mmusculus.UCSC.mm10.
  - Motifs from the JASPAR 2020 database for multiple species.
- Machine Learning Model:
  - XGBoost is used to model TF regulation for each gene, with the following parameters:
  - max_depth: Maximum depth of a tree (Default: 1, Min: 1, Max: 10).
  - eta: Step size shrinkage used in updates to prevent overfitting (Default: 0.1, Min: 0.01, Max: 1).
  - alpha: L1 regularization term on weights (Default: 0.5, Min: 0, Max: 1).
- TF Regulon Strategy:
  - Strategy A selects the top TFs for each gene by default.
  - reg_thresh: Threshold for the regulatory score (Default: 0.01, Min: 0, Max: 1).
  - n_tfs: The number of top TFs to keep for each gene (Default: 10, Min: 1, Max: 50).
- Regulon Expression Signatures:
  - Positive correlation (cor_thresh): Threshold for TF-gene correlation for genes to be included in the positive regulon score (Default: 0.05, Min: 0, Max: 1).
  - Negative correlation (cor_thresh): Threshold for TF-gene correlation for genes to be included in the negative regulon score (Default: -0.05, Min: -1, Max: 0).
Execution:
- Click the Transcription factor analysis button to start the analysis (Fig. 9.2.1a).
Output and Visualization:
- Module Regulatory Network Plots: Positive, negative, and combined regulatory network plots. Visualize TF-to-target relationships categorized by regulatory effects (Fig. 9.2.1b-e).
- Regulated Scores Table: Comprehensive list of TFs and their downstream targets (Fig. 9.2.1f).
TF-Specific Visualizations:
Unravel regulatory mechanisms governing gene expression in cellular contexts. Identify key transcription factors and their target genes for hypothesis generation and validation. Explore positive and negative regulatory effects within gene modules.
- Select a TF from a dropdown menu to generate specific plots (Fig. 9.2.2a).
Outputs:
- UMAP Plots: Spatial distribution of the TF (Fig. 9.2.2b).
- Bar Plots: Contribution of the TF across modules (Fig. 9.2.2c).
- Network Plots: Positive, negative, and combined networks, with primary, secondary and tertiary targets (Fig. 9.2.2d-f).

This functionality provides a comprehensive view of transcriptional regulation in scRNA-seq data, enabling detailed exploration of TF-driven cellular processes.
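
The interplay of reg_thresh, n_tfs, and the two cor_thresh values can be sketched as a two-step filter. The Python example below is a hedged illustration on made-up scores: the function names, the data layout, and the use of strict inequalities are assumptions, and the snippet is not the hdWGCNA implementation.

```python
def assign_regulons(scores, reg_thresh=0.01, n_tfs=10):
    """Strategy A sketch: for each gene, keep the n_tfs highest-scoring
    TFs whose regulatory score exceeds reg_thresh.

    `scores` maps gene -> {tf: regulatory score}.
    """
    regulons = {}
    for gene, tf_scores in scores.items():
        strong = [(tf, s) for tf, s in tf_scores.items() if s > reg_thresh]
        strong.sort(key=lambda pair: pair[1], reverse=True)
        regulons[gene] = [tf for tf, _ in strong[:n_tfs]]
    return regulons

def split_targets(correlations, pos_thresh=0.05, neg_thresh=-0.05):
    """Split one TF's targets into positive and negative regulon genes
    using the TF-gene correlation thresholds.

    `correlations` maps target gene -> correlation with the TF.
    """
    positive = sorted(g for g, c in correlations.items() if c > pos_thresh)
    negative = sorted(g for g, c in correlations.items() if c < neg_thresh)
    return positive, negative

# Made-up regulatory scores for two genes.
toy_scores = {
    "GENE1": {"TF_A": 0.40, "TF_B": 0.005, "TF_C": 0.12},
    "GENE2": {"TF_A": 0.02, "TF_C": 0.30},
}
print(assign_regulons(toy_scores, reg_thresh=0.01, n_tfs=1))
print(split_targets({"GENE1": 0.6, "GENE2": -0.3, "GENE3": 0.01}))
```

Targets whose correlation lies between the two thresholds contribute to neither the positive nor the negative regulon score.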

