Single Cell RNA Data Analysis and Visualization (ScRDAVis)


Introduction


ScRDAVis is a browser-based, user-friendly R Shiny application that lets researchers without programming experience analyze and visualize single-cell RNA sequencing (scRNA-seq) data. It supports single- and multiple-sample analyses as well as group comparisons. The application includes the following key functional analyses:

1. Single or Multiple Samples Analysis

This section offers various tabs to analyze one or more samples, which can be grouped into up to six groups.

1.1 Stats

Displays the QC plot and cell summary of the uploaded sample(s).

1.2 Sample Groups and QC Filtering

Assigns samples to groups and filters cells on QC metrics before further analysis.

1.3 Normalization and PCA Analysis

Allows normalization of samples using multiple methods and generates PCA plots.

1.4 Clustering

Groups cells into clusters using Seurat's graph-based clustering and visualizes them with UMAP or t-SNE.

1.5 Remove Doublets

Employs DoubletFinder to classify cells as singlets or doublets, allowing users to keep or remove the doublets.

1.6 Marker Identification

Identifies markers for all clusters, a specific cluster, or between clusters and supports the identification of conserved markers.

1.7 Cell Type Prediction

Offers multiple options for cell type identification, including ScType, SingleR, GPTCelltype, or custom user-provided labels.

1.8 Cluster-Based Plots

Displays the expression of top or selected genes in each cluster using Dot, Violin, Ridge, or Feature plots.

1.9 Condition-Based Analysis

Identifies differentially expressed genes between two groups, with visualization options including Dot, Violin, Ridge, Feature, or Volcano plots.

2. Subclustering

Allows sub-clustering of one or more clusters from the single or multiple samples analysis, or of cells chosen by positive or negative selection on a gene of interest. Subclustering follows the same steps as the primary analysis.

3. Correlation Network Analysis

Uses the genesorteR package to identify correlations between cell clusters. Provides correlation summary tables and visualizes the results as correlation matrix and network plots.

4. Gene Ontology (GO) Terms

Uses the clusterProfiler package to identify biological processes, molecular functions, and cellular components enriched among marker genes. Provides GO summary tables and visualizations as Dot, Bar, Net, and UpSet plots.

5. Pathway Analysis

Employs the clusterProfiler and ReactomePA packages to identify pathways in single or multiple clusters, with results displayed as Dot, Bar, Net, and UpSet plots.

6. GSEA Analysis

Performs Gene Set Enrichment Analysis (GSEA) using the fgsea and msigdb packages to identify enriched gene sets. Results are displayed as GSEA enrichment plots, bar plots, and GSEA tables (fgsea's plotGseaTable).

7. Cell-Cell Communication

Uses the CellChat package to identify signaling communication between clusters, with receptor-ligand interactions visualized in Circular, Chord, Heatmap, Bubble, Bar, and Violin plots.

8. Trajectory and Pseudotime Analysis

Utilizes the Monocle3 package to order cells in pseudotime and identify genes whose expression changes along the trajectory. Visualizations include trajectory and pseudotime plots, bar plots, and plots of gene changes over pseudotime.

9. Co-Expression and TF Analysis

9.1 Co-Expression Network Analysis

Uses the hdWGCNA package to identify co-expression networks as undirected, weighted gene networks. These are visualized through co-expression network plots with modules, soft-power plots, module relationship plots, module network plots, and module UMAP plots.

9.2 Transcription Factor Regulatory Network Analysis

Uses the hdWGCNA package to identify the transcription factors (TFs) within co-expression modules. These TFs play a key role in regulating gene expression networks in single-cell data and are visualized through bar plots, network plots, and module UMAP plots.

Outputs and Visualization

ScRDAVis provides publication-quality plots in seven formats: JPG, TIFF, PDF, SVG, BMP, EPS, and PS. Summary tables are displayed in the app and can be downloaded in .csv format.


Use ScRDAVis Online

ScRDAVis is deployed at: https://www.gudalab-rtools.net/ScRDAVis


Launch ScRDAVis using R and GitHub

ScRDAVis is available in the GitHub repository: https://github.com/GudaLab/ScRDAVis
Before running the app, make sure the following are installed: R (>= 4.5.1), RStudio (>= 2025.05.1), Bioconductor (>= 3.21), and Shiny (>= 1.11.1).
Note: ScRDAVis has been tested with these versions. With an older version of R, package installation may fail, so we recommend updating R to the latest release first.
Once R is open (in the command line or in RStudio), run the following commands to install and load the shiny package:

install.packages('shiny')
library(shiny)

Start the app

Start an R session (e.g., in RStudio) and run:

shiny::runGitHub('ScRDAVis','GudaLab')
Alternatively, download the source code from GitHub and run the following commands in an R session (e.g., in RStudio):
library(shiny)
runApp('/path/to/the/ScRDAVis-master', launch.browser=TRUE)
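Before launching, a quick pre-flight check can confirm that the installed versions meet the tested minimums listed above. This is a minimal base-R sketch; the helpers `check_min_version` and `pkg_version` are hypothetical and not part of ScRDAVis, the Bioconductor release is read from the version of the BiocVersion package (which tracks the release), and the RStudio version is not checked.

```r
# Hypothetical helper (not part of ScRDAVis): TRUE if an installed version
# meets the tested minimum; prints a one-line report either way.
check_min_version <- function(name, installed, required) {
  ok <- !is.na(installed) &&
    package_version(installed) >= package_version(required)
  message(sprintf("%-12s installed: %-10s required: >= %s  %s",
                  name, if (is.na(installed)) "missing" else installed,
                  required, if (ok) "OK" else "UPDATE NEEDED"))
  ok
}

# Installed version of a package as a string, or NA if it is not installed.
pkg_version <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

results <- c(
  R = check_min_version("R", as.character(getRversion()), "4.5.1"),
  # The BiocVersion package's version number tracks the Bioconductor release.
  Bioconductor = check_min_version("Bioconductor", pkg_version("BiocVersion"), "3.21"),
  shiny = check_min_version("shiny", pkg_version("shiny"), "1.11.1")
)
```

Any entry reported as "UPDATE NEEDED" points to the component to upgrade before calling runGitHub or runApp.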

Usage

Please refer to the Manual tab.


Developed and maintained by

ScRDAVis was developed by Sankarasubramanian Jagadesan and Babu Guda. We share a passion for developing a user-friendly tool for biologists, particularly those who do not have access to bioinformaticians or programming expertise.


Sample names of uploaded file(s)

Instructions for Uploading Sample Files

  1. H5 Files (Cell Ranger Output)
    • Cell Ranger file: filtered_feature_bc_matrix.h5.
    • Rename it to SAMPLE_NAME.h5 for proper identification.
  2. Cell Ranger Matrix Files
    • Cell Ranger files: matrix.mtx.gz, features.tsv.gz, barcodes.tsv.gz.
    • Rename as SAMPLE_NAME_matrix.mtx.gz, SAMPLE_NAME_features.tsv.gz, and SAMPLE_NAME_barcodes.tsv.gz, and upload together as a set.
  3. Seurat Objects
    • Format: filename.rds (Seurat object). The orig.ident attribute should match the sample name(s).
  4. Matrix count file
    • Format: filename.txt with genes as rows and one column per cell, named sample_cellID.

Example file formats

Users can download these example datasets to better understand the required structure. Following them will help ensure that your files are correctly prepared and fully compatible with the tool.
H5 File (Cell Ranger Output)
H5 File
Cell Ranger Matrix Files
Barcodes File
Features File
Matrix File
Seurat Object File
Seurat Object File
Matrix Count File
Matrix count file
Note: Please extract the .gz file and upload the resulting .txt file (approximately 715 MB). Rows represent genes, and columns are labeled by sample and cell ID in the format sample_cellID. If you have multiple samples (e.g., 4 samples), combine them into a single .txt file.
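Merging several samples into one matrix count file can be scripted. This is a minimal base-R sketch, assuming each sample is a genes × cells count matrix with gene symbols as row names; the helper name `combine_count_matrices`, the tab delimiter, and filling genes missing from a sample with 0 are illustrative assumptions, not documented requirements of the tool.

```r
# Hypothetical helper: merge per-sample gene x cell count matrices into one
# matrix whose columns are named sample_cellID.
combine_count_matrices <- function(mats) {
  stopifnot(!is.null(names(mats)), all(nzchar(names(mats))))
  all_genes <- sort(unique(unlist(lapply(mats, rownames))))
  per_sample <- lapply(names(mats), function(sample) {
    m <- mats[[sample]]
    out <- matrix(0, nrow = length(all_genes), ncol = ncol(m),
                  dimnames = list(all_genes, paste(sample, colnames(m), sep = "_")))
    out[rownames(m), ] <- m   # genes absent from this sample stay 0 (assumption)
    out
  })
  do.call(cbind, per_sample)
}

# Toy example: two samples that share only some genes.
s1 <- matrix(c(1, 0, 3, 2), nrow = 2,
             dimnames = list(c("GeneA", "GeneB"), c("AAAC", "AAAG")))
s2 <- matrix(c(5, 4), nrow = 2,
             dimnames = list(c("GeneB", "GeneC"), "TTTC"))
combined <- combine_count_matrices(list(S1 = s1, S2 = s2))

# Write as tab-delimited text: genes as rows, sample_cellID as columns.
out_file <- file.path(tempdir(), "combined_counts.txt")
write.table(combined, out_file, sep = "\t", quote = FALSE, col.names = NA)
```

Keeping the sample name free of underscores avoids ambiguity when the sample_cellID column names are split back into sample and cell.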

Number of cells in the given sample(s)

QC Plot before filtering

Feature-Feature relationships plot

Define filtering parameters

Exclude cells based on their number of expressed genes and the percentage of reads that map to the mitochondrial genome.

Number of cells after QC

Sample(s) based

Group(s) based

QC plot after filtering

Sample(s) based

Group(s) based

Bar plots

Sample(s) based

Group(s) based

PC significance (JackStraw)

Dimension reduction heatmap for PCA data

Elbow plot

PC significance (JackStraw)

PCA sample(s) based

PCA group(s) based

Nearest-neighbour graph construction

Clustering parameters and integration method

Dimension reduction

UMAP parameters

t-SNE parameters


UMAP / t-SNE cluster plot

Cluster based count bar plot

UMAP / t-SNE condition(s) based plot

Condition(s) based count bar plot

UMAP / t-SNE sample(s) based plot

Sample(s) based count Bar plot

Number of cells in clusters

Number of cells in clusters based on condition(s)

Number of cells in clusters based on sample(s)

Parameters to detect doublets



Singlet / doublet plot

Singlet / doublet plot based on condition(s)

Singlet / doublet plot based on sample(s)

Singlet / doublet plot based on clusters

Number of singlets / doublets in sample(s)

Parameters to keep or remove doublets

Number of cells used for further analysis

Singlet / doublet plot after keeping or removing doublets

Based on condition(s)

Based on sample(s)

Based on clusters

Clusters split by condition(s) and sample(s)

Bar plots after keeping or removing doublets

Number of cells in clusters

Number of cells in clusters based on condition(s)

Number of cells in clusters based on sample(s)

Marker identification or differential expression analysis

Gene expression marker parameters

Identified markers / differentially expressed genes

Conserved marker genes

Heatmap for top 5 marker genes in cluster(s)




Predict Cell Type

If you are using GPTCelltype, please make sure 'Identify markers in all clusters' has been run in the previous step.

DimPlot of annotated clusters

ScType scores

SingleR scores

SingleR score heatmap

SingleR delta distribution



Select the plot type to display

Please make sure 'Identify markers in all clusters' and the same cell type prediction method have been run in the previous steps.

Dot / Violin / Ridge / Feature plot

Top or selected genes, cell counts and proportion

Differential expression analysis between two groups

Parameters to find the DEGs

Parameters for plotting

Dot / Violin / Ridge / Feature / Volcano plot

Differentially expressed genes


Number of cells in the sample(s)

QC stats for selected subclusters

Please use the same normalization method as in the single or multiple samples analysis.


Dimension reduction heatmap for PCA data

Elbow plot

PCA sample(s) based

PCA group(s) based

Nearest-neighbour graph construction

Clustering parameters and integration method

Dimension reduction

UMAP parameters

t-SNE parameters


UMAP / t-SNE cluster plot

Cluster based count bar plot

UMAP / t-SNE condition(s) based plot

Condition(s) based count bar plot

UMAP / t-SNE sample(s) based plot

Sample(s) based count Bar plot

Clusters split by condition(s) and sample(s)

Number of cells in clusters

Number of cells in clusters based on condition(s)

Number of cells in clusters based on sample(s)

Marker identification or differential expression analysis

Gene expression marker parameters

Identified markers / differentially expressed genes

Conserved marker genes

Heatmap for top 5 marker genes in cluster(s)




Predict Cell Type

If you are using GPTCelltype, please make sure 'Identify markers in all clusters' has been run in the previous step.

DimPlot of annotated clusters

ScType scores

SingleR scores

SingleR score heatmap

SingleR delta distribution



Select the plot type to display

Please make sure 'Identify markers in all clusters' and the same cell type prediction method have been run in the previous steps.

Dot / Violin / Ridge / Feature plot

Top or selected genes, cell counts and proportion

Differential expression analysis between two groups

Parameters to find the DEGs

Parameters for plotting

Dot / Violin / Ridge / Feature / Volcano plot

Differentially expressed genes


To begin this analysis, please complete the Single or Multiple samples analysis (or the subclustering analysis) through the Cell Type Prediction and Marker Identification steps.

Cell cluster correlation network analysis

Select the input data and cell type method for analysis

Cluster-based correlation matrix plot

Cluster-based correlation network plot

Cluster-based correlation table

To begin this analysis, please complete the Single or Multiple samples analysis (or the subclustering analysis) through the Cell Type Prediction and Marker Identification steps.

Select the input data and cluster(s) for analysis

GO term parameters

GO term plot

Summary table

To begin this analysis, please complete the Single or Multiple samples analysis (or the subclustering analysis) through the Cell Type Prediction and Marker Identification steps.

Select the input data and cluster(s) for analysis

Pathway parameters

Pathway plot

Summary table

To begin this analysis, please complete the Single or Multiple samples analysis (or the subclustering analysis) through the Cell Type Prediction and Marker Identification steps.

Select the input data and cluster(s) for analysis

GSEA parameters

GSEA plot

Summary table

To begin this analysis, please complete the Single or Multiple samples analysis (or the subclustering analysis) through the Cell Type Prediction and Marker Identification steps.

Select the input data and cell type method for analysis

Cell-cell communication parameters (CellChat)

Interactions plot with counts

Interactions plot with weights/strength

Interaction heatmap

Incoming and outgoing signaling patterns

Incoming and outgoing communication patterns of target and secreting cells

Interaction table

Show all the significant interactions associated with certain signaling pathways

Interactions plot (Circle)

Interactions plot (Chord)

Interaction heatmap

Hierarchy plot

Bubble plot

Network analysis contribution bar plot

Gene expression plot

Interaction table

To begin this analysis, please complete the Single or Multiple samples analysis (or the subclustering analysis) through the Cell Type Prediction and Marker Identification steps.

Select the input data and annotation method

Parameters to Learn Trajectory

Please make sure you have used UMAP in the clustering steps.

Trajectory plot

Order cells in pseudotime

Cells plotted in pseudotime

Cells ordered by Seurat cluster and Monocle3 pseudotime

Find genes that change along pseudotime

Pseudotime plot

List of genes that change along pseudotime

Plot the top or user-listed genes to see their changes along pseudotime

Pseudotime plot for the top selected genes


To begin this analysis, please complete the Single or Multiple samples analysis (or the subclustering analysis) through the Cell Type Prediction step.

Co-expression network analysis using hdWGCNA

Select the input data and cluster(s) for analysis

Construct metacells

Select soft-power

Module eigengenes and connectivity

Module based UMAP Plot

UMAP plot to check that the loaded data are correct

Soft power threshold plots

Co-expression network plot

Modules ranked by eigengene-based connectivity (kME)

Module feature plots

Module correlogram plot

Modules in Seurat’s dot plot

Individual module network plots

UMAP plot for co-expression networks

Soft-power threshold table

Module assignment table

Top N hub genes


Transcription factor regulatory network analysis

Identify TFs in promoter regions (uses the JASPAR 2024 database, motif scanning, and XGBoost)

Define TF Regulons

Calculate regulon expression signatures

Module regulatory network plot (Positive)

Module regulatory network plot (Negative)

Module regulatory network plot (Both)

Module regulatory network plot (Module UMAP)

TF network table

Select a TF of interest

Bar plot parameters

Network plot parameters

Feature plot of selected TF

Top target genes within TF regulons

TF network plot (Positive)

TF network plot (Negative)

TF network plot (Both)


Single Cell RNA Data Analysis and Visualization


This section describes how to prepare input files:

Supported Input Formats:

  1. H5 Files (Cell Ranger Output)
    • Cell Ranger file: filtered_feature_bc_matrix.h5.
    • Rename it to SAMPLE_NAME.h5 for proper identification.
  2. Cell Ranger Matrix Files
    • Cell Ranger files: matrix.mtx.gz, features.tsv.gz, barcodes.tsv.gz.
    • Rename as SAMPLE_NAME_matrix.mtx.gz, SAMPLE_NAME_features.tsv.gz, and SAMPLE_NAME_barcodes.tsv.gz, and upload together as a set.
  3. Seurat Objects
    • Format: filename.rds (Seurat object). The orig.ident attribute should match the sample name(s).
  4. Matrix count file
    • Format: filename.txt with genes as rows and one column per cell, named sample_cellID.
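The renaming step can be scripted. This base-R sketch copies Cell Ranger outputs into an upload folder using the SAMPLE_NAME prefixes described above. It assumes the usual `cellranger count` layout (`outs/filtered_feature_bc_matrix.h5` and `outs/filtered_feature_bc_matrix/`); the helper name `stage_for_upload` and the demo folders are hypothetical.

```r
# Hypothetical helper: copy Cell Ranger outputs into upload_dir using the
# SAMPLE_NAME prefixes expected by the upload form. Returns the new file names.
stage_for_upload <- function(outs_dir, sample_name, upload_dir) {
  dir.create(upload_dir, showWarnings = FALSE, recursive = TRUE)
  # Source path (relative to outs_dir) -> file name expected for upload.
  plan <- c(
    "filtered_feature_bc_matrix.h5"              = paste0(sample_name, ".h5"),
    "filtered_feature_bc_matrix/matrix.mtx.gz"   = paste0(sample_name, "_matrix.mtx.gz"),
    "filtered_feature_bc_matrix/features.tsv.gz" = paste0(sample_name, "_features.tsv.gz"),
    "filtered_feature_bc_matrix/barcodes.tsv.gz" = paste0(sample_name, "_barcodes.tsv.gz")
  )
  src <- file.path(outs_dir, names(plan))
  present <- file.exists(src)                 # skip outputs that are not there
  dest <- file.path(upload_dir, unname(plan[present]))
  file.copy(src[present], dest, overwrite = TRUE)
  invisible(basename(dest))
}

# Demo with empty placeholder files standing in for real Cell Ranger output.
outs <- file.path(tempdir(), "cellranger_run", "outs")
dir.create(file.path(outs, "filtered_feature_bc_matrix"), recursive = TRUE, showWarnings = FALSE)
for (f in c("filtered_feature_bc_matrix.h5",
            file.path("filtered_feature_bc_matrix",
                      c("matrix.mtx.gz", "features.tsv.gz", "barcodes.tsv.gz")))) {
  file.create(file.path(outs, f))
}
staged <- stage_for_upload(outs, "Sample1", file.path(tempdir(), "upload"))
print(staged)
```

Only one of the two sets is needed for upload (either the .h5 file or the three matrix files); the sketch simply stages whichever are present.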

Data Size and Handling:

  • The tool can handle scRNA-seq data of up to 3 GB in the formats listed above.
  • Supports analysis of single or multiple samples, in up to six sample groups.
  • After data upload, users proceed through the first module (Single or Multiple Samples Analysis) step by step, with the 'Next Step' button guiding them through each tab.
  • Once the single or multiple samples analysis is complete, users can run the remaining modules as needed; no fixed sequence is required.

Output and Visualizations:

  • High-Quality Plot Download: Users can download plots in seven formats: JPG, TIFF, PDF, SVG, BMP, EPS, and PS. However, a few specific plots, such as those requiring exceptionally high detail or complex rendering (e.g., network graphs or high-resolution heatmaps), are only available as PDF files to preserve their quality and detail.
  • Summary Tables: Tables are displayed using the DT package. Users can show up to 100 rows per page (default: 10) and download the entire table as a CSV file.
  • Download Seurat Object: In single or multiple sample analyses, users can download the processed results as an RDS file (Seurat Object).

Example Datasets:

To ensure seamless analysis and reproducibility, ScRDAVis includes one reference dataset for each input format, sourced from NCBI, which has been pre-tested with the tool. These datasets allow users to explore the tool's functionalities and understand the analysis workflow effectively.


Estimated Runtime for Analysis Tabs

Tab name | Estimated time | Notes
Stats | 1–2 minutes | Upload & initial QC plots. H5 is faster; RDS or raw matrix takes longer.
Sample Groups & QC Filtering | 1–2 minutes | Depends on number of samples and filtering thresholds.
Normalization & PCA | 2–5 minutes | SCTransform takes longer than LogNormalize.
JackStraw Analysis | 10–30 minutes | Depends on number of PCs (e.g., 20–50) and resampling (e.g., 100 reps).
Clustering & UMAP/tSNE | 1–3 minutes | Slightly longer for large datasets or high resolution.
Doublet Detection | 3–30 minutes | Depends on dataset size and expected doublet rate.
Marker Identification | 1–3 minutes | Multiple clusters increase runtime (e.g., 10+ clusters).
Cell Type Prediction | 2–15 minutes | ScType & SingleR are fast; GPTCelltype depends on OpenAI API latency.
Cluster-Based Plots | <1 minute | Faster for fewer genes and features.
Condition-Based DEG Analysis | 1–2 minutes | Similar to marker detection; volcano plot adds a few seconds.
Subclustering | 2–30 minutes | Includes filtering + reclustering a subset of cells (whole analysis).
Correlation Network | 2–4 minutes | Larger clusters or using Kendall correlation may take longer.
GO Term Enrichment | 1–3 minutes | Depends on number of DE genes and ontology selected.
Pathway Analysis | 1–3 minutes | KEGG & Reactome databases processed similarly.
GSEA Analysis | 1–3 minutes | MSigDB categories vary in size; more permutations = longer time.
Cell-Cell Communication | 5–30 minutes | One of the longest steps. Time depends on the number of groups & PPI size.
Trajectory & Pseudotime | 3–30 minutes | UMAP-based; Monocle3 processing varies with complexity.
Co-expression Network (hdWGCNA) | 15 minutes to 1 hour | Metacell and soft-thresholding steps are the most time-consuming.
TF Regulatory Network | 30 minutes to 2 hours | Motif scanning + XGBoost modeling can be moderately slow.
Additional Notes:
  • Smaller datasets (<5k cells): Most steps complete within 2–5 minutes.
  • Larger datasets (>100k cells): Some modules may take 10 minutes to 2 hours or longer.
  • Most time-consuming modules:
    • JackStraw
    • Doublet Detection
    • SingleR
    • CellChat (Cell-Cell Communication)
    • Trajectory & Pseudotime
    • hdWGCNA
    • TF Regulatory Network

Step-by-Step Approach for User Interaction using GSE266873, a dataset of 9 samples in three groups: Group1 (G1; n=3, 0-6 hours post-ICH), Group2 (G2; n=3, 6-24 hours post-ICH), and Group3 (G3; n=3, 24-48 hours post-ICH)

1. Single or Multiple samples analysis

1.1 Stats

  • Upload and Adjust Parameters:
    • Minimum cell expression per gene: Define the minimum number of cells that should express each gene (Default: 0, Min: 0, Max: number of cells).
    • Minimum gene expression per cell: Set the minimum number of genes each cell should express (Default: 0, Min: 0, Max: number of genes).
  • Execution:
    • Click the Submit button to run the analysis based on selected parameters (Fig. 1.1a).
  • Output:
    • QC plots: Quality metrics before filtering (Fig. 1.1b).
    • Feature-Feature Relationships Plot (Fig. 1.1c)
    • Sample(s) cell counts table (Fig. 1.1d)
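The two Stats parameters appear to correspond to the min.cells and min.features arguments of Seurat's CreateSeuratObject (an assumption; the manual does not name the function). The base-R sketch below only illustrates that style of filtering on a toy matrix; applying the cell filter before the gene filter is also an assumption, and `filter_counts` is a hypothetical helper.

```r
# Illustrative sketch of min.cells / min.features style filtering on a
# genes x cells count matrix (cell filter first, then gene filter: an assumption).
filter_counts <- function(counts, min_cells = 0, min_features = 0) {
  genes_per_cell <- colSums(counts > 0)                # genes detected in each cell
  counts <- counts[, genes_per_cell >= min_features, drop = FALSE]
  cells_per_gene <- rowSums(counts > 0)                # cells expressing each gene
  counts[cells_per_gene >= min_cells, , drop = FALSE]
}

# Toy matrix: each line of values is one cell, because matrix() fills by column.
counts <- matrix(c(1, 0, 0,    # cell1
                   2, 3, 0,    # cell2
                   0, 0, 0,    # cell3: no genes detected
                   4, 1, 1),   # cell4
                 nrow = 3,
                 dimnames = list(c("GeneA", "GeneB", "GeneC"), paste0("cell", 1:4)))

filtered <- filter_counts(counts, min_cells = 2, min_features = 1)
filtered   # GeneA and GeneB across cell1, cell2, cell4
```

With the defaults of 0, nothing is removed, which matches the behavior described for the Default: 0 settings.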

1.2. Sample Groups and QC Filtering

  • Assign sample(s) group(s):
    Choose the number of group(s) based on sample grouping
    • If only one sample: Select 1 group.
    • For multiple samples in a single condition: Select 1 group.
    • For multiple samples in different conditions: Select up to 6 groups.
  • Define Filtering Parameters:
    • Min gene count per cell (Default: 0) – Filters out cells with fewer than this number of genes expressed. [Recommended: 200 to 500].
    • Max gene count per cell (Default: 7500) – Filters out cells with more than this number of genes expressed. [Recommended: 5000 to 7500].
    • Max mitochondrial % (Default: 5) – Removes cells with excessive mitochondrial gene expression, often indicating low-quality or dying cells. [Recommended: <10%].
  • Execution:
    • Update Filtered Data: Click to apply filters and update the dataset (Fig. 1.2a).
  • Outputs after Filtering:
    • QC metrics and Bar Plot (Fig. 1.2b-e)
    • Summary Table: Filtered cell count data (Fig. 1.2f,g)
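The three thresholds combine into a single keep/drop rule per cell. In Seurat this is commonly done with PercentageFeatureSet(pattern = "^MT-") followed by subset(); the base-R sketch below spells out the same arithmetic on a toy count matrix. The "^MT-" prefix matches human mitochondrial gene symbols (mouse uses "^mt-"), and the use of strict inequalities is an assumption about how the tool applies the bounds.

```r
# Per-cell QC metrics from a genes x cells count matrix.
qc_metrics <- function(counts, mt_pattern = "^MT-") {
  is_mt <- grepl(mt_pattern, rownames(counts))
  n_count <- colSums(counts)
  mt_count <- colSums(counts[is_mt, , drop = FALSE])
  data.frame(
    nFeature   = colSums(counts > 0),                   # genes detected per cell
    nCount     = n_count,                               # total counts per cell
    percent_mt = ifelse(n_count > 0, 100 * mt_count / n_count, 0)
  )
}

# TRUE for cells passing all three thresholds (strict bounds are an assumption).
pass_qc <- function(qc, min_genes = 200, max_genes = 7500, max_mt = 5) {
  qc$nFeature > min_genes & qc$nFeature < max_genes & qc$percent_mt < max_mt
}

counts <- cbind(
  cell1 = c(10, 5, 5, 0, 0),   # healthy-looking cell
  cell2 = c(8, 0, 2, 5, 5),    # 50% mitochondrial counts
  cell3 = c(0, 0, 1, 0, 0)     # almost empty
)
rownames(counts) <- c("GeneA", "GeneB", "GeneC", "MT-CO1", "MT-ND1")

qc <- qc_metrics(counts)
keep <- pass_qc(qc, min_genes = 2)   # toy-sized threshold instead of 200
filtered <- counts[, keep, drop = FALSE]
```

On real data the recommended ranges above apply (min 200 to 500 genes, max 5000 to 7500 genes, mitochondrial % below 10).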

1.3. Normalization and PCA Analysis

  • Normalization Methods:
    • LogNormalize: Adjusts for sequencing depth or read count differences.
      • Scale factor (Default: 10000, Min: 1, Max: 1e6) – Scale factor used in LogNormalize method for total expression normalization.
      • Variable gene method (Default: vst) – Method for selecting variable features: vst (default), mean.var.plot, or dispersion.
      • Number of variable genes (Default: 2000, Min: 100, Max: 10000) – Number of top variable genes to retain for downstream analysis.
    • SCT (SCTransform): Uses regularized negative binomial regression to normalize and variance-stabilize the counts for clustering and differential expression.
  • PCA Settings:
    • PCA dimensions (Default: 50, Min: 2, Max: 100) – Number of principal components computed for dimensionality reduction.
    • JackStraw max dims (Default: 20, Min: 1, Max: 100) – Maximum number of PCs tested for significance in JackStraw analysis.
    • JackStraw num.replicate (Default: 100, Min: 10, Max: 1000) – Number of permutations used in JackStraw resampling.
    • JackStraw plot max PCs (Default: 20, Min: 1, Max: 100) – Maximum PCs to display in JackStraw significance plot.
  • Execution:
    • Click Submit button to start the analysis (Fig. 1.3a).
  • Outputs:
    • PCA Heatmap (Fig. 1.3b)
    • Elbow Plot (Fig. 1.3c)
    • JackStraw Plot (Fig. 1.3d)
    • PCA Plot (sample-wise or group-wise) (Fig. 1.3e,f)
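LogNormalize, as documented for Seurat's NormalizeData, divides each gene's count by the cell's total count, multiplies by the scale factor, and takes the natural log of (1 + x). A base-R sketch of that formula on a dense toy matrix (Seurat itself works on sparse matrices); `log_normalize` is a hypothetical helper name.

```r
# LogNormalize: log1p(count / total_counts_in_cell * scale_factor), per cell.
log_normalize <- function(counts, scale_factor = 10000) {
  totals <- colSums(counts)                              # sequencing depth per cell
  log1p(sweep(counts, 2, totals, "/") * scale_factor)    # divide each column by its total
}

# cellB has 10x the depth of cellA but the same relative composition.
counts <- cbind(cellA = c(GeneA = 1, GeneB = 3),
                cellB = c(GeneA = 10, GeneB = 30))
norm <- log_normalize(counts)
norm   # both columns are identical: the depth difference is removed
```

This is why LogNormalize is described as adjusting for sequencing depth: cells that differ only in total counts end up with the same normalized profile.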

1.4. Clustering

  • Clustering Step:
    • Find Neighbors (Default: 20, Min: 2, Max: 50): The user selects the dimensions to use (PCA, integrated dimensions, etc.) and the number of nearest neighbors (k).
    • Clustering Algorithm: Select the Louvain, SLM, or Leiden algorithm for clustering.
    • Resolution Control: Users can adjust the resolution (0.1 to 1) to control the granularity of clusters; higher values yield more clusters.
  • Dimension Reduction:
    • Choose between UMAP or t-SNE for dimensionality reduction.
      • For UMAP: Users can adjust parameters such as min.dist, the number of nearest neighbours, and the number of dimensions (Default: 30, Min: 2, Max: 100).
      • For t-SNE: Users can adjust the number of dimensions (Default: 30, Min: 2, Max: 100).
  • Integration Method:
    • Select integration method(s): CCA, RPCA, Harmony, or JointPCA. These methods help users handle dataset complexity and integrate data from multiple samples.
    • Integration method (Default: None) – Data integration method. If 'None', no integration is performed.
    • HarmonyIntegration (Default: Reduction = harmony; Distance = Cosine) – Batch correction using Harmony with cosine distance.
    • CCAIntegration (Default: Reduction = cca; Distance = Euclidean) – Canonical correlation analysis for dataset integration.
    • RPCAIntegration (Default: Reduction = rpca; Distance = Euclidean) – Reciprocal PCA; a faster, more scalable alternative to CCA.
    • JointPCAIntegration (Default: Reduction = jointpca; Distance = Euclidean) – Joint PCA embedding for multi-dataset integration.
  • Execution:
    • Click Submit button to start the analysis (Fig. 1.4a).
  • Visualize and Compare:
    • Display UMAP or t-SNE plots with clustering labels and sample/condition overlays (Fig. 1.4b,d,f).
    • Bar charts (Fig. 1.4c,e,g) and tables show cell counts per cluster and per sample/condition (Fig. 1.4h-j).
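The per-cluster count tables and bar charts summarize a cross-tabulation of cluster labels against sample or condition labels. With a Seurat object this is often written as table(obj$seurat_clusters, obj$orig.ident); the base-R sketch below uses hypothetical label vectors so it runs without Seurat.

```r
# Hypothetical per-cell labels; in practice these come from the object's metadata.
cluster <- factor(c(0, 0, 1, 1, 1, 2, 0, 2))
sample  <- c("S1", "S1", "S1", "S2", "S2", "S2", "S2", "S1")

counts_tab <- table(cluster = cluster, sample = sample)   # cells per cluster x sample
prop_tab   <- prop.table(counts_tab, margin = 2)          # fraction of each sample's cells
round(100 * prop_tab, 1)                                  # percentages per sample

# Stacked bar chart: one bar per sample, segments colored by cluster.
barplot(prop_tab, legend.text = TRUE, ylab = "Fraction of cells")
```

Normalizing within each sample (margin = 2) makes samples with different cell numbers directly comparable.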

1.5. Remove Doublets

  • Doublet Detection:
    • Uses DoubletFinder to identify potential doublets, i.e., cell barcodes that may contain RNA from more than one cell.
    • Start with an estimated doublet rate of 7.5%-10% (0.075 to 0.1) of the total cell count.
  • Execution:
    • Click Detect Doublet button to start the analysis (Fig. 1.5a).
  • Outputs:
    • UMAP or t-SNE Plot: Shows singlet and doublet cells, color-coded by cluster, sample, or group (Fig. 1.5b-e).
    • Summary Table: Counts of singlet and doublet cells (Fig. 1.5f).
  • Remove or keep doublets:
    • Users can choose to Keep or Remove Doublets (Fig. 1.5g). Updated plots and tables reflect the selection (Fig. 1.5h-q).
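The doublet rate is turned into an expected number of doublets, which DoubletFinder takes as its nExp argument. The DoubletFinder README computes it as round(rate × number of cells) and optionally reduces it by the expected fraction of homotypic doublets (two cells from the same cluster, which are hard to detect), estimated as the sum of squared cluster proportions. A base-R sketch of that arithmetic; whether ScRDAVis applies the homotypic adjustment is not stated and is an assumption here, and `expected_doublets` is a hypothetical helper.

```r
# Expected doublets from a doublet rate, following the DoubletFinder README recipe.
expected_doublets <- function(n_cells, doublet_rate = 0.075, clusters = NULL) {
  n_exp <- round(doublet_rate * n_cells)
  if (is.null(clusters)) return(n_exp)
  # Homotypic proportion: chance that both cells of a doublet share a cluster.
  freq <- table(clusters) / length(clusters)
  homotypic <- sum(freq^2)
  round(n_exp * (1 - homotypic))
}

expected_doublets(10000, 0.075)                  # 750
clusters <- rep(c("A", "B"), times = c(7500, 2500))
expected_doublets(10000, 0.075, clusters)        # homotypic = 0.625, so 281
```

Raising the rate from 0.075 to 0.1 raises nExp proportionally, so more cells are labeled as doublets.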

1.6. Marker Identification

  • Identify markers in all clusters (FindAllMarkers):
    • Customizable parameters: (Fig. 1.6a)
      • Minimum cell percentage (min.pct) to specify the minimum fraction of cells in which a gene is expressed (Default: 0.25, Min: 0.01, Max: 1.0).
      • Log fold-change threshold (logfc.threshold) to filter markers based on expression magnitude (Default: 0.25, Min: 0.01, Max: ∞).
      • Statistical test options (test.use), including Wilcoxon rank sum (wilcox), Wilcoxon-Limma hybrid (wilcox_limma), binomial (bimod), ROC, t-test, likelihood ratio test (LR), and MAST.
      • Positive markers only (only.pos), with options for yes or no, to focus on upregulated genes in the target cluster.
      • When SCTransform is used for normalization, the tool runs PrepSCTFindMarkers, which adjusts the SCT assay for accurate differential testing and makes FindMarkers and FindAllMarkers results more reliable.
    • Output: A heatmap of the top 5 genes per cluster (Fig. 1.6b), showing the distinguishing genes for each cluster, and a summary table of markers or expressed genes (Fig. 1.6c).
  • Marker identification in one specific cluster or between two clusters (FindMarkers):
    • Identifies markers for one cluster against another or against all other clusters.
    • Includes all the customizable parameters noted above, enabling targeted cluster comparison with refined criteria.
    • Output: A table format displaying the expressed genes for the specified clusters, ideal for in-depth comparisons.
  • Conserved marker identification for one cluster vs. all others or between two clusters (FindConservedMarkers):
    • Finds markers conserved across groups (e.g., conditions) for a cluster, or conserved markers between two specific clusters.
    • Utilizes the same customizable parameters for consistency across comparisons.
    • Output: A table format with expressed genes, providing insights into markers consistently expressed across groups or clusters.
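To make min.pct, logfc.threshold, and only.pos concrete, here is a simplified base-R sketch of a one-cluster-versus-rest Wilcoxon marker test on log-normalized data. It follows the idea behind FindMarkers with test.use = "wilcox" (pre-filter genes by detection rate and fold change, test the rest, Bonferroni-adjust), but it is not Seurat's implementation: Seurat's fold-change and filtering details vary between versions, and `simple_find_markers` is a hypothetical helper.

```r
# Simplified one-cluster-vs-rest marker test (hypothetical helper, not Seurat's code).
# norm: log-normalized genes x cells matrix; in_cluster: logical, one entry per cell.
simple_find_markers <- function(norm, in_cluster, min_pct = 0.25,
                                logfc_threshold = 0.25, only_pos = FALSE) {
  x1 <- norm[, in_cluster, drop = FALSE]
  x2 <- norm[, !in_cluster, drop = FALSE]
  pct_1 <- rowMeans(x1 > 0)   # fraction of cluster cells expressing each gene
  pct_2 <- rowMeans(x2 > 0)   # fraction of other cells expressing each gene
  # log2 fold change of mean expression on the un-logged scale, pseudocount 1
  log2fc <- log2(rowMeans(expm1(x1)) + 1) - log2(rowMeans(expm1(x2)) + 1)
  fc_ok <- if (only_pos) log2fc >= logfc_threshold else abs(log2fc) >= logfc_threshold
  keep <- pmax(pct_1, pct_2) >= min_pct & fc_ok
  genes <- rownames(norm)[keep]
  p_val <- vapply(genes, function(g) {
    suppressWarnings(wilcox.test(x1[g, ], x2[g, ])$p.value)   # Wilcoxon rank-sum test
  }, numeric(1))
  res <- data.frame(gene = genes, p_val = p_val, avg_log2FC = log2fc[keep],
                    pct.1 = pct_1[keep], pct.2 = pct_2[keep],
                    p_val_adj = pmin(1, p_val * nrow(norm)),   # Bonferroni over all genes
                    row.names = NULL)
  res[order(res$p_val), ]
}

set.seed(42)
norm <- matrix(log1p(rpois(3 * 20, lambda = 1)), nrow = 3,
               dimnames = list(c("GeneA", "GeneB", "GeneC"), paste0("cell", 1:20)))
in_cluster <- rep(c(TRUE, FALSE), each = 10)
norm["GeneA", in_cluster] <- norm["GeneA", in_cluster] + 2   # make GeneA a cluster marker

markers <- simple_find_markers(norm, in_cluster, only_pos = TRUE)
markers
```

Raising min.pct or logfc.threshold shrinks the set of genes that reach the test, which is also why those parameters shorten runtime.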

1.7. Cell Type Prediction

  • ScType:
    • Predefined Tissue Types: Users can select from 15 tissue types, including: Adrenal, Brain, Eye, Heart, Immune, Intestine, Kidney, Liver, Lung, Muscle, Pancreas, Placenta, Spleen, Stomach, Thymus.
    • Tissue Classification: Automatically classifies the cells based on the selected tissue type.
  • SingleR:
    • Reference Datasets: Users can use reference datasets such as: Human Primary Cell Atlas, Blueprint/ENCODE, Mouse RNA-seq, Immunological Genome Project, Database of Immune Cell Expression/eQTLs/Epigenomics, Novershtern Hematopoietic data, Monaco immune data.
    • Prediction: Predicts the cell types based on these well-known reference datasets.
  • GPTCelltype:
    • GPT Models: Utilizes various GPT models, including GPT-5, GPT-5-mini, GPT-5-nano, GPT-4, GPT-4-turbo, GPT-4o-mini, GPT-4o, ChatGPT-4o-latest, and GPT-3.5-turbo.
    • Gene Requirements: Requires a minimum number of top genes for accurate prediction.
    • Availability: Available on the web platform. To use it locally, users must set their own API key by adding Sys.setenv(OPENAI_API_KEY = 'your_openai_API_key') to the global.R file.
  • Own Cell Labels:
    • User-Defined Labels: Users can manually input their own cell type labels for each cluster.
    • Cluster Grouping: If multiple clusters need the same label, users should provide the same label name for those clusters.
  • UMAP/t-SNE Labels:
    • Label display options: Users can choose to show or hide cell type labels in the UMAP or t-SNE plots.
  • Execution:
    • Click Detect cell type button to start the analysis (Fig. 1.7a).
  • Output:
    • Plot: Generates plots of the clusters labeled with the predicted cell types (Fig. 1.7b,c).
    • Summary Table: Provides a summary table with the predicted cell types and associated scores (Fig. 1.7d).
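For the Own Cell Labels option, labeling amounts to a lookup from cluster ID to label, where several clusters may share one label. A base-R sketch with hypothetical cluster IDs and labels; in Seurat the equivalent step is often done with RenameIdents.

```r
# Hypothetical per-cell cluster IDs and a user-supplied cluster -> label table.
clusters <- factor(c("0", "1", "2", "3", "1", "0"))
labels <- c("0" = "T cells", "1" = "B cells", "2" = "T cells", "3" = "Monocytes")

# Every cluster needs a label; clusters 0 and 2 intentionally share one.
missing <- setdiff(levels(clusters), names(labels))
if (length(missing) > 0) stop("No label for cluster(s): ", paste(missing, collapse = ", "))

cell_type <- unname(labels[as.character(clusters)])   # per-cell label
table(cell_type)
```

Because clusters 0 and 2 map to the same label, their cells are merged into one "T cells" group in downstream plots.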

1.8. Cluster-Based Plots

  • Gene Selection:
    • Top Genes: Users can select the top features or genes (from 2 to 10).
    • Custom Genes: Users may also input custom gene names by selecting them from a drop-down menu (list of genes) and entering the desired gene names as a comma-separated list.
  • Plot Types:
    • Multiple visualization formats are available, including Dot Plot, Violin Plot, Ridge Plot, and Feature Plot. For Dot Plot, Violin Plot, and Ridge Plot, users can adjust parameters to visualize the plots for either all Seurat clusters or selected specific clusters.
  • Grouping and Splitting:
    • Group by: Users can organize the data by Seurat clusters or labels generated from previous cell type prediction steps.
    • Split by: If multiple samples are present, plots can be split by condition or sample to compare expression patterns across groups.
  • Execution:
    • Click Generate plots button to start the analysis (Fig. 1.8a).
  • Output:
    • Plot: The user receives one of the chosen plot formats (violin plot, dot plot, feature plot or ridge plot) (Fig. 1.8b-e).
    • Summary Tables: The tool generates tables showing marker gene cell counts and cell proportions, providing an additional layer of quantitative insight (Fig. 1.8f).
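One plausible reading of the counts-and-proportion table is: for each selected gene and each cluster, how many cells have non-zero expression and what fraction of the cluster they represent. A base-R sketch under that interpretation; the definition of an expressing cell (value above 0), the toy values, and the helper `expressing_summary` are assumptions.

```r
# Toy log-normalized expression (genes x cells) and cluster labels.
expr <- rbind(
  CD3E  = c(1.2, 0.0, 2.1, 0.0, 0.0, 0.5),
  MS4A1 = c(0.0, 1.8, 0.0, 2.2, 0.0, 0.0)
)
colnames(expr) <- paste0("cell", 1:6)
cluster <- c("0", "1", "0", "1", "2", "2")

# Hypothetical helper: cells expressing each gene per cluster, and their proportion.
expressing_summary <- function(expr, cluster) {
  rows <- lapply(rownames(expr), function(gene) {
    pos <- expr[gene, ] > 0
    n_pos <- tapply(pos, cluster, sum)      # expressing cells per cluster
    n_all <- tapply(pos, cluster, length)   # all cells per cluster
    data.frame(gene = gene, cluster = names(n_pos),
               n_expressing = as.integer(n_pos), n_cells = as.integer(n_all),
               proportion = as.numeric(n_pos / n_all))
  })
  do.call(rbind, rows)
}

summary_tab <- expressing_summary(expr, cluster)
summary_tab
```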

1.9. Condition-Based Analysis

  • Group Selection:
    • Users can compare gene expression between two conditions by selecting one group per dropdown menu.
  • Customizable Parameters:
    • Minimum Cell Percentage (min.pct): Sets the minimum fraction of cells in which a gene must be expressed (Default: 0.25, Min: 0.01, Max: 1.0).
    • Log Fold-Change Threshold (logfc.threshold): Filters markers by expression magnitude (Default: 0.25, Min: 0.01, Max: ∞).
    • Statistical Tests (test.use): Users can choose from various methods, including Wilcoxon rank sum, Wilcoxon-Limma hybrid, binomial, ROC, t-test, likelihood ratio test, and MAST.
    • Positive Markers Only (only.pos): Option to display only upregulated genes in the target cluster.
  • Visualization Options:
    • Multiple formats are available, including Dot Plot, Violin Plot, Ridge Plot, Feature Plot, and Volcano Plot.
  • Grouping:
    • Group By: Users can group data by Seurat clusters or predicted cell type labels.
    • Number of Features: Allows display of a specific number of up- and down-regulated genes (e.g., 15). Users may also input custom gene names by selecting them from a drop-down menu (list of genes).
  • Execution:
    • Click the Submit button to start the analysis (Fig. 1.9a).
  • Output:
    • Plot: The user receives the chosen plot type for a visual comparison of the two groups (Fig. 1.9b-f).
    • Summary Tables: The table contains the differentially expressed genes between the selected groups (Fig. 1.9g).
    • This setup enables users to conduct detailed comparisons between conditions, facilitating insights into differential gene expression and cellular responses.
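To make the min.pct, logfc.threshold, only.pos and Wilcoxon options concrete, the sketch below filters genes and tests them in the same spirit as Seurat's FindMarkers(). It is a simplified illustration, not Seurat's code: find_markers_sketch() is hypothetical, the fold change is computed as log2 of mean un-logged expression plus one, and the data are simulated.

```r
# Simplified two-group comparison (hypothetical; not Seurat's FindMarkers code).
find_markers_sketch <- function(expr, group, ident.1, ident.2,
                                min.pct = 0.25, logfc.threshold = 0.25,
                                only.pos = FALSE) {
  in1 <- group == ident.1
  in2 <- group == ident.2
  # Fraction of cells in each group with non-zero expression.
  pct.1 <- rowMeans(expr[, in1, drop = FALSE] > 0)
  pct.2 <- rowMeans(expr[, in2, drop = FALSE] > 0)
  # Log2 fold change of mean expression, computed on the un-logged scale.
  mean1 <- rowMeans(expm1(expr[, in1, drop = FALSE]))
  mean2 <- rowMeans(expm1(expr[, in2, drop = FALSE]))
  avg_log2FC <- log2(mean1 + 1) - log2(mean2 + 1)
  # Pre-filters controlled by min.pct, logfc.threshold and only.pos.
  keep <- pmax(pct.1, pct.2) >= min.pct & abs(avg_log2FC) >= logfc.threshold
  if (only.pos) keep <- keep & avg_log2FC > 0
  genes <- rownames(expr)[keep]
  # Wilcoxon rank sum test on the genes that pass the filters.
  p_val <- vapply(genes, function(g)
    wilcox.test(expr[g, in1], expr[g, in2], exact = FALSE)$p.value, numeric(1))
  res <- data.frame(p_val = p_val, avg_log2FC = avg_log2FC[genes],
                    pct.1 = pct.1[genes], pct.2 = pct.2[genes],
                    row.names = genes)
  # Bonferroni correction over all genes in the data set.
  res$p_val_adj <- p.adjust(res$p_val, method = "bonferroni", n = nrow(expr))
  res[order(res$p_val), , drop = FALSE]
}

set.seed(42)
expr <- rbind(
  GENE_A = c(rnorm(30, 2, 0.5), rnorm(30, 0.5, 0.5)),  # higher in "stim"
  GENE_B = c(rnorm(30, 1, 0.5), rnorm(30, 1, 0.5)),    # similar in both groups
  GENE_C = c(rep(0, 28), 1, 1, rep(0, 30))              # expressed in very few cells
)
expr[expr < 0] <- 0
group <- rep(c("stim", "ctrl"), each = 30)
res <- find_markers_sketch(expr, group, ident.1 = "stim", ident.2 = "ctrl")
res
```

As in Seurat, p_val_adj here is a Bonferroni correction over all genes in the data set, so genes removed by the pre-filters still count toward the correction.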

2. Subclustering

In ScRDAVis, users can further explore specific clusters of interest by performing subclustering analysis. This feature allows for a more granular examination of cell populations within one or multiple clusters, based on the user’s selection. The same outputs as in the main analysis are generated for the selected cluster(s) or cell type(s).

2.1. Cluster Selection:

  • Users can choose one or multiple clusters for subclustering.
  • Clusters can be selected based on Seurat clusters or previously predicted annotation labels.
  • Positive selection: Users can enter genes of interest to extract the cells expressing them for reclustering, for example FCN1, or FCN1,PSAP for multiple genes. When specifying multiple genes, separate each gene name with a comma.
  • Negative selection: Users can exclude the cells expressing the given gene(s) and perform the analysis on the remaining cells, for example FCN1, or FCN1,PSAP for multiple genes. When specifying multiple genes, separate each gene name with a comma.
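The two selection modes can be sketched as a filter on the cells of an expression matrix. This assumes, as a hypothetical rule, that a cell is selected when it expresses any of the listed genes (value above 0); select_cells_by_genes() and the toy matrix are illustrative only.

```r
# Hypothetical toy data: 3 genes x 5 cells.
expr <- matrix(c(0, 2, 0, 1, 0,
                 1, 0, 0, 3, 0,
                 0, 0, 5, 0, 1),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("FCN1", "PSAP", "CD3E"), paste0("cell", 1:5)))

select_cells_by_genes <- function(expr, gene_string, mode = c("positive", "negative")) {
  mode <- match.arg(mode)
  genes <- trimws(strsplit(gene_string, ",", fixed = TRUE)[[1]])
  genes <- intersect(genes, rownames(expr))           # ignore names not in the data
  # TRUE for cells with expression > 0 in at least one of the listed genes
  expresses_any <- colSums(expr[genes, , drop = FALSE] > 0) > 0
  if (mode == "positive") colnames(expr)[expresses_any] else colnames(expr)[!expresses_any]
}

select_cells_by_genes(expr, "FCN1,PSAP", "positive")  # "cell1" "cell2" "cell4"
select_cells_by_genes(expr, "FCN1,PSAP", "negative")  # "cell3" "cell5"
```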

2.2. Subclustering Analysis Steps:

  • The subclustering process mirrors the main workflow, with dedicated tabs for each stage, allowing users to perform the following analyses on the selected cluster(s):
    • Cell Stats: Overview of cell metrics within the selected clusters, including minimum gene and cell expression thresholds.
    • Normalization and PCA Analysis: Options to normalize and visualize the PCA results for specific cluster(s); use the same normalization method as in the main analysis.
    • Clustering: Allows users to re-cluster cells within the subclusters, providing insights into finer subpopulations.
    • Marker Identification: Users can identify markers specific to subclusters, with options to customize parameters for marker detection.
    • Cell Type Prediction: Provides options to predict cell types within the selected subclusters using ScType, SingleR, GPTCelltype, or custom labels.
    • Cluster-Based Plots: Users can visualize gene expression within subclusters through Dot, Violin, Ridge, or Feature plots.
    • Condition-Based Analysis: Enables differential expression comparisons within subclusters, providing insights into condition-specific gene expression patterns.

3. Correlation Network Analysis

ScRDAVis includes Cluster-Based Correlation Analysis using the genesorteR package. This feature helps users explore relationships and interactions among genes within specific clusters by calculating pairwise correlations.
  • Prerequisites:
    • Correlation Network Analysis becomes available after completing single or multiple samples analysis or subclustering analysis up to cell type prediction.
    • Users can choose to conduct analysis on: Seurat clusters or predicted cell type labels from single, multiple, or subcluster analyses. 
  • Correlation Methods:
    • Pearson
    • Spearman
    • Kendall
  • Execution:
    • Click the Cluster correlation network button to run the analysis based on selected parameters (Fig. 3a).
  • Output:
    • Correlation Heatmap: Displays the correlation values between genes within clusters in a matrix format (Fig. 3b).
    • Correlation Network Plot: Depicts the relationships between genes as a network, highlighting strongly correlated pairs (Fig. 3c).
    • Summary Table: Contains the complete correlation matrix for detailed analysis (Fig. 3d).
This analysis provides a deeper understanding of gene co-expression and interaction patterns within clusters, aiding in the identification of significant biological relationships.
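The three correlation methods differ in what they measure: Pearson captures linear association, Spearman correlates ranks, and Kendall compares the ordering of every pair of observations. The sketch below applies each with base R's cor() to hypothetical cluster-average expression profiles and keeps strongly correlated pairs as network edges; the 0.7 cutoff and the function name are assumptions, not ScRDAVis or genesorteR settings.

```r
# Hypothetical cluster-average expression profiles: rows = genes, columns = clusters.
avg <- cbind(
  C0 = c(5.0, 1.0, 0.2, 3.0, 0.1),
  C1 = c(4.5, 1.2, 0.3, 2.8, 0.2),
  C2 = c(0.1, 4.0, 3.5, 0.2, 2.9)
)

# Correlate every pair of columns; pairs with |r| >= min_abs_cor become edges.
correlation_network <- function(avg, method = c("pearson", "spearman", "kendall"),
                                min_abs_cor = 0.7) {
  method <- match.arg(method)
  cm <- cor(avg, method = method)
  idx <- which(upper.tri(cm) & abs(cm) >= min_abs_cor, arr.ind = TRUE)
  edges <- data.frame(from = rownames(cm)[idx[, 1]],
                      to = colnames(cm)[idx[, 2]],
                      cor = cm[idx])
  list(matrix = cm, edges = edges)
}

methods <- c("pearson", "spearman", "kendall")
res <- setNames(lapply(methods, function(m) correlation_network(avg, m)), methods)
res$spearman$edges
```

Because C0 and C1 rank the genes identically, Spearman and Kendall both return exactly 1 for that pair, while Pearson is slightly below 1.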

4. GO Term Analysis

ScRDAVis provides integrated Gene Ontology (GO) term analysis using the clusterProfiler package, enabling users to explore biological functions, molecular mechanisms, and cellular components related to gene expression patterns in single, multiple, or subcluster analyses. Here’s how users can conduct GO analysis:
  • Prerequisites:
    • GO analysis becomes available after completing single or multiple samples analysis or subclustering analysis up to cell type prediction.
  • Input Options:
    Users can choose to conduct GO analysis on:
    • Seurat clusters or predicted cell type labels from single, multiple, or subcluster analyses.
    • Users can analyze one or multiple clusters at a time; only significant marker genes are used, based on an adjustable cutoff (p_val_adj < 0.05).
    • A custom list of genes: Users can manually enter gene names (comma-separated) to investigate GO terms for genes of specific interest.
  • Organisms Supported for GO Term Mapping:
    ScRDAVis supports GO analysis for five organisms, mapping gene IDs to gene symbols:
    • Human: org.Hs.eg.db
    • Mouse: org.Mm.eg.db
    • Rat: org.Rn.eg.db
    • Pig: org.Ss.eg.db
    • Rhesus: org.Mmu.eg.db
  • GO Term Analysis Parameters:
    Ontology Method: Users can choose to focus on specific biological aspects or all three:
    • Biological Process (BP)
    • Molecular Function (MF)
    • Cellular Component (CC)
    • All: To analyze across all three categories.
  • Adjustable Parameters:
    • pAdjustMethod: Select the method to adjust for multiple testing (Default: BH).
    • pvalueCutoff: Set a cutoff for p-values (Default: 0.05, Min: 0, Max: 1).
    • qvalueCutoff: Define a q-value threshold for significance (Default: 0.2, Min: 0, Max: 1).
    • Minimum Size of Genes: Minimum number of genes required in a GO term (Default: 10, Min: 1, Max: 500).
    • Maximum Size of Genes: Maximum number of genes in a GO term (Default: 500, Min: 10, Max: 5000).
    • Plot Type: Choose a visualization format (Dot Plot, Bar Plot, Network Plot, or UpSet Plot).
    • Number of Categories to Plot: Select the number of categories to display (Default: 10, Min: 1, Max: 50).
  • Execution:
    • Click the GO Term button to run the analysis based on selected parameters (Fig. 4a).
  • Output:
      The GO Term analysis provides:
    • Plots: Dot plot, Bar plot, UpSet plot and Network plot for the selected ontology categories (Fig. 4b-e).
    • Summary Table: A downloadable table summarizing the GO terms, adjusted p-values, and other relevant metrics, allowing users to interpret and visualize biological insights (Fig. 4f).
This GO term analysis feature in ScRDAVis provides users with an accessible, visually informative, and comprehensive view of gene functionality across clusters and conditions, enabling enhanced biological interpretation of scRNA-seq data.
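GO enrichment of a gene list is an over-representation test: for each GO term, it asks whether the selected genes overlap the term more than expected by chance. clusterProfiler uses a hypergeometric test for this; the base-R sketch below shows the idea together with the minimum/maximum gene-set size filter and Benjamini-Hochberg (BH) adjustment. ora_sketch(), the gene sets and the gene universe are hypothetical, and the q-value filter is omitted.

```r
ora_sketch <- function(genes, universe, gene_sets, minGSSize = 10, maxGSSize = 500,
                       pvalueCutoff = 0.05, pAdjustMethod = "BH") {
  genes <- intersect(genes, universe)
  gene_sets <- lapply(gene_sets, intersect, universe)
  sizes <- lengths(gene_sets)
  # Keep only terms whose size lies within [minGSSize, maxGSSize].
  gene_sets <- gene_sets[sizes >= minGSSize & sizes <= maxGSSize]
  if (length(gene_sets) == 0) return(data.frame())
  N <- length(universe)
  n <- length(genes)
  res <- do.call(rbind, lapply(names(gene_sets), function(id) {
    M <- length(gene_sets[[id]])
    k <- length(intersect(genes, gene_sets[[id]]))
    # P(X >= k) for X ~ Hypergeometric(M term genes, N - M others, n drawn)
    p <- phyper(k - 1, M, N - M, n, lower.tail = FALSE)
    data.frame(ID = id, Count = k, SetSize = M, pvalue = p, stringsAsFactors = FALSE)
  }))
  res$p.adjust <- p.adjust(res$pvalue, method = pAdjustMethod)
  res <- res[res$p.adjust <= pvalueCutoff, , drop = FALSE]
  res[order(res$p.adjust), , drop = FALSE]
}

universe <- paste0("G", 1:100)
gene_sets <- list(
  TERM_A = paste0("G", 1:20),
  TERM_B = paste0("G", 50:70),
  TERM_C = paste0("G", 90:93)   # only 4 genes: removed by minGSSize = 10
)
genes <- c(paste0("G", 1:10), "G55")
res <- ora_sketch(genes, universe, gene_sets)
res
```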

5. Pathway Analysis

ScRDAVis offers pathway analysis through KEGG and Reactome databases using the clusterProfiler and ReactomePA packages. Users can gain insights into biological pathways associated with specific gene expression profiles from single, multiple, or subcluster analyses.
  • Prerequisites:
    • Pathway analysis becomes available after completing single or multiple samples analysis or subclustering analysis up to cell type prediction.
  • Input Options:
    Pathway analysis can be performed on:
    • Seurat clusters or predicted cell type labels from single, multiple, or subcluster analyses.
    • One or multiple clusters simultaneously, with results filtered by an adjusted p-value (p_val_adj < 0.05).
    • A custom list of genes: Users can input specific gene names (comma-separated) to focus on pathways for genes of interest.
  • Organisms Supported for Pathway Mapping:
    ScRDAVis enables pathway mapping for multiple organisms:
    • KEGG Pathways: Supports human (org.Hs.eg.db), mouse (org.Mm.eg.db), and rat (org.Rn.eg.db) for mapping gene IDs to symbols.
    • Reactome Pathways: Available for human, mouse, and rat.
  • Pathway Analysis Parameters:
    • pAdjustMethod: Choose a method for multiple testing correction (Default: BH).
    • pvalueCutoff: Set a threshold for p-values (between 0 and 1).
    • qvalueCutoff: Define a q-value cutoff for pathway significance.
    • Minimum Size of Genes: Minimum gene count per pathway (Default: 10, Min: 1, Max: 500).
    • Maximum Size of Genes: Maximum gene count per pathway (Default: 500, Min: 10, Max: 5000).
    • Plot Type: Choose a visualization format (Dot Plot, Bar Plot, Network Plot, or UpSet Plot).
    • Number of Pathways to Plot: Select the number of pathways to display (1 to 50).
  • Execution:
    • Click the Pathway Analysis button to run the analysis with the selected parameters (Fig. 5a).
  • Output:
    Pathway analysis results include:
    • Visualizations: Dot plot, Bar plot, UpSet plot and Network plot, showcasing significant pathways (Fig. 5b-e).
    • Summary Table: A downloadable table with pathway details, adjusted p-values, and other metrics for further exploration and interpretation (Fig. 5f).
The pathway analysis functionality in ScRDAVis helps users understand the biological processes and signaling pathways linked to gene expression profiles across clusters and conditions, providing a deeper functional understanding of their scRNA-seq data.
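Before enrichment, the input options above reduce to a gene list. The sketch below assumes a marker table with the gene, cluster and p_val_adj columns produced by Seurat's FindAllMarkers(); genes_for_enrichment() and the toy table are hypothetical.

```r
# Hypothetical marker table with FindAllMarkers()-style columns.
markers <- data.frame(
  gene       = c("FCN1", "S100A8", "LYZ", "MS4A1", "CD79A", "CD3E"),
  cluster    = factor(c("0", "0", "0", "1", "1", "2")),
  avg_log2FC = c(2.1, 1.8, 0.4, 2.5, 2.2, 1.1),
  p_val_adj  = c(1e-20, 1e-10, 0.2, 1e-15, 1e-8, 0.01),
  stringsAsFactors = FALSE
)

genes_for_enrichment <- function(markers, clusters = NULL, custom = NULL,
                                 padj_cutoff = 0.05) {
  if (!is.null(custom)) {
    # Custom list: comma-separated gene names typed by the user.
    genes <- trimws(strsplit(custom, ",", fixed = TRUE)[[1]])
    return(unique(genes[nzchar(genes)]))
  }
  # Significant markers only, optionally restricted to the selected clusters.
  keep <- markers$p_val_adj < padj_cutoff
  if (!is.null(clusters)) keep <- keep & markers$cluster %in% clusters
  unique(as.character(markers$gene[keep]))
}

genes_for_enrichment(markers, clusters = c("0", "1"))  # "FCN1" "S100A8" "MS4A1" "CD79A"
genes_for_enrichment(markers, custom = "FCN1, PSAP")   # "FCN1" "PSAP"
```

In a full pipeline, the symbols would typically be converted to Entrez IDs, for example with clusterProfiler's bitr(), before calling enrichKEGG() or ReactomePA's enrichPathway(); that step is not shown because it requires the Bioconductor annotation packages.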

6. GSEA Analysis

The Gene Set Enrichment Analysis (GSEA) feature in ScRDAVis leverages the fgsea package to identify enriched pathways using ranked gene lists, such as those generated from differential expression analysis. This allows users to assess pathway-level expression changes and gain insights into functional changes across clusters or conditions.
  • Prerequisites:
    • GSEA analysis can be conducted following single or multiple samples analysis or subclustering analysis up to cell type prediction.
  • Input Options:
    GSEA analysis can be performed on:
    • Seurat clusters or predicted cell type labels derived from single, multiple, or subcluster analyses.
    • Single or multiple clusters simultaneously, with results filtered by an adjusted p-value (p_val_adj < 0.05).
  • Organisms and Gene Sets Supported:
    • Organisms: Human and mouse gene ID mapping.
    • Gene Set Categories: Gene sets from the Molecular Signatures Database (MSigDB) are obtained with the msigdbr package in a format compatible with fgsea. Available categories include:
      • Hallmark Gene Sets (H)
      • Positional Gene Sets (C1)
      • Curated Gene Sets (C2)
      • Regulatory Target Gene Sets (C3)
      • Computational Gene Sets (C4)
      • Ontology Gene Sets (C5)
      • Oncogenic Signature Gene Sets (C6)
      • Immunologic Signature Gene Sets (C7)
      • Cell Type Signature Gene Sets (C8)
  • GSEA Analysis Parameters:
    • scoreType: Define the scoring method for pathway enrichment (Default: std).
    • Minimal Size of Genes: Minimum number of genes in a gene set (Default: 15, Min: 5, Max: 500).
    • Maximal Size of Genes: Maximum number of genes in a gene set (Default: 50, Min: 15, Max: 5000).
    • Number of Permutations: Control the precision of p-value calculations (Default: 100, Min: 10, Max: 10000).
    • Plot Type: Choose visualization format (GSEA Plot, PlotGseaTable, Bar Plot).
    • Number of Significant Pathways to Plot: Select the number of pathways to display (Default: 10, Min: 1, Max: 50).
  • Execution:
    • Click the GSEA Analysis button to run the analysis with the selected parameters (Fig. 6a).
  • Output:
    The GSEA analysis provides:
    • Visualizations:
      • GSEA Plot: Displays the enrichment score curve (Fig. 6b).
      • PlotGseaTable: Shows enriched pathways and their enrichment scores (Fig. 6c).
      • Bar Plot: Highlights top significant pathways (Fig. 6d).
    • Summary Table: A downloadable table of enriched pathways, adjusted p-values, and scores. If the user selects the top 10 significant pathways, the tool displays the top 5 upregulated and top 5 downregulated pathways (Fig. 6e).
GSEA analysis in ScRDAVis offers a powerful method for understanding pathway-level dynamics, supporting biological interpretation of scRNA-seq data through visual and quantitative assessments of enriched pathways.
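GSEA scores a gene set by walking down the ranked gene list, stepping up at each gene in the set and down otherwise; the enrichment score (ES) is the largest deviation of this running sum from zero. The sketch below computes the weighted ES in base R on a hypothetical ranking. fgsea computes the same kind of statistic efficiently and additionally estimates significance and normalized scores (NES) by permutation; enrichment_score() is illustrative only.

```r
# Running-sum enrichment score (weighted Kolmogorov-Smirnov-like statistic).
enrichment_score <- function(ranked, gene_set, p = 1) {
  ranked <- sort(ranked, decreasing = TRUE)      # most up-regulated genes first
  hits <- names(ranked) %in% gene_set
  n_miss <- length(ranked) - sum(hits)
  w <- abs(ranked)^p
  # Hits step up by their share of the total hit weight; misses step down equally.
  step <- ifelse(hits, w / sum(w[hits]), -1 / n_miss)
  running <- cumsum(step)
  unname(running[which.max(abs(running))])      # maximum deviation from zero
}

# Hypothetical ranking statistic, e.g. avg_log2FC from a differential expression test.
ranked <- setNames(c(3, 2.5, 2, 1, 0.5, -0.5, -1, -2, -2.5, -3), paste0("G", 1:10))
es_up <- enrichment_score(ranked, c("G1", "G2", "G3"))     # 1: set at the top
es_down <- enrichment_score(ranked, c("G8", "G9", "G10"))  # -1: set at the bottom
```

With scoreType = "std", fgsea tests both directions; "pos" and "neg" restrict the test to positive or negative enrichment scores.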

7. Cell-Cell Communication Analysis

ScRDAVis integrates CellChat to enable users to analyze cell-cell communication within single or multiple samples, as well as for subclusters. This analysis identifies potential ligand-receptor interactions, allowing users to explore how different cell types or clusters communicate based on gene expression patterns.
  • Input Options:
    • Source of Input: Users can analyze cell-cell communication using Seurat clusters or predicted cell type labels generated from single, multiple, or subcluster analysis.
    • Organisms Supported: Human and mouse datasets are available for ligand-receptor interaction mapping.

7.1. Parameters for Cell-Cell Communication

  • Identify Over-Expressed Genes:
    • Threshold of Cell Expression Percentage: Minimum percentage of cells expressing the genes (Default: 0, Min: 0, Max: 100).
    • Log Fold Change Threshold: Minimum log fold-change required for genes to be considered over-expressed (Default: 0, Min: 0, Max: 10).
    • p-Value Threshold: Statistical significance threshold (Default: 0.05, Min: 0.0001, Max: 1).
  • Compute Communication Probability:
    • Expression Method: Choose how to compute the average expression per cell group (options: triMean, truncatedMean, thresholdedMean, median).
  • Filter Communication:
    • Minimum Cell Requirement: Minimum number of cells needed in each cell group to analyze cell-cell communication (Default: 0.05, Min: 0.0001, Max: 1).
  • Communication Pattern Identification:
    • Pattern k-Value: Defines the number of communication patterns to identify (Default: 2, Min: 2, Max: 20).
  • Label Option:
    • Show or hide labels in plots.
  • Execution:
    • Click the Cell-Cell communication analysis button to start the analysis (Fig. 7.1a).
  • Output for Cell-Cell Communication Analysis:
    The analysis generates the following visual outputs:
    • Interaction Plots:
      • Counts and Weights/Strength: Displays the frequency and intensity of interactions among cell groups (Fig. 7.1b,c).
      • Interaction Heatmap: Shows interaction strengths across all clusters or cell types (Fig. 7.1d).
      • Incoming and Outgoing Signaling Patterns: Visualizes communication patterns for target and secreting cells (Fig. 7.1e,f).
    • Interaction Table: Includes source and target cell types, ligand-receptor pairs, and interaction scores (Fig. 7.1g).
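The Expression Method choice matters because CellChat averages each gene within a cell group before scoring ligand-receptor pairs. The sketch below reproduces two of the options in base R: the trimean, (Q1 + 2 * median + Q3) / 4, which is CellChat's default triMean, and a 10% truncated mean. The function names and data are hypothetical.

```r
# Tukey's trimean: (Q1 + 2 * median + Q3) / 4.
tri_mean <- function(x) mean(quantile(x, c(0.25, 0.5, 0.5, 0.75), names = FALSE))

# 10% truncated mean: drop the lowest and highest 10% of values before averaging.
truncated_mean <- function(x, trim = 0.1) mean(x, trim = trim)

# Average one gene's expression within each cell group with the chosen function.
group_average <- function(x, groups, FUN = tri_mean) tapply(x, groups, FUN)

# Hypothetical expression of one ligand gene in two cell groups.
x <- c(1, 2, 3, 4, 5,            # group A: expressed in every cell
       rep(0, 8), 5, 5)          # group B: expressed in 2 of 10 cells
groups <- rep(c("A", "B"), c(5, 10))

group_average(x, groups, tri_mean)        # A = 3, B = 0
group_average(x, groups, truncated_mean)  # A = 3, B = 0.625
group_average(x, groups, mean)            # A = 3, B = 1
```

The trimean returns 0 for group B because most of its cells do not express the gene, whereas the plain mean does not; this is why triMean tends to report fewer but stronger interactions.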

7.2. Analyzing Specific Signaling Pathways

For a more focused analysis, users can select a specific signaling pathway from a drop-down menu, enabling detailed visualization of the chosen pathway (Fig. 7.2a).
  • Outputs for Specific Signaling Pathway:
    • Circle Plot: Visualizes interactions among cell groups by counts (Fig. 7.2b).
    • Chord Plot: Depicts connections between cell types via ligand-receptor pairs (Fig. 7.2c).
    • Interaction Heatmap: Interaction strengths among clusters for the specific pathway (Fig. 7.2d).
    • Bubble Plot and Bar Plot: Display interaction intensity for the selected pathway (Fig. 7.2e).
    • Hierarchy Plot: Shows the hierarchical organization of cell types and their interactions (Fig. 7.2f).
    • Bar Plot: Shows the contribution of each ligand-receptor pair to the overall signaling of the selected pathway (Fig. 7.2g).
    • Violin Plot: Shows expression of pathway-associated genes (Fig. 7.2h).
    • Signaling Pathway Table: Contains source, target, ligand, receptor, and interaction details for the specific pathway (Fig. 7.2i).
This suite of tools and visualizations enables detailed exploration of cell communication, allowing users to interpret intercellular signaling dynamics in scRNA-seq datasets with biological relevance.

8. Trajectory and Pseudotime Analysis

ScRDAVis integrates Monocle3 for trajectory and pseudotime analysis, allowing users to study the dynamic progression of cells over pseudotime and identify genes with functional changes along this trajectory.
  • Preparing for Trajectory and Pseudotime Analysis:
    • Prerequisites: Users must complete analysis up to the cell type prediction step in either single or multiple sample analysis, or subclustering analysis.
    • Input Format: The tool automatically converts the Seurat object to Monocle3 format, and users can choose between Seurat clusters or predicted cell type labels as input.
    • UMAP Requirement: UMAP should be selected as the dimensionality reduction in the clustering step, because Monocle3 learns trajectories on the UMAP embedding.

8.1. Parameters for Learning Trajectory

  • Partitioning Options:
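    • use_partition: Whether to learn a separate trajectory for each partition (group of disconnected cells) rather than a single trajectory across all cells.
    • close_loop: Whether to perform an additional loop-closing step so that the trajectory can contain closed loops.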
    • label_groups_by_cluster: Labels cell groups by cluster.
    • label_branch_points, label_roots, label_leaves: Allows labeling of key points on the trajectory (branches, roots, leaves).
  • Execution:
    • Once parameters are set, users can click the Learn Trajectory button to generate the trajectory plot (Fig. 8a).
  • Output:
    • Trajectory Plot: Displays cell progression in trajectory space, providing insight into the cellular development path (Fig. 8b).

8.2. Pseudotime Ordering of Cells

  • Parameters:
    • Root Cluster Selection: Users must select one cluster to serve as the root cluster, marking the starting point of pseudotime.
    • Labeling Options: Parameters include options to label groups by clusters, as well as marking branch points, roots, and leaves.
  • Execution:
    • Click the Submit button to start the analysis (Fig. 8c).
  • Output:
    • Pseudotime Plot: Cells are arranged by pseudotime, showing the developmental trajectory (Fig. 8d).
    • Bar Chart: Cells are ordered based on both Seurat clusters and Monocle3 pseudotime (Fig. 8e).
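Pseudotime is the distance a cell has travelled from the root along the trajectory. Monocle3 measures it along the learned principal graph; the base-R sketch below substitutes a simpler k-nearest-neighbor graph over cells and runs Dijkstra's shortest-path algorithm from every cell in the chosen root cluster. knn_graph(), graph_pseudotime() and the toy coordinates are hypothetical.

```r
# Symmetric k-nearest-neighbor graph; entries are edge lengths, Inf = no edge.
knn_graph <- function(coords, k = 2) {
  d <- as.matrix(dist(coords))
  n <- nrow(d)
  w <- matrix(Inf, n, n)
  for (i in seq_len(n)) {
    nn <- setdiff(order(d[i, ]), i)[seq_len(k)]   # k closest cells other than i
    w[i, nn] <- d[i, nn]
    w[nn, i] <- d[i, nn]                           # make the graph undirected
  }
  diag(w) <- 0
  w
}

# Dijkstra's algorithm from all root cells at once: pseudotime = graph distance.
graph_pseudotime <- function(w, root) {
  n <- nrow(w)
  dist_to <- rep(Inf, n)
  dist_to[root] <- 0
  done <- rep(FALSE, n)
  for (step in seq_len(n)) {
    open <- which(!done)
    u <- open[which.min(dist_to[open])]
    if (is.infinite(dist_to[u])) break             # remaining cells are unreachable
    done[u] <- TRUE
    dist_to <- pmin(dist_to, dist_to[u] + w[u, ])  # relax edges leaving u
  }
  dist_to
}

# Hypothetical cells spread along a line, with "early" chosen as the root cluster.
coords <- cbind(x = 0:9, y = 0)
cluster <- rep(c("early", "mid", "late"), c(3, 4, 3))
w <- knn_graph(coords, k = 2)
pt <- graph_pseudotime(w, root = which(cluster == "early"))
pt  # 0 0 0 1 2 3 4 5 6 7
```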

8.3. Identifying Genes with Functional Changes in Pseudotime

To explore gene expression dynamics along the pseudotime trajectory, users can analyze gene expression changes:
  • Parameters:
    • Neighbor Graph Selection: Users can select between Principal Graph or K-Nearest Neighbor (KNN) to model gene expression changes.
  • Execution:
    • Click the Find Genes button to identify genes whose expression changes along pseudotime (Fig. 8f).
  • Output:
    • Pseudotime Plot of Cells: Visual representation of cells in pseudotime with associated gene expression (Fig. 8g).
    • Summary Table: Lists genes whose expression changes along pseudotime (Fig. 8h).
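Monocle3's graph_test() ranks genes by Moran's I, which measures whether cells that are neighbors on the chosen graph (KNN or principal graph) have similar expression. The sketch below computes the statistic on a hypothetical kNN graph; it omits the significance test, and the function names are illustrative.

```r
# Binary, symmetric k-nearest-neighbor weight matrix.
knn_weights <- function(coords, k = 2) {
  d <- as.matrix(dist(coords))
  n <- nrow(d)
  W <- matrix(0, n, n)
  for (i in seq_len(n)) {
    W[i, setdiff(order(d[i, ]), i)[seq_len(k)]] <- 1
  }
  pmax(W, t(W))
}

# Moran's I: near 1 when neighboring cells have similar values, near 0 for
# randomly placed values, negative when neighbors tend to differ.
morans_i <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * sum(W * outer(z, z)) / sum(z^2)
}

coords <- cbind(x = 0:19, y = 0)         # hypothetical cells ordered along a trajectory
W <- knn_weights(coords, k = 2)
smooth_gene <- 0:19                       # rises steadily along the trajectory
alternating_gene <- rep(c(1, 0), 10)      # flips between neighboring cells
morans_i(smooth_gene, W)       # about 1.01
morans_i(alternating_gene, W)  # about -0.81
```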

8.4. Plotting Gene Expression in Pseudotime

Users can visualize specific genes to observe their expression patterns over pseudotime (Fig. 8i):
  • Gene Selection:
    • Top Genes: By default, the tool plots the top 5 genes with dynamic changes; this number is adjustable from 1 to 10.
    • Custom Genes: Users can specify a custom list of genes (comma-separated) to plot in pseudotime.
  • Output:
    • Creates a feature plot to display gene expression across cells in pseudotime (Fig. 8j).
This functionality helps users analyze and visualize gene dynamics, offering insights into cellular progression and identifying key genes in developmental pathways.

9. Co-Expression and TF Analysis

9.1. Co-Expression Network Analysis

ScRDAVis incorporates co-expression network analysis for scRNA-seq data using the hdWGCNA package. This feature enables users to identify gene modules and their relationships in Seurat clusters or predicted cell type labels.
  • Prerequisites:
    • Co-expression network analysis becomes available after completing single or multiple samples analysis or subclustering analysis up to cell type prediction.
    • Users can analyze one cluster at a time.
  • Metacell Construction:
    Aggregates small groups of similar cells from the same biological sample. Uses the k-Nearest Neighbors (KNN) algorithm to group similar cells and compute a metacell gene expression matrix.
    • Parameters:
      • k: Number of nearest neighbors for aggregation (Default: 10, Min: 1, Max: 100).
      • min_cells: Minimum number of cells in a group to construct metacells (Default: 10, Min: 5, Max: 100).
      • max_shared: Maximum number of cells shared across two metacells (Default: 15, Min: 1, Max: 100).
      • target_metacells: Maximum number of target metacells to construct (Default: 1000, Min: 50, Max: 5000).
  • Co-Expression Network Construction:
    Builds networks with customizable parameters:
    • softpower: Soft-thresholding power applied to gene-gene correlations; it is chosen so that the resulting network approximates a scale-free topology.
    • networkType: Options include signed, unsigned, or signed hybrid.
  • Module Eigengenes and Connectivity:
    • Scales data using selectable models: linear, poisson, or negbinom.
    • Allows Harmony batch correction for harmonized module eigengenes (hMEs), selectable by the user.
  • Hub Gene Extraction:
    • Extracts the top N hub genes for selected modules, aiding in the identification of key regulators (Default: 5, Min: 1, Max: 50).
  • Execution:
    • Click the WGCNA Analysis button to start the co-expression network analysis (Fig. 9a).
  • Outputs:
    Some plots are not available as image files, so they are provided as PDF files instead.
    • Soft Power Plots: Visualizes the selection of the optimal soft power parameter for network construction (Fig. 9.1b).
    • Co-Expression Network Visualization: Displays modules with distinct colors representing gene clusters (Fig. 9.1c).
    • Ranked Genes in Modules: Provides a list of genes ranked by module membership (kME) (Fig. 9.1d).
    • Feature Plots: Highlights the expression of modules or specific genes (Fig. 9.1e).
    • Module Relationships Plots: Correlation between modules based on harmonized module eigengenes (hMEs) (Fig. 9.1f).
    • Seurat DotPlot with Modules: Displays module-specific gene expression across clusters (Fig. 9.1g).
    • Individual Module Network Plots: Visualizes the gene network for specific modules (Fig. 9.1h).
    • Module UMAP Plots: Maps modules onto UMAP visualizations for spatial context (Fig. 9.1i).
    • Summary Tables:
      • Soft Power Table: Lists optimal soft power values (Fig. 9.1j).
      • Module Assignment Table: Details gene-module relationships with colors (Fig. 9.1k).
      • Hub Genes Table: Identifies top hub genes per module (Fig. 9.1l).
This functionality provides a robust framework for uncovering intricate co-expression patterns and identifying key drivers in single-cell datasets.
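The softpower and networkType options combine in the adjacency between two genes. The sketch below uses the standard WGCNA definitions, which hdWGCNA builds on: unsigned |r|^β, signed ((1 + r) / 2)^β, and signed hybrid r^β for positive r and 0 otherwise. hdWGCNA goes further (topological overlap, module detection); adjacency_from_cor() and the toy correlations are hypothetical.

```r
# WGCNA-style adjacency from a gene-gene correlation matrix r and soft power beta.
adjacency_from_cor <- function(cor_mat, softpower,
                               networkType = c("signed", "unsigned", "signed hybrid")) {
  networkType <- match.arg(networkType)
  switch(networkType,
    "unsigned"      = abs(cor_mat)^softpower,         # sign of r is ignored
    "signed"        = ((1 + cor_mat) / 2)^softpower,  # r = -1 maps to 0, r = 1 to 1
    "signed hybrid" = pmax(cor_mat, 0)^softpower)     # negative r set to 0
}

genes <- c("g1", "g2", "g3")
cor_mat <- matrix(c( 1.0,  0.8, -0.6,
                     0.8,  1.0, -0.2,
                    -0.6, -0.2,  1.0), nrow = 3, dimnames = list(genes, genes))

adjacency_from_cor(cor_mat, 2, "unsigned")["g1", "g3"]       # 0.36
adjacency_from_cor(cor_mat, 2, "signed")["g1", "g3"]         # 0.04
adjacency_from_cor(cor_mat, 2, "signed hybrid")["g1", "g3"]  # 0

# Raising the power shrinks weak correlations much faster than strong ones.
sapply(c(1, 6, 12), function(b) adjacency_from_cor(cor_mat, b, "unsigned")["g2", "g3"])
```

This is why the soft power plots are used to pick a power high enough to suppress weak, noisy correlations while keeping the network connected.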

9.2. Transcription Factor Regulatory Network Analysis

Transcription Factor (TF) Regulatory Network Analysis in ScRDAVis employs the hdWGCNA package to construct and analyze TF regulatory networks based on scRNA-seq data. This feature allows users to identify gene modules and investigate TF-mediated regulation within clusters or predicted cell type labels.
  • Prerequisites:
    • Complete single or multiple sample analysis or subclustering analysis, including cell type prediction.
    • Analysis is performed one cluster at a time.
  • TF Regulatory Network Construction:
    • TF Binding Motif Information:
      • Human: EnsDb.Hsapiens.v86, BSgenome.Hsapiens.UCSC.hg38.
      • Mouse: EnsDb.Mmusculus.v79, BSgenome.Mmusculus.UCSC.mm10.
      • Motifs from the JASPAR 2020 database for multiple species.
    • Machine Learning Model:
      • XGBoost: Used to model TF regulation for each gene, with the following parameters:
        • max_depth: Maximum depth of a tree (Default: 1, Min: 1, Max: 10).
        • eta: Step size shrinkage used in each update to prevent overfitting (Default: 0.1, Min: 0.01, Max: 1).
        • alpha: L1 regularization term on weights (Default: 0.5, Min: 0, Max: 1).
    • TF Regulon Strategy:
      • Strategy A (default): Selects the top TFs for each gene, controlled by:
        • reg_thresh: Threshold for the regulatory score (Default: 0.01, Min: 0, Max: 1).
        • n_tfs: Number of top TFs to keep for each gene (Default: 10, Min: 1, Max: 50).
    • Regulon Expression Signatures:
      • Positive correlation (cor_thresh): Minimum TF-gene correlation for a gene to be included in the positive regulon score (Default: 0.05, Min: 0, Max: 1).
      • Negative correlation (cor_thresh): TF-gene correlation below which a gene is included in the negative regulon score (Default: -0.05, Min: -1, Max: 0).
  • Execution:
    • Click the Transcription factor analysis button to start the analysis (Fig. 9.2.1a).
  • Output and Visualization:
    • Module Regulatory Network Plots: Positive, negative, and combined regulatory network plots that visualize TF-to-target relationships categorized by regulatory effect (Fig. 9.2.1b-e).
    • Regulated Scores Table: Comprehensive list of TFs and their downstream targets (Fig. 9.2.1f).
  • TF-Specific Visualizations:
    These plots help users unravel the regulatory mechanisms governing gene expression in cellular contexts, identify key transcription factors and their target genes for hypothesis generation and validation, and explore positive and negative regulatory effects within gene modules.
    • Select a TF from the drop-down menu to generate TF-specific plots (Fig. 9.2.2a).
  • Outputs:
    • UMAP Plots: Spatial distribution of the TF (Fig. 9.2.2b).
    • Bar Plots: Contribution of the TF across modules (Fig. 9.2.2c).
    • Network Plots: Positive, negative, and combined networks, with primary, secondary and tertiary targets (Fig. 9.2.2d-f).
This functionality provides a comprehensive view of transcriptional regulation in scRNA-seq data, enabling detailed exploration of TF-driven cellular processes.
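Strategy A and the correlation thresholds can be sketched as two filtering steps on a table of TF-gene importance scores (for example, XGBoost gain) and TF-gene correlations. The column names (tf, gene, gain, cor), the toy values and both functions are hypothetical; hdWGCNA's own implementation differs in detail.

```r
# Hypothetical TF-gene table: importance (gain) and TF-gene expression correlation.
tf_net <- data.frame(
  tf   = c("SPI1", "CEBPA", "IRF8", "SPI1", "CEBPA", "IRF8", "KLF4"),
  gene = c("FCN1", "FCN1", "FCN1", "LYZ", "LYZ", "LYZ", "LYZ"),
  gain = c(0.40, 0.20, 0.005, 0.30, 0.25, 0.10, 0.02),
  cor  = c(0.60, 0.10, -0.30, 0.50, -0.40, 0.02, 0.30),
  stringsAsFactors = FALSE
)

# Strategy A (sketch): drop weak links, then keep the top n_tfs TFs per gene by gain.
assign_regulons_A <- function(tf_net, reg_thresh = 0.01, n_tfs = 10) {
  kept <- tf_net[tf_net$gain > reg_thresh, ]
  kept <- kept[order(kept$gene, -kept$gain), ]
  do.call(rbind, lapply(split(kept, kept$gene), head, n = n_tfs))
}

# Split regulons by correlation sign: cor_thresh >= 0 keeps positive targets,
# cor_thresh < 0 keeps negative targets. Returns target genes per TF.
regulon_targets <- function(regs, cor_thresh = 0.05) {
  sel <- if (cor_thresh >= 0) regs$cor > cor_thresh else regs$cor < cor_thresh
  split(regs$gene[sel], regs$tf[sel])
}

regs <- assign_regulons_A(tf_net, n_tfs = 2)
regulon_targets(regs, 0.05)    # CEBPA: FCN1; SPI1: FCN1, LYZ
regulon_targets(regs, -0.05)   # CEBPA: LYZ
```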

R Session

