| Title: | Tidy, ggplot2-Native Visualization for Genomic Variants |
|---|---|
| Description: | A simple, opinionated toolkit for visualizing genomic variant data using a ggplot2-native grammar. Accepts VCF files or plain data frames and produces publication-ready lollipop plots, consequence summaries, mutational spectrum charts, and cohort-level comparisons with minimal code. Designed for both wet-lab biologists and experienced bioinformaticians. |
| Authors: | Joash Joshua Ayo [aut, cre] (ORCID: <https://orcid.org/0009-0007-1642-0172>) |
| Maintainer: | Joash Joshua Ayo <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-05-31 07:36:00 UTC |
| Source: | https://github.com/josh45-source/ggvariant |
Useful links:
Report bugs at https://github.com/josh45-source/ggvariant/issues
If you already have variant data in a data.frame (e.g. exported from
Excel, a database, or another tool), use this function to convert it to a
gvf object for use with ggvariant plotting functions.

    coerce_variants(
      x,
      chrom = "chrom",
      pos = "pos",
      ref = "ref",
      alt = "alt",
      consequence = "consequence",
      gene = "gene",
      sample = "sample"
    )


| Argument | Description |
|---|---|
| x | A data.frame. |
| chrom | Column name containing chromosome (default `"chrom"`). |
| pos | Column name containing position (default `"pos"`). |
| ref | Column name containing reference allele (default `"ref"`). |
| alt | Column name containing alternate allele (default `"alt"`). |
| consequence | Column name containing variant consequence annotation (default `"consequence"`). |
| gene | Column name containing gene symbol (default `"gene"`). |
| sample | Column name containing sample identifier (default `"sample"`). |

A gvf object.

    df <- data.frame(
      chromosome    = c("chr1", "chr1", "chr7"),
      position      = c(100200, 100350, 55249071),
      ref_allele    = c("A", "G", "C"),
      alt_allele    = c("T", "A", "T"),
      variant_class = c("missense_variant", "synonymous_variant", "missense_variant"),
      hugo_symbol   = c("GENE1", "GENE1", "EGFR"),
      tumor_sample  = c("S1", "S2", "S2")
    )
    variants <- coerce_variants(df,
      chrom       = "chromosome",
      pos         = "position",
      ref         = "ref_allele",
      alt         = "alt_allele",
      consequence = "variant_class",
      gene        = "hugo_symbol",
      sample      = "tumor_sample"
    )

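Before mapping columns as in the example above, it can help to confirm that every column named in the mapping exists in the data.frame. The base R sketch below does only that check. `check_mapping()` is a hypothetical helper written for illustration; it is not part of ggvariant, and the package may validate its input differently.

```r
# Hypothetical helper (not part of ggvariant): stop with a readable message
# if any mapped column name is absent from the input data.frame.
check_mapping <- function(x, mapping) {
  stopifnot(is.data.frame(x), is.character(mapping))
  missing_cols <- setdiff(unname(mapping), names(x))
  if (length(missing_cols) > 0) {
    stop("Columns not found in data: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

df <- data.frame(
  chromosome = c("chr1", "chr7"),
  position   = c(100200, 55249071),
  ref_allele = c("A", "C"),
  alt_allele = c("T", "T")
)

# Names on the left mirror the coerce_variants() arguments
mapping <- c(chrom = "chromosome", pos = "position",
             ref = "ref_allele", alt = "alt_allele")

ok <- check_mapping(df, mapping)  # all mapped columns exist

# A misspelled column name is reported instead of failing later
bad <- tryCatch(
  check_mapping(df, c(chrom = "chr")),
  error = function(e) conditionMessage(e)
)
```

Running a check like this first keeps column-name typos separate from problems in the variant data itself.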
Access the built-in colour palettes used by ggvariant plot functions.

    gv_palette(type = c("consequence", "spectrum", "domain"), n = 8L)


| Argument | Description |
|---|---|
| type | One of `"consequence"`, `"spectrum"`, or `"domain"`. |
| n | Integer. For |

A named character vector of hex colour codes.

    gv_palette("consequence")
    gv_palette("spectrum")

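Since the return value is a named character vector of hex codes, a custom palette for the `palette` arguments can be written in the same shape. The base R sketch below builds one and looks colours up by name. The category names and colours are made up for the example and are not the values `gv_palette()` returns.

```r
# Illustrative palette: names and colours are assumptions for this example,
# not the built-in ggvariant palette.
my_palette <- c(
  missense_variant   = "#E41A1C",
  synonymous_variant = "#377EB8",
  frameshift_variant = "#4DAF4A"
)

# Check that every entry is a six-digit hex colour
is_hex <- grepl("^#[0-9A-Fa-f]{6}$", my_palette)

# Look colours up by category name; a category with no entry yields NA
cols <- my_palette[c("missense_variant", "stop_gained")]
```

An `NA` in the lookup result is a quick way to spot categories in the data that the palette does not cover.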
Summarises variant consequences (e.g. missense, frameshift, synonymous) across one or more samples, producing a stacked or grouped bar chart.

    plot_consequence_summary(
      variants,
      samples = NULL,
      group_by = c("consequence", "gene"),
      top_n = 10L,
      position = c("stack", "fill", "dodge"),
      palette = NULL,
      flip = FALSE,
      interactive = FALSE
    )

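
| Argument | Description |
|---|---|
| variants | A gvf object. |
| samples | Character vector of sample names to include. |
| group_by | One of `"consequence"` or `"gene"`. |
| top_n | Integer. For |
| position | One of `"stack"`, `"fill"`, or `"dodge"`. |
| palette | Named character vector of colours. |
| flip | Logical. If |
| interactive | Logical. Returns a |
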
A ggplot object.

    vcf_file <- system.file("extdata", "example.vcf", package = "ggvariant")
    variants <- read_vcf(vcf_file)
    # Consequence counts per sample
    plot_consequence_summary(variants)
    # Proportional bars
    plot_consequence_summary(variants, position = "fill")
    # Top 10 genes coloured by consequence
    plot_consequence_summary(variants, group_by = "gene", top_n = 10)

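The numbers behind a stacked bar chart and a proportional ("fill") bar chart can be tabulated in base R. The sketch below uses a small made-up data.frame with `sample` and `consequence` columns; it shows the kind of summary such a plot displays and is not taken from the package's internals.

```r
# Toy input: five variants across two samples (values invented for the example)
variants <- data.frame(
  sample = c("S1", "S1", "S1", "S2", "S2"),
  consequence = c("missense_variant", "missense_variant", "synonymous_variant",
                  "missense_variant", "frameshift_variant"),
  stringsAsFactors = FALSE
)

# Counts per sample and consequence: the heights of stacked bar segments
counts <- table(variants$sample, variants$consequence)

# Row-wise proportions: each sample's segments sum to 1, as in a "fill" chart
props <- prop.table(counts, margin = 1)

counts
round(props, 2)
```

Comparing `counts` with `props` shows why proportional bars suit cohorts whose samples differ widely in total variant count.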
Draws a lollipop (stem-and-dot) diagram showing variant positions along a gene, coloured by consequence. Optionally overlays protein domain annotations when domain boundaries are supplied.

    plot_lollipop(
      variants,
      gene = NULL,
      domains = NULL,
      color_by = "consequence",
      palette = NULL,
      protein_length = NULL,
      stack_dots = TRUE,
      title = NULL,
      interactive = FALSE
    )


| Argument | Description |
|---|---|
| variants | A gvf object. |
| gene | Character. Gene to filter on. If |
| domains | A data.frame of protein domain boundaries. |
| color_by | Column name to use for dot colour. Default `"consequence"`. |
| palette | Named character vector of colours for each consequence/sample category. |
| protein_length | Integer. Total length of the protein in amino acids, used to scale the x-axis. If |
| stack_dots | Logical. If |
| title | Character. Plot title. Defaults to the gene name. |
| interactive | Logical. If `TRUE`, returns a plotly object. |

A ggplot object (or a plotly object when interactive = TRUE).

    vcf_file <- system.file("extdata", "example.vcf", package = "ggvariant")
    variants <- read_vcf(vcf_file)
    # Basic lollipop for the most-mutated gene
    plot_lollipop(variants)
    # Specific gene
    plot_lollipop(variants, gene = "TP53")
    # With domain annotation
    tp53_domains <- data.frame(
      name  = c("Transactivation", "DNA-binding", "Tetramerization"),
      start = c(1, 102, 323),
      end   = c(67, 292, 356)
    )
    plot_lollipop(variants, gene = "TP53", domains = tp53_domains)

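When several variants fall on the same residue, a lollipop plot needs some way to keep them from overplotting. One common approach, sketched below in base R, gives each variant a running index within its position so that dots can be drawn one above another. This is an illustration only: the column name `aa_pos` is hypothetical, and the sketch is not a description of how `stack_dots` is implemented.

```r
# Toy input: amino-acid positions are invented for the example, and
# `aa_pos` is a hypothetical column name.
variants <- data.frame(
  aa_pos = c(175, 175, 248, 273, 273, 273),
  consequence = c("missense_variant", "missense_variant", "missense_variant",
                  "missense_variant", "stop_gained", "missense_variant"),
  stringsAsFactors = FALSE
)

# Running index within each position: 1 for the first variant there,
# 2 for the second, and so on. This can serve as the dot's y coordinate.
variants$stack <- ave(seq_along(variants$aa_pos), variants$aa_pos,
                      FUN = seq_along)

# Stem height per position is the largest index reached at that position
height <- tapply(variants$stack, variants$aa_pos, max)

variants
height
```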
Plots the single-base substitution (SBS) spectrum — the relative frequency of each of the 6 substitution classes (C>A, C>G, C>T, T>A, T>C, T>G) — optionally broken down by trinucleotide context.

    plot_variant_spectrum(
      variants,
      sample = NULL,
      context = FALSE,
      genome = NULL,
      facet_by_sample = FALSE,
      palette = NULL,
      normalize = TRUE,
      interactive = FALSE
    )


| Argument | Description |
|---|---|
| variants | A gvf object. |
| sample | Character. Sample name to filter on. |
| context | Logical. If |
| genome | A |
| facet_by_sample | Logical. If |
| palette | Named character vector with names matching substitution classes ( |
| normalize | Logical. If |
| interactive | Logical. Returns a |

A ggplot object.

    vcf_file <- system.file("extdata", "example.vcf", package = "ggvariant")
    variants <- read_vcf(vcf_file)
    # Basic 6-class SBS spectrum
    plot_variant_spectrum(variants)
    # Faceted by sample
    plot_variant_spectrum(variants, facet_by_sample = TRUE)

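The six substitution classes arise because a change on one strand is equivalent to its complement on the other, so substitutions with a purine reference (A or G) are conventionally reported on the pyrimidine strand. The base R sketch below applies that convention and computes relative frequencies. `sbs_class()` is a hypothetical helper written for this example and is not a ggvariant function.

```r
# Hypothetical helper: collapse single-base substitutions onto the
# pyrimidine (C or T) reference, giving one of six classes. Anything that
# is not a single-base substitution returns NA.
sbs_class <- function(ref, alt) {
  ref <- toupper(ref)
  alt <- toupper(alt)
  bases <- c("A", "C", "G", "T")
  is_snv <- ref %in% bases & alt %in% bases & ref != alt
  purine_ref <- ref %in% c("A", "G")
  # Purine references are complemented so the reference becomes C or T
  ref_c <- ifelse(purine_ref, chartr("ACGT", "TGCA", ref), ref)
  alt_c <- ifelse(purine_ref, chartr("ACGT", "TGCA", alt), alt)
  out <- paste0(ref_c, ">", alt_c)
  out[!is_snv] <- NA_character_
  out
}

ref <- c("A", "G", "C", "T", "AT")
alt <- c("T", "A", "T", "C", "A")
classes <- sbs_class(ref, alt)  # the deletion "AT" > "A" becomes NA

# Relative frequency of each class, keeping classes with zero counts
all_classes <- c("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")
counts <- table(factor(classes, levels = all_classes))
freq <- counts / sum(counts)
freq
```

Fixing the factor levels keeps all six classes in the output even when a sample has none of a given type, which keeps bars aligned across samples.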
Parses a standard VCF (v4.x) file and returns a tidy data.frame (a
gvf object) that all ggvariant plotting functions accept. For users
who already have variant data in a plain data.frame or tibble, see
coerce_variants().

    read_vcf(path, samples = NULL, pass_only = TRUE, info_fields = NULL)


| Argument | Description |
|---|---|
| path | Path to a |
| samples | Character vector of sample names to retain. |
| pass_only | Logical. If |
| info_fields | Character vector of INFO field names to expand into columns. |

A gvf (genomic variant frame) — a data.frame with columns:
- Chromosome (character)
- Position (integer)
- Reference allele
- Alternate allele (multi-allelic sites are split into rows)
- QUAL score (numeric)
- FILTER field
- Sample name (NA for single-sample VCFs without GT field)
- Variant consequence if ANN/CSQ INFO field is present
- Gene symbol if ANN/CSQ INFO field is present
See also: coerce_variants(), plot_lollipop(), plot_consequence_summary()

    vcf_file <- system.file("extdata", "example.vcf", package = "ggvariant")
    variants <- read_vcf(vcf_file)
    head(variants)

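To make the "split into rows" behaviour for multi-allelic sites concrete, the base R sketch below parses a tiny in-memory VCF, gives each ALT allele its own row, and then keeps records whose FILTER is PASS. It is a simplified illustration, not the `read_vcf()` implementation. The output column names `qual` and `filter`, and the exact PASS rule, are assumptions made for the example.

```r
# A minimal VCF body held in memory (records invented for the example)
vcf_lines <- c(
  "##fileformat=VCFv4.2",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
  "chr1\t100200\t.\tA\tT,G\t50\tPASS\t.",
  "chr7\t55249071\t.\tC\tT\t12\tLowQual\t."
)

# Drop "##" meta lines; keep the "#CHROM" header and the records
body <- vcf_lines[!grepl("^##", vcf_lines)]

# Read everything as character so alleles such as "T" are not
# converted to logical values
raw <- read.delim(text = body, header = TRUE, sep = "\t", quote = "",
                  comment.char = "", check.names = FALSE,
                  colClasses = "character")

# One output row per ALT allele
alts <- strsplit(raw$ALT, ",", fixed = TRUE)
idx  <- rep(seq_len(nrow(raw)), times = lengths(alts))

tidy <- data.frame(
  chrom  = raw[["#CHROM"]][idx],
  pos    = as.integer(raw$POS[idx]),
  ref    = raw$REF[idx],
  alt    = unlist(alts),
  qual   = as.numeric(raw$QUAL[idx]),  # assumed column name
  filter = raw$FILTER[idx],            # assumed column name
  stringsAsFactors = FALSE
)

# Assumed rule for this sketch: keep only records with FILTER equal to "PASS"
pass <- tidy[tidy$filter == "PASS", ]
pass
```

The site with ALT `T,G` yields two rows that share position and reference allele, matching the one-row-per-alternate-allele layout described above.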
A clean, publication-ready theme based on theme_minimal. Applied
automatically by all ggvariant plot functions; export it to customise
further.

    theme_ggvariant(base_size = 12, base_family = "")


| Argument | Description |
|---|---|
| base_size | Base font size in pt. Default `12`. |
| base_family | Base font family. Default `""`. |

A ggplot2 theme object.

    library(ggplot2)
    ggplot(mtcars, aes(mpg, wt)) +
      geom_point() +
      theme_ggvariant()
