Resources
LS-GKM: A new gkm-SVM software for large-scale datasets
gkm-SVM is a popular machine-learning method that predicts cis-regulatory elements (CREs) from their DNA sequence. LS-GKM is an improvement on gkm-SVM, offering increased scalability and advanced features based on gapped k-mer kernels.
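To make the idea of a gapped k-mer kernel concrete, here is a naive Python sketch. It assumes the simplest form of the kernel. Each sequence is mapped to counts of gapped k-mers, meaning every way of keeping k informative positions within each window of length l, with the remaining l - k positions treated as gaps. The kernel is then the (optionally normalized) dot product of two such count vectors. The function names and the small default values of l and k are illustrative assumptions, not LS-GKM's interface or defaults. The real tools also handle details this sketch omits, such as reverse complements and additional kernel variants, and they use far faster algorithms than brute-force enumeration.

```python
from collections import Counter
from itertools import combinations


def gapped_kmer_counts(seq: str, l: int = 6, k: int = 4) -> Counter:
    """Count gapped k-mers: every choice of k informative positions
    within each length-l window; the other l - k positions become gaps ('-')."""
    seq = seq.upper()
    counts = Counter()
    position_sets = list(combinations(range(l), k))
    for start in range(len(seq) - l + 1):
        window = seq[start:start + l]
        for positions in position_sets:
            keep = set(positions)
            pattern = "".join(c if i in keep else "-" for i, c in enumerate(window))
            counts[pattern] += 1
    return counts


def gkm_kernel(a: str, b: str, l: int = 6, k: int = 4) -> float:
    """Unnormalized kernel: dot product of the two gapped k-mer count vectors."""
    ca, cb = gapped_kmer_counts(a, l, k), gapped_kmer_counts(b, l, k)
    return float(sum(n * cb[p] for p, n in ca.items()))


def normalized_gkm_kernel(a: str, b: str, l: int = 6, k: int = 4) -> float:
    """Cosine-normalized kernel so that K(x, x) == 1.
    Both sequences must be at least l bases long (otherwise the norm is zero)."""
    return gkm_kernel(a, b, l, k) / (gkm_kernel(a, a, l, k) * gkm_kernel(b, b, l, k)) ** 0.5
```

A single window of length l contributes C(l, k) gapped k-mers (15 for l = 6, k = 4), which is why the feature space grows quickly with l and why efficient implementations avoid enumerating it explicitly.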
gkmQC: gapped k-mer-SVM quality check and optimization
gkmQC is a sequence-based computational tool for assessing and refining the quality of chromatin accessibility data using gkm-SVM. It uses the overall “predictability” of the peaks/regions as a metric of data quality. It trains a support vector classifier (SVC) using gapped k-mer kernels (Ghandi et al., 2014; Lee, 2016) and learns DNA sequence features predictive of regulatory element activity. It can also be used to optimize a peak-calling threshold, which is particularly useful for rare cell types in single-cell ATAC-seq data.
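One common way to turn “predictability” into a number is the area under the ROC curve (AUC) of held-out classifier scores, with peaks as positives and control sequences as negatives. The sketch below assumes that definition (the gkmQC documentation should be checked for the exact metric it reports) and computes AUC directly as the probability that a randomly chosen positive outscores a randomly chosen negative. The `choose_peak_cutoff` helper is purely hypothetical: it illustrates one simple way a predictability score could guide a peak-calling threshold, by keeping rank-ordered peak bins until their AUC falls below an arbitrary cutoff. It is not gkmQC's actual procedure.

```python
def auroc(pos_scores, neg_scores):
    """AUC as P(random positive score > random negative score); ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def choose_peak_cutoff(bin_aucs, min_auc=0.7):
    """Hypothetical rule: walk peak-rank bins from strongest to weakest and
    keep them while their AUC stays >= min_auc. Returns the number of bins kept."""
    kept = 0
    for auc in bin_aucs:
        if auc < min_auc:
            break
        kept += 1
    return kept
```

The pairwise loop costs O(P·N) for P positives and N negatives, which is fine for illustration; for large peak sets, a rank-based (Mann-Whitney) computation after sorting the scores is the usual choice.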
MTSA: MPRA Tag Sequence Analysis
MTSA is a sequence-based analysis for estimating tag sequence effects on gene expression in massively parallel reporter assay (MPRA) experiments. It trains a support vector regression (SVR) model using gapped k-mer kernels (gkm-kernels) (Ghandi et al., 2014; Lee, 2016) and learns sequence features that modulate gene expression. A basic tutorial on running MTSA is available at: https://github.com/chlee-tabin/mtsa-tutorial
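MTSA itself uses SVR. To stay dependency-free, the sketch below swaps in kernel ridge regression, a simpler kernel regression method, to show how a gapped k-mer kernel can map tag sequences to an expression readout. All function names, the small l and k values, the regularization strength `lam`, and the toy training data are illustrative assumptions, not MTSA's interface, settings, or results.

```python
from collections import Counter
from itertools import combinations


def gkm_features(seq, l=4, k=2):
    """Gapped k-mer counts: k informative positions per length-l window, others '-'."""
    seq = seq.upper()
    counts = Counter()
    for start in range(len(seq) - l + 1):
        window = seq[start:start + l]
        for keep in combinations(range(l), k):
            counts["".join(c if i in keep else "-" for i, c in enumerate(window))] += 1
    return counts


def kernel(a, b, l=4, k=2):
    """Cosine-normalized gapped k-mer kernel (sequences must be at least l long)."""
    fa, fb = gkm_features(a, l, k), gkm_features(b, l, k)
    dot = sum(n * fb[p] for p, n in fa.items())
    norm = (sum(n * n for n in fa.values()) * sum(n * n for n in fb.values())) ** 0.5
    return dot / norm


def solve(A, y):
    """Solve A x = y by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [yi] for row, yi in zip(A, y)]  # augmented matrix [A | y]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:  # eliminate this column from every other row
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_krr(seqs, y, lam=0.1):
    """Kernel ridge regression: alpha = (K + lam * I)^-1 y; returns a predictor."""
    K = [[kernel(a, b) for b in seqs] for a in seqs]
    for i in range(len(seqs)):
        K[i][i] += lam
    alpha = solve(K, y)
    return lambda s: sum(w * kernel(s, t) for w, t in zip(alpha, seqs))


if __name__ == "__main__":
    # Toy, made-up tag sequences and expression values, for illustration only.
    train = ["ACGTACGT", "TTGACCAA", "GGGCCCAT"]
    expr = [1.2, -0.3, 0.5]
    predict = fit_krr(train, expr)
    print(predict("ACGTTCGT"))
```

To fit an actual SVR instead, the same Gram matrix K (without the added `lam` on the diagonal) can be passed to a library implementation that accepts precomputed kernels, such as scikit-learn's `SVR(kernel="precomputed")`.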