Vasily V. Grinev
Candidate of Sciences in Biology, Associate Professor,
Scientific Head of the Sector of Human Molecular Genetics.
More information about our software development activity is available in the GitHub repository at https://github.com/VGrinev/transcriptome-analysis.
overlapJunctions is a new high-level R function for calculating overlaps between reference and experimentally detected exon-exon junctions. The input file for overlapJunctions may contain output from the limma/diffSplice or JunctionSeq pipelines with data on the differential usage of exon-exon junctions in two (or more) experimental conditions. The function returns an object of class list that includes the consolidated results of the calculations.
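The core idea of comparing two junction sets can be illustrated with a minimal sketch. It is written in Python rather than R and is not the overlapJunctions implementation: the `Junction` record, the coordinate convention, and the exact-match criterion are all assumptions made for illustration.

```python
from typing import NamedTuple


class Junction(NamedTuple):
    """Hypothetical junction record; the coordinate convention is an assumption."""
    chrom: str
    start: int   # first intronic position (assumed)
    end: int     # last intronic position (assumed)
    strand: str


def overlap_junctions(reference, detected):
    """Split junctions into shared, reference-only and detected-only sets.

    Two junctions are treated as identical only when chromosome, both
    coordinates and strand match exactly (an assumed criterion).
    """
    ref = set(reference)
    det = set(detected)
    return {
        "shared": sorted(ref & det),
        "reference_only": sorted(ref - det),
        "detected_only": sorted(det - ref),
    }


reference = [Junction("chr1", 100, 200, "+"), Junction("chr1", 300, 400, "+")]
detected = [Junction("chr1", 100, 200, "+"), Junction("chr1", 300, 450, "+")]
result = overlap_junctions(reference, detected)
print(result["shared"])
```

Returning the three groups in one dictionary mirrors, loosely, the idea of a single consolidated result object.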
splDistance is a new high-level R function for consolidating global statistics on splicing distances. The input file for splDistance may contain output from the limma/diffSplice or JunctionSeq pipelines with data on the differential usage of exon-exon junctions in two (or more) experimental conditions. The function produces an object of the new class splicingDistances that contains the input data as well as all calculated statistics.
High-level R function for filtering reads with malformed CIGAR strings out of BAM files. The current experimental version of the function works only with BAM (not SAM) files and handles only two types of bad CIGAR: i) a CIGAR operation has zero length; ii) a CIGAR M operator maps off the end of the reference. In addition, users can specify their own list of unwanted reads.
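The two checks described above can be sketched on plain CIGAR strings. This Python sketch is not the R function and does not read BAM files: the read dictionaries, the `ref_lengths` mapping, and the function names are hypothetical, and positions are assumed to be 1-based as in SAM.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = frozenset("MDN=X")  # operations that advance along the reference


def parse_cigar(cigar):
    """Return a list of (length, operation) pairs, or raise on garbage input."""
    ops = CIGAR_OP.findall(cigar)
    if not ops or "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"unparsable CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in ops]


def is_bad_cigar(cigar, pos, ref_length):
    """Check the two conditions: a zero-length op, or M running off the reference."""
    if cigar == "*":          # no alignment recorded; nothing to check
        return False
    ops = parse_cigar(cigar)
    if any(length == 0 for length, _ in ops):
        return True
    ref_pos = pos             # 1-based leftmost mapped position (assumed)
    for length, op in ops:
        if op in REF_CONSUMING:
            if op == "M" and ref_pos + length - 1 > ref_length:
                return True
            ref_pos += length
    return False


def filter_reads(reads, ref_lengths, unwanted=()):
    """Drop reads with bad CIGARs and reads named in a user-supplied list."""
    unwanted = set(unwanted)
    return [
        r for r in reads
        if r["name"] not in unwanted
        and not is_bad_cigar(r["cigar"], r["pos"], ref_lengths[r["chrom"]])
    ]


reads = [
    {"name": "r1", "chrom": "chr1", "pos": 1, "cigar": "50M"},
    {"name": "r2", "chrom": "chr1", "pos": 1, "cigar": "0M50M"},
    {"name": "r3", "chrom": "chr1", "pos": 960, "cigar": "50M"},
    {"name": "r4", "chrom": "chr1", "pos": 10, "cigar": "20M100N30M"},
]
kept = filter_reads(reads, {"chr1": 1000}, unwanted=["r4"])
print([r["name"] for r in kept])
```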
High-performance high-level R function for filtering transcripts assembled by Cufflinks (or similar tools). The filtering procedure includes the following steps: i) removing unstranded transcripts; ii) removing transcripts that match two different strands; iii) removing records from non-canonical chromosomes; iv) removing one-exon transcripts; v) removing transcripts that are too short; vi) removing transcripts with exon(s) that are too short; vii) removing transcripts with intron(s) that are too short; and viii) removing transcripts with too low abundance. Each step is controlled by a respective argument. The results are stored in a file of GTF/GFF format or as a local SQLite database.
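The eight filtering steps listed above can be sketched as a single predicate. This is a Python illustration, not the R function: the `Transcript` record, the argument names, and every default threshold are hypothetical values chosen only to make the example run.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    """Hypothetical transcript record used only for this sketch."""
    tx_id: str
    chrom: str
    exon_strands: tuple   # strand reported for each exon
    exons: tuple          # (start, end) pairs, 1-based inclusive, sorted by start
    abundance: float      # e.g. FPKM (assumed unit)


def keep_transcript(tx, canonical, min_length=300, min_exon=25,
                    min_intron=50, min_abundance=1.0):
    """Apply steps i) to viii) in order; all thresholds are assumed defaults."""
    strands = set(tx.exon_strands)
    if strands - {"+", "-"}:          # i) unstranded
        return False
    if len(strands) > 1:              # ii) matches two different strands
        return False
    if tx.chrom not in canonical:     # iii) non-canonical chromosome
        return False
    if len(tx.exons) < 2:             # iv) one-exon transcript
        return False
    exon_lengths = [end - start + 1 for start, end in tx.exons]
    if sum(exon_lengths) < min_length:    # v) transcript too short
        return False
    if min(exon_lengths) < min_exon:      # vi) exon too short
        return False
    introns = [nxt[0] - cur[1] - 1 for cur, nxt in zip(tx.exons, tx.exons[1:])]
    if min(introns) < min_intron:         # vii) intron too short
        return False
    return tx.abundance >= min_abundance  # viii) abundance too low


canonical = {"chr1", "chr2", "chrX"}
good = Transcript("tx1", "chr1", ("+", "+"), ((100, 299), (1000, 1199)), 5.0)
single = Transcript("tx2", "chr1", ("+",), ((100, 900),), 5.0)
print(keep_transcript(good, canonical), keep_transcript(single, canonical))
```

Ordering the cheap checks first means the exon and intron lengths are computed only for transcripts that survive the strand and chromosome checks.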
Identification of PTC in transcripts
This is R code for fast annotation of transcripts with premature termination codons.
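The page does not state which criterion the R code uses to call a premature termination codon. As a purely illustrative sketch in Python, the example below assumes the widely used "50-nucleotide rule": a stop codon is treated as premature when it lies more than about 50 nt upstream of the last exon-exon junction. The function name, its arguments, and the threshold are assumptions.

```python
def has_ptc(exon_lengths, stop_codon_end, min_distance=50):
    """Flag a stop codon as premature under the assumed 50-nt rule.

    exon_lengths   -- exon lengths in transcript (5' to 3') order
    stop_codon_end -- 1-based transcript coordinate of the last stop-codon base
    min_distance   -- assumed threshold, in nucleotides
    """
    if len(exon_lengths) < 2:
        return False                          # no junction, rule not applicable
    last_junction = sum(exon_lengths[:-1])    # last base before the final junction
    return last_junction - stop_codon_end > min_distance


# Three exons of 200 nt each: the last junction follows transcript position 400.
print(has_ptc([200, 200, 200], stop_codon_end=300))
print(has_ptc([200, 200, 200], stop_codon_end=380))
```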
This is R code for easy and fast conversion of the standard (linear) transcriptional models of genes into directed acyclic weighted exon graphs. The code makes it possible to i) transform a GTF/GFF file with gene annotations into a local SQLite database, ii) retrieve the metadata from the GTF/GFF file (if there is any relevant information), iii) reconstruct the exon graph (from a TranscriptDb object) as a list of edges, iv) assign weights (from the metadata) to the edges, v) convert the list of edges into a directed acyclic weighted exon graph (as an object of class igraph) and vi) save the exon graph as a list of edges with weights in a tab-delimited TXT file.
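Steps iii), iv) and vi) can be sketched on toy data. This Python example is not the R code and uses neither a TranscriptDb object nor igraph: the input dictionaries are hypothetical, and summing per-transcript weights over shared edges is an assumed weighting scheme.

```python
import csv
import io
from collections import defaultdict


def exon_graph_edges(transcripts, weights):
    """Build a weighted edge list from linear transcript models.

    transcripts -- dict: transcript id -> exon ids in 5' to 3' order
    weights     -- dict: transcript id -> weight taken from metadata (assumed)
    Consecutive exons of a transcript form a directed edge; an edge shared by
    several transcripts accumulates their weights (an assumed scheme).
    """
    edges = defaultdict(float)
    for tx_id, exons in transcripts.items():
        for src, dst in zip(exons, exons[1:]):
            edges[(src, dst)] += weights.get(tx_id, 1.0)
    return dict(edges)


def edges_to_tsv(edges):
    """Serialize the edge list as tab-delimited text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["from", "to", "weight"])
    for (src, dst), weight in sorted(edges.items()):
        writer.writerow([src, dst, weight])
    return buf.getvalue()


transcripts = {"tx1": ["e1", "e2", "e3"], "tx2": ["e1", "e3"]}
weights = {"tx1": 2.0, "tx2": 0.5}
edges = exon_graph_edges(transcripts, weights)
print(edges_to_tsv(edges))
```

Because exons are listed in genomic order along each transcript, every edge points downstream, which is what keeps the resulting graph acyclic.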
This is a consolidated R-based wrapper for the analysis of differential RNA splicing with linear modeling. The pipeline includes the following basic steps: 1) loading the primary counts matrix into the R workspace; 2) filtering the primary counts matrix; 3) wrapping the counts matrix in a digital gene expression object; 4) estimating the normalization (scaling) factors to calculate an effective size for each RNA-seq library using the “trimmed mean of M-values” normalization method; 5) performing the voom normalization and transformation of counts data that show some degree of heteroscedasticity; 6) fitting linear models to the normalized and transformed counts; 7) analyzing the differential splicing; 8) visualizing and inspecting the results (including volcano plots); 9) consolidating and saving the results of interest.
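Two of the steps above, filtering the counts matrix (step 2) and the log-scale transformation underlying voom (part of step 5), can be sketched in plain Python. This is not the R wrapper: the filtering thresholds are hypothetical, the normalization factors are assumed to be supplied (1.0 by default, so TMM itself is not implemented), and the precision weights that voom also estimates are omitted.

```python
import math


def filter_rows(counts, min_count=10, min_samples=2):
    """Keep features with at least min_count reads in at least min_samples libraries."""
    return {
        feature: row for feature, row in counts.items()
        if sum(c >= min_count for c in row) >= min_samples
    }


def log_cpm(counts, norm_factors=None):
    """log2 counts per million with the offsets used by voom:
    log2((count + 0.5) / (effective library size + 1) * 1e6)."""
    if not counts:
        return {}
    n_libs = len(next(iter(counts.values())))
    if norm_factors is None:
        norm_factors = [1.0] * n_libs      # stand-in for TMM scaling factors
    lib_sizes = [
        sum(row[j] for row in counts.values()) * norm_factors[j]
        for j in range(n_libs)
    ]
    return {
        feature: [
            math.log2((c + 0.5) / (lib_sizes[j] + 1.0) * 1e6)
            for j, c in enumerate(row)
        ]
        for feature, row in counts.items()
    }


counts = {
    "junction1": [100, 120, 90, 110],
    "junction2": [0, 1, 0, 2],
    "junction3": [50, 40, 60, 55],
}
filtered = filter_rows(counts)
transformed = log_cpm(filtered)
print(sorted(transformed))
```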
A set of R functions for identifying significant open reading frames in nucleotide sequences using a multinomial model.
Subjunc-based alignment of the RNA-seq reads and identification of exon-exon junctions
This is an R-based wrapper for the alignment of RNA-seq reads and the identification of exon-exon junctions with the seed-and-vote approach described in “Liao Y., Smyth G. K., Shi W. The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote. Nucleic Acids Res. 2013 May 1;41(10):e108. doi: 10.1093/nar/gkt214”.
Analysis of the ChIP-seq data
This R-based code is designed to identify significant peaks in ChIP-seq data. The pipeline includes indexing of the reference genome, local alignment of the DNA-seq reads, peak calling with a fully Bayesian hidden Markov model and Monte Carlo simulations, filtering of peaks by posterior probability and depth/coverage, and saving of the final results.
Analysis of the MALDI-TOF spectra
This R-based code is designed to assess the similarity of MALDI-TOF spectra. The pipeline includes loading raw data from .mzXML file(s) into the R workspace, preprocessing each spectrum, creating a consolidated matrix of spectra, comparing spectra pairwise by different approaches, calculating basic statistics, and preparing and saving the numerical results as standard tab-delimited tables and plots. The similarity of the spectra is estimated using Pearson's r, Spearman's rho, Euclidean distances, principal component analysis and/or spectral angle mapper. Finally, samples can be clustered according to the selected metric of spectral similarity.
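Three of the similarity measures named above can be written out in a few lines. This Python sketch is not the R pipeline; it assumes that both spectra have already been preprocessed and binned onto the same m/z grid, so that they are intensity vectors of equal length, and the function names are chosen for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length intensity vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def euclidean_distance(x, y):
    """Euclidean distance between two intensity vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def spectral_angle(x, y):
    """Spectral angle (radians): the angle between the two intensity vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    cosine = max(-1.0, min(1.0, dot / norm))   # guard against rounding error
    return math.acos(cosine)


spectrum_a = [1.0, 2.0, 3.0, 4.0]
spectrum_b = [2.0, 4.0, 6.0, 8.0]   # same shape, doubled intensity
print(pearson_r(spectrum_a, spectrum_b))
print(euclidean_distance(spectrum_a, spectrum_b))
print(spectral_angle(spectrum_a, spectrum_b))
```

The toy spectra differ only in overall intensity, which illustrates why the measures can disagree: the correlation and the spectral angle ignore a uniform scaling, while the Euclidean distance does not.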
CelNetAnalyzer is a simple-to-use Java-based software package for topological analysis of large undirected cellular networks. The software is operated through a graphical user interface and returns a comprehensive list of topological indices. The returned list of structural metrics is oriented toward cellular networks and includes degree and neighbourhood, clustering, distance, centrality and heterogeneity indices, as well as simple cycles, compositional complexity and the Shannon information entropy of the network. Comparative studies have shown that CelNetAnalyzer calculates these parameters significantly faster than competing tools, thanks to parallelization and to enhanced and newly developed algorithms. CelNetAnalyzer is an open-source project and is freely distributed for non-commercial use. CelNetAnalyzer requires Java Platform, Standard Edition 6 or higher. The downloadable archive contains the GUI version of the software, the source code, the user manual, a test network and the results of the topological analysis of this network.
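A few of the index families listed above can be illustrated on a toy undirected network. This Python sketch is unrelated to the CelNetAnalyzer source code and makes no attempt at parallelization. In particular, the entropy shown here is the Shannon entropy of the degree distribution, which is only one possible definition and an assumption about what the package reports.

```python
import math
from collections import Counter


def build_adjacency(edges):
    """Adjacency sets for an undirected graph; self-loops are ignored."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u in neighbours for v in neighbours
                if u < v and v in adj[u])
    return 2.0 * links / (k * (k - 1))


def degree_entropy(adj):
    """Shannon entropy (bits) of the degree distribution (assumed definition)."""
    n = len(adj)
    degree_counts = Counter(len(nb) for nb in adj.values())
    return -sum(c / n * math.log2(c / n) for c in degree_counts.values())


edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
adj = build_adjacency(edges)
degrees = {node: len(nb) for node, nb in adj.items()}
print(degrees)
print(clustering_coefficient(adj, "C"))
print(degree_entropy(adj))
```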
Page updated: 20.04.2017 17:47