Resources - BioBank, Minerva & RIMBANet | Icahn School of Medicine

The Department of Genetics and Genomic Sciences has invested in state-of-the-art software and computing systems, and collaborates with a number of Divisions, Centers, and Programs across the Mount Sinai Health System to provide statistical, technological, educational, and methodological resources to its researchers.

Minerva

The Icahn School of Medicine at Mount Sinai is proud to introduce Minerva, a supercomputer dedicated to improving scientific discovery. Minerva is named after the Roman goddess of wisdom and medicine. It consists of 7,680 AMD compute cores, 30,000 gigabytes (30 terabytes) of memory, and 1,500,000 gigabytes (1.5 petabytes) of high-speed parallel storage. It has a peak speed of 70,000 gigaflops (70 teraflops) and will provide 64 million hours of computation per year.
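As a rough sanity check on the Minerva figures, the sketch below derives per-core values from the stated totals. It assumes that "hours of computation" means core-hours and that a year has 8,760 hours; neither assumption is stated in the source, and the function name is purely illustrative.

```python
# Back-of-the-envelope check of the Minerva figures quoted in the text.
CORES = 7_680
MEMORY_GB = 30_000
PEAK_GFLOPS = 70_000
STATED_HOURS_PER_YEAR = 64_000_000  # assumption: interpreted as core-hours

HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year, 8,760 hours


def minerva_summary():
    """Derive per-core figures and the implied availability from the totals."""
    max_core_hours = CORES * HOURS_PER_YEAR
    return {
        "memory_gb_per_core": MEMORY_GB / CORES,
        "peak_gflops_per_core": PEAK_GFLOPS / CORES,
        "max_core_hours_per_year": max_core_hours,
        "implied_availability": STATED_HOURS_PER_YEAR / max_core_hours,
    }


if __name__ == "__main__":
    for name, value in minerva_summary().items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions, the theoretical maximum is about 67.3 million core-hours per year, so the stated 64 million hours corresponds to roughly 95% of that maximum.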

Data Ark: Data Commons

The overarching goal of the Data Ark is to ensure that research data at Mount Sinai are managed, processed, and combined in a way that optimizes the power, pace, and relevance of our science.

Data Ark’s current data sets include the 1000 Genomes Project, GTEx (Genotype-Tissue Expression), GWAS (Genome-Wide Association Studies) summary statistics, UK Biobank, the Mount Sinai Data Warehouse Electronic Health Record COVID-19 data set, the STOP COVID NYC Cohort, and the Mount Sinai COVID-19 Biobank. The number of data sets on the Data Ark will increase substantially in the coming months. The Data Ark is located on Minerva at: /sc/arion/projects/data-ark/
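For researchers already logged in to Minerva, a minimal sketch for seeing what is available under the Data Ark path might look like the following. The layout beneath the root directory is not described here, so the sketch only lists top-level entries; the helper name list_entries is hypothetical, and access permissions are assumed.

```python
from pathlib import Path

# Path given in the text; everything beneath it is unknown to this sketch.
DATA_ARK_ROOT = Path("/sc/arion/projects/data-ark/")


def list_entries(root=DATA_ARK_ROOT):
    """Return the sorted names of top-level entries under root.

    Returns an empty list if root does not exist, is not a directory,
    or is not readable by the current user.
    """
    root = Path(root)
    try:
        return sorted(entry.name for entry in root.iterdir())
    except (FileNotFoundError, NotADirectoryError, PermissionError):
        return []


if __name__ == "__main__":
    for name in list_entries():
        print(name)
```

Run off Minerva, the script prints nothing, because the path will not exist and the helper returns an empty list.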

Anyone can contact the Data Ark team with questions or ticket submissions by writing to data-ark-team@lists.mssm.edu. Additionally, researchers using Data Ark data sets are invited to join the Data Ark Slack channel to ask and answer questions about common data sets. To join the channel, navigate to https://join.slack.com/t/data-ark/signup and sign up using your Mount Sinai credentials. You’ll be able to start interacting with other researchers on common data sets right away.

BioBank

Our BioMe Biobank platform is an ongoing, prospective, hospital-based population research study that has enrolled over 25,000 participants and currently enrolls, on average, 600 new participants per month. Today, BioMe is a prototypical example of a biobank linked to electronic medical records (EMR), integrating a patient’s clinical care information with research data.

You can learn more about our program in the National Research Council’s report 'Toward Precision Medicine.'

RIMBANet

One of the major goals of systems biology is to understand how genetic and environmental variations drive transcriptional networks, protein-protein interaction networks, and metabolite networks, and give rise to complex phenotypes.

We developed a computational framework centered on Bayesian networks and implemented it in RIMBANet, which you can download for free here.

We have used RIMBANet to discover causal relationships in complex human diseases such as diabetes and obesity. We’ve also applied RIMBANet to investigate how genetic variations regulate transcriptional and metabolite level changes in yeast.

For questions related to the RIMBANet package or the yeast data set, please contact Jun Zhu, PhD or Eric Schadt, PhD.

REVEL

REVEL (Rare Exome Variant Ensemble Learner) is an ensemble method for predicting the pathogenicity of missense variants in the human genome by combining 18 individual functional prediction and conservation scores. REVEL performs significantly better than existing tools, especially for rare variants. REVEL scores for all human missense variants are available at: https://sites.google.com/site/revelgenomics/

CHEAR Data Center

The NIH-funded Children's Health Exposure Analysis Resource (CHEAR) Data Center serves as a data repository for researchers to comprehensively assess the vast spectrum of environmental exposures that may affect children's health. For more information on the CHEAR Data Center, please visit: http://cheardatacenter.mssm.edu.