The HSHSL is a part of the University of Maryland, Baltimore

601 West Lombard Street
Baltimore MD 21201-1512

Reference: 410-706-7996
Circulation: 410-706-7928


Center for Data and Bioinformation Services

Communications and Contributions

Follow our blog for CDABS updates, information about data- and bioinformation-related opportunities and events at UMB and beyond, and in-depth looks at useful tools and resources. Check out our research contributions (including supporting data and code) and presentations to see how we are enhancing the field of data and bioinformation.

Blog Posts

DABS Volume 2 Issue 1: National COVID Cohort Collaborative (N3C) Data Now Available to UMB Researchers
Posted on Friday, April 30, 2021

The Center for Data and Bioinformation Services (CDABS) is the University of Maryland Health Sciences and Human Services Library hub for data and bioinformation learning, services, resources, and communication.

CDABS is excited to announce that UMB has recently signed a Data Use Agreement with the NCATS N3C Data Enclave, making this rich source of COVID clinical data from across the country available to UMB researchers.

What is N3C?

The National COVID Cohort Collaborative (N3C) Data Enclave was launched by the National Center for Advancing Translational Sciences (NCATS) and the National Center for Data to Health (CD2H), in partnership with experts from Observational Health Data Sciences and Informatics (OHDSI), PCORnet, the Accrual to Clinical Trials (ACT) network, and TriNetX. The N3C aims to aggregate, harmonize, and make accessible vast amounts of clinical data nationwide to accelerate COVID-19 research and clinical care. With the uncertainty of the COVID-19 global pandemic, the scientific community and the Clinical and Translational Science Awards (CTSA) Program created the N3C as a partnership to overcome technical, regulatory, policy, and governance barriers to harmonizing and sharing individual-level clinical data.

What can I do with N3C data?

The N3C Data Enclave supports collaborative analytics across a broad range of clinical and translational domains, such as acute kidney injury, diabetes, pregnancy, cancer, immunosuppression, social determinants of health, and many other conditions, with the goal of identifying treatment mechanisms, supporting drug discovery, and informing best care practices for COVID-19. The N3C Data Enclave opened on September 2, 2020, and now has over 5 billion rows of data on more than 4 million patient records, including over 1 million COVID-positive patients.

There are three tiers of data available, each with different restrictions and requirements for access. From most to least restricted, these are: Limited, De-identified, and Synthetic. Check out the N3C data governance page for more details on these tiers. To maintain adequate security, row-level data must remain in the enclave, but many tools are provided to researchers for working with the data from within the platform.

To see what others are doing with N3C data, visit the projects page.

How do I get access?

    1. Register for and gain access to the N3C Data Enclave here.
    2. Choose the InCommon option on the login screen to log in with your UMB credentials.

Account creation may take a few days. You will also need to complete the NIH Information Security and Information Management Training course and, if you wish to access the limited or de-identified datasets, submit evidence of having completed a Human Subjects Research Protection training course.

Once you obtain access to the Enclave, you will need to submit a Data Use Request for each specific project you intend to pursue. If you would like to use the limited dataset, you will also need to submit a copy of your IRB determination letter. For more details on requirements, please see the onboarding checklist.

Where can I get more information?

Visit the tutorials page for basics on using the N3C platform. There are additional training modules available within the Enclave. And of course, do not hesitate to reach out to your friendly, neighborhood CDABS team with any questions!

Questions? Contact: Amy Yarnell, Data Services Librarian, and Jean-Paul Courneya, Bioinformationist, at data@hshsl.umaryland.edu.

To read more of our content and stay informed please visit our communications page and use the form to subscribe: https://www2.hshsl.umaryland.edu/cdabs/communications


DABS (Data and Bioinformation Stuff) Volume 1 Issue 10: Machine Learning
Posted on Friday, March 12, 2021

We are wrapping up another week (Mar 8-12) of learning and growing at CDABS. This week's DABS focus is machine learning. What is machine learning? Machine learning involves using specialized computer software, together with decisions made by a person, to automate the extraction of knowledge from data. There are two main categories of machine learning: supervised learning, which involves making predictions from labeled data (for example, spam filters), and unsupervised learning, which involves finding structure in data (for example, topic modeling in natural language processing, which elucidates the topics in a collection of texts). There are several resources on the web to help you get started with machine learning in your research. Here are a few phenomenal places to start digging in.

  1. Data School is an online portal with blog posts, videos, courses, Jupyter notebooks, and webcast recordings to learn data science. Data School offers three Machine Learning courses: Introduction to Machine Learning with scikit-learn, Building an Effective Machine Learning Workflow with scikit-learn, and Machine Learning with Text in Python. Learn more here: (10 minute read) https://www.dataschool.io/ml-courses/
  2. Machine Learning Mastery is a website dedicated to making you awesome at machine learning. It uses a top-down approach to teaching modern machine learning through hands-on tutorials. (15 minute read) https://machinelearningmastery.com/start-here/
  3. StatQuest provides “An epic journey through statistics and machine learning”. Join Josh Starmer and see his unique and fun approach to breaking down complex topics into small digestible bits via engaging YouTube videos with accompanying code. Check out the section on machine learning, but also make sure to broaden your exploration to the many other topics in statistics. (10 minute read) https://statquest.org/video-index/#machine
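
To make the supervised learning idea above a little more concrete, here is a minimal sketch of a toy spam filter in plain Python. It is not taken from any of the resources listed; the training messages, the function names (train, predict), and the choice of a simple naive Bayes word-count model are all assumptions made for illustration. A real project would more likely use a library such as scikit-learn and far more data.

```python
import math
from collections import Counter

# Hypothetical, hand-made training data: (message text, label) pairs.
TRAINING = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]


def train(messages):
    """Count how often each word appears under each label."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def predict(text, word_counts, label_counts):
    """Return the label with the highest naive Bayes log-score."""
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total_messages = sum(label_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        n_words = sum(counts.values())
        # Start from the prior: how common is this label overall?
        score = math.log(label_counts[label] / total_messages)
        for word in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words do not zero out a label.
            score += math.log((counts[word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


WORD_COUNTS, LABEL_COUNTS = train(TRAINING)

print(predict("free prize money", WORD_COUNTS, LABEL_COUNTS))     # spam
print(predict("monday team meeting", WORD_COUNTS, LABEL_COUNTS))  # ham
```

The key point is that the labels ("spam" or "ham") are supplied by a person up front, and the program learns to predict them for new messages. In unsupervised learning there would be no labels, and the program would instead look for groupings in the messages on its own.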


DABS (Data and Bioinformation Stuff) Volume 1 Issue 9: Teaching how to code with the Carpentries
Posted on Friday, March 5, 2021

It has been a busy week here in CDABS-land! The CDABS team started the process of becoming certified Carpentries instructors with an intensive 4-day training. The Carpentries is a global community dedicated to teaching foundational coding and data science skills to researchers in an interactive and inclusive manner.

The centerpiece of the Carpentries program is the intensive 2-day (or 4 half-day) workshops that teach how to accomplish common tasks in popular open source programming languages. In addition to workshops, they make a great deal of material available online for self-paced learning, though it is hard to beat the experience you get from a full workshop! The Carpentries is an umbrella for three separate programs, each with a slightly different focus:

  1. Software Carpentry is intended for researchers who need to become more effective programmers in a short amount of time. Workshop lessons typically cover the Unix Shell, Git, and programming with either R or Python.
  2. Data Carpentry offers domain-specific training for data intensive tasks. Data Carpentry workshops also feature lessons in the Unix Shell, Git, R, and Python while working with datasets relevant to particular domains like Genomics and Social Sciences.
  3. Library Carpentry focuses on the needs of librarians and other information professionals. In fact, many of your HSHSL librarians recently went through this training! The curriculum includes lessons on the Unix Shell, Git, OpenRefine, and Regular Expressions.

Once we complete our certifications, we hope to bring these workshops to the UMB community, so be on the lookout for more information. If you know you are interested in attending one of these workshops or arranging one for your group, please reach out and let us know!


DABS (Data and Bioinformation Stuff) Volume 1 Issue 8: Cloud Computing
Posted on Friday, February 26, 2021

We are wrapping up another week (Feb 22-26) of learning and growing at CDABS. Our adventures had us working on HPC (High Performance Computing) at IU (Indiana University) as part of their HPC Onboarding for Biologist workshop. The National Center for Genome Analysis Support (NCGAS) provides this HPC workshop to help new users learn about the HPC resources available to them, other course offerings, and NCGAS services. This workshop and a video linked below had me thinking quite a bit about research computing, particularly computing on the cloud. Folks, let me say that cloud computing is becoming more pervasive in research. Knowing about this topic is worth your time since, as researchers in the modern age, we will have to use the cloud for our computing more and more. Datasets are moving to the cloud. Software has already moved to the cloud. And someday our workstations may be only terminals that connect to our actual computers, which exist on the cloud.
 
Check out these links for resources and learning about computing for genomics on the cloud.
  1. This article, freely available in PMC (PubMed Central), by Ben Langmead and Abhinav Nellore describes how cloud computing is used in genomics for research and large-scale collaborations, and argues that its elasticity, reproducibility and privacy features make it ideally suited for the large-scale reanalysis of publicly available archived data, including privacy-protected data. (10 minute read) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6452449/
  2. Keynote from European Bioconductor Meeting 2020: Sehyun Oh - Bioinformatics On Cloud: How to leverage cloud-based resources for your bioinformatics works. (40 minute watch) https://youtu.be/bFvT4_fqpwE
  3. The Seven Bridges Platform is a cloud-based environment for analyzing genomics data. Use the Platform to securely store, analyze, and share data amongst team members working both locally and globally. The Platform co-locates analysis workflows alongside genomic datasets to optimize processing. Read and learn more at the SevenBridges knowledge center. (10 minute overview) https://docs.sevenbridges.com/
  4. Terra is a cloud-native platform for biomedical researchers to access data, run analysis tools, and collaborate. The vision of Terra is to enable the next generation of collaborative biomedical research. There are several projects that exist independently on the platform - AnVIL, BioData Catalyst, and FireCloud, for example. Each project on Terra serves a unique research purpose, while still offering the benefits of the Terra platform to every user. (10 minute read) https://terra.bio/
  5. Galaxy is an open source, web-based platform for data intensive biomedical research. The main Galaxy instance is an installation of the Galaxy software combined with many common tools and data; this site has been available since 2007 for anyone to analyze their data free of charge. The site provides substantial CPU and disk space, making it possible to analyze large datasets. You can even install your own Galaxy and choose from thousands of tools from the Tool Shed. (10 minute overview) https://galaxyproject.org/tutorials/g101/ (Galaxy Main) https://usegalaxy.org/


Citation List

  • A Model for Centralizing Data and Bioinformation Services at the Health Sciences and Human Services Library. Courneya JP, Yarnell A. Project Briefing presented at: Coalition for Networked Information (CNI) Fall Membership Meeting; 2020 Nov 10 - Dec 15; Virtual.

    (CDABS primary work)

  • Creating campuswide engagement opportunities with library professionals through promotion of Love Data Week 2020. Yarnell A, Courneya JP. Poster presented at: Medical Library Association, Mid-Atlantic Chapter (MAC/MLA) Annual Meeting; 2020 Oct 19-21; Virtual.

    (CDABS primary work)

  • Delayed microglial depletion after spinal cord injury reduces chronic inflammation and neurodegeneration in the brain and improves neurological recovery in male mice. Li Y, Ritzel RM, Khan N, Cao T, He J, Lei Z, Matyas JJ, Sabirzhanov B, Liu S, Li H, Stoica BA, Loane DJ, Faden AI, Wu J. Theranostics. 2020 Sep 14;10(25):11376-11403. doi: 10.7150/thno.49199.

    (CDABS contribution: Functional gene enrichment analysis)

  • Full-length IL-33 regulates Smad3 phosphorylation and gene transcription in a distinctive AP2-dependent manner. Luzina IG, Fishelevich R, Hampton BS, Courneya JP, Parisella FR, Lugkey KN, Baleno FX, Choi D, Kopach P, Lockatell V, Todd NW, Atamas SP. Cell Immunol. 2020 Nov;357:104203. doi: 10.1016/j.cellimm.2020.104203. Epub 2020 Sep 2.

    (CDABS contribution: Bioinformatics, Visualization)

  • High-performance computing service for bioinformatics and data science. Courneya JP, Mayo A. J Med Libr Assoc. 2018 Oct;106(4):494-495. doi: 10.5195/jmla.2018.512. Epub 2018 Oct 1.

    (CDABS primary work)

  • High-performance computing service in the Health Science and Human Services Library at University of Maryland Baltimore. Mayo A and Courneya JP. [version 1; not peer reviewed]. F1000Research 2018, 7(ISCB Comm J):1089 (https://doi.org/10.7490/f1000research.1115828.1)

    (CDABS primary work)

  • PubRunner: A light-weight framework for updating text mining results. Anekalla KR, Courneya JP, Fiorini N, Lever J, Muchow M, Busby B. F1000Res. 2017 May 2;6:612. doi: 10.12688/f1000research.11389.2.

    (CDABS contribution: Informatics, programming)