The HSHSL is a part of the University of Maryland, Baltimore

601 West Lombard Street
Baltimore MD 21201-1512

Reference: 410-706-7996
Circulation: 410-706-7928


Center for Data and Bioinformation Services

Communications and Contributions

Follow our blog for CDABS updates, information about data- and bioinformation-related opportunities and events at UMB and beyond, and in-depth looks at useful tools and resources. Check out our research contributions (including supporting data and code) and presentations to see how we are enhancing the field of data and bioinformation.

Blog Posts

Understanding Algorithmic Bias
Posted on Tuesday, November 8, 2022

DABS: DATA AND BIOINFORMATION STUFF

Algorithm. Machine Learning. Artificial Intelligence. We might feel like we are encountering these terms more and more every day, and yet they can remain mysterious and intimidating to those who do not work directly with them. What should we make of these tools: flashy buzzwords, magic bullets for solving any problem, or a dystopian nightmare in the making? Perhaps a mix of all three, or somewhere in between.

First, let's unpack these related terms (definitions from the NNLM Data Glossary):

  • Algorithm - "a set of instructions that is designed to accomplish a task. Algorithms usually take one or more inputs, run them systematically through a series of steps, and provide one or more outputs."
  • Artificial Intelligence - "actions that mimic human intelligence displayed by machines and to the field of study focused on this type of intelligence. AI consists of computer programs that are typically built to adaptively update and enhance their own performance over time."
  • Machine Learning - "a type of Artificial Intelligence. Machine Learning involves sophisticated algorithms which can be trained to sort information, identify patterns, and make predictions within large sets of data."

Algorithms are all around us - deciding things like which Facebook posts we see, what route our GPS takes, and which results come up first in a Google search. Algorithms are part of decision-making software in domains such as law enforcement, health care, finance, and human resources.

While algorithms and machine learning solutions can seem like magic, it is important to keep in mind that they are built by humans and based on existing data that is often flawed and incomplete. What happens when the data used to build an algorithm is based on outdated, racist, and/or sexist policies? What if the algorithm cannot be validated because the company that owns it either does not know how it works itself, or does not want others to know? What if you are contributing data to an algorithm without knowing it?
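To make the first of these questions concrete, here is a minimal Python sketch. It is not taken from the film or from any real system: the two groups, the counts, and the function names `train` and `predict` are all made up for illustration. The "model" simply learns the most common past decision for each group, which is enough to show how a rule built on skewed historical records repeats that skew for new cases.

```python
from collections import Counter, defaultdict

# Hypothetical historical decisions as (group, approved) pairs.
# The numbers are invented: under an assumed past policy, group "B"
# was approved far less often than group "A".
history = (
    [("A", True)] * 80 + [("A", False)] * 20
    + [("B", True)] * 30 + [("B", False)] * 70
)


def train(records):
    """Learn, for each group, the most common past outcome."""
    outcomes = defaultdict(Counter)
    for group, approved in records:
        outcomes[group][approved] += 1
    return {
        group: counts.most_common(1)[0][0]
        for group, counts in outcomes.items()
    }


def predict(model, group):
    """Apply the learned rule to a new case from the given group."""
    return model[group]


model = train(history)
print("Group A approved?", predict(model, "A"))
print("Group B approved?", predict(model, "B"))
```

Nothing in the code mentions a policy or an intent to discriminate, yet every new case from group "B" is rejected because that is what the historical records contained. Real machine learning systems are far more complex, but the same dependence on training data applies.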

Join us for a Coded Bias Virtual Discussion Event

If these issues interest you, join CDABS and the HSHSL Diversity and Inclusion Committee for a virtual discussion of the film Coded Bias. Coded Bias, directed by Shalini Kantayya, explores the fallout of MIT Media Lab researcher Joy Buolamwini's startling discovery that facial recognition does not see dark-skinned faces and women accurately, and her journey to push for the first-ever legislation in the U.S. to govern against bias in the algorithms that impact us all.

Register below for one of two facilitated 90-minute discussion sessions. We’ll discuss what it means to create artificial intelligence technologies and algorithms that do not encroach upon the civil liberties of people of color, and how this question ties into broader conversations around health equity and social justice. What has the field of Artificial Intelligence gotten right so far and in what direction(s) should it head in the future? Registered participants will receive a link to view the film between November 9th and 16th.

Space for this event is limited – sign up now!

Register for Discussion Session 1, Tues. Nov. 15 from 12:00-1:30 PM

Register for Discussion Session 2, Fri. Nov. 18 from 12:00-1:30 PM

Questions? Contact: Amy Yarnell, data services librarian, at data@hshsl.umaryland.edu.


The Center for Data and Bioinformation Services (CDABS) is the University of Maryland Health Sciences and Human Services Library hub for data and bioinformation learning, services, resources, and communication.

Sign up to get DABS delivered to your email or RSS feed.


Open Access Week Challenge 2022
Posted on Thursday, October 20, 2022

The HSHSL is celebrating International Open Access (OA) Week with a 5-day challenge designed to improve discovery of your scholarly work! Each day we will email you with a brief but meaningful activity, including:

  • Creating an ORCID iD and connecting it to your Scopus author profile
  • Locating OA journals in your research area
  • Learning about repositories and data sharing
  • Using My NCBI

If you complete any of the activities and fill out our evaluation survey at the end of the challenge, you will be entered to win some library swag!

Sign up to receive daily challenges from October 24 - 28, 2022


Learn R this Fall with CDABS
Posted on Tuesday, August 23, 2022

The Center for Data and Bioinformation Services (CDABS) will be holding R workshops on the last two Wednesdays (21st & 28th) and Thursdays (22nd & 29th) in September. All sessions run from 1:00pm to 4:00pm and meet online. Space is limited, so register now!

R is an open-source programming language that is ideal for working with statistics and data. Here at CDABS, we love R for many reasons: It's free, flexible, and friendly! It's also a great tool for creating reproducible data analyses and visualizations.

In this series we'll start with the basics of R and the RStudio environment, move to more complex data wrangling and visualization tasks, and finally look at the extended R ecosystem and tools for sharing your work with interactive reports, notebooks, and applications. Sign up for the whole series, or just the sessions that interest you most, but be advised that the later sessions will require at least a little familiarity with R.

See full session descriptions below and register here:

September 21: Introduction to R and RStudio

This session will provide a solid foundation in working with R and RStudio and lay the groundwork to enable participants to explore more advanced topics in R programming. No experience with R or programming is required.

Topics covered will include:

  • Navigating the RStudio interface, installing packages, getting help
  • Naming and working with objects
  • Using functions
  • Identifying R data types and structures
  • Working with scripts

September 22: Data Wrangling with R -- Introduction to the Tidyverse

This session will introduce the concept of “tidy” data and the versatile collection of packages known as the Tidyverse. Participants will get hands-on experience wrangling real datasets.

Topics covered include:

  • Loading data from external files
  • Subsetting data
  • Transforming data from wide to long
  • Working with dates
  • Joining multiple datasets

Prerequisites: Session one in this series, or previous experience with base R.

September 28: Data Visualization in R with ggplot2

Learn how to use ggplot2, a robust Tidyverse package used to create high-quality graphics for exploring and communicating your data. We will go beyond basic graphs and learn how to customize and annotate our graphs for more effective storytelling. Participants will have the best experience if they attended session two in this series or have some previous experience with R and the Tidyverse.

Topics covered include:

  • Visualization best practices
  • Grammar of graphics - ggplot2 layers, aesthetics, and geoms
  • Choosing an effective graph type for your data
  • Customizing labels, axes, legends, and more
  • Choosing a color palette and themes

Prerequisites: Session two in this series, or previous experience with R.

September 29: Introduction to Reproducible Research and Interactive Data Applications in R

This session will provide a high-level overview of the vast ecosystem in R for reproducible research and creating interactive data visualizations. Users will learn about version control and the packages available in R for creating reports, online books, and even blogs. There will also be an introduction to creating data applications and dynamic dashboards using the Shiny package in R. Participants will have the best experience if they have some familiarity with R syntax and the RStudio interface.

Topics covered will include:

  • Version control with Git
  • Integrating RStudio and GitHub for project data and code management and version control
  • Writing reproducible research reports that combine code and prose with R Markdown
  • Sharing your work on the web with bookdown/blogdown
  • Interacting with, analyzing, and communicating your data with Shiny

Prerequisites: Session one in this series, or previous experience with R.

Questions? Contact: Amy Yarnell, data services librarian, and Jean-Paul Courneya, bioinformationist, at data@hshsl.umaryland.edu.



Learn R this summer with CDABS
Posted on Friday, June 3, 2022

The Center for Data and Bioinformation Services (CDABS) will be holding an R workshop every Thursday in July. All sessions run from 12:00pm to 3:00pm and meet online. Space is limited, so register now!

R is an open-source programming language that is ideal for working with statistics and data. Here at CDABS, we love R for many reasons: It's free, flexible, and friendly! It's also a great tool for creating reproducible data analyses and visualizations.

In this series we'll start with the basics of R and the RStudio environment, move to more complex data wrangling and visualization tasks, and finally look at the extended R ecosystem and tools for sharing your work with interactive reports, notebooks, and applications. Sign up for the whole series, or just the sessions that interest you most, but be advised that the later sessions will require at least a little familiarity with R.

See full session descriptions below and register here:

July 7: Introduction to R and RStudio

This session will provide a solid foundation in working with R and RStudio and lay the groundwork to enable participants to explore more advanced topics in R programming. No experience with R or programming is required.

Topics covered will include:

  • Navigating the RStudio interface, installing packages, getting help
  • Naming and working with objects
  • Using functions
  • Identifying R data types and structures
  • Working with scripts

July 14: Data Wrangling with R -- Introduction to the Tidyverse

This session will introduce the concept of “tidy” data and the versatile collection of packages known as the Tidyverse. Participants will get hands-on experience wrangling real datasets.

Topics covered include:

  • Loading data from external files
  • Subsetting data
  • Transforming data from wide to long
  • Working with dates
  • Joining multiple datasets

Prerequisites: Session one in this series, or previous experience with base R.

July 21: Data Visualization in R with ggplot2

Learn how to use ggplot2, a robust Tidyverse package used to create high-quality graphics for exploring and communicating your data. We will go beyond basic graphs and learn how to customize and annotate our graphs for more effective storytelling. Participants will have the best experience if they attended session two in this series or have some previous experience with R and the Tidyverse.

Topics covered include:

  • Visualization best practices
  • Grammar of graphics - ggplot2 layers, aesthetics, and geoms
  • Choosing an effective graph type for your data
  • Customizing labels, axes, legends, and more
  • Choosing a color palette and themes

Prerequisites: Session two in this series, or previous experience with R.

July 28: Introduction to Reproducible Research and Interactive Data Applications in R

This session will provide a high-level overview of the vast ecosystem in R for reproducible research and creating interactive data visualizations. Users will learn about version control and the packages available in R for creating reports, online books, and even blogs. There will also be an introduction to creating data applications and dynamic dashboards using the Shiny package in R. Participants will have the best experience if they have some familiarity with R syntax and the RStudio interface.

Topics covered will include:

  • Version control with Git
  • Integrating RStudio and GitHub for project data and code management and version control
  • Writing reproducible research reports that combine code and prose with R Markdown
  • Sharing your work on the web with bookdown/blogdown
  • Interacting with, analyzing, and communicating your data with Shiny

Prerequisites: Session one in this series, or previous experience with R.

Questions? Contact: Amy Yarnell, data services librarian, and Jean-Paul Courneya, bioinformationist, at data@hshsl.umaryland.edu.



Citation List

  • A Model for Centralizing Data and Bioinformation Services at the Health Sciences and Human Services Library. Courneya JP, Yarnell A. Project Briefing presented at: Coalition for Networked Information (CNI) Fall Membership Meeting; 2020 Nov 10 - Dec 15; Virtual.

    (CDABS primary work)

  • Creating campuswide engagement opportunities with library professionals through promotion of Love Data Week 2020. Yarnell A, Courneya JP. Poster presented at: Medical Library Association, Mid-Atlantic Chapter (MAC/MLA) Annual Meeting; 2020 Oct 19-21; Virtual.

    (CDABS primary work)

  • Delayed microglial depletion after spinal cord injury reduces chronic inflammation and neurodegeneration in the brain and improves neurological recovery in male mice. Li Y, Ritzel RM, Khan N, Cao T, He J, Lei Z, Matyas JJ, Sabirzhanov B, Liu S, Li H, Stoica BA, Loane DJ, Faden AI, Wu J. Theranostics. 2020 Sep 14;10(25):11376-11403. doi: 10.7150/thno.49199.

    (CDABS contribution: Functional gene enrichment analysis)

  • Full-length IL-33 regulates Smad3 phosphorylation and gene transcription in a distinctive AP2-dependent manner. Luzina IG, Fishelevich R, Hampton BS, Courneya JP, Parisella FR, Lugkey KN, Baleno FX, Choi D, Kopach P, Lockatell V, Todd NW, Atamas SP. Cell Immunol. 2020 Nov;357:104203. doi: 10.1016/j.cellimm.2020.104203. Epub 2020 Sep 2.

    (CDABS contribution: Bioinformatics, Visualization)

  • High-performance computing service for bioinformatics and data science. Courneya JP, Mayo A. J Med Libr Assoc. 2018 Oct;106(4):494-495. doi: 10.5195/jmla.2018.512. Epub 2018 Oct 1.

    (CDABS primary work)

  • High-performance computing service in the Health Science and Human Services Library at University of Maryland Baltimore. Mayo A, Courneya JP. [version 1; not peer reviewed]. F1000Research. 2018;7(ISCB Comm J):1089. doi: 10.7490/f1000research.1115828.1.

    (CDABS primary work)

  • PubRunner: A light-weight framework for updating text mining results. Anekalla KR, Courneya JP, Fiorini N, Lever J, Muchow M, Busby B. F1000Res. 2017 May 2;6:612. doi: 10.12688/f1000research.11389.2.

    (CDABS contribution: Informatics, programming)