The proliferation of data is changing the way research institutions work in myriad ways – making people and processes more efficient, creating new competitive advantages, and driving scientific advancements.
Research labs are faced with the challenge of analyzing increasingly large sets of data and learning how to process that information. Researchers struggle to recreate experiments and want to find ways to automate and scale their computational resources. These challenges are not unique to the researchers at Clemson University. Clemson’s team was also looking for a way to utilize different cloud providers, both public and private, while having visibility into a realistic budget for scaling up and down.
At Clemson University, the Computational Biology Lab, spearheaded by Professor Alex Feltus, integrated Cisco HyperFlex and Cisco Container Platform (CCP) to expand its data processing capabilities and accelerate research. CCP is helping Clemson transform the way its researchers conduct experiments, share data, and analyze massive data sets.
“Approximately 20 percent of the time it takes to set up a new environment is spent teaching how to configure the environment, and that is removed by using containers,” Feltus says. By reducing the time needed to set up environments to perform experiments and run data sets, researchers have more time to focus on collecting and analyzing data.
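To illustrate the point about containers removing environment configuration, here is a minimal sketch of launching a pre-built analysis environment with the Docker SDK for Python. The image name, command, and paths are hypothetical placeholders, not Clemson's actual workflow; the idea is simply that every dependency ships inside the image, so a collaborator runs the experiment without configuring anything by hand.

```python
# Illustrative sketch only: run a pre-configured analysis container so that
# no per-machine environment setup is needed. Image, command, and paths are
# hypothetical examples, not the lab's real pipeline.
import docker

client = docker.from_env()

logs = client.containers.run(
    image="example-lab/rna-seq-pipeline:1.0",   # hypothetical pre-built image
    command="snakemake --cores 8",              # hypothetical pipeline entrypoint
    volumes={"/data/fastq": {"bind": "/input", "mode": "ro"}},  # mount raw reads read-only
    remove=True,                                # clean up the container when it exits
)
print(logs.decode())
```

On a Kubernetes-based platform such as CCP, the same pre-built image would typically be submitted as a job to the cluster rather than run locally, but the benefit is the same: the environment travels with the container.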
The engineers in the lab adapted fairly quickly to using CCP, as Cisco provided key training to roll out the platform. The lab has hosted additional training sessions and will continue to do so to spread awareness of the benefits of using containers in a lab environment.
Since implementing CCP, the Computational Biology Lab has been able to focus on driving results and providing access to more data, rather than learning and managing multiple cloud GUIs, leading to greater research success.
For example, with CCP, the lab analyzed thousands of data sets focused on DNA sequencing to research treatment options for a specific and rare cancer. During a hackathon with other labs, the Computational Biology Lab compiled a specific DNA sequence of a kidney tumor and compared it to other data sets in order to better understand how the tumor came to be and what treatment options may combat it. CCP made this possible by allowing disparate labs and systems to work together and seamlessly share results and findings.
The common workflow and common platform for data, enabled by CCP, allows research to scale rapidly, which is crucial as data is now being processed at the petascale. For example, the sequence data for one person or tumor might consume a few gigabytes. Analyzing that data, multiplied across the human population, has the potential to significantly impact scientific outcomes and cures. The compute and storage needed for that type of processing grow enormously with cohort size, which is why CCP has been a tremendous tool, removing several key barriers to scale and collaboration.
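A rough back-of-the-envelope calculation makes the scaling concrete; the per-sample size and cohort size below are illustrative assumptions, not figures from the lab.

```python
# Purely illustrative scaling estimate: a few gigabytes per genome reaches
# the petabyte range across a large cohort.
GB_PER_SAMPLE = 5                  # assumed size of one person's sequence data
samples = 1_000_000                # hypothetical one-million-person cohort

total_gb = GB_PER_SAMPLE * samples
total_pb = total_gb / 1_000_000    # 1 PB = 1,000,000 GB (decimal units)
print(f"{total_pb:.1f} PB")        # -> 5.0 PB
```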
Another benefit of using CCP is that data sets are available to anyone, not just elite consortiums. “You don’t need to be at an advanced institution to access data these days. CCP allows a smaller lab to access all NIH data – it’s basically a Netflix of research,” Feltus notes.
With so much data so readily available, Feltus sees the potential for implementing CCP at more educational labs, from large universities to community colleges. The key is getting data to more researchers for the best chance at yielding actionable results.
As a timely example, the genome of the new coronavirus (COVID-19) was processed in a matter of days, and clinical trials for vaccines are already underway. When the swine flu or SARS first made headlines, researchers didn’t have the technology to move as quickly with mapping those genomes.
Feltus believes the future of lab experimentation is containerization: “Everyone is moving to containers, or preparing to. Anyone doing high-performance computing should be using containers.”