Bioinformatics: Juggling Big Data, Complex File Types and Multiple Projects

Genomics data sets are highly sophisticated and multi-layered. Now, possibly more than ever before, we need those data sets to be publicly available, reproducible and transparent. It’s not an easy task but researchers around the globe are finding a way to make it happen…

Eva Chan, bioinformatician at Garvan Institute of Medical Research, is an ace at making sense of the massive genomic data sets her research group works with.

An important part of her work, she says, is reproducibility. For her analysis to add value, every step must be carefully documented. “Choosing parameters and remembering them” is crucial in bioinformatics, and Eva manages to do this while handling big data, complex file types and multiple projects all at the same time.

Eva Chan (left) and Vanessa Hayes (right) conduct genomic research at Garvan Institute of Medical Research in Sydney, Australia.

A lot of data

Before analysis even begins, genomics data is highly sophisticated and multi-layered. To make things even more complex, several different people handle the data that Eva works with, and she works on more than one project at once. Intricate data and intricate workflows call for a high level of organization. Eva conducts her day-to-day work within LabArchives to keep track of it all.

Sometimes, Eva will spend an entire day analyzing trends and processing raw data. When she’s immersed in her work, she might not talk to anyone else for the entire day. When she does need to collaborate with team members, she can do it via LabArchives in a matter of seconds.

Eva shares links to her work with collaborators and can share big files with them via LabArchives when needed. The files she works with range from standard to highly specific and nearly all of them can be stored and shared with LabArchives.

Complex file types and DOIs

About a year ago, Eva needed to publish a data set that she’d generated with a piece of optical mapping technology. This technology allows her to identify large-scale genomic rearrangements that are often difficult to “see” from genome sequencing data alone. Sound complicated? It was.

Because the data was referenced in a paper her group was trying to publish, Eva needed to make the data publicly available. Unfortunately, NCBI and other commonly used data repositories didn’t support the file type. LabArchives, however, did. 

Eva created a public DOI within LabArchives to share the data set, which any member of the public can now access via the paper. It was a pretty simple solution for a pretty complicated data set.

Reproducibility

Eva often has to repeat certain types of analysis many times over. She creates templates in LabArchives to speed up the process. It’s the long-term reproducibility of her data, however, that matters most to her.

Eva uses LabArchives to jot down quick ideas and notes for herself so that, down the track, she can remember why she chose one type of analysis over another, for instance. She records her movements and thought processes alongside her analysis so that every bit of work she does has context. Later, when she needs to refer back to that context, she can find it quickly in LabArchives.

The search function allows Eva to quickly locate data months (or even years) after she’s done working with it. All of her work is documented and indexed with LabArchives’ revision history feature, so she never has to worry about losing data. If one of her group’s papers is reviewed in the future, all data tied to it is securely stored with attribution in LabArchives. Better still? Many of the group’s papers link directly back to the cited work within LabArchives for complete transparency.

Managing complex data and workflows isn’t an easy task, but Eva’s strategies make a reproducible, collaborative and transparent future for all research a very achievable goal.
