In the past decade, there has been an explosion of healthcare-related data. With the digitalisation of medical records, the increasing affordability of molecular testing, the advent of medical informatics, and the widespread use of wearables, the sheer volume of data available for analysis is staggering.
“Big data” refers to large and complex datasets generated by a wide range of sources. These datasets are typically characterised by their large volume, high velocity, and extensive variety, which makes them difficult to store, process, and analyse using traditional data management and analysis tools. Big data can be structured, unstructured, or semi-structured. Structured data is organised in a specific, predefined format, such as tables, spreadsheets, and databases; examples include clinical and financial data. Unstructured data has no predefined format; examples include free text, images, and videos. Semi-structured data has some organising structure, such as tags or key-value pairs, but is not as rigid as structured data; JSON and XML documents are common examples.
Some argue that “big” is no longer the most relevant measure and that what matters is how “smart” the data are, that is, what insights the volume of data can reasonably provide. This aspect is fundamental in the health sector. The potential of big data to improve health is enormous. However, that value is unlocked only when the data are used to drive decision making. To enable such evidence-based decision making, efficient processes are needed to analyse high volumes of data and turn them into meaningful insights. Because of the complexity and diversity of the data, as well as the computational power and storage required to handle them, the analysis of big data often requires specialised software, infrastructure, and expertise. How can we bridge the gap between the collected data and our understanding and knowledge of human health? This is the role of “data science”.