People are saying, 'Big Data is the new oil.'

# Data Science

In recent years, the falling cost of digital storage, the increasing move towards online information processing and related technological developments have made it possible for organisations to collect massive amounts of data about their customers, user preferences and processes. As a result of this rapid growth in available information, data science has become a hugely important topic, with growing demand for practitioners across a variety of industries. As data generation and collection continue to grow, the value of data to an industry depends heavily on appropriate understanding and analysis of that data. Consequently, data science and analytics have become a core component of both public and private sector organisations wishing to remain competitive.

Data science is a multidisciplinary field focused on finding actionable insights from large sets of raw and structured data, while data analytics is the science of extracting actionable insight from large amounts of raw data in order to enable better decision making within an organisation. Data science uses many different techniques to obtain answers, incorporating statistics, computer science and predictive analytics to parse massive data sets in an effort to establish solutions to problems that may not yet have arisen. Data analytics, on the other hand, focuses on processing and performing statistical analysis on existing data sets, creating methods to capture, process and organise data to uncover actionable insights for current problems.

#### The data science process

As mentioned previously, data science is not a single discipline but rather the intersection of many disciplines. To understand and appreciate how these seemingly disparate disciplines fit together, it is often best to consider “The Data Science Process”, which has six steps at its core:

- **Frame** the problem: who are you helping? what do they need?
- **Collect** raw data: what data is available? which parts are useful?
- **Process** the data: what do the variables actually mean? what cleaning is required?
- **Explore** the data: what patterns exist? are they significant?
- Perform in-depth **analysis**: how can the past inform the future? to what degree?
- **Discuss** the results: why do the numbers matter? what should be done differently?
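The six steps can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the framing question (does advertising spend relate to weekly sales?), the made-up records in `raw`, and the field names `ad_spend` and `sales`. A simple least-squares line stands in for "in-depth analysis"; real projects would typically use far richer data and methods.

```python
from statistics import mean

# 1. Frame: hypothetical question, "does advertising spend relate to weekly sales?"

# 2. Collect: made-up raw records (illustrative only); None marks a missing value.
raw = [
    {"ad_spend": 1.0, "sales": 3.1},
    {"ad_spend": 2.0, "sales": 4.9},
    {"ad_spend": None, "sales": 6.0},
    {"ad_spend": 3.0, "sales": 7.2},
    {"ad_spend": 4.0, "sales": 8.8},
]

# 3. Process: drop records with missing values.
clean = [r for r in raw if r["ad_spend"] is not None and r["sales"] is not None]
xs = [r["ad_spend"] for r in clean]
ys = [r["sales"] for r in clean]

# 4. Explore: basic summary statistics.
x_bar, y_bar = mean(xs), mean(ys)

# 5. Analyse: ordinary least-squares fit of sales = intercept + slope * ad_spend.
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
slope = sxy / sxx
intercept = y_bar - slope * x_bar


def predict(ad_spend):
    """Estimate sales for a given spend using the fitted line."""
    return intercept + slope * ad_spend


# 6. Discuss: turn the numbers into a statement a decision maker can act on.
print(f"Each extra unit of ad spend is associated with about {slope:.2f} extra units of sales.")
```

Calling `predict(5.0)` extrapolates beyond the observed data, which is exactly the kind of claim the "discuss" step should qualify before anyone acts on it.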

Skills Required | | |
---|---|---|
**1. Frame the problem** | **2. Collect raw data** | **3. Process the data** |
Domain knowledge | Database management | Scripting language |
Business strategy | Querying (un)structured data | Data wrangling and cleaning |
Teamwork | Distributed storage | Distributed processing |
**4. Explore the data** | **5. Perform in-depth analysis** | **6. Discuss the results** |
Scientific computing | Advanced mathematics | Data visualisation |
Inferential statistics | Regression modelling | Data storytelling |
Feature extraction | Machine learning | Business acumen |

#### Data science and analytics programmes in CIT

The Data Science and Analytics programmes in CIT have been designed and developed in collaboration with industry experts to reflect the true interdisciplinary nature of the field which draws from Statistics, Mathematics, Computer Science, Machine Learning and Business Intelligence. With this in mind, particular care has been taken to ensure the development of each strand – Statistics & Mathematics, Computer Science and Data Science – throughout the programmes. Furthermore, the programmes will provide learners with the opportunity to integrate and synthesise the learning acquired in each of these fields, and to apply these skills to real-life problems in the data analytics sphere.