Quick Guide
What is AI and data science all about? How do they differ? And how can you use these technologies to enhance your business or organisation?
Understanding and Utilizing Information
Data is produced, collected and stored by humans and machines through all kinds of channels. By data, we mean 'information' in the widest sense. Think, for instance, of questionnaire data, (social) media data, register data, sports statistics, financial data, sales figures, email correspondence, meeting minutes, websites, photos, video material, browser histories, sensor data, machine performance data, and open data on weather, traffic, employers, the road network, etc. Much of this information is proprietary, but governments and scientific institutions in particular publish a lot of 'open data' that is free to use, provided the source is attributed.
The Power of Data Science
Data Science is an umbrella term for various subdisciplines, including machine learning and statistics as well as certain aspects of computer science, such as algorithms, data storage, and web application development. Data science solutions often involve at least a little bit of AI. This also distinguishes Data Science from traditional data analysis: the data is used to make predictions and/or to provide input for independently operating systems (see also Predictive / Prescriptive analyses). Data Science is also a practical discipline that requires an understanding of the field to which it is applied, such as business or science: its purpose (its "added value"), its underlying assumptions and its limitations. Data Science is the art of extracting, from a huge amount of data (often combined from different sources), precisely the information that is relevant to business operations. This information can then be used to guide predictions and decisions, which makes organisations and governments smarter and more efficient.
Artificial intelligence
Artificial Intelligence refers to "systems that show intelligent behaviour by analysing their environment and -with a degree of autonomy- taking action to achieve specific goals". To achieve this, Machine Learning is applied. Artificial Intelligence sets itself apart through two characteristics: adaptivity and autonomy. Adaptivity is the ability to improve performance by learning from experience. The system gains this 'experience' by being trained on data. Autonomy is the ability to perform tasks in complex environments without constant guidance from a user. To apply AI, the Data Science life cycle must be completed. Conversely, AI is an important tool in the data scientist's toolbox.
Predictive / Prescriptive analyses
More available data and more advanced, faster tooling make it possible to process and combine large amounts of data. As a result, many variables can be included in analyses to create new insights, which makes predictive analyses possible. Which new product will be the most promising? What will the sales figures be? The latest developments also enable prescriptive analyses, in which algorithms and AI are used to arrive at the best solution automatically and to act on it independently.
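To make the difference between the two kinds of analysis concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration only: the monthly sales figures, the unit cost and price, and the function names forecast_next and best_order are hypothetical, and a straight-line trend is just one simple way to predict. The predictive step estimates next month's demand; the prescriptive step then picks the order quantity with the highest expected profit.

```python
from statistics import mean

# Hypothetical monthly unit sales (made up for this sketch).
sales = [100, 110, 120, 130, 140]


def forecast_next(history):
    """Predictive step: extend a least-squares straight line by one period."""
    n = len(history)
    xs = range(n)
    x_mean, y_mean = mean(xs), mean(history)
    slope = (
        sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, history))
        / sum((x - x_mean) ** 2 for x in xs)
    )
    # The next period has index n, so project the line to x = n.
    return y_mean + slope * (n - x_mean)


def best_order(expected_demand, options, unit_cost=6.0, unit_price=10.0):
    """Prescriptive step: choose the order quantity with the highest profit."""

    def profit(quantity):
        # Revenue is capped by demand; every ordered unit costs money.
        return unit_price * min(quantity, expected_demand) - unit_cost * quantity

    return max(options, key=profit)


demand = forecast_next(sales)
order = best_order(demand, options=[120, 140, 150, 160, 180])
```

With these made-up numbers the forecast is 150 units and the recommended order is also 150: ordering less leaves sales on the table, ordering more leaves unsold stock.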
Machine learning
Machine Learning is all about systems that get better at a task as they get more experience or data. UC Berkeley breaks down the learning system of a machine learning algorithm into three main parts.
A Decision Process: Machine learning algorithms are typically used to create a prediction or classification. Based on some input data, which can be labeled or unlabeled, your algorithm will produce an estimate about a pattern in the data.
An Error Function: An error function evaluates the model's prediction. If known examples are available, the error function can compare the prediction against them to determine the model's accuracy.
A Model Optimization Process: If the model can fit the data points in the training set better, its variables are adjusted to narrow the gap between the known examples and the model's predictions. The algorithm repeats this "evaluate and optimise" process, updating the variables autonomously until the accuracy criterion has been met.
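As a minimal sketch of these three parts, the Python below fits a straight line to a handful of made-up data points. The data, the function names predict, mean_squared_error and train, the learning rate and the tolerance are all assumptions chosen for illustration, and gradient descent is only one of many possible optimisation methods.

```python
# Hypothetical toy data: points that roughly follow y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]


def predict(w, b, x):
    """Decision process: turn an input into an estimate."""
    return w * x + b


def mean_squared_error(w, b):
    """Error function: compare predictions with the known examples."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(learning_rate=0.05, tolerance=0.05, max_steps=10_000):
    """Model optimisation: adjust w and b until the error is small enough."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(max_steps):
        # Evaluate: stop once the accuracy criterion has been met.
        if mean_squared_error(w, b) <= tolerance:
            break
        # Optimise: move w and b a small step against the error gradient.
        grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (predict(w, b, x) - y) for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


w, b = train()
```

After training, w ends up close to 2, the slope the toy data was built around. Real projects would normally rely on an established library instead of hand-written updates, but the loop shows how "evaluate and optimise" repeats without human intervention.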
Robotics
Robotics is the science of creating and programming robots that can function in realistic environments. In a way, robotics presents the most complex challenge in the field of AI. This is because it brings together all sub-fields of AI. For example:
image and speech recognition for interpreting the environment;
natural language processing, information extraction, and reasoning under uncertainty for processing directions and predicting the consequences of possible actions;
development strategies and artificial emotional intelligence (systems that respond to human expressions of feeling or mimic feelings) for interacting and collaborating with humans.
The data science life cycle
The data science life cycle is a methodology for data science projects, developed by Microsoft as part of its Team Data Science Process (TDSP). The model gives an overview of the processes in a data science project, covering the entire field. Organisations and data scientists can use it to organise their data science projects. The life cycle is divided into five stages:
Understanding the industry, domain, and problem definition. This involves translating the organisation’s goals or challenges into the goals of a data-related project or programme.
Data collection and understanding. In this phase, datasets are made available, analysed for potential information value, cleaned if necessary, and prepared for the workflow. They may include both internal and external datasets. Exploratory models and visualisation tools are already used at this point to evaluate the importance of the information.
Modelling. The actual analysis can begin once the essential data is fully available. To that end, models are created, trained, and improved. This is a continuous cycle.
Implementation. Working models are linked to current processes in this phase, such as dashboards, spreadsheets, or back-end apps.
Customer acceptance. In this phase the results are handed over to the customer, which involves validating whether they meet the previously set goals.
The arrows between the stages point in both directions, which shows that the life cycle is iterative.
Programming languages
Working with data requires programming languages, and R and Python are commonly used. R has its roots in statistics and is therefore especially well suited to rigorous statistical analysis. Python is more general-purpose and is therefore used across the full range of Data Science.
Data Science & AI: Applications, Impacts, and Considerations
Applications
Data Science & AI are applied in many sectors and to a wide variety of issues. For example, data can contribute to business efficiency in horticulture, and smart sensors can provide information on air quality or noise levels. Data Science & AI also play a role in crime prevention, text analysis for finding the right information in a law office, customer profiling for marketing agencies, speech recognition for call centres, traffic jam prevention for road authorities, optimal positioning of ambulances in the region for hospitals, preventive maintenance of expensive machinery in industry, more targeted treatment of illness in the medical sector, and so on.
Impacts
Using methods, algorithms and technologies, you can generate insights from data that allow you to discover important patterns and make predictions. This data and the insights gained from it are used to support decision-making, management, governance and policy in a world that, from a human perspective, is chaotic. With the help of Data Science you can lighten the workload of employees, perform maintenance in a more timely manner, or develop autonomous robots for dangerous or dull jobs.
Considerations
One risk of deploying AI deserves specific attention: the legal and ethical implications. What are companies allowed to do with data, and what are the risks involved in using algorithms and datasets? In practice, things can go wrong, and the cause often lies in the underlying dataset and in human error. If the data is not sufficiently representative and objective, the system will develop a bias during 'learning'. That can have serious consequences, especially when the algorithms and systems are so large and opaque that no one understands what is happening. For instance, the Dutch childcare benefits affair showed what can go wrong when certain groups of people are profiled: human biases and prediction errors, amplified by powerful technology, led to serious abuses.
Data Project Success
A clear picture of the project's aim and the availability of high-quality data are the primary factors that determine whether a data project succeeds or fails. The availability of high-quality data in particular is frequently a stumbling block. So, before beginning a data science project as an organisation, it is critical to examine the data management:
1. Is the data available and complete?
2. Is the data free of contaminated or unnecessary records?
3. Is the data format readable?
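The three questions above can partly be checked automatically. The following Python sketch is one hedged way to do so for a small CSV extract; the sample data, the column names order_id, region and amount, and the function name audit are hypothetical and would need to be adapted to your own datasets.

```python
import csv
import io

# Hypothetical sample; in practice this would come from your own systems.
RAW = """order_id,region,amount
1001,North,250.00
1002,,99.50
1003,South,abc
1001,North,250.00
"""


def audit(text, required=("order_id", "region", "amount")):
    """Count simple data-quality problems in CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    report = {
        # Readable format: are the expected columns present at all?
        "missing_columns": [c for c in required if c not in header],
        # Completeness: rows with an empty required field.
        "incomplete_rows": 0,
        # Contamination: amounts that are not numbers.
        "invalid_amounts": 0,
        # Unnecessary data: rows that repeat an earlier row.
        "duplicate_rows": 0,
    }
    seen = set()
    for row in reader:
        values = tuple(row.get(c) or "" for c in required)
        if "" in values:
            report["incomplete_rows"] += 1
        try:
            float(row.get("amount") or "")
        except ValueError:
            report["invalid_amounts"] += 1
        if values in seen:
            report["duplicate_rows"] += 1
        seen.add(values)
    return report


report = audit(RAW)
```

For the sample above, the report shows no missing columns, one incomplete row, one invalid amount and one duplicate row. Checks like these do not replace a proper review of data management, but they give a quick first impression of data quality.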
Defining project goals and expectations
Once the points above have been settled, a data project begins with setting the goal: what do you want to do better or differently?
It is also important to set realistic expectations: what can and cannot be done, and how much certainty can a prediction offer?
Next, gather the relevant data: what data is required? Also consider potentially useful data from external sources, such as StatLine.
With a clearly defined goal and the right data available, you can get to work. The Hague University of Applied Sciences is happy to help, whether you are just getting started with AI or want to take the next step. Contact us or read further.
Advanced Tech Education
The Hague University of Applied Sciences offers both full-time and part-time programs in the fields of AI and Data Science. We also host regular meetings and masterclasses; keep an eye on our news & events for more information.