Controlling weather with a hammer?
An acquaintance and long-time expert on analytics technologies once quipped that ‘in another decade, we will be able to control the weather with a hammer.’ Although slightly flippant, the remark summarises in a single sentence the challenges and opportunities for modern analytics to drive productivity and quality of life:
- Enabling an understanding and eventual predictive knowledge and control over extremely complex and ‘big’ data problems
- Connecting this understanding in some way to a very simple, purpose-driven user or automated interface
- Connecting that interface in real-time to drive effective action to improve processes and outcomes
At StatSoft (now a part of Dell), we have observed and, we believe, often led the journey to effective analytics systems and solutions over the past two decades. Here is a list of the top changes and trends that we believe are driving the current and next waves of analytics technologies, enabling solutions that in many ways are more remarkable than ‘a hammer-that-controls-the-weather’ would be.
More data drives new analytic technologies
The prices for collecting and storing more data continue to fall. Today’s big data platforms such as Hadoop allow organisations and individuals to store ‘everything.’ The technologies and solutions for analysing text (‘unstructured data’) are now quite mature. But the ability to collect data will always outstrip the ability to analyse it and to extract meaningful information from it. Analysing high-definition pictures and movies or streaming conversations in real time is the new and current challenge.
What’s next? While data grows exponentially and indefinitely, the information contained in data does not. We believe that statistics will be rediscovered – or, more specifically, new kinds of statistics will be developed that will combine the idea of exploratory mining with proven statistical traditions such as highly multivariate experimental designs. These methods could be repurposed to address the continuing challenge of insufficient computing resources to process data. In fact, one might argue that statistics was invented to allow for inference about data (populations) too large to process entirely; experimental design was invented to query data (real-world observations) to extract the maximum amount of information.
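The idea of drawing inferences about a population too large to process in full can be illustrated with a minimal sketch. It is an illustration only, not a description of any StatSoft product: the ‘population’ is synthetic, the function name `estimate_mean` is hypothetical, and the 95% interval relies on the usual normal approximation.

```python
import math
import random
import statistics

def estimate_mean(population, sample_size, seed=0):
    """Estimate the population mean from a simple random sample.

    Returns (estimate, half_width), where half_width is an approximate
    95% confidence half-width based on the normal approximation.
    """
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    mean = statistics.fmean(sample)
    std_err = statistics.stdev(sample) / math.sqrt(sample_size)
    return mean, 1.96 * std_err

# Hypothetical data: 200,000 synthetic measurements.
rng = random.Random(42)
population = [rng.gauss(100.0, 15.0) for _ in range(200_000)]

# Only 1,000 of the 200,000 values are examined.
estimate, half_width = estimate_mean(population, sample_size=1_000)
print(f"estimated mean: {estimate:.2f} +/- {half_width:.2f}")
```

The sample touches half a percent of the data, yet the interval it produces is typically narrow enough for practical decisions, which is the sense in which information grows more slowly than data volume.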
More data, more models, fewer analysts
The most effective predictive modelling as well as process monitoring (‘quality control’) algorithms are ensembles. In practice, combining large numbers of prediction models, such as neural nets and decision trees, yields the most accurate predictions. In automated manufacturing and high-dimensional process monitoring, large numbers of rules-based, models-based, and uni/multivariate control charts are most effective when one needs to watch tens of thousands of parameters in real time.
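Why averaging many models tends to help can be shown with a small simulation. This is a sketch under a strong, simplifying assumption: each hypothetical ‘model’ predicts the true value plus independent Gaussian noise. Real models make correlated errors, so the gain in practice is usually smaller than shown here.

```python
import random
import statistics

def simulate(n_models, n_trials=2_000, noise=1.0, seed=0):
    """Compare the mean absolute error of one model with an ensemble average.

    Each hypothetical model predicts the true value plus independent
    Gaussian noise (an idealised assumption).
    """
    rng = random.Random(seed)
    true_value = 10.0
    single_errors, ensemble_errors = [], []
    for _ in range(n_trials):
        predictions = [true_value + rng.gauss(0.0, noise) for _ in range(n_models)]
        single_errors.append(abs(predictions[0] - true_value))
        ensemble_errors.append(abs(statistics.fmean(predictions) - true_value))
    return statistics.fmean(single_errors), statistics.fmean(ensemble_errors)

single_mae, ensemble_mae = simulate(n_models=25)
print(f"single model MAE:      {single_mae:.3f}")
print(f"25-model ensemble MAE: {ensemble_mae:.3f}")
```

With independent errors, the error of the average shrinks roughly with the square root of the number of models, which is why the 25-model ensemble comes out several times more accurate than any single member.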
What’s next? The modelling process itself will have to become automated and ‘intelligent.’ It is not realistic that a few experienced modellers will build and maintain hundreds or thousands of prediction models (e.g. for each patient in a hospital, to determine the most promising treatment regimen). Automated, dynamic modelling technologies for building and recalibrating models (as new data become available, sometimes recalibrating and learning in real time) are needed and are emerging.
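One simple form of automated recalibration is a baseline that updates itself with every new observation, with no analyst in the loop. The sketch below uses Welford's algorithm for a running mean and variance and flags values more than three standard deviations from the mean. The class name `OnlineBaseline`, the parameter names, and the three-sigma rule are assumptions chosen for illustration; production systems would use far richer models.

```python
import math

class OnlineBaseline:
    """Running mean and variance (Welford's algorithm) for one monitored parameter."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, value):
        """Fold one new observation into the baseline."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def std(self):
        """Sample standard deviation; 0.0 until two observations exist."""
        if self.count < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.count - 1))

    def is_unusual(self, value, n_sigma=3.0):
        """Flag a value more than n_sigma standard deviations from the mean."""
        if self.count < 2 or self.std == 0.0:
            return False
        return abs(value - self.mean) > n_sigma * self.std

# One baseline per parameter; the parameter names are hypothetical.
baselines = {"temperature": OnlineBaseline(), "pressure": OnlineBaseline()}
for reading in [20.1, 19.8, 20.3, 20.0, 19.9, 20.2]:
    baselines["temperature"].update(reading)

print(baselines["temperature"].is_unusual(20.1))  # within the usual range
print(baselines["temperature"].is_unusual(25.0))  # far outside it
```

Because each update costs constant time and memory, the same structure can be kept for thousands of parameters, or for each patient, without storing the full history.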
Prescriptive analytics, decisions, actionable results
More data enable more models about more outcomes, and the volume of results soon exceeds what people can review interactively. For example, the interactive workflow where engineers would drill down to understand a particular process problem will not scale when there are thousands of parameters to review and drill down on. Likewise, traditional BI charting becomes ineffective when hundreds or thousands of variables need to be considered, or when thousands of micro-segments of customers need to be reviewed. Interactive analyses and reviews must at least be guided and prioritised; however, actions based on predicted outcomes and how to optimise them should ideally be initiated instantly and automatically.
What’s next? The role of automated prescriptive analytics and decision support will become critical. For example, Internet of Things (IoT) technologies will close the feedback loop to individual stakeholders in predictive model prescriptions. A patient may receive an automated text message in the (relatively ‘obvious’) case when sensors and monitors detect undesirable changes in food or water intake, or when critical medication is skipped. But automated prescriptive analytics will go further: it will deliver accurate alerts when complex and highly multidimensional models of a patient’s condition predict impending serious health problems, and these models will prescribe an urgent medical intervention even when simple observation or traditional diagnostic methods do not indicate an emergency.
Regulatory oversight, governance
More data can mean more liability; more models increase the probability of making bad predictions. If flaws in data governance or modelling lead to undesirable outcomes, the public will demand oversight and accountability. There is an advanced analytics revolution in progress in healthcare, where much more individualised predictions and prescriptions can be generated based on more specific, electronically collected and stored patient data. But if the predictions are wrong, results can be dire, and it is critical that the modelling process and the resulting predictions can be justified.
In the pharmaceutical and medical-device industries, all analytics used in the manufacturing process must be documented and validated, and the integrity of the process is carefully scrutinised by independent regulatory bodies (e.g., the FDA in the USA). Similar regulatory oversight guides risk modelling in the financial industries. Going forward, regulatory oversight will become more prevalent as the public demands careful scrutiny of all models that affect our lives.
What’s next? Complete analytics and analytic modelling platforms will need to incorporate features that enable version control, audit logs, and traceability of all actions taken. From a data perspective, data integrity and validity will become critical, which also means that some data that cannot be validated may not be collected anymore, because doing so would pose a liability. If ‘everything’ is stored ‘indefinitely,’ it also becomes discoverable in case a conflict leads to a lawsuit claiming that the defendant ‘should have known…’
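What an audit log with traceability might look like can be sketched in a few lines. The example below is a hypothetical design, not a description of any particular platform: each entry stores the SHA-256 hash of the previous entry, so a later edit to any recorded action breaks the chain and can be detected. The class name `AuditLog` and the field names are assumptions, and timestamps and signatures are omitted for brevity.

```python
import hashlib
import json

class AuditLog:
    """Append-only log in which each entry stores the hash of the previous one."""

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(body):
        # Serialise with sorted keys so the hash is stable across runs.
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def record(self, user, action, model_version):
        """Append an entry linked to the previous entry's hash."""
        prev_hash = self.entries[-1]["hash"] if self.entries else ""
        body = {"user": user, "action": action,
                "model_version": model_version, "prev_hash": prev_hash}
        self.entries.append({**body, "hash": self._digest(body)})

    def verify(self):
        """Return True if the chain of hashes is intact."""
        prev_hash = ""
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev_hash or entry["hash"] != self._digest(body):
                return False
            prev_hash = entry["hash"]
        return True

# Hypothetical users and actions.
log = AuditLog()
log.record("analyst_a", "retrained model", "v2")
log.record("analyst_b", "approved model", "v2")
print(log.verify())  # chain is intact

log.entries[0]["action"] = "deleted model"  # simulate tampering
print(log.verify())  # the altered entry no longer matches its hash
```

A chain like this detects edits to recorded entries but not the removal of the most recent ones, so a real system would also need to anchor the latest hash somewhere outside the log.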
We believe that technology will continue to progress at an accelerated and perhaps exponential rate, contributing to growth of productivity. More data can be collected faster and in real time, stored cheaply, and analysed with large numbers of massively parallelised and distributed algorithms and methods. In short, Big Data enables more models and more predictions, which in turn allows for a more efficient use of resources, less waste and scrap, and safer and more environmentally sustainable businesses and industries.
We believe that the most useful and successful analytics platforms in the future will not only embrace these technologies but will also enable a degree of automation and simplicity from a user perspective, so that the limited capacity of a few decision makers and process experts can be devoted to a few critical areas, while a self-learning, validated, and compliant automated modelling system ensures a continuously improving process.
Our influential technology will continue ‘making the world more productive,’ even more so now that StatSoft has become part of Dell.