Pushing the envelope
Nvidia is partnering with Deutsche Post DHL to deploy a fleet of autonomous vehicles starting in 2018.
Gemma Church wonders if driverless cars and big data are quietly overhauling modelling and simulation.
High performance computing and big data analytics are natural bedfellows. It is important that both can move data, exchange messages and analyse computed results from thousands of parallel processes fast enough to keep those computing resources running at peak efficiency.
But for the true power of big data analytics to be harnessed inside a powerful HPC architecture, we need to integrate machine learning techniques into our approach. This will open the floodgates to a range of new applications, as Fatma Kocer, vice president of design exploration at Altair, explained: ‘We are now looking into using machine learning in applications where we do not have viable physics-based solutions. These could be applications for which no physics-based solution exists. They can also be ones that have a physics-based solution that is resource-intensive, especially compared to the data-based solutions.’
But big data is so different to traditional data sets that it forces us to change our simulation and modelling strategies. Kocer said: ‘Now, we have big data whereas, in the past, we had to work with barely enough data. Now, the data is coming from the field instead of coming from controlled design of experiments. As a result, there is noise that needs to be filtered; there are errors that need to be cleaned; there are attributes that need to be ignored for successful applications of machine learning techniques.’
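The filtering, cleaning and attribute-pruning steps Kocer lists can be sketched in a few lines. Everything below is invented for illustration: the sensor readings, the `-999` error sentinel and the column names are hypothetical, not Altair's data.

```python
import numpy as np
import pandas as pd

# Hypothetical field data: noisy sensor readings, an error code,
# and an attribute that carries no physical signal.
df = pd.DataFrame({
    "temperature": [20.1, 19.8, -999.0, 20.5, 21.0, 20.2],  # -999.0 = sensor error code
    "vibration":   [0.02, 0.03, 0.02, 8.50, 0.03, 0.02],    # 8.50 is a noise spike
    "device_id":   ["a1", "a2", "a3", "a4", "a5", "a6"],     # attribute to ignore
    "target":      [1.0, 1.1, 0.9, 1.2, 1.0, 1.1],
})

# 1. Clean errors: treat sentinel values as missing and drop those rows.
df = df.replace(-999.0, np.nan).dropna()

# 2. Filter noise: drop readings far from the median, measured in units of
#    the median absolute deviation (robust to the outlier itself).
med = df["vibration"].median()
mad = (df["vibration"] - med).abs().median()
df = df[(df["vibration"] - med).abs() <= 10 * mad]

# 3. Ignore attributes irrelevant to the physics before training.
features = df.drop(columns=["device_id", "target"])
```

The median-based filter is used here instead of a standard-deviation cut because, on small samples, a single large spike inflates the standard deviation enough to hide itself.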
‘So, in short, the challenge is not working in the machine learning space as a simulation company; the challenge is to move from a controlled, small data set to big real-time field data,’ she added.
Working with big data
Until recently, the majority of big data applications have been based on conventional modelling and simulation techniques. However, we are just starting to see small breakthroughs in this area. For example, IBM Research recently overcame a key technical limitation: many deep learning frameworks do not run efficiently across multiple servers and can struggle to scale beyond a single node. Using new distributed deep learning software, it achieved record-low communication overhead and 95 per cent scaling efficiency.
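Scaling efficiency itself is a simple ratio: measured multi-node throughput against N times the single-node throughput. A minimal sketch, using illustrative figures rather than IBM's actual measurements:

```python
def scaling_efficiency(throughput_n_nodes, throughput_one_node, n_nodes):
    """Fraction of ideal linear speed-up actually achieved."""
    return throughput_n_nodes / (n_nodes * throughput_one_node)

# Hypothetical example: one node trains at 100 images/s, and 256 nodes
# together reach 24,320 images/s instead of the ideal 25,600 images/s.
eff = scaling_efficiency(24_320, 100, 256)
print(f"{eff:.0%}")  # 95%
```

Anything lost from 100 per cent is largely the cost of moving data and exchanging gradients between nodes, which is why communication overhead is the figure of merit.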
Altair has also used machine learning techniques to create predictive models for design optimisation. ‘We have users that have no simulation models but they have data from testing. They use our products to create predictive models from these data sets to optimise their designs. We also have users with resource-intensive simulations, such as CFD simulations, that are prohibitive for many optimisation studies. They use our products to create training and testing data. They then continue creating predictive models using this data. Finally, they use these predictive models for design optimisation,’ Kocer explained.
‘These mathematical techniques can be successfully leveraged for design improvements without needing a statistician or an optimisation expert,’ Kocer added.
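The workflow Kocer describes can be sketched as follows, with a toy function standing in for a resource-intensive solver. All names and numbers here are illustrative; a real study would use far more design points and a richer surrogate model.

```python
import numpy as np

def expensive_simulation(x):
    """Stand-in for a resource-intensive solver (e.g. a CFD run):
    a drag-like response with its true optimum at x = 2.0."""
    return (x - 2.0) ** 2 + 1.0

# 1. Run the expensive simulation at a handful of design points
#    to build training data.
x_train = np.linspace(0.0, 5.0, 8)
y_train = expensive_simulation(x_train)

# 2. Fit a cheap predictive (surrogate) model, here a quadratic polynomial.
surrogate = np.poly1d(np.polyfit(x_train, y_train, deg=2))

# 3. Optimise the design against the surrogate instead of the solver.
candidates = np.linspace(0.0, 5.0, 1001)
best = candidates[np.argmin(surrogate(candidates))]
print(round(best, 2))  # close to the true optimum at x = 2.0
```

The point of the pattern is step 3: once the surrogate is trained, each design evaluation costs microseconds rather than solver hours, which is what makes large optimisation studies affordable.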
This is the tip of the iceberg. Big data will drive the development of new forms of HPC configurations to unlock the insights held in these huge swathes of unstructured data. This convergence of big data analytics and HPC, also known as High-Performance Data Analytics (HPDA), will accelerate the development of many tantalising applications, including driverless cars.
We’re seeing great strides in the development of driverless cars through AI and machine learning tools. For example, NVIDIA is developing DRIVE PX Pegasus, which is the world’s first AI computer for fully autonomous robo-taxis. More than 25 companies are using Pegasus to develop level five, fully autonomous vehicles, according to an NVIDIA spokesman.
To create new machine learning algorithms for autonomous driving, you need large data sets. If we use real-world data, then it must go through a labour-intensive labelling process before the self-driving algorithm can ingest the data and learn from it. But simulated data is automatically labelled as it is created, which saves massive amounts of time.
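As a toy illustration of why simulated data comes pre-labelled: when we place an object into a scene ourselves, its ground-truth label exists the moment the scene is created. The sketch below is purely illustrative, not any vendor's pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_labelled_sample(size=64):
    """Render a toy 'scene' with one rectangular object.
    The bounding-box label comes free, because we chose where
    the object goes; no human annotation is needed."""
    img = np.zeros((size, size), dtype=np.float32)
    w, h = (int(v) for v in rng.integers(8, 16, size=2))
    x = int(rng.integers(0, size - w))
    y = int(rng.integers(0, size - h))
    img[y:y + h, x:x + w] = 1.0       # draw the object
    label = (x, y, w, h)              # bounding box, known by construction
    return img, label

# A batch of 100 automatically labelled training samples.
images, labels = zip(*(make_labelled_sample() for _ in range(100)))
```

With real-world footage, each of those bounding boxes would instead have to be drawn by hand, which is exactly the labour-intensive labelling step the article describes.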
A joint project between Siemens and the German Research Center for Artificial Intelligence recently demonstrated that it was more effective to use a combination of synthetic and real-world data when training deep learning driving algorithms than to use real-world data alone.
When we move away from the development of these algorithms and want to test such vehicles on real road networks, there are more challenges to address.
For example, it is estimated that autonomous vehicles will produce 4TB of data every day through the complex combination of scanners and sensors, cameras and GPS used to detect vehicles, pedestrians, traffic signals, road curbs and other obstacles.
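A quick back-of-the-envelope calculation shows what that daily figure means as a sustained data rate (assuming decimal terabytes):

```python
# 4TB per vehicle per day, expressed as a sustained rate.
bytes_per_day = 4e12
seconds_per_day = 24 * 60 * 60
mb_per_second = bytes_per_day / seconds_per_day / 1e6
print(f"{mb_per_second:.1f} MB/s")  # ~46.3 MB/s, around the clock
```

Roughly 46MB every second, per vehicle, is what any HPC configuration for fleet-scale analysis would need to ingest, label and store.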
So, while simulation allows us to supplement those real-world driving hours when we develop the necessary algorithms, we need to use human drivers on real roads to robustly evaluate the performance of new self-driving technologies.
To achieve this, we will require new HPC configurations, along with new simulation and modelling strategies, to effectively label and analyse the reams of data that real-world autonomous driving will generate.
There are many clever techniques stepping up to the challenge, including an MIT spin-off called iSee that is integrating cognitive science into its AI algorithms to give autonomous vehicles a kind of common sense to quickly deal with new situations.
Another approach, taken by Drive.ai, is to use deep learning techniques to teach autonomous vehicles how to drive. This works under the premise that not all data is equal. So, instead of managing and analysing every piece of data available, the system collects high-quality data and then annotates it so it is useful for deep learning algorithms.
Whatever approach is used though, artificial intelligence, machine learning and deep learning systems all thrive on data.
As such, the simulation and modelling techniques of today are not going to cut it when it comes to the development of driverless cars, unless we can effectively integrate real-world and real-time big data into our algorithms using novel HPC configurations.