
Basics: the anatomy of an autonomous driving software stack

While the implementation varies greatly, all autonomous driving software stacks share largely the same components. This is because there is a core set of capabilities an autonomous vehicle must have in order to navigate roads that were built for, and are filled with, human drivers.

This post will explore the fundamental components of an autonomous driving software stack and show where sensor calibration fits in.

 

Building blocks

In autonomous vehicles, the human driver is replaced by algorithms. These algorithms should exhibit behavior similar to (though safer than!) that of a human driver, so the vehicle can navigate traffic that is still largely made up of human drivers. As such, the software running on these vehicles mimics the cognitive functions that humans perform when driving.


These functions can be divided into four main functional blocks that build on one another.
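
To make the division concrete, here is a minimal sketch of how the four blocks could be chained in a single processing cycle. The class and method names are purely illustrative assumptions, not taken from any particular stack.

```python
# Minimal sketch of one processing cycle through the four blocks.
# All names here are illustrative, not from a specific stack.

class DrivingStack:
    def __init__(self, perception, prediction, planning, actuation):
        self.perception = perception
        self.prediction = prediction
        self.planning = planning
        self.actuation = actuation

    def step(self, sensor_data):
        # 1. Build an environment model and localise the ego vehicle.
        env_model, ego_pose = self.perception.process(sensor_data)
        # 2. Predict trajectories of other traffic participants.
        predicted_trajectories = self.prediction.predict(env_model)
        # 3. Plan ego waypoints for the next few seconds.
        waypoints = self.planning.plan(ego_pose, env_model, predicted_trajectories)
        # 4. Translate waypoints into steering, throttle and brake commands.
        return self.actuation.control(ego_pose, waypoints)
```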

 

Perception and localisation

Perception is responsible for determining what is around the vehicle and where the vehicle is within the world. From the data of various sensors that record the physical world - such as cameras, lidars and radars - a digital environment model is computed. The output of the perception block typically includes:

  • A list of other traffic participants, their positions relative to our vehicle, and their speed and direction of travel
  • Lane markings
  • Traffic signs
  • Traffic lights
  • Obstacles on the road
  • A map of drivable space around the vehicle

The calculation of these artifacts requires a semantic understanding of the sensor data and is usually performed by machine learning models. In fact, this is the part of the stack where machine learning is universally used, as it is by far the most powerful and robust approach for extracting meaning from, for example, camera images.
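
As a rough illustration, the environment model can be thought of as a structured container of these artifacts. The field names below are assumptions chosen for readability, not the interface of any specific stack.

```python
from dataclasses import dataclass, field

# Illustrative data structures for a perception output; all names are assumptions.

@dataclass
class TrackedObject:
    object_id: int
    category: str    # e.g. "car", "pedestrian", "cyclist"
    position: tuple  # (x, y) in metres, relative to the ego vehicle
    velocity: tuple  # (vx, vy) in m/s
    heading: float   # radians, relative to the ego heading

@dataclass
class EnvironmentModel:
    objects: list = field(default_factory=list)         # TrackedObject instances
    lane_markings: list = field(default_factory=list)   # polylines in ego coordinates
    traffic_signs: list = field(default_factory=list)
    traffic_lights: list = field(default_factory=list)
    obstacles: list = field(default_factory=list)
    drivable_space: object = None                        # e.g. an occupancy grid
```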



An early visualization from Waymo. Although the image is from 2018, not much has changed in terms of the information provided by the perception component. Source: Engadget.

Localisation is responsible for placing the vehicle on a map of the real world. Think Google Maps but with a far more accurate map that includes details such as individual lanes and their geometries, closed lanes, traffic signs and lights. The accuracy of localisation is also far higher than would be possible by using GPS only. This is achieved by correlating features from the environment model with the high definition map.
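
A toy sketch of this map-matching idea: take a coarse GPS pose, project the landmarks seen by perception into the world, and nudge the pose so they line up with the HD map. Real systems solve this as a joint optimisation (for example ICP or a factor graph); the function and variable names here are illustrative assumptions.

```python
import math

def refine_pose(gps_pose, observed_landmarks, map_landmarks):
    """Toy map matching: average the residual between observed landmarks
    (in ego coordinates) and their nearest mapped counterparts (in world
    coordinates), then shift the coarse GPS pose accordingly."""
    x, y, heading = gps_pose
    dx_sum, dy_sum, n = 0.0, 0.0, 0

    for ox, oy in observed_landmarks:
        # Transform the observation into the world frame using the coarse pose.
        wx = x + ox * math.cos(heading) - oy * math.sin(heading)
        wy = y + ox * math.sin(heading) + oy * math.cos(heading)
        # Associate with the nearest mapped landmark.
        mx, my = min(map_landmarks, key=lambda m: (m[0] - wx) ** 2 + (m[1] - wy) ** 2)
        dx_sum += mx - wx
        dy_sum += my - wy
        n += 1

    if n == 0:
        return gps_pose
    return (x + dx_sum / n, y + dy_sum / n, heading)
```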

Knowing the accurate range and azimuth to both static and dynamic objects, as well as the absolute and relative localisation of the vehicle, dramatically improves the safety and reliability of the system. This starts with calibration of the sensors providing that data.

 

Prediction

This software block predicts the behavior of objects in the environment model, such as cyclists, pedestrians and other cars. This understanding of intentions and trajectories is essential for the vehicle to move safely within human traffic.

Is a person going to cross the street? Is a cyclist likely to ride from the sidewalk onto the lane? These are questions the prediction software block tries to answer.

 

Possible paths of people walking across a crosswalk. Source: Stanford

Accurately placing a pedestrian or vehicle in the environment model improves prediction, because the system can better judge whether that object's path and the vehicle's own path will actually intersect. That accurate placement requires accurate perception, which starts with calibration.
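
As a sketch of that reasoning, the snippet below rolls a tracked object forward with the simplest possible motion model (constant velocity, whereas production systems use learned, multi-modal predictors) and then checks whether its path and the ego path come within a safety margin of each other. It reuses the illustrative TrackedObject fields from the perception sketch above.

```python
def predict_positions(obj, horizon=3.0, dt=0.5):
    """Constant-velocity rollout of a tracked object; illustrative only."""
    x, y = obj.position
    vx, vy = obj.velocity
    steps = int(horizon / dt)
    return [(x + vx * t * dt, y + vy * t * dt) for t in range(1, steps + 1)]

def paths_conflict(ego_path, other_path, margin=1.5):
    """True if the two sampled paths ever come closer than `margin` metres.
    A 1 m perception error on the other object's position directly shifts
    the outcome of this check."""
    return any(
        (ex - ox) ** 2 + (ey - oy) ** 2 < margin ** 2
        for (ex, ey), (ox, oy) in zip(ego_path, other_path)
    )
```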


Planning

Given its understanding of the environment, and trajectory predictions of other traffic participants, the stack now has to plan what it will do itself. The output here is usually a set of coordinates in the world to be reached over the next few seconds.

This can be thought of in terms of semantic maneuvers such as:

  • “slowly follow the cyclist while waiting for the oncoming traffic to pass”
  • “take a right turn at the intersection”
  • “accelerate until the speed limit has been reached”
  • “brake for the trash can blocking the way”
  • “perform an emergency brake because of the vehicle that blew past its stop sign”

Planning the ego vehicle's motion builds on the predictions for other traffic participants; improving those predictions through accurate perception, maintained by regular calibration, necessarily improves the ego motion planning as well.
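
A toy illustration of this step, assuming a hypothetical interface: drive straight ahead at a target speed unless prediction has flagged a conflict on the path, and emit the resulting waypoints. Real planners score many candidate trajectories against safety and comfort costs.

```python
import math

def plan_waypoints(ego_pose, predicted_conflict, target_speed=8.0, dt=0.5, horizon=3.0):
    """Toy planner: follow the current heading at a target speed, or come to a
    stop when prediction flags a conflict. Returns waypoints for the next few
    seconds; all parameters are illustrative."""
    x, y, heading = ego_pose
    speed = 0.0 if predicted_conflict else target_speed
    waypoints, distance = [], 0.0
    for _ in range(int(horizon / dt)):
        distance += speed * dt
        waypoints.append((x + distance * math.cos(heading),
                          y + distance * math.sin(heading)))
    return waypoints
```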

 

Actuation

Finally, the actuation block is responsible for controlling steering, throttle and braking in such a way that the vehicle follows the waypoints provided by the planning stage.
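
A classic, simplified way to do this is pure-pursuit steering combined with a proportional speed controller, sketched below. The wheelbase value and gain are illustrative assumptions, not tuned parameters.

```python
import math

def pure_pursuit_steering(ego_pose, waypoint, wheelbase=2.8):
    """Pure-pursuit steering angle towards a single lookahead waypoint."""
    x, y, heading = ego_pose
    # Express the waypoint in the vehicle frame.
    dx, dy = waypoint[0] - x, waypoint[1] - y
    local_x = dx * math.cos(-heading) - dy * math.sin(-heading)
    local_y = dx * math.sin(-heading) + dy * math.cos(-heading)
    lookahead = math.hypot(local_x, local_y)
    if lookahead < 1e-6:
        return 0.0
    curvature = 2.0 * local_y / (lookahead ** 2)
    return math.atan(curvature * wheelbase)

def speed_control(current_speed, target_speed, kp=0.5):
    """Proportional throttle/brake command clipped to [-1, 1]."""
    return max(-1.0, min(1.0, kp * (target_speed - current_speed)))
```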

 

Sensor Calibration: a value proposition

Given this overview of the architecture of an autonomous driving software stack, it becomes apparent why DeepCal places so much emphasis on sensor calibration. ADAS functionality is utterly reliant on an accurate perception stage.

Going through the stack, errors introduced by uncalibrated sensors lead to potentially disastrous outcomes in the subsequent stages: perception places a pedestrian 1 m further from the road than she actually is, prediction mistakenly assumes that she will not cross the road, and planning does not brake because its safety margin is not violated.
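
The geometry behind such an error is easy to sketch: a small rotational miscalibration of a sensor translates into a sideways shift that grows with distance. The numbers below are illustrative, not measured data.

```python
import math

def lateral_offset_from_yaw_error(range_m, yaw_error_deg):
    """Sideways shift of a detection caused by a sensor yaw (extrinsic
    rotation) error; purely geometric illustration."""
    return range_m * math.sin(math.radians(yaw_error_deg))

# A ~2 degree mounting error shifts a pedestrian detected 30 m away by ~1 m:
print(round(lateral_offset_from_yaw_error(30.0, 2.0), 2))  # ≈ 1.05
```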

The correct functioning of the autonomous vehicle is predicated upon accurate perception of its environment, and accurate perception necessarily requires accurately calibrated sensors!
