AI for perception

Perception in artificial intelligence refers to a machine's ability to integrate sensory data alongside conventional computing methods. This combination lets AI models react to the environment in which they operate. Equipped with perception processes, an AI system can process information from different sensory sources (sight, hearing, touch or even smell).

AI that 'sees' and 'hears'

Perception is one of the major challenges facing computer science, as it involves modelling sensory abilities such as sight, hearing, touch and smell.

Inspired by human cognition, AI research is attempting to reproduce these mechanisms to analyse images or understand speech.

The first approaches, known as symbolic AI, tried to copy the brain using strict logical rules (e.g. "IF" ... "THEN").

The perceptron, invented in 1957 by Frank Rosenblatt to mimic a biological neuron, was an important step in this research but remained limited.
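To make the idea concrete, here is a minimal sketch of a perceptron in plain Python. It is an illustration rather than Rosenblatt's original implementation: the function names, the learning rate and the number of training passes are assumptions chosen for the example. The neuron computes a weighted sum of its inputs and "fires" when the sum exceeds zero, and its weights are nudged only when it makes a mistake. The XOR case illustrates one well-known limitation: a single perceptron can only separate categories with a straight line.

```python
def predict(weights, bias, inputs):
    """Step activation: output 1 if the weighted sum exceeds zero, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron learning rule: adjust the weights only after a wrong answer.

    `epochs` and `lr` are illustrative values, not taken from the article.
    """
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)  # -1, 0 or +1
            weights = [w + lr * error * x for w, x in zip(weights, inputs)]
            bias += lr * error
    return weights, bias


# Two toy tasks: AND can be separated by a straight line, XOR cannot.
AND_GATE = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_GATE = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

and_w, and_b = train_perceptron(AND_GATE)
and_predictions = [predict(and_w, and_b, x) for x, _ in AND_GATE]

xor_w, xor_b = train_perceptron(XOR_GATE)
xor_predictions = [predict(xor_w, xor_b, x) for x, _ in XOR_GATE]

print("AND:", and_predictions)  # matches the targets [0, 0, 0, 1]
print("XOR:", xor_predictions)  # never matches [0, 1, 1, 0] for a single perceptron
```

However long it trains, the single neuron cannot get all four XOR cases right, which is the kind of limit that later multi-layer networks were designed to overcome.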

A major breakthrough came with Convolutional Neural Networks (CNNs). A CNN analyses an image using filters that detect simple shapes (lines, curves), then combines them into increasingly complex patterns (ears, muzzles). The model then outputs the probability that the image belongs to each category.
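The two ingredients described here, a filter sliding over an image and a final probability per category, can be sketched in a few lines of plain Python. This is a toy illustration, not a real CNN: the 4x4 image, the hand-written edge filter and the two category scores are all made up for the example, whereas a trained network learns its filters from data and stacks many layers of them.

```python
import math


def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    Like most deep-learning libraries, this does not flip the kernel,
    so strictly speaking it is a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def softmax(scores):
    """Turn raw category scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up 4x4 image: dark left half (0), bright right half (1).
IMAGE = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-written filter that responds to a dark-to-bright vertical edge.
VERTICAL_EDGE = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(IMAGE, VERTICAL_EDGE)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]: strong response along the edge

# Hypothetical final-layer scores for the categories ("cat", "dog").
probabilities = softmax([2.0, 0.5])
print(probabilities)
```

The feature map is non-zero only in the column where the image goes from dark to bright, which is what "detecting a simple shape" means in practice.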

To recognise cats or dogs, we don't give the machine any explicit rules. Instead, we provide it with thousands of example images, each manually labelled "cat" or "dog".

Through backpropagation of gradients, the model gradually adjusts its internal parameters to isolate the characteristics specific to each category.
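The sketch below shows this adjustment on the smallest possible network: one hidden unit followed by one output unit. Everything specific in it is an assumption for illustration: the four data points (a single made-up feature value per example, with label 1 for "cat" and 0 for "dog"), the starting parameters, the learning rate and the number of steps. The error measured at the output is passed backwards through the chain rule to work out how each parameter should change.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Two-layer chain: input -> tanh hidden unit -> sigmoid output."""
    w1, b1, w2, b2 = params
    h = math.tanh(w1 * x + b1)
    p = sigmoid(w2 * h + b2)  # predicted probability of label 1
    return h, p


def mean_loss(params, data):
    """Average cross-entropy between predictions and labels."""
    total = 0.0
    for x, y in data:
        _, p = forward(params, x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)


def backprop_step(params, data, lr):
    """One gradient-descent step, with gradients obtained by the chain rule."""
    w1, b1, w2, b2 = params
    grads = [0.0, 0.0, 0.0, 0.0]
    for x, y in data:
        h, p = forward(params, x)
        d_out = p - y                          # error at the output unit
        d_hidden = d_out * w2 * (1.0 - h * h)  # error sent back through tanh
        grads[0] += d_hidden * x               # gradient for w1
        grads[1] += d_hidden                   # gradient for b1
        grads[2] += d_out * h                  # gradient for w2
        grads[3] += d_out                      # gradient for b2
    n = len(data)
    return [value - lr * g / n for value, g in zip(params, grads)]


# Made-up toy data: (feature value, label), label 1 = "cat", 0 = "dog".
DATA = [(0.0, 0), (0.2, 0), (0.8, 1), (1.0, 1)]

params = [0.5, 0.0, 0.5, 0.0]  # arbitrary starting values for w1, b1, w2, b2
initial_loss = mean_loss(params, DATA)

for _ in range(2000):
    params = backprop_step(params, DATA, lr=0.2)

final_loss = mean_loss(params, DATA)
print(initial_loss, final_loss)  # the loss shrinks as the parameters adjust
```

A real image classifier applies the same principle to millions of parameters, typically through a library that computes the gradients automatically.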

Before the arrival of modern CNNs in 2012, the best image-recognition software was wrong about one time in four. Within just a few years, that error rate dropped to less than 3%.

Please note: AI doesn't "get" what it analyses: it relies solely on statistical regularities.