
The Theory of Active Perception (TAPe)

Describes, with the Help of Group Theory, the Way the Human Brain Perceives Information, while the Discovery of an Isomorphism between TAPe and Natural Human Language Suggests a New Information Processing Method

We have developed the Theory of Active Perception (TAPe), which describes the way the human brain perceives information. The Theory is based on a mathematical model grounded in group theory. We have also discovered an isomorphism between TAPe and natural human language. Taken together, these findings suggest a new information processing method that could be applied in a wide variety of areas.

At this stage, we are referring first and foremost to computer technology. New principles for building both neural network and computer processor architectures might become possible. As another example, technologies developed using TAPe can help make a leap forward in computer vision: the Theory can be used to create algorithms able to recognize any image with the same apparent ease as the human brain. While what is referred to as artificial intelligence¹ uses large image databases to learn to recognize objects of a single class (for example, faces, fingerprints, tiger skin patterns, and so on, with each class requiring a separate AI instance), our brain can accomplish all of those tasks at once. Using TAPe, we are creating a technology that can do much the same.

¹ Strictly speaking, it is not AI, but a neural network–based classifier. However, we will henceforth stick with a more colloquial term, which is more widespread and routinely understood by everyone.

Brief Description of the Core Principles in the Theory of Active Perception

We will refrain from giving a detailed mathematical description of the Theory here, as it constitutes our know-how. Nevertheless, we are going to disclose its basic principles.

We believe that the Theory of Active Perception mathematically describes what is referred to as the language of thought². It is important to understand that the brain, naturally, does not use the mathematics we are familiar with, and that it is not a computer operating on 1s and 0s. But if we were to draw a parallel with computers, the brain deals rather with elements and symbols that constitute a system, a kind of “alphabet”, which we decided to call languagemathics. We take the liberty of using this newly coined term because we are convinced that it describes most accurately the essence of the brain processes used to perceive information. What can conventionally be referred to as language elements (“letters”) interact with one another according to the mathematical laws of group theory, thus generating new, more complex elements (“words” and “sentences”). It is precisely this process, in which elements interact with one another and generate new elements, that the Theory of Active Perception describes.

² The first person to put forward a language of thought (mentalese) hypothesis, as far back as the 1970s, was Jerry Fodor, an American philosopher and psycholinguist. He also suggested that the internal mental language is a means of coding information and that the predicates of that language are innate. Fodor’s hypotheses chime with the generative linguistics and the theory of innate language structure developed by Noam Chomsky, an American linguist.

Key Components of the Theory of Active Perception

The Theory of Active Perception uses a finite number of elements that, according to certain laws, are pooled into groups at three different levels. The first-level elements number a couple of dozen; they can merge with one another to generate second-level elements. The second-level elements are estimated at a couple of hundred, and they too can merge with one another to generate third-level elements. The third-level elements, in turn, number a few tens of thousands, and they represent more complex objects. It is with their help that information, for example an image, is recognized. The numbers of first-level, second-level, and third-level elements are minimally sufficient, meaning they are exactly what is needed to perceive any piece of information.

Now, how does this relate to the human brain? Based on TAPe, we believe that the human brain uses certain filters to perceive information (visual information, for example). In TAPe, those filters are represented by the first-level elements. They constitute the raw data, the very features used in recognition technology. To recognize an image, the brain needs a minimum number of these feature filters. Apparently, when the human visual analyzer perceives (“sees”) certain information, a filter “assumes” part of the information load, and this information is then used in a neural network. We do not know exactly how that happens, nor is it important for our purposes. What is important is that, according to TAPe, any kind of visual information can be broken down into first-level elements.

We believe that music is a good analogy for this structure of elements and the interactions between them. In music, the first-level elements are represented by notes.

The second-level elements are defined, first of all, by the laws according to which the first-level elements connect with one another in certain sequences and are thereby joined into groups. Those laws, together with the groups of elements resulting from connections between first-level elements, make up the second-level elements. To extend the musical analogy, the second-level elements would be chords built from notes. Notes are combined according to a certain law; otherwise there would be no chords.

Finally, the third-level elements, which describe any visual information to a T, result from joining or combining second-level elements. In some cases they can be made up of first-level elements alone, or of combinations of first-level and second-level elements (while second-level elements can only be made up of several first-level elements). These variations number a few tens of thousands, which is not that many, yet it is already enough to recognize any image at all, even under conditions of a priori uncertainty.
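To make the three-level composition concrete, here is a minimal Python sketch of a toy hierarchy. The element names, their number, and the composition tables (`LEVEL1`, `LEVEL2`, `LEVEL3`) are all hypothetical, since the actual TAPe elements are not disclosed; the sketch only mirrors the rules stated above: second-level elements are built from first-level elements, while third-level elements may combine second-level elements, first-level elements, or both. The helper `expand` (also hypothetical) traces any element back down to its first-level constituents.

```python
from typing import Dict, List, Tuple

# Hypothetical toy elements; the real TAPe elements are not disclosed.
LEVEL1: List[str] = ["a", "b", "c", "d"]

# Second-level elements: ordered combinations of first-level elements only.
LEVEL2: Dict[str, Tuple[str, ...]] = {
    "AB": ("a", "b"),
    "BC": ("b", "c"),
    "CD": ("c", "d"),
}

# Third-level elements: combinations of second-level and/or first-level elements.
LEVEL3: Dict[str, Tuple[str, ...]] = {
    "X": ("AB", "CD"),  # two second-level elements
    "Y": ("BC", "a"),   # a second-level and a first-level element
    "Z": ("c",),        # a single first-level element
}


def expand(element: str) -> List[str]:
    """Recursively trace an element down to its first-level constituents."""
    if element in LEVEL1:
        return [element]
    if element in LEVEL2:
        parts = LEVEL2[element]
    elif element in LEVEL3:
        parts = LEVEL3[element]
    else:
        raise KeyError(f"unknown element: {element}")
    constituents: List[str] = []
    for part in parts:
        constituents.extend(expand(part))
    return constituents


for name in LEVEL3:
    print(name, "->", expand(name))
```

Running it prints `X -> ['a', 'b', 'c', 'd']`, `Y -> ['b', 'c', 'a']`, and `Z -> ['c']`. Ordered tuples are used so that the sequence of constituents, not just their presence, is part of each element's definition.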

In our musical analogy, it is the third-level elements that would constitute the music itself. Music results from combinations of chords and/or notes; sometimes it can even consist of repetitions of a single note, such as C.
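The musical analogy can be sketched in a few lines of Python. Here the “law” that turns notes into chords is the standard triad rule from music theory (a major triad sits 4 and 7 semitones above its root, a minor triad 3 and 7), and a piece of “music” is simply a sequence of chords and single notes. The names `chord`, `MAJOR_TRIAD`, and `MINOR_TRIAD` are illustrative choices of ours, not part of TAPe.

```python
from typing import List, Tuple

NOTE_NAMES: List[str] = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# The "law" combining notes into chords: semitone offsets above the root.
MAJOR_TRIAD: Tuple[int, ...] = (0, 4, 7)
MINOR_TRIAD: Tuple[int, ...] = (0, 3, 7)


def chord(root: str, pattern: Tuple[int, ...] = MAJOR_TRIAD) -> Tuple[str, ...]:
    """Build a chord (second level) from a root note (first level)."""
    start = NOTE_NAMES.index(root)
    return tuple(NOTE_NAMES[(start + step) % 12] for step in pattern)


# "Music" (third level): a sequence of chords and/or single notes.
piece: List[Tuple[str, ...]] = [
    chord("C"),
    chord("A", MINOR_TRIAD),
    chord("F"),
    chord("G"),
    ("C",),  # a single note can also appear on its own
]

for element in piece:
    print(" ".join(element))
```

This prints `C E G`, `A C E`, `F A C`, `G B D`, and `C`. Only note combinations that satisfy an interval pattern count as chords, which is the sense in which chords exist only because notes are combined according to a law.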

Laws Governing TAPe Elements

Group theory serves as the basis for the mathematical description of the Theory of Active Perception. Group elements are interconnected in such a way that one level of elements generates the next. Relations between those elements are antitransitive.
Antitransitivity leads to a rigid hierarchy of elements: they follow a single possible pattern depending on the values they take. Knowing how the first-level elements have behaved, we can tell with certainty what will become of them next, that is, which second-level and third-level groups will be activated. Among other things, this benefits recognition speed, both when the characteristic features (attributes, traits) are set and when the actual recognition takes place.
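For readers unfamiliar with the term: a relation R is antitransitive if, whenever a R b and b R c hold, a R c never holds. The Python sketch below checks that property for a finite relation given as a set of ordered pairs. The function name `is_antitransitive` and the sample elements `p`, `q`, `r` are hypothetical and say nothing about which concrete relations TAPe uses.

```python
from itertools import product
from typing import Set, Tuple

Relation = Set[Tuple[str, str]]


def is_antitransitive(relation: Relation) -> bool:
    """Return True if no pairs a R b and b R c coexist with a R c."""
    for (a, b), (b2, c) in product(relation, repeat=2):
        if b == b2 and (a, c) in relation:
            return False
    return True


# Hypothetical chain p -> q -> r with no shortcut p -> r.
chain: Relation = {("p", "q"), ("q", "r")}
print(is_antitransitive(chain))                 # True
print(is_antitransitive(chain | {("p", "r")}))  # False: shortcut added
```

The check is quadratic in the number of pairs, which is fine for illustration. Note that by this definition any reflexive pair such as `("p", "p")` already violates antitransitivity.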

Hierarchical Element Structure

So, if we know the first-level values, we can pin down what the second-level and third-level groups of elements will be.
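A minimal Python sketch of that determinism, under an assumption of our own: a higher-level group is treated as activated exactly when all of its constituents are active. The rule tables `SECOND_LEVEL_RULES` and `THIRD_LEVEL_RULES` and the function names are hypothetical. Because `propagate` is a pure function of the active first-level elements, the same first-level input always yields the same second-level and third-level activations.

```python
from typing import Dict, FrozenSet, Set, Tuple

Rules = Dict[str, FrozenSet[str]]

# Hypothetical rule tables (the real TAPe rules are not disclosed).
SECOND_LEVEL_RULES: Rules = {
    "g1": frozenset({"a", "b"}),
    "g2": frozenset({"c", "d"}),
    "g3": frozenset({"b", "c"}),
}
# Third-level groups may draw on first- and second-level elements.
THIRD_LEVEL_RULES: Rules = {
    "h1": frozenset({"g1", "g2"}),
    "h2": frozenset({"g3", "a"}),
}


def activate(active: Set[str], rules: Rules) -> Set[str]:
    """Assumed rule: a group fires when all of its constituents are active."""
    return {name for name, parts in rules.items() if parts <= active}


def propagate(first_level: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Deterministically derive second- and third-level activations."""
    second = activate(first_level, SECOND_LEVEL_RULES)
    third = activate(first_level | second, THIRD_LEVEL_RULES)
    return second, third


second, third = propagate({"a", "b", "c"})
print(sorted(second), sorted(third))  # ['g1', 'g3'] ['h2']
```

The all-constituents rule is only one possible choice; any other fixed rule would preserve the key property illustrated here, namely that the first level fully determines the levels above it.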

Using first-level, second-level, and especially third-level elements, it is possible to recognize any image at all. The brain very likely does not need to carry its calculations up to the third level every time: we do not scrutinize every object each time we see it, and a cursory recognition is often enough. Moreover, the brain can build the image of an object we have seen many times before without resorting to deep recognition. For all the apparent diversity of existing images, their number is finite, just as the number of words in a language is finite. The number of third-level elements is sufficient for the brain to recognize any image, even under conditions of a priori uncertainty. Modern computer vision technology, unlike the human brain, cannot recognize images under such conditions. On the contrary, it requires, so to speak, “a priori certainty”: the neural network “must know” exactly what it is looking for and where. The brain, again, can do perfectly well without it.

So, TAPe can help develop technologies for building algorithms that recognize any image in any class without either prior learning or prior tasking. Learning will take place while recognition is underway, just as it does for people, who learn as they live and who, in the course of such natural learning, often “re-solve” the same recognition tasks over and over again.
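The idea that familiar objects do not require deep recognition, and that learning happens during recognition itself, can be loosely illustrated with memoization: a recognizer that stores each result the first time it performs a full analysis and answers from memory on later encounters. In this Python sketch, `IncrementalRecognizer`, `deep_analysis` (a stand-in for full first-to-third-level processing), and the label format are all hypothetical; it is an analogy, not the TAPe recognition algorithm.

```python
from typing import Dict, FrozenSet, Tuple


class IncrementalRecognizer:
    """Hypothetical sketch: answers familiar inputs from memory and learns
    new ones in the course of recognizing them."""

    def __init__(self) -> None:
        self.memory: Dict[FrozenSet[str], str] = {}
        self.deep_calls = 0  # counts full (deep) analyses

    def deep_analysis(self, features: FrozenSet[str]) -> str:
        # Stand-in for full first- to third-level processing.
        self.deep_calls += 1
        return "object:" + "+".join(sorted(features))

    def recognize(self, features: FrozenSet[str]) -> Tuple[str, bool]:
        """Return (label, recognized_from_memory)."""
        if features in self.memory:
            return self.memory[features], True  # shallow path, no deep analysis
        label = self.deep_analysis(features)
        self.memory[features] = label  # learning happens during recognition
        return label, False


recognizer = IncrementalRecognizer()
seen = frozenset({"a", "b"})
print(recognizer.recognize(seen))  # ('object:a+b', False)
print(recognizer.recognize(seen))  # ('object:a+b', True)
print(recognizer.deep_calls)       # 1
```

No separate training phase exists here: the memory starts empty and fills only as inputs are recognized, and a repeated input costs a lookup instead of a full analysis.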

Isomorphism between the Theory of Active Perception and the Language of Thought

While working on the Theory of Active Perception, we noticed that its structure is similar to that of a natural language (that is, a language people use to communicate). This similarity piqued our interest, and we delved into theories of the origin of language: in particular, we studied the works of Noam Chomsky, Jerry Fodor, Svetlana Burlak, and researchers in allied sciences, as well as philosophers who have addressed the problems of information perception. It is thanks to the Theory of Active Perception that we discovered the isomorphism between it and natural language. The structures of the two systems are isomorphic, that is, their elements and the connections between them correspond to one another.

Why is this isomorphism so important? Firstly, because it confirms the Theory of Active Perception, and secondly, because studying the structure of natural language will help us make faster progress in exploring how the Theory can be used in computer vision.

When we refer to isomorphism between TAPe and natural language, we mean the following:

● The elements of natural language, like those of the Theory of Active Perception, are grouped according to certain laws at three different levels, and those laws are the same for both systems.
● In natural language, the first-level elements interact with one another according to certain laws and generate second-level elements, which in turn generate third-level elements, exactly as happens with TAPe elements.
● Even the numbers of elements in natural language and in TAPe are roughly the same, although what matters is the isomorphism between elements and connections rather than equal numbers of elements.
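One standard way to make “isomorphic structures” precise for such hierarchies is a bijection between elements that preserves both levels and composition: each element maps to an element on the same level, and the constituents of an element map, in order, to the constituents of its image. The brute-force Python sketch below searches for such a bijection between two toy hierarchies. The data format, the function `find_isomorphism`, and both sample hierarchies are hypothetical, and the exhaustive search is only feasible for very small examples.

```python
from itertools import permutations, product
from typing import Dict, List, Optional, Tuple

# A hierarchy is a list of levels; each level maps an element to the ordered
# tuple of its constituents (empty for first-level elements).
# Element names are assumed to be unique across levels.
Hierarchy = List[Dict[str, Tuple[str, ...]]]


def find_isomorphism(h1: Hierarchy, h2: Hierarchy) -> Optional[Dict[str, str]]:
    """Brute-force search for a level- and composition-preserving bijection."""
    if [len(level) for level in h1] != [len(level) for level in h2]:
        return None
    # All bijections within each level (feasible only for toy sizes).
    per_level = [
        [dict(zip(l1, perm)) for perm in permutations(l2)]
        for l1, l2 in zip(h1, h2)
    ]
    for choice in product(*per_level):
        mapping: Dict[str, str] = {}
        for level_map in choice:
            mapping.update(level_map)
        if all(
            tuple(mapping[p] for p in parts) == h2[i][mapping[name]]
            for i, level in enumerate(h1)
            for name, parts in level.items()
        ):
            return mapping
    return None


# Hypothetical toy hierarchies with identical structure but different labels.
tape_like: Hierarchy = [
    {"a": (), "b": (), "c": ()},
    {"ab": ("a", "b"), "bc": ("b", "c")},
    {"abc": ("ab", "bc")},
]
language_like: Hierarchy = [
    {"x": (), "y": (), "z": ()},
    {"xy": ("x", "y"), "yz": ("y", "z")},
    {"xyz": ("xy", "yz")},
]
print(find_isomorphism(tape_like, language_like))
```

For these inputs the search returns the mapping a→x, b→y, c→z, ab→xy, bc→yz, abc→xyz. Equal element counts per level are necessary but not sufficient: if the top element of the second hierarchy were built from `("xy", "xy")` instead, the counts would still match but no structure-preserving mapping would exist, which reflects the point that correspondence of connections matters more than equal numbers.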

Why is any person able to acquire any language from birth? How exactly does the human brain perceive a complex system such as the grammar of a language? What laws govern the way word-like elements are grouped together in a language? Those are the questions that Noam Chomsky (together with thousands of other researchers around the world) has tried, and is still trying, to answer. But he did not go further than developing a set of rather general concepts as to why the different elements of the language of thought interact with one another in exactly this way and generate new elements (meanings).

But his theories and concepts regarding the origin and organization of language drew our attention to the isomorphism between the Theory of Active Perception and the language of thought. The similarity between the structures of the Theory of Active Perception and language is not surprising: people have an innate ability to perceive language, and from birth they are capable of distinguishing human speech from any other noises,
