The Theory of Active Perception (TAPe) uses a finite number of elements that, according to certain laws, are pooled into groups at three different levels. There are a couple of dozen first-level elements; they can merge with one another to generate second-level elements. The second-level elements number a couple of hundred, and they too can merge with one another to generate third-level elements. The third-level elements, in turn, number a few tens of thousands and represent more complex objects. It is with their help that information, for example an image, is recognized. The number of first-level, second-level, and third-level elements is minimally sufficient: there are exactly as many as are needed to perceive any piece of information.
Now, how does this relate to the human brain? Based on TAPe, we believe that the human brain uses certain filters to perceive information (visual information, for example). In TAPe, those filters are represented by the first-level elements. They constitute the raw data, the very features used in recognition technology. To recognize an image, the brain needs a minimum number of those feature filters. Apparently, when the human visual analyzer perceives (“sees”) certain information, the filter “assumes” a part of the information load, and this information is then used in a neural network. We do not know exactly how that happens, nor is it important to us. What is important is that, according to TAPe, any kind of visual information can be broken down into first-level elements.
We believe that music is a good analogy for this structure of elements and the interaction between them. In music, the first-level elements are the notes.
The second-level elements are constituted, first of all, by the laws according to which the first-level elements connect with one another in certain sequences and thereby join into groups. Those laws, together with the groups of elements that result from connections between first-level elements, make up the second-level elements. To take the musical analogy further, the second-level elements would be the chords built from notes. The notes are combined according to a certain law; otherwise there would be no chords.
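The chord analogy can be made concrete with a small sketch. It illustrates the analogy only, not TAPe's actual elements or laws: the twelve note names stand in for first-level elements, and `CHORD_LAWS`, `build_chord`, and `obeys_a_law` are hypothetical names introduced here. The only music theory assumed is that a major triad stacks 0, 4, and 7 semitones above its root, and a minor triad 0, 3, and 7.

```python
# First-level elements of the analogy: the twelve notes of the chromatic scale.
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Hypothetical "laws": each law is a pattern of semitone steps above a root note.
CHORD_LAWS = {
    "major": (0, 4, 7),
    "minor": (0, 3, 7),
}


def build_chord(root, law):
    """Combine notes into a chord (a second-level element) according to a law."""
    start = NOTES.index(root)
    return tuple(NOTES[(start + step) % len(NOTES)] for step in CHORD_LAWS[law])


def obeys_a_law(group):
    """Return True if the group of notes is a chord under one of the known laws.

    The first note of the group is treated as the root.
    """
    group = tuple(group)
    return any(build_chord(group[0], law) == group for law in CHORD_LAWS)
```

Under these assumptions, `build_chord("C", "major")` yields C, E, G, while a group such as C, D, E follows neither law and therefore does not count as a chord.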
Finally, the third-level elements, which describe any visual information precisely, result from joining or combining second-level elements. Sometimes they can be made up only of first-level elements, or of combinations of first-level and second-level elements (whereas second-level elements can only be made up of several first-level elements). Those variations amount to a few tens of thousands, which is not many and yet is enough to recognize any image at all, even under conditions of a priori uncertainty.
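The composition rules described so far can be summarized as a small data-structure sketch. It is a minimal illustration under stated assumptions, not TAPe's implementation: the class names `First`, `Second`, and `Third` and the helper `first_level_names` are hypothetical, "several" is read as "at least two", and the laws that decide which combinations are valid are deliberately left out.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class First:
    """A first-level element (hypothetical placeholder identified by a name)."""
    name: str


@dataclass(frozen=True)
class Second:
    """A second-level element: built only from several first-level elements."""
    parts: Tuple[First, ...]

    def __post_init__(self):
        # Assumption: "several" means at least two first-level elements.
        if len(self.parts) < 2 or not all(isinstance(p, First) for p in self.parts):
            raise ValueError("a second-level element needs two or more first-level elements")


@dataclass(frozen=True)
class Third:
    """A third-level element: built from first-level and/or second-level elements."""
    parts: Tuple[Union[First, Second], ...]

    def __post_init__(self):
        if not self.parts or not all(isinstance(p, (First, Second)) for p in self.parts):
            raise ValueError("a third-level element needs first-level and/or second-level elements")


def first_level_names(element):
    """Break any element down into the names of its first-level elements, in order."""
    if isinstance(element, First):
        return [element.name]
    names = []
    for part in element.parts:
        names.extend(first_level_names(part))
    return names
```

For example, with `a`, `b`, and `c` as `First` instances, `Third((Second((a, b)), c))` is accepted and breaks down into `a`, `b`, `c`, whereas building a `Second` out of another `Second` raises `ValueError`.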
In our musical analogy, it is the third-level elements that would constitute the music itself. Music results from combinations of chords and/or notes. Sometimes music can be made up of repetitions of a single note, for example the C note.