Theory of Active Perception (TAPe)

We have developed the Theory of Active Perception (TAPe), which describes the way the human brain perceives information. Researchers from various fields of science (neurobiologists, linguists, psychologists, etc.) often mention an innate mechanism of perception used by the brain to process data. In the mid-1970s, Jerry Fodor formulated the hypothesis of a Language of Thought as an inborn mechanism of perceiving information.

Theory of Active Perception describes some of the laws of Language of Thought. We have also discovered the isomorphism between TAPe and natural human language. This suggests a new information processing method several times faster and more efficient than currently known technologies.
At this point, it is important to clarify that we separate the concepts of "perception" and "information". The chain can be described roughly as follows: reality → perception → information → processing. To turn information into data, various technologies transform, transcode, encode or convert it into the required format, and only then can this data be used for solving tasks. At the stage of information conversion, meaningful connections present in the original reality are lost, so conversion sharply reduces the usefulness of the collected data.

Information can be obtained with many more meaningful connections while spending far fewer resources. The difference is so large that, instead of the binary system that is fundamental to the functioning of all devices today, a new system is needed, with other elements interacting according to other laws.
Brief Description of the Key Principles of TAPe
TAPe is based on group theory, Lie algebra, heterarchy of elements, antitransitivity, linguistic means of interrelations between elements, etc.
It is essential to understand that the brain, while performing calculations, does not make use of the traditional mathematics with its roots, integrals, functions — after all, it is not a computer dealing with 1's and 0's. But should we draw a parallel with computers, the brain rather deals with elements and symbols that constitute a system, a kind of "alphabet" that we decided to call languagemathics. We use this newly coined term as we are convinced that it allows describing the essence of brain processes used to perceive information in the most accurate manner. What can be conventionally referred to as language elements ("letters") interact with one another according to mathematical laws, thus generating new, more complex elements ("words" and "sentences"). In TAPe’s languagemathics, we denote elements using such categories as operators, filters and groups (depending on their hierarchy or, more precisely, heterarchy) — the elements themselves are called T-bits. And it is this process of the elements interacting with one another on different levels that the Theory of Active Perception describes.
For example, group elements are interconnected in such a way that one level of elements generates another level of elements. Relations between those elements are antitransitive.
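As a reminder of what antitransitivity means in general mathematics, here is a minimal Python sketch. It is not part of TAPe and does not model T-bits; the function name `is_antitransitive` and the example relations are assumptions chosen purely for illustration. A relation is antitransitive if, whenever a relates to b and b relates to c, a never relates to c.

```python
from itertools import product


def is_antitransitive(relation):
    """Return True if (a, b) and (b, c) in the relation always imply (a, c) is absent."""
    pairs = set(relation)
    return not any(
        (a, d) in pairs
        for (a, b), (c, d) in product(pairs, repeat=2)
        if b == c  # the two pairs chain together: a -> b -> d
    )


# Illustrative relations (assumptions, not TAPe elements).
# "beats" in rock-paper-scissors is a classic antitransitive relation.
beats = {("rock", "scissors"), ("scissors", "paper"), ("paper", "rock")}
# "less than" is transitive, so it fails the check.
less_than = {(1, 2), (2, 3), (1, 3)}

print(is_antitransitive(beats))      # True
print(is_antitransitive(less_than))  # False
```

The rock-paper-scissors relation shows why an antitransitive structure cannot be flattened into a simple linear ranking: no element dominates all the others.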
Antitransitivity leads to a certain hierarchy of elements: they follow a single possible pattern depending on the mutual values they take. The number of elements in TAPe is minimally sufficient — that is, exactly as many as we need to perceive (and recognize) any visual information. We believe that the human brain perceives visual information as TAPe describes it. Apparently, when the human visual analyzer perceives ("sees") certain information, a hypothetical element (in TAPe — the filter) "assumes" a part of the information load, and this information is used in the brain’s neural network. Unlike recognition technologies, which require a map with numerous key image features, the brain needs a minimum number of such filters to recognize an image. In this case, it is likely that the brain does not need to perform "calculations" every time — there’s no need to continuously stare at an object that can be recognized superficially. Moreover, the brain is capable of completing the image of an object that we have seen repeatedly without deep recognition.
We are specifically referring to visual information because the mathematical methods of TAPe have so far been worked out only for images. However, we are sure that the Theory of Active Perception can be applied to any type of information in general, and its isomorphism with natural language supports this.
Isomorphism between the Theory of Active Perception and natural language
While working on the Theory of Active Perception, we have noticed that its structure is similar to that of a natural language (that is, a language used by people for communication), or even a particular group of languages. This is how we discovered the isomorphism between TAPe and natural language.
Isomorphism between TAPe and language:
● Hierarchy / heterarchy: elements of language, just like elements of TAPe, are combined into groups at different levels. A heterarchical structure means that the elements of the system interdefine each other.
● Connections: elements of language, just like elements of TAPe, interact with each other according to certain laws. TAPe describes those laws, which are similar for both the Theory and natural languages.
● Number of elements: the number of elements at different levels in a language as well as in TAPe is roughly the same. There is no exact match, because any language is a free, rather than a strict system, unlike mathematical theory.
Innate mechanism of language perception
The innate mechanism of language perception is discussed by Noam Chomsky, for example. Why is any person able to acquire any language from birth? How exactly does the human brain perceive a system as complex as the grammar of a language? What exact laws govern the way elements are grouped together in a language? These are the questions that Chomsky, together with hundreds of other researchers around the world, is trying to answer. In particular, in the middle of the 20th century he put forward several hypotheses and theories that determined the development of linguistics for decades to come. However, Chomsky did not go beyond general concepts in explaining why the different elements of a language interact with one another and generate new elements (meanings) in a specific way.

In his works, Chomsky does not use the term "Language of Thought", but puts forward a hypothesis that language, as an innate system, started, at some point in history, to be used by people as a tool for thought in the first place and only later — as a means of communication. This hypothesis is contested: there’s a widespread view that language appeared as a means of communication first. However, we tend to agree with Chomsky — this hypothesis fits better with Fodor’s more general notion of the Language of Thought.

Language of Thought is a kind of data perception mechanism innate to the human brain. When Chomsky speaks of the innate human ability to assimilate any natural language through a universal grammar that is somehow "built into" our brains from birth, it is obvious to us that a more general notion needs to be introduced. We propose to use Language of Thought as that general term. In fact, TAPe describes part of the principles of that Language, and we call these principles languagemathics.
Isomorphism between TAPe and natural language allows us to argue that humans have a single innate mechanism of perception, not only for languages (as Chomsky suggests) but for any data in principle.
TAPe in Computer Vision
Modern computer vision technologies are quite limited, yet they require a great deal of money, labor, intellectual effort and time to solve tasks: the more complex the task, the more resources are required. Many solutions are heralded as real breakthroughs while in fact remaining primitive relative to the recognition capabilities of the human brain. If the Terminator were running on modern CV technologies, his head would be the size of a house: processing that much information the way it is done in the movie would require an immense amount of resources today.

TAPe offers a major reduction in the amount of resources required to solve computer vision tasks of varying complexity. For example, TAPe enabled us to develop a reverse video search technology that can search and recognize thousands of video clips from thousands of TV channels, film libraries and video hosting sites in real time. All it takes is one server with no GPUs.
Recognition without convolution
One of the reasons behind such efficiency is that TAPe algorithms do not use convolution, which is the most resource-intensive operation in the field of computer vision. TAPe-based technology, similarly to the human brain, processes any image right away as a whole.
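To give a sense of the cost of convolution in conventional computer vision, the following Python sketch counts the multiply-accumulate operations (MACs) in a single standard 2D convolution layer. It says nothing about how TAPe itself works; the function name `conv2d_macs` and the layer dimensions are assumptions chosen only for illustration.

```python
def conv2d_macs(height, width, in_channels, out_channels, kernel_size,
                stride=1, padding=0):
    """Count multiply-accumulate operations for one standard 2D convolution layer."""
    # Spatial size of the output feature map.
    out_h = (height + 2 * padding - kernel_size) // stride + 1
    out_w = (width + 2 * padding - kernel_size) // stride + 1
    # Each output value needs kernel_size * kernel_size * in_channels MACs.
    macs_per_output = kernel_size * kernel_size * in_channels
    return out_h * out_w * out_channels * macs_per_output


# Hypothetical first layer: 224x224 RGB image, 64 filters of size 3x3.
macs = conv2d_macs(224, 224, in_channels=3, out_channels=64,
                   kernel_size=3, padding=1)
print(macs)  # 86704128
```

Even this single small layer needs roughly 87 million multiply-accumulates per image, and typical convolutional networks stack many such layers with far more channels.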
Simultaneous reading of key features
The second reason for the technology’s efficiency is that it can simultaneously get a map of any image’s key features at any level of detail. "Simultaneously" means that the features are read all together, and the number of those key features is minimally sufficient to solve any computer vision tasks.

By modeling the way the brain works, TAPe "reads" the features needed to recognize an image all at once; according to TAPe, this is exactly how the brain recognizes information. An image (in the broadest sense possible) read by the human visual analyzer is "automatically" broken down by the brain into those very features that are constant and do not change, irrespective of the task. TAPe does not require breaking down the image into pixels: according to the Theory, any object (image) has a minimally sufficient number of features. TAPe also helped us develop an algorithm for reading those features.
Working under a priori uncertainty
Modern computer vision technologies, unlike the human brain, cannot recognize images under conditions of a priori uncertainty — on the contrary, they require "a priori certainty", meaning the neural network "must know" what exactly and where it is trying to find. That is why neural networks work with a human-prepared sample in one way or another. At the same time, TAPe-based technologies, just like the human brain, do not need such a sample: they know how to work in conditions of a priori uncertainty.
Conclusions
TAPe can help develop technologies for building recognition algorithms for any image in any class without either prior learning or prior tasking. Learning will happen while the recognition process is underway, as it does with people, who learn as they live and who, in the process of such natural learning, often “re-solve” the same recognition tasks over and over again.

However, it’s more than just computer vision. We can now discuss new principles of architecture for both neural networks and computer processors, arithmetic-logic devices (ALUs), data centers with fundamentally new ways of information management, etc.