« A common semantic code would make it possible to achieve a de-fragmentation of the global memory and an integration of symbolic and statistical AI »
Today, artificial intelligence is divided between two major trends: symbolic and statistical. The symbolic branch corresponds to what has been called, successively over the last 70 years, semantic networks, expert systems, the semantic web and, more recently, knowledge graphs. Symbolic AI encodes human knowledge as networks of relationships between concepts, governed by models and ontologies that support automatic reasoning. The statistical branch of AI trains algorithms to recognize visual, linguistic or other patterns in large masses of data, relying on neural models that roughly imitate the way the brain learns. Neuro-mimetic artificial intelligence has existed since the beginnings of computer science (see the work of McCulloch and von Foerster) but has only become useful because of the increase in computing power available since 2010. In the early 2020s, these two currents are merging into a hybrid, or neuro-symbolic, model that seems very promising, although many problems remain regarding the consistency and interoperability of metadata.
Big tech companies and a growing number of scientific, economic and social sectors use knowledge graphs. Despite the availability of the World Wide Web Consortium (W3C) metadata standards for marking up classifications and ontologies (RDF, OWL), the different sectors (see the slide below) do not communicate with each other and, even worse, divergent systems of categories and relationships are most often in use within the same domain. The interoperability of metadata standards such as RDF only addresses the compatibility of digital files. It should not be confused with true semantic interoperability, which addresses concept architectures and models. In reality, the problem of semantic interoperability had yet to be solved as of 2021, and there are many causes for the opacity that plagues digital memory. Natural languages are multiple, informal, ambiguous and changing. Cultures and disciplines tend to divide reality in different ways. Finally, the numerous metadata systems used to classify data (thesauri, documentary languages, ontologies, taxonomies, folksonomies, sets of tags or hashtags, keywords, etc.) are often inherited from the age of print and are incompatible with one another.
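The gap between file-level and semantic interoperability can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the two datasets, their category labels, and the placeholder concept codes "C:001" and "C:002" are hypothetical and are not IEML expressions or identifiers from any real standard. Both sources share the same record format, yet merging them by label keeps equivalent categories apart; only an explicit mapping to shared concepts lets the counts add up.

```python
from collections import Counter

# Two hypothetical datasets in the same record format.
# Each source names its categories in its own vocabulary.
source_a = [
    {"category": "myocardial infarction", "count": 12},
    {"category": "influenza", "count": 30},
]
source_b = [
    {"category": "heart attack", "count": 7},
    {"category": "flu", "count": 21},
]

# Hypothetical mappings from local labels to shared concept codes.
# "C:001" and "C:002" are placeholders, not real identifiers.
to_shared_a = {"myocardial infarction": "C:001", "influenza": "C:002"}
to_shared_b = {"heart attack": "C:001", "flu": "C:002"}


def merge_by_label(*sources):
    """Merge counts using the raw labels (format-level compatibility only)."""
    totals = Counter()
    for records in sources:
        for record in records:
            totals[record["category"]] += record["count"]
    return dict(totals)


def merge_by_concept(pairs):
    """Merge counts after translating each local label to a shared code."""
    totals = Counter()
    for records, mapping in pairs:
        for record in records:
            totals[mapping[record["category"]]] += record["count"]
    return dict(totals)


naive = merge_by_label(source_a, source_b)
shared = merge_by_concept([(source_a, to_shared_a), (source_b, to_shared_b)])

print(naive)   # four separate categories: equivalent labels are not matched
print(shared)  # two categories: counts are combined per shared concept
```

In this sketch the mappings are written by hand for each pair of vocabularies, which is exactly the effort that grows unmanageable as the number of category systems increases.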
There is currently no way to code linguistic meaning in a uniform and computable manner, as we code images using pixels or vectors, for instance. To represent meaning, we still use natural languages, which are notoriously multiple, changing and ambiguous. With the notable exception of number notation and mathematical codes, our writing systems are primarily designed to represent sounds. Their representation of categories or concepts is indirect (characters → sound → concepts) and difficult for computers to grasp. Computers can handle syntax (the regular arrangement of characters), but their handling of semantics remains imperfect and laborious. Despite the success of machine translation (DeepL, Google Translate) and automatic text generation (GPT-3), computers don’t really understand the meaning of the texts they read or write.
Now, how can we resolve the problem of semantic interoperability and progress towards a thorough automatic processing of meaning? Many advances in computer science come from the invention of a relevant coding system that makes the coded object (number, image, sound, etc.) easily computable. The goal of our company INTLEKT Metadata Inc. has been to make concepts, categories and linguistic meaning systematically computable. To that end, we have designed the Information Economy MetaLanguage: IEML. This metalanguage has a compact dictionary of fewer than 5,000 words. IEML words are organized by subject-oriented paradigms and visualized as keyboards. The grammar of this metalanguage is completely regular and embedded in the IEML editor. Thanks to this grammar, complex concepts and relations can be recursively constructed by combining simpler ones. IEML is not a super-ontology (like Cyc) but a programmable language (akin to a computable Esperanto) able to translate any ontology and to connect any possible categories.
By using such a semantic code, artificial intelligence could take a giant step forward and feed collective intelligence. Public health data from all countries would not only be able to communicate with each other, but could also be harmonized with economic and social data. Occupational classifications and the various international labour market statistics would automatically translate into each other. The AI of smart contracts, international e-commerce and the Internet of Things would exchange data and execute instructions based on automatic reasoning. Government statistics, national libraries, major museums and digital humanities research would feed into each other. On the machine learning side, we would arrive at a system of uniform and precise labels and annotations that would help AI become more ethical, transparent and efficient.
A common semantic code would finally make it possible to achieve a de-fragmentation of the global memory and an integration of symbolic and statistical AI. The only price to pay for reaching neuro-symbolic collective intelligence would be a concerted effort to train specialists to translate metadata into IEML.