Knowledge Representation and Semantic Interoperability with IEML

Today, the whole world is rushing toward statistical AI: neural models and generative AI. But we know that, though these models are useful, we still need symbolic models or, if you prefer, Knowledge Graphs, especially in knowledge management.

But why exactly do we still need symbolic models in addition to neural models? Because symbolic models represent knowledge in an explicit way, which has many benefits, like transparency and explainability.

In this talk, I am going to advocate for semantic (or conceptual) interoperability between knowledge graphs, and I will present IEML, a language that I have invented at the Canada Research Chair in Collective Intelligence with the help of my team of engineers.

If you are familiar with the field of knowledge management, you know there is a dialectic between implicit knowledge (in blue in Figure 1) and explicit knowledge (in red in Figure 1). But is there a dialectic between symbolic and neural models today? I don’t think so.

Figure 1

There are currently two prominent ways to process data (for knowledge management).

  • Via neural models, based mainly on statistics, for decision support, automatic understanding, and data generation;
  • Via symbolic models, based on logic and semantics, for decision support and advanced search.

These two approaches, generally kept separate, correspond to two different engineering cultures. Because each has its own advantages and disadvantages, people are trying to combine them.

Now, let’s clarify the difference between “neural” and “symbolic” models and compare them to neural and symbolic cognition in human beings.

Neural Models. The big advantage of neural models is their ability to automatically synthesize and mobilize a huge digital memory “just in time,” or “on demand,” which is impossible for a human brain to do. But their pattern recognition and generation process is statistical: they can’t organize a world, they can’t conserve objects, and they have no understanding of time and causality, or of space and geometry. Nor can they always recognize image transformations of the same object the way living beings can.

By contrast, real living neurons can do things current formal neurons can’t. Even without symbolic models, just with their neurons, animals can model the world, use concepts, conserve objects despite their transformations, and grasp time, causality, space, and so on. As for human brains, they are able to run symbolic systems, such as languages.

Symbolic Models. The positive aspect of symbolic AI models, or Knowledge Graphs, is that they are explicit models of the world (more precisely, of a local, practical world). They are in principle self-explanatory (if the model is not too complex), and they have strong reasoning abilities, so they are quite reliable.

But there are two main weaknesses in current symbolic models.

  • Their design is time-consuming (expensive in terms of specialized labor);
  • They have neither “concept conservation” nor “relation conservation” across ontologies and domains: in any given domain, every concept and relation has to be logically defined one by one.

While there is interoperability at the file-format level for semantic metadata (or classification systems), like RDF or JSON-LD, this interoperability does not exist at the semantic level of concepts, which compartmentalizes knowledge graphs and hinders collective intelligence.
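A minimal sketch of this gap, using invented prefixes and URIs (not taken from any real ontology): two RDF-style graphs merge without friction at the format level, yet a naive comparison finds no shared concept, because each graph names the same idea with its own URI.

```python
# Two toy knowledge graphs as sets of RDF-style triples (hypothetical URIs).
# Both use the same format, so they are format-interoperable by construction.
graph_a = {("ex-a:Employee", "rdf:type", "rdfs:Class"),
           ("ex-a:Employee", "rdfs:label", "employee")}
graph_b = {("ex-b:StaffMember", "rdf:type", "rdfs:Class"),
           ("ex-b:StaffMember", "rdfs:label", "staff member")}

# Merging works at the format level...
merged = graph_a | graph_b

# ...but no concept lines up: each domain coined its own identifier
# for what is arguably the same concept.
concepts_a = {s for s, p, o in graph_a if p == "rdf:type"}
concepts_b = {s for s, p, o in graph_b if p == "rdf:type"}
shared = concepts_a & concepts_b

print(len(merged), shared)  # 4 triples merged, yet no shared concept
```

Aligning the two would require a human to assert, pairwise, that `ex-a:Employee` and `ex-b:StaffMember` mean the same thing, which is exactly the one-by-one labor discussed below.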

By contrast, in real life, humans coming from different trades or knowledge domains understand each other by sharing the same natural language. In human cognition, a concept is determined by a network of relations inherent to natural languages.

But what do I mean by “the meaning of a concept is determined by a network of relations inherent to any natural language”? What is this network of relations? And why am I pointing it out in this talk? Because current symbolic AI is missing the semantic dimension of human language. Let’s do a little linguistics here so we can understand this deficiency better.

Any natural language weaves three kinds of semantic relations: inter-definition, composition and substitution.

First, any word is defined by a sentence that involves other words, themselves defined the same way: a dictionary is a circular, tangled inter-definition of concepts.

Then, thanks to grammar rules, we can compose original sentences and understand new meanings.

Finally, not every word in a sentence can be replaced by any other; there are rules for possible substitutions that contribute to the meaning of words and sentences.
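The three relations can be sketched as data structures, with an invented toy vocabulary (none of this is real linguistic machinery, just an illustration of the three mechanisms):

```python
# 1. Inter-definition: each word is defined by a sentence made of other
#    words, so the dictionary forms a circular graph of definitions.
dictionary = {
    "paint":   "cover a surface with color",
    "color":   "what paint gives to a surface",
    "surface": "the outside of an object",
}

# 2. Composition: grammar rules assemble words into new, understandable
#    sentences never seen before.
def compose(subject, verb, obj):
    return f"{subject} {verb} the {obj}"

sentence = compose("I", "paint", "room")  # "I paint the room"

# 3. Substitution: only words from the same paradigm may replace each
#    other without breaking the sentence's meaning structure.
color_paradigm = {"blue", "red", "green"}

def substitute(text, old, new, paradigm):
    if old in paradigm and new in paradigm:
        return text.replace(old, new)
    raise ValueError("substitution outside the paradigm is not allowed")
```

For example, `substitute("I am painting the small room in blue", "blue", "red", color_paradigm)` succeeds, while swapping "blue" for a word outside the color paradigm is rejected.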

Figure 2: “I am painting the small room in blue”

You understand the sentence “I am painting the small room in blue” (see Figure 2) because you know the definitions of each word, you are aware of the grammatical rules giving each word its role in the sentence, and you know how to substitute one word for another. This is called linguistic semantics.

These relationships of inter-definition, composition and substitution between concepts don’t have to be defined one by one every time you speak about something. It’s all included in the language. Unfortunately, we don’t have any of these semantic functions when we build current knowledge graphs. And this is where IEML could improve symbolic AI and knowledge management.

To support my argument for a new method of building symbolic models, it is important to distinguish between linguistic semantics and referential semantics. Linguistic semantics is about the relations between concepts, as we have seen in the previous slide. Referential semantics is about the relations between propositions and states of things, or between proper nouns and individuals.

If linguistic semantics weaves relations between concepts, why can’t we use natural languages in symbolic models? We all know the answer: natural languages are ambiguous (grammatically and lexically), and machines can’t disambiguate meaning according to context. In current symbolic AI, we cannot rely on natural language to organically generate semantic relations.

So, how do we build a symbolic model today?

  • To define concepts, we have to link them to URIs (Uniform Resource Identifiers) or web pages, following referential semantics.
  • But because referential semantics is inadequate for describing a network of relations, instead of relying on linguistic semantics we have to impose semantic relations on concepts one by one.

This is why the design of knowledge graphs is so time-consuming, and why there is no general semantic interoperability of knowledge graphs across ontologies or domains. Again, I am speaking here of interoperability at the semantic or conceptual level, not at the format level.
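A back-of-the-envelope sketch of why this one-by-one labor is expensive (the quadratic bound is my illustrative worst case, not a figure from the talk): when relations cannot be derived from a linguistic mechanism, a designer may have to assert a relation for every ordered pair of concepts.

```python
# Worst-case count of hand-written assertions when every ordered pair of
# concepts may need its own explicitly defined relation.
def manual_assertions(n_concepts, relations_per_pair=1):
    return n_concepts * (n_concepts - 1) * relations_per_pair

print(manual_assertions(10))   # 90 possible pairwise assertions
print(manual_assertions(100))  # 9900: the cost of one-by-one definition
```

In a natural language, by contrast, most of these relations come for free with the words themselves, which is the property IEML aims to reproduce computationally.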

To alleviate the shortcomings of current symbolic models, I have constructed a metalanguage that has the same advantage as natural languages, namely an inherent mechanism for building semantic networks, but without their disadvantages, since IEML is unambiguous and computable.

IEML (the Information Economy MetaLanguage) is an unambiguous and computable semantic metalanguage that includes a system of inter-definition, composition and substitution of concepts. IEML has the expressive power of a natural language with an algebraic structure, making it fully computable. IEML is computable not only in its syntactic dimension but also in its linguistic-semantic dimension: its semantic relations (in particular, its composition and substitution relations) are computable functions of its syntactic relations.

IEML has a completely regular and recursive grammar with a 3,000-word dictionary organized in paradigms (systems of substitution), allowing the recursive, grammatical construction of any concept. Any concept can be created from a small number of lexical building blocks with simple, universal composition rules.
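To make the idea concrete, here is an illustrative sketch only: this is not actual IEML syntax or vocabulary (the building blocks and paradigm below are invented). It mimics two properties just described: concepts composed recursively from a small stock of blocks, and paradigms whose substitutions generate families of concepts whose relations are computable from their structure alone.

```python
# A small stock of lexical building blocks (hypothetical, not IEML's).
BLOCKS = {"paint", "clean", "repair", "room", "wall", "small", "large"}

def concept(head, *parts):
    """Compose a concept recursively from building blocks or sub-concepts."""
    for p in (head, *parts):
        assert p in BLOCKS or isinstance(p, tuple), f"unknown block: {p}"
    return (head, *parts)

# Composition: "small room", then "paint the small room".
small_room = concept("room", "small")
painting = concept("paint", small_room)

# A paradigm of substitutable heads generates a family of sibling concepts.
action_paradigm = ["paint", "clean", "repair"]
family = [concept(verb, small_room) for verb in action_paradigm]

def related(c1, c2):
    """Paradigm siblings: same structure, differing only in the head slot."""
    return c1[1:] == c2[1:] and c1[0] != c2[0]
```

The point of the sketch is that `related` needs no hand-written assertion: the relation between "paint the small room" and "clean the small room" is computed from the concepts’ form, which is what a computable semantics makes possible.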

Since each concept is automatically determined by its composition and substitution relations with other concepts, and since definitions are built from the grammar and the dictionary’s words, IEML is its own metalanguage. The dictionary has been translated into French and English and could be translated into any natural language.

This invention will facilitate the design of knowledge graphs and ontologies by ensuring semantic interoperability and by fostering collaborative design. Indeed, IEML is based on a vision of digitally supported collective intelligence.

IEML enables an innovative, integrated architecture that overcomes the limitations and the current divide between symbolic and neural models.

Figure 3

Figure 3 introduces a new semantic architecture for knowledge management (KM) made possible by IEML, an architecture that brings together neural and symbolic models.

The only thing that can generate all the concepts we need to express the complexity of knowledge domains, while maintaining mutual understanding, is a language. But natural languages are irregular and ambiguous, and their semantics cannot be computed. IEML is a univocal, formal, algebraic language (unlike natural languages) that can express any possible concept (as natural languages can), with its semantic relations densely woven by a built-in mechanism. We can use IEML as a semantic metadata language to express any symbolic model *and* we can do it in an interoperable way. Again, I mean conceptually interoperable. With IEML, all symbolic models can exchange knowledge modules, and reasoning across ontologies becomes the norm.

Now, how can neural models be used in this new architecture? They could automatically translate natural language into IEML, with no extra work or learning for the layman. Neural models could even translate informal descriptions in natural language into formal models expressed in IEML.

Prompts expressed in IEML behind the scenes would make data generation more controllable.

We could also use neural models to classify or label data automatically in IEML. Labels or tags expressed in IEML would support more efficient machine learning, because the units or “tokens” taken into account would no longer be sound units of natural languages (characters, syllables, words) but concepts generated by a semantic algebra.
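A toy sketch of the difference (the `C:`-style concept labels below are hypothetical placeholders, not real IEML expressions): two sentences that share almost nothing at the surface-token level become identical once synonyms map to the same concept token.

```python
# Surface tokens: the two sentences look different character by character.
surface_1 = "the doctor heals the patient".split()
surface_2 = "the physician cures the sick".split()

# Hypothetical mapping from words to concept-level labels.
concept_labels = {
    "doctor": "C:healer", "physician": "C:healer",
    "heals": "C:heal", "cures": "C:heal",
    "patient": "C:sick-person", "sick": "C:sick-person",
}

def to_concepts(tokens):
    """Replace each word by its concept label, if one is known."""
    return [concept_labels.get(t, t) for t in tokens]

# At the concept level, the two sentences collapse into the same sequence,
# giving a learner far cleaner features than raw word forms.
print(to_concepts(surface_1) == to_concepts(surface_2))  # True
```

This is the intuition behind training on concept tokens rather than sound units: variation that is purely lexical disappears before learning begins.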

What are the main advantages of an integrated knowledge management architecture using IEML as a semantic coordinate system?

Symbolic and neural models would work together for the benefit of knowledge management.

A common semantic coordinate system would help the pooling of models and data. Symbolic models would be interoperable and easier to design and formalize. Their design would be collaborative across domains.

It would also improve intellectual productivity by allowing a partial automation of conceptualization.

Neural models would be based on data labeled in IEML and therefore be more transparent, explainable and reliable. This is important not only from a technical point of view but also from an ethical point of view.

Finally, this architecture would foster diversity and creative freedom, since the networks of concepts – or knowledge graphs – formulated in IEML can be differentiated and complexified at will.

REFERENCES FOR IEML

Scientific paper (English) in Collective Intelligence Journal, 2023:
https://journals.sagepub.com/doi/full/10.1177/26339137231207634

Scientific paper (French) in Humanités numériques, 2023:
https://journals.openedition.org/revuehn/3836

Website: https://intlekt.io/

Book: The Semantic Sphere: Computation, Cognition and Information Economy. Wiley, 2011.

Published by Pierre Lévy

Associate Researcher at the University of Montreal (Canada), Fellow of the Royal Society of Canada. Author of Collective Intelligence (1994), Becoming Virtual (1995), Cyberculture (1997), The Semantic Sphere (2011) and several other books translated into numerous languages. CEO of INTLEKT Metadata Inc.
