Semantic Computing with IEML

This blogpost reproduces my keynote speech at IJCLR’22 & NeSy’22.

The video (45 min) is here. A longer academic paper that goes deeper into issues of linguistics, collective intelligence and artificial intelligence can be found here.

The topic of this communication is IEML, a language that I have invented, called the Information Economy Metalanguage. This language has the expressive power of a natural language and it is also a mathematical category, or an algebra.

Introduction

Like several philosophers before me (Leibniz, Peirce, Saussure, Wittgenstein, Chomsky…), I have been passionate since my early youth about the question of meaning: What is it? How does it work? And how far have we gone in the mathematization of language and in the calculation of concepts? When I say mathematization of language, I mean algebraic modelling, not statistical approximation.

In the early 1990s, I understood that the internet would become the main communication medium. It became obvious to me that the growth of computing power would augment language and cognition, as the revolutions of writing, printing and electronic media did previously. Even before the advent of the web, I knew that the future of text would be dynamic hypertext. And finally, I foresaw that a common digital memory would be able to support a new kind of large-scale collective intelligence.

If all this was true, we needed to create a unifying semantic coordinate system, a digital native language that would lend itself to calculation and programming. What remained was to solve the problem of the mathematization of semantics. But what is semantics, exactly? Let’s distinguish between three aspects of semantics: first a pragmatic aspect, second a referential or logical aspect, and third a linguistic aspect. These three aspects are simultaneous and interdependent.

For the pragmatic semantics, the meaning of a speech is its effect on a social situation. It depends on social games, it is about relevance, and it is a matter of sociology and game theory.
For the referential semantics, the meaning is the reality or the reference that is designated by the speech. It is about truth, and it is a matter of logic.
For the linguistic semantics, the meaning of a speech emerges from the sense of its words and their grammatical organization: it is mainly about conceptualization, and it is a matter of linguistics.

Linguistic semantics are obviously the basis on which the other aspects are built, and that is why I decided to construct a language that would have a univocal, transparent and computable linguistic semantics.

First, IEML has the same expressive power as a natural language. In particular, it allows the construction of complex recursive sentences that may be used for reasoning and complex causal modelling. It is able to explain itself (IEML is its own metalanguage), and it can translate any natural or specialized language.

At the same time, its semantics are computable, which means that IEML sequences of phonemes (or chains of characters) self-decode into networks of concepts and that these concepts and their relations can be read in natural languages. It also supports algorithms for the creation and recognition of concept networks (ontologies, knowledge graphs, domain-specific languages, etc.). You don’t create the nodes and the links of semantic networks one by one but through the use of algebraic functions. We’ll see some examples later.

Currently, I am happy to say that, after more than twenty-five years of research and development, the construction of the language is finished and that we have an editor. The language includes a dictionary of three thousand words (choose « published projects ») and a completely functional and regular grammar. The editor is complete with a parser and all kinds of practical functions to create and explore conceptual networks.

Note that the explanation of IEML sentences and ontologies will come after the following section about the words of the dictionary.

The Dictionary

The dictionary of this language contains three thousand words, representing elementary concepts. The number is small to facilitate the calculations on the machine side and the cognitive manageability on the human side. The words are organized into 120 paradigms.

Here are a few examples of paradigm topics:

climates & landscapes;
continents & regions;
oceans & seas;
sky & meteorology;
animals & plants;
countries & states;
technical functions;
anthropological functions;
types of relations;
calendar time units;
life cycles;
generative mutations;
values & ideas;
signs & semiotic functions;
data curation & critical thinking;
complex feelings;
personality types;
body parts, etc.

The three thousand words can be used as a basis for defining (recursively) all possible and imaginable concepts by means of sentences.

The general problem in building the dictionary was: how to create the concepts that will be the most useful for the creation of new concepts? In a way it is a bootstrapping problem. You can see here the six basic symbols corresponding to the most elementary concepts.

It is important to note that these symbols correspond to three types of symmetry, unary for emptiness, binary for virtual/actual and ternary for sign/being/thing. The above figure does not represent a sentence but the symmetry structure of the primitives.

Emptiness expresses absence, the void, the zero, silence, and noise (as the contrary of information)…

The virtual denotes the potential, the soul, the abstract, the immaterial or the transcendent dimension of human experience.
The actual represents the effectiveness, the body, what is concrete, tangible, material or any immanent aspect of reality.

This echoes all kinds of dualities: heaven and earth, yin and yang, abstract class and individual element, and so on.

A sign is an entity or an event that means something for someone. By extension, the semantic primitive « sign » points to symbols, documents, languages, representations, concepts, and anything that is semantically close to code, message or knowledge.
A being is a subject or an interpreter. It can be a human, a group, an animal, a machine or whatever entity or process endowed with self-reference and interpretation. By extension, « being » refers to psychic interiority, the mind, the ability to conceive or interpret, intentions, emotions, people, their relationships, communities, societies and values.
A thing, when it is labelled by a sign, is often called an object or a referent. By extension, « thing » categorizes what we are talking about: objects (abstract or concrete) and contextual elements. It also refers to bodies, tools, technology, material equipment, empowerment, power, and efficiency.

Sign/being/thing corresponds roughly to the semiotic triangle sign/interpreter/reference but also to all kinds of ternarities like syntax, semantics and pragmatics; proposition, judgement and state of things; or legislative, judicial, and executive.

In fact, these conceptual symmetries correspond to very old traditions. I did not invent them; I have only collected and compacted them. From these six symbols, I have created the 3000 words organized in 120 paradigmatic tables.

A paradigm, or a paradigmatic table, is akin to the map of a semantic field. Every IEML word belongs to one paradigm, and one paradigm only.

A paradigm of words is generated by a morphological function that combines multiplicative and additive operations. The multiplicative operation has three roles (substance, attribute and mode), and it is non-commutative and recursive. A two-dimensional table has two variable multiplicative roles.

In each cell of a table you have two expressions, one in IEML and one in natural language. The expression in natural language is the translation of the IEML word.

In IEML algebraic expressions, letters represent positions in symmetry systems, and punctuation marks represent recursive multiplication layers.

Let’s come back to the problem of word construction. Playing with the six primitive symbols as if they were building blocks, I created new concepts, and I continued recursively on this path. Again, the problem was to create the most general concepts, in order to allow the creation of new concepts in all possible directions of the semantic space.

How could I generate general concepts with my six first symbols, concepts able to cover all the directions of the semantic space? I began with the triad sign / being / thing.

On the left you can see the function that generates this paradigm. You can think of the substance as a figure and the attribute as a background.

Note that when a role is empty, we just put E in this role. In other examples, E would have been in the substance or attribute role. We cannot remove E because every role must be filled with a primitive. IEML expressions must be « sequences of primitive symbols », with completely regular syntactic rules. Nevertheless, there are rules allowing the elision of E, provided that the parser can read the syntactic structure thanks to the punctuation marks.

On the first row, we have sign in substance. When signs interact with signs, we get interpretation, reasoning, imagination, thoughts: reflection. When signs interact with beings, express beings, and help beings to communicate, it is language. And when the signs are engraved into things, when they are reified in one way or another, they become memory.

On the second row, we have being in substance. When beings gather by the use of common symbols (rituals, totems, flags, laws, institutions, music, contracts, languages…), they make society. When beings interact with themselves and with other beings, there is pleasure and pain, joy and suffering, and the whole range of emotions. When the objectivity of things is imbued with the symbolic organization, the values and the work of beings, it becomes a livable world.

On the third row, we have thing in substance with the usual variation sign/being/thing in attribute. When the objectivity of things is registered in a propositional sign, it is truth. When the materiality of things comes to support the being, it is life. Finally, the interaction of things, their respective positions, their envelopments and their connections form space.

These nine very elementary concepts represent different points of departure, all equally valid, for the description of human experience. By using the same kind of reasoning, I created sixteen other lower-case letters.
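The table just described can be reproduced with a few lines of code. The following Python sketch is only an illustration, not the code of the IEML editor: the names TRIAD, LABELS, multiply and paradigm are hypothetical, and the pairing of each concept with a lower-case letter assumes that the letters s b t k m n d f l follow the same row-by-row order as the nine concepts.

```python
# Illustrative sketch only; all names here are hypothetical.
TRIAD = ("sign", "being", "thing")

# (substance, attribute) -> (assumed letter, concept named in the text).
LABELS = {
    ("sign", "sign"): ("s", "reflection"),
    ("sign", "being"): ("b", "language"),
    ("sign", "thing"): ("t", "memory"),
    ("being", "sign"): ("k", "society"),
    ("being", "being"): ("m", "emotion"),
    ("being", "thing"): ("n", "world"),
    ("thing", "sign"): ("d", "truth"),
    ("thing", "being"): ("f", "life"),
    ("thing", "thing"): ("l", "space"),
}


def multiply(substance, attribute, mode="E"):
    """Return an ordered triple; the order of the roles matters."""
    return (substance, attribute, mode)


def paradigm(substances, attributes, modes=("E",)):
    """Expand (sum) x (sum) x (sum) into rows, one row per substance."""
    return [
        [multiply(s, a, m) for a in attributes for m in modes]
        for s in substances
    ]


table = paradigm(TRIAD, TRIAD)
for row in table:
    print(" | ".join("{} ({})".format(*LABELS[cell[:2]]) for cell in row))
```

Swapping substance and attribute changes the concept (sign × being gives language, being × sign gives society), which is what non-commutativity means in this table.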

Construction of the 25 lower case letters of IEML

Because long sequences of the six upper-case primitive symbols would have been difficult for humans to read and write, some frequently used sequences of three upper-case letters are abbreviated as twenty-five lower-case letters. There are ten vowels and fifteen consonants to help human reading and understanding.

You can see, on the left of the slide, the function that generates the twenty-five lower case letters. The four colors on the table represent four symmetry systems inherent to the lower case letters. We have already studied the blue section.

I will just comment on the yellow section, with the letters y o e u a i. The first row displays three virtual actions: to know (related to sign), to want (related to being) and to be able (related to thing). The second row shows three actual actions: to say or communicate (related to sign), to commit (related to being) and to do (related to thing). For comments on the other sections and more details on the 25 lower-case letters, see https://intlekt.io/25-basic-categories/

Now let’s look at two more examples of paradigms, at higher layers. In the human development paradigm, the words combine two letters. The generative function is: (s+b+t+k+m+n+d+f+l) × (y+o+e+u+a+i) × (E)

The nine rows correspond to our nine concepts s b t k m n d f l and these concepts are declined (like in « declension ») according to the six types of action that we have seen before in the yellow section.

The six columns correspond to types of knowledge, types of will or orientations, types of skills, types of signs, types of social roles, and types of tools or technologies. Now let’s have a look at the fifth column (social roles).

The reorganization of a column or row into a new table is automatic in the IEML editor. We find here our nine consonants in the position of substance, and the letter a, corresponding to the notion of commitment, in the position of attribute. The generative function is (s+b+t+k+m+n+d+f+l) × a × E. This gives us nine basic social roles. The interpreter corresponds to “reflection”, the storyteller to “language”, the scribe to “memory”, the chief to “society”, the parent to “emotion”, the judge to “world”, the researcher to “truth”, the healer to “life” and the guardian to “space”. Of course there are many more social roles in the dictionary. Another paradigm multiplies these nine by themselves, resulting in eighty-one other social roles.
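As an illustration of this reorganization, here is a minimal Python sketch that builds the 9 × 6 table of the human development paradigm and extracts its fifth column. It is not the editor's implementation: the helper names (expand, column) are hypothetical, and cells are kept as plain (substance, attribute, mode) triples rather than serialized IEML words.

```python
# Illustrative sketch only; helper names are hypothetical.
SUBSTANCES = list("sbtkmndfl")   # the nine concepts
ATTRIBUTES = list("yoeuai")      # the six types of action

ACTIONS = {"y": "know", "o": "want", "e": "can",
           "u": "say", "a": "commit", "i": "do"}

ROLES = {"s": "interpreter", "b": "storyteller", "t": "scribe",
         "k": "chief", "m": "parent", "n": "judge",
         "d": "researcher", "f": "healer", "l": "guardian"}


def expand(substances, attributes, mode="E"):
    """One cell per (substance, attribute) pair, with a constant mode."""
    return {(s, a): (s, a, mode) for s in substances for a in attributes}


def column(table, attribute):
    """Keep only the cells whose attribute is the given letter."""
    return [cell for (s, a), cell in table.items() if a == attribute]


table = expand(SUBSTANCES, ATTRIBUTES)
social_roles = column(table, "a")   # (s+b+t+k+m+n+d+f+l) x a x E

for substance, attribute, mode in social_roles:
    print(substance, attribute, mode, "->", ROLES[substance])

# Multiplying the nine roles by themselves yields 9 x 9 combinations.
combined = [(x, y) for x in SUBSTANCES for y in SUBSTANCES]
print(len(table), len(social_roles), len(combined))
```

The column is never written by hand: it falls out of the same function once the attribute is held constant.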

A paradigm of scientific disciplines and sub-disciplines also contains the objects of study and the names of the specialists. It amounts to three hundred and ninety-two words. The table below is just a small part of this paradigm.

An excerpt of the paradigm of scientific disciplines

In this table, corresponding to the function:
(s.y.- + b.y.- + t.y.-) × (y.- + o.- + e.- + u.- + a.- + i.-) × (s.y.-)
you can see that the rows correspond respectively to philosophy, communication and history. The columns correspond respectively to « science », « politics », « economy », « communication », « sociology » and « technology ». Of course, the common themes have parallels in functional invariants. For example, everything related to history begins with “t.”
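To make the reading of such a function concrete, the sketch below splits the expression into its three roles and expands it into the 3 × 6 excerpt. It is a toy reader for this flat notation only, not the IEML parser; the function name parse and the choice to keep cells as triples instead of concatenated words are assumptions made for illustration.

```python
from itertools import product

FUNCTION = "(s.y.- + b.y.- + t.y.-) × (y.- + o.- + e.- + u.- + a.- + i.-) × (s.y.-)"

ROWS = ["philosophy", "communication", "history"]
COLUMNS = ["science", "politics", "economy",
           "communication", "sociology", "technology"]


def parse(function):
    """Split 'A × B × C' into three lists of '+'-separated terms."""
    roles = []
    for factor in function.split("×"):
        terms = factor.strip().strip("()").split("+")
        roles.append([term.strip() for term in terms])
    return roles


substance, attribute, mode = parse(FUNCTION)
cells = list(product(substance, attribute, mode))

row_of = dict(zip(substance, ROWS))
column_of = dict(zip(attribute, COLUMNS))

for s, a, m in cells:
    print(f"{row_of[s]:14} x {column_of[a]:14} {s} {a} {m}")

# The functional invariant mentioned in the text: the history row starts with "t."
history = [cell for cell in cells if row_of[cell[0]] == "history"]
print(all(cell[0].startswith("t.") for cell in history))
```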

In short, every word is part of a paradigm and every paradigm is organized along very regular semantic symmetries that are reflected into syntactic symmetries.

Sentences and Hypertexts

Now, we are going to see how to make sentences and semantic networks in IEML.

On the slide above you have the translation in IEML of the English sentence: « In a house, a mother tells a story from a book, to her child, with love, among joyful laughter, before she falls asleep. » The IEML words are represented by their counterparts in English, and it is the structure of the IEML sentence that is highlighted here.

Like words, sentences are generated in paradigms by generative functions, and we are going to see an example of a sentence paradigm later. Like the morphological function, the syntagmatic function combines multiplicative and additive operations. But these are not exactly the same operations as in the morphological function.

The additive operations correspond to junctions (like: and, or, but, because, and so on…).
The multiplicative operation has nine roles. The root corresponds to the verb (or to the main noun when it is a noun phrase), the initiator corresponds to the subject of traditional grammar, and the interactant corresponds to the object of traditional grammar. There is also a recipient role and five complement roles: causality, time, place, intention and manner.

In an IEML sentence, the grammar distinguishes four kinds of parts:

The concepts are identified by a hash sign. They can be words or nested sentences.
The inflections are identified by a tilde. They specify the meaning of concepts. For example, at the root role, the verb « tell » is specified by an indicative mood and a present tense, and the nouns in other roles are specified by a gender, a number, or an article.
The prepositions, identified by a star, determine the particular cases of the complements. For example, in this sentence, the time role is specified by the preposition « before » and the place role by the preposition « in ».
Finally, the junctions are identified by an ampersand. You can see an example of « and » here at the manner role, on the last line of the sentence.
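The nine roles and the four kinds of parts can be pictured as a small data structure. The Python sketch below is a simplified model of the example sentence, not IEML syntax: the class Part, its render method and the one-line rendering are hypothetical, and the phrase « from a book » is left out because the text does not say which role carries it.

```python
from dataclasses import dataclass, field
from typing import List, Optional

ROLES = ("root", "initiator", "interactant", "recipient",
         "causality", "time", "place", "intention", "manner")


@dataclass
class Part:
    concepts: List[str]                                    # '#' words or nested sentences
    inflections: List[str] = field(default_factory=list)   # '~'
    preposition: Optional[str] = None                      # '*'
    junction: Optional[str] = None                         # '&'

    def render(self) -> str:
        """Illustrative rendering with the four markers (not real IEML)."""
        tokens = []
        if self.junction:
            tokens.append("&" + self.junction)
        if self.preposition:
            tokens.append("*" + self.preposition)
        tokens += ["~" + i for i in self.inflections]
        tokens += ["#" + c for c in self.concepts]
        return " ".join(tokens)


sentence = {
    "root": Part(["tell"], ["indicative", "present"]),
    "initiator": Part(["mother"]),
    "interactant": Part(["story"]),
    "recipient": Part(["child"]),
    "time": Part(["she falls asleep"], preposition="before"),
    "place": Part(["house"], preposition="in"),
    "manner": Part(["love", "joyful laughter"], junction="and"),
}

for role in ROLES:
    part = sentence.get(role)
    print(f"{role:12}", part.render() if part else "(empty)")
```

Keeping the roles in a fixed order, with empty roles made explicit, mirrors the regularity of the grammar: every sentence has the same nine slots, whether or not they are filled.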

To give you an idea of the fine nuance you can achieve in IEML, there are eighty inflections, one hundred and thirty-one prepositions and twenty-nine junctions.

IEML can explicitly handle proper nouns and references that are not general categories. Everything that is inside angle brackets is not a general category but a reference. This is the way to handle proper nouns, numbers and data in general. Of course, it is possible to have IEML expressions in reference, and therefore the language is self-referential and can be its own metalanguage.

Link sentences are used to explain and connect words and sentences. As you can see on the slide above, semantic links are just sentences with arguments. You can have links with one, two, three, or four arguments that will connect respectively one, two, three or four conceptual nodes (words or sentences). The link represented on the slide says that «A is the contrary of B».

Of course a link like « A is the contrary of B » can be used in a wide variety of cases. The actualization of the link is performed by a function which determines the domain of the arguments and the syntactic conditions for the creation of the link. In this case, the function will create links like « to the right » is the contrary of « to the left ».

Two remarks here:

First, the semantic relations are created by a syntactic function.
Second, because of the regular and symmetric structure of the paradigms, a function actualizes several links. So you don’t have to create the links one by one.
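These two remarks can be sketched in code. In the toy Python example below, a single link function, given a domain and a syntactic condition, produces several « is the contrary of » links at once. The encoding of words as (axis, pole) pairs, the vertical pair of words and all function names are hypothetical; only the right/left example comes from the text.

```python
from itertools import permutations

# Hypothetical toy paradigm: (axis, pole) -> English gloss.
PARADIGM = {
    ("horizontal", "+"): "to the right",
    ("horizontal", "-"): "to the left",
    ("vertical", "+"): "above",
    ("vertical", "-"): "below",
}


def contrary_condition(a, b):
    """Syntactic condition: same axis, opposite pole."""
    return a[0] == b[0] and a[1] != b[1]


def actualize(link_name, domain, condition):
    """Create every link whose pair of arguments satisfies the condition."""
    return [(a, link_name, b)
            for a, b in permutations(domain, 2)
            if condition(a, b)]


links = actualize("is the contrary of", PARADIGM, contrary_condition)
for a, name, b in links:
    print(f"« {PARADIGM[a]} » {name} « {PARADIGM[b]} »")
```

The condition looks only at the form of the words, yet the links it creates are semantic, because in a regular paradigm the form encodes the meaning.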

Like words, sentences can be organized in paradigms. The example depicted on the slide below comes from an ontology of mental health in IEML.

Example of sentence generating a paradigm

You have one constant role « symptoms related to perception » and two variable roles corresponding to the perception problems and to the senses that are affected. As you can see, the variables are between braces. And below you see the resulting table:

Paradigmatic table of perception problems
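The way a sentence with variable roles unfolds into a table can be sketched as follows. This Python example is an assumption-laden illustration: the role names, the lists of perception problems and senses, and the function expand are all hypothetical placeholders, since the actual content of the mental health ontology is not reproduced here. Tuples play the part of the braces.

```python
from itertools import product


def expand(sentence):
    """Replace each tuple-valued (variable) role by each of its values in turn."""
    variable_roles = [r for r, v in sentence.items() if isinstance(v, tuple)]
    combos = product(*(sentence[r] for r in variable_roles))
    return [{**sentence, **dict(zip(variable_roles, combo))} for combo in combos]


generator = {
    "constant": "symptoms related to perception",
    "problem": ("hallucination", "hypersensitivity"),   # hypothetical values
    "sense": ("sight", "hearing", "touch"),             # hypothetical values
}

cells = expand(generator)

# Lay the cells out as a table: one row per problem, one column per sense.
for problem in generator["problem"]:
    row = [c["sense"] for c in cells if c["problem"] == problem]
    print(f"{problem:18}", " | ".join(row))
```

Each cell is a complete sentence that shares the constant role with all the others, so the table is again generated by a function rather than filled in by hand.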

Now let’s recap! IEML has the same power as a natural language. It can handle…

narration and causal modelling,
dialogue, reasoning and translation,
indexation, reference and self-explanation.

It has the same power as a natural language *and* at the same time, it is also a mathematical category. This means that it organizes a morphism, or a systematic correspondence, between an algebra and a graph structure.

The algebra is about the linearity of texts: all IEML expressions are punctuated sequences of the six primitives. We probably have a non-commutative ring: the demonstration has been made for the layer of words (see here, chapter 5), but it is still a conjecture for the layer of sentences.

The graph structure is about the concept network. At the grammatical level you have syntactic trees of nested sentences which cross paradigmatic matrices. This makes a rather interesting graph, a kind of rhizome. And on top of that, link sentences combined with syntactic conditions produce more semantic relations between concepts.

Of course, mathematization does not necessarily mean quantification. It can be a formalization of qualitative structures. In particular, abstract algebra can handle all kinds of symmetry systems and not only in the realm of numbers and geometrical figures.

The secret of the computability of IEML semantics lies in its coding principle. Semantic symmetries (the signified) are coded by syntactic symmetries (the signifier). And paradigmatic matrices are created by functions with constants and variables.

If I were not held back by my modesty, I would say that Chomsky mathematized the syntagmatic trees and that – thanks to the coding system I have just explained – I have added the mathematization of paradigmatic matrices.

Conclusion

What new perspectives would IEML bring if it were adopted as a semantic protocol?

First, general semantic interoperability. Semantic interoperability means that – coded in IEML – the meaning will be computable and easily sharable. Semantic interoperability is not about formats (like RDF, for example) but about architectures of concepts, ontologies and data models that would be connected across different domains, because nodes and links can be brought back automatically to the same dictionary according to the same grammar. Semantic interoperability means essentially an augmented collective intelligence.

For neural AI, if the tokens taken into account by the models were variables of a semantic algebra instead of phonetic chains of characters in natural languages, machine learning would be more effective, and the results would be more transparent and explainable. My intention is to pursue the research direction of « semantic machine learning ». Labelling / annotating data with good ontologies helps *generalization* in machine learning!

For symbolic AI, we would have concepts and their relations generated by semantic functions. Even more importantly, the mode of definition of concepts would change radically. Instead of having concepts that are defined separately from each other by means of unique identifiers (the URIs) on the model of referential semantics, we would have concepts defined by other concepts of the same language, like in a dictionary.

We know that there are problems of accumulation, sharing and recombination of knowledge between AI systems / models. A semantic protocol based on IEML will lead to logical de-compartmentalization, neuro-symbolic integration, accumulation and fluid recombination of knowledge.

The blockchain domain is important because it means the automation of value allocation. Today, smart contracts are written in many different programming languages bringing problems of interoperability between machines and readability for non-programming humans. With a semantic protocol based on IEML, smart contracts would be readable by humans and executable by machines.

The metaverse is about an immersive, interactive, social and playful user experience. Today, it includes mainly simulations, reproductions or augmentations of a physical 4D universe. With a semantic protocol based on IEML, the Metaverse could contain new sensory-motor simulations of the world of ideas, memory and knowledge.

A scientific revolution has already started with the digitization of archives, the abundance of data produced by human activities, the increased computer power availability, and data sharing within transdisciplinary teams. The name of this revolution is of course «digital humanities». But the field is still plagued by theoretical and disciplinary fragmentation, and weak mathematical modelling. With a semantic protocol based on IEML, the world of meaning and value would be unified and made computable. (Again, this does not mean reduction to quantity, or any kind of reductionism, for that matter). It would foster the emergence of an inexhaustible and complex semantic cosmos allowing for every point of view and every interpretation system to express itself. It would also lead to a better exploitation of a common memory, bringing a more reflexive knowledge to human communities.