The Linguistic Roots of IEML

IEML is based on the great achievements of linguistics from the last century. We will successively study the legacies of Chomsky; of Saussure and the structuralist school; of Tesnière and the actant model of the sentence; of Benveniste, Wittgenstein and Austin for their solutions to the thorny problems of enunciation and pragmatics. We will conclude by trying to defuse the main misunderstanding about IEML: it is not a « true » language (a language is neither true nor false but conventional), but a clear language.


Chomsky’s Legacy and Regular Languages

Let us begin by evoking IEML’s debt to Noam Chomsky, one of the giants of 20th century linguistics and cognitive sciences. For the MIT professor, linguistic capacity is a genetically determined feature of the human species. Languages, despite their diversity and continuous evolution, all share the same « universal grammar » corresponding to this innate linguistic ability. This theory would explain why children learn to speak spontaneously and so quickly, without the need for grammar lessons. Chomsky presented a formal version of the universal grammar, which has been contested and revised several times. Chomsky’s most valuable scientific discovery is probably his theory of regular languages: he demonstrated that there was a correspondence between algebra and formal syntax. Language is therefore in principle a calculable object, at least on a syntactic level.

For a language to be easily manipulated by computers, i.e. calculable, it must be a regular language in the sense of Chomsky: a kind of mathematical code. However, natural languages are obviously not regular languages. The regular languages actually used today are programming languages. But the « semantics » of programming languages is none other than the execution of the operations they command. None of them comes close to the expressive capacity of a natural language, which enables us to talk about anything and everything and to perform many other illocutionary acts besides giving instructions to a machine. It should be noted in passing that Louis Hjelmslev criticized the expression « natural language » and preferred the expression « philological language » or « all-purpose language ». Indeed, one can say everything in Esperanto, for example, although it is a constructed language and not a natural one. Esperanto is therefore a philological language. Alas, the semantics of Esperanto is no more calculable than that of French or Arabic.

Because philological languages are irregular, computers today have access to them only in statistical mode. This is why our digital age needs a philological language that is transparent to algorithms and therefore regular. IEML is the solution I found to the problem of constructing a philological language with computable semantics. The computability of its semantics is obviously relevant only if it is a philological language, allowing one to « say everything ». And since the semantics of this language had to be calculable, its syntax had to be calculable as well. This is why IEML is a regular language in the sense of Chomsky. But if being a regular language was a necessary condition for the computability of its semantics, it was not a sufficient condition. Let us remember that the regular languages currently in use have restricted semantics: they are not philological languages. How can philological semantics be conferred on a regular language? To answer this question, I relied on the teachings of Saussure and his successors.
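To make the notion of a regular language concrete: a language is regular when a regular expression, or equivalently a finite automaton, can decide whether a string belongs to it. The following is a minimal sketch under stated assumptions. The toy alphabet and the syllable pattern are invented for illustration and are not IEML's actual syntax.

```python
import re

# Hypothetical toy language (NOT IEML): one or more syllables, each made of
# a consonant from {b, t, k} followed by a vowel from {a, o}.
TOY_WORD = re.compile(r"(?:[btk][ao])+")


def matches_regex(word: str) -> bool:
    """Membership test using a regular expression."""
    return TOY_WORD.fullmatch(word) is not None


def accepted_by_automaton(word: str) -> bool:
    """Membership test using an equivalent three-state finite automaton."""
    transitions = {
        ("start", "C"): "need_vowel",
        ("need_vowel", "V"): "accept",
        ("accept", "C"): "need_vowel",
    }
    state = "start"
    for ch in word:
        # Classify each character as consonant, vowel, or outside the alphabet.
        kind = "C" if ch in "btk" else "V" if ch in "ao" else None
        state = transitions.get((state, kind))
        if state is None:  # no transition defined: reject
            return False
    return state == "accept"
```

Both functions decide the same set of strings, which is the sense in which the syntax of such a language is calculable.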

Saussure’s Legacy and Structuralism

According to Ferdinand de Saussure (1857-1913), one of the fathers of contemporary linguistics, linguistic symbols consist of two parts, the signifier (an acoustic or visual image) and the signified (an abstract concept or category). The relationship between the two parts of the symbol is conventional or arbitrary. Saussure also showed that the signifier plane, or the phonology of languages, was based on a system of differences between sounds, each language having its own list of phonemes and above all its own manner of arranging the thresholds between two phonemes in the sound continuum. In the same way, the signifieds are not self-sufficient atoms of meaning but correspond to positions in systems of differences: paradigms. Linguistic semantics is therefore not anchored in fixed and independent natural realities, but in a process of comparison, opposition, differentiation and cross-referencing between signifieds within a systemic grid that is closed on itself, just as the meaning of a word in the dictionary is defined by other words that are also defined by other words.

Saussure’s work was notably continued by Louis Hjelmslev (1899-1965), who deepened the analysis of the linguistic sign and pleaded for a maximum of epistemological rigor in the study of language, to the point of a quasi-algebraic ideal. Hjelmslev recast the opposition between signifier and signified as two linguistic « planes »: that of expression (the signifier) and that of content (the signified). Each of the two planes is in turn analyzed in terms of matter and form. The matter of expression is in the range of sensible phenomena, for example visual images or sounds. In contrast, the forms of expression denote the abstract units that result from the distribution of signifiers in a given language. For example, the phoneme « a » represents a specific form which is opposed in a particular language to the phoneme « o ». In English, for example, this is what enables the distinction between « bat » and « bot ». On the other hand, the form « a » can be filled by a large number of different sound materials, which vary with voices, accents, etc. The matter is about the concrete continuum while the form is about the abstract system of oppositions. It is the same for the content. Hjelmslev assumed that there was a continuum of the signified, a kind of magma that virtually shelters all the possible categories: the matter of content. This matter is cut out and organized into paradigms in a different way for each language. In the end, any language organizes a particular correspondence between form of expression and form of content.

The structuralist current initiated by Saussure and developed by Hjelmslev was carried forward by Algirdas Julien Greimas (1917-1992) and François Rastier (1945- ). While keeping alive the tradition that conceives the relatively autonomous existence of a world of the signified, these authors extended the structural analysis from the level of words and sentences to the level of the text, in particular thanks to the notion of isotopy.

Let us now return to our problem: how to construct a language that is simultaneously philological and regular? Not only are languages conventional, but they cannot fail to be so. The correspondence between signifier and signified, or expression and content, is arbitrary in nature. Since languages are necessarily conventional, nothing prevents the construction of one whose arrangement of signifiers is a « regular language ». We know that a regular language has a computable syntax. The syntax governs the signifier elements of the language, the phonemes and their sequences, at several nested levels of complexity. Since both signifiers and signifieds must be organized by a system of differences, there is nothing to prevent this regular language from being given – by convention – a system of differences for the signifieds (the form of content) that is a mathematical function of that of the signifiers (the form of expression). In accordance with the theories of Saussure and his successors, the units of the IEML language, words but also sentences, are organized into paradigms. These systems of variations against a background of constants – or groups of transformations – enable linguistic units to define and explain each other easily. In IEML the same paradigms are used to structure expression and content. So here is the principle for solving our problem: in a regular language whose system of signifier differences corresponds to that of signifieds, not only syntax but also semantics is computable. This is precisely the case of IEML, which is therefore a language with computable semantics!
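The principle can be illustrated with a toy sketch in which content is computed as a function of expression, and a paradigm is obtained by holding one part constant while varying another. The morphemes, their glosses and the three-letter word shape below are all hypothetical and are not taken from the IEML dictionary.

```python
# Hypothetical signifier -> signified tables (illustrative only, NOT IEML).
ROOTS = {"ba": "move", "to": "speak"}
ASPECTS = {"k": "beginning", "t": "ending"}


def meaning(expression: str) -> tuple[str, str]:
    """Compute the content of a word from its form alone.

    Assumed word shape: a two-letter root followed by a one-letter aspect.
    """
    root, aspect = expression[:2], expression[2:]
    return (ROOTS[root], ASPECTS[aspect])


def paradigm(root: str) -> dict[str, tuple[str, str]]:
    """Keep the root constant and vary the aspect slot."""
    return {root + aspect: meaning(root + aspect) for aspect in ASPECTS}


# A difference on the expression side ("k" vs "t") corresponds to a
# difference on the content side ("beginning" vs "ending").
print(paradigm("ba"))
```

In this sketch the same table of differences structures both planes, so that computing on the signifiers amounts to computing on the signifieds.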

Tesnière’s Legacy and Cognitive Linguistics

Among all the functions of language, one of the most important is to support the construction and simulation of mental models [I am inspired here in particular by Philip Johnson-Laird’s work, Mental Models, Harvard University Press, 1983]. The linguistic architecture of mental models is obviously not exclusive of sensory-motor modes of representation, especially visual ones, which can relate to fictional worlds as well as to lived reality. Linguists such as Ronald Langacker (1942- ) and George Lakoff (1941- ), who are among the main leaders of the cognitive linguistics movement, have particularly studied this mental modeling function. The ability to represent « scenes » – i.e., processes carried out by actors in certain circumstances – is a necessary condition for the modelling work carried out by language. It is the basis of the narrative faculty, since a narrative can be reduced to a hypertextual sequence of scenes, by means of anaphoric and isotopic relationships. I would add that by specifying the relationships between processes and/or between actors, linguistic scenography also founds the representation of causal relationships. Since one of IEML’s missions is to serve as a formal modeling tool, it must not only organize a morphism between its semantics and its syntax, but also systematize and facilitate as much as possible the representation of processes, actors, circumstances and their interactions. To do this, IEML has integrated, with a few adjustments, the actant model of the sentence that Tesnière, prefiguring cognitive linguistics, had proposed in the mid-twentieth century.

Figure 1: Examples of Tesnière’s dependency trees or « stemmas » CC BY-SA 3.0, Wikimedia Commons.

In addition to the structuralist school, IEML’s grammar was also greatly influenced by the major work of Lucien Tesnière (1893-1954). This French linguist was the first to present a universal grammar based on dependency trees, which highlights the intimate link between syntax and semantics (see Figure 1). Although the two systems were developed independently, Tesnière’s dependency trees are close to Chomsky’s syntactic trees. Tesnière also proposed a subtle theory of translation between the « parts of speech » that are verbs, nouns, adverbs and adjectives. Above all, he developed the actant model of the sentence, on which the syntagmatic function of IEML is based. The following quotation, taken from his posthumous work Elements of Structural Syntax, explains the principle of the actant model well: « The verbal node (…) expresses a whole little drama. Like a drama it involves (…) a process and, most often, actors and circumstances. The verb expresses the process. (…) Actors are beings or things (…) participating in the process. (…) Adjuncts express the circumstances like time, place, manner, etc. » [Lucien Tesnière, Eléments de syntaxe structurale, Klincksieck, Paris 1959: 102, Chap. 48]
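As a rough illustration of the actant model, a scene can be represented as a process together with its actors and its circumstances. The class name, the field names and the role labels in this sketch are assumptions made for the example. They are not IEML's sentence roles.

```python
from dataclasses import dataclass, field


@dataclass
class Scene:
    """A toy 'little drama': a process, its actors and its circumstances."""

    process: str                                                  # expressed by the verb
    actors: dict[str, str] = field(default_factory=dict)          # role -> participant
    circumstances: dict[str, str] = field(default_factory=dict)   # kind -> adjunct

    def describe(self) -> str:
        parts = [f"process={self.process}"]
        parts += [f"{role}={who}" for role, who in self.actors.items()]
        parts += [f"{kind}={what}" for kind, what in self.circumstances.items()]
        return "; ".join(parts)


# Hypothetical role labels, chosen only for this illustration.
scene = Scene(
    process="give",
    actors={
        "first_actant": "Alfred",
        "second_actant": "the book",
        "third_actant": "Charles",
    },
    circumstances={"time": "yesterday"},
)
print(scene.describe())
```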

Tesnière’s actant model was taken up and developed by two important contemporary linguists, Igor Melchuk (1932- ) and Charles Fillmore (1929-2014). Fillmore’s case grammar, published in 1968, was extended in the 1980s to a quasi-encyclopedic conception of linguistic semantics, implemented in the FrameNet project, which focuses on the English language and inspires several artificial intelligence programs. Frames describe the manner in which words fit together and mutually determine their meanings in a sentence. For example, when the verb « to attack » is used in the active voice, the grammatical subject is necessarily an attacker and the grammatical object a victim of the attack. IEML’s approach is consistent with Fillmore’s theories, with the cases corresponding to sentence roles and the equivalent of frames being sentence paradigms. As for Igor Melchuk, his most original contribution concerns morphology, i.e. the structure of words and their relationships. In particular, he described the lexical functions that regulate collocations – i.e. words that go or do not go together – and the semantic relations between the lexical units of a language. A simple example of a lexical function is « PLUS », as in: [PLUS (hill) = mountain] or [PLUS (stream) = river]. The lexical functions are used in particular to build explanatory and combinatorial (monolingual) dictionaries and, like Fillmore’s frames, they feed some natural language processing programs. IEML integrates the main lexical functions highlighted by Melchuk, making it easy to compose new words from dictionary elements and to formally explain the semantic relationships between lexical units. As for the collocations according to Melchuk, they are close to the Fillmore frames and are – like them – translated into IEML by sentence paradigms.

In short, many linguists have stressed the importance of the modeling function of language. Following in their footsteps, IEML provides its speakers with the grammatical tools needed to describe scenes and tell stories. In addition, IEML enables the modeling of a specialized knowledge domain or a particular semantic field through the free elaboration of terminologies (radical paradigms) and frame sentences (sentence paradigms).

Austin, Wittgenstein and the Pragmatic Legacy

Language is an abstract structure that combines paradigms of words (indecomposable atoms of meaning) and rules for the composition of grammatical units (recursive sentences) from words. In contrast, speech – or text – is a particular sequence of morphemes that actualizes the language system in space and time. In this sense, IEML terminologies and frame sentences belong to an intermediate category between language and speech. They belong to « speech » insofar as they are freely created from the initial dictionary and phrase construction rules. But they still belong to the language since they are not strictly speaking enunciations in context. The pragmatic dimension of language – the speech acts – occurs only at the level of enunciation. There is no point in choosing between the modeling and representative function of language, which was evoked in the previous section, and its practical function, which we will review in this one. On the contrary: representation and action are mutually supportive. Without a model of the world, action is meaningless and without immersion in some practical situation, representation loses all relevance.

Although we can trace the reflection on the practical power of language back to ancient rhetoric or to the earliest meditations of the Confucian school, I will limit myself to a few great authors: Emile Benveniste for the study of enunciation and the deictic function, Ludwig Wittgenstein for the question of reference and language games, John L. Austin for the very notion of linguistic pragmatics. Linguistic pragmatics refers to acts performed inside the language sphere but which have extra-linguistic consequences, such as baptizing, prohibiting, condemning, etc. Since they are performed in the language, these acts have a symbolic nature. They are governed by rules and carried out by « players » who assume certain roles. A multitude of « language games », to use Wittgenstein’s expression, animate the pragmatic dimension opened up by the enunciation. A language can itself be likened to a system of rules or a game. And if a language L is philological, it is capable of defining a multitude of restricted languages (l1, l2, l3…), rule systems or games, all of which are distinct ways of using language L in practice. Since IEML is a philological language, we will use it not only to model any semantic field, represent scenes and tell stories, but also to explain language games whose rules, roles and moves we will formalize through terminologies and sentence paradigms. When they recognize the speech acts performed by IEML speakers, algorithms will be able to automatically trigger their extra-linguistic consequences, and to compute the new states of the current « matches ». I will mention four main types of speech acts that are particularly relevant for IEML: reference, reasoning, social communication and instructions given to machines.

The first function of enunciation is to refer to non-linguistic objects. One of its most obvious forms is the distribution of interlocutory roles: the first, second or third person indicates who is speaking, to whom and about what. Possessives (related to the distribution of grammatical persons), demonstratives such as « this », « here », « there », and adverbs such as « today », « tomorrow », etc., may also be mentioned. However, a simple text does not allow us to interpret deictics such as « I », « this » or « tomorrow ». Only the event of an utterance by someone, in a defined spatio-temporal context of interlocution, can give them content. [« I » means « the person who utters the present instance of the discourse containing “I” » (Emile Benveniste)]. This referential function of language is particularly important for IEML, which is designed to categorize datasets and therefore to index (or label) them. Both the distribution of interlocutory roles and the categorization of data can conform to many distinct reference games. For example, in order to interpret a « we » we must know the system of distribution of grammatical persons that it obeys: plural of majesty, researchers of the same discipline, members of a court, citizens of a nation at war…? On the other hand, the categorization of data in IEML takes on a different meaning depending on whether the indexation is done by an algorithm or by a human. In the case of automatic labeling, is it a statistical algorithm based on a manually indexed corpus? And in the latter case, indexed by whom, according to what criteria, etc.? In the same vein, it may be useful to know whether a text is cited (again a deictic gesture) as part of a reference corpus, as an authority to reinforce the credibility of the author’s ideas, to be criticized, or for some other reason. In short, the operation of reference is a speech act, this act is part of a multitude of possible games, and these games can be made explicit in IEML.
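The dependence of deictics on the event of utterance can be sketched as follows: the same word receives a different content in each context of enunciation, and no content at all without one. The class, its fields and the small set of deictics handled here are assumptions chosen for the illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class UtteranceContext:
    """Hypothetical minimal context of enunciation."""

    speaker: str
    addressee: str
    day: date


def resolve(deictic: str, ctx: UtteranceContext) -> str:
    """Give a deictic its content from the context of the utterance."""
    table = {
        "I": ctx.speaker,
        "you": ctx.addressee,
        "today": ctx.day.isoformat(),
        "tomorrow": (ctx.day + timedelta(days=1)).isoformat(),
    }
    return table[deictic]  # raises KeyError for deictics not modeled here


first = UtteranceContext(speaker="Ada", addressee="Ben", day=date(2020, 1, 31))
second = UtteranceContext(speaker="Ben", addressee="Ada", day=date(2021, 6, 1))

# The same word "I" refers to a different person in each utterance.
print(resolve("I", first), resolve("I", second))
print(resolve("tomorrow", first))
```

A fuller model would also record which reference game is being played, for example which « we » is meant, but that goes beyond this sketch.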

Reasoning is yet another type of language game that can be modeled in IEML. Let’s take a look at it, following Charles S. Peirce’s typology: (1) the various kinds of deductive reasoning, (2) inductive reasoning – including statistical calculations – and (3) abductive reasoning, which builds causal models of a domain or process. It should be noted that, most of the time, reasoning presupposes reference and that the latter is often made to support the argumentation.

The language games that have been most studied by pragmatics scholars, starting with Austin and Searle, are the social communication games, which include, for example, assertions, questions, orders, promises, thanks, nominations, etc. But we can add to this type of games the transactions, contracts and everything that is related to legal arrangements and economic exchanges, which pass through electronic channels – increasingly blockchained – and which would benefit from being expressed in a transparent, univocal and calculable language such as IEML.

Finally, since we live in an increasingly robotized environment, the instructions given to machines, as well as the information – sometimes vital – that the machines send us, are obviously part of speech acts with important extra-linguistic consequences. Because computers can decode IEML and IEML is translated into natural languages, our metalanguage could become the software core of a ubiquitous and interoperable interface between humans and machines.

An Image of the World or an Image of Oneself?

In the Tractatus Logico-Philosophicus, the early work that made him famous, Wittgenstein examines under what conditions logical propositions present a faithful image of reality. Since our Viennese philosopher conceived the world as « everything that happens », each fact or event should be represented by a proposition whose logico-grammatical structure reflects the internal structure of the fact. The idea of a perfect language or a transparent language is often associated with this ideal of isomorphism between linguistic expressions and the realities they describe or, in other words, between speech and its reference. Nothing is further from the IEML project. Rather than pursuing the vaguely totalitarian chimera of a language of truth (truth comes down to the correspondence between word and reality), I have pursued a less constraining and, above all, more attainable goal: that of a language of clarity, as unambiguous and translatable as possible. For the ideal of a logical language that would reflect states of affairs, I substituted that of a philological language whose algebraic expression would reflect the conceptual content: a language that would be an image of itself before being an image of the world. By definition, this internal correspondence is not true or false but a useful convention. As for IEML’s relationship with extralinguistic reality, it is the result of a multitude of language games (I follow here the mature Wittgenstein, as he expressed himself in the Philosophical Investigations), a multitude that encompasses the various ways of mapping out, recognizing and referring to relevant objects according to practical contexts. And thanks to the universal descriptive power of all philological languages, we can model these multiple language games in IEML. This approach respects the freedom and creativity of its speakers while helping them to coordinate with each other and with machines.

Let’s look again at the different levels of semantics – linguistic, referential and illocutionary. Our metalanguage clarifies the relations between signifieds and signifiers as well as the relations between signifieds to the point of being able to automate their processing. IEML’s main contribution is therefore at the level of linguistic semantics. As for reference semantics – pointing to extra-linguistic realities – it can become more precise insofar as the different reference modalities are specified in IEML. Finally, the illocutionary force of enunciations, i.e. the « moves » that are played in a multitude of communication games, can be recognized by algorithms and processed accordingly, provided that the games in question have been previously described in IEML. In short, the formalization of linguistic semantics offers us the key to the formalization of semantics in general.

References

Austin John L. How to Do Things with Words, Oxford University Press, Oxford, 1962
Benveniste Emile Problèmes de linguistique générale, Tomes 1 et 2, Gallimard, Paris, 1966-1974
Chomsky Noam New Horizons in the Study of Language and Mind, Cambridge University Press, Cambridge, 2000.
Chomsky Noam Syntactic Structures, Mouton, The Hague and Paris, 1957.
Chomsky Noam ; Schützenberger, Marcel P. « The algebraic theory of context free languages », in Braffort, P. ; Hirschberg, D. : Computer Programming and Formal Languages, North Holland, Amsterdam, 118-161, 1963
Fillmore Charles « The Case for Case » (1968). In Bach and Harms (Ed.): Universals in Linguistic Theory. New York: Holt, Rinehart, and Winston, 1-88. (Tesnière is cited nine times).
Fillmore Charles « Frame semantics » (1982). In Linguistics in the Morning Calm. Seoul, Hanshin Publishing Co., 111-137.
Hjelmslev Louis Prolégomènes à une théorie du langage – La Structure fondamentale du langage, Paris, Éditions de Minuit, coll. « Arguments », 2000
Johnson-Laird Philip Mental Models, Harvard University Press, 1983
Lakoff George Women, Fire and Dangerous Things: What Categories Reveal About the Mind, University of Chicago Press, Chicago, USA, 1987.
Lakoff George, Johnson M., Metaphors We Live By, University of Chicago Press, Chicago, USA, 2003.
Langacker Ronald W., Foundations of Cognitive Grammar (2 volumes), Stanford University Press, Stanford, USA, 1987-1991.
Lévy Pierre The Semantic Sphere / La sphère sémantique, Hermès-Lavoisier, Paris-London, 2011
Melchuk, Igor, « Actants in Semantics and Syntax. I. Actants in Semantics », Linguistics, 42: 1, 2004, 1-66
Melchuk Igor Aspects of the Theory of Morphology, Mouton de Gruyter, Berlin and New York, 2006, 615 pp.
Peirce, C. S., The Essential Peirce, Selected Philosophical Writings, Volume 1 (1867–1893) and 2 (1893-1913) Nathan Houser and Christian J. W. Kloesel, eds., Indiana University Press, Bloomington and Indianapolis, IN, 1992-1998.
Saussure Ferdinand Cours de Linguistique générale, Payot, Paris, 1916.
Searle John Speech Acts, Cambridge University Press, London, 1969.
Searle John Intentionality, Cambridge University Press, London, 1983.
Tesnière Lucien Eléments de Syntaxe structurale Klincksieck, Paris, 1959 (posthumous)
Wittgenstein Ludwig Tractatus Logico-Philosophicus, Routledge and Kegan Paul Ltd, London, 1961.
Wittgenstein Ludwig Philosophical Investigations, Blackwell, Oxford, 1953.