Collective Intelligence in 1994
When I published « Collective Intelligence » in 1994, the World Wide Web was still in its infancy (you won’t find the word « web » in the book) and less than one percent of the world’s population was connected to the Internet. A fortiori, social media, blogs, Google and Wikipedia were still well hidden in the realm of possibilities and only a few visionaries had glimpsed their outlines through the mists of the future. Ancestors of social media, « virtual communities » only gathered tens of thousands of people on Earth and free software, already pushed by Richard Stallman in the early 1980s, would only really take off in the late 1990s. At that time, however, my diagnosis was already established: (1) the Internet was going to become the main infrastructure of human communication and (2) networked computers were going to increase our cognitive abilities, especially our memory. The advent of digital technology in the course of the human adventure is as important as the invention of writing or the printing press. I thought so then, and everything confirms it today. People often say to me, « you foresaw the advent of collective intelligence and well, look what happened! » No, I predicted – along with a few others – that humanity would enter into a symbiotic relationship with algorithms and data. Given this prediction, which we will acknowledge is coming true, I asked the following question: what civilizational project should we embrace to best harness the algorithmic medium for the benefit of human development? And my answer was: the new medium enables us, if we decide to do so, to increase human collective intelligence… rather than continuing the entrenched pattern of passivity in front of fascinating media that began with television, and rather than remaining obsessed by the pursuit of artificial intelligence.
A quarter of a century later
Having contextualized my 1994 « call for collective intelligence », I now turn to some observations on the developments of the last quarter century. In 2021, 65% of humanity is connected to the Internet and almost ninety percent in Europe, North America and most major cities. Lately, the pandemic has forced us to use the Internet massively to work, learn, buy, communicate, etc. Scholars around the world share their databases. We consult Wikipedia every day, which is a classic example of a philanthropic collective intelligence enterprise supported by digital technology. Programmers share their code on GitHub and help each other on Stack Overflow. Without always clearly realizing it, everyone becomes an author on his or her blog, a librarian when tagging, labelling or categorizing content, a curator when gathering resources, an influencer on social media and online shopping platforms, and even an unwilling artificial intelligence trainer as our slightest online actions are taken into account by learning machines. Distributed multi-player games, crowdsourcing, data journalism and citizen journalism have become part of everyday life.
Ethologists who study social animals define stigmergic communication as an indirect coordination between agents via a common environment. For example, ants communicate primarily by leaving pheromone trails on the ground, and this is how they signal each other about paths to food. The emergence of a global stigmergic communication through digital memory is probably the biggest social change of the last twenty-five years. We like, we post, we buy, we tag, we subscribe, we watch, we listen, we read and so on… With each of these acts we transform the content and the system of internal relations in the digital memory, we train algorithms, and we modify the data landscape in which other Internet users evolve. This new form of communication by distributed reading-writing in a collective digital memory represents an anthropological mutation of great magnitude that is generally little or poorly perceived. I will come back later on to this revolution by reflecting on how to use it as a support point to increase collective intelligence.
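To make the mechanism concrete, here is a minimal sketch of stigmergic coordination, in which agents never address each other directly and only read and write traces in a shared environment. The class name `SharedMemory`, the evaporation rate and the path labels are illustrative assumptions, not a model of any real platform.

```python
from collections import defaultdict


class SharedMemory:
    """A common environment that agents read from and write to (hypothetical)."""

    def __init__(self, evaporation=0.5):
        self.traces = defaultdict(float)  # item -> accumulated trace strength
        self.evaporation = evaporation    # fraction of each trace kept per step

    def deposit(self, item, amount=1.0):
        # An agent acts (likes, tags, buys...) and leaves a trace behind.
        self.traces[item] += amount

    def evaporate(self):
        # Old traces fade, as pheromone trails do.
        for item in self.traces:
            self.traces[item] *= self.evaporation

    def strongest(self):
        # What the next agent perceives as the most attractive option.
        return max(self.traces, key=self.traces.get) if self.traces else None


memory = SharedMemory()
for item in ["path_a", "path_b", "path_a"]:  # three independent agents
    memory.deposit(item)
memory.evaporate()
best = memory.strongest()
```

No agent sends a message to another. The choice offered to the next visitor emerges only from the traces left in the common memory.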
But first I need to talk about a second major transformation, linked to the first, a political mutation that I did not foresee in 1994: the emergence of the platform state. By this expression I do not mean the use of digital platforms by governments, but the emergence of a new form of political power, which is the successor of the nation-state without suppressing it. The new power is exercised by the owners of the main data centers, who in fact control the world’s memory. One will have recognized the famous Sino-American oligarchy of Google, Apple, Facebook, Amazon, Microsoft, Baidu, Alibaba, Tencent and others. Not only are these companies the richest in the world, having long since surpassed the old industrial flagships in market capitalization, but they also exercise classic sovereign powers: investment in crypto-currencies that escape central banks; control and surveillance of markets; authentication of personal identities; infiltration of education systems; mapping from street to satellite view; cadastral records; management of public health (wristbands or other wearable devices, recording of conversations at doctors’ offices, and epidemiological memory in the cloud); crisscrossing the skies with networks of satellites. But above all, the data overlords have taken control of public opinion and legitimate speech: influence, surveillance, censorship… Be careful what you say, because you risk being deplatformed! Finally, the new political apparatus is all the more powerful as it relies on psychological mechanisms close to addiction. The more users become addicted to the narcissistic pleasure or excitement provided by social media and other attention-grabbers, the more data they produce and the more they feed the wealth and power of the new oligarchy.
Confronted with these new forms of enslavement, is it possible to develop an emancipatory strategy adapted to the digital age? Yes, but without harboring the illusion that we can end the dark side once and for all through some radical transformation. As Albert Camus says at the end of his 1942 essay The Myth of Sisyphus, « The struggle itself towards the heights is enough to fill a man’s heart. One must imagine Sisyphus happy. » Increasing collective intelligence is a task always to be taken up and deepened. Maximizing creative freedom and collaborative efficiency simultaneously is a matter of performance in context and does not depend on a definitive technical or political solution. However, when the cultural and technical conditions I will now evoke are met, the task will be easier and efforts will be able to converge.
The dialectic of man and machine
Communication between humans is increasingly done through machines via a distributed read-write process in a common digital memory. Two poles interact here: machines and humans. Machines are obviously deterministic, whether this determinism is logical (ordinary algorithms, the rules of so-called symbolic AI) or statistical (machine learning, neural AI). Machine learning algorithms can evolve with the streams of data that feed them, but this does not make them escape determinism. As for humans, their behaviour is only partially determined and predictable. We are conscious social animals that play multiple games, that are permeated by all emotions, that show autonomy, imagination and creativity. The great human conversation expresses an infinite number of nuances in a fundamentally open-ended process of interpretation. Of course, it is humans who produce and use machines, therefore they fully belong to the world of culture. But it remains that – from an ethical, legal or existential point of view – humans are not deterministic logico-statistical machines. On the one hand the freedom of meaning, on the other the mechanical necessity. However, it is a fact today that humans keep memories, interpret and communicate through machines. Under these conditions, the human-machine interface is barely distinguishable from the human-human interface. This new situation provokes a host of problems of which out-of-context interpretations, nuance-free translations, crude classifications and communication difficulties are only the most visible symptoms, whereas the deep-rooted evil lies in a lack of autonomy, in the absence of control over the techno-cosmos on a personal or collective scale.
Let’s now think about our interface problem. The best medium of communication between humans remains language, with the panoply of symbolic systems that surround it, including music, body expression and images. It is therefore language, oral or written, that must play the main role in the human-machine interface. But not just any language: an idiom that is adequate to the multitude of social games, to the complexity of emotions, to the expressive nuance and interpretative openness of the human side. However, on the machine side, this language must be able to support logical rules, arithmetic calculations and statistical algorithms. This is why I have spent the last twenty years designing a human language that is also a computer language: IEML. The noolithic biface turns on one side to the generosity of meaning and on the other to mathematical rigor. Such a tool will give us a grip on our technical environment (programming and controlling machines) as easily as we communicate with our fellow men. In the opposite direction, it will synthesize the streams of data that concern us into explanatory diagrams, understandable paragraphs, and even empathetic multimedia messages. Moreover, this new techno-cognitive layer will allow us to go beyond the opaque stigmergic communication that we maintain through the digital memory to reach reflexive collective intelligence.
Towards a reflexive collective intelligence
Images, sounds, smells and places mark out human memory, as well as that of animals. But it is language that unifies, orders and reinterprets our symbolic memory at will. This is true not only on an individual scale, but also on a collective scale, through the transmission of stories and writing. Through language, humanity has gained access to reflexive intelligence. By analogy, I think that we will only reach reflexive collective intelligence by adopting a language adequate to the organization of the digital memory.
Let’s look at the conditions needed for this new form of large-scale critical thinking to take place. Since social cognition must be able to observe itself, we must model complex human systems and make these models easily navigable. Like dynamic human communities, these digital representations will be fed by heterogeneous data sources organized by disparate logics. In addition, we need to make conversational thought processes readable, where forms emerge, evolve and hybridize, as in the reality of our ecosystems of ideas. Moreover, since we want to optimize our decisions and coordinate our actions, we need to account for causal relationships, possibly circular and intertwined. Finally, our models must be comparable, interoperable and shareable, otherwise the images they send back to us would have no objectivity. We must therefore accomplish in the semantic dimension what has already been done for space, time and various units of measurement: establish a universal and regular coordinate system that promotes formal modeling. Only when these conditions are met will digital memory be able to serve as a mirror and multiplier for the collective human intelligence. The recording and computing power of gigantic data centers now makes this ideal attainable, and the semantic coordinate system (IEML) is already available.
In the maze of memory
Until the invention of movable type printing by Gutenberg, one of the most important parts of rhetoric was the art of memory. It was a mnemonic method known as « places and pictures ». The aspiring orator had to practice the mental representation of a large architectural space – real or imaginary – such as a palace or a temple, or even a city square, where several buildings would be arranged. The ideas to be memorized were to be represented by images placed in the places of the palatial architecture. Thus, the windows, niches, rooms and colonnades of the palace (the « places ») were populated with human figures bearing emotionally and visually striking characters, in order to be better remembered (the « images »). The semantic relations between ideas were better remembered and used if they were represented by local relations between images.
From the 16th century in the West, the fear of forgetting gave way to the anxiety of being drowned in the mass of printed information. The art of memory, adapted to the oral and manuscript eras, was followed by the art of organizing libraries. It was then discovered that conserving information is not enough: it must also be classified and placed (the two go together before the digital era) so that one can easily find what one is looking for. The plan of the imaginary palace is followed by the layout of the shelves of the library. The distinction between data (books, newspapers, maps, archives of all kinds) and metadata was established in the early 18th century. A library’s metadata system essentially consists of a catalog of stored documents and cardboard cards filed in drawers that give, for each item: its author, title, publisher, date of publication, subject, etc. Not to mention the index number that indicates the precise location of the document. This splitting of data and metadata was initially done on a library-by-library basis, each with its own organizational system and local vocabulary. A new level of abstraction and generalization was reached in the 19th century with the advent of classification systems with a universal vocation, which were applied to numerous libraries, of which Dewey’s « decimal » system is the best known example.
With the digital transformation that began at the end of the 20th century, the distinction between data and metadata has not disappeared, but it is no longer deployed in a physical space on a human scale.
In an ordinary database, each field corresponds to a general category (the name of the field is a kind of metadata, such as « address ») while the value of the field « 8, Little Bridge street » corresponds to a piece of data. The metadata system is none other than the conceptual scheme that governs the database structuring. (In Excel tables, the columns correspond to the metadata and the content of the cells to the data). Each database obviously has its own schema, adapted to the needs of its user. Traditional databases were designed before the Internet, when computers rarely communicated.
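The distinction can be seen in a few lines using Python’s standard `sqlite3` module. This is only a sketch: the table name `contacts` and the name « Jane Doe » are invented for the illustration, while the address is the one from the example above.

```python
import sqlite3

# An in-memory database with one small table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, address TEXT)")
conn.execute(
    "INSERT INTO contacts VALUES (?, ?)",
    ("Jane Doe", "8, Little Bridge street"),
)

cursor = conn.execute("SELECT * FROM contacts")

# Metadata: the field names defined by the schema.
metadata = [column[0] for column in cursor.description]

# Data: the values stored under those fields.
data = cursor.fetchall()

print(metadata)
print(data)
```

The schema (field names) lives apart from the rows it describes, and another database may well call the same field « location » or « street », which is where the incompatibilities begin.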
Everything changes with the massive adoption of the Web from the end of the 20th century. In a sense, the Web is a large distributed virtual database, each item of which has an address or an « index number »: the URL (Uniform Resource Locator), which begins with http:// (Hypertext Transfer Protocol). Here again, metadata are integrated into the data, for example in the form of tags. As the Web potentially makes all memories communicate, the disparity of local metadata systems or improvised folksonomies (such as hashtags used in social media) becomes particularly glaring.
But the abstraction and unbundling of metadata experienced by libraries in the 19th century is reinvented in the digital world. The same conceptual model can be used to structure different data while promoting communication between distinct repositories. Models such as schema.org, supported by Google, or CIDOC-CRM, developed by cultural heritage conservation institutions, are good examples. The notion of semantic metadata, developed in the symbolic artificial intelligence community of the 1970s, was popularized by the Semantic Web project launched by Tim Berners-Lee in the wake of the Web’s success. This is not the place to explain the relative failure of the latter project. Let’s just point out that the rigid constraints imposed by the standard formats of the World Wide Web Consortium have discouraged its potential users. The notion of ontology is now giving way to that of Knowledge Graph, in which digital resources are accessed by means of a data model and a controlled vocabulary. In this last stage of the evolution of memory, data is no longer contained in the fixed tabular schemas of relational databases, but in the new graph databases, which are more flexible, easier to evolve, better able to represent complex models and able to offer several « views ». A knowledge graph lends itself to automatic reasoning (classical symbolic AI), but also to machine learning if it is well-designed and if there is enough data.
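As a rough illustration of what « a data model and a controlled vocabulary » mean in practice, a knowledge graph can be sketched as a set of subject-predicate-object triples. The vocabulary, the entities and the helper functions below are all assumptions chosen for the example; they do not come from schema.org, CIDOC-CRM or any particular graph database.

```python
# Controlled vocabulary: the only relations the graph accepts (hypothetical).
VOCABULARY = {"is_a", "created_by", "located_in"}

triples = set()


def add(subject, predicate, obj):
    """Store one edge, rejecting predicates outside the vocabulary."""
    if predicate not in VOCABULARY:
        raise ValueError(f"unknown predicate: {predicate}")
    triples.add((subject, predicate, obj))


add("Mona Lisa", "is_a", "Painting")
add("Painting", "is_a", "Artwork")
add("Mona Lisa", "created_by", "Leonardo da Vinci")
add("Mona Lisa", "located_in", "Louvre")


def types_of(entity):
    """Follow is_a edges transitively: a very small piece of symbolic reasoning."""
    found, frontier = set(), [entity]
    while frontier:
        node = frontier.pop()
        for s, p, o in triples:
            if s == node and p == "is_a" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def view(predicate):
    """One 'view' of the graph: all pairs linked by a given relation."""
    return sorted((s, o) for s, p, o in triples if p == predicate)


inferred = types_of("Mona Lisa")
location_view = view("located_in")
```

Adding a new kind of relation only requires extending the vocabulary, not redesigning a table, which is the flexibility the paragraph above attributes to graph databases. The fact that the Mona Lisa is an artwork is never stored; it is derived from the existing edges.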
Today, a large part of the digital memory is still in relational databases, without clear distinction between data and metadata, organized according to mutually incompatible rigid schemas, poorly optimized for the needs of their users: knowledge management, coordination or decision support. Gathered in common repositories (datalakes, datawarehouses), these datasets are sometimes catalogued according to disparate systems of categories, leading to incoherent representations or simulations. The situation in the still minority world of knowledge graphs is certainly brighter. But many problems remain: it is still very difficult to make different languages, business domains or disciplines communicate. While the visualization of space (projection on maps), time (timelines) and quantities has become commonplace, the visualization of complex qualitative structures (such as speech acts) remains a challenge, especially since these qualitative structures are essential for the causal understanding of complex human systems.
We need to make an extra effort to transform the digital memory into a support for reflexive collective intelligence: a collective intelligence that allows all points of view, yet remains well coordinated. To do this, we need to reproduce, one level higher, the gesture of abstraction that led to the birth of metadata in the 18th century. So let us split metadata into (a) models and (b) metalanguage. Models can be as numerous and rich as one wants; they will communicate – with natural languages, humans and machines – through a common metalanguage. In the library of Babel, it is time to turn on the light.