Towards a Data-centric Organization

Conventional banks now offer their customers an application that enables them to carry out transactions on a smartphone. But the data from the smartphone is most often decoded and sent into the central system overnight, then processed and finally re-encoded the next day to be sent back to the smartphone application. Bottom line: it takes two days for your account to be updated after a transaction on your phone.

In contrast, with 21st century banking systems, all processing takes place in the same data center, accessible via the Internet. In addition, the mobile and central applications communicate immediately because they use the same data format and categorization. As a result, accounts are updated instantly after a transaction on the smartphone. The information systems of the new banks are said to be data-centric from the start. The central challenge of the data-centric organization is to make information flow more smoothly in order to improve the customer experience or, to put it another way, to create more value for the beneficiary of a service.

The value chain

The notion of value covers a wide semantic field. It can refer to ethical values, such as justice, courage, wisdom, or the harmony of human relations. Such values obviously have no monetary counterparts. As for the goods and services that are exchanged on the market, beyond a temporary and local adjustment of supply and demand, it is very difficult to assign an essence to their value. Economic value may correspond to a necessity (such as eating), to a desire for entertainment or beauty, to the acceleration of a boring job, to a prospect of making money (lottery or speculation instrument), to an improvement in the quality of life, to the acquisition of skills, to a broader understanding that will enable one to better decide, to a competitive advantage, to a more flattering image, etc. Value is therefore not a simple function of the work invested in the production of a good or service. It depends on the subjective appreciation and comparisons of those who benefit from it, all taking place in a changing economic and cultural context. Despite the evanescent nature of its essence, which is probably due to its relationship to desire, value is at the heart of economic theory and business practice. Every organization creates value for its customers (a private company), its public (a municipal service) or its patients (a hospital) and this creation is the main justification for its existence.

It is often useful to distinguish between two roles: the customer, who pays for the good or service, and the consumer, who uses it. For example, a company’s IT department (the customer) buys software, but it is the employees (the consumers) who use it. In the following analysis, I will focus on the relationship between the producers and consumers of value. Each contributor creates value for the colleagues who come next in the chain, because the quality of their work depends on the quality of his or her own. The value chain does not necessarily stop at the borders of a single organization. It can connect networks of companies, which may themselves be located in several countries, with each type of company contributing to the design, production of parts, assembly, transportation or sale of the product. Supply chains, which have been talked about so much since the COVID-19 pandemic, are a particular case of value chain, focused on the material activities and transportation within a given industry. The final consumer benefits from the value created at each stage of the production of the good or service.

Increasing the productivity of organizations and industries is the basis of economic prosperity. This increase comes from innovations that create more value at lower cost. But the overall performance of a company, or of a larger value chain, depends not only on the performance of each activity or trade but also on the links between these activities. This is where we come back to the theme of the data-centric organization, because activities, and even more so the links between them, require the reception, processing and exchange of information.

From application-centric computing to data-centric computing

In the second half of the 20th century, during the first wave of computerization, each « trade » within a company developed its own applications to increase its performance: computer-aided design, production robots, inventory management, employee payroll, company accounting, the customer database, etc. Each application was designed according to the cultural norms and vocabulary of its environment. Input data was formatted specifically for the application that used it, while output data was formatted for the needs of its immediate users. The result was an application-centric, « siloed » computerization, with each application controlling the structure of its input and output data.

The traditional bank at the beginning of this text is a good example of this 20th century computing, whose main defect is the difficulty of communication between applications. Indeed, the conceptual breakdown and formatting of one application’s output data do not necessarily match those of another application’s input data. For example, if the inventory management software does not share its data with the customer database software, it is difficult to respond quickly to an incoming order. But since the beginning of the 21st century, Internet connections have become more and more commonplace. On the hardware side, information processing is increasingly taking place in the large data centers of companies such as Amazon or Microsoft, which rent out memory as easily as parking spaces and computing power on demand as if it were electricity. Memory and computing power are becoming commodities that you don’t have to produce yourself. This is called cloud computing. On the software side, APIs (application programming interfaces) are data encoding/decoding interfaces that allow applications to exchange information. As a result of these changes, application-centric computing is becoming increasingly obsolete, although it is still the de facto situation in most organizations in 2021.
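To make the idea of an API as an encoding/decoding interface more concrete, here is a minimal Python sketch. It assumes a hypothetical inventory application that encodes its stock levels as JSON and an order application that decodes them; the field names (`sku`, `quantity`) and the function names are illustrative assumptions, not taken from any real system.

```python
import json

# Hypothetical internal state of the inventory application.
_STOCK = {"A-100": 12, "B-200": 0}


def inventory_api_get_stock(sku: str) -> str:
    """Inventory side: encode the answer as a JSON string."""
    return json.dumps({"sku": sku, "quantity": _STOCK.get(sku, 0)})


def can_fulfil_order(sku: str, requested: int) -> bool:
    """Order side: decode the JSON reply and decide immediately."""
    reply = json.loads(inventory_api_get_stock(sku))
    return reply["quantity"] >= requested


print(can_fulfil_order("A-100", 5))  # True
print(can_fulfil_order("B-200", 1))  # False
```

Because both sides agree on the JSON structure, the order application can answer an incoming order without knowing how the inventory application stores its data internally.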

In contrast to the 20th century, 21st century computing is data-centric. We need to imagine a common warehouse where different applications come to get their input data and deposit their output data. Instead of specialized data being organized around particular applications, multiple applications, some of which are ephemeral, are organized around a common and relatively stable digital memory. We then say that the applications have become interoperable. The take-off of data-centric computing can be dated back to 2002, when Jeff Bezos, the head of Amazon, asked all his developers to make their data available through an API.
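The image of a common warehouse can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `SharedStore` class, the two applications and the `order` record type are all hypothetical names invented for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStore:
    """A toy common data store with one shared vocabulary of record types."""
    records: dict[str, list[dict]] = field(default_factory=dict)

    def deposit(self, record_type: str, record: dict) -> None:
        self.records.setdefault(record_type, []).append(record)

    def fetch(self, record_type: str) -> list[dict]:
        return list(self.records.get(record_type, []))


def sales_app(store: SharedStore) -> None:
    """Deposits its output (an order) into the shared store."""
    store.deposit("order", {"customer_id": "C1", "sku": "A-100", "qty": 2})


def shipping_app(store: SharedStore) -> list[str]:
    """Takes as input the orders deposited by another application."""
    return [f"ship {o['qty']} x {o['sku']} to {o['customer_id']}"
            for o in store.fetch("order")]


store = SharedStore()
sales_app(store)
print(shipping_app(store))  # ['ship 2 x A-100 to C1']
```

Neither application knows about the other; they only share the store and the meaning of an `order` record, so either one could be replaced without touching the data.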

From an economic point of view, data-centric computing improves the productivity of organizations because it allows different activities to share their data and coordinate more easily: the value chain becomes more fluid. Unlike administrations that present indecipherable forms written in bureaucratic jargon and ask users ten times for the same information in different versions because their applications don’t communicate, large cloud companies (such as Big Tech and BATX) have accustomed their clients to immediate reaction times and optimized interfaces. The richest companies in the world are data-centric. So are dynamic sectors of the economy, such as the video game industry or the online distribution of movies and series. Since the benefits of data-centric computing are so obvious, why isn’t it implemented everywhere? Because there can be no data-centric computing outside a data-centric organization, and the transition to this new type of organization requires a considerable epistemological and social change. The major cloud companies date from the 21st century or the very end of the 20th century. They were born in the digital paradigm, and it is they who invented the data-centric organization. Older industries, on the other hand, are struggling to keep up.

Every activity (production, sales, etc.) has its own practical culture: a certain way of carving up objects, naming their relations and sequencing operations. The computerization of an activity implies creating not only an application but also a metadata system, and both are conditioned by a dated and situated practical culture. Merging an organization’s data collections requires « reconciling » the different metadata systems and, once this is done, committing to maintain and evolve the common metadata system as needs change. All this requires many discussions with experts from different spheres of activity, as well as harmonization meetings where bargaining over concept definitions can be tough. The reconciliation of data models is no less complex than any intercultural negotiation weighed down by power issues. Indeed, most of the actors involved must not only revise their cognitive habits and ways of doing things, but also give up part of their local sovereignty. It will no longer be possible to organize one’s practical memory without coordinating with the other activities in the value chain, on both the semantic and the technical level. From now on, data governance, whose main person in charge is the « Chief Information Officer » or « Chief Data Officer », becomes one of the main functions of the company.

Data governance

Data governance faces two intertwined problems: semantics and politics. On a political level, it should be noted that metadata systems – that is, the categories that organize data – are always linked to the social, cultural and practical characteristics of their users. For example, in a large telecommunication company, consumption data will be organized by « lines » and not by « customers ». A customer may have several lines and the same line may be used by several customers. It is clear that customer relations would be easier if the data were classified and analyzed according to the physical or legal persons who use the company’s services. But this is not the case because the telecom company is dominated by a culture of engineers for whom the « real » data are those of the lines. This hardware-based rather than human-based approach also makes pricing as « objective » as possible and removes it from negotiation. In short, the way an institution organizes its memory reflects and reifies its identity. To reorganize its memory is to change its identity. The parallelism between metadata and social contexts makes data governance a political issue.
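The difference between a line-centric and a customer-centric memory can be illustrated with a small Python sketch. All the data below is hypothetical, and the sketch makes one loud assumption: a shared line is counted in full for every customer attached to it. How to split a shared line’s usage is exactly the kind of question that has to be negotiated.

```python
from collections import defaultdict

# Line-centric view: consumption keyed by line (hypothetical data).
usage_by_line = {"line-1": 120, "line-2": 45, "line-3": 300}

# Many-to-many link between customers and lines (hypothetical data).
customer_lines = [
    ("alice", "line-1"),
    ("alice", "line-2"),
    ("bob", "line-2"),   # line-2 is shared by two customers
    ("bob", "line-3"),
]


def usage_by_customer(usage, links):
    """Re-key line-centric usage by customer.

    Assumption: a shared line is counted in full for each customer.
    """
    totals = defaultdict(int)
    for customer, line in links:
        totals[customer] += usage.get(line, 0)
    return dict(totals)


print(usage_by_customer(usage_by_line, customer_lines))
# {'alice': 165, 'bob': 345}
```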

As for the semantic issue, it no longer concerns the subjective side of identity, whether personal or collective, but its logical side. If we want applications to be interoperable from one end of the value chain to the other, objects, relationships and processes must be named unambiguously. The difficulty here comes from the multiplicity of businesses, each with its own jargon, and the plurality of languages, particularly in international companies or sectors. When it comes to coordinating activities, synonyms (different words for the same thing) and homonyms (one word meaning several different things) become obstacles to collaboration. Homonyms, in particular, can cause serious miscalculations. For example, at one airline the word « Asia » covered different geographical areas depending on the branch, and this semantic inconsistency caused errors in strategic decisions. When all operations are automated and driven by data, an ambiguous term can give false indications to managers, or even disrupt a supply chain.
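A homonym of the « Asia » kind can be detected mechanically once each branch’s definition is written down. The following Python sketch assumes hypothetical branch names and country lists; it simply reports every term that two branches define differently.

```python
# Hypothetical region definitions used by two branches of one company.
branch_regions = {
    "sales": {"Asia": {"CN", "JP", "IN", "TR"}},
    "operations": {"Asia": {"CN", "JP", "IN"}},
}


def find_homonyms(definitions):
    """Return the terms whose meaning differs between branches."""
    seen = {}          # term -> first definition encountered
    conflicts = set()
    for terms in definitions.values():
        for term, members in terms.items():
            if term in seen and seen[term] != members:
                conflicts.add(term)
            seen.setdefault(term, members)
    return sorted(conflicts)


print(find_homonyms(branch_regions))  # ['Asia']
```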

The « data dictionary » or catalog is the primary tool for data governance. It lists all the data types together with the single agreed way of categorizing them. If, as is often the case, the catalog has not been unified, then « alignment tables » must be used between systems. Beyond the problems of consistency, data governance must also deal with the quality of the data. For this purpose, a « data control catalog » is used, which lists the methods for testing the quality of the data according to its nature. For example, how do you detect errors in customer names when the company operates in seventy countries? There are countries where names are not split into first and last names, others where numbers are acceptable in a name (as in Ukraine), others where a name can have four or five consonants in a row, etc.
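As a rough illustration of these two tools, the Python sketch below combines an alignment table with a one-entry data control catalog. Every system name, field name and check is a hypothetical assumption. The name check is deliberately permissive: it only rejects empty values and control characters, since stricter rules would reject legitimate names in some countries.

```python
import unicodedata

# Hypothetical alignment table: (system, local field) -> catalog field.
ALIGNMENT = {
    ("crm", "cust_name"): "customer.full_name",
    ("billing", "client"): "customer.full_name",
}


def to_catalog(system: str, record: dict) -> dict:
    """Rename a system's local fields to the shared catalog names."""
    return {ALIGNMENT.get((system, k), k): v for k, v in record.items()}


def name_is_plausible(name: str) -> bool:
    """Permissive control: non-empty and free of control characters."""
    stripped = name.strip()
    return bool(stripped) and not any(
        unicodedata.category(ch) == "Cc" for ch in stripped
    )


# Hypothetical data control catalog: catalog field -> list of checks.
CONTROLS = {"customer.full_name": [name_is_plausible]}


def failed_controls(record: dict) -> list[str]:
    """Return the catalog fields whose value fails at least one check."""
    return [f for f, checks in CONTROLS.items()
            if f in record and not all(c(record[f]) for c in checks)]


rec = to_catalog("billing", {"client": "Krzysztof Szczęsny"})
print(rec)                   # {'customer.full_name': 'Krzysztof Szczęsny'}
print(failed_controls(rec))  # []
print(failed_controls(to_catalog("crm", {"cust_name": "  "})))
# ['customer.full_name']
```

Keeping the checks in a table keyed by catalog field, rather than inside each application, means that a rule corrected once applies to every system aligned on that field.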

The transition to a data-centric organization implies a change of culture and an evolution of management. All of a sudden, words and concepts become important, not only in communication and marketing, but also in production, which is no less digitized than the other functions of the company. In addition, the cultural change calls for more openness and communication between departments, branches, services and businesses. No good management without data management, and no data management without good metadata management. We thought that interest in semantics was reserved for cultural studies departments in American universities, but now it is a condition of business productivity!

Distinguish between words and concepts

Finally, I note that the most sophisticated metadata editing and management tools on the market (PoolParty, Ab Initio, Synaptica) have no way of clearly distinguishing between « words » or « terms » in a particular natural language and « concepts » or « categories », which are more abstract and cross-linguistic notions. The same concept can be expressed by different words in different languages and the same word can correspond to several concepts, even in the same language (is a « mole » an animal, a spot on the skin, an infiltrated spy, a unit of amount of substance in chemistry…?). Words are ambiguous and multiple, but recognizable by humans. The underlying formal concepts are unique and should be interpretable by machines. By proposing a unique encoding system for concepts and their relations that is independent of natural languages, IEML makes it possible to keep words and concepts distinct while articulating them with one another. This new encoding system not only advances semantics, but also has an unsuspected power to make value chains more fluid and increase collective intelligence.
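The distinction between words and concepts can be sketched as a data structure. The Python example below is not IEML and does not reproduce its encoding: the concept identifiers are hypothetical placeholders, used only to show words in a given language pointing to language-independent concepts.

```python
# Hypothetical language-independent concept identifiers (not IEML codes).
CONCEPTS = {
    "c:mole-animal": "small burrowing mammal",
    "c:mole-skin": "pigmented spot on the skin",
    "c:mole-spy": "infiltrated agent",
}

# Words are (language, term) pairs pointing to one or more concepts.
LEXICON = {
    ("en", "mole"): ["c:mole-animal", "c:mole-skin", "c:mole-spy"],
    ("fr", "taupe"): ["c:mole-animal", "c:mole-spy"],
    ("fr", "grain de beauté"): ["c:mole-skin"],
}


def concepts_for(lang: str, term: str) -> list[str]:
    """One word may map to several concepts (homonymy)."""
    return LEXICON.get((lang, term), [])


def words_for(concept_id: str, lang: str) -> list[str]:
    """One concept may be expressed by different words per language."""
    return sorted(term for (l, term), ids in LEXICON.items()
                  if l == lang and concept_id in ids)


print(len(concepts_for("en", "mole")))   # 3
print(words_for("c:mole-skin", "fr"))    # ['grain de beauté']
print(words_for("c:mole-animal", "fr"))  # ['taupe']
```

Applications that exchange concept identifiers rather than words never have to guess which « mole » is meant; the words are only needed where humans read or write the data.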

P.S. I would like to thank John Horodyski, Paul-Louis Moreau, Samuel Parfouru and Michel Volle for answering my questions, thus helping to inform this post. Errors, inaccuracies, and heterodox opinions should nevertheless be attributed only to the author, Pierre Lévy.

3 comments on « Towards a Data-centric Organization »

URBAN-GALINDO says:

July 3, 2021 at 1:29 pm

The data-based approach is not necessarily very new: in the 1970s, the MERISE method was already founded on structuring data with the « entity-relationship » model.
What is new, compared with the practice of developing applications in silos along the boundaries of a company’s major departments or functions (Marketing, Engineering, Manufacturing, Distribution, Finance), is the determination to unify the definitions of shared data across the entire scope.
At PSA, for example, we have a common definition of the « car », with all its options, from marketing to after-sales service. This is not very common.
It requires a vigorous urbanization approach to the information system.
The problem gets more complicated when we « talk » about the extended enterprise!
