20 years of progress and the failure of ISO/IEC 11179

Here is a more detailed look at Dan Gillman's “20 years of progress” paper from the 2010 UNECE/METIS meeting.

1. “However, a desire to build much larger systems met with new challenges, and in 1999 the idea that building “cathedrals” should not be an initial goal – introduced by Jostein Ryssevik – was an important outcome. The notion of building a system up step by step is an old strategy, but it required repeating”

An important outcome? The projects that followed in the beginning of the 2000s attempted exactly that. Metanet and COSMOS are two examples, with ambitions for common or reference models. IPIS is another. In fact, Dan Gillman's CMR was one of the earliest attempts at building cathedrals, and he has remained devoted to that approach all this time. In other words, this statement has nothing to do with historical truth. It is an example of rewriting history to fit the political needs of the present. It has been inserted to justify later failures and current confusions.

2. “Many offices also offer good metadata to go along with them. Now, the challenges are around services, software designed to work on the Web to find, manipulate, and integrate data from different sources. The change is toward more standardization and building automatic means for finding and manipulating data…”

This is glib, indeed. NSOs are far from offering good metadata. The current focus on exchange and standards has no connection to the existence of metadata in NSOs. To the contrary, it is the failure of several international and national metadata projects that has created the buzz surrounding projects such as SDMX. There is also no real standardisation going on. In fact, the opposite has happened. The focus on developing truly standard models for NSO systems has been lost. And the focus on “active” metadata is not a step in a general development; it is a result of the failed implementation of passive metadata.

3. “But, this requires significant upgrades to data and metadata systems, such as using standards for interoperability and designing “computable” terminology systems (ontologies)” [the semantic web]

Here is another example of self-promotion. Gillman attempts to further his own ideas from ISO/IEC 11179 by linking them to the requirements for the semantic web. His technique is to insert his own terms from 11179 (interoperability and terminology). I know very little about the semantic web. But the same thing was said about XML, and nothing has happened there yet. So, why should the semantic web be different? The truth of the matter is, again, that the core problems of metadata remain unsolved. NSOs are still crying out for truly standard models that are implemented successfully because they support NSO users in their daily work. Without a successful capture of metadata, there will not be any success, regardless of the latest technical fad.

4. “By the year 2000, new standards efforts such as the DDI started focusing on the survey life-cycle. In the last few years, Part C of the CMF, and the Generic Statistical Business Process Model in particular, is a much stronger focus for METIS and the Steering Group. Part A of the CMF also reflects the need for a strong internal focus.”

This is an example of what the Soviets called “the death of history”. DDI is named because of its present interest, but other projects are stricken from the historical record, such as SCBDOK, Metanet, IPIS, earlier UN standards, the Latvian system, etc.

5. Then we have the greatest self-promotion of all. “Terminology” is cast as a theme of its own. Why are models not a theme of their own? Instead, “formalism” is cast as a separate theme. This shows how condescending Gillman is in his view of real standard models, and how he views his own ideas about “terminology” as more important than real solutions to modelling problems. He is afraid of discussing real issues regarding real models.

“The importance of terminology was recognized early as one of the main outputs of METIS. Early work focused on the development of a terminology for metadata systems, edited by Dusan Prazenka then Daniel Gillman, (published by the UNECE secretariat in 1999) and models for handling classification systems (a form of a terminological system), which evolved into the Neuchâtel Classification model (published in 2000 and available in the CMF web site through http://www.unece.org/stats)”

In this quote we find all the cardinal sins at the same time. Gillman promotes his own early work, then attempts, with slippery language, to tie this work to the Neuchatel model for classifications. CLASET is not mentioned. Then he refers to classifications as a form of terminology, to make his own ideas about terminology seem more general and valid than they are, and, finally, he includes a link, in order to promote Neuchatel.

6. “In general, terminology management consists of defining and relating the concepts underlying terms that are used in an office and gathering together all the terms used for one concept. There are a wide variety of possible uses and applications of this technique, and statistical offices have long recognized the importance of this. Classifications, thesauri, ontologies, nomenclatures, glossaries, and dictionaries are all examples; and most if not all of these kinds of structures are in use in statistical offices”

Here is Gillman's definition of terminology work. Note that he again attempts to include classifications. He has to do this in order to be able to claim that the importance of terminology work has been evident to NSOs and that such systems are widespread. But classification databases have very little to do with ISO/IEC 11179 type models, and a central registry does in fact not need to collect all terms for a concept. This is one of the major fallacies of 11179. If you have a central repository, why bother about local names?

To further these false claims about terminology in NSOs he includes a discussion of multi-lingual requirements. But this also has little to do with terminology management as handled by 11179. Of course, several offices are multi-lingual, but in terms of metadata systems, this is primarily a technical aspect of system design.

7. “In the 2000s, the same trend continued (MetaNet, Neuchâtel variables, ISO/IEC 11179, SDMX, DDI, and others). ISO/IEC 11179 and DDI were developed as standards, and SDMX became one. SDMX and DDI were both based on XML and complied with ISO/IEC 11179”

Here is just another example of the selective mentioning of standards. Why is the Latvian system not mentioned?

In this section, Gillman also turns to outright polemics. His claim is that there is something called “mathematical” formalism, and that this is no longer interesting, and not very practical. This is a false description of reality. Gillman's purpose with this false description is to discredit the claim that theoretical insights are fundamental to the solution of ISOS modelling problems. In other words, it is a polemic directed towards the challenge from this blog and key insights.

Here is a quote:

“The CMF Part B (Concepts, Standards, and Models) and Part C (Statistical Life-Cycle and Generic Statistical Business Process Model) are further efforts to advance the work in this area. Again, the formality of this work is less than the mathematical precision required by some, but it has a practical value we will visit in the next section”

None of these “efforts” have advanced anything in this area.

8. “The real focus of METIS is to help statistical offices design, build, and use statistical metadata systems. Even though theory, formalism, concepts, standards, and models are part of the discussion, they are not the final product of the work”

Here is some honesty. But compare this with the choice of themes in the paper. Why is the paper not organised around systems and models as core themes, as opposed to “terminology” and “formalism”? As we have seen, this is because the latter two serve Gillman's self-promotional activities and sly polemics. A focus on the former two would, on the other hand, reveal difficult truths.

Then the bare knuckles are shown. Gillman has nothing to say about either implementation or systems. This section, titled “implementation”, instead contains propaganda for the UNECE/METIS framework, especially part D and part A. But part A is not a METIS history of implementation and systems. And part D contains more general case studies that are not very specific about NSO systems.

Why does Gillman have almost nothing to say about NSO systems, in this section?

“Over time, the complexity of the implementations in statistical offices has increased. Whereas, as we discussed in section IV-B, early implementations were more focused on data sets or single surveys, the corporate approach is much more common today. This migration over time is evidence that the lesson about not building cathedrals was learned. A slow approach is much more effective. In fact, it is evidence of the idea we are trying to convey in paragraph 43”

Once again he is arguing his own false point about developing cathedrals. What has happened is that NSOs generally have failed to develop the systems that they need, so they end up with very simple or overly complex systems instead of real solutions. These failures are glossed over as a lesson learned about strategy. Nowhere in this is the real lesson mentioned, the one about addressing the big picture and then developing truly standard core models. In fact, the term “core model” is not mentioned at all. Note that the framework was supposed to be a description of system requirements.

9. Here is the grand finale:

“With the new effort to incorporate ideas from the Semantic Web, corporate implementations will get more complicated. Given the careful approach taken so far by statistical offices in developing their statistical metadata systems, even if these efforts don’t succeed, the overall systems will not fail. Seeing that this is the case, we conclude that METIS has been a great success so far. Let us ensure the next 20 years be as successful”

Ten dollars to anyone who understands what is being said here! Efforts may not succeed but systems will not fail? Which systems? What is their relation to this particular effort? Why should systems become more complex? Why is this supposed complexity not a problem, if the careful approach is used?

None of this is given a clear answer in Gillman's paper.

So, what is this paper really all about?

My guess is that ISO/IEC 11179 has become a target of increasing criticism due to its overly complex model. What Gillman is trying to do is argue (1) that this complexity will be needed in the future and (2) that it can be handled if metadata systems are developed step by step.

Maybe an important source of this criticism has been my observation that Dan Gillman's flagship “terminology” model, ISO/IEC 11179, has been a resounding failure in the private sector? Maybe one or two NSOs are now also regretting the day they decided to implement 11179?


ISO/IEC 11179 and the metadata registry – revisited

Do you remember that ISO/IEC 11179 claims to be a standard for metadata registries? This characterisation is in itself vague, and can be taken to mean two things:

1. A standard that can be important to, and supported by, several different metadata registries

2. A basic model for all metadata registries

Now, here is the twist:

SDMX calls itself a standard for a registry for exchange of statistical data and metadata.

If so, then SDMX should either support ISO/IEC 11179 or be based on ISO/IEC 11179.

But, as it seems, it does neither.

How is it that a standard for a statistical data and metadata registry seems to be able to do without ISO/IEC 11179?

Could it be because ISO/IEC 11179 is a standard for a data element registry, and no one really needs data elements?

(SDMX does indeed claim to support 11179, but this claim, too, is vague. Support in what way? The proof is, as usual, in the pudding. Nowhere in the SDMX user guide is there a mention of data element concepts, data elements, etc. In other words, these constructs are of no importance to the SDMX model.)

ISO/IEC 11179 – a 2009 summary

One of the first concrete examples of the failures of the international metadata circuit discussed on this blog was ISO/IEC 11179.

11179 has had some influence on NSO systems during the last five years, for example in Canada (Phase 3), Sweden (MetaPlus) and Croatia (a system developed by Statistics Sweden). Its influence is also apparent in the Neuchâtel model for variables.

I have listed four fallacies and one failure for this model. These fallacies and this failure can all be derived from one single circumstance: the 11179 model has no connection with reality; it is a purely theoretical expression of semantic theory.

The four fallacies:

1. 11179 is touted as a standard for metadata registries, but there is no object in the model with the name “metadata”. Hence, it is not a model for a metadata registry (there can be no such thing) but instead a model for a “data element registry” – whatever that is.

2. Nobody needs a data element registry. What organisations need is a data registry. Industry standards for describing data already exist, but 11179 does not recognise these standards. It is therefore not a standards-compliant model.

3. A result of the theoretical focus of 11179 is that it lacks functions that are essential to an NSO data registry, for example a description of the physical storage of data and information that supports “statistical inter-operability”. It also requires a formalistic registration of objects that have little or no added value in a data registry (concept, data element concept); see the sketch after this list.

4. The issues that 11179 supposedly has a solution to also turn out to be a fallacy. It lacks essential cataloguing features for a data registry, such as variable groups and data tables. Its “concept” that groups its “data element concepts” is also a fallacy, since a majority of data element concepts are unique, i.e. typically 80% have a 1:1 relationship to their “concepts” (even if this will improve over time, as new varieties of existing data element concepts are added). Synonyms are central to the 11179 model, but in reality they are secondary in a central registry, since standard terminology can be implemented on registration (and 11179 will in any case require such standardisation of the names of its “data element concepts” in order to function according to its intentions).
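To see what this registration actually amounts to, here is a minimal sketch, in Python, of the 11179 chain as I read it: a concept, a data element concept that specialises it, and a data element that combines the data element concept with a value domain. The class and attribute names are my own simplifications, not the standard's exact metamodel.

```python
from dataclasses import dataclass, field
from typing import List

# Simplified sketch of the ISO/IEC 11179 registration chain as read on this blog;
# class and attribute names are illustrative, not the standard's exact metamodel.

@dataclass
class Concept:
    name: str                                           # e.g. "income"
    definition: str
    synonyms: List[str] = field(default_factory=list)   # alternative terms for the concept

@dataclass
class DataElementConcept:
    concept: Concept                                    # the broader concept it specialises
    object_class: str                                   # e.g. "person"
    property_name: str                                  # e.g. "net income"

@dataclass
class ValueDomain:
    datatype: str                                       # e.g. "decimal"
    unit: str                                           # e.g. "EUR"

@dataclass
class DataElement:
    data_element_concept: DataElementConcept
    value_domain: ValueDomain
    name: str                                           # e.g. "net_income_person"

# Registering a single variable already means maintaining four linked objects:
income = Concept("income", "Money received over a period", synonyms=["earnings"])
net_income_person = DataElement(
    DataElementConcept(income, "person", "net income"),
    ValueDomain("decimal", "EUR"),
    "net_income_person",
)
```

Four objects to register one variable, before a single byte of actual data has been described.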

We have now arrived at the failure of 11179. It is catastrophic, and should serve as a warning to all directors general who are asked to sign up to 11179 as a corporate standard:

Private industry has rejected 11179. Even the single large manufacturer of data warehouse applications that did support it has withdrawn its support.

It should be a priority task for NSOs to contact private industry to find out why it does not support 11179, in spite of the increased importance of data integration.

Given these fallacies and failure, the role that 11179 has played in the international metadata circuit reveals two things about this circuit:

1. The UNECE/METIS is incompetent and corrupt, because it has for more than a year been advertising 11179 as a relevant standard for official statistics on its home page.

2. The one-eyed are leading the blind. In this case, the one-eyed is Dan Gillman (one of the two authors of 11179) at the U.S. Bureau of Labor Statistics, and the blind can be found in Statistics Canada, Statistics Sweden, Statistics Norway and Statistics Netherlands (the latter three are members of the Neuchâtel group for variables).

The ISO/IEC 11179 fallacy no. 4

In our series about ISO/IEC 11179, we have arrived at fallacy no. 4.

11179 is supposedly a standard for a registry, and what do we expect from a registry?

One of the most basic requirements is that it should be easy to search. That is – sort of – the whole point of a registry.

It is easy to see that the two most fundamental solutions to this are tables (as found and used in production) and variable groups.

If nowhere else, this can be gathered from DDI, which not only has the table as its basic object but has also developed an advanced taxonomy for variable groups.
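For illustration, here is a minimal sketch, in Python, of a registry built around these two constructs, where every variable records the production table it lives in and the variable groups it belongs to. The structure and names are my own illustration, not DDI's actual schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Variable:
    name: str
    table: str                                          # the production table the variable is found in
    groups: List[str] = field(default_factory=list)     # variable groups, e.g. "Income"

# A flat registry that can be browsed by either of the two basic search constructs.
registry = [
    Variable("net_income", table="PERSON_INCOME_2009", groups=["Income"]),
    Variable("gross_income", table="PERSON_INCOME_2009", groups=["Income"]),
    Variable("turnover", table="ENTERPRISE_2009", groups=["Economy"]),
]

def by_table(table: str) -> List[Variable]:
    """All variables found in a given production table."""
    return [v for v in registry if v.table == table]

def by_group(group: str) -> List[Variable]:
    """All variables assigned to a given variable group."""
    return [v for v in registry if group in v.groups]

print([v.name for v in by_table("PERSON_INCOME_2009")])  # ['net_income', 'gross_income']
print([v.name for v in by_group("Income")])              # ['net_income', 'gross_income']
```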

Does 11179 provide these two basic search constructs? No!

Then, what does it provide?

The answer is that 11179 could not care less, since it is a way of expressing semantic theory in a model, and has nothing to do with requirements for real-world registries.

The most important 11179 search construct is probably the concept. A concept can group several narrower data element concepts, i.e. “income” will group “net income of person”, “gross income of enterprise”, etc.

Experience shows that the relation between a concept and a data element concept more often than not is 1:1. Altogether, some 70-80% have a 1:1 relationship.

A result of this fallacy is that the concept is more or less useless as a device for grouping data element concepts.
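A back-of-the-envelope illustration, with invented figures in line with the 70-80% share quoted above:

```python
# Invented example in line with the 70-80% figure quoted above: most concepts
# group exactly one data element concept, so the grouping adds almost nothing.
concept_to_decs = {
    "income":       ["net income of person", "gross income of enterprise"],
    "age":          ["age of person at end of year"],
    "turnover":     ["yearly turnover of enterprise"],
    "sex":          ["sex of person"],
    "municipality": ["municipality of residence of person"],
}

one_to_one = sum(1 for decs in concept_to_decs.values() if len(decs) == 1)
share = one_to_one / len(concept_to_decs)
print(f"{share:.0%} of concepts group exactly one data element concept")  # 80%
```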

Another result is that recent, 11179-inspired metadata systems lack variable groups, for example the MetaPlus system developed by Statistics Sweden.

A story worth telling

EUROSTAT has a questionnaire that it sends to NSOs to learn more about their metadata strategies, systems and projects. One question is about the use of standards, and it lists ISO/IEC 11179.

In the Wikipedia article for ISO/IEC 11179, we learn that no large vendor of commercial software supports ISO/IEC 11179. Oracle, the only one that did, seems to have withdrawn its support.

This is strange, since data integration and metadata have become more important to the IT-community, not less.

– Why is it that a model that claims to be the only “mature” standard for exchange of data in a heterogeneous environment lacks support from the industry?

– And why is it that the same model, during the same period, seems to have become more important in the international metadata circuit?

If I were a director general, asked by my line managers and experts to sign up to ISO/IEC 11179 as a corporate standard, I would first give the large software vendors a call and ask them why they do not support it. In particular, I would want to know why Oracle withdrew its support.

Now, that seems to be a statistical metadata story worth telling…

The ISO/IEC 11179 fallacy no. 3

What happens if you base a model on a pet theory instead of system requirements? Let us take a closer look at ISO/IEC 11179!

We already know that 11179 is not a standard for “metadata registries”. Only a model with an entity called “metadata” would be a model for a “metadata registry”, and there can be no such thing. 11179 is a model with a core entity called “data element”; hence it is a model for a “data element registry”.

But who has asked for a “data element registry”? What people want is a corporate data registry that shows what data the organisation has, including technical and descriptive information that makes it possible to find and use the data. Does 11179 address these issues?

– it contains no solution to the core issue of physical storage and retrieval (only “semantics”)

– it contains no solution to the core issue of statistical inter-operability (only “semantic inter-operability”, whatever that is)

So, what does it contain a solution for? Since 11179 presents a solution without first presenting the issues, this is not entirely clear. However, the issues seem to be:

– It can be difficult to find all relevant data, because similar data may have different local names (for example “gender” instead of “sex”)

– It should be easy to find data based on conceptual relations in terms of wide and narrow (e.g. “income” is a wider concept than “net income”)

In what way are these priority issues in a corporate data registry?

– Since the data registry is corporate, we can have guidelines that standardise terminology, so that “gender” is used instead of “sex”. In other words, local names are a non-issue in a corporate data registry. And if you really need to register a local, or other, name, you can add attributes for alternative names.

– When we want to register “net income”, why would we want to also register an “income” concept? If you really need a general definition of “income”, you might as well use Wikipedia. What we want is the definition of net income in a specific dataset.

– Why is a simple wildcard search not a good way of finding all related data? Search the registry for “*income*” and you should be able to find “net income”, “gross income”, “yearly income”, etc. Why do users have to register a “christmas tree” of contrived entities such as data element, data element concept, and so on? (See the sketch below.)
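To make the point concrete, here is a minimal sketch, in Python, of the kind of flat corporate data registry argued for here: standardised names, the physical dataset the data is stored in, a definition tied to that dataset, an attribute for alternative names, and a simple wildcard search. All field, variable and dataset names are illustrative assumptions.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch
from typing import List

@dataclass
class RegistryEntry:
    name: str                                            # standardised name, e.g. "net_income"
    dataset: str                                         # where the data is physically stored
    definition: str                                      # definition in this specific dataset
    alt_names: List[str] = field(default_factory=list)   # local or alternative names, if needed

registry = [
    RegistryEntry("net_income", "PERSON_INCOME_2009", "Income after tax", ["inc_net"]),
    RegistryEntry("gross_income", "PERSON_INCOME_2009", "Income before tax"),
    RegistryEntry("yearly_income", "LABOUR_FORCE_2009", "Total income over the year"),
    RegistryEntry("sex", "PERSON_2009", "Sex of the person", ["gender"]),
]

def search(pattern: str) -> List[RegistryEntry]:
    """Wildcard search over standardised and alternative names."""
    return [
        entry for entry in registry
        if fnmatch(entry.name, pattern) or any(fnmatch(a, pattern) for a in entry.alt_names)
    ]

print([e.name for e in search("*income*")])  # ['net_income', 'gross_income', 'yearly_income']
print([e.name for e in search("*gender*")])  # ['sex'] – found via the alternative name
```

No “christmas tree” of concepts and data element concepts is needed to support this.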

What these examples illustrate is that the 11179 model, and its artefacts, are a result of a dogmatic adherence to semantic theory, and have little or nothing to do with real requirements for real corporate data registries.

A lot of hard-working, honest people in statistical offices have to pay a price for this when they are required to document their data according to models based on 11179. Here is a lesson learned at Statistics Sweden:

“Make some sort of prototype at an early stage, it is very difficult and abstract for users to describe use cases for a system with this complexity”

In fact, modelling and documenting data in a corporate data registry is the easiest thing in the world, if you know what you are doing.

The pet-theory approach

Key insight no. 1 suggests that part of the metadata fallacy is due to a model-based approach, as opposed to a system-based approach.

This is not entirely true; there is something even worse: the pet-theory approach. This is the approach used to develop ISO/IEC 11179.

The story of 11179 seems to have begun with semantic theory, and the model is primarily a way of expressing that theory. Considerations regarding systems were then pasted on afterwards, as a post hoc rationale for the model.