Moving on!

Are you and your agency tired of the international statistical metadata circuit?

Do you want real results?

Then we have a suggestion!

One full week of consultancy at your office. You will learn everything you need to solve your metadata modelling issues.

– Strategy

– Existing standards that can be modified and used

– Key insights

– Modelling

The model that will be discussed and explained is the Abelin model for classification management. This model illustrates how concrete modelling problems can be solved.

We will also use key insights to analyse your existing systems and existing international proposals (DDI, SDMX, etc).

The total price for this is 10,000 EUR, all included. One of our associated consultants will be at your office for a full working week.

Please note that this offer is only valid if and when a sufficient number of NSOs have placed an order. At least three NSOs within each major geographical area have to place an order (Europe, America, Asia, etc).

You can contact us via e-mail: isos.meta[@]gmail.com.

Bo Sundgren – An intellectual biography – No. 8 The last decade

In the early 1970s, Bo wanted his own general theory of databases, and dismissed the relational model as too limited. History was not kind to this.

Then there was a major ARKSY project at Statistics Sweden. At the end of the 1970s there was even a first major UN session about statistical systems.

What became of that we do not know, except that they must have failed. But we do know that most of the ideas and concepts were in circulation already then.

During the 1980s, everything became very quiet, at least judging from the publications on Bo’s own web site.

In the early 1990s the old issues from the 1970s reappeared. Bo then started to rehash his old material and publish it in R & D Reports from Statistics Sweden.

In all this, we do not find a single article in a leading academic journal about informatics. Why not?

In any case, we have now reached the final decade. The 2000s. The beginning of this decade saw a concerted effort from the EU to finance “statistical metadata” projects. This is when we find projects such as AMRADS, Metanet and COSMOS. What did Bo contribute via these projects, if anything?

Not much, it seems. In fact, these projects were overall a great failure. And here is the most important point! These failures were totally unnecessary. If Bo had been honest about learning from his earlier failures, they might never have been repeated in the 2000s. Instead, he did the very opposite and sold his old horse to the new international metadata circuit.

Then, suddenly, there were SDMX and DDI and the METIS framework. The SDMX consultant did not even know about Sundgren’s earlier models, so they arranged a special meeting between Bo and the consultant. Very little came out of that, except the new term DSD instead of key family (and Bo did not come up with that term himself!).

This has not stopped the international statistical metadata circuit from honoring Sundgren. On the contrary, this was exactly what they had been looking for. A mentor that could teach them how to do nothing slowly, forever.

For example, since 2001 the International Marketing and Statistical Output Database Conference (IMAODBC) has awarded a yearly “Bo Sundgren Award”, and the prize winners are of course only from statistical agencies. There is no self-denial here.

On Sundgren’s home page we find a large number of publications from the 2000s. What can we say about them?

Of course, he has continued to rehash old material. He has also begun to float more and more on top of things. He has left the more concrete issues and discusses statistical systems and public information systems in an increasingly lofty and abstract manner.

What else is there to do, when you have not been able to solve – or even properly formulate – the fundamental issues to be solved?

This has not stopped Bo from parading as knowledgeable about modelling. For example, from 2005, we find a publication with the title “Modelling statistical systems”. Go figure!

Another typical publication in this genre is one with the title “Classification of statistical metadata”. Who needs that?

Where are the solutions to real-life modelling problems, stupid!

I think you get the general drift of this. You can always visit the “Publications” page on his home page and see the list of publications from the 2000s yourself!

I will end this intellectual biography with a small contest. If you can find a publication by Bo that has contributed to the solving of a core metadata modelling issue, feel free to contact me!

Bo Sundgren – An intellectual biography – No. 7 Another missing link!

My Swedish source continues to turn up new publications by Professor Sundgren.

It would seem that his bibliography as published on his home page is incomplete.

Here is what he has found:

“RAM – A Framework for a Statistical Production System”, published by Statistics Sweden in 1979.

Another missing link?

Now, here is something interesting. Guess where it was first published!

By the Conference of European Statisticians, at their seminar about “Integrated Statistical Information Systems”.

Wow!

However, a quick look at this report suggests that Bo was busy trying to solve database management issues with his own infological model, issues that were later solved by commercial relational databases. Maybe that is why this ended up on the historical scrapheap?

On the other hand, the idea of general production modules, for aggregation, etc, can also be found here.

If anything, this just strengthens my critique. If all visions, goals and ideas, even concepts, were on the table already thirty years ago, why has this still not been solved?

The further back in time we can follow this, the more incriminating it gets. In this case, we have revealed an early involvement from the UN.

Bo Sundgren – An intellectual biography – No. 6 The missing link?

My posts about professor Sundgren have been relatively long. This one will be short.

I have asked what happened with Bo’s general theory of databases after his 1974 doctoral thesis.

I think I have found part of the answer! Check out the list of references in the following paper:

Sundgren, B. (2001a). The AlfaBetaGammaTau-model: A theory of multidimensional structures of statistics. MetaNet, Voorburg, the Netherlands.

By 2000, Microsoft had already committed itself in earnest to analytical software. Yet, the only references in this paper are to Sundgren himself. As if no one else in the whole world had written anything worth mentioning about dimensional modelling – except Sundgren himself.

However, the main point with the list of references in this paper is something else. Check out the second publication on the list!

Sundgren, Bo (1975): “Theory of Data Bases”, Mason/Charter Publishers, New York 1975, ISBN 0-88405-307-5.

Wow, his proposed theory of data bases apparently became a book, the year after his doctoral thesis. And it was published in the U.S.

I wonder how that book was received? And if it contained any references (to others)? And if parts of the contents were published in real academic journals?

If we keep digging we can find more interesting titbits. My Swedish source has just found another. In a report from Statistics Sweden, from 1992, Bo does use references and complains about himself having been early!

“The Entity-Relationship model is often ascribed to Chen (1976). However, it is a fact that similar conceptual models, originated by several European authors, had been presented and used for at least a decade before the appearance of Chen’s paper; for example, see Langefors (1966), Sundgren (1973, 1974), Durchholz and Richter (1974), Lindgreen (1974).” (see p. 4, “Some properties of statistical information”, 1992:16).

So, Chen published a paper? And Bo, he published – yes, what? A book?

(What we do find at the end of this report is a reference to a contribution by Sundgren to an edited volume from 1974, about data bases. The topic in Sundgren’s contribution was the “Conceptual Foundation of the Infological Approach”).

Bo Sundgren – An intellectual biography – No. 5 A lost decade 1991-1999

So, all the cards were on the table already in 1974: both the ideas about standardizing the production of statistics and the realisation that models, standards, systems, etc, would be required. In the ARKSY project, “reuse” was the key concept.

Then what happened?

In 1991 Bo Sundgren and Bengt Rosén published a model for documentation for reuse of micro data (SCBDOK). Go figure!

Should that not have been solved a long time ago, as a natural part of the ARKSY project? Sixteen years later there is suddenly something in writing about how to document micro data for reuse! What happened during all those years?

Well, let bygones be bygones. Better late than never. Right? Not really. Did they solve the problem? No, they did not. SCBDOK is an abstract model that describes what a survey is. It avoids all the tough issues of exactly what, how and how much information is needed to really reuse data. In fact, the concept of reuse is not even explored on a practical level. It is therefore fair to claim that SCBDOK is not really about documentation for reuse. It is much more like the GSBPM than an analysis of what reuse is and what documentation it requires in practice.

What we see is an example of a structural flaw in the way that Bo approaches a problem. He abstracts and theorizes but does not care about implementation and realities on the ground. Much like his OPR(t) model. In 1974 he believed that a general theory of databases was more interesting than Codd’s relational model, a model that he described as limited in its practical use.

We also have to repeat our question. Why did Rosén and Sundgren not publish their SCBDOK model in an academic journal?

Well, in the decade to come Bo would produce more classics:

Sundgren, B. (1993). Guidelines on the Design and Implementation of Statistical Metainformation Systems. Report for the UN/ECE METIS Group, established within the programme of work of the Conference of European Statisticians.

Sundgren, B. (1995). Guidelines for the Modelling of Statistical Data and Metadata. Published as Guidelines from the United Nations Statistical Division, New York.

This seems to say and solve it all. Architecture! Systems! Data and metadata! Modelling! Right?

Not really. Look at the titles! “Guidelines”. Hm, “guidelines”? Who needs guidelines?

Where are the systems, stupid!

Where are the models, stupid!

And why was none of this published in an academic journal?

There may be a partial answer to the last question. Remember the old adage that, besides Ovid, he only quoted himself? I can recommend that you click on the links and then look through these documents for references.

Who do you think appears in the bulk of those references? Some 10 out of 16 are Sundgren himself. And the others? Well, they are also from the statistical metadata circuit. It even seems that several are from the same project or conference.

In these lists we also find a partial answer to what happened between 1974 and 1991. There are a few publications by Sundgren, and by Malmborg and Sundgren, from 1980, 1984 and 1989. Also, we find an article published in an academic journal in 1985! However, it is not by Sundgren, and it appeared in Statistics Sweden’s own journal (JOS).

Where are the articles in real academic journals, stupid!

We also find that the OPR(t) and alpha-beta-tau models have resurfaced. Why do these models appear already in 1974 and then remain in oblivion until the mid-1990s, when they are suddenly published by the UN? Without references? Surely, by then there was plenty of new research and theory about, for example, dimensional modelling that could be referenced?

Who has heard of a professor that publishes his results only via government institutions and conferences?  What kind of a professor is that? What has he achieved? Why is he afraid of real peer reviewing? Why has he not published anything in academic journals? (There it is, again, that obnoxious question).

There is a bigger picture here. As before, we can give Bo the benefit of the doubt. Maybe statistical computing is very different in the NSO/ISO sector, and he has truly been a pioneer in this area. Maybe that is why he does not have any real articles or references?

However, this does not fly. Already in the 1974 ARKSY report he claimed that the project was of interest outside of the NSO community. It is also clear that he has been intellectually lazy in referencing others and that he has avoided the necessary peer reviews of his OPR(t) and other models.

The very unfortunate result of this is that it has become customary within the statistical metadata circuit to not build bridges and validate their work in the real world. You quote Sundgren and Sundgren quotes himself.

However, worse than that is what this has done to problem solving. If NSO/ISO requirements are very different, then exploring and understanding those differences should be at the heart of all statistical metadata concerns. Is it? No, they seem happy with their Neuchatel, and their DDI, and their SDMX, and their GSBPM, without referencing, comparing or exploring what has been done in the private sector.

Bo Sundgren’s intellectual biography tells us something about how statistical metadata research developed into a closed shop.

It gets worse. There is in fact reason to suspect that Bo, sometime between 1974 and 1991, realized that either the goals were not possible to meet or he was not capable of finding a solution. So he salvaged what he could, and created a closed shop for himself where he could take on the role of pioneer and guru, and no one would hold him accountable.

And it worked.

Today, the same thing is being done, but now on a level and a scale that no one could have dreamed of in the 1990s. That includes the level and scale of charlatanism.

Footnote: The SCBDOK model as described in 1991 can be compared with the current GSBPM and especially the GSIM. It is still better than the latter. In the land of the blind, the one-eyed man is king!

SAB, DDI and SDMX

In the latest SAB newsletter (October 2011) Arofan Gregory has an item about the “business case” for the use of both DDI and SDMX.

Do you think that his text contains a business case?

Do you think that it contains a solution to the DDI/SDMX integration issue?

Of course not. That would be doing serious work. Attempts in that direction would also reveal that Mr Gregory does not have a clue about what he is doing.

Instead, the only thing we learn from the article is that NSOs may need both DDI and SDMX, because they have been developed for different and only partly overlapping purposes. To cap it off, we are informed that there is such a thing as the DDI/SDMX dialogue (but not what they have (not) achieved).

This is the wagging-the-dog syndrome all over again. There is DDI, which grew slowly and obliquely into the NSO metadata sphere. Then there is SDMX, which was a one-off consultant-cowboy initiative that nobody really understood.

So, you take parts from two different bodies and then try to put them together in a new body. Good luck with that! In fact, to continue the metaphor, they do not even know what a real body looks like, nor which parts of such a body are needed and which parts DDI and SDMX cover.

Here is a thought! What if the solution really is one model, which is like neither DDI nor SDMX? Then what? (I know this to be the case, because I know what the solution is).

Then there are organisational issues, discussed by Martin Vocsan in the editorial.

Sure, the organizational structures need focusing. There are currently more organizational bodies “working” with these issues than there are high-level objects in the GSIM model. (Here is the SAB page that shows an overview of the current organizational structure.) However, such a focus is hardly possible before there is a focus for content issues. What exactly are all these groups trying to do, and in what order?

The simple truth is that they just do not know.

Bo Sundgren – An intellectual biography – No. 4 ARKSY

So, Bo finished his doctoral thesis in 1973. In it he predicted a unified theory of databases and that the relational model would not be able to solve the bulk of database needs. Then what?

On his blog we find a number of internal reports from Statistics Sweden, from 1974. That is all until 1991!

(I rely on my Swedish informant, so there may be one or two errors of translation. If so, let me know and I can correct them!)

The key acronym for these activities in 1974 seems to have been “ARKSY”. What on earth was that? And why was that the only thing Bo produced for eighteen years?

(At least the only thing he produced that was R & D and that can now be published on his blog. If you know of any other R & D publication by Bo between 1975 and 1990, please send me an e-mail!)

So, ARKSY was an acronym for “archive statistical system”. The purpose of this project was to increase reuse and recombination of statistical data within Statistics Sweden. In its graphical form, the vision was the classical attempt to transcend the stove-pipe organisation. An “input data warehouse”, if you will.

The background for ARKSY was described as changed external circumstances, and it was envisioned that ARKSY would have a general influence on the way that statistics was produced. Have we heard this before? Or, better, how many times have we heard the same thing since then?

It gets better! One of the first steps was to propose a “frame” as a basis for a more strict theory of the production of statistics. Wow! Does that not sound a bit like the GSBPM?

In the description of chapter seven in the report there was further mention of “conceptual systems”, “frame models” and “standards” that were of importance to the realisation of ARKSY. Does this sound familiar?

As for concrete systems, the vision was that the project would result in a variable and file catalogue, a database management system, and a record (or file) linkage system. Go figure!

But what about the surveys and standardization? The hard-core content basis of standardization!

Do you remember my posts about statistics as art or science? I mentioned that the case for standardization must be made by being specific about individual surveys. It must also be specific regarding different aspects of a survey. I specifically mentioned the statistical unit.

In 1974, the ARKSY project provided a full analytical model for this! Wow! There is very little new under the sun.

This analytical model included: (1) statistical units, (2) populations, (3) samples, (4) variables (including classifications), (5) time and (6) (physical?) access to data!
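
Purely as an illustration of how concrete that 1974 scheme already was, here is a minimal sketch, in Python, of a survey description keyed by those six dimensions. The field names are mine, not the report’s, and this is of course not ARKSY code, just my own reading of the list above:

```python
from dataclasses import dataclass

@dataclass
class SurveyDescription:
    """One record per survey, keyed by the six ARKSY-style dimensions.

    Illustrative only: the field names are my own, not the 1974 report's.
    """
    statistical_unit: str   # e.g. "person", "enterprise"
    population: str         # the population the survey covers
    sample: str             # census, stratified sample, register extract, ...
    variables: tuple        # the variables, including their classifications
    reference_time: str     # reference period or point in time
    access: str             # how the data can (physically) be accessed
```

Two surveys can only be reused or recombined painlessly when enough of these six fields actually coincide.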

I wonder when this comprehensive analytical scheme was lost to the international metadata circuit? The exact same rhetoric about standardization can now be found in international fora, via HLG-BAS. Yet, one searches in vain for an analytical model of this sort.

On the other hand, if the Swedes have been thinking about and trying to do this since 1974, what are the chances that the international community will be successful in the 2010s?

Clearly, something is wrong here. I think that it is easy to see that with as many as six parameters, the probability that each survey will be unique enough to make reuse difficult is very high. If it is not the definition of the object type, it is the sample, or the time, etc.
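
A back-of-the-envelope illustration of that point, with cardinalities I have simply invented for the sake of the argument (none of these numbers come from the report):

```python
# Invented cardinalities for the six dimensions, just to show how quickly
# the number of distinct combinations grows, and hence how easily each
# survey ends up unique in at least one respect.
unit_types, populations, samples = 5, 10, 3
variable_sets, reference_times, access_modes = 50, 4, 2

combinations = (unit_types * populations * samples
                * variable_sets * reference_times * access_modes)
print(combinations)  # 60000 distinct combinations
```

Even with these modest numbers there are tens of thousands of possible survey profiles, which is exactly why generic “reuse” is so much easier to proclaim than to deliver.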

Then the report gets slipshod. It claims that there is some reuse and recombination at Statistics Sweden, but that the potential is much higher than current practice. Really!? How did they know that for a fact?

Maybe they did not know, but later found out that they were wrong? The report actually suggested that a next step was for the subject matter departments to suggest “new archival statistical products”. What happened with that?

What are the conclusions of this?

– That the same abstract visions have been around for over three decades, without achieving their goals.

– Initial claims are made without being substantiated by concrete examples.

– If it is true that there was an attempt to find concrete examples at Statistics Sweden, already in the 1970s, then we would like to know why there never was an ARKSY system.

Of course, what this suggests is that there was no real empirical basis for these visions. Worse than that, it also suggests that Bo realised this already thirty years ago.

My suggestion is that everyone who has contact with Bo should ask him what happened with the ARKSY project and the attempts to standardize and define “new archive statistical products” (and a frame, and standards, and theory, and a variable and file catalogue, etc).

If he cannot give a straight answer, we must conclude that he is a fraud.