macroscope

(Moved from Hatena Diary)

Scientists should make their programs and data open -- what does it mean?

This is an English version of my post in Japanese here on 2010-02-06.

There has long been an opinion that scientists who work with public funding should make their computer programs and data open to the public (or available for audit) to ensure the reproducibility of their studies. Such opinions have been heard more often since the incident that some call "Climategate" (my post in Japanese here on 2010-01-29).

These opinions sound superficially similar, but I find that some say things quite different from others, and I feel that some are reasonable and some are not.

Here I would like to raise a few logical points to think about the structure of the problem, without going into the details of actual facts. Though I know that the status of computer programs as intellectual property differs from that of data, I think that we can discuss them together in this context.

There are several pairs of notions that are different but, I am afraid, often confused.

The first is the difference between "encouraging" (or discouraging) and "requiring" (or prohibiting).

The second issue is the difference between "programs and data that are produced in a study" and "programs and data that are used in a study".

The third issue is the difference between "obligations agreed at the start of a study" and "obligations imposed after it has started (and perhaps ended)".

The fourth is the difference between "making programs and data available to the general public" and "making programs and data available to assigned auditors".

Now I will continue the discussion, combining these concepts somewhat freely.

I think that it is desirable to encourage scientists to make the programs and data produced during their studies openly available, especially when the studies are publicly funded and have public relevance.

And, insofar as the work is under contract, the funding agency may require that the programs and data produced in the study be openly available. (But the rule will not work if it is applied too strictly. We must take into account that meaningful programs and data sets are usually combinations of parts produced with different funding.)

The programs and data used in a study have many sources. Some are in the public domain, and some have clear open-source licenses. But some are intellectual property that should be protected. Some national governments consider large parts of climate data to be national property rather than public goods. Saying that scientists must make such materials open is absurd.

Making the programs and data available to official auditors is a different matter from making them open to the public. If fraud is suspected, inspection is needed. Even without suspicion, auditing research projects not only for their handling of money but also for their conduct of science may be a good idea (though it is not simple to decide who is eligible to be an auditor). Research contracts or employment contracts may include an obligation to be audited. The auditors are ethically bound as well: they must not leak legitimately closed information, and they should not use the occasion for their own interests.

Also note: even if the institution of audit is legally backed, it is backed only by the authority of one sovereign country. If the closed data are the property of another nation, making them available to the auditors by force may infringe that nation's rights. It may become a diplomatic issue.

A related issue is how traceable a scientist's work should be. On one hand, it is a good thing to have clear code and well-documented databases. On the other hand, researchers need trial and error. Sometimes we work too quickly to document things well. Probably we should re-do the trial with the "trace" switch on before we publish.
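To make the idea of a "trace" switch concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the function `run_trial`, its `scale` parameter, and the fields of the record are hypothetical and do not come from any real research code. The point is that the same calculation can be run quickly during exploration and then repeated with a record of its input and parameters before publication.

```python
import hashlib
import json


def run_trial(data, scale, trace=False):
    """Toy analysis step (hypothetical): scale each value and take the mean.

    With trace=True, also return a record of what went in and what came out.
    """
    result = sum(x * scale for x in data) / len(data)
    if not trace:
        return result, None
    record = {
        # Hash of the input, so a reader can check they hold the same data.
        "input_sha256": hashlib.sha256(
            json.dumps(data).encode("utf-8")
        ).hexdigest(),
        "parameters": {"scale": scale},
        "result": result,
    }
    return result, record


# Quick exploratory run: no record is kept.
quick, no_log = run_trial([1.0, 2.0, 3.0], scale=2.0)

# The same run repeated with tracing on, as one might do before publishing.
final, log = run_trial([1.0, 2.0, 3.0], scale=2.0, trace=True)
print(json.dumps(log, indent=2))
```

The record contains a hash of the input rather than the input itself, so it could be shown to others even when the data themselves must stay closed.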