High Energy Physics Libraries Webzine
Issue 3 / March 2001
Let's face it: astronomy is largely a virtual science.
With the exception of experiments carried out in situ by solar-system spacecraft, our knowledge of the universe is derived entirely from photons reaching us from outer space. And because of the finite speed of light, we do not observe objects as they are, but as they were when the photons we are collecting actually left them.
What astronomers thus have in their data files is nothing other than a huge and complex virtual record of earlier stages, differentiated as a function of the distances in space and time of the various sources. The job of astronomers is therefore to work on that space-time mosaic of a virtual universe in order to work out what exactly the real universe is, and to understand the place and rôle of man in it.
Because of the huge amount of data accumulated, but also out of necessity for their extensive international collaborations, astronomers have pioneered, often in parallel with high-energy physicists, the development of distributed resources, electronic communications and networks coupled to advanced methodologies and technologies, frequently well before these came into common world-wide use.
But precisely because of this paradox (immense philosophical impact versus inability to interact with the objects observed), astronomers have to be especially careful not only about the way data are handled and archived for their own professional use and for the scientific memory of the future, but also about the way astronomy information in the broad sense is conveyed to society at large.
At a time when taxpayers (who are nowadays in practice the exclusive
source of support for astronomy research) seem to give priority to other
matters (such as the environment, health, security and unemployment) over
space investigations and cosmological perceptions [see Note 1], it is
certainly appropriate to keep in mind that a good image (which implies
efficient and consistent public relations policies) and top-quality education
through all possible media (including 'simple' but valuable web sites) remain
the best tools to ensure continuing public support for a science which, in
return, will be able to pursue as well as possible its noble mission of
building the cosmic context of mankind's evolution.
The astronomy information flow
The information flow in astronomy is much more complex than most books and courses on astronomical data describe, at least if one wishes to encompass the whole elaborate process from the sources (cosmic objects) to the end (mankind's knowledge).
That information flow diagram could have been made much more complicated, to the limit of legibility, with additional boxes and links, especially by slightly shifting the perspective (for instance, to usage instead of flow). Libraries, for example, could be linked to databases, to referees, to expert committees, to amateur astronomers, to the public at large, and so on.
Data centres have not been assigned a box of their own in that scheme, since their functions can be spread over several existing boxes. Even if this is not yet fully realized, the key rôle of data centres has been refocused over the past decade. While they now flourish as distributed resources thanks to the networking of the planet, the advent of these electronic networks may have sealed their destiny for ever as information transit posts (hubs) or, as some say a bit disrespectfully, astrogroceries. We shall come back to this later on.
It is outside the scope of this paper to offer a full treatise on astronomical information. Here, however, are a few pointers for newcomers to the astro-data world, again mainly oriented towards data collection, processing, archiving and distribution: Boroson et al. (1996) on modern observing modes; Egret & Albrecht (1995) on a wide selection of databases, archives, data centres and information systems; Kidger et al. (1999) on Internet resources; Murtagh & Heck (1987) on multivariate data analysis; Heck & Murtagh (1989) on knowledge-based systems; Heck & Murtagh (1993) on information retrieval; Starck et al. (1998) on image processing; Heck (1997a) on electronic publishing; and finally Jaschek (1989) on astronomical data in general, still largely valid.
Several series of specialized conferences also provide advanced material: the Astronomical Data Analysis Software and Systems (ADASS) conferences (see e.g. Mehringer et al. 1999); the Astronomy from Large Databases (ALD) conferences (see e.g. Heck & Murtagh 1992); the Data Analysis in Astronomy ('Erice') workshops (see e.g. Di Gesù et al. 1997); the Library and Information Services in Astronomy (LISA) conferences (see e.g. Grothkopf et al. 1998); and the Converging Computing Methodologies in Astronomy (CCMA) workshops (see e.g. Heck & Murtagh 1996 and Molina et al. 1997). Refer also to the numerous references quoted in those works, as well as to at least two historical colloquia (e.g. Jaschek & Wilkins 1977 and Jaschek & Heintz 1982). Schools have also occasionally been organized at national and international levels (among these, see e.g. Hauck & Sedmak 1985).
The list above is definitely not exhaustive. Because of the rapid evolution of information technologies and of their impact on the dynamics of the astronomy community, some aspects of even the most recent compilations and reviews may quickly become outdated. The basic principles, though, will largely remain unchanged.
The present paper mainly offers a few thoughts derived from experience
in all these fields. There is no intention whatsoever to lecture anyone:
this is just a modest exercise in experience-sharing by a largely independent
observer of, and sometimes actor/user in, the astronomical data and information
world during the last three decades, with some practice of society at large
with respect to both the impact of astronomy and communication techniques in
the broad sense.
Concepts and buzzwords
Concepts are sometimes poorly defined in data-related books and occasionally lead to misunderstandings. Without entering into so-called 'Jesuitical' discussions on some terms (such as data, information, knowledge, ...), it might be a good idea for scientists to spend a few lines stating, as precisely and as concisely as possible, their understanding of the main terms they use, especially in new fields where concept formation risks being somewhat anarchic at the beginning.
To complicate the picture, alongside scientific concepts, we also live in a world of buzzwords. These are useful when well introduced and justified. They summarize ideas and projects in an imaginative way and can be excellent vehicles to 'sell' them to decision makers and takers, to the community, and to society at large. Some of them might even make it into history.
Their semantic substance must, however, be representative of what they label and must not be a source of confusion or, in the worst cases, of deceit. Abuses of language should be avoided and concepts should be used appropriately. The sensationalism of most communication media nowadays should not be imitated in science, particularly the excesses now flourishing in our everyday language.
Some big projects announced on web pages consist of only a few links
put together in a few minutes, and in at least one case the designer of
such a project was commended and distinguished for it. The reader will
surely understand if no names are given here and if we go instead to the
heart of the message: content (and the work behind it) should always take
precedence over the attractiveness of labels and the hype around them.
Speaking of buzzwords, a new one has appeared recently in the literature: virtual observatory. While highly desirable and commendable, the proposed structures will be quite far from the classical meaning of an astronomical observatory, namely a facility devoted to the collection of new data. The label could thus be seriously misleading, all the more so since a fundamental feature of the actual universe will be disregarded: its omnipresent variability (and the more we observe it, the more variable it turns out to be).
For instance, the project known in the US as the National Virtual Observatory (NVO) [See e.g. http://astro.caltech.edu/nvoconf and Cheung et al. (1999).] is basically the aggregation of complementary multiwavelength surveys. Other projects currently under discussion put more emphasis on methodological ways of tackling the existing (and largely dormant) mass of data, not only in astronomy, but also in Earth and environmental sciences [See for instance http://newb6.u-strasbg.fr/~ccma/vo for a European project].
A related project with a less questionable label (only the 'instrument'
here is virtual) has been launched recently: AstroVirtel [See http://www.stecf.org/astrovirtel/],
which aims to make accessible the ESO/ST-ECF archive, currently containing
more than 7.0 terabytes of scientific data obtained with the ESA/NASA Hubble
Space Telescope (HST) and with several large ESO ground-based telescopes.
Among the many success stories of data centres, an exemplary one is certainly that of the Strasbourg astronomical Data Centre (CDS). Over almost three decades, it has grown from a file holder into an impressive information hub. It is recognized world-wide for the excellence of its work and its products, the kernel of which is the huge SIMBAD database of catalogue synonyms, giving access to all individual data from the integrated catalogues, complemented by an object-oriented bibliography linked to the database and by several additional services.
The above figure shows the various elements that make up the CDS hub today. It is beyond the scope of this paper to review in detail the history of CDS and its component parts, but interested readers can find more on the CDS web site [http://cdsweb.u-strasbg.fr/], in the CDS Information Bulletin series (now discontinued), and in the papers published as a special issue of Astronomy & Astrophysics Supplement Series (Vol. 143, No. 1, April 2000).
What we wish to stress more particularly here is that the success of CDS is certainly due to the clear-sightedness of its founders and to the extensive network of collaborations set up ab initio, but also to the right decisions taken at the right times by its successive managers, to the consistency of the policies followed and, last but not least, to its small but dedicated staff.
However, while most of the goals set by the founders have been reached (sometimes largely facilitated by technologies that arrived sooner than expected, such as the electronic networking of the planet), at least one of them has not really materialized: astrophysical research directly geared to the data centre. This is worth stressing since that ambition was explicitly written into the statutes.
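As a rough illustration of the kind of structure described above for SIMBAD (many catalogue designations resolving to a single object, which then carries its data and a bibliography organized by object), here is a toy sketch in Python. It is not SIMBAD's actual design or interface; the class names, the name normalization and the use of Sirius as an example are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AstroObject:
    """One physical object, with all its data gathered under a single entry."""
    main_id: str
    measurements: dict = field(default_factory=dict)  # e.g. {"V": -1.46}
    bibliography: list = field(default_factory=list)  # references about this object


class SynonymIndex:
    """Hypothetical toy index: many catalogue designations -> one object."""

    def __init__(self):
        self._objects = {}  # main identifier -> AstroObject
        self._aliases = {}  # normalized designation -> main identifier

    @staticmethod
    def _normalize(name):
        # Ignore case and repeated whitespace, so "hd  48915" matches "HD 48915".
        return " ".join(name.upper().split())

    def add_object(self, main_id, *synonyms):
        """Register an object under its main identifier and all its synonyms."""
        obj = self._objects.setdefault(main_id, AstroObject(main_id))
        for name in (main_id, *synonyms):
            self._aliases[self._normalize(name)] = main_id
        return obj

    def lookup(self, name):
        """Resolve any known designation to the single underlying object."""
        main_id = self._aliases.get(self._normalize(name))
        return self._objects.get(main_id)  # None if the name is unknown
```

Routing every synonym to one canonical entry is what lets data from different catalogues accumulate in one place; a real service also has to settle conflicting or ambiguous identifications, which this toy ignores.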
Because of today's efficient connectivity, any astronomer able to connect to the Internet nowadays has immediate access to all distributed astronomy resources from his/her office (or home). It makes absolutely no difference today whether users carry out their research geographically close to or far from the data repositories. It should however be kept in mind that we are not (yet) all equal around the world in terms of Internet access: large gaps still exist, as shown by the following illustrations reproduced from Heck (2000).
Figure 2 shows the world geographical distribution of astronomy-related organizations (all categories) with an Internet presence (e-mail and/or web pages), while Fig. 3 and Fig. 4 display respectively the West European and North American subsets.
Methodological lessons learned
The world works in small circles, and we can observe time and again how each circle tends to become its own finite universe, ignoring the overall purpose of the whole machinery. Even within a scientifically narrow field such as astronomy (compared with, e.g., physics or chemistry), it is often difficult for individuals to step back far enough to put their own work in the appropriate perspective and not to lose sight of the fact that the ultimate aim, conditioning all steps of the information flow, is science and the progress of knowledge.
How many times have we had to remind software people that their packages should be designed as tools to be used by 'plain' scientists and not as challenges for computer freaks? Database people sometimes tend to gather data for its own sake, with only secondary interest in the science that can be extracted from those compilations. Examples could be multiplied.
The success of the database-related (e.g. ALD) and software-related (e.g. ADASS) conferences is the best indication that these fields are lively but, as astronomers, we would certainly like to see more pointers towards significant scientific advances at such gatherings, even if these are not their main purpose. In line with the comments made by Benvenuti (1988) at the ALD-I conference, one has to recognize that such events show too many technicalities and not enough astrophysics: too many data collections per se, too many prototypes, too many demonstrations of feasibility without new results, too many textbook examples and illustrations; in other words, too many doors opened that were already open.
Take for instance Kohonen self-organizing maps, which are rather fashionable these days. One of them appears on a CDS web page [http://vizier.u-strasbg.fr/cgi-bin/VizieR]. Personally I love to see such illustrations, but should we not also ask ourselves whether they bring anything to the progress of astrophysics?
Those beautiful graphical summaries are excellent for showing students or beginners how fields or research works lie close to each other in multidimensional spaces, but should we not be ashamed of a scientist who did not know who else is working in the same area, or what the astrophysical relationships between the objects he/she is studying are, i.e. his/her working context? In other words, a scientist who knows his/her field would not learn anything from such illustrations.
Perhaps, after all, such methodologies find their real usefulness in other disciplines where the underlying physics is less developed than in our field. And there are instances too where they can lead to weird conclusions.
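For readers who have not met them: a self-organizing map is a grid of weight vectors trained so that neighbouring grid nodes come to represent neighbouring regions of the data space, which is what makes the resulting two-dimensional pictures read like maps of 'proximities'. The sketch below is a generic, minimal version in pure Python, not the code behind the CDS page mentioned above; the grid size, the linear decay of the learning rate and the Gaussian neighbourhood are illustrative assumptions.

```python
import math
import random


def best_matching_unit(grid, x):
    """Return (row, col) of the node whose weight vector is closest to x."""
    best, best_d = (0, 0), float("inf")
    for i, row in enumerate(grid):
        for j, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(w, x))  # squared Euclidean distance
            if d < best_d:
                best, best_d = (i, j), d
    return best


def train_som(data, rows, cols, epochs=100, lr0=0.5, seed=0):
    """Train a rows x cols Kohonen map on a list of equal-length numeric vectors."""
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2.0  # initial neighbourhood radius, in grid units
    # Initialize every node with a copy of a randomly chosen sample.
    grid = [[list(rng.choice(data)) for _ in range(cols)] for _ in range(rows)]
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)                # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 1e-3   # neighbourhood shrinks as well
            bi, bj = best_matching_unit(grid, x)
            for i in range(rows):
                for j in range(cols):
                    d2 = (i - bi) ** 2 + (j - bj) ** 2  # distance on the map, not in data space
                    h = math.exp(-d2 / (2.0 * sigma ** 2))
                    w = grid[i][j]
                    for k in range(len(w)):
                        w[k] += lr * h * (x[k] - w[k])  # pull node towards the sample
            step += 1
    return grid
```

Each update only moves nodes towards samples already in the input, so the map rearranges what is in the data; it displays proximities without adding any astrophysics of its own.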
The message of all this is that each development of new methodologies, software
packages, impressive web sites, and so on, should be pursued not as an end
per se, but according to what it really and efficiently brings to the progress
of astrophysics and knowledge.
The real slot of electronic publishing
Make no mistake: no astronomy journal practises electronic publishing in the full sense of the medium. What they do is put online digitized files that still basically follow the linear structure of a paper document, since the first thing most users of such 'electronic' journals want is ... to print the 'papers'! What we still largely have is the equivalent of a TV news bulletin zooming in on newspapers (on paper) or showing people reading magazines.
A fully electronic resource would make use of all the degrees of freedom of the medium, such as hypertextual structure, colour, sound and motion, applets and whatever might come next. Admittedly, some electronic-specific features have been introduced in converting the journals to electronic form, such as advanced integration with databases and forward referencing, not to mention the possibility of shipping papers back and forth quickly between authors, editors and referees, as well as quick download from the web sites.
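Forward referencing (linking a paper to the later papers that cite it) can be illustrated in a few lines: it amounts to inverting the citation graph built from the papers' reference lists. The following Python sketch uses hypothetical paper identifiers and is not how any particular journal or bibliographic service implements it.

```python
from collections import defaultdict


def forward_references(references):
    """Invert a citation graph.

    references: dict mapping each paper to the list of papers it cites
    (its backward references, as printed in its bibliography).
    Returns a dict mapping each cited paper to the sorted list of papers
    that cite it (its forward references, i.e. "cited by").
    """
    cited_by = defaultdict(set)  # a set silently absorbs duplicate citations
    for citing, cited_list in references.items():
        for cited in cited_list:
            cited_by[cited].add(citing)
    return {paper: sorted(citers) for paper, citers in cited_by.items()}
```

The inversion needs the reference lists of all later papers, which is why forward referencing only becomes practical once journals are integrated with bibliographic databases.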
The real difficulty in scientific electronic publishing is not making papers available electronically with whatever e-gimmicks and degrees of freedom. The true points to be solved are the following:
Scientific electronic publishing can probably expect natural progress from (at least) two areas: on the one hand, cellular telecommunications including advanced voice recognition and full web capabilities (already solved in principle) and, on the other hand, neurobiology, with the direct plugging of our own neural networks into the technological communications loop (perhaps not so far away). This would induce dramatic changes in the rôle of publishing, which already goes far beyond the mere transfer of knowledge in the dynamics of our communities (by being a key factor in recognition and evaluation procedures).
Grey literature has never really been identified as an important issue in astronomy, perhaps because of the small size of the community and the rather fast publishing procedures (compared to other disciplines).
However, and with all the consideration and appreciation due to preprint servers such as the LANL one [http://xxx.lanl.gov/; see e.g. Ginsparg 1996], one must recognize that the system is somewhat cumbersome and not very time-efficient: files often need to be compressed, possibly uuencoded, then ftp-ed and/or e-mailed before the paper is up and available.
Again, for all its value at the time it was set up, such a system could certainly be simplified nowadays by taking advantage of the web structure and by pointing to papers residing at the authors' sites. Maintenance would be lighter (especially when a paper is updated) and the validation procedures could remain very similar to the current ones. An abstract and a bibliographical reference could always be included in the main database together with the paper's URL.
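To make this proposal concrete, here is a minimal Python sketch of a central entry holding only a bibliographical reference, an abstract and a URL pointing to the author's site. The class names, the URL check and the keyword search are hypothetical choices for illustration, not the schema of any existing server.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class PreprintPointer:
    """Hypothetical central-database entry pointing to an author-hosted paper."""
    reference: str  # bibliographical reference (free text in this sketch)
    abstract: str
    url: str        # full text stays on the author's own web site

    def __post_init__(self):
        # Accept only absolute web URLs; validation of content stays a human task.
        parsed = urlparse(self.url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an absolute web URL: {self.url!r}")


class PointerRegistry:
    """Central index: a revised paper can replace the file at the same URL
    without any change to the entry stored here."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.reference] = entry

    def search(self, word):
        """Naive case-insensitive search in references and abstracts."""
        word = word.lower()
        return [e for e in self._entries.values()
                if word in e.abstract.lower() or word in e.reference.lower()]
```

Only a few hundred bytes per paper are kept centrally, while storage and updates of the full text are delegated to the authors' sites.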
As detailed in Grothkopf (2000), librarians are more and more involved
in the whole evolution of the astronomy-related information environment.
This increasing participation should be encouraged with adequate financial
and material means, not only for libraries but also for librarians themselves,
so that they can be optimally trained and organized.
Quality and automation
In our experience, these two fellows are not yet quite good friends, and this is said at the risk of being resented by colleagues who develop automatic procedures for information extraction.
To give a couple of examples (no names!), there have been attempts at automatic spectral classification, as well as bibliographical studies, that were ruined either because the initial samples were polluted by that unavoidable fringe of data which, even today, only the human brain can sort out, or because of some carelessness in the way the data were labelled in the original logs. We should also remember that astronomy does not quite work statistically (for instance, a star is expected to have a specific spectral type, not a given probability of belonging to one class or another).
Our own experience with artificial intelligence and knowledge-based systems showed that, with the technology and methods currently available, it would take far too much time to capture the corresponding human expertise and put it into a computer system. All the talk about interviewing people before they retire or die is very attractive in theory. In practice, if the expertise is complex, it might take another lifetime to record it. This does not mean we should not try to develop better ways to capture people's expertise, but we may have to wait, once again, for significant advances in neurobiology to progress decisively in that direction.
As a general rule, when dealing with data-related methodologies, the quality of results can at best reflect the quality of the input material (in other words, garbage in can only lead to garbage out). With the tools we have today, any increase in quality is still the result of time and human sweat.
Beware also of what is sometimes claimed about automated methodologies. There is no spontaneous generation of knowledge: no methodology will ever reveal knowledge that is not already somehow in the data.
With the massive amount of information available nowadays thanks to the networking of the planet, and in particular via the web, there are endless discussions on the virtues, properties and techniques of data mining. The problem today is not so much accessing information as identifying good-quality information, then information relevant to the matter under consideration, and finally information exactly on target.
Additionally, as Gell-Mann (1997) expressed it, with the digital age producing an "immense sea of data that threatens to drown humanity", people need to adapt how they think so that true knowledge can be distilled from the deluge. "We hear, in this dawn of the so-called information age, a great deal of talk about the explosion of information and new methods for its dissemination. It is important to realize, however, that most of what is disseminated is misinformation, badly organized information or irrelevant information. How can we establish a reward system such that many competing but skillful processors of information, acting as intermediaries, will arise to interpret for us this mass of unorganized, partially false material?" What is true for the web is also valid much more generally.
In our experience, validated information (be it refereed papers, homogenized and/or critically analyzed database elements, or other material) cannot as yet be secured automatically. Take for instance the two yellow-page services that we helped to set up: on the one hand the StarPages [http://vizier.u-strasbg.fr/starpages.html; Heck 1997b and the references therein] and on the other hand AstroWeb [http://cdsweb.u-strasbg.fr/astroweb.html; Jackson et al. 1994]. The two resources are compared factually at http://vizier.u-strasbg.fr/~heck/awsp.htm.
While AstroWeb relies almost exclusively on automated procedures that register URLs together with brief descriptions and verify that the corresponding links remain alive, the StarPages rely on painstaking daily manual maintenance and various postal procedures for updating, authenticating and validating the information published. The quality of AstroWeb's entries is heterogeneous, sometimes questionable and definitely non-exhaustive. Little, if anything, is checked beyond the aliveness of the links [see Note 2]. Its mirror sites are sometimes down or plagued by updating problems that may remain unnoticed for quite some time.
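To show concretely how little such automation verifies, here is a minimal sketch of a link-aliveness check of the kind described above. It is not AstroWeb's actual code; it uses only Python's standard urllib, and treating any HTTP status below 400 as 'alive' is an assumption of this sketch.

```python
import urllib.error
import urllib.request


def probe_url(url, timeout=10.0):
    """Return the HTTP status code for url, or None if it cannot be reached."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError, ValueError):
        return None      # DNS failure, refused connection, timeout, malformed URL


def check_links(urls, probe=probe_url):
    """Split URLs into alive and dead ones, judged only by the HTTP status.

    Nothing here checks whether the page still matches its registered
    description, or whether its content is any good.
    """
    alive, dead = [], []
    for url in urls:
        status = probe(url)
        (alive if status is not None and status < 400 else dead).append(url)
    return alive, dead
```

The `probe` parameter lets the check run against a stub instead of the network. Some servers also refuse HEAD requests, so a production checker would typically fall back to GET before declaring a link dead.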
Even in terms of quantity, automation does not perform better: the
StarPages currently include (amongst much more comprehensive information)
more than 11,000 URLs, while the URL-only-oriented AstroWeb does not even
reach 3,000.
Need to look to the future
Believe it or not, not so long ago space agencies had no post-mission plans for the data collected by their spacecraft. Their commitment was to select the best ones, to supervise their manufacturing, to launch them and, in the best cases, to operate them.
The scientific community, or rather the principal investigators (PIs), were largely responsible for what happened to the data gathered by the various experiments. In that respect, the policy was essentially a transposition of the practice at ground-based observatories: once the observing runs were completed, astronomers went home with 'their' data. But how many plates ended up in drawers and closets, never to be looked at? How many tapes never saw a drive again?
Fortunately, things changed with the cost of the missions, with the observing pressure on instruments (the ratio of requested to available time) and with general pressure from the community, in parallel with the digitization of most data, which made them easy to duplicate.
Now no space experiment worthy of the name is launched without ad hoc provisions for archiving its data and disseminating them to the community at large, possibly after a proprietary period reserved for the PIs. One can appreciate how complex such tasks have become by referring for instance to Hanisch (2000) and to Cheung & Leisawitz (2000).
On a more general level, Thomas J. Watson declared when opening a new IBM laboratory in 1932: "There is no business in the world that can hope to move forward if it does not keep abreast of the time, look into the future and study the probable demand of the future." (Berghel 1999).
Predicting is definitely a difficult game, but few would disagree with the need for sound and accurate technology forecasting in any organization that seeks to remain competitive. There is thus a need to anticipate appropriate information policies in projects.
As we have seen earlier, summarizing information, i.e. reducing
it to its most essential points, is a real issue nowadays (see e.g.
Endres-Niggemeyer 1998). As pointed out by Albrecht (2000), there is a
definite need to innovate in terms of knowledge processing and transfer.
Refer also in this respect to Böcker (1998) and to various special
issues of the Communications of the ACM.
Education and communication
No astronomer will question the need to increase astronomy teaching at all levels. The field has been well covered by Percy and by Norton et al. (2000). The difficulties encountered in some countries with lobbies such as creationist ones should not be underestimated, but they should also be handled carefully so that the actions taken do not backfire by providing propaganda in favour of such groups.
In the same vein, astronomers would be well advised to become more involved in organizations fighting pseudo-sciences, such as the Committee for the Scientific Investigation of Claims of the Paranormal (CSICOP) [http://www.csicop.org/] and its world-wide affiliates. Amateur astronomers can also be usefully involved in such actions.
Amateur astronomers are generally classified into two categories: active and armchair amateur astronomers. While the latter generally have a passive interest in astronomy (reading magazines, attending lectures, and so on), the former carry out some observing, often with their own instruments, and such activities can be useful to professional astronomy. Mattei & Waagen (2000) beautifully exemplify how a well-organized and hard-working organization can efficiently contribute to the gathering of data and thus to the expansion of cosmic knowledge.
Conversely, sharing knowledge with more passive amateur astronomers and with society at large has many facets. Professional astronomers should also be encouraged to share their work with the wider world more often. Beyond the world-wide audience of a magazine such as Sky & Telescope [http://www.skypub.com/], there are many national journals which deserve more attention from our community.
Public observatories and planetariums (on the latter, see Petersen & Petersen 2000) are other outlets where professional astronomers could be seen more often.
As pointed out earlier when speaking of electronic astronomy, there are still large parts of some continents where astronomy is almost non-existent (Heck 2000) and where help is dramatically needed. Refer for instance to Andersen (2000) for the various actions undertaken in this respect by the International Astronomical Union (IAU) [http://www.iau.org/].
As amply illustrated by Andersen (2000), by Maran et al. (2000), and by Madsen & West (2000), astronomers need to learn how to communicate properly, and it is true that this is generally not part of their education. Narrow-minded, often personally motivated initiatives have sometimes had disastrous results for the community, because outsiders generally believe that astronomers all speak with one voice.
The American Astronomical Society (AAS) [http://www.aas.org/] has understood this very well, not only with its dedicated news media service (refer to Maran et al. 2000), but also by opening a bureau in Washington, DC, years ago. The Society lobbies the US Congress directly, while also briefing its membership, through its newsletter and electronic announcements, so that concerted actions can be taken at appropriate times with adequate arguments.
Also in the US, an Astronomy and Astrophysics Survey Committee (AASC) [http://www.nas.edu/bpa/projects/astrosurvey/] surveys the field of space- and ground-based astronomy and astrophysics every decade, recommending priorities for the most important new initiatives. The next report (covering the decade 2000-2010) is expected to be published soon.
We are still waiting for such undertakings in Europe, even at the national level, where political lobbying is largely left to individual initiative, to short-sighted personal promotion and to political connections.
However fascinating it may be, the communication process needs careful planning: the formulation of a message (i.e. an information set), its conveyance, and its reception by targets who will each perceive it differently. In a scientific context, the point is not only to deal with 'true' information (i.e. authenticated, verified and validated), but also for each scientist to get the recognition he/she deserves among his/her peers, and for a scientific community to position itself adequately with respect to other disciplines and to society at large. And in astronomy, as already mentioned, we are not only 'selling' products (our research results) or ourselves, but also the fundamental understanding of mankind's position in the universe.
Innovation and assertive attitudes (in other words, creativity) towards
society at large should probably be put into practice more often, for instance
to counter the problematic practice of selling stars by offering cosmic
objects for adoption instead, while educating people adequately
(see e.g. Heck 1997c).
Conclusions and final comments
First of all, never forget: it all comes down to a few photons reaching us from outer space. Astronomy thus has to rely on the ingenuity of instrumentalists to conceive and design a whole range of observing tools that make optimal use of the latest technologies and the most sensitive detectors, in order to obtain the most relevant and most varied information for the progress of astronomical knowledge.
It is a truism to say that the last decades have seen dramatic changes in the way information in general is handled. Some of us surely remember how, not so long ago, we were still using logarithmic tables and mechanical typewriters, speaking to colleagues over noisy phone lines (sometimes hard to get and not uncommonly breaking down), and depending, to work and publish, on what we nowadays call 'snail mail'. Astronomy has been no stranger to that evolution, in parallel with instrumentation developments and the panchromatic integration of wavelength ranges. Radio astronomers and space scientists are no longer separate branches of the family.
It is also true that, over the last three decades, we went from the (almost) individualistic way of dealing with astronomical data to the current institutionalized handling of astronomical information with teams becoming larger and larger and involving specialists (astronomers or others) in instrumental technology, telecommunications, computing, image processing, knowledge extraction, electronic publishing, and so on.
Now we can also (in principle) cross-check the latest p/reprints on the astro-ph server or check out the ADS abstracts and papers [ADS = NASA Astrophysics Data System at http://adsabs.harvard.edu/] from the middle of nowhere through a PDA phone facility.
But have we gone through a revolution, as some have claimed (perhaps a bit too hastily), or merely through a natural evolution of technologies and methodologies? The few comments in this paper have pointed out a number of serious shortcomings and abuses of language compared with the actual situation. The real revolution may still be to come, with the direct connection of our brains to a huge knowledge base, somewhat foreseen in Gibson's (1986) Neuromancer.
What is certain, however, as technology leaders agreed at the ACM97 50th
anniversary conference in San Jose, CA, is that change is the only sure thing
about computing and communication technologies in the next decade. As always
in the past, astronomers will certainly be among the first eager users of
the latest technologies, whatever they may be, and we shall let future
generations answer the question above, as they will have the advantage of
looking at us with some historical perspective.
It is a real pleasure to acknowledge here the impact of numerous readings
and enlightening conversations with many colleagues involved in astrophysics,
communications, computing, information science, library management, communications
media, publishing, sociology of science, and so on.
André HECK has a 30-year international career in astronomy and space sciences, with interdisciplinary collaborations involving, beyond instrumental technologies and information sciences, psychology, biology, medicine and sociology. He also holds degrees in communication techniques and mass studies. A dozen years ago, he initiated and coordinated the astronomy community's reflection on electronic publishing.