BIFOLD: In Search of Europe’s Scientific Identity
Modern algorithms open up new possibilities for historians
In the past, scholars used to pore over dusty tomes. Today Dr. Matteo Valleriani, group leader at the Max Planck Institute for the History of Science as well as honorary professor at TU Berlin and fellow at the Berlin Institute for the Foundations of Learning and Data (BIFOLD), uses algorithms to group and analyze digitized data from historical works. The term used to describe this process is computational history. One of the goals of Valleriani’s research is to unlock the mechanisms involved in the homogenization of cosmological knowledge in the context of studies in the history of science.
The project is co-financed by BIFOLD and researches the evolutionary path of the European scientific system as well as the establishment of a common scientific identity in Europe between the 13th and 17th centuries. Dr. Valleriani is working with fellow researchers from the Max Planck Institute for the Physics of Complex Systems to develop and implement empirical, multilayer networks to enable the analysis of huge quantities of data.
In Paris in the first half of the 13th century, Johannes de Sacrobosco compiled an elementary text on geocentric cosmology entitled Tractatus de sphaera. This manuscript is a simple, late medieval description of the geocentric cosmos based on a synthesis of Aristotelian and Ptolemaic worldviews.
“This compilation of the knowledge of its time is the result of an emerging intellectual interest in Europe. In the 13th century, a need arose for a knowledge of astronomy and cosmology on a qualitative and descriptive basis – parallel to and driven by the emergence of a network of new universities,” explains Valleriani. Over the following decades, the Tractatus de sphaera was commented on, extended, and revised many times, but continued to be a mandatory text at all European universities until the 17th century.
Digitized copies of 359 printed textbooks featuring modified forms of the Tractatus de sphaera from the period 1472 until 1650 are now available to researchers. During this period of about 180 years, some 30 new universities were founded in Europe. The universal language of scholars at that time was Latin, which contributed significantly to the high mobility of knowledge even in this period. “An introductory course in astronomy was mandatory for students in Europe at that time,” explains Valleriani. “As a committed European, I am mainly interested in how this led to the emergence of a shared scientific knowledge in Europe.”
Taken together, these 359 books contain some 74,000 pages – a quantity of text and images that it is not possible for any individual person to examine and analyze. Working with machine learning experts from BIFOLD, the research team first had to clean, sort, and standardize this colossal data corpus drawn from a wide range of digital sources to make it accessible for algorithms. The first step was to sort the data into texts, images, and tables. The texts were then broken down into recurring textual parts and organized according to a specific semantic taxonomy reflecting early modern modes of production of scientific knowledge. Each of the more than 20,000 scientific illustrations had to be linked to the extensive metadata of the editions and their textual parts. In addition, more than 11,000 tables were identified in the Sphaera corpus. “To analyze the tables, we developed an algorithm to divide them into several groups with similar characteristics. This allows us to now use further analyses to compare these groups with each other,” explains Valleriani. This process may sound simple, but in fact involves countless technical difficulties: “Developing suitable algorithms is made more difficult by four error sources. The books from this period contain many printer errors. This and the fact that the conditions of the books vary greatly makes them at times hard to digitize. Then there is the problem of the differing quality of the electronic copies. We also have to remember that at that time every printer used their own typeface, meaning that our algorithms have to be effectively trained for each printer to be able to even recognize the data.”
In order to track the transformation process of the original text in the 359 books dating from this 180-year period and formalize this process of knowledge, the researchers need to understand precisely how knowledge changed, ultimately becoming more and more homogenous. “To achieve an understanding based upon data requires an intelligent synthesis of machine learning and the working practices of historians. The algorithms which we will now publish are the first capable of analyzing such data. We are also looking forward to develop further algorithms as part of our continuing cooperation with BIFOLD,” Valleriani explains.
Building and Interpreting Deep Similarity Models. IEEE Transactions on Pattern Analysis and Machine Intelligence. https://doi.org/10.1109/TPAMI.2020.3020738
Evolution and Transformation of Early Modern Cosmological Knowledge: A Network Study. Scientific Reports – Nature. https://doi.org/10.1038/s41598-020-76916-3
Further information is available from:
Dr. Matteo Valleriani
Honorary professor at the Institute of History and Philosophy of Science, Technology, and Literature
Tel.: 0049 (0)30 22 667 128