BIBLindex
  • Future prospects

  • The main lines of a work in progress

    The main challenge that BiblIndex faces is the following: to find a balance between quantitative and qualitative approaches. On the one hand, an expanded corpus must be made accessible within a reasonable period of time: the greater the number of citations listed, the wider the research fields covered will be, and the more the index will enable relevant and innovative statistical analyses. On the other hand, we also have to guarantee the quality and homogeneity of the analyses by developing very fine and consistent tracking techniques. The first requirement is the large-scale integration of scriptural indexes of new works; the second is the fine and time-consuming analysis of biblical intertextuality through patient readings of the patristic texts. Data models and interfaces must remain compatible with these two methods. The dichotomy between quantitative and qualitative is therefore at the heart of each of our work streams, the development of which is done iteratively, in a continual back-and-forth between computer scientists and patristic researchers:

    • the enlargement of the corpus, by checking the references already collected and, as soon as possible, acquiring new data from specialists in the texts concerned;

    • the creation of a large community by developing a collaborative work site;

    • the automated identification of biblical intertextuality in texts;

    • the methodological reflection on data visualization;

    • the work directly carried out on the biblical and patristic texts, and their relationship with the existing numerical references.

  • A massive enlargement of the corpus

    There is no need to repeat what Steven Harmon has already written about the chronological and geographical incompleteness of Biblia Patristica 1. To improve this aspect as quickly as possible, we decided, rather than systematically checking archival data not published by the CADP as a prerequisite for their integration, to enter the data as it was, and warn the internet user that this is provisional data by displaying it in red. At the same time, of course, the meticulous verification goes on2: the references are checked one by one, and appear progressively in black on the website. So far, all recovered data has been prepared according to uniform guidelines3: verified or not, consistency is ensured.
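
    The provisional/verified distinction described above can be pictured as a simple status flag on each reference. The following is only a minimal sketch under stated assumptions: the `Reference` class, its field names, the helper functions and the sample records are all hypothetical and do not reflect the actual BiblIndex data model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Reference:
    """One biblical reference found in a patristic work (hypothetical model)."""
    work: str               # label of the patristic work (invented here)
    verse: str              # biblical verse cited
    verified: bool = False  # False = imported as-is, not yet checked


def display_colour(ref: Reference) -> str:
    """Provisional data is shown in red, verified data in black."""
    return "black" if ref.verified else "red"


def verified_share(refs: list[Reference]) -> float:
    """Fraction of references that have already been checked."""
    if not refs:
        return 0.0
    return sum(r.verified for r in refs) / len(refs)


# Invented sample records, for illustration only.
refs = [
    Reference("work-A", "Gn 1:1", verified=True),
    Reference("work-A", "Mt 5:3"),
    Reference("work-B", "Ps 22:1"),
    Reference("work-B", "Jn 1:1", verified=True),
]

for r in refs:
    print(r.work, r.verse, display_colour(r))
print("verified share:", verified_share(refs))
```

    Keeping the status on each record, rather than in separate tables, lets unverified data go online immediately while the one-by-one checking continues.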

    In addition to the records already available online, around 600,000 references were digitized between 2011 and 2017. They come from the 15 linear metres of CADP handwritten archives and cover some 3,000 works written between the 4th and 14th centuries, mainly in Greek but also in Latin and Syriac. Here is an initial glimpse of the corpus made up of the million pieces of data collected: the first three centuries have been fully processed; most of the 4th century is covered. For the 5th century, the exegetical works of Jerome, which were still missing, were integrated, and Cyril of Alexandria and Theodoret of Cyrus were fully treated, along with later authors such as Procopius of Gaza, Gregory the Great (6th century) and Maximus the Confessor (7th century). To these major authors are added numerous exegetical catenae, works by Pseudo-Chrysostom, liturgical texts, especially Byzantine texts, etc.4.

    Now that all of these CADP archives have been integrated – even if, as has been said, much verification work still needs to be done – BiblIndex is building a systematic program for processing the missing works, adopting Biblia Patristica’s goal of exhaustiveness, starting with the 4th and 5th centuries. The first project, which started in August 2017, concerns the works of Augustine, in great demand by internet users. BiblIndex-specific guidelines have been drafted5, largely compatible with those of CADP.

    A special effort is planned to integrate the huge field of Eastern Christianity, beginning with Syriac, hitherto unexplored for biblical quotations. The relationship of Syriac texts with the Bible is particularly interesting because of the linguistic proximity between Hebrew and Syriac. Listing their quotations will enable significant progress in identifying or reconstructing the origin of the versions: for the Old Testament, the Jewish targums; for the New Testament, on the one hand the Diatessaron, and on the other hand the Old Syriac (Curetonian or Sinaitic), or finally the Peshitta of the 4th century, which can be compared to the Latin Vulgate. The ten volumes of translation from Syriac in the Sources Chrétiennes series, with their biblical indexes, constitute a starting point, to be broadened, especially with Ephrem’s work published in the Corpus Scriptorum Christianorum Orientalium. The Syriac sources after the 5th century also play a very important role as evidence of Greek works, sometimes lost in their original language (e.g. Severus of Antioch and Theodore of Mopsuestia) 6.

    External resources currently being analysed will also be added, e.g. the quotations found in the works of Bernard of Clairvaux (around 35,000 references). A partnership is being considered with the Faculty of Theology of the Aristotle University of Thessaloniki7. Thanks to a large team that has worked for more than 30 years under the direction of professors S. Sakkos and P. Koutlemanis, the Faculty has developed a scriptural index of approximately 350,000 references, now available in digital form, covering the whole of Migne's Greek Patrology; its current manager, Prof. Athanasios Paparnakis, has agreed to make it accessible via BiblIndex8. Another partnership is planned with the PAVONe project (Platform of the Arabic Versions of the New Testament) of the University of Balamand (Lebanon), which lists not only all the Arabic manuscripts of the New Testament, but also the citations of the New Testament found in lectionaries and other Christian and Muslim literature of the first millennium. In addition, the scriptural indexes of the Sources Chrétiennes series not yet taken into account by the CADP will be added, starting with the recent volumes, whose indexes will only require a technical revision. In the longer term, links to other databases will be considered, opening up to other cultural and religious areas: Judaism9, Samaritan texts, and Islam.

    The architecture is fully modelled, and much of the data is ready for import. Making it available online as quickly as possible, and then being able to add new data easily, are priority objectives. Unfortunately, for years, insufficient funding has limited the technical means available to the project and slowed down IT development. Various searches for external funding are in progress.

  • The creation of a collaborative website and the automated retrieval of intertextuality

    Given the scope of the work to be done to achieve completeness and data homogeneity, the ten or so members of the Sources Chrétiennes team are clearly not enough. Two directions are being explored to speed up the process.

    Firstly, the establishment of a collaborative work platform, where each BiblIndex user who is a specialist in a text or a field could, in the course of his/her queries, help improve the data by suggesting corrections via a controlled validation system. Any researcher preparing a text edition could also, in the course of his/her work, help improve existing records or provide new data. The mock-ups of this platform are ready and await funding for implementation.

    Secondly, we aim to apply to patristic works techniques for the semi-automatic detection of intertextuality. These quotation search assistance tools are intended to intervene upstream of the work of patristic and biblical scholars, to provide the latter with a pre-marked version of the text to be analysed.

    In 2013, Samuel Gesche10 carried out postdoctoral research at the LIRIS lab. A precise and up-to-date survey of the field of lemmatization in ancient languages was first established: a great deal of work is indeed underway at various institutes. BiblIndex can rely on morphologically lemmatized versions of Greek and Latin biblical texts; the work on Syriac texts could be done in collaboration with the Eep Talstra Centre for Bible and Computer (VU Amsterdam). Since a detailed knowledge of the linguistic system of each ancient language is required to prepare the lemmatization tools11, the work was carried out in close collaboration between the computer scientists and the patristic researchers. A lemmatizer specific to patristic ancient Greek, still to be perfected, has been prepared. Moreover, based on a sample lemmatized corpus (Clement of Alexandria, Quis dives salvetur and the complete works of Philo of Alexandria on the patristic side; the lemmatized text of the Septuagint and Greek New Testament on the biblical side), a configurable quotation detection tool has been developed. It is currently only effective when the compared texts have at least one common lemma. Maria Moritz (Institute of Computer Science, University of Göttingen) resumed this work in 2016 on the same test corpus, augmented by samples of texts by Bernard of Clairvaux, which resulted in a publication the same year12. The use of the TRACER software, developed by Marco Büchler in the e-TRAP (electronic Text Reuse Acquisition Project) research group in Göttingen, also makes it possible to identify a number of paraphrastic allusions. Much training data will still be needed before the software performs well enough on patristic texts. We also plan to work on semantic fields, by creating vast multilingual dictionaries of synonyms.

    The International Workshop on Computer Aided Processing of Intertextuality in Ancient Languages, organized in Lyon in 2014 as the concluding conference of the ANR BiblIndex project, clearly showed how promising this research field is: it brought together representatives of a large number of European projects working on intertextuality in ancient literature. The proceedings of this meeting were expanded into a special issue of the Journal on Data Mining and Digital Humanities (JDMDH), published in 2017.
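
    To make the "at least one common lemma" criterion mentioned above more concrete, here is a minimal sketch of lemma-overlap candidate detection. It is an illustration under stated assumptions, not the tool developed at LIRIS nor the TRACER software: the `Passage` class, the `candidate_quotations` function, the `min_shared` parameter and the transliterated toy lemmas are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    """A lemmatized text unit (hypothetical structure)."""
    ref: str                # identifier of the verse or paragraph
    lemmas: frozenset[str]  # set of lemmas occurring in the unit


def candidate_quotations(patristic, biblical, min_shared=1):
    """Pair every patristic unit with every biblical unit sharing lemmas.

    Returns (patristic_ref, biblical_ref, shared_lemmas) tuples, largest
    overlap first. Pairs with no common lemma are never returned, which is
    why pure paraphrase escapes this kind of detection.
    """
    hits = []
    for p in patristic:
        for b in biblical:
            shared = p.lemmas & b.lemmas
            if len(shared) >= min_shared:
                hits.append((p.ref, b.ref, shared))
    hits.sort(key=lambda hit: len(hit[2]), reverse=True)
    return hits


# Toy data with invented identifiers and transliterated lemmas.
biblical = [
    Passage("bible-A", frozenset({"kamelos", "rhaphis", "plousios"})),
    Passage("bible-B", frozenset({"agape"})),
]
patristic = [
    Passage("pat-1", frozenset({"plousios", "kamelos", "soteria"})),
]

for p_ref, b_ref, shared in candidate_quotations(patristic, biblical):
    print(p_ref, "->", b_ref, sorted(shared))
```

    Raising `min_shared` trades recall for precision: a threshold of 1 flags many accidental matches on common words, while a higher threshold misses short allusions. The candidates are only a pre-marking for scholars to confirm or reject.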

  • Visualization of the results

    Another project concerns the visualization of the quotations identified, both upstream in the interfaces used by analysts – a great deal of work has already been carried out by LIRIS for the parallel scrolling visualization of biblical and patristic texts – and downstream in those consulted by internet users – basic query forms, results forms that must allow multiple sorting, etc13. Very precise mock-ups have been produced for all of these interfaces, which are also waiting for funding to be implemented.

    More specifically, the Laboratoire d'Informatique de Grenoble (LIG) prepared models and a prototype of multidimensional interfaces14 to allow visualizations of chronological-geographical requests. It was very challenging to account for the quality (uncertainty, incompleteness or imprecision) and density of the information by means of an adapted graphic and cartographic semiology, while taking into account the diversity of end-user profiles. Ultimately, it will be possible to select sets of scriptural quotations that form thematic constellations, by geographical area and for a given period. The selection of zones or periods will be made by means of visual queries (clicking on a map or on a timeline using a cursor). These functionalities will bring out processes of distribution or dissemination of biblical texts that are difficult to perceive through a simple textual interface: which canon, which text, was received, at a certain time, in Antioch, in North Africa, etc.?
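
    Behind a click on the map and a cursor on the timeline, a visual query of this kind amounts to filtering quotations by place and by date interval. The sketch below shows one possible way to do so while handling incomplete records explicitly. It is an assumption-laden illustration, not the GenGHIS-based prototype: the `Citation` class, the `select` function, its parameters and the sample records are all invented.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """A scriptural quotation with approximate place and date (hypothetical)."""
    verse: str
    work: str
    place: str | None      # None when the place of writing is unknown
    year_from: int | None  # earliest plausible date, None when unknown
    year_to: int | None    # latest plausible date, None when unknown


def select(citations, place, start, end, include_uncertain=False):
    """Keep citations from `place` whose date range overlaps [start, end].

    Records with a missing place or date are dropped unless
    `include_uncertain` is set, so that incompleteness stays visible
    to the user instead of being silently resolved.
    """
    selected = []
    for c in citations:
        if c.place is None or c.year_from is None or c.year_to is None:
            if include_uncertain:
                selected.append(c)
            continue
        overlaps = c.year_from <= end and c.year_to >= start
        if c.place == place and overlaps:
            selected.append(c)
    return selected


# Invented sample records, for illustration only.
citations = [
    Citation("Gn 1:1", "work-A", "Antioch", 380, 390),
    Citation("Gn 1:1", "work-B", "North Africa", 395, 400),
    Citation("Mt 5:3", "work-C", "Antioch", 420, 430),
    Citation("Ps 22:1", "work-D", None, 380, 400),
]

for c in select(citations, "Antioch", 350, 400):
    print(c.work, c.verse)
```

    Storing a date range rather than a single year is one simple way to represent imprecise dating; a work counts as matching as soon as its range overlaps the selected period.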

    Interdisciplinarity is therefore at the heart of the BiblIndex project and requires a large number of specialists from various fields. In our HiSoMA laboratory, the focus is on the preparation of biblical and patristic texts for the subsequent phases of their visualizations.

  • 1.↑ S. R. Harmon, "A Note on the Critical Use of Instrumenta for the Retrieval of Patristic Biblical Exegesis", Journal of Early Christian Studies 11:1 (2003), 95-107, sp. 100 and 106.

    2.↑ To take just one example, there are 43,701 references found in John Chrysostom’s works in BiblIndex. This data was mainly recorded based on Migne’s Patrologia text. Undertaking rigorous checking of this data, referring to more modern critical texts when available, would require approximately 30,000 hours of work.

    3.↑ J. Allenbach (dir.), Étapes, moyens et méthode d’analyse pour la constitution du Fichier microphotographique des citations de l’Écriture chez les Pères, CADP, Strasbourg 1967.

    4.↑ An exhaustive list is available on the "Searchable corpus" page.

    5.↑ See L. Mellerin, "Methodological Issues in BiblIndex, An Online Index of Biblical Quotations in Early Christian Literature", in M. Vinzent, L. Mellerin, H.A.G. Houghton (eds), Biblical Quotations in Patristic Texts, Studia Patristica LIV, vol. 2 (Papers presented at the Sixteenth International Conference on Patristic Studies held in Oxford 2011), Leuven-Paris-Walpole, MA 2013, p. 11-32.

    6.↑ The article by L. Van Rompay, "Between the School and the Monk’s Cell: The Syriac Old Testament Commentary Tradition", in B. T. H. Romeny (ed.), The Peshitta: Its Use in Literature and Liturgy. Papers Read at the Third Peshitta Symposium, Brill, Leiden 2006, p. 27-51, gives a list of works for which processing is planned.

    7.↑ See A. Paparnakis, C. Domouchtsis, "Digital Greek Patristic Catena (DGPC). A brief Presentation", in M. Büchler, L. Mellerin (ed.), Computer-aided Processing of Intertextuality in Ancient Languages, Journal on Data Mining and Digital Humanities, special issue, 2017.

    8.↑ About 200,000 references do not duplicate the content of BiblIndex.

    9.↑ We have heard of a number of similar initiatives for the Jewish texts of the Second Temple period and the rabbinical period. At Brown University, Michael L. Satlow has dealt with the Babylonian Talmud. In Vienna, a project was carried out on the Second Temple period, but the page that hosted it no longer seems to exist. At Saint Louis University, Matthew Thiessen started a Biblia iudaica project in 2013, after collecting around 12,000 quotations from the book of Genesis in texts from the Second Temple and rabbinical eras. In addition, the database of rabbinical texts of Bar-Ilan University (Responsa Project) has identified and lemmatized a large part of the biblical quotations. The project "The Greek Bible in Byzantine Judaism" at King's College London aims to collect textual traces of the use of Greek biblical translations by Jews in the Byzantine era and to provide an online corpus.

    10.↑ See S. Gesche, E. Egyed-Zsigmond, S. Calabretto, "References and Citations in Ancient Greek Documents", 2017.

    11.↑ In particular, the connection of the forms or lemmas listed with the places and contexts of their occurrences; the removal of ambiguities during lemmatization.

    12.↑ M. Moritz, A. Wiederhold, B. Pavlek, Y. Bizzoni, M. Büchler, "Non-Literal Text Reuse in Historical Texts: An Approach to Identify Reuse Transformations and its Application to Bible Reuse", in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, Texas, November 1-5, 2016, p. 1849–1859.

    13.↑ The site's search form has been completely redesigned: the corpus may be entered through a chosen Bible and verse numbering; the result form allows users to see the biblical text corresponding to the request and to perform various operations on the data obtained, which the current form unfortunately does not allow.

    14.↑ The design and production of this interface are based on the GenGHIS environment, generator of spatio-temporal data visualization applications, initially developed to report on historical information dedicated to natural hazards. You can formulate visual requests and consult the results through several windows, all interconnected and synchronized: a cartographic window is dedicated to the spatial part, a timeline represents the time dimension, and an information (or attribute) window displays details of each entity contained in the information system.

v2.0-beta Project led by Sources Chrétiennes - Copyright © 2005-2023 BIBLINDEX - CNRS