Unofficial Bookmarks for STRATI 2026 Program v0.1.7
S14 July 2 · 12:00–12:15 · Room 776 (7F)

On-Line Open-Access Database of Stratigraphic Data and LLM-Enhanced Visualization System

S14 (title TBD)

Jeremy R. Young, Ellis R. Selznick, Neel Patel, Aaron Ault, James G. Ogg, Jiepeng Ye, Xiang Zongyuan, Juye We

Unraveling Earth history requires the assembly of estimated correlations among all types of stratigraphic data and of the global lithostratigraphic record. The legacy projects coordinated by the Geologic TimeScale Foundation have amassed a very large collection of time-calibrated stratigraphic data from many fields and regions. These were contributed by dozens of experts during their publication of the Geologic Time Scale series of syntheses and the associated visualization enabled by the TimeScale Creator program (TSC). The TSC program, which is freely downloadable and is now online (https://tsconline.timescalecreator.org/), allows a user to draw custom stratigraphic tables from this array of multi-disciplinary data. This is a very widely used resource, but the data underlying it has been less accessible and dependent on the curatorial efforts of a small group, albeit assisted by a very wide circle of collaborators. Crucially, the primary data was progressively assembled and interlinked as a set of very large Excel workbooks, one per geologic period. That spreadsheet structure was selected because it was very flexible and allowed complex relationships between datasets to be maintained, so that all the ages could be recalibrated as the underlying timescale age-model evolved. The scale of the accumulated data has now outgrown what can be conveniently maintained this way, and it is difficult to share this type of data structure. In addition, the visualization for the data existed as a stand-alone system that could not be interlinked with other external websites. Our two projects were to migrate all of the various databases into a cloud-based FAIR system and to enable user interaction with the visualizations via GeoGPT or other AI assistants. The first project, funded by the Deep-Time Digital Earth project, was to transfer the spreadsheet-based data into an online open-access MySQL database.
Doing this while maintaining the data interconnections and the ability to edit and update the data arrays required development of a new online application (arkL). The data is primarily stored in two large tables, one of events and one of intervals. The ages of intervals (stages, biozones, chrons, etc.) are defined by their bounding events, whilst the ages of most events (FAD/LADs, geochemical values, sea-levels, etc.) are defined by their position within an interval, or relative to another event. A small subset of events have externally defined ages, such as the astronomically-scaled Cenozoic geomagnetic polarity timescale, the CONOP-derived biozone scale for the Ordovician-Silurian, or radiometric dates, and all other ages are ultimately defined relative to these constraints. Changing an externally defined age automatically results in recalculation of all dependent ages. Translating the interlinked Excel data into the database and checking the data integrity have now been largely completed for the main datasets. The database currently holds data on 13,000 events and 10,000 intervals from 135 datasets covering chronostratigraphy, magnetostratigraphy, biostratigraphy, and sequence stratigraphy, as well as time-series of geochemical data. In total this allows plotting of some 400 different columns in TimeScale Creator. There is also a bibliography of 5,000 entries. We are now beginning to migrate the suites of regional and specialized thematic sets (e.g., Australian lithostratigraphy, the Cenozoic evolution-chart database for planktonic foraminifera, etc.). The current version of the migrated database is provisionally accessible at www.nannodata.org, where data can be plotted and downloaded. The next phase of the project will include development of APIs to allow querying by other systems and to meet the requirements of FAIR.
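The relationship between events and intervals described above can be illustrated with a minimal Python sketch. It is not the arkL implementation or the actual database schema: the class names, field names, and the linear-interpolation rule are all assumptions made for illustration, the ages are invented, and only the "position within an interval" case is covered (not events defined relative to another event). Ages are recomputed on demand by following dependencies back to events with externally defined ages.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass
class Event:
    """Hypothetical event record; field names are illustrative only."""
    name: str
    fixed_age: Optional[float] = None  # externally defined age (Ma), if any
    interval: Optional[str] = None     # name of the containing interval
    fraction: Optional[float] = None   # 0.0 = interval top, 1.0 = interval base


@dataclass
class Interval:
    """Hypothetical interval record, defined by its two bounding events."""
    name: str
    top_event: str   # younger bounding event
    base_event: str  # older bounding event


def resolve_age(name: str,
                events: Dict[str, Event],
                intervals: Dict[str, Interval],
                _stack: Optional[Set[str]] = None) -> float:
    """Return an event's age by following dependencies to fixed ages."""
    stack = _stack if _stack is not None else set()
    if name in stack:
        raise ValueError(f"circular dependency at {name}")
    ev = events[name]
    if ev.fixed_age is not None:
        return ev.fixed_age
    stack.add(name)
    iv = intervals[ev.interval]
    top = resolve_age(iv.top_event, events, intervals, stack)
    base = resolve_age(iv.base_event, events, intervals, stack)
    stack.discard(name)
    # Assumed rule: linear interpolation between the bounding ages.
    return top + ev.fraction * (base - top)


# Invented example data: one interval "A" bounded by two fixed-age events,
# and one dependent event placed 25% of the way down from the top.
events = {
    "top_A": Event("top_A", fixed_age=10.0),
    "base_A": Event("base_A", fixed_age=20.0),
    "FAD_x": Event("FAD_x", interval="A", fraction=0.25),
}
intervals = {"A": Interval("A", top_event="top_A", base_event="base_A")}

age_before = resolve_age("FAD_x", events, intervals)
events["base_A"].fixed_age = 30.0  # revise an externally defined age
age_after = resolve_age("FAD_x", events, intervals)
print(age_before, age_after)  # prints "12.5 15.0"
```

In this sketch the dependent age changes as soon as the fixed age it rests on is revised, because nothing derived is stored. A production system holding tens of thousands of records would more plausibly cache derived ages and propagate changes, which is a design choice the abstract does not specify.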
The second project, in collaboration with GeoGPT, is modifying the online visualization system and its databases to act as an AI agent that can be directly utilized by other applications, and to allow LLM-type user interfaces to generate and modify the graphic output. One goal is to enable GeoGPT to extract information from published stratigraphy-type figures and to relay and plot it against other chronostratigraphic and regional datasets that are stored within the TSC systems. As of mid-April 2026, a prototype is functional, and we hope to have a version for release on the GeoGPT website tool-page for the Strati-2026 conference. With the transition to an open-access system for the interlinked datasets and the ability to use it as an AI agent, we hope both that a wider range of stratigraphers will be able to benefit from the system and that further updates and development can be shared among a wider range of collaborators.

Phanerozoic · chronostratigraphy · database · AI · FAIR
Affiliations
  1. Dept of Earth Sciences, University College London, London, UK
  2. School of Electrical & Computer Engineering, Purdue University, Indiana, USA
  3. Key Lab of Deep-time Geography & Environment Reconstruction, Chengdu Univ. Tech., China
  4. Geologic TimeScale Foundation, Indiana, USA
  5. GeoGPT, Zhejiang Lab, Hangzhou, China