GeoCroissant Recipes

Authors

Croissant & GeoCroissant Working Group

croissant@mlcommons.org

croissant-geo@mlcommons.org

Published

January 20, 2026

Authors

Rajat Shinde : NASA Office of Data Science and Informatics, University of Alabama in Huntsville, US

Manil Maskey : NASA, US

Ag Stephens : STFC Centre for Environmental Data Analysis, UK

Harsh Shinde : Individual Researcher, India

Joseph Edgerton : University of Virginia, US

Tejasri N : IIT Hyderabad, India

Douglas Fils : San Diego Supercomputer Center, US

Edenna Chen : Massachusetts Institute of Technology, US

Claus Weiland : Senckenberg - Leibniz Institution for Biodiversity and Earth System Research, Germany

Pedram Ghamisi : Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Germany; Lancaster University, UK

Gerald Fenoy : GeoLabs, France

Yuhan Douglas Rao : National Oceanic and Atmospheric Administration, US

Omar Benjelloun : Google, US

Elena Simperl : King’s College London & Open Data Institute, UK

Overview

GeoCroissant extends the MLCommons Croissant metadata standard with geospatial concepts for GeoAI and Earth observation workflows. The notebooks in this collection show how to create, validate, and convert GeoCroissant metadata across geospatial data formats and catalog standards.

What is GeoCroissant?

GeoCroissant (http://mlcommons.org/croissant/geo/1.0) is a metadata extension that adds geospatial capabilities to the Croissant standard:

  • Spatial & Temporal Coverage: Geographic extent and time ranges
  • Coordinate Reference Systems: CRS definitions and transformations
  • Spatial Resolution: Ground sampling distance and pixel spacing
  • Band Configuration: Spectral band organization and metadata
  • Time-Series Support: Temporal cadence and ordering
  • Responsible GeoAI: Spatial bias and sampling strategy documentation

Notebook Categories

Introduction

Core concepts and fundamental implementation patterns for GeoCroissant metadata.

Catalog to GeoCroissant

Convert geospatial catalog metadata from STAC, NASA UMM-G, CEDA CMIP6, and Google Earth Engine to GeoCroissant.
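As a rough illustration of what such a conversion involves, the sketch below maps the `bbox` and datetime fields of a STAC Item onto the schema.org `spatialCoverage` and `temporalCoverage` properties. It is a minimal sketch, not the code used in the notebooks: the function name `stac_item_to_coverage` and the sample item are hypothetical, and whether a given GeoCroissant release expresses coverage through these schema.org properties is an assumption to check against the specification.

```python
def stac_item_to_coverage(item: dict) -> dict:
    """Map a STAC Item's bbox and datetime onto schema.org coverage fields.

    Hypothetical helper for illustration; not part of mlcroissant.
    """
    bbox = item["bbox"]
    if len(bbox) == 4:
        # 2D STAC bbox order: west, south, east, north
        west, south, east, north = bbox
    elif len(bbox) == 6:
        # 3D STAC bbox adds min/max elevation, ignored here
        west, south, _, east, north, _ = bbox
    else:
        raise ValueError("STAC bbox must have 4 or 6 numbers")

    props = item.get("properties", {})
    start = props.get("start_datetime") or props.get("datetime")
    end = props.get("end_datetime") or props.get("datetime")

    coverage = {
        "spatialCoverage": {
            "@type": "Place",
            # schema.org GeoShape box: lower corner then upper corner,
            # each written as "latitude longitude".
            "geo": {"@type": "GeoShape", "box": f"{south} {west} {north} {east}"},
        }
    }
    if start and end:
        # A single instant stays as-is; a range becomes an ISO 8601 interval.
        coverage["temporalCoverage"] = start if start == end else f"{start}/{end}"
    return coverage


# Hypothetical STAC Item fragment, for illustration only.
example_item = {
    "type": "Feature",
    "id": "example-item",
    "bbox": [-10.0, 35.0, 5.0, 45.0],
    "properties": {"datetime": "2024-06-01T00:00:00Z"},
}

print(stac_item_to_coverage(example_item))
```

Note the axis-order swap: STAC lists longitude first, while the schema.org box lists latitude first, which is an easy place for a conversion to go wrong.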

Data Format to GeoCroissant

Generate GeoCroissant metadata from cloud-optimized geospatial data formats including GeoParquet, HDF5, NetCDF, and Zarr.

GeoCroissant Extensions

Advanced features for time-series Earth observation data and integration with the OGC Training Data Markup Language for AI (TrainingDML-AI).

GeoCroissant to Standards

Export GeoCroissant metadata to STAC and GeoDCAT formats for interoperability.

ML Framework Integration

GeoCroissant datasets integrate with PyTorch, TensorFlow, and JAX using the mlcroissant Python library:

import mlcroissant as mlc

# Load the GeoCroissant JSON-LD file; mlcroissant parses and validates it.
dataset = mlc.Dataset(jsonld="geocroissant.json")

FAIR Principles

GeoCroissant supports machine-actionable FAIR data:

  • Findable: Versioned URIs and schema.org vocabulary
  • Accessible: Standardized API endpoints and persistent URLs
  • Interoperable: Stable vocabulary terms and format conversions
  • Reusable: Explicit conformance and responsible AI metadata

Conformance

GeoCroissant datasets declare conformance at the dataset level:

"dct:conformsTo": [
  "http://mlcommons.org/croissant/1.1",
  "http://mlcommons.org/croissant/geo/1.0"
]
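A quick way to confirm that a metadata file carries this declaration is to check for both URIs after parsing the JSON. The sketch below uses only the standard library; the helper name `missing_conformance` and the sample fragment are hypothetical, and it assumes the compact `dct:conformsTo` key shown above rather than an expanded IRI.

```python
import json

# URIs copied from the conformance declaration above.
REQUIRED_CONFORMANCE = (
    "http://mlcommons.org/croissant/1.1",
    "http://mlcommons.org/croissant/geo/1.0",
)


def missing_conformance(metadata: dict) -> list[str]:
    """Return the required URIs that the metadata does not declare."""
    declared = metadata.get("dct:conformsTo", [])
    # JSON-LD allows a single value where a list is expected.
    if isinstance(declared, str):
        declared = [declared]
    return [uri for uri in REQUIRED_CONFORMANCE if uri not in declared]


# Hypothetical metadata fragment, for illustration only.
example = json.loads("""
{
  "name": "example-dataset",
  "dct:conformsTo": ["http://mlcommons.org/croissant/1.1"]
}
""")

print(missing_conformance(example))
# prints ['http://mlcommons.org/croissant/geo/1.0']
```

An empty result means both the base Croissant and the GeoCroissant conformance URIs are present at the dataset level.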

Acknowledgements

  • MLCommons GeoCroissant Working Group
  • MLCommons Croissant Working Group
  • Open Geospatial Consortium (OGC) GeoAI Domain Working Group