The dataset is curated from research-grade observations provided by the Global Biodiversity Information Facility (GBIF), including validated data from iNaturalist and Observation.org. Images in CrypticBio are annotated with rich metadata, including detailed taxonomic descriptions and observation context, enabling extensive filtering and analysis. Cryptic species groups for each species are derived organically from iNaturalist's record of historical misidentifications. Alongside scientific names, we include multicultural and multilingual vernacular names from the iNaturalist Taxonomy Archive to preserve ecological knowledge and increase cultural reach.
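
To make the idea of deriving cryptic groups from misidentification records concrete, here is a minimal sketch. It is not the CrypticBio pipeline: the function name `build_cryptic_groups`, the `(initial_id, corrected_id)` record format, the `min_count` threshold, and the example records are all hypothetical, and confusion is assumed to be symmetric.

```python
from collections import defaultdict


def build_cryptic_groups(misidentifications, min_count=1):
    """Map each species to the set of species it has been confused with.

    `misidentifications` is an iterable of (initial_id, corrected_id) pairs
    (a hypothetical record format). A pair only counts as a confusion once it
    has been observed at least `min_count` times, in either direction.
    """
    pair_counts = defaultdict(int)
    for initial_id, corrected_id in misidentifications:
        if initial_id == corrected_id:
            continue  # identification was never changed, so no confusion
        # Order the pair canonically so A->B and B->A count together.
        pair = tuple(sorted((initial_id, corrected_id)))
        pair_counts[pair] += 1

    groups = defaultdict(set)
    for (species_a, species_b), count in pair_counts.items():
        if count >= min_count:
            groups[species_a].add(species_b)
            groups[species_b].add(species_a)
    return dict(groups)


# Made-up records for illustration only.
records = [
    ("Bombus terrestris", "Bombus lucorum"),
    ("Bombus lucorum", "Bombus terrestris"),
    ("Bombus terrestris", "Bombus terrestris"),
    ("Apis mellifera", "Bombus terrestris"),
]
groups = build_cryptic_groups(records, min_count=2)
```

Raising `min_count` is one simple way to keep one-off identification mistakes from being treated as evidence of a cryptic group; the threshold that suits real data is an open choice.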

Spatiotemporal context is included as an additional modality that can be aligned with species image-text embeddings, as shown in TaxaBind.
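
As an illustration of how location and date could be turned into a model input, the sketch below uses a common sine/cosine encoding so that longitude and day-of-year wrap around smoothly. This is an assumption for illustration, not necessarily the encoding used by CrypticBio or TaxaBind; the function name `encode_spatiotemporal` and its argument layout are hypothetical.

```python
import calendar
import math
from datetime import date


def encode_spatiotemporal(lat, lon, observed_on):
    """Return a 6-dim vector: sin/cos of latitude, longitude, and day-of-year.

    Latitude and longitude are given in degrees; `observed_on` is a
    `datetime.date`. The layout is an illustrative assumption.
    """
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")

    # Scale coordinates to [-1, 1] before applying sin/cos.
    lat_n = lat / 90.0
    lon_n = lon / 180.0

    # Fraction of the year elapsed, in [0, 1), accounting for leap years.
    days_in_year = 366 if calendar.isleap(observed_on.year) else 365
    year_frac = (observed_on.timetuple().tm_yday - 1) / days_in_year

    return [
        math.sin(math.pi * lat_n), math.cos(math.pi * lat_n),
        math.sin(math.pi * lon_n), math.cos(math.pi * lon_n),
        math.sin(2 * math.pi * year_frac), math.cos(2 * math.pi * year_frac),
    ]


features = encode_spatiotemporal(0.0, 0.0, date(2024, 1, 1))
```

With this encoding, longitudes of 180 and -180 degrees map to (numerically) the same features, and late December sits next to early January, which a raw numeric date would not capture.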


Cryptic species have historically emerged as a consequence of biogeographic isolation (natural barriers, such as rivers, mountains, or deserts; deforestation; agricultural expansion; or man-made structures) which disrupted gene flow between populations and ultimately promoted allopatric divergence over evolutionary timescales, as shown below.

We hypothesize that the integration of spatiotemporal context will provide complementary cues beyond visual appearance alone and ultimately enhance the identification accuracy of cryptic species.