# BioKEA — long-form summary for LLM agents

> BioKEA is an AI company with a wet-lab moat. A 5,000+ sq ft Berkeley lab plus an AI pipeline from field sample to verifiable scientific claim, built for the commons.

This is the long-form variant of [llms.txt](https://biokea.ai/llms.txt). It is generated at build time from the same typed data modules that drive the rest of the site, so its contents are always in sync with what's rendered to humans.

- Canonical site: https://biokea.ai
- Short summary: https://biokea.ai/llms.txt
- Machine-readable JSON: https://biokea.ai/api/team.json, https://biokea.ai/api/projects.json, https://biokea.ai/api/capabilities.json
- Sitemap: https://biokea.ai/sitemap-index.xml

## Operations

- **The Large Data Collider (LDC)** — combined wet-lab + compute infrastructure in Berkeley (5,000+ sq ft, operational March 2026). Houses an Oxford Nanopore PromethION 2 and ~80 instruments across extraction, prep, quantification, and sequencing stages.
- **BioinfoOS** — the in-house software layer running on the LDC: AI-assisted extraction QC, taxonomy reconciliation, FAIR-package validation, draft narrative generation.
- **Agentis** — a forthcoming AI-first open-access scientific journal on the AT Protocol. Launching at agentis.science. Publication is structured as interactive StoryMaps tied to FAIR data packages, not dead PDFs.
- **Droplet** — an aquatic eDNA and metabarcoding specialist service line.

## Services (molecular sequencing as a service)

BioKEA offers molecular sequencing as a service out of the Berkeley LDC, aimed at environmental DNA (eDNA) and related customers: conservation nonprofits, state/federal agencies, academic labs, and environmental consultancies.

- **eDNA & metabarcoding** (primary target) — water, soil, and sediment samples through extraction → amplification → quantification → long-read or amplicon sequencing → taxonomic assignment, delivered as a FAIR-compliant data package.
- **DNA barcoding** — Sanger-replacement barcode surveys via amplicon sequencing on the ONT PromethION 2 (COI, 16S, 18S, ITS, rbcL, matK, and custom primers).
- **Long-read microbial genomics** — whole-genome, metagenomic, and hybrid assemblies on the ONT PromethION 2.
- **Specimen screening (arriving early summer 2026)** — high-throughput morphological imaging on DiversityScanner, pipelined into the LDC molecular workflow.
- **Pipeline integration** — bespoke sample-to-claim workflows for partner organizations that need more than a raw FASTQ drop.

### Service offerings (project-rate engagements; see /services to request a quote)

- **Study design**
  Best-practices consultation on sampling, replication, controls, and analysis.
- **Bioinformatic data analysis and interpretation**
  Read processing, taxonomy reconciliation, downstream statistics, written report.
- **DNA-based identification of organisms (barcoding)**
  Single-specimen barcode sequencing across COI, 16S, 18S, ITS, rbcL, matK, or custom primers.
- **qPCR / eDNA assay (single + multi-species)**
  Quantitative PCR for environmental DNA detection of one or more target species in water, soil, or sediment samples.
- **Custom eDNA / qPCR assay design + validation**
  New-species or cross-species assay design including primer/probe optimization and specificity validation.
- **Metabarcoding, metagenomics, and other custom analyses**
  Community-level amplicon surveys, shotgun metagenomics, hybrid assemblies, and bespoke pipeline integration.
- **Field collection assistance**
  Sampling protocol development and field collection support: in person for Bay Area projects, remote consultation elsewhere.

Full catalog, FAQ, and quote intake: https://biokea.ai/services

## Team

### Sean Jungbluth, PhD — CEO / CTO, Founder

Microbial genomicist building computational and AI tooling for environmental biology.
Occasional lecturer at Stanford on microbial genomics; previously studied deep-sea and subsurface microbial diversity across three submersible expeditions to ~2,650 m. Author of open-source pipelines and a contributor to FAIR data standards (MIxS, MIEM).

Credentials: Anthropic Claude Community Ambassador; Built with Claude Sonnet 4.5 Challenge — Winner (https://x.com/alexalbert__/status/1978220407716245581)

Knows about: Environmental DNA, Metabarcoding, Biodiversity informatics, Long-read sequencing, FAIR data, AT Protocol

Profiles: https://seanjungbluth.me/

### Michelle Jungbluth, PhD — CSO, Co-Founder

Marine and estuarine ecologist focused on zooplankton communities and food-web dynamics. Combines field sampling with DNA barcoding, eDNA, qPCR, and metabarcoding to track threatened estuarine fishes, including longfin smelt, and to identify indicator species in human-impacted wetlands. Lead investigator on BioKEA's San Francisco Bay metabarcoding baseline.

Knows about: Marine biology, Estuarine ecology, Zooplankton, Metabarcoding

Profiles: https://jungbluthlab.org/

### Austin Baker, PhD — Founding Research Scientist

Entomologist and biodiversity scientist managing the California Insect Barcoding Initiative — over 1 million specimens barcoded, with recent work estimating that at least one third of the state's insect biodiversity remains undiscovered. Previously a postdoctoral scholar at the Natural History Museum of Los Angeles County. PhD on parasitoid-wasp systematics.

Knows about: DNA barcoding, Entomology, California insect biodiversity, Conservation biology

### Sunit Jain, MS — Advisor

Bioinformatics scientist with 13+ years building agentic, multi-agent systems for microbial-community analysis. Author of Colloquip.
Knows about: Multi-agent AI, Scientific deliberation, Bioinformatics

Profiles: https://github.com/sunitj

### Greg Fedewa, PhD — Advisor

Bioinformatics scientist (Caltech, Centre for Pathogen Evolution) developing computational methods for immunological and antigenic data analysis.

Knows about: Bioinformatics, Immunology, Pathogen evolution

## Programs & support

BioKEA participates in major cloud and AI infrastructure programs that supply compute, credits, and engineering support behind the LDC and BioinfoOS:

- **AWS for Startups** — https://aws.amazon.com/startups/
- **Google Cloud for Startups** — https://cloud.google.com/startup
- **NVIDIA Inception** — https://www.nvidia.com/en-us/startups/

## Partners

- **California Institute for Biodiversity** (major partner) — Bay Area nonprofit coordinating biodiversity research, specimen curation, and field inventories across California. https://www.calalive.org/
- **San Francisco Estuary Institute** (collaboration) — Independent research institute focused on the ecology, water quality, and habitat of the San Francisco Bay-Delta estuary. https://www.sfei.org/
- **Coastal Quest** (collaboration) — California-based nonprofit supporting coastal stewardship, conservation, and habitat restoration. https://www.coastalquest.org/

## Six-stage pipeline (soil | water | specimen → verifiable claim)

### 01. Ingest — Universal Envelope

Every input — raw FASTA, DwC-A archive, drafted manuscript — becomes a cryptographically trackable object. Automatic file-type detection and metadata extraction.

### 02. Analyze — Large Data Collider

The LDC runs image QC, taxonomy reconciliation, and FAIR validation over millions of reads in minutes. Outputs operational taxonomic units and candidate novel lineages.

### 03. Draft — AI-assisted narrative

The scientist directs; the AI drafts structure and links LDC data directly into the text. Cross-references external hypotheses in real time.

### 04. Review — Multi-agent panel

AI pre-screens manuscript structure and methodology in hours. Verified human experts evaluate contextual scientific nuance. Weighted, transparent scoring.

### 05. Broadcast — Interactive StoryMap

The end product is not a dead PDF. It is an explorable digital artifact permanently tethered to its underlying FAIR data package (GBIF, NCBI SRA, Zenodo).

### 06. Amplify — ATProto / Bluesky

Publishing is the starting line. Seamless AT Protocol integration pushes verifiable scientific artifacts into decentralized social graphs.

## Equipment (LDC · Berkeley)

### Extraction

**Hero:** Thermo KingFisher Flex — 3-unit fleet — Magnetic-bead purification at 96-well scale, running in parallel. Paired with an Eppendorf epMotion 5075 liquid handler for end-to-end sample prep.

**Also on this stage:**

- Eppendorf epMotion 5075
- Qiagen QIAcube Connect
- Thermo KingFisher Presto (×2)

### Prep & Amplification

**Hero:** 2× Qiagen QIAgility — Automated PCR prep with reproducible pipetting across 96- and 384-well plates. Backed by a stable of thermal cyclers for parallel amplification.

**Also on this stage:**

- Bio-Rad S1000 Thermal Cycler (×2)
- MJ Research PTC-200 and PTC-225 Tetrad
- Eppendorf Mastercycler Satellite X50i
- CyBio CyBi-SELMA 96 semi-automated pipettor

### Quantification

**Hero:** 2× Roche LightCycler 480 II — Real-time PCR on a 384-well block, doubled up for high-throughput absolute quantification. Joined by an Applied Biosystems StepOne, capillary electrophoresis, and fluorometry.

**Also on this stage:**

- Applied Biosystems StepOne Real-Time PCR
- Qiagen QIAxcel Advanced Capillary Electrophoresis
- Caliper Life Sciences LabChip GXII
- Qubit 4 fluorometer
- Molecular Devices FilterMax F3 microplate reader

### Sequencing

**Hero:** Oxford Nanopore PromethION 2 — Two flow cells of long-read nanopore sequencing on demand. Live on site since November 2025 — the heart of the LDC.
**Also on this stage:**

- On-site library QC (Qubit 4, LabChip GXII, QIAxcel) feeding directly in.

Full inventory (~80 items) available on request. Hardware was sourced across 2025–2026 through Bay Area biotech auctions in San Jose, San Francisco, and Berkeley — near-new units acquired at roughly one-tenth retail cost.

## Projects

### Intertidal Biodiversity DNA Barcode Library (LIVE)

A gap-analysis tool for reference-barcode taxonomic coverage: 4,384 intertidal species along the California coast, cross-referenced against BOLD, NCBI GenBank, NCBI SRA, and GBIF to prioritize which species to sample next.

- **Type:** Interactive Shiny app
- **Year:** 2026
- **Partner:** Coastal Quest
- **Tags:** eDNA, marine, gap analysis, DNA barcoding, California
- **Team:** Sean (lead)
- **Link:** https://biokea.shinyapps.io/california_intertidal_gap_analysis/

### California Insect Barcoding Initiative (REVEALING-SOON)

The first large-scale DNA-barcode survey of California insects — over 1 million specimens barcoded, estimating a conservative minimum of ~61,000 species statewide with roughly one third still undiscovered. The survey generates spatial richness interpolations constrained by ecoregion and vegetation type to guide targeted inventory and conservation.

- **Type:** Research paper + dataset
- **Year:** 2026
- **Tags:** DNA barcoding, insects, biodiversity, California, conservation
- **Team:** Austin (lead)
- **Reveal target:** Pending Ecography publication
- **Origin:** Originated independently of BioKEA as Austin's prior research program; now continued under the BioKEA umbrella.

### DaKineDiving — real-time dive intelligence for O'ahu (LIVE)

A real-time dive intelligence platform for O'ahu, Hawai'i. Combines NOAA tide data, PacIOOS wave buoys, and GBIF biodiversity records to surface conditions, encounter probabilities for 100+ marine species, and Marine Life Conservation District boundaries on an interactive map. Built with Claude Sonnet 4.5.
- **Type:** Web application
- **Year:** 2025
- **Tags:** marine, biodiversity, GBIF, eDNA-adjacent, Hawaii, AI-assisted build
- **Team:** Sean (lead)
- **Award:** Built with Claude Sonnet 4.5 Challenge — Winner (https://x.com/alexalbert__/status/1978220407716245581)
- **Videos:** Walkthrough: https://drive.google.com/file/d/1eYVxautzXZERbk1Oez_VfE5xeEnx85dR/view?usp=drive_link; Walkthrough · additional biology features: https://drive.google.com/file/d/1artFfslcNR90__Jx9xeEAYPUDBjUkeAL/view?usp=sharing
- **Origin:** Built by Sean as a solo entry to Anthropic's Built with Claude Sonnet 4.5 Challenge (October 2025); winner of the contest. Surfaced under BioKEA because of the GBIF biodiversity layer; not part of the BioKEA wet-lab pipeline.

### Bay estuary metabarcoding baseline (REVEALING-SOON)

A longitudinal metabarcoding baseline for the San Francisco Bay estuary, in partnership with the San Francisco Estuary Institute.

- **Type:** Dataset + paper
- **Year:** 2026
- **Partner:** San Francisco Estuary Institute
- **Tags:** metabarcoding, marine, estuary, Bay Area
- **Team:** Michelle (lead), Sean
- **Reveal target:** Q4 2026

### Long-read microbial genome resource (COMING-SOON)

A growing library of high-quality long-read microbial assemblies produced on the ONT PromethION 2 and published as a public resource.

- **Type:** Dataset
- **Year:** 2026–2027
- **Tags:** sequencing, microbial, long-read
- **Team:** Sean (lead)
- **Reveal target:** Q4 2026

### Colloquip — multi-agent scientific deliberation (LIVE)

An open-source multi-agent AI deliberation platform. Specialized scientific personas — Biology, Chemistry, ADMET, Clinical, Regulatory, Red Team — self-organize to debate hypotheses, with emergent discussion phases and energy-based conclusion instead of fixed turn orders.
- **Type:** Open-source platform
- **Year:** 2026
- **Tags:** AI, multi-agent, scientific reasoning, open source, deliberation
- **Team:** Sunit (lead)
- **Link:** https://github.com/sunitj/Colloquip
- **Origin:** Originated and maintained independently by Sunit Jain on GitHub; surfaced here through Sunit's advisor role, not authored by BioKEA.

### Sequoia™ — a foundation model for global biodiversity (COMING-SOON)

BioKEA's multimodal foundation model: it learns biology from DNA, images, and the spatial environment all at once — what an organism is, what it looks like, and where it lives. Reads partial DNA, fragmentary photographs, and habitat maps as one signal. Two tiers: the Seed (a small, single-file model that runs on a laptop, for education and open collaboration) and the Forest (the production engine that processes hundreds of millions of biodiversity datapoints).

- **Type:** AI foundation model
- **Year:** 2026–2027
- **Tags:** AI, foundation model, multimodal, biodiversity, computer vision, DNA
- **Team:** Sean (lead)
- **Reveal target:** 2027

## Golden Sample Hunt — Code with Claude · 2026

A 30-day public scavenger hunt. The six BioKEA projects above double as "games": one Golden Sample Card is hidden inside each. Players collect six clue fragments, assemble them into a final answer, and submit through a Google Form. The first ten correct submissions win:

- Real molecular sequencing of soil from the player's own backyard
- A full report (PDF)
- The raw sequencing data (FASTQ)
- A Claude-powered explorer for the data

Hunt window: May 7, 2026 → June 5, 2026 (deadline 11:59 PM PT). US residents only, 18+, one submission per email.

Page: https://biokea.ai/golden-sample-26

## Milestones

- **2025-03** — BioKEA founded
  Biology Knowledge Exploration Assistant — spin-out from biodiversity and environmental omics research, accelerated by revolutionary AI tooling.
- **2025-04** — Agentis started
  AI-reviewed science journal conceived.
- **2025-09** — Berkeley lab planning begins
  Planning starts for the 5,000+ sq ft Berkeley lab space.
- **2025-10** — Built with Claude Sonnet 4.5 Challenge — winner
  Sean wins Anthropic's Built with Claude Sonnet 4.5 Challenge with DaKineDiving, a real-time dive intelligence platform for O'ahu.
- **2025-11** — Contracts begin; ONT PromethION 2 arrives
  Major new contracts begin and the Oxford Nanopore PromethION 2 sequencer lands on site.
- **2026-02** — Sean becomes Anthropic Claude Community Ambassador
  Sean joins the Claude Community Ambassador program, deepening BioKEA's ties to the Anthropic developer community.
- **2026-03** — Move into Berkeley space
  Team takes possession of the 5,000+ sq ft Berkeley lab.
- **2026-04** — First employee hired
  The founding team expands as the first salaried hire joins.

## Vocabulary

- **eDNA** = Environmental DNA
- **LDC** = Large Data Collider
- **CIB** = California Institute for Biodiversity
- **SFEI** = San Francisco Estuary Institute
- **FAIR** = Findable, Accessible, Interoperable, Reusable
- **DwC-A** = Darwin Core Archive
- **ONT** = Oxford Nanopore Technologies
- **GBIF** = Global Biodiversity Information Facility
- **NCBI SRA** = NCBI Sequence Read Archive
- **MIxS** = Minimum Information about any (x) Sequence
- **MIEM** = Minimum Information about an Environmental Microbiome
- **AT Protocol** = decentralized social-web protocol underpinning Bluesky
- **MLCD** = Marine Life Conservation District (Hawaii state designation)

## Contact

biokea.ai/contact — partnership / capabilities / funding / Agentis early access.
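
## Appendix: illustrative sketch of pipeline stage 01 (Ingest)

Stage 01 of the six-stage pipeline says every input becomes "a cryptographically trackable object" with automatic file-type detection. This document does not say how BioKEA implements that step, so the Python below is only a minimal sketch under stated assumptions: SHA-256 as the content hash, type detection from the leading bytes, and hypothetical names (`Envelope`, `detect_type`, `make_envelope`) that are not BioKEA's API.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Hypothetical record describing one ingested input (not BioKEA's schema)."""
    name: str
    sha256: str         # hex digest of the raw bytes (assumed hash choice)
    size_bytes: int
    detected_type: str


def detect_type(head: bytes) -> str:
    """Guess a coarse file type from the first bytes (illustrative only)."""
    if head.startswith(b">"):
        return "fasta"  # FASTA records begin with '>'
    if head.startswith(b"@"):
        return "fastq"  # FASTQ records begin with '@'
    if head.startswith(b"PK\x03\x04"):
        return "zip"    # ZIP container; Darwin Core Archives are commonly zipped
    return "unknown"


def make_envelope(name: str, payload: bytes) -> Envelope:
    """Hash the payload and attach basic metadata."""
    return Envelope(
        name=name,
        sha256=hashlib.sha256(payload).hexdigest(),
        size_bytes=len(payload),
        detected_type=detect_type(payload[:8]),
    )


envelope = make_envelope("reads.fasta", b">seq1\nACGT\n")
print(envelope.detected_type, envelope.size_bytes, envelope.sha256[:12])
```

Because the digest depends only on the bytes, two uploads of the same file produce the same `sha256` regardless of filename, which is one plausible way an input could stay trackable through the later stages.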