OSF Integration Spec¶
Status: draft (2026-04-16)
Overview¶
The Open Science Framework (OSF), run by the Center for Open Science, is a cloud-hosted project container offering free DataCite DOIs, immutable registrations, preprint hosting, ORCID-anchored contributors, and a link-only storage add-on model over GitHub, S3, Dataverse, Figshare, Zotero, Mendeley, and others. Its metadata is shallow, it imposes no project structure, and it offers no code, pipeline, or bibliography intelligence. In return it commits to long-term preservation (a dedicated fund intended to cover 50+ years of read access) and is the de facto platform for preregistration in many fields.
Projio and OSF are complementary, not competing:
- Projio is the working environment — local-first, opinionated structure, deep per-subsystem metadata, agent-native via MCP.
- OSF is a publication / registration / archival façade — shallow metadata, cloud-hosted, DOI-minting, discoverable.
This spec defines how a projio repo becomes OSF-compatible without compromising its local-first, repo-as-knowledge design. The guiding principle mirrors OSF's own add-on philosophy: link, don't copy. The repo stays the source of truth; OSF becomes the discoverability / DOI / archive layer.
Goals¶
- One-command publication of a projio repo as an OSF project with a DOI.
- Single metadata manifest that also feeds `CITATION.cff`, `codemeta.json`, and `datacite.yaml`, with no duplication.
- Map projio subsystems (manuscripts, pipelines, bibliography, code, data) to OSF components so OSF visitors see a navigable structure rather than a file dump.
- First-class preregistration living in the repo (YAML answers → OSF registration schema).
- Manuscript → preprint flow via notio + OSF Preprints.
- Agent-facing `osf_*` MCP tools with preview-first semantics (matching the `datalad` module).
Non-goals¶
- Re-implementing OSF storage locally.
- Making OSF the source of truth for code, data, or bibliography.
- Pushing every file in the repo. Uploads are explicit and curated.
- Running a custom OSF instance or upstreaming a projio add-on into OSF core.
Architecture¶
┌──────────────────────────────────────────────────────────────────┐
│  projio repository                                               │
│  .projio/osf.yml            ← single source of truth metadata    │
│  .projio/osf-registration.yml                                    │
│  .projio/osf/state.json     ← mirror of remote state (generated) │
│                                                                  │
│  bib/  code/  manuscripts/  pipelines/  figs/                    │
└────────────┬──────────────────────────────────────┬──────────────┘
             │                                      │
             │ projio sync                          │ projio osf push
             ▼                                      ▼
generated citation artifacts             ┌──────────────────────┐
  CITATION.cff                           │         OSF          │
  codemeta.json                          │  ┌────────────────┐  │
  datacite.yaml                          │  │  root project  │  │
                                         │  ├────────────────┤  │
                                         │  │  component:    │  │
                                         │  │    manuscript  │  │
                                         │  ├────────────────┤  │
                                         │  │  component:    │  │
                                         │  │    pipeline …  │  │
                                         │  ├────────────────┤  │
                                         │  │  component:    │  │
                                         │  │    code (link) │  │
                                         │  ├────────────────┤  │
                                         │  │  component:    │  │
                                         │  │    data (link) │  │
                                         │  └────────────────┘  │
                                         └──────────────────────┘
The .projio/osf.yml manifest¶
Single human-edited file. Acts as the source of truth for all citation-related metadata in the repo. projio sync derives CITATION.cff, codemeta.json, and datacite.yaml from it.
# .projio/osf.yml
title: "Example Study on Cortical Dynamics"
description: |
  Short abstract. Can be multi-line. Rendered into the OSF project
  description and DataCite metadata.
category: project        # one of OSF's ~15 categories
license: CC-BY-4.0       # SPDX ID
tags: [neuroscience, fmri, preregistration]
contributors:
  - name: "Arash Shahidi"
    orcid: 0000-0000-0000-0000
    role: [conceptualization, software, writing-original-draft]  # CRediT
    bibliographic: true
  - name: "Jane Co-Author"
    orcid: 0000-0000-0000-0001
    role: [investigation, writing-review-editing]
funders:
  - name: "Example Funding Agency"
    award_number: "EFA-1234"
    award_uri: "https://example.org/awards/EFA-1234"
related_identifiers:
  - identifier: "https://github.com/arash/example-study"
    type: url
    relation: IsSupplementTo
osf:
  project_id: null       # filled in on first push
  public: false
  include:
    - manuscripts/*/build/*.pdf
    - figs/**/*.svg
    - figs/**/*.pdf
    - .projio/render/compiled.bib
    - docs/site/
  exclude:
    - "**/*.tmp"
    - "**/__pycache__/**"
components:
  - kind: manuscript
    source: manuscripts/main
    title: "Manuscript: Example Study"
  - kind: pipeline
    source: pipelines/preprocessing
    title: "Pipeline: preprocessing"
  - kind: bibliography
    title: "Bibliography"
  - kind: code
    addon: github
    repo: arash/example-study
    title: "Code"
  - kind: data
    addon: s3
    bucket: example-study-data
    title: "Data"
The components: block drives the concept → OSF component mapping on push.
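A manifest check along these lines could back `osf_manifest_validate`. This is a minimal sketch, not the projio implementation: the function name, the set of required keys, and the rule that link components must name their upstream target are assumptions made for illustration. It takes an already-parsed manifest (a plain dict) so it does not depend on any particular YAML library.

```python
import re

# ORCID iDs are four groups of four characters; the last one may be "X".
ORCID_RE = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")

# Component kinds that appear in the example manifest.
KNOWN_KINDS = {"manuscript", "pipeline", "bibliography", "code", "data"}

# Assumption: which top-level keys are mandatory is not fixed by the spec.
REQUIRED_KEYS = ("title", "license", "contributors")

# Assumption: each link-style add-on needs one key naming its upstream target.
ADDON_TARGET_KEY = {"github": "repo", "s3": "bucket"}


def validate_manifest(manifest: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for key in REQUIRED_KEYS:
        if not manifest.get(key):
            errors.append(f"missing required key: {key}")

    for i, person in enumerate(manifest.get("contributors") or []):
        orcid = person.get("orcid")
        if orcid is not None and not ORCID_RE.match(str(orcid)):
            errors.append(f"contributors[{i}]: malformed ORCID {orcid!r}")

    for i, comp in enumerate(manifest.get("components") or []):
        kind = comp.get("kind")
        if kind not in KNOWN_KINDS:
            errors.append(f"components[{i}]: unknown kind {kind!r}")
        target_key = ADDON_TARGET_KEY.get(comp.get("addon"))
        if target_key and not comp.get(target_key):
            errors.append(f"components[{i}]: addon needs '{target_key}'")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a preview show every issue in one pass.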
Commands¶
All commands are preview-first: default dry-run prints the planned API calls and file list; --yes executes.
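The preview-first contract can be sketched as a plan of actions that is always printed and only executed when confirmed. The names `PlannedAction` and `run_preview_first` are hypothetical, and the `[dry-run]` / `[exec]` prefixes are an illustrative choice; the spec only fixes the behaviour (dry-run by default, `--yes` executes).

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class PlannedAction:
    """One step of a plan: what would happen, and how to make it happen."""
    description: str
    execute: Callable[[], None]


def run_preview_first(actions: Iterable[PlannedAction], yes: bool = False,
                      out: Callable[[str], None] = print) -> bool:
    """Print every planned action; run them only when `yes` is True.

    Returns True if the actions were executed, False for a dry run.
    """
    prefix = "[exec]" if yes else "[dry-run]"
    for action in actions:
        out(f"{prefix} {action.description}")
        if yes:
            action.execute()
    return yes
```

Building the full plan before touching the network keeps the dry-run output and the real run in step, since both walk the same list.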
projio osf push¶
- Validate `.projio/osf.yml`.
- Resolve the target project (create if `osf.project_id` is null).
- Sync metadata (title, description, license, category, tags, contributors, funders, related identifiers) via the v2 REST API.
- For each entry in `components:`, create or update the matching component and apply its own metadata subset.
- Walk `osf.include` / `osf.exclude` and upload changed files via `osfclient` to OSF Storage on the appropriate component.
- For `addon: github` / `addon: s3` components, configure the OSF add-on to link the upstream service instead of uploading bytes.
- Write the returned `project_id` back into `osf.yml` on first push.
- Refresh `.projio/osf/state.json`.
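The file-selection step can be sketched with the standard library alone. The spec does not say how "changed" is detected; the sketch below assumes a SHA-256 digest per relative path, remembered from the previous push (for instance inside `state.json`). The function names and the hash store are hypothetical, and the actual upload through `osfclient` is deliberately left out.

```python
import hashlib
from pathlib import Path


def _expand(root: Path, patterns) -> set:
    """Resolve glob patterns under root; matched directories are expanded."""
    found = set()
    for pattern in patterns:
        # A trailing slash marks a directory; directories are expanded anyway.
        for match in root.glob(pattern.rstrip("/")):
            if match.is_dir():
                found.update(p for p in match.rglob("*") if p.is_file())
            elif match.is_file():
                found.add(match)
    return found


def changed_files(root, include, exclude, known_hashes: dict) -> dict:
    """Map relative path -> sha256 for selected files that are new or edited.

    `known_hashes` holds the digests recorded at the previous push
    (assumption: the spec does not define where these are stored).
    """
    root = Path(root)
    selected = _expand(root, include) - _expand(root, exclude)
    changed = {}
    for path in sorted(selected):
        rel = path.relative_to(root).as_posix()
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if known_hashes.get(rel) != digest:
            changed[rel] = digest
    return changed
```

In a dry run, the keys of the returned mapping are the file list to print; after a successful push, the same mapping can be merged into the stored digests.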
projio osf pull¶
Mirror remote OSF project state into .projio/osf/state.json: DOI, public/private flag, contributors, last-modified, registration list, preprint list. Enables agent queries like "is this project published, what's its DOI, who are the contributors" without hitting the API every time.
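Reading the mirror with a staleness check might look like the sketch below. The `pulled_at` field (an ISO 8601 timestamp with a UTC offset) and the one-hour default are assumptions; the spec does not define the layout of `state.json` or what counts as stale.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path


def read_state(path, max_age=timedelta(hours=1), now=None):
    """Return (state, is_stale) for a local mirror file.

    A missing file or a missing `pulled_at` timestamp counts as stale,
    so callers know to refresh from the API before trusting the data.
    """
    path = Path(path)
    if not path.exists():
        return {}, True
    state = json.loads(path.read_text(encoding="utf-8"))
    pulled_at = state.get("pulled_at")  # hypothetical field name
    if pulled_at is None:
        return state, True
    now = now or datetime.now(timezone.utc)
    age = now - datetime.fromisoformat(pulled_at)
    return state, age > max_age
```

An agent query such as "is this project published" would call `read_state` first and only trigger `projio osf pull` when the second element is true.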
projio osf register¶
- Read `.projio/osf-registration.yml` (schema choice + answers).
- Create a git tag `osf/reg/<timestamp>`.
- Build a frozen file-hash manifest of the current tree.
- Submit a registration draft to OSF's v2 API against the chosen schema (default: `OSF-Standard Pre-Data Collection`; supported: `AsPredicted`, `PreregChallenge`, `Secondary Data`).
- Optionally apply an embargo (up to 4 years).
- Return the registration DOI and store it under `related_identifiers`.
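The tag and hash-manifest steps can be sketched as follows. Several details are assumptions rather than part of the spec: the compact UTC timestamp format, SHA-256 as the hash, skipping `.git`, and the extra digest that summarises the whole manifest. Creating the git tag itself and submitting the draft to OSF are not shown.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def registration_tag(now=None) -> str:
    """Build a tag name under osf/reg/ (timestamp format is an assumption)."""
    now = now or datetime.now(timezone.utc)
    return "osf/reg/" + now.strftime("%Y%m%dT%H%M%SZ")


def hash_manifest(root, skip_dirs=(".git",)) -> dict:
    """Map each file's relative POSIX path to its SHA-256 hex digest."""
    root = Path(root)
    entries = {}
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if not path.is_file() or any(part in skip_dirs for part in rel.parts):
            continue
        entries[rel.as_posix()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return entries


def manifest_digest(entries: dict) -> str:
    """Single digest over the sorted manifest, independent of dict order."""
    h = hashlib.sha256()
    for rel in sorted(entries):
        h.update(f"{entries[rel]}  {rel}\n".encode("utf-8"))
    return h.hexdigest()
```

Storing the single digest alongside the registration gives a cheap way to check later that a checkout of the tag still matches what was registered.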
projio osf preprint <manuscript>¶
- Build the PDF via `manuscript_build` (notio + pandoc).
- Upload to a chosen preprint server (PsyArXiv, SocArXiv, MetaArXiv, EarthArXiv, …; configurable per-manuscript).
- Fill preprint metadata from notio frontmatter + `osf.yml`.
- Return the preprint DOI and store it in the manuscript's frontmatter.
projio osf doi¶
Quick accessor that prints the project DOI (and registration / preprint DOIs if present) from .projio/osf/state.json.
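As an illustration, the accessor reduces to a pure function over the mirrored state. The key names (`doi`, `registrations`, `preprints`) are hypothetical, since the layout of `state.json` is not specified here, and the DOI values in the usage note are placeholders.

```python
def doi_lines(state: dict) -> list:
    """Return one 'kind DOI' line per known DOI, project first."""
    lines = []
    if state.get("doi"):
        lines.append(f"project {state['doi']}")
    for kind, key in (("registration", "registrations"), ("preprint", "preprints")):
        for item in state.get(key, []):
            if item.get("doi"):
                lines.append(f"{kind} {item['doi']}")
    return lines
```

For example, `doi_lines({"doi": "10.0000/project", "registrations": [{"doi": "10.0000/reg1"}]})` yields a project line followed by a registration line, and an empty state yields no lines at all.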
Concept → OSF component mapping¶
| projio concept | OSF target | Mechanism |
|---|---|---|
| Repo root + README | Root project | Metadata from `osf.yml`; README uploaded |
| `manuscripts/<name>` | Component "Manuscript: …" | Built PDF + supplementary figures uploaded |
| `pipelines/<flow>` | Component "Pipeline: …" | `dag.svg` + `pipeline-docs.md` + rule list uploaded |
| `bib/` | Component "Bibliography" | `compiled.bib` + biblio summary uploaded |
| `code/` | Component "Code" | GitHub add-on link (no upload) |
| `data/` (DataLad) | Component "Data" | S3 add-on link to DataLad RIA/S3 sibling |
| `figs/` | Attached to the relevant manuscript component | SVG/PDF uploaded |
| Git tag + registration answers | Registration | `projio osf register` |
| Manuscript PDF | Preprint | `projio osf preprint` |
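The upload-versus-link split in the table can be derived from the `components:` block alone. The sketch below is one possible reading, assuming that the presence of an `addon` key is what marks a component as link-only; `plan_components` and the shape of its result are hypothetical.

```python
def plan_components(manifest: dict) -> list:
    """Turn the components: block into a list of planned OSF components."""
    plan = []
    for comp in manifest.get("components", []):
        addon = comp.get("addon")
        plan.append({
            "title": comp["title"],
            "kind": comp["kind"],
            # Components with an add-on are linked; everything else is uploaded.
            "mechanism": f"link:{addon}" if addon else "upload",
            # Upstream target for links, local source directory for uploads.
            "target": comp.get("repo") or comp.get("bucket") or comp.get("source"),
        })
    return plan
```

With the example manifest, the `code` entry becomes a `link:github` step targeting `arash/example-study`, while the `manuscript` entry becomes an `upload` step sourced from `manuscripts/main`.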
MCP tools¶
Expose under src/projio/mcp/osf.py, registered in src/projio/mcp/server.py:
| Tool | Purpose |
|---|---|
| `osf_status` | Return `.projio/osf/state.json` contents; refresh if stale |
| `osf_push` | Preview or execute publication (respects `--yes`) |
| `osf_pull` | Refresh local mirror of remote state |
| `osf_register` | Preview or execute preregistration |
| `osf_preprint` | Preview or execute preprint upload |
| `osf_doi_get` | Return the project / registration / preprint DOIs |
| `osf_contributors_sync` | Two-way sync of contributor list with OSF |
| `osf_manifest_validate` | Schema-check `.projio/osf.yml` |
All mutating tools follow the preview-first pattern used by the datalad MCP module.
Configuration¶
User config (~/.config/projio/config.yml):
osf:
token_env: OSF_TOKEN # personal access token env var
default_preprint_server: psyarxiv
default_registration_schema: "OSF-Standard Pre-Data Collection"
Project config (.projio/config.yml) may override any of these.
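The override rule amounts to a shallow merge in which project values win. The sketch below assumes both files have already been parsed into dicts; `resolve_osf_config` is a hypothetical name, and falling back to `OSF_TOKEN` when `token_env` is absent is an assumption.

```python
import os


def resolve_osf_config(user_cfg: dict, project_cfg: dict, environ=None) -> dict:
    """Merge the osf: sections (project overrides user) and resolve the token."""
    environ = os.environ if environ is None else environ
    merged = {**(user_cfg.get("osf") or {}), **(project_cfg.get("osf") or {})}
    # Only the variable name lives in config; the secret stays in the environment.
    token_env = merged.get("token_env", "OSF_TOKEN")
    return {**merged, "token": environ.get(token_env)}
```

Keeping only the variable name in config means neither YAML file ever contains the personal access token itself.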
Derived artifacts (written by projio sync)¶
- `CITATION.cff`: GitHub citation widget + Zenodo ingestion.
- `codemeta.json`: CodeMeta 2.0 for code discovery.
- `datacite.yaml`: DataCite 4 metadata, used when minting DOIs outside OSF (e.g. Zenodo).
All three are regenerated from .projio/osf.yml and should not be edited by hand. Each carries a header comment: # generated from .projio/osf.yml — do not edit.
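To make the derivation concrete, here is a sketch of mapping the manifest onto Citation File Format 1.2.0 fields. It is illustrative only: splitting a display name on its last space is a naive assumption that fails for many real names, treating a missing `bibliographic` flag as true is a guess, and serialising the result to YAML (with the header comment) is left to whichever YAML library `projio sync` uses.

```python
def split_name(full: str):
    """Naively split 'Given Family' on the last space (assumption)."""
    given, _, family = full.strip().rpartition(" ")
    return given, family


def to_cff(manifest: dict) -> dict:
    """Build a CITATION.cff-shaped dict from the parsed manifest."""
    authors = []
    for person in manifest.get("contributors", []):
        if person.get("bibliographic", True) is False:
            continue  # non-bibliographic contributors are not cited
        given, family = split_name(person["name"])
        author = {"family-names": family}
        if given:
            author["given-names"] = given
        if person.get("orcid"):
            # CFF expects the ORCID as a full URL.
            author["orcid"] = f"https://orcid.org/{person['orcid']}"
        authors.append(author)

    cff = {
        "cff-version": "1.2.0",
        "message": "If you use this work, please cite it using these metadata.",
        "title": manifest["title"],
        "authors": authors,
    }
    if manifest.get("license"):
        cff["license"] = manifest["license"]
    if manifest.get("tags"):
        cff["keywords"] = list(manifest["tags"])
    if manifest.get("description"):
        cff["abstract"] = manifest["description"].strip()
    return cff
```

The same manifest dict would feed analogous mappers for `codemeta.json` and `datacite.yaml`, which is what keeps the three artifacts from drifting apart.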
Implementation phases¶
Phase 1 — Manifest + citation artifacts (low effort, high value)
- [ ] `.projio/osf.yml` schema + loader (`src/projio/osf/manifest.py`)
- [ ] `projio sync` generates `CITATION.cff`, `codemeta.json`, `datacite.yaml`
- [ ] `projio osf validate` (CLI) + `osf_manifest_validate` MCP tool
- [ ] Docs under `docs/how-to/osf-publish.md`
Phase 2 — Push / pull
- [ ] `projio osf push` using `osfclient` + direct v2 API calls for metadata
- [ ] `projio osf pull` → `.projio/osf/state.json`
- [ ] `osf_push`, `osf_pull`, `osf_status`, `osf_doi_get` MCP tools
- [ ] Preview-first semantics mirroring `datalad` module
Phase 3 — Components mapping
- [ ] `components:` block processing on push
- [ ] GitHub / S3 add-on linking instead of uploads for `code`/`data`
- [ ] Per-component metadata propagation
Phase 4 — Registration + preprint
- [ ] `.projio/osf-registration.yml` schema
- [ ] `projio osf register` with schema selection + embargo
- [ ] `projio osf preprint <manuscript>` via notio build
- [ ] `osf_register`, `osf_preprint` MCP tools
Phase 5 — Exploratory (only if an institutional user asks)
- [ ] OSF Storage as a DataLad sibling via WebDAV
- [ ] Custom OSF add-on surfacing a projio repo's `.projio/index`
Open questions¶
- osfclient vs. direct v2 calls. `osfclient` is mature for file ops but does not cover registrations, wikis, or metadata. Push = osfclient for uploads + direct `requests` for metadata/components/registrations.
- Preprint server defaults. Should `default_preprint_server` be user-global or per-manuscript? Leaning per-manuscript in frontmatter with user-global fallback.
- Registration schema coverage. Start with `OSF-Standard Pre-Data Collection`; add others on demand.
- DOI ownership. OSF mints DataCite DOIs free; Zenodo does too. Should `projio osf` and a future `projio zenodo` share the same manifest? Yes: `osf.yml` is already shaped like a DataCite record for this reason. Consider renaming to `.projio/citation.yml` once Zenodo lands.
- Contributor sync direction. One-way (projio → OSF) is safe; two-way risks overwriting OSF-side edits. Default one-way with explicit `osf_contributors_sync --pull`.
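The one-way default for contributor sync can be expressed as a diff that never proposes deletions. This is a sketch under assumptions: contributors are matched by ORCID, both sides are lists of dicts with an `orcid` key, and the function name and result keys are hypothetical.

```python
def plan_contributor_push(local: list, remote: list) -> dict:
    """Diff contributor lists for a projio -> OSF push, keyed by ORCID.

    Remote-only contributors are reported but never scheduled for removal,
    so OSF-side edits cannot be overwritten by accident.
    """
    local_by_id = {c["orcid"]: c for c in local}
    remote_by_id = {c["orcid"]: c for c in remote}
    shared = local_by_id.keys() & remote_by_id.keys()
    return {
        "add": sorted(local_by_id.keys() - remote_by_id.keys()),
        "update": sorted(o for o in shared if local_by_id[o] != remote_by_id[o]),
        "remote_only": sorted(remote_by_id.keys() - local_by_id.keys()),
    }
```

Surfacing `remote_only` in the preview gives the user the information needed to decide whether an explicit pull is warranted.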
Related specs¶
- bib-architecture.md: bibliography structure that feeds the `bibliography` component.
- deliverables.md: publishable outputs that become OSF components and preprints.