Platform · Multimodal AI

VLM × VQA × VLA: three legs on one passport.

Vision-Language Models (VLM), Visual Question Answering (VQA), and Vision-Language-Action (VLA) all sit on top of the same CoatingPassport. VLM produces it, VQA queries it, and VLA acts on it. The data contract underneath stays stable as new model generations land and as autonomous robotics matures.


VLM · Vision-Language Model

Live

Reads footage from ROVs, drones, handheld cameras, fixed cameras, satellites, crawlers, and AUVs, and emits structured findings: category, severity, confidence, n_frames, image quality, source frame URIs, and AI model version. The default backend is Claude; the VLMProvider interface lets us swap backends without refactoring the analysis pipeline.
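The provider abstraction can be pictured with a short TypeScript sketch. The real VLMProvider lives in src/lib/vlm/provider.ts and its signature is not shown on this page, so the `analyze` method, the `Finding` field layout, the severity levels, and the `StubProvider` / `runAnalysis` helpers are all hypothetical; only the field names mirror the findings list above.

```typescript
// Hypothetical shapes: the real VLMProvider lives in src/lib/vlm/provider.ts
// and its exact signature is not reproduced here.
type Severity = "low" | "moderate" | "severe"; // assumed grading scale

interface Finding {
  finding_id: string;
  category: string;
  severity: Severity;
  confidence: number; // assumed 0..1
  n_frames: number;
  image_quality: number; // assumed 0..1 score
  source_frame_uris: string[];
  ai_model_version: string;
}

interface VLMProvider {
  readonly modelVersion: string;
  analyze(frameUris: string[]): Promise<Finding[]>;
}

// Stub backend: returns one canned finding so the example runs offline.
class StubProvider implements VLMProvider {
  readonly modelVersion: string;

  constructor(modelVersion: string) {
    this.modelVersion = modelVersion;
  }

  async analyze(frameUris: string[]): Promise<Finding[]> {
    return [
      {
        finding_id: "f-1",
        category: "coating_breakdown",
        severity: "moderate",
        confidence: 0.8,
        n_frames: frameUris.length,
        image_quality: 0.9,
        source_frame_uris: frameUris,
        ai_model_version: this.modelVersion,
      },
    ];
  }
}

// The pipeline depends only on the interface, so any backend can be passed in.
async function runAnalysis(provider: VLMProvider, frames: string[]): Promise<Finding[]> {
  const findings = await provider.analyze(frames);
  // Drop findings that cite no source frame: they could not be traced back.
  return findings.filter((f) => f.source_frame_uris.length > 0);
}
```

In this sketch, swapping backends means passing a different object to `runAnalysis`; nothing downstream of the interface changes.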

Who uses it. Every CoatingPassport in production is produced by a VLM call. Operators don't call this directly — they consume the passport.

Endpoint

Internal; surfaced through POST /api/analyze and /api/export/pdf.

VQA · Visual Question Answering

Live

Takes an existing CoatingPassport and answers natural-language questions about it. Each answer is structured, with an explicit confidence and citations to specific finding_ids and source frames. VQA refuses to invent data the passport doesn't carry, and the grounded_in_passport flag is enforced.
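One way a grounded_in_passport check could work is to verify every citation against the passport itself. This is a minimal sketch, not Hullproof's implementation: the `Citation` and `PassportView` shapes and the `isGroundedInPassport` helper are assumptions made for illustration.

```typescript
// Hypothetical shapes; field names follow the ones listed on this page.
interface Citation {
  finding_id: string;
  source_frames: string[];
}

interface PassportView {
  findings: { finding_id: string; source_frame_uris: string[] }[];
}

// True only if every citation names a finding the passport carries and
// every cited frame belongs to that finding.
function isGroundedInPassport(passport: PassportView, citations: Citation[]): boolean {
  if (citations.length === 0) return false; // an uncited answer is not grounded
  const framesById = new Map<string, Set<string>>();
  for (const f of passport.findings) {
    framesById.set(f.finding_id, new Set(f.source_frame_uris));
  }
  return citations.every((c) => {
    const frames = framesById.get(c.finding_id);
    return frames !== undefined && c.source_frames.every((uri) => frames.has(uri));
  });
}
```

Checking frames as well as IDs catches an answer that cites a real finding but attaches evidence from somewhere else.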

Who uses it. Agentic stacks (Cognite Atlas AI, Equinor Echo, custom Claude / GPT agents) that need an interpretable answer citing passport lineage rather than a hallucinated one.

Endpoint

POST /api/passports/{id}/qa with body { "question": "..." }
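Calling the endpoint from TypeScript needs nothing beyond the built-in `fetch` (Node 18+ or a browser). The helper names `buildQaRequest` and `askPassport` are hypothetical; only the path and the `{ "question": ... }` body come from the endpoint description.

```typescript
interface QaRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Builds the request for POST /api/passports/{id}/qa (helper name is hypothetical).
function buildQaRequest(baseUrl: string, passportId: string, question: string): QaRequest {
  const base = baseUrl.replace(/\/+$/, ""); // tolerate a trailing slash
  return {
    url: `${base}/api/passports/${encodeURIComponent(passportId)}/qa`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ question }),
    },
  };
}

// Sends the question and returns the parsed JSON body, failing on non-2xx.
async function askPassport(baseUrl: string, passportId: string, question: string): Promise<unknown> {
  const { url, init } = buildQaRequest(baseUrl, passportId, question);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`QA request failed: HTTP ${res.status}`);
  return res.json();
}
```

Keeping request construction separate from the network call makes the request shape testable without hitting the live API.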

Example

Ask "Which findings drive the most ETS exposure on this hull?" and get a grounded answer citing finding_ids, source frames, and a confidence.

VLA · Vision-Language-Action

Co-developed

Takes a finding and orchestrates the next physical action: request a close-up capture, reposition the camera, send a Spot or ANYmal robot to the location, or request human review. The VLAProvider interface is in src/lib/vlm/provider.ts; implementations are co-developed per operator at the platform tier.
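The action menu above maps naturally onto a discriminated union. The sketch below is illustrative only: the real VLAProvider interface is not published on this page, so the action names, the `planNextAction` method, and the thresholds in `ThresholdPolicy` are hypothetical stand-ins for logic that would be co-developed with each operator.

```typescript
// Hypothetical action vocabulary mirroring the four actions described above.
type VlaAction =
  | { kind: "request_closeup"; findingId: string }
  | { kind: "reposition_camera"; findingId: string; panDeg: number; tiltDeg: number }
  | { kind: "navigate_robot"; findingId: string; robot: "spot" | "anymal" }
  | { kind: "request_human_review"; findingId: string; reason: string };

interface FindingRef {
  finding_id: string;
  severity: "low" | "moderate" | "severe"; // assumed grading scale
  confidence: number; // assumed 0..1
  image_quality: number; // assumed 0..1
}

interface VLAProvider {
  planNextAction(finding: FindingRef): VlaAction;
}

// One possible policy with made-up thresholds: poor imagery gets a close-up,
// low confidence goes to a human, severe findings send a robot.
class ThresholdPolicy implements VLAProvider {
  planNextAction(f: FindingRef): VlaAction {
    if (f.image_quality < 0.5) {
      return { kind: "request_closeup", findingId: f.finding_id };
    }
    if (f.confidence < 0.6) {
      return { kind: "request_human_review", findingId: f.finding_id, reason: "low confidence" };
    }
    if (f.severity === "severe") {
      return { kind: "navigate_robot", findingId: f.finding_id, robot: "spot" };
    }
    return { kind: "request_human_review", findingId: f.finding_id, reason: "routine" };
  }
}
```

A closed union of actions lets an operator audit every physical step the layer is allowed to trigger.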

Who uses it. Energy majors running autonomous inspection robotics (Aker BP's Eureka, Equinor subsea autonomy, ConocoPhillips Ekofisk autonomy). Each implementation is co-developed with the operator's engineering team.

Endpoint

Co-developed; not exposed on the public surface.

Example

On a finding graded severe, VLA can send Spot to the location, capture a high-frame-rate close-up, and emit an updated passport revision, all in one loop with no human in the middle.
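The severe-finding loop can be sketched as navigate, capture, then emit a new revision. The `Robot` interface and `closeTheLoop` function are hypothetical stand-ins for an operator's Spot or ANYmal integration; a production loop would presumably also re-run VLM analysis on the new frames rather than only attaching them, which this sketch omits.

```typescript
// Hypothetical interfaces standing in for an operator's robot stack.
interface Finding {
  finding_id: string;
  severity: "low" | "moderate" | "severe";
  source_frame_uris: string[];
}

interface Passport {
  asset_id: string;
  revision: number;
  findings: Finding[];
}

interface Robot {
  navigateTo(findingId: string): Promise<void>;
  captureCloseup(findingId: string): Promise<string[]>; // returns new frame URIs
}

// For each severe finding: send the robot, capture a close-up, attach the new
// frames. Returns a new revision; the input passport is left unmodified.
async function closeTheLoop(passport: Passport, robot: Robot): Promise<Passport> {
  const updated: Finding[] = [];
  for (const f of passport.findings) {
    if (f.severity !== "severe") {
      updated.push(f);
      continue;
    }
    await robot.navigateTo(f.finding_id);
    const frames = await robot.captureCloseup(f.finding_id);
    updated.push({ ...f, source_frame_uris: [...f.source_frame_uris, ...frames] });
  }
  return { ...passport, revision: passport.revision + 1, findings: updated };
}
```

Findings are processed one at a time here, which matches a single robot that can only be in one place at once.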

Try VQA live

Pick any demo passport and ask a question with curl:

curl -X POST https://hullproof.com/api/passports/demo-offshore-jacket-001/qa \
  -H "Content-Type: application/json" \
  -d '{"question": "Which findings are most severe and which standards do they cite?"}'

The response is structured: answer, confidence, citations to finding_ids and source_frames, ai_model_version, and a grounded_in_passport flag.
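An agent consuming this response should validate it before acting. The sketch below is a runtime type guard; the field names come from the response description above, but the nesting of citations (one object per finding_id carrying its source_frames) is an assumption.

```typescript
// Assumed response shape; only the field names are taken from the docs.
interface QaCitation {
  finding_id: string;
  source_frames: string[];
}

interface QaResponse {
  answer: string;
  confidence: number;
  citations: QaCitation[];
  ai_model_version: string;
  grounded_in_passport: boolean;
}

function isStringArray(x: unknown): x is string[] {
  return Array.isArray(x) && x.every((s) => typeof s === "string");
}

function isCitation(x: unknown): x is QaCitation {
  if (typeof x !== "object" || x === null) return false;
  const c = x as Record<string, unknown>;
  return typeof c.finding_id === "string" && isStringArray(c.source_frames);
}

// Runtime check for untrusted JSON before an agent acts on it.
function isQaResponse(x: unknown): x is QaResponse {
  if (typeof x !== "object" || x === null) return false;
  const r = x as Record<string, unknown>;
  const citations = r.citations;
  return (
    typeof r.answer === "string" &&
    typeof r.confidence === "number" &&
    typeof r.ai_model_version === "string" &&
    typeof r.grounded_in_passport === "boolean" &&
    Array.isArray(citations) &&
    citations.every(isCitation)
  );
}
```

An agent might additionally decline to act whenever grounded_in_passport is false, even if the shape checks out.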

Passport diff: drift between inspections

Beyond VQA, the diff endpoint compares two passport versions of the same asset and surfaces escalated, new, and resolved findings. It gives an integrity-management workflow or an agent a structured answer to “what changed since the last inspection?”
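The escalated / new / resolved split can be sketched as a set comparison between two findings lists. This assumes findings keep a stable finding_id across inspections and that severity is ordered low < moderate < severe; both are illustrative assumptions, and `diffPassports` is a hypothetical name, not the diff endpoint's implementation.

```typescript
type Severity = "low" | "moderate" | "severe"; // assumed ordering
const RANK: Record<Severity, number> = { low: 0, moderate: 1, severe: 2 };

interface Finding {
  finding_id: string;
  severity: Severity;
}

interface PassportDiff {
  escalated: string[];
  new: string[];
  resolved: string[];
}

// Classifies findings by comparing the previous and current inspections.
function diffPassports(previous: Finding[], current: Finding[]): PassportDiff {
  const before = new Map(previous.map((f) => [f.finding_id, f] as const));
  const after = new Set(current.map((f) => f.finding_id));
  const diff: PassportDiff = { escalated: [], new: [], resolved: [] };
  for (const f of current) {
    const old = before.get(f.finding_id);
    if (old === undefined) {
      diff.new.push(f.finding_id); // not seen last time
    } else if (RANK[f.severity] > RANK[old.severity]) {
      diff.escalated.push(f.finding_id); // severity went up
    }
  }
  for (const f of previous) {
    if (!after.has(f.finding_id)) diff.resolved.push(f.finding_id); // no longer reported
  }
  return diff;
}
```

Treating a finding as resolved when it simply disappears is the simplest rule; an integrity workflow may prefer explicit confirmation before closing one out.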


For operators co-developing VLA

For Aker BP's Eureka, Equinor's subsea autonomy, and ConocoPhillips' Ekofisk autonomy programs, the VLA layer is where Hullproof findings drive Spot, ANYmal, and AUV action loops. Scope is platform tier, co-developed with the operator's robotics and engineering teams.