Labs  ·  Active build  ·  Updated weekly

Labs

Experiments, prototypes, and technical build logs across AI systems, agents, software trust, research tools, and applied machine learning.

A living workspace for things I am testing, building, breaking, and improving. Some experiments become products, some become essays, and some become lessons.

Featured experiments

Active labs

Six workstreams I am running right now. Each lab is a focused investigation with its own success metric, failure mode, and lesson log.

Active · Agents

Agent Workflow Lab

Experiments with LLM agents, tool use, memory, planning, and evaluation loops. Measuring what makes an agent reliably finish a task versus impressively start one.

tool use · memory · eval loops
Research · Trust & verification

Software Trust Lab

Exploring evidence layers, audit trails, signed build records, and verifiable software systems. The thesis: trust is becoming a first-class layer of the AI stack.

audit trail · provenance · verifiable build
Shipped · Computer vision

Retinal AI Lab

Experiments with foundation models for retinal disease classification: zero-shot inference, linear probing, and low-data evaluation. Reached 0.92 AUROC using only 5 to 20 percent of the labels.

FLAIR · 0.92 AUROC · low-data
Prototype · Retrieval

RAG Evaluation Lab

Testing retrieval quality, hallucination control, chunking strategies, and answer evaluation methods. The goal is a small benchmark you can actually trust.

chunking · hallucination · answer eval
Active · AI coding

Claude Code Build Lab

Experiments using AI coding agents to build, refactor, debug, and ship production interfaces. This portfolio is one of the artefacts. The lab keeps a log of what works and what does not.

agent loops · refactor · shipping
Idea · Founder tools

AI Business Tools Lab

Small tools for evaluating startup ideas, AI use cases, cold email offers, and workflow automation opportunities. A founder utility belt, not a startup.

idea triage · offer testing · workflow audit
Coming soon
The method

How I run a lab

Every lab moves through the same six steps. The point is not to be heroic. The point is to learn something specific and write it down.

Observe problem

Find a sharp question worth answering with code.

Design system

Sketch the smallest system that could test it.

Build prototype

Make the thing real enough to expose flaws.

Test failure modes

Push it until it breaks. Note what broke first.

Document lessons

Write the result so it survives the week.

Ship or archive

Promote it to a product or an essay, or close the file.

Where the labs converge

Current focus

Three problems I keep returning to. Most active labs are slices of these.

01 · Agents

AI agents that can use tools reliably

Reliability beats raw capability. Most demos work once. The interesting work is making them finish a task ten times in a row.

02 · Trust

Evidence and trust infrastructure for software

Software is increasingly generated, copied, and signed. Trust may become the next layer between code and the people who use it.

03 · Knowledge

Knowledge systems that help people understand AI faster

Atlases, field manuals, structured Q&A, learning paths. Tools that compress the time from confused to capable.

Want to collaborate on a serious AI build?

If any of this overlaps with what you are working on, I am open to focused collaboration and contract work.

Contact me