DocFlare: Building an Edge-Native Document Q&A App on Cloudflare

I live in Germany, and I am still learning German. If you don’t know, Germany is famous for handling all processes via letters. Not emails, but physical letters. Not knowing the language well enough has always been a challenge, especially when dealing with these letters.

I can use translation services, or AI apps to understand what these letters are about. However, I have always been uneasy about uploading sensitive documents to third-party services. Contracts, invoices, tax forms — these contain information that I would rather not send to someone else’s servers. But whenever I needed to extract knowledge from a stack of PDFs, the options were limited: read them manually (kein Deutsch), ship them to an external API for processing (hello, data privacy concerns), or build a custom RAG pipeline with a dozen moving parts.

I wanted to see if I could build a complete document Q&A system where the data never leaves Cloudflare’s network. No third-party LLM APIs. No external vector databases. No data flying off to services I don’t control. The result is DocFlare — a chat-based app where you upload PDFs, ask questions in natural language, and get answers grounded in the documents. Everything runs on Cloudflare’s edge infrastructure: Workers, Durable Objects, R2, AI Search, Sandbox containers, and Workers AI.

In this article, I’ll walk you through how DocFlare works, the architectural decisions I made, and the problems I ran into along the way.

But before that, here’s a quick demo

The Problem with PDFs

PDFs are the cockroaches of the digital world — they’re everywhere, they survive everything, and they’re nearly impossible to work with programmatically. When I started building DocFlare, the PDF extraction piece was the challenge I was most worried about. More on that later.

Architecture at a Glance

Before diving into the details, here’s how the system fits together:

The key thing I want to highlight: every box in that diagram is a Cloudflare product. R2 for storage. AI Search for the RAG pipeline. Workers AI for generation. Sandbox containers for OCR. Durable Objects for stateful chat sessions. There’s no external dependency in the critical path.

Two-Strategy PDF Extraction

This was the hardest problem to solve well, and the part of DocFlare I’m most proud of. While it is not perfect, in my testing it reliably extracts meaningful text from a wide variety of PDFs — including scanned documents, handwritten notes, and image-heavy files — without hallucinating content.

Strategy 1: env.AI.toMarkdown()

Cloudflare’s Workers AI binding includes a toMarkdown() method that extracts text from PDFs and converts it to structured markdown. It’s fast, it’s included in Workers AI at no extra cost, and it works beautifully for text-layer PDFs — the kind generated by Word, LaTeX, or any modern document tool.

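// Illustrative wrapper (the function name is not from the DocFlare source)
// so that the early returns below are valid.
async function extractWithToMarkdown(ai: Ai, fileName: string, pdfBytes: Uint8Array) {
  const results = await ai.toMarkdown([
    {
      name: fileName,
      blob: new Blob([pdfBytes], { type: "application/pdf" }),
    },
  ]);

  // Conversion failed outright: nothing to index from this strategy
  const result = results[0];
  if (!result || result.format === "error") {
    return null;
  }

  // Strip metadata headers toMarkdown always includes, then check
  // that there's at least 50 characters of actual content
  const contentsMatch = result.data.match(/## Contents\s*\n([\s\S]*)/);
  const contentsSection = contentsMatch?.[1] ?? "";
  const stripped = contentsSection.replace(/###\s+Page\s+\d+/g, "").trim();

  if (stripped.length >= 50) {
    return {
      fileName,
      markdown: result.data,
      hasContent: true,
      method: "toMarkdown",
    };
  }

  // Too little text (assumed behavior): signal the caller to fall back to OCR
  return null;
}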

The critical detail here: I strip out the metadata section that toMarkdown() always includes (page headers, etc.) and check that the remaining content is at least 50 characters. If it isn’t, we’re probably looking at a scanned document where toMarkdown() found little or no text layer — and we need to fall back.
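
To make the threshold check concrete, here is a minimal, self-contained sketch of the same idea as a pure function. The name `hasMeaningfulText`, the `minChars` parameter, and both sample strings are illustrative assumptions, not DocFlare code or real toMarkdown() output. Only the two regular expressions and the 50-character cutoff come from the snippet above.

```typescript
// Sketch: decide whether toMarkdown() output contains real body text.
// The function name and sample inputs are hypothetical; the regexes and
// the 50-character cutoff mirror the snippet above.
function hasMeaningfulText(markdown: string, minChars = 50): boolean {
  // Keep only what follows the "## Contents" heading; no heading means no body.
  const match = markdown.match(/## Contents\s*\n([\s\S]*)/);
  const contents = match?.[1] ?? "";
  // Remove per-page headings such as "### Page 3" before measuring length.
  const stripped = contents.replace(/###\s+Page\s+\d+/g, "").trim();
  return stripped.length >= minChars;
}

// A made-up scanned PDF: page headings only, no text layer.
const scanned =
  "# scan.pdf\n## Metadata\n- Pages=2\n## Contents\n### Page 1\n\n### Page 2\n";
// A made-up text-layer PDF with one sentence of body text.
const digital =
  "# letter.pdf\n## Metadata\n- Pages=1\n## Contents\n### Page 1\n" +
  "Sehr geehrte Damen und Herren, hiermit bestätigen wir den Eingang Ihres Antrags.";

console.log(hasMeaningfulText(scanned)); // false
console.log(hasMeaningfulText(digital)); // true
```

Keeping the check as a pure function makes the cutoff easy to tune against a handful of real documents before wiring it into the Worker.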

Why Not Use a Vision LLM for OCR?

This was a tempting shortcut. Modern vision LLMs can “read” images, right? But there’s a fundamental problem: vision LLMs can hallucinate when used as OCR. They may confidently “read” text that isn’t there, rearrange numbers in tables, and invent content. For a document Q&A system where accuracy is the entire point, this was a non-starter for me.

Strategy 2: RapidOCR in a Sandbox Container

For scanned PDFs, DocFlare falls back to classical OCR — specifically, RapidOCR running inside a Cloudflare Sandbox container.
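
Put together, the two strategies form a simple try-then-fall-back flow. The sketch below shows that control flow only. The extractors are injected as plain functions so the example runs without any Cloudflare bindings, and every name in it (`extractPdf`, `Extractor`, `Extraction`, the fake extractors) is hypothetical rather than taken from the DocFlare source.

```typescript
// Sketch of the two-strategy flow. Extractors are injected so the example
// runs without Cloudflare bindings; every name here is hypothetical.
type Extractor = (fileName: string, pdfBytes: Uint8Array) => Promise<string | null>;
type Extraction = { fileName: string; markdown: string; method: "toMarkdown" | "ocr" };

async function extractPdf(
  fileName: string,
  pdfBytes: Uint8Array,
  fast: Extractor, // stand-in for the toMarkdown() path; null means "too little text"
  ocr: Extractor, // stand-in for the RapidOCR path
): Promise<Extraction | null> {
  const fastText = await fast(fileName, pdfBytes);
  if (fastText !== null) {
    return { fileName, markdown: fastText, method: "toMarkdown" };
  }
  // Only pay the OCR cost when the cheap path found no usable text.
  const ocrText = await ocr(fileName, pdfBytes);
  return ocrText === null ? null : { fileName, markdown: ocrText, method: "ocr" };
}

// Usage with fake extractors: the fast path gives up, so OCR is used.
const noTextLayer: Extractor = async () => null;
const fakeOcr: Extractor = async () => "### Page 1\nRecognized text";

extractPdf("scan.pdf", new Uint8Array(), noTextLayer, fakeOcr).then((extraction) => {
  console.log(extraction?.method); // "ocr"
});
```

Recording which method produced the text is useful later, for example to flag OCR-derived answers as slightly less trustworthy at the character level.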

RapidOCR uses the same models as PaddleOCR (text detection, direction classification, text recognition) but runs them through ONNX Runtime instead of the PaddlePaddle framework. This drops the runtime overhead from ~500 MiB to ~80 MiB, a big deal when you’re running inside a container with constrained resources.

The OCR container processes PDFs page by page to keep memory usage at ~25 MiB per page:

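# Assumed imports: pdf2image (which needs poppler) and the rapidocr package
import numpy as np
from pdf2image import convert_from_path, pdfinfo_from_path
from rapidocr import RapidOCR

engine = RapidOCR()

# `path` is the PDF path passed to the script.
# Get page count first, then convert one page at a time to keep peak
# memory low (~25 MiB per page instead of all pages in memory at once).
info = pdfinfo_from_path(str(path))
num_pages = info["Pages"]

pages = []
for i in range(1, num_pages + 1):
    images = convert_from_path(str(path), dpi=300, first_page=i, last_page=i)
    img_array = np.array(images[0])
    result = engine(img_array)
    # Pages where OCR finds no text are skipped
    if result and result.txts:
        pages.append({"page": i, "text": "\n".join(result.txts)})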
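
The ~25 MiB figure is consistent with a quick back-of-envelope calculation. The sketch below assumes A4 pages and uncompressed 8-bit RGB bitmaps (3 bytes per pixel). Neither assumption is stated in the original, and the helper name `pageBitmapMiB` is made up for illustration.

```typescript
// Back-of-envelope check of the per-page memory figure.
// Assumptions (not from the DocFlare source): A4 pages, 8-bit RGB bitmaps.
const MM_PER_INCH = 25.4;

function pageBitmapMiB(
  widthMm: number,
  heightMm: number,
  dpi: number,
  bytesPerPixel = 3,
): number {
  // Convert the physical page size to pixels at the given resolution.
  const widthPx = Math.round((widthMm / MM_PER_INCH) * dpi);
  const heightPx = Math.round((heightMm / MM_PER_INCH) * dpi);
  return (widthPx * heightPx * bytesPerPixel) / (1024 * 1024);
}

const a4At300 = pageBitmapMiB(210, 297, 300);
console.log(a4At300.toFixed(1)); // "24.9"
```

Under the same assumptions, holding a 20-page scan in memory at once would need roughly 500 MiB for the bitmaps alone, which is why converting one page at a time matters inside a small container.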

On the Worker side, the Sandbox container is invoked through Cloudflare’s @cloudflare/sandbox package. The PDF is written to the sandbox filesystem, then the Python script is executed directly:

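import { getSandbox } from "@cloudflare/sandbox";

const sandbox = getSandbox(sandboxNs, "ocr");

// Write the PDF to the sandbox filesystem (Buffer requires the nodejs_compat flag)
const base64 = Buffer.from(pdfBytes).toString("base64");
await sandbox.writeFile("/workspace/input.pdf", base64, { encoding: "base64" });

// Run RapidOCR and parse JSON from stdout
const result = await sandbox.exec("python3 /app/ocr.py /workspace/input.pdf");
if (result.exitCode !== 0) {
  // Surface OCR failures instead of parsing empty or partial output
  throw new Error(`OCR failed: ${result.stderr}`);
}
const ocrResult = JSON.parse(result.stdout);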

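The result is a clean, structured extraction that works reliably on scanned documents, handwritten-ish text, and image-heavy PDFs. Classical OCR can still misread an individual character here and there, but it cannot invent sentences that are not on the page, so the hallucination risk of a generative model is gone. I was genuinely impressed with how well this worked.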
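
As a rough sketch of the last step, the per-page OCR results can be stitched into markdown before indexing. The `OcrPage` shape mirrors the dictionaries appended in the Python loop above. The function name `pagesToMarkdown`, the heading layout, and the sample text are assumptions, and the actual JSON envelope printed by the script is not shown in this article.

```typescript
// Sketch: turn per-page OCR results into one markdown document.
// OcrPage mirrors the dicts built in the Python loop above; the function
// name and the heading layout are assumptions.
type OcrPage = { page: number; text: string };

function pagesToMarkdown(fileName: string, pages: OcrPage[]): string {
  const body = [...pages] // copy so the caller's array is not reordered
    .sort((a, b) => a.page - b.page)
    .map((p) => `### Page ${p.page}\n\n${p.text.trim()}`)
    .join("\n\n");
  return `# ${fileName}\n\n${body}\n`;
}

// Page 2 is missing on purpose: pages without recognized text are skipped.
const ocrMarkdown = pagesToMarkdown("scan.pdf", [
  { page: 3, text: "Total: 42,00 EUR" },
  { page: 1, text: "Rechnung Nr. 2024-001\n" },
]);
console.log(ocrMarkdown);
```

Keeping the page number in each heading preserves the information needed to cite a specific page later.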

AI Search: RAG Without the Plumbing

If you’ve built a RAG system before, you know the pain: chunk your documents (but what chunk size? overlap?), generate embeddings (which model? dimensions?), store them in a vector database (which one? how do you index?), retrieve with similarity search (cosine? dot product?), maybe rerank, then generate.

Cloudflare AI Search handles all of it as a managed service. You point it at an R2 bucket, it indexes the contents, and you get a search API. That’s it.

Here’s the part that made me smile: my original plan included a full custom pipeline — bge-m3 embeddings, Durable Object SQLite storage, JavaScript cosine similarity. I scrapped all of that in favor of a single AI Search call:

const searchResponse = await this.env.AI.autorag("docsflare-search").search({
  query,
  rewrite_query: true,
  max_num_results: 8,
  ranking_options: {
    score_threshold: 0.15,
  },
});

One call. That replaces chunking, embedding, vector storage, retrieval, and reranking. I love when things get simpler.
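
For contrast, the retrieval step of a hand-rolled pipeline like the one in the original plan could look roughly like the sketch below. This is not how AI Search works internally and it is not DocFlare code. The `StoredChunk` type, the `topK` helper, and the tiny 3-dimensional vectors are illustrative assumptions, and the `maxResults` and `scoreThreshold` parameters merely echo `max_num_results` and `score_threshold` from the call above.

```typescript
// Sketch of do-it-yourself retrieval: cosine similarity, threshold, top-k.
// All names and the toy vectors are hypothetical.
type StoredChunk = { id: string; embedding: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0; // treat missing dimensions as zero
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

function topK(
  query: number[],
  chunks: StoredChunk[],
  maxResults: number,
  scoreThreshold: number,
): { id: string; score: number }[] {
  return chunks
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.embedding) }))
    .filter((c) => c.score >= scoreThreshold) // drop weak matches
    .sort((a, b) => b.score - a.score) // best match first
    .slice(0, maxResults);
}

const hits = topK(
  [1, 0, 0],
  [
    { id: "invoice", embedding: [0.9, 0.1, 0] },
    { id: "contract", embedding: [0, 1, 0] },
    { id: "tax-form", embedding: [0.5, 0.5, 0] },
  ],
  8,
  0.15,
);
console.log(hits.map((h) => h.id)); // ["invoice", "tax-form"]
```

Even this toy version leaves out chunking, embedding generation, storage, and reranking, which is the plumbing the managed service takes over.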

Why search() Instead of aiSearch()?

AI Search offers two APIs:

aiSearch() — retrieval + generation in one call. Convenient, but you lose control.
search() — retrieval only. You handle generation yourself.

I deliberately use search() because I needed control over:

The system prompt — DocFlare identifies itself as a retrieval assistant with specific behavioral instructions: ground answers in retrieved context, acknowledge when context is insufficient, and include source filenames in responses.
Conversation history — Multi-turn chat requires injecting prior messages into the LLM context. aiSearch() doesn’t support this.
Streaming — Responses stream back over WebSocket in real-time. I needed direct access to the streamText() call.
Model selection — I use @cf/nvidia/nemotron-3-120b-a12b specifically.

The ChatAgent builds context from search results and passes it to Workers AI with the full conversation history:

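import { streamText } from "ai";
import { createWorkersAI } from "workers-ai-provider";

// Assumed: the chunks are the `data` array of the search() response shown earlier
const chunks = searchResponse.data;

// Build a context string from retrieved chunks, labelled by source filename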
const contextText = chunks
  .map((chunk, index) => {
    const source = chunk.filename ?? `Document ${index + 1}`;
    const confidence = chunk.score ? ` (score ${chunk.score.toFixed(2)})` : "";
    const text = chunk.content
      .filter((entry) => entry.type === "text")
      .map((entry) => entry.text?.trim())
      .join("\n");
    return `[${source}${confidence}]\n${text}`;
  })
  .join("\n\n");

const workersAI = createWorkersAI({ binding: this.env.AI });

const result = streamText({
  model: workersAI("@cf/nvidia/nemotron-3-120b-a12b"),
  system: [
    "You are Docflare, a retrieval assistant for indexed PDF documents.",
    "Answer only with information grounded in the retrieved context.",
    "If context is insufficient, say so directly.",
    "Include the source file names in your answer when possible.",
    "",
    "Retrieved context:",
    contextText,
  ].join("\n"),
  messages: modelMessages,
});
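
To see what the model actually receives, here is a standalone sketch of the context-building step with sample data. The `RetrievedChunk` type is inferred from the fields the snippet above reads. The function name `buildContext`, the file name, and the sample passages are invented for illustration.

```typescript
// Sketch: the context string produced for two sample chunks.
// RetrievedChunk is inferred from the fields read above; the sample data
// and the function name are hypothetical.
type RetrievedChunk = {
  filename?: string;
  score?: number;
  content: { type: string; text?: string }[];
};

function buildContext(chunks: RetrievedChunk[]): string {
  return chunks
    .map((chunk, index) => {
      const source = chunk.filename ?? `Document ${index + 1}`;
      const confidence = chunk.score ? ` (score ${chunk.score.toFixed(2)})` : "";
      const text = chunk.content
        .filter((entry) => entry.type === "text") // ignore non-text entries
        .map((entry) => entry.text?.trim() ?? "")
        .join("\n");
      return `[${source}${confidence}]\n${text}`;
    })
    .join("\n\n");
}

const sampleContext = buildContext([
  {
    filename: "rental-contract.pdf",
    score: 0.8234,
    content: [{ type: "text", text: "  The notice period is three months. " }],
  },
  {
    content: [{ type: "text", text: "Payment is due within 14 days." }, { type: "image" }],
  },
]);
console.log(sampleContext);
// [rental-contract.pdf (score 0.82)]
// The notice period is three months.
//
// [Document 2]
// Payment is due within 14 days.
```

Labelling each passage with its file name is what lets the system prompt ask for source file names in the answer.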

Privacy by Architecture

This is the part I care about the most, and it’s not a feature bolted on after the fact — it’s a consequence of how the system is built.

Step	Where It Happens	Data Leaves Cloudflare?
PDF upload & storage	R2	No
Text extraction (Strategy 1)	Workers AI (`toMarkdown()`)	No
OCR extraction (Strategy 2)	Sandbox container	No
Chunking & indexing	AI Search	No
Retrieval	AI Search	No
LLM generation	Workers AI (Nemotron 3 120B)	No
Chat state	Durable Objects	No
WebSocket transport	Workers	No

Every single step runs on Cloudflare infrastructure. The original PDFs sit in R2. The extracted text sits in R2. The embeddings and index live in AI Search. The LLM runs on Workers AI. The chat sessions live in Durable Objects.

If you’re working with sensitive documents (legal contracts, financial records, medical information), this matters. You’re not shipping your data to OpenAI, Anthropic, or any other provider beyond Cloudflare itself. The documents stay in your Cloudflare account. Privacy is a structural guarantee, not a policy promise.

The Tech Stack

Layer	Technology
Frontend	React 19, TanStack Start (SSR), TanStack Router
Runtime	Cloudflare Workers
Chat Agent	AIChatAgent (Cloudflare Durable Object)
Real-time	WebSocket via useAgent + useAgentChat hooks
LLM	@cf/nvidia/nemotron-3-120b-a12b via Workers AI
PDF Extraction	env.AI.toMarkdown() + RapidOCR (ONNX Runtime)
Object Storage	Cloudflare R2
RAG Pipeline	Cloudflare AI Search
OCR Container	Cloudflare Sandbox (Python 3.11 + poppler + PaddleOCR ONNX)
UI Components	@cloudflare/kumo + Tailwind CSS v4

A Note on the UI

I wanted DocFlare’s interface to feel different from the typical “AI chat” look. The design draws from archival documents and dossiers — parchment-colored backgrounds (#F4F1EA), vermillion red accents (#E3342F), zero border radius everywhere, monospace system labels like [AWAITING_COMMAND] and [GENERATING_RESPONSE], and a subtle noise texture overlay.

It’s a small detail, but it reinforces what the tool is: a system for interrogating documents. Not another chatbot with rounded corners and a gradient.

What’s Next?

DocFlare is currently single-tenant — one user, one document collection. Here are some things I want to build next:

Multi-tenancy — per-user document namespaces and chat histories
Document management — delete, re-index, and organize uploaded documents
Richer citations — link directly to source pages within PDFs
More file formats — extend beyond PDF to DOCX, plain text, and HTML

Wrapping Up

Building DocFlare was a fun exercise in seeing how far Cloudflare’s edge platform can go. The key pieces that came together:

Two-strategy extraction solves the “PDFs are hard” problem reliably — toMarkdown() for text-layer PDFs, RapidOCR in Sandbox containers for scanned documents.
AI Search eliminates the entire custom RAG pipeline — no chunking code, no embedding generation, no vector database to manage.
Edge-native architecture means documents never leave Cloudflare’s network — privacy is a structural guarantee, not a policy promise.

The entire project is open source. If you’re building on Cloudflare and working with documents, take a look.

If you have questions or want to share how you’re building with these tools, feel free to reach out on LinkedIn or X (Twitter). I’d love to hear about your use case.
