Struktur

Struktur is a TypeScript extraction engine that turns pre-parsed artifacts into validated JSON using the Vercel AI SDK. It chunks content by token budgets, runs strategy-driven workflows, validates with Ajv, and merges or dedupes when needed.

Highlights

Strategy-driven extraction: simple, parallel, sequential, auto-merge, or double-pass.
Token-aware batching with optional image limits.
Schema-first validation with Ajv, with retries on validation failure.
Merge and dedupe with schema-aware rules and CRC32 hashing.
Typed results with Ajv JSONSchemaType<T>.

Installation

bun install

Quick start

import { extract, simple } from "@mateffy/struktur";
import type { JSONSchemaType } from "ajv";
import { google } from "@ai-sdk/google";

type Output = { title: string };

const schema: JSONSchemaType<Output> = {
  type: "object",
  properties: { title: { type: "string" } },
  required: ["title"],
  additionalProperties: false,
};

const result = await extract({
  artifacts: [/* Artifact[] */],
  schema,
  strategy: simple({ model: google("gemini-2.5-flash") }),
});

console.log(result.data.title);

How it works

extract()
  -> strategy.run()
     -> batchArtifacts() / splitArtifact()
        -> prompt builder(s)
           -> runWithRetries()
              -> Ajv validation / retry
              -> merge / dedupe (strategy-specific)
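
The validate-and-retry step in the pipeline above can be sketched roughly as follows. This is an illustration only, not Struktur's actual runWithRetries implementation: retryUntilValid, Validator, and the error-feedback mechanism are hypothetical names and assumptions, and the validator is a plain function standing in for a compiled Ajv schema.

```typescript
// Hypothetical sketch of a validate-and-retry loop; not Struktur's real internals.
type ValidationResult<T> =
  | { status: "valid"; data: T }
  | { status: "invalid"; errors: string[] };

type Validator<T> = (candidate: unknown) => ValidationResult<T>;

// `generate` stands in for an LLM call. It receives the previous attempt's
// validation errors (empty on the first attempt) so they can be fed back
// into the next prompt.
async function retryUntilValid<T>(
  generate: (previousErrors: string[]) => Promise<unknown>,
  validate: Validator<T>,
  maxAttempts = 3,
): Promise<T> {
  let errors: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const candidate = await generate(errors);
    const result = validate(candidate);
    if (result.status === "valid") {
      return result.data;
    } else {
      errors = result.errors; // carry the errors into the next attempt
    }
  }
  throw new Error(
    `Validation failed after ${maxAttempts} attempts: ${errors.join("; ")}`,
  );
}

// Stand-in validator for the quick-start schema ({ title: string }).
const validateTitle: Validator<{ title: string }> = (candidate) => {
  if (
    typeof candidate === "object" &&
    candidate !== null &&
    typeof (candidate as { title?: unknown }).title === "string"
  ) {
    return { status: "valid", data: { title: (candidate as { title: string }).title } };
  }
  return { status: "invalid", errors: ["title must be a string"] };
};
```

Feeding the previous errors back into the next call is one plausible design; a simpler variant would retry with the unchanged prompt.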

Strategies

simple: single-shot extraction. Best for small inputs.
parallel: concurrent batches, then LLM merge.
sequential: batches processed in order with context carryover.
parallelAutoMerge: schema-aware merge + hash dedupe + LLM dedupe.
sequentialAutoMerge: sequential merge with dedupe pass.
doublePass: parallel merge, then sequential refinement.
doublePassAutoMerge: auto-merge first, then sequential refinement.
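
The hash dedupe step used by the auto-merge strategies could look something like the sketch below. It assumes items are plain JSON values and that duplicates are exact matches after key ordering is normalized; crc32, canonicalize, and dedupeByHash are hypothetical helper names, and Struktur's real rules may differ.

```typescript
// Hypothetical sketch of hash-based dedupe; Struktur's real rules may differ.

// Lookup table for the standard CRC-32 (reflected polynomial 0xEDB88320).
const CRC_TABLE: Uint32Array = (() => {
  const table = new Uint32Array(256);
  for (let n = 0; n < 256; n++) {
    let c = n;
    for (let k = 0; k < 8; k++) {
      c = c & 1 ? 0xedb88320 ^ (c >>> 1) : c >>> 1;
    }
    table[n] = c >>> 0;
  }
  return table;
})();

function crc32(text: string): number {
  const bytes = new TextEncoder().encode(text);
  let crc = 0xffffffff;
  for (let i = 0; i < bytes.length; i++) {
    crc = CRC_TABLE[(crc ^ bytes[i]) & 0xff] ^ (crc >>> 8);
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// Serialize with sorted object keys so property order does not affect the hash.
// Assumes JSON-compatible input (no undefined, functions, or cycles).
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonicalize).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const record = value as Record<string, unknown>;
    const keys = Object.keys(record).sort();
    const entries = keys.map((k) => `${JSON.stringify(k)}:${canonicalize(record[k])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// Keep the first occurrence of each item. CRC32 is only 32 bits wide, so the
// canonical string is compared as well to guard against hash collisions.
function dedupeByHash<T>(items: T[]): T[] {
  const seen = new Map<number, string[]>();
  const unique: T[] = [];
  for (const item of items) {
    const canonical = canonicalize(item);
    const hash = crc32(canonical);
    const bucket = seen.get(hash) ?? [];
    if (bucket.indexOf(canonical) !== -1) continue; // exact duplicate
    bucket.push(canonical);
    seen.set(hash, bucket);
    unique.push(item);
  }
  return unique;
}
```

Near-duplicates (for example, the same entity with slightly different wording) would pass through this step unchanged, which is presumably where an LLM dedupe pass comes in.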

Common options (vary by strategy):

model: base extraction model.
mergeModel: optional merge model (parallel/double pass).
dedupeModel: optional dedupe model (auto-merge strategies).
chunkSize: token budget per batch.
maxImages: optional image limit per batch.
concurrency: maximum parallel batches.
outputInstructions: extra system output instructions.
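
To illustrate how chunkSize and maxImages might interact, here is a minimal greedy batching sketch. The Slice type, the four-characters-per-token estimate, and batchSlices are assumptions made for illustration; Struktur's own batchArtifacts() may count tokens and split content differently.

```typescript
// Hypothetical sketch of token-aware batching; names and the token estimate
// are assumptions, not Struktur's actual implementation.
interface Slice {
  text: string;
  imageCount: number;
}

// Rough heuristic: about four characters per token. A real tokenizer would differ.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function batchSlices(
  slices: Slice[],
  chunkSize: number,
  maxImages: number = Infinity,
): Slice[][] {
  const batches: Slice[][] = [];
  let current: Slice[] = [];
  let tokens = 0;
  let images = 0;

  for (const slice of slices) {
    const sliceTokens = estimateTokens(slice.text);
    const overBudget =
      tokens + sliceTokens > chunkSize || images + slice.imageCount > maxImages;
    // Close the current batch when adding this slice would exceed either limit.
    if (current.length > 0 && overBudget) {
      batches.push(current);
      current = [];
      tokens = 0;
      images = 0;
    }
    current.push(slice);
    tokens += sliceTokens;
    images += slice.imageCount;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

In this sketch a single slice larger than chunkSize still gets a batch of its own; splitting oversized content further is left out.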

import { extract, parallel } from "@mateffy/struktur";
import { google } from "@ai-sdk/google";

const result = await extract({
  artifacts,
  schema,
  strategy: parallel({
    model: google("gemini-2.5-flash"),
    mergeModel: google("gemini-2.5-flash"),
    chunkSize: 10_000,
    concurrency: 4,
  }),
});
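
The concurrency option caps how many batches are in flight at once. A minimal way to get that behavior is a small worker pool, sketched below. mapWithConcurrency is a hypothetical helper written for this example, not a Struktur export, and the library's scheduler may work differently.

```typescript
// Hypothetical sketch of a concurrency-limited map; not Struktur's actual scheduler.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unprocessed index until none remain.
  // Claiming is safe because `next++` runs synchronously between awaits.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  const runners: Promise<void>[] = [];
  for (let i = 0; i < runnerCount; i++) {
    runners.push(runner());
  }
  await Promise.all(runners);
  return results; // results keep the input order, regardless of finish order
}
```

Results are written by index, so the output order matches the input order even when later batches finish first.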

Artifacts

Artifacts are JSON DTOs with text and media slices. Struktur does not parse PDFs or HTML; it expects normalized inputs.

import { urlToArtifact, fileToArtifact, registerArtifactProvider } from "@mateffy/struktur";

const artifact = await urlToArtifact("https://example.com/artifact.json");

registerArtifactProvider("application/pdf", async (buffer) => ({
  id: "pdf-1",
  type: "pdf",
  raw: async () => buffer,
  contents: [{ page: 1, text: "..." }],
}));

// `buffer` holds the raw file bytes, loaded by the caller beforehand.
const fromFile = await fileToArtifact(buffer, { mimeType: "application/pdf" });
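
For orientation, the artifact shape implied by the provider example above can be written down as a type. This is inferred only from the fields shown in this README (id, type, raw, contents with page and text); ArtifactSketch and textPagesToArtifact are hypothetical names, the media field is a placeholder because its shape is not shown here, and the library's real Artifact type may have more or different fields.

```typescript
// Hypothetical type sketch inferred from the provider example; the library's
// actual Artifact type may differ.
interface ArtifactContentSketch {
  page?: number;
  text?: string;
  media?: unknown[]; // placeholder: the media slice shape is not shown in this README
}

interface ArtifactSketch {
  id: string;
  type: string;
  raw: () => Promise<Uint8Array>;
  contents: ArtifactContentSketch[];
}

// Build a text-only artifact from page strings that were extracted elsewhere.
function textPagesToArtifact(id: string, pages: string[]): ArtifactSketch {
  const bytes = new TextEncoder().encode(pages.join("\n"));
  return {
    id,
    type: "text",
    raw: async () => bytes,
    contents: pages.map((text, index) => ({ page: index + 1, text })),
  };
}
```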

Events

Use event callbacks to observe progress and validation retries.

const result = await extract({
  artifacts,
  schema,
  strategy,
  events: {
    onStep: ({ step, total, label }) => {
      console.log("step", step, total, label);
    },
    onMessage: ({ role, content }) => {
      console.log(role, content);
    },
  },
});
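
As a small illustration, the onStep payload can be turned into a progress line. Only the field names step, total, and label come from the example above; StepEvent, formatStep, and progressEvents are hypothetical, and the sketch assumes label is always a string.

```typescript
// Hypothetical helper; only the onStep payload fields (step, total, label)
// come from the example above.
interface StepEvent {
  step: number;
  total: number;
  label: string;
}

function formatStep({ step, total, label }: StepEvent): string {
  // Guard against a zero total to avoid dividing by zero.
  const percent = total > 0 ? Math.round((step / total) * 100) : 0;
  return `[${step}/${total}] ${percent}% ${label}`;
}

// An object of this shape could be passed as the `events` option.
const progressEvents = {
  onStep: (event: StepEvent): void => {
    console.log(formatStep(event));
  },
};
```

For an event with step 1, total 4, and label "batch", formatStep returns "[1/4] 25% batch".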

Testing

bun test

Documentation

Landing page: docs/index.html
Full guide: docs/guide.html
