Skip to content

A powerful database schema extraction and formatting library

License

Notifications You must be signed in to change notification settings

ob-labs/MSchema-JS

Repository files navigation

MSchemaJS

A powerful database schema extraction and formatting library

License TypeScript Node

📖 Background

MSchemaJS is the JavaScript/TypeScript implementation of the M-Schema project by XGenerationLab.

M-Schema is a semi-structured database schema representation format that transforms complex database information into a concise, LLM-friendly format, significantly enhancing SQL generation accuracy in Text-to-SQL applications.

M-Schema Format Example

[DB_ID] my_database
[Schema]
# Table: users 
 [(id: INT, Primary Key),
 (name: VARCHAR(100), Examples: [John Doe, Jane Smith, Bob Wilson]),
 (email: VARCHAR(255), Examples: [john@example.com, jane@example.com, bob@example.com]),
 (created_at: DATETIME, Examples: [2024-01-15 10:30:00, 2024-01-16 14:20:00])]
# Table: orders 
 [(order_id: INT, Primary Key),
 (user_id: INT),
 (amount: DECIMAL(10,2), Examples: [99.99, 149.50, 299.00]),
 (status: VARCHAR(50), Examples: [pending, completed, cancelled])]

✨ Key Features

  • 🔌 Multi-Database Support: Works with MySQL, PostgreSQL, SQLite, and OceanBase Oracle mode
  • 📊 Complete Schema Information: Automatically extracts table structures, column types, primary keys, comments, and other metadata
  • 💡 Intelligent Sample Data: Automatically retrieves sample values for columns to help LLMs better understand data content

📦 Installation

npm install mschemajs
# or
pnpm install mschemajs
# or
yarn add mschemajs

Module Support

This package supports both ESM (ECMAScript Modules) and CommonJS:

// ESM
import { MSchema } from 'mschemajs';

// CommonJS
const { MSchema } = require('mschemajs');

🚀 Quick Start

MySQL Example

import { MSchema } from 'mschemajs';

const mschema = new MSchema({
  host: 'localhost',
  port: 3306,
  type: 'mysql',
  user: 'root',
  password: 'your_password',
});

// Connect to database
await mschema.connect();

// Get all databases
const databases = await mschema.getDatabaseNames();
console.log('Available databases:', databases);

// Get complete information for a specific database
const db = await mschema.getDatabase('my_database');
console.log(db.toString()); // Output formatted M-Schema

// Get information for a specific table
const table = db.getTable('users');
console.log(table.toString());

// Disconnect
await mschema.disconnect();

PostgreSQL Example

const mschema = new MSchema({
  host: 'localhost',
  port: 5432,
  type: 'postgresql',
  user: 'postgres',
  password: 'your_password',
  database: 'postgres',
});

await mschema.connect();
const db = await mschema.getDatabase('my_database');
console.log(db.toString());
await mschema.disconnect();

SQLite Example

const mschema = new MSchema({
  host: '',
  port: 0,
  type: 'sqlite',
  database: './example.db', // SQLite file path
});

await mschema.connect();
const databases = await mschema.getDatabaseNames();
const db = await mschema.getDatabase(databases[0]);
console.log(db.toString());
await mschema.disconnect();

OceanBase Oracle Mode Example

const mschema = new MSchema({
  host: 'localhost',
  port: 2883,
  type: 'oceanbase-oracle',
  user: 'your_username',
  password: 'your_password',
  sampleData: {
    limit: 3,
    excludeBinaryData: true,
  },
});

await mschema.connect();
const schemas = await mschema.getDatabaseNames();
const db = await mschema.getDatabase('YOUR_SCHEMA');
console.log(db.toString());
await mschema.disconnect();

📚 Usage

Basic Configuration

interface DatabaseConfig {
  host: string;           // Database host address
  port: number;           // Database port
  type: DatabaseType;     // Database type
  user?: string;          // Username
  password?: string;      // Password
  database?: string;      // Database name (optional)
  sampleData?: {
    limit?: number;             // Number of sample data rows, default 3
    excludeBinaryData?: boolean; // Exclude binary data, default true
    excludeTextData?: boolean;   // Exclude text data, default false
  };
}

type DatabaseType = 'mysql' | 'postgresql' | 'sqlite' | 'oceanbase-oracle';

Core API

MSchema Class

class MSchema {
  constructor(config: DatabaseConfig);
  
  // Connect to database
  async connect(): Promise<void>;
  
  // Disconnect from database
  async disconnect(): Promise<void>;
  
  // Get all database names
  async getDatabaseNames(): Promise<string[]>;
  
  // Get complete information for a specific database
  async getDatabase(databaseName: string): Promise<Database>;
  
  // Get information for a specific table
  async getTable(databaseName: string, tableName: string): Promise<Table>;
  
  // Clear cache
  clearCache(): void;
  
  // Get configuration
  getConfig(): DatabaseConfig;
}

Database Class

class Database {
  // Get database ID
  getId(): string;
  
  // Get database name
  getName(): string;
  
  // Get schema (OceanBase Oracle mode)
  getSchema(): string | undefined;
  
  // Get all table names
  getTableNames(): string[];
  
  // Get specific table
  getTable(tableName: string): Table | undefined;
  
  // Get all tables
  getAllTables(): Table[];
  
  // Format as M-Schema string
  toString(): string;
  
  // Convert to JSON
  toJSON(): DatabaseInfo;
}

Table Class

class Table {
  // Get table name
  getName(): string;
  
  // Get all columns
  getColumns(): ColumnInfo[];
  
  // Get specific column
  getColumn(columnName: string): ColumnInfo | undefined;
  
  // Format as string
  toString(): string;
  
  // Convert to JSON
  toJSON(): TableInfo;
}

Using with Text-to-SQL

import { MSchema } from 'mschemajs';

async function textToSQL(question: string, evidence: string) {
  const mschema = new MSchema({
    host: 'localhost',
    port: 3306,
    type: 'mysql',
    user: 'root',
    password: 'password',
  });

  await mschema.connect();
  const db = await mschema.getDatabase('my_database');
  const schemaStr = db.toString();

  // Build LLM Prompt
  const prompt = `You are a MySQL data analyst. The database schema is as follows:

【Schema】
${schemaStr}

【Question】
${question}

【Evidence】
${evidence}

Based on the user's question and evidence, generate an executable SQL statement.`;

  // Call LLM API
  const response = await callLLM(prompt);
  
  await mschema.disconnect();
  return response;
}

// Example usage
const sql = await textToSQL(
  "How many users registered in January 2024?",
  "User information is stored in the users table, with the registration date field being created_at"
);

Advanced Usage: Custom Connectors

import { BaseConnector, ConnectorFactory } from 'mschemajs';

// If you need a custom connector
class MyCustomConnector extends BaseConnector {
  // Implement custom logic
}

// Register custom connector
// Then use it in configuration

🔧 Development

Building the Project

# Install dependencies
pnpm install

# Compile TypeScript
pnpm build

# Watch mode
pnpm watch

# Clean build artifacts
pnpm clean

Running Examples

# Set environment variables
export MYSQL_HOST=localhost
export MYSQL_PORT=3306
export MYSQL_USER=root
export MYSQL_PASSWORD=your_password

# Run MySQL example
npx ts-node examples/mysql-example.ts

# Run PostgreSQL example
npx ts-node examples/postgresql-example.ts

# Run SQLite example
npx ts-node examples/sqlite-example.ts

# Run OceanBase Oracle example
export OCEANBASE_HOST=localhost
export OCEANBASE_PORT=2883
export OCEANBASE_USER=your_username
export OCEANBASE_PASSWORD=your_password
export OCEANBASE_SCHEMA=YOUR_SCHEMA
npx ts-node examples/oceanbase-oracle-example.ts

🤝 Contributing

Issues and Pull Requests are welcome!

If you're interested in the project or have any questions, feel free to contact us.

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

🔗 Related Links

About

A powerful database schema extraction and formatting library

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published