CSV Data Processor – Group 5

Atlantic Technological University – Software Development Project 2025

Course: SWDE_IT803 - Software Development (2025/26)
Instructor: Lusungu Mwasina
Degree: Computing in Contemporary Software Development - Bachelor of Science (Honours)

A modular, production-grade Java library for reading, parsing, validating, transforming, and writing CSV files with clean object-oriented design and comprehensive test coverage.

Quick Start

Clone and Build

git clone https://github.com/MichaelMcKibbin/ATU-SoftDev-Grp5Project.git
cd ATU-SoftDev-Grp5Project
mvn clean install

Expected output: BUILD SUCCESS

Run Tests

mvn test

Expected output: All tests pass with BUILD SUCCESS

Run Demo Application

mvn -q exec:java -Dexec.mainClass="com.group5.csv.demo.Main"

Expected output: Interactive CLI menu for CSV operations

Project Structure

ATU-SoftDev-Grp5Project/
├── src/
│   ├── main/java/com/group5/csv/
│   │   ├── core/              # Row, Headers, FieldType
│   │   ├── io/                # CsvParser, CsvReader, CsvWriter
│   │   ├── schema/            # Schema, FieldSpec, Validators
│   │   ├── validation/        # Error handling
│   │   ├── exceptions/        # Custom exceptions
│   │   ├── ops/               # Filters, Transforms, Joins, Aggregations
│   │   └── demo/              # Demo CLI application
│   ├── main/resources/
│   │   ├── demo/              # Sample CSV files for demo
│   │   └── sample-csvs/       # Test data
│   └── test/java/com/group5/csv/
│       └── [Corresponding test packages]
├── docs/
│   ├── diagrams/              # PlantUML diagrams
│   ├── classes/               # Class documentation
│   ├── getting-started.md     # Setup guide
│   ├── pull-request-workflow.md
│   ├── JaCoCo-and-JUnit-setup-readme.md
│   └── how-csv-functions.md
├── pom.xml                    # Maven configuration
├── README.md                  # This file
└── LICENSE                    # Academic license

Features

CSV Reading

Handles quotes, escapes, multiline fields, BOM, blank lines
Configurable via CsvConfig and CsvFormat
Support for multiple CSV dialects (RFC 4180, Excel, TSV, custom)

CSV Writing

Auto-quotes when required
Quote-doubling and escaping
Guarantees round-trip fidelity (Reader → Writer → Reader)
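
As an illustration of the auto-quoting and quote-doubling rules listed above, here is a minimal sketch. It is not the project's CsvWriter or CsvPrinter; the class `CsvQuoteSketch` and its methods `escape` and `joinRow` are hypothetical names chosen for this example, and the rule set assumes RFC 4180 style quoting with `"` as the quote character.

```java
final class CsvQuoteSketch {

    /** Returns the field unchanged unless it needs quoting; otherwise wraps it and doubles embedded quotes. */
    static String escape(String field, char delimiter) {
        boolean needsQuotes = field.indexOf(delimiter) >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        // Quote-doubling: every embedded " becomes ""
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Escapes each field and joins them into one record line (no trailing line break). */
    static String joinRow(char delimiter, String... fields) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                line.append(delimiter);
            }
            line.append(escape(fields[i], delimiter));
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // prints: Alice,"Dublin, IE","said ""hi"""
        System.out.println(joinRow(',', "Alice", "Dublin, IE", "said \"hi\""));
    }
}
```

Quoting only when required keeps simple files readable, while doubling quotes lets a reader recover the original value exactly, which is what round-trip fidelity depends on.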

Schema Validation

Field types: String, Int, Decimal, Boolean, Date, DateTime, Time
Custom validators: min/max, regex, required, etc.
Structured error reporting with row/column/value context
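
The validator ideas above (required, min/max, regex, errors carrying row/column/value) can be sketched as follows. This is an illustration only, not the library's Schema or FieldSpec API: `ValidatorSketch`, `Validator`, `ValidationError`, and the factory methods are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

final class ValidatorSketch {

    /** One failed check, together with its row/column/value context. */
    record ValidationError(int row, String column, String value, String message) {}

    /** Returns an error message, or empty when the value is acceptable. */
    @FunctionalInterface
    interface Validator {
        Optional<String> check(String value);
    }

    static Validator required() {
        return v -> v.isBlank() ? Optional.of("value is required") : Optional.empty();
    }

    static Validator intRange(int min, int max) {
        return v -> {
            try {
                int n = Integer.parseInt(v.trim());
                return (n < min || n > max)
                        ? Optional.of("must be between " + min + " and " + max)
                        : Optional.empty();
            } catch (NumberFormatException e) {
                return Optional.of("not an integer");
            }
        };
    }

    static Validator regex(String pattern) {
        Pattern compiled = Pattern.compile(pattern);
        return v -> compiled.matcher(v).matches()
                ? Optional.empty()
                : Optional.of("does not match " + pattern);
    }

    /** Runs every validator and collects one error per failed check. */
    static List<ValidationError> validate(int row, String column, String value, Validator... validators) {
        String safe = value == null ? "" : value; // treat a missing cell as empty text
        List<ValidationError> errors = new ArrayList<>();
        for (Validator validator : validators) {
            validator.check(safe)
                    .ifPresent(msg -> errors.add(new ValidationError(row, column, value, msg)));
        }
        return errors;
    }
}
```

For example, `ValidatorSketch.validate(3, "age", "200", ValidatorSketch.required(), ValidatorSketch.intRange(0, 150))` yields a single error for row 3, column `age`.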

Streaming

Memory-safe row streaming with Spliterator<Row>
Ideal for multi-million-row files
Lazy evaluation for performance
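
To show what lazy, Spliterator-based row streaming can look like, here is a small self-contained sketch. It is not the project's implementation: `RowStreamSketch` and `rows` are hypothetical names, rows are modelled as `List<String>` rather than the library's Row, and fields are split naively on commas with no quote handling.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class RowStreamSketch {

    /** Exposes the reader as a lazy stream: a line is read only when the stream asks for the next row. */
    static Stream<List<String>> rows(BufferedReader reader) {
        Spliterator<List<String>> spliterator =
                new Spliterators.AbstractSpliterator<List<String>>(
                        Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL) {
                    @Override
                    public boolean tryAdvance(Consumer<? super List<String>> action) {
                        try {
                            String line = reader.readLine();
                            if (line == null) {
                                return false; // end of input
                            }
                            // Naive split for illustration; a real parser must honour quotes
                            action.accept(List.of(line.split(",", -1)));
                            return true;
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    }
                };
        return StreamSupport.stream(spliterator, false);
    }

    public static void main(String[] args) {
        BufferedReader reader = new BufferedReader(new StringReader("name,age\nAlice,30\nBob,25\n"));
        // Skip the header row, then print each name; only one row is held at a time
        rows(reader).skip(1).forEach(row -> System.out.println(row.get(0)));
    }
}
```

Because each row is produced on demand, memory use depends on the size of one row rather than the size of the file.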

Error Handling

Rich exceptions with row, column, raw value, and message
Fail-fast or accumulation modes
Detailed validation error reporting
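
The difference between fail-fast and accumulation modes can be sketched as below. The names `ErrorModeSketch`, `Mode`, `CellException`, and `checkInts` are hypothetical and exist only for this example; the library's own exception classes may be shaped differently.

```java
import java.util.ArrayList;
import java.util.List;

final class ErrorModeSketch {

    enum Mode { FAIL_FAST, ACCUMULATE }

    /** Carries the row, column, and raw value alongside the message. */
    static final class CellException extends RuntimeException {
        final int row;
        final String column;
        final String rawValue;

        CellException(int row, String column, String rawValue, String message) {
            super("row " + row + ", column '" + column + "', value '" + rawValue + "': " + message);
            this.row = row;
            this.column = column;
            this.rawValue = rawValue;
        }
    }

    /**
     * Checks that every raw value is an integer.
     * FAIL_FAST throws on the first bad cell; ACCUMULATE returns all problems found.
     */
    static List<CellException> checkInts(String column, List<String> rawValues, Mode mode) {
        List<CellException> errors = new ArrayList<>();
        for (int i = 0; i < rawValues.size(); i++) {
            String raw = rawValues.get(i);
            try {
                Integer.parseInt(raw.trim());
            } catch (NumberFormatException e) {
                CellException error = new CellException(i + 1, column, raw, "not an integer");
                if (mode == Mode.FAIL_FAST) {
                    throw error;
                }
                errors.add(error);
            }
        }
        return errors;
    }
}
```

Fail-fast suits pipelines where one bad cell invalidates the whole file; accumulation suits reporting, where a user wants to see every problem in one pass.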

Prerequisites

Java: Java 21 (LTS version - required for this project)
Build Tool: Maven 3.8.0 or higher
OS: Linux, macOS, or Windows (WSL2 recommended for Windows)
RAM: Minimum 2GB for builds, 4GB+ for large CSV processing

Verify Prerequisites

# Verify Java version (should show "21.x.x")
java -version

# Verify Maven (should show "3.8.0" or higher)
mvn --version

Installation

Clone and Build

# Clone the repository
git clone https://github.com/MichaelMcKibbin/ATU-SoftDev-Grp5Project.git
cd ATU-SoftDev-Grp5Project

# Build the project
mvn clean install

Usage

As a Library

Add to your pom.xml:

<dependency>
    <groupId>com.group5</groupId>
    <artifactId>csv-processor</artifactId>
    <version>1.0.0</version>
</dependency>

Reading CSV Files

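import com.group5.csv.io.CsvReader;
import com.group5.csv.core.Row;
import java.nio.file.Paths;

// try-with-resources closes the reader when iteration finishes
try (CsvReader reader = CsvReader.fromPath(Paths.get("data.csv"))) {
    for (Row row : reader) {
        String name = row.get("name");   // look up a field by header name
        int age = row.getInt("age");     // typed access
        System.out.println(name + ": " + age);
    }
}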

Writing CSV Files

import com.group5.csv.io.CsvWriter;
import java.nio.file.Paths;

// try-with-resources flushes and closes the writer
try (CsvWriter writer = CsvWriter.toPath(Paths.get("output.csv"))) {
    writer.writeHeaders("name", "age");
    writer.writeRow("Alice", 30);
    writer.writeRow("Bob", 25);
}

Schema Validation

import com.group5.csv.core.Row;
import com.group5.csv.io.CsvReader;
import com.group5.csv.schema.Schema;
import com.group5.csv.schema.FieldSpec;
import java.nio.file.Paths;

// Declare the expected type of each column
Schema schema = Schema.builder()
    .field("name", FieldSpec.STRING)
    .field("age", FieldSpec.INT)
    .field("salary", FieldSpec.DECIMAL)
    .build();

try (CsvReader reader = CsvReader.fromPath(Paths.get("data.csv"), schema)) {
    for (Row row : reader) {
        // All fields are validated and typed
        System.out.println(row.get("name") + ": $" + row.getDecimal("salary"));
    }
}

Architecture

The system follows a layered design:

┌─────────────────────────┐
│   CsvConfig / CsvFormat │  ← Dialect + Policy
└───────────┬─────────────┘
            │
┌───────────▼─────────────┐
│       CsvParser (FSM)    │  ← Tokenizes CSV input
└───────────┬─────────────┘
            │
┌───────────▼─────────────┐
│        CsvReader         │  ← Builds Row objects
└───────────┬─────────────┘
            │
┌───────────▼─────────────┐
│ Row, Headers, FieldType │  ← Data model & typed access
└───────────┬─────────────┘
            │
┌───────────▼─────────────┐
│      Schema/Validation   │  ← Optional validation layer
└───────────┬─────────────┘
            │
┌───────────▼─────────────┐
│      CsvWriter/Printer   │  ← Escape/quoting + serializing
└─────────────────────────┘

Core Components

CsvParser: Finite State Machine (FSM) that tokenizes CSV input character-by-character
CsvReader: Builds Row objects from parsed tokens
CsvWriter/CsvPrinter: Handles escaping, quoting, and serialization
Schema: Defines field types and validation rules
Row/Headers: Data model for accessing typed fields

How It Works

Reading Pipeline

Input Detection: Detects BOM and charset automatically
FSM Parsing: Character-by-character parsing using Finite State Machine
- Handles quoted fields, escaped quotes, embedded delimiters
- Correctly processes multiline fields
Row Construction: Parsed cells are assembled into Row objects with Headers
Optional Validation: Schema validates and converts field types
Streaming: Rows consumed lazily via Iterator or Stream
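
The input-detection step at the start of this pipeline can be illustrated with a small sketch. It makes a narrower assumption than the pipeline describes: it recognises only the UTF-8 byte order mark (bytes EF BB BF) and always decodes as UTF-8. The class `BomSketch` and its methods are hypothetical names for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class BomSketch {

    /** Wraps the stream in a UTF-8 reader, consuming a leading UTF-8 BOM if one is present. */
    static Reader openSkippingUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = pushback.readNBytes(head, 0, 3);
        boolean hasBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!hasBom && n > 0) {
            pushback.unread(head, 0, n); // not a BOM: put the bytes back for the decoder
        }
        return new InputStreamReader(pushback, StandardCharsets.UTF_8);
    }

    /** Convenience helper for the example: decodes a byte array to text, without any BOM. */
    static String decode(byte[] bytes) {
        try (Reader reader = openSkippingUtf8Bom(new ByteArrayInputStream(bytes))) {
            StringWriter out = new StringWriter();
            reader.transferTo(out);
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Stripping the BOM before parsing matters because otherwise the invisible character becomes part of the first header name and lookups such as "name" fail.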

Writing Pipeline

Row Input: Rows passed to CsvWriter
Formatting: CsvPrinter applies quoting/escaping rules
Serialization: Consistent delimiter and quote rules applied
Output: Valid CSV written to file or stream

Why Finite State Machine?

CSV parsing is stateful because of quoting rules:

A comma inside quotes is NOT a field separator
Escaped quotes ("") must be treated as a single character
Newlines inside quotes must NOT end the record

The FSM tracks whether the parser is inside or outside a quoted field, ensuring correct parsing of complex CSV data.
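
A minimal sketch of such a state machine follows. It is not the project's CsvParser: `CsvFsmSketch` and its `State` names are hypothetical, the delimiter and quote are fixed to `,` and `"`, carriage returns outside quotes are simply dropped, and text after a closing quote is accepted leniently.

```java
import java.util.ArrayList;
import java.util.List;

final class CsvFsmSketch {

    private enum State { FIELD_START, UNQUOTED, QUOTED, QUOTE_IN_QUOTED }

    /** Parses CSV text into records, one character at a time. */
    static List<List<String>> parse(String text) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        State state = State.FIELD_START;

        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (state == State.QUOTED) {
                // Inside quotes: commas and newlines are ordinary data
                if (c == '"') {
                    state = State.QUOTE_IN_QUOTED;
                } else {
                    field.append(c);
                }
            } else if (state == State.QUOTE_IN_QUOTED && c == '"') {
                // "" inside a quoted field is one literal quote
                field.append('"');
                state = State.QUOTED;
            } else if (state == State.FIELD_START && c == '"') {
                state = State.QUOTED; // opening quote
            } else if (c == ',') {
                record.add(field.toString()); // end of field
                field.setLength(0);
                state = State.FIELD_START;
            } else if (c == '\n') {
                record.add(field.toString()); // end of field and record
                field.setLength(0);
                records.add(record);
                record = new ArrayList<>();
                state = State.FIELD_START;
            } else if (c != '\r') {
                field.append(c);
                state = State.UNQUOTED;
            }
        }
        // Flush the last record when the input does not end with a newline
        if (state != State.FIELD_START || !record.isEmpty()) {
            record.add(field.toString());
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        String csv = "name,note\n\"Smith, J\",\"said \"\"hi\"\"\"\n\"two\nlines\",x";
        parse(csv).forEach(System.out::println);
    }
}
```

The three rules above map directly onto the states: `QUOTED` swallows commas and newlines, and `QUOTE_IN_QUOTED` decides whether a quote was an escape or the end of the field.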

Design Principles

Immutability & Thread Safety

Row objects are immutable once created, making them safe to share across threads
RowBuilder provides controlled construction while keeping Row simple and robust
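
The builder-plus-immutable-value pattern described here can be sketched as follows. The nested `Row` and `RowBuilder` classes below are simplified stand-ins written for this example, not the project's classes; their fields and methods are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

final class ImmutableRowSketch {

    /** Immutable once built: both lists are defensive, unmodifiable copies. */
    static final class Row {
        private final List<String> headers;
        private final List<String> values;

        private Row(List<String> headers, List<String> values) {
            this.headers = List.copyOf(headers);
            this.values = List.copyOf(values);
        }

        String get(String header) {
            int index = headers.indexOf(header);
            if (index < 0) {
                throw new IllegalArgumentException("Unknown column: " + header);
            }
            return values.get(index);
        }

        List<String> values() {
            return values; // already unmodifiable, safe to expose
        }
    }

    /** Mutable helper that collects values, then hands out a finished Row. */
    static final class RowBuilder {
        private final List<String> headers;
        private final List<String> values = new ArrayList<>();

        RowBuilder(List<String> headers) {
            this.headers = List.copyOf(headers);
        }

        RowBuilder add(String value) {
            values.add(value);
            return this;
        }

        Row build() {
            if (values.size() != headers.size()) {
                throw new IllegalStateException(
                        "Expected " + headers.size() + " values but got " + values.size());
            }
            return new Row(headers, values);
        }
    }
}
```

Because `Row` copies its inputs and exposes no setters, later changes to the builder cannot leak into a row that has already been handed to another thread.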

Separation of Concerns

I/O Layer (csv.io): Handles streams, encoding, CSV dialects
Core Layer (csv.core): Models rows, fields, headers, types
Schema Layer (csv.schema): Encapsulates validation rules and constraints

Type Safety

FieldType enum: Centralizes parsing/formatting logic for each type
Field model: Rich cell representation with validation state and error reporting
Specs (DecimalSpec, DateSpec, etc.): Configure type-specific behavior

This design enables:

Easy testing of isolated components
Incremental feature development
Clear error reporting with row/column/value context
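
As a sketch of how an enum can centralize per-type parsing, consider the example below. `FieldTypeSketch` is a hypothetical name, it covers only a subset of the field types listed earlier, and the parsing rules (ISO-8601 dates, strict true/false) are assumptions made for illustration rather than the behavior of the project's FieldType.

```java
import java.math.BigDecimal;
import java.time.LocalDate;
import java.util.function.Function;

enum FieldTypeSketch {
    STRING(s -> s),
    INT(s -> Integer.valueOf(s)),
    DECIMAL(s -> new BigDecimal(s)),
    BOOLEAN(s -> parseStrictBoolean(s)),
    DATE(s -> LocalDate.parse(s)); // assumes ISO-8601, e.g. 2025-12-03

    private final Function<String, Object> parser;

    FieldTypeSketch(Function<String, Object> parser) {
        this.parser = parser;
    }

    /** Converts raw cell text into the Java value for this type; throws on malformed input. */
    Object parse(String raw) {
        return parser.apply(raw.trim());
    }

    // Unlike Boolean.parseBoolean, rejects anything other than true/false
    private static Boolean parseStrictBoolean(String s) {
        if (s.equalsIgnoreCase("true")) {
            return Boolean.TRUE;
        }
        if (s.equalsIgnoreCase("false")) {
            return Boolean.FALSE;
        }
        throw new IllegalArgumentException("Not a boolean: " + s);
    }
}
```

Keeping the conversion next to the type constant means adding a new type touches one place, and each parser can be unit-tested in isolation.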

Development Approach

Test-Driven Development (TDD)

The project was developed using Test-Driven Development methodology:

Red-Green-Refactor Cycle
- Write failing test that defines desired functionality
- Write minimal code to make test pass
- Refactor while keeping tests passing
Test-First Methodology
- Tests written before or alongside implementation
- Forces clear thinking about interfaces and edge cases
- Provides immediate feedback on design decisions
Coverage Enforcement Strategy
- Gradual threshold increases: 30% → 40% → 50% → 60% → 70% → 80%
- Prevents last-minute testing rushes
- Maintains consistent testing discipline throughout development

Team Collaboration

Weekly Synchronous Meetings (Tuesdays 18:30 via Microsoft Teams)

Progress review and design clarification
Problem-solving and blocker resolution
All meetings recorded for asynchronous access

Daily Asynchronous Communication (WhatsApp)

Quick questions and updates
Team decision-making via polls
Coordination on interdependent tasks

GitHub-Based Workflow

Feature branches with defined naming conventions
Pull requests reviewed by peers before merge
Commit messages reference GitHub Issues for traceability
Mandatory code review ensures quality and knowledge sharing

Testing Challenges Solved

Challenge 1: Finite State Machine (FSM) Parser

Solution: RFC 4180 compliance testing with comprehensive edge cases
Result: 98% instruction coverage, 91% branch coverage

Challenge 2: Large File Streaming

Solution: Iterator and streaming API tests for memory efficiency
Result: Efficient processing without memory overflow

Challenge 3: Round-Trip Consistency

Solution: Integration tests comparing original and output files
Result: Data integrity preserved across read-write cycles

Quality Metrics

471 test methods across 25 test files
92% code coverage (exceeded 80% requirement)
100% test success rate (452 tests passed, 0 failures)
5.4 second test execution (rapid feedback cycle)
47 merged pull requests with peer review
100% team meeting attendance (5 meetings, all recorded)

Building and Testing

Build the Project

mvn clean install

Run Unit Tests

mvn test

Run All Tests (including integration tests)

mvn verify

Generate Code Coverage Report

mvn clean test jacoco:report

View the report at: target/site/jacoco/index.html

Test Statistics

471 test methods across 25 test files
92% code coverage overall (JaCoCo enforces the 80% minimum)
Comprehensive coverage of all core packages:
- com.group5.csv.core (97% coverage)
- com.group5.csv.io (98% coverage)
- com.group5.csv.schema (79% coverage)
- com.group5.csv.exceptions (93% coverage)
- com.group5.csv.demo (100% coverage)

Demo Application

The project includes an interactive CLI demo that showcases library features.

Run the Demo

mvn -q exec:java -Dexec.mainClass="com.group5.csv.demo.Main"

Demo Features

Load a CSV file
Display rows in formatted table
Validate using schema
Perform round-trip (read → write → read)
Test different CSV dialects (comma, semicolon, tab)

Sample Data

Demo CSV files are located in src/main/resources/demo/:

demo_input.csv (comma-delimited)
demo_input_semicolon.csv (semicolon-delimited)
demo_input_tab.tsv (tab-delimited)

Development Workflow

Creating a Feature Branch

git checkout -b feature/your-feature-name

Committing Changes

git add .
git commit -m "feat: add your feature description"

Running Tests Before Commit

mvn verify

Pushing and Creating a Pull Request

git push origin feature/your-feature-name

Then open a Pull Request on GitHub. See Pull Request Workflow for details.

Code Quality Standards

All tests must pass: mvn verify
Code coverage must remain ≥80%
Follow Java naming conventions
Write meaningful commit messages

Documentation

Document	Description
Getting Started Guide	Step-by-step setup instructions
Pull Request Workflow	How to open and merge PRs
JaCoCo & JUnit Setup	Testing framework configuration
How CSV Functions	Detailed CSV processing pipeline
Design Rationale	Architecture and design decisions
Class Documentation	Individual class documentation
Diagrams	PlantUML architecture diagrams

Troubleshooting

Issue: Java version mismatch

Error: UnsupportedClassVersionError (class file version 65.0) or compilation errors

Solution: Ensure you have Java 21 installed.

java -version

Expected output: a version string starting with 21, for example java version "21.x.x" or openjdk version "21.x.x"

Issue: Maven build fails with dependency error

Error: Could not find artifact...

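Solution: Re-resolve the project's dependencies. If a failed download is cached locally, add -U to the command to force Maven to check the remote repositories again.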

mvn clean dependency:resolve

Issue: Tests fail with "Cannot find test resources"

Error: FileNotFoundException: demo_input.csv

Solution: Ensure you're running tests from the project root directory.

cd ATU-SoftDev-Grp5Project
mvn test

Issue: JaCoCo report not generated

Solution: Run the full verify lifecycle:

mvn clean verify

Report will be at: target/site/jacoco/index.html

Contributing

We welcome contributions! Please see Pull Request Workflow for detailed guidelines.

Quick summary:

Create a feature branch: git checkout -b feature/YourFeature
Make your changes and write tests
Run mvn verify to ensure all tests pass
Commit with meaningful messages
Push and open a Pull Request
Address review feedback
Merge using "Squash and Merge"

Team Members

Michael McKibbin
Bogdan Bondarenko
Vivien White
Edson Soares Ferreira
Arron Hoare
Abudulqurdiri Adelakun

License

This project is for academic assessment as part of:

Course: SWDE_IT803 - Software Development (2025/26)
Institution: Atlantic Technological University (ATU)
Degree: Computing in Contemporary Software Development - Bachelor of Science (Honours)
Instructor: Lusungu Mwasina

You may reuse parts of it as reference material in future coursework or portfolios, provided you maintain attribution to the original authors and Group 5.

Last Updated: December 3, 2025
Project Status: Active Development
Maintained By: Group 5 - ATU Software Development
