HandleyLab/handleylab-dir-structure

🧪 Lab Directory Structure

  • Lab_Root/
    • Projects/ 📊
      • Project_Name_1/
        • data/
          • raw/
          • processed/
          • metadata/
        • code/
          • scripts/
          • notebooks/
          • pipelines/
        • results/
          • figures/
          • tables/
          • models/
        • manuscript/
        • README.md
      • Project_Name_2/
        • ...
    • People/ 👥
      • LastName_FirstName/
      • Smith_Jane/
      • Smith_John/
      • ...
    • Resources/
      • references/
      • databases/
      • standard_datasets/
    • Admin/ 📋
      • grants/
      • protocols/
      • meetings/
    • Archive/ 🗃️
      • Archived_Projects/
        • Project_Name_1_YYYY/
        • Project_Name_2_YYYY/
      • Alumni/
        • Wang_Dave_YYYY-YYYY/
        • Virgin_Skip_YYYY-YYYY/
        • ...
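
The per-project layout above can be created programmatically rather than by hand. The following is a minimal sketch, assuming Python is available; the function name `scaffold_project` and the one-line README stub are illustrative choices, not part of the lab's tooling.

```python
from pathlib import Path

# Subdirectories of one project, copied from the tree above.
PROJECT_LAYOUT = {
    "data": ["raw", "processed", "metadata"],
    "code": ["scripts", "notebooks", "pipelines"],
    "results": ["figures", "tables", "models"],
    "manuscript": [],
}


def scaffold_project(lab_root, project_name):
    """Create the standard project skeleton under <lab_root>/Projects/."""
    project_dir = Path(lab_root) / "Projects" / project_name
    for parent, children in PROJECT_LAYOUT.items():
        (project_dir / parent).mkdir(parents=True, exist_ok=True)
        for child in children:
            (project_dir / parent / child).mkdir(exist_ok=True)
    readme = project_dir / "README.md"
    if not readme.exists():  # never overwrite an existing README
        readme.write_text(f"# {project_name}\n", encoding="utf-8")
    return project_dir
```

Calling `scaffold_project("Lab_Root", "Project_Name_1")` a second time is harmless: existing directories and an existing README are left untouched.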

Directory Explanations

Projects 🔬

Contains all active research projects, each with standardized subdirectories for data, code, results, and documentation.

People 👥

Individual directories for all current lab members.

Resources

Shared resources used across multiple projects including references, databases, and standard datasets.

Admin 📋

Administrative materials including grants, protocols, and meeting notes.

Archive 🗃️

  • Archived_Projects: Completed projects with year of completion
  • Alumni: Past lab members with their tenure dates
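
Moving a finished project into the archive with the year suffix could be scripted as below. This is a sketch under the assumption that archiving is a plain directory move; `archive_project` and `alumni_dir_name` are hypothetical helper names, and the refusal to overwrite an existing archive entry is a design choice of this example.

```python
import shutil
from pathlib import Path


def archive_project(lab_root, project_name, year):
    """Move Projects/<name> to Archive/Archived_Projects/<name>_<year>."""
    lab_root = Path(lab_root)
    source = lab_root / "Projects" / project_name
    target = lab_root / "Archive" / "Archived_Projects" / f"{project_name}_{year}"
    if not source.is_dir():
        raise FileNotFoundError(source)
    if target.exists():  # refuse to merge into an existing archive entry
        raise FileExistsError(target)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(source), str(target))
    return target


def alumni_dir_name(last_name, first_name, start_year, end_year):
    """Build an Alumni directory name such as LastName_FirstName_YYYY-YYYY."""
    return f"{last_name}_{first_name}_{start_year}-{end_year}"
```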

File Management Best Practices 📋

Naming Conventions

  • Be consistent: Use snake_case or kebab-case throughout; avoid CamelCase
  • Avoid spaces: Replace spaces with underscores or hyphens
  • Include dates: Use ISO format (YYYY-MM-DD) when including dates
  • Version numbers: Include version numbers (v1, v2.3) when applicable
  • Descriptive names: Files should be self-descriptive without being excessively long
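
Some of these conventions can be checked automatically. The sketch below covers three of them (no whitespace, no CamelCase, valid ISO dates); the function name `check_filename` and the exact rules are assumptions made for illustration, and the CamelCase test is a simple heuristic that flags a lowercase letter directly followed by an uppercase one.

```python
import re
from datetime import date

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
CAMEL_CASE = re.compile(r"[a-z][A-Z]")


def check_filename(name):
    """Return a list of naming problems; an empty list means the name passes."""
    problems = []
    if re.search(r"\s", name):
        problems.append("contains whitespace")
    if CAMEL_CASE.search(name):
        problems.append("looks like CamelCase")
    for token in ISO_DATE.findall(name):
        try:
            date.fromisoformat(token)  # rejects impossible dates like 2024-13-45
        except ValueError:
            problems.append(f"invalid ISO date: {token}")
    return problems
```

For example, `check_filename("growth_curves_2024-03-15_v2.csv")` returns an empty list, while `check_filename("growthCurves.csv")` reports the CamelCase problem.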

Data Management 💾

  • Raw data is sacred: Never modify raw data; always create processed copies
  • Data provenance: Document the source of all datasets and preprocessing steps, preferably using an RMarkdown or Quarto document
  • Large files: Use Git LFS (Large File Storage) for files larger than 100 MB. Use HTCF LFS for large local data and Harvard Dataverse for external hosting.
  • Data validation: Implement checksums to verify data integrity
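
As one way to implement the checksum step, the sketch below records a SHA-256 digest for every file under a raw-data directory and later reports files that changed or went missing. The function names and the manifest location are assumptions; the manifest should live outside the raw directory (for example in `metadata/`) so that it does not checksum itself.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(raw_dir, manifest):
    """Write one '<checksum>  <relative/path>' line per file under raw_dir."""
    raw_dir = Path(raw_dir)
    lines = [
        f"{sha256_of(p)}  {p.relative_to(raw_dir).as_posix()}"
        for p in sorted(raw_dir.rglob("*"))
        if p.is_file()
    ]
    Path(manifest).write_text("\n".join(lines) + "\n", encoding="utf-8")


def verify_manifest(raw_dir, manifest):
    """Return relative paths that are missing or whose checksum changed."""
    changed = []
    for line in Path(manifest).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        expected, rel_path = line.split("  ", 1)
        target = Path(raw_dir) / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            changed.append(rel_path)
    return changed
```

Running `verify_manifest` before each analysis gives an early warning if anything under `data/raw/` was modified.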

GitHub Integration 🐙

  • Repository per project: Create a separate repository for each major project
  • Branching strategy:
    • main - stable, working code
    • develop - integration branch
    • Feature branches for new analyses
  • README files: Include setup instructions, dependencies, and basic usage
  • GitHub Actions: Set up automated testing and verification workflows
  • Releases: Tag significant versions with semantic versioning (v1.0.0)
  • Issues: Use for tracking tasks, bugs, and future work
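
For the release tags, a small helper can validate and increment version strings before tagging. This sketch handles only plain `vMAJOR.MINOR.PATCH` tags like the `v1.0.0` example above and deliberately ignores pre-release and build suffixes; `parse_tag` and `bump` are hypothetical names.

```python
import re
from typing import NamedTuple

TAG_PATTERN = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


class Version(NamedTuple):
    major: int
    minor: int
    patch: int

    def tag(self):
        return f"v{self.major}.{self.minor}.{self.patch}"


def parse_tag(tag):
    """Parse a tag such as 'v1.0.0'; raise ValueError for anything else."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not a vMAJOR.MINOR.PATCH tag: {tag!r}")
    return Version(*(int(part) for part in match.groups()))


def bump(version, part):
    """Return the next version; lower-order fields reset to zero."""
    if part == "major":
        return Version(version.major + 1, 0, 0)
    if part == "minor":
        return Version(version.major, version.minor + 1, 0)
    if part == "patch":
        return Version(version.major, version.minor, version.patch + 1)
    raise ValueError(f"unknown part: {part!r}")
```

Because `Version` is a tuple of integers, `sorted(tags, key=parse_tag)` orders `v1.9.0` before `v1.10.0`, which plain string sorting would get wrong.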

Advanced GitHub Features

  • GitHub Pages: For project documentation and results sharing
  • Project boards: For task management and project coordination
  • GitHub Packages: For storing and sharing lab-developed packages
  • Continuous Integration: Automatically test code on push

Code Organization 💻

  • Modular code: Break code into logical, reusable components
  • Script headers: Include purpose, author, date, and usage examples in comments
  • Requirements: Include environment specifications (requirements.txt, environment.yml)
  • Documentation: Document functions, parameters, and expected outputs
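
A script header and a documented function might look like the following. The file name, author placeholder, and the `summarize` function are invented for illustration; only the header fields (purpose, author, date, usage) come from the list above.

```python
"""
Purpose: Summarize a list of numeric measurements (illustrative example).
Author:  LastName_FirstName (placeholder)
Date:    YYYY-MM-DD (placeholder)
Usage:   from summarize_values import summarize
         summarize([1.0, 2.0, 3.0])
"""
import statistics


def summarize(values):
    """Compute basic summary statistics.

    Parameters
    ----------
    values : list of float
        Measurements to summarize; must contain at least one value.

    Returns
    -------
    dict
        Keys "n" (count), "mean", and "sd" (sample standard deviation,
        reported as 0.0 when only one value is given).
    """
    sd = statistics.stdev(values) if len(values) > 1 else 0.0
    return {"n": len(values), "mean": statistics.fmean(values), "sd": sd}
```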

Reproducibility ♻️

  • Environment management: Use conda or Docker to ensure reproducibility when applicable
  • Seed values: Set and document random seeds
  • Dependency management: Specify exact versions in requirements
  • Notebooks: Use RMarkdown/Quarto or Jupyter
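
Seeds and exact versions are easiest to document when the analysis records them itself. The sketch below uses only the standard library; the seed value, the function names, and the shape of the record are assumptions for illustration, and libraries with their own random generators need to be seeded separately.

```python
import platform
import random
from importlib import metadata

SEED = 20240315  # arbitrary illustrative value; record whatever seed you use


def run_record(packages):
    """Collect the seed, Python version, and installed package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "seed": SEED,
        "python": platform.python_version(),
        "packages": versions,
    }


def sample_ids(ids, k, seed=SEED):
    """Draw k items reproducibly using a local, explicitly seeded generator."""
    rng = random.Random(seed)  # leaves the global random state untouched
    return rng.sample(ids, k)
```

Saving the dictionary returned by `run_record` next to each result (for example as JSON in `results/`) ties every output to the seed and versions that produced it.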

Lab-Specific Considerations (To Do)

  • Onboarding documents: Create guides for new lab members
  • Compute resources: Document procedures for utilizing cluster/cloud resources
  • Collaboration guidelines: Establish protocols for code sharing and authorship

About

Directory structure and best practices for lab project files
