dylanwalker/dfpeek

dfpeek

A fast command-line tool for peeking at tabular data files (CSV, TSV, Parquet, Feather, Excel) with concise, chainable options inspired by Unix tools like ls.

Installation

Install from PyPI (recommended):

pip install dfpeek

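To include Excel and Parquet support, install the optional extras (quoted so that shells such as zsh do not try to expand the brackets):

pip install "dfpeek[excel,parquet]"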

Usage

Run from the command line:

dfpeek <datafile> [options]

Options

Option         Description
-f FORMAT      Force file format (csv, tsv, excel, parquet, feather)
-d DELIM       Set delimiter for CSV/TSV files (e.g., , or \t)
-xs N          Select Excel sheet N (1-based indexing)
-xr N          Skip first N rows in Excel files
-H N           Show first N rows
-T N           Show last N rows
-R START END   Show rows in range START to END (zero-based, END exclusive)
-L EXPR        Perform df.loc[expression] for flexible row/column selection
-I EXPR        Perform df.iloc[expression] for position-based selection
-u COL         Show unique values for column COL
-c COL         Show info about column COL (type, nulls, etc.)
-v COL         Show value counts for column COL
-s COL         Show stats for numerical column COL
-l             List column names
-i             Show file info (rows, columns, memory usage)

All options can be chained in any order.

Examples

Basic Operations

Show first 10 rows:

dfpeek data.feather -H 10

Show last 5 rows:

dfpeek data.feather -T 5

Show rows 20 through 29 (zero-based; END is exclusive):

dfpeek data.feather -R 20 30
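
In pandas terms, the three row options above plausibly correspond to head, tail, and a positional slice. The sketch below is an assumption about equivalent behavior, not dfpeek's actual source, and the sample DataFrame is made up for illustration.

```python
import pandas as pd

# Made-up sample data: 100 rows with a default 0..99 index.
df = pd.DataFrame({"id": range(100), "value": [i * 2 for i in range(100)]})

first_rows = df.head(10)      # assumed equivalent of -H 10
last_rows = df.tail(5)        # assumed equivalent of -T 5
row_range = df.iloc[20:30]    # assumed equivalent of -R 20 30 (END exclusive)

print(row_range["id"].tolist())  # positions 20 through 29
```

Because END is exclusive, -R 20 30 yields ten rows, not eleven.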

Column Analysis

Show unique values for column city:

dfpeek data.feather -u city

Show info about column city:

dfpeek data.feather -c city

Show value counts for column status:

dfpeek data.feather -v status

Show stats for column age:

dfpeek data.feather -s age
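
For readers who know pandas, the column options can be compared with the calls below. This is a sketch under the assumption that each option reports something similar to the named pandas method; the exact output of dfpeek may differ, and the sample data is invented.

```python
import pandas as pd

# Made-up sample data with one missing value in city and in age.
df = pd.DataFrame({
    "city": ["Boston", "Austin", "Boston", None],
    "status": ["active", "inactive", "active", "active"],
    "age": [31, 45, 28, None],
})

unique_cities = df["city"].unique()          # compare with -u city
city_dtype = df["city"].dtype                # compare with -c city (type)
city_nulls = int(df["city"].isna().sum())    # compare with -c city (nulls)
status_counts = df["status"].value_counts()  # compare with -v status
age_stats = df["age"].describe()             # compare with -s age

print(status_counts)
print(age_stats)
```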

File Information

List columns:

dfpeek data.feather -l

Show file info:

dfpeek data.feather -i

Advanced Indexing

Use loc for label-based selection:

# Rows only
dfpeek data.feather -L "0:5"                        # Rows labeled 0 through 5 (loc slices include the end label)
dfpeek data.feather -L "df.age > 30"                # Rows where age > 30

# Columns only
dfpeek data.feather -L ":, 'name'"                  # All rows, name column
dfpeek data.feather -L ":, ['name', 'age']"         # All rows, name and age columns

# Both rows and columns
dfpeek data.feather -L "0:5, 'name':'city'"         # Rows labeled 0 through 5, name to city columns
dfpeek data.feather -L "df.age > 25, ['name', 'status']"  # Age > 25, name and status columns
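
Since the option is described as performing df.loc[expression], one way to picture it is a string spliced into a df.loc[...] lookup. The helper peek_loc below is hypothetical and only illustrates that idea; it is not taken from dfpeek's source, and the sample data is made up.

```python
import pandas as pd

# Made-up sample data with a default integer index (labels 0..7).
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy", "Dee", "Eli", "Fay", "Gus", "Hal"],
    "age": [22, 35, 41, 19, 30, 27, 52, 24],
    "city": ["Rome", "Oslo", "Lima", "Rome", "Oslo", "Lima", "Rome", "Oslo"],
    "status": ["a", "b", "a", "a", "b", "a", "b", "a"],
})

def peek_loc(frame, expr):
    """Hypothetical helper: evaluate expr inside frame.loc[...].

    eval() runs arbitrary code, so this is only safe for trusted input.
    """
    return eval(f"df.loc[{expr}]", {"df": frame})

label_slice = peek_loc(df, "0:5")  # label slice: end label 5 is included
older = peek_loc(df, "df.age > 25, ['name', 'status']")
name_to_city = peek_loc(df, "0:5, 'name':'city'")

print(len(label_slice))
print(older)
```

Note that label slices in loc include both endpoints, so "0:5" on a default index selects six rows, while the same slice under iloc selects five.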

Use iloc for position-based selection:

# Rows only
dfpeek data.feather -I "0:5"                        # First 5 rows
dfpeek data.feather -I "[0,2,4]"                    # Rows at positions 0, 2, 4

# Columns only
dfpeek data.feather -I ":, 0"                       # All rows, first column
dfpeek data.feather -I ":, [0,2]"                   # All rows, columns 0 and 2

# Both rows and columns  
dfpeek data.feather -I "0:5, 0:3"                   # First 5 rows, first 3 columns
dfpeek data.feather -I "[0,2,4], [1,3]"            # Specific rows and columns
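
The positional forms above behave like ordinary pandas iloc indexing. The sketch below uses a made-up DataFrame whose index labels are deliberately not 0..n-1, to show that iloc ignores labels and counts positions.

```python
import pandas as pd

# Made-up sample data; index labels start at 100 on purpose.
df = pd.DataFrame(
    {
        "a": [10, 20, 30, 40, 50, 60],
        "b": list("uvwxyz"),
        "c": [1.5, 2.5, 3.5, 4.5, 5.5, 6.5],
    },
    index=[100, 101, 102, 103, 104, 105],
)

first_five = df.iloc[0:5]            # positions 0..4, end excluded
picked_rows = df.iloc[[0, 2, 4]]     # rows at positions 0, 2, 4
first_col = df.iloc[:, 0]            # all rows, first column (a Series)
block = df.iloc[[0, 2, 4], [1, 2]]   # specific rows and columns

print(picked_rows.index.tolist())    # labels of the rows that were picked
```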

Format and Delimiter Options

Force CSV format for files without .csv extension:

dfpeek mydata.txt -f csv
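
How dfpeek resolves the format internally is not documented here. As a rough sketch of the idea, an explicit format can take priority over a lookup on the file extension. The function resolve_format and the EXTENSION_FORMATS table are hypothetical names; the table is built only from the formats this README lists.

```python
from pathlib import Path

# Extension-to-format table built from the formats this README lists.
EXTENSION_FORMATS = {
    ".csv": "csv",
    ".tsv": "tsv",
    ".parquet": "parquet",
    ".feather": "feather",
    ".xlsx": "excel",
}

def resolve_format(path, forced=None):
    """Hypothetical helper: a forced format wins, else use the extension."""
    if forced is not None:
        return forced
    suffix = Path(path).suffix.lower()
    if suffix not in EXTENSION_FORMATS:
        raise ValueError(f"Cannot infer format from {suffix!r}; pass one explicitly")
    return EXTENSION_FORMATS[suffix]

print(resolve_format("mydata.txt", forced="csv"))
print(resolve_format("data.feather"))
```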

Use custom delimiter:

dfpeek data.tsv -d "\t" -H 5
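
In most shells, "\t" inside double quotes reaches the program as two characters, a backslash followed by t, rather than a real tab. The sketch below shows one way a tool could normalize that before reading the file with pandas; normalize_delimiter is a hypothetical helper, and whether dfpeek does exactly this is an assumption.

```python
import io

import pandas as pd

def normalize_delimiter(raw):
    """Hypothetical helper: turn backslash + t into a real tab character."""
    return "\t" if raw == "\\t" else raw

# Made-up tab-separated content standing in for data.tsv.
tsv_text = "name\tage\nAnn\t22\nBob\t35\n"

delim = normalize_delimiter("\\t")  # what a shell typically passes for -d "\t"
df = pd.read_csv(io.StringIO(tsv_text), sep=delim)

print(df.head(5))
```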

Default Behavior

Show info and first 5 rows (default if no options):

dfpeek data.feather

Excel Options

Use a specific Excel sheet (e.g., the 3rd sheet):

dfpeek data.xlsx -xs 3 -H 10

Skip the first 2 rows of an Excel file:

dfpeek data.xlsx -xr 2

Supported Formats

  • CSV (.csv)
  • TSV (.tsv)
  • Parquet (.parquet)
  • Feather (.feather)
  • Excel (.xlsx)

Notes

  • For very large files, output may be slow if printing many rows.
  • All rows/columns are shown in full (no abbreviation).
  • Requires Python 3.7+

License

MIT
