SpaceWalk - Backend

This CLI preprocesses the simulation data into the format the backend needs. It downloads the data, preprocesses it, saves the required particle properties, and generates the corresponding octree.

Usage

Installation and Setup

Download this repo (requires Python >= 3.10).

make venv
source venv/bin/activate # to activate the virtual environment

CLI Usage Download and Preprocess

export TNG_TOKEN="..."

tng-sv-cli web download --simulation-name TNG50-4 --snapshot-idx NR
tng-sv-cli web preprocess --simulation-name TNG50-4 --snapshot-idx NR

tng-sv-cli web batch-download --simulation-name TNG50-4 --snapshot-idx NR
tng-sv-cli web batch-preprocess --simulation-name TNG50-4 --snapshot-idx NR

Dev Python Backend

PYTHONPATH=. fastapi dev webScripts/api/backend.py --host 0.0.0.0 --port 9999

Frontend

Go into the frontend repository. If necessary, adjust the const url in src/index.ts:139 to the backend IP and port.

npm install
npm run start

How to prepare the necessary data

In order to use the application the backend needs access to preprocessed data. This data is generated by the preprocessing pipeline, which can be started via the tng-sv-cli.

For the web application, the preprocessing command lives under the web subcommand. Additionally, the API token required to download the data must be set. If the data is already present, one can also set the --data-path flag.

To see which flags exist, use:

$ tng-sv-cli web preprocess --help  # For preprocessing one snapshot
$ tng-sv-cli web batch-preprocess --help  # For preprocessing multiple snapshots

Less obvious parameters are:

--filter-out-percentage, which restricts preprocessing to a certain percentage of the data, sorted by max value
--data-path, which uses data already present on the machine, e.g. if the CLI is executed on a host of the IllustrisTNG project

The pipeline

Download snapshot n and n+1: To preprocess a snapshot, i.e. an instant in time, the tool downloads the current snapshot and the next one. Because the trajectory of each particle is interpolated between two time frames, the positions at times n and n+1 are required.
Filter out a specific percentage: Filters out a certain percentage of the data points, ordered by the size of a given property, e.g. Density
Check which particles are contained in both snapshots
Generate an octree based on the particles which we can interpolate: The octree contains the offsets to the splines, which are stored in a separate array. This is done for pragmatic performance reasons: the octree serializes to JSON, while the separate array is saved as a NumPy array, which makes reading and accessing it faster than dumping everything into a single JSON file.
- We use the Python API of the C++ Open3D implementation of an Octree
For the spline calculation we refactored SciPy's CubicHermiteSpline so that numba's JIT compilation can be used to speed up the pipeline.
Calculate further properties:
- Attribute quantiles: allow a better user experience when filtering out certain quantiles of an attribute
- Voronoi diameter: an approximation of the size of a Voronoi cell, computed from the known density and mass of the cell
Save data to the disk
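
The interpolation and property steps above can be sketched in plain Python. This is only an illustration, not the pipeline's numba code: it assumes that the particle velocities serve as the spline derivatives, that dt is the time between the two snapshots, and that the Voronoi diameter is taken as the diameter of a sphere with volume mass / density. All function names are hypothetical.

```python
import math
from typing import Sequence

Vec3 = tuple[float, float, float]


def common_ids(ids_n: Sequence[int], ids_next: Sequence[int]) -> list[int]:
    """Particle IDs contained in both snapshots, sorted ascending."""
    return sorted(set(ids_n) & set(ids_next))


def hermite_coefficients(p0: Vec3, v0: Vec3, p1: Vec3, v1: Vec3, dt: float):
    """Per-axis coefficients (a, b, c, d) of p(s) = a*s**3 + b*s**2 + c*s + d, s in [0, 1]."""
    coeffs = []
    for x0, dx0, x1, dx1 in zip(p0, v0, p1, v1):
        m0, m1 = dx0 * dt, dx1 * dt  # derivatives rescaled to the unit interval
        a = 2 * x0 - 2 * x1 + m0 + m1
        b = -3 * x0 + 3 * x1 - 2 * m0 - m1
        coeffs.append((a, b, m0, x0))
    return coeffs


def evaluate(coeffs, s: float) -> Vec3:
    """Evaluate the spline at s (0 = snapshot n, 1 = snapshot n+1) with Horner's scheme."""
    return tuple(((a * s + b) * s + c) * s + d for a, b, c, d in coeffs)


def voronoi_diameter(mass: float, density: float) -> float:
    """Diameter of a sphere with volume mass / density (assumed approximation)."""
    volume = mass / density
    return (6.0 * volume / math.pi) ** (1.0 / 3.0)
```

Storing the four coefficients per axis instead of positions and velocities means a client only has to evaluate a cubic polynomial per particle and frame.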

API Architecture

Important structures

CameraInformation:

contains coordinates and size of the client's camera

ClientState:

stores the client's state (already loaded node indices, level of detail, batch size, and percentage of data per leaf node) so that data can be loaded dynamically in batches

DataCache:

class responsible for loading data efficiently:
- checks if a snap is already loaded; if not, it fetches it from the server's filesystem
- keeps loaded data in the server's cache
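
To make the three structures more concrete, here is a minimal sketch of how they could look. The field names, default values, and the loader callback are assumptions made for illustration and may differ from the classes in webScripts/api.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class CameraInformation:
    # Hypothetical fields: camera position and size of the viewed region.
    x: float
    y: float
    z: float
    size: float


@dataclass
class ClientState:
    # Hypothetical fields: node index -> next level of detail to load.
    level_of_detail: dict[int, int] = field(default_factory=dict)
    batch_size_lod: int = 100
    percentage: float = 1.0


class DataCache:
    """Loads each (simulation, snap_id) pair once and keeps it in memory."""

    def __init__(self, loader: Callable[[str, int], Any]) -> None:
        self._loader = loader  # reads the data from the server's filesystem
        self._cache: dict[tuple[str, int], Any] = {}

    def get(self, simulation: str, snap_id: int) -> Any:
        key = (simulation, snap_id)
        if key not in self._cache:  # only hit the filesystem on a cache miss
            self._cache[key] = self._loader(simulation, snap_id)
        return self._cache[key]
```

Passing the loader in as a callback keeps the cache independent of the file layout, which also makes it easy to test with a stub.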

`GET /v1/get/init/{simulation}/{snap_id}`

This endpoint initializes the visualization by providing metadata (number of quantiles and their data, all available snapshots and BoxSize) and fetching the initial simulation data.

How does it work?

scans directories to identify available snapshots (all_possible_snaps) by matching folder names with the pattern snapdir_
extracts box size (BoxSize) metadata from files matching the pattern groups_ using the illustris library
uses the DataCache class to check if the requested simulation and snapshot data (simulation, snap_id) is already cached
if cached, it retrieves the data directly. Otherwise, it:
- loads several data files (splines, velocities, densities, etc) and structures from the filesystem based on the simulation and snapshot
- prepares a ListOfLeafs object from leafs and leafs_scan arrays
- calculates density quantiles using the densities data
- caches the loaded data
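
The directory scan and the quantile calculation could look roughly like the sketch below. It assumes that snapshot folders are named snapdir_ followed by the snapshot number and that the quantiles are evenly spaced cut points; the function names are hypothetical and the standard library stands in for whatever the backend actually uses.

```python
import re
import statistics
from pathlib import Path

# Assumed folder naming: "snapdir_" followed by the snapshot number.
SNAPDIR_PATTERN = re.compile(r"^snapdir_(\d+)$")


def find_available_snaps(simulation_dir: Path) -> list[int]:
    """Snapshot numbers of all snapdir_* folders inside simulation_dir, sorted."""
    snaps = []
    for entry in simulation_dir.iterdir():
        match = SNAPDIR_PATTERN.match(entry.name)
        if entry.is_dir() and match:
            snaps.append(int(match.group(1)))
    return sorted(snaps)


def density_quantiles(densities: list[float], n_quantiles: int) -> list[float]:
    """The n_quantiles - 1 cut points that split densities into equally sized groups."""
    return statistics.quantiles(densities, n=n_quantiles, method="inclusive")
```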

What is the response? The response is a JSON that includes:

density_quantiles: A list of quantile values derived from the density data.
n_quantiles: The number of quantiles available.
available_snaps: A list of all possible snapshot numbers for the simulation.
BoxSize: The size of the simulation box.

`POST /v1/get/splines/{simulation}/{snap_id}`

This endpoint processes and retrieves spline data along with related information for a specific simulation and snapshot, filtered based on the client's camera view.

How does it work?

retrieves cached data for the specified simulation and snapshot (simulation, snap_id) using the DataCache class. This data includes:
- octree: For spatial hierarchy and node traversal
- splines: Cubic spline parameters
- velocities, densities, coordinates, voronoi_diameter_extended
- particle_list_of_leafs: Data structure that maps particles to octree leaf nodes
traverses octree:
- uses the client's camera position (from CameraInformation) to create a ViewBox, representing the region of interest in 3D space
- traverses the octree to find intersecting nodes containing relevant particles (node_indices).
filters and loads particles for each intersecting node:
- retrieves particle IDs from particle_list_of_leafs based on the percentage of data (client_state.percentage) and level of detail (LOD)
- adjusts the range of particles per node based on batch_size_lod
increases level of detail:
- updates the LOD for each node in the client state, ensuring the detail increases with each call
extract relevant data:
- Splines: Extracts spline parameters (splines_a, splines_b, splines_c, splines_d)
- Physical properties: Coordinates, velocities, densities, Voronoi diameters
- Calculates the minimum and maximum densities for the selected particles
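
Two of the steps above, building the ViewBox and picking the next batch of particles for a leaf, can be illustrated as follows. The sketch assumes that the ViewBox is an axis-aligned cube centered on the camera and that each LOD step corresponds to one consecutive slice of batch_size_lod particles; Box, view_box, and next_batch are hypothetical names.

```python
from dataclasses import dataclass

Vec3 = tuple[float, float, float]


@dataclass(frozen=True)
class Box:
    lo: Vec3
    hi: Vec3

    def intersects(self, other: "Box") -> bool:
        """True if the two axis-aligned boxes overlap on every axis."""
        return all(
            a_lo <= b_hi and b_lo <= a_hi
            for a_lo, a_hi, b_lo, b_hi in zip(self.lo, self.hi, other.lo, other.hi)
        )


def view_box(center: Vec3, size: float) -> Box:
    """Cube with edge length size centered on the camera position (assumption)."""
    half = size / 2
    return Box(tuple(c - half for c in center), tuple(c + half for c in center))


def next_batch(leaf_particles: list[int], percentage: float, lod: int, batch_size_lod: int) -> list[int]:
    """Particle IDs of one leaf for the given LOD step."""
    # Only the first `percentage` share of the leaf is ever served.
    usable = leaf_particles[: int(len(leaf_particles) * percentage)]
    start = lod * batch_size_lod
    return usable[start : start + batch_size_lod]
```

An empty result from next_batch signals that the leaf has nothing left to send at the requested LOD.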

What is the response? The response is a JSON that includes:

Data:
- Relevant particle IDs (relevant_ids).
- Coordinates, velocities, densities, splines, and Voronoi diameters for the selected particles
Metadata:
- Updated level_of_detail for the nodes
- Density range (min_density, max_density)
- Total number of particles (nParticles)
- Density quantiles, snapshot ID (snapnum)

Octree

Structure:

the octree starts with a root node that represents the entire bounding box (the space of interest).
each node is recursively subdivided into eight smaller cubical regions (children), dividing the space into octants.
subdivision continues up to a maximum depth or until each node contains fewer than a specified number of points (or other criteria are met).

Storage of Data:

particles are stored in the leaf nodes. If a node contains more particles than the allowed threshold (size_per_leaf), it is further subdivided.
each leaf node stores data such as particle indices and the values of relevant fields

Traversal:

queries or operations (e.g. finding neighbors or retrieving data) involve traversing the octree from the root, descending into relevant nodes based on the spatial location of interest.
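
The structure, storage, and traversal described above can be sketched with a small pure-Python octree. It is a stand-in for illustration only, since the backend uses Open3D's octree; only size_per_leaf is taken from the text, all other names are hypothetical.

```python
from dataclasses import dataclass, field

Point = tuple[float, float, float]


@dataclass
class OctreeNode:
    center: Point
    half_size: float
    indices: list[int] = field(default_factory=list)  # filled for leaves only
    children: list["OctreeNode"] = field(default_factory=list)

    @property
    def is_leaf(self) -> bool:
        return not self.children


def build(points, indices, center, half_size, size_per_leaf, max_depth, depth=0):
    """Subdivide until a node holds at most size_per_leaf points or max_depth is hit."""
    node = OctreeNode(center, half_size)
    if len(indices) <= size_per_leaf or depth >= max_depth:
        node.indices = list(indices)
        return node
    buckets = [[] for _ in range(8)]
    for i in indices:
        # One bit per axis: set if the point lies in the upper half of the node.
        octant = sum((points[i][axis] >= center[axis]) << axis for axis in range(3))
        buckets[octant].append(i)
    quarter = half_size / 2
    for octant, bucket in enumerate(buckets):
        if not bucket:
            continue  # empty octants get no child node
        child_center = tuple(
            center[axis] + (quarter if (octant >> axis) & 1 else -quarter)
            for axis in range(3)
        )
        node.children.append(
            build(points, bucket, child_center, quarter, size_per_leaf, max_depth, depth + 1)
        )
    return node


def query(node, lo, hi, points):
    """Indices of all points inside the axis-aligned box [lo, hi]."""
    node_lo = [c - node.half_size for c in node.center]
    node_hi = [c + node.half_size for c in node.center]
    if any(node_hi[a] < lo[a] or node_lo[a] > hi[a] for a in range(3)):
        return []  # the node does not touch the query box: prune the subtree
    if node.is_leaf:
        return [i for i in node.indices if all(lo[a] <= points[i][a] <= hi[a] for a in range(3))]
    found = []
    for child in node.children:
        found.extend(query(child, lo, hi, points))
    return found
```

Pruning whole subtrees in query is what makes the traversal cheaper than testing every particle against the box.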

Dataflow

Trigger download from server to client

A download is triggered as soon as one of the following conditions is met:

the download of the current ViewBox is finished for the current percentage and LOD
a leaf inside the ViewBox still has particles left at an LOD higher than the current one

Fetch data from server to client

the particles of the current LOD from every leaf are downloaded and then the LOD is increased
the latest loaded LOD is saved per leaf, so that after the ViewBox changes, downloading continues for each leaf at the first LOD which was not loaded yet
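
The interplay of trigger and fetch can be simulated with a few lines of Python. This is a simplified model under assumptions, not the actual client or server code: particles per leaf are plain lists, the per-leaf LOD lives in a dictionary, and the function names are hypothetical.

```python
def fetch_round(leaf_particles, lod_per_leaf, leaves_in_view, batch_size):
    """One download round: the next batch of every visible leaf, then LOD + 1 for that leaf."""
    downloaded = []
    for leaf in leaves_in_view:
        lod = lod_per_leaf.get(leaf, 0)
        batch = leaf_particles[leaf][lod * batch_size : (lod + 1) * batch_size]
        if batch:
            downloaded.extend(batch)
            lod_per_leaf[leaf] = lod + 1  # remember where to continue for this leaf
    return downloaded


def download_view(leaf_particles, lod_per_leaf, leaves_in_view, batch_size):
    """Trigger rounds until no visible leaf has particles left."""
    received = []
    while True:
        batch = fetch_round(leaf_particles, lod_per_leaf, leaves_in_view, batch_size)
        if not batch:
            break
        received.extend(batch)
    return received
```

Because lod_per_leaf outlives a single call, moving the ViewBox only downloads what the newly visible leaves still miss.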

Credits

Originally written by Nicolas Bender, Marc Burg, and Jonannes Maul as part of a research project at Heidelberg University.

Supervised by Dylan Nelson and Filip Sadlo.

The write-up of the project is available as a PDF in this repository.
