ALMASim

PyPI version · Python 3.12 · Documentation · CI · codecov · License: GPL v3

ALMASim is a library-first Python environment for simulating ALMA observations, exploring ALMA metadata, downloading science products, and building ML-ready radio/mm-wave datasets.

It provides reusable services in src/almasim that can be driven by CLI scripts, Jupyter notebooks, a FastAPI backend, or direct Python code — all through the same staged API.



Key Capabilities

Simulation

  • Build clean sky cubes from point, Gaussian, extended, molecular-cloud, diffuse, Galaxy Zoo, and Hubble-100 source models
  • Simulate single-pointing ALMA interferometric observations with multi-configuration support (12m, 7m, TP)
  • PWV-aware per-channel noise model
  • Additive astrophysical background sky — faint dusty galaxies, diffuse emission, or combined
  • Optional serendipitous source injection
  • Iterative CLEAN-style deconvolution with resumable state
  • TP+INT feather-style image combination
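
The CLEAN-style deconvolution listed above can be illustrated with a minimal 1-D Högbom loop. This is a sketch in plain Python under simplifying assumptions (1-D data, a PSF of odd length with peak 1.0 at its centre), not ALMASim's implementation. The function name hogbom_clean_1d and its parameters are hypothetical.

```python
def hogbom_clean_1d(dirty, psf, gain=0.1, n_iter=200, threshold=1e-3):
    """Minimal 1-D Hogbom CLEAN. psf must have odd length and peak 1.0 at its centre."""
    residual = list(dirty)
    model = [0.0] * len(dirty)
    half = len(psf) // 2
    for _ in range(n_iter):
        # Locate the strongest residual by absolute value.
        peak_idx = max(range(len(residual)), key=lambda i: abs(residual[i]))
        peak = residual[peak_idx]
        if abs(peak) < threshold:
            break
        # Move a fraction (the loop gain) of the peak into the model.
        flux = gain * peak
        model[peak_idx] += flux
        # Subtract the scaled PSF centred on the peak from the residual.
        for k, p in enumerate(psf):
            j = peak_idx + k - half
            if 0 <= j < len(residual):
                residual[j] -= flux * p
    return model, residual


# A unit point source at index 5, as seen through the PSF.
psf = [0.2, 1.0, 0.2]
dirty = [0.0] * 11
dirty[4:7] = [0.2, 1.0, 0.2]
model, residual = hogbom_clean_1d(dirty, psf, gain=0.1, n_iter=500, threshold=1e-6)
```

Returning both the model and the residual is one way to make a run resumable: the residual can be fed back in as the next dirty input and the new model components added to the earlier ones.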

Data Products

  • Dirty cube, dirty visibilities, beam cube, UV mask cube, U/V coordinate cubes
  • Interferometric, total-power, and combined TP+INT image cubes
  • ML-ready HDF5 shards (clean cube + dirty cube + dirty visibilities + UV mask + metadata)
  • Native MeasurementSet (.ms) export via CASA tools or python-casacore

Metadata and Archive

  • Query ALMA observations via TAP with rich inclusion/exclusion filters
  • Normalise TAP columns into stable application fields
  • Resolve DataLink products, download ALMA data products with parallel support
  • Unpack raw ASDMs into MeasurementSets
  • Apply delivered calibration to produce calibrated science MSs
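
Normalising TAP columns amounts to renaming raw archive columns to the field names the rest of the application relies on. The sketch below shows the idea with a plain dictionary. The raw names target_name and band_list and the helper normalise_row are assumptions for illustration; ALMA_source_name and Band are taken from the Quick Start example, and the mapping ALMASim actually uses may differ.

```python
# Hypothetical mapping from raw TAP column names to application field names.
COLUMN_MAP = {
    "target_name": "ALMA_source_name",
    "band_list": "Band",
}


def normalise_row(raw_row, column_map=COLUMN_MAP):
    """Rename known columns and pass unknown ones through unchanged."""
    return {column_map.get(key, key): value for key, value in raw_row.items()}


# Illustrative values only, not real archive output.
raw = {"target_name": "NGC 253", "band_list": "6", "spatial_resolution": 0.35}
normalised = normalise_row(raw)
```

Keeping the mapping in one place means a renamed archive column only needs a one-line change, while downstream code keeps using the stable names.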

Compute

  • Synchronous, local multiprocess, Dask, Slurm, and Kubernetes backends
  • Backend-agnostic simulation service layer

Architecture

src/almasim/          ← installable library  (pip install almasim)
  services/
    simulation.py     ← staged pipeline entry points
    interferometry/   ← UV sampling, baselines, noise, TP
    imaging/          ← deconvolution, TP+INT combination
    metadata/         ← TAP queries, normalisation
    products/         ← MS export, HDF5 shards, cube export
    compute/          ← backend abstraction
    archive/          ← ASDM unpack, calibration apply
    astro/            ← spectral lines, redshift, parameters
  skymodels/          ← source model implementations

backend/              ← FastAPI service  (Docker: ghcr.io/…/almasim-backend)
frontend/             ← Svelte UI  (requires Docker Compose)
examples/             ← CLI scripts and Jupyter notebooks

The library layer owns all domain logic. The backend is a thin adapter over library services. CLI scripts and notebooks call the same staged services directly.


Installation

Library only (cross-platform)

pip install almasim

CLI installation

The package installs the almasim command via the Python entry point:

pip install almasim
almasim --help

If you are working from a local clone, install the project into a virtual environment and use the CLI directly:

git clone https://github.com/MicheleDelliVeneri/ALMASim.git
cd ALMASim
pip install uv
uv sync --group dev
uv run almasim --help

For an editable local install that exposes almasim on your shell path:

git clone https://github.com/MicheleDelliVeneri/ALMASim.git
cd ALMASim
pip install -e .
almasim --help

Once installed, the CLI exposes the main workflows:

almasim metadata --help
almasim products --help
almasim simulation --help
almasim clean --help

With CASA tools (Linux x86-64 only)

casatools and casatasks wheels are Linux-only. Install the optional [casa] extra on a supported Linux system:

pip install "almasim[casa]"

The [casa] extra enables:

  • Native MeasurementSet export via casatools
  • ASDM-to-MS conversion via casatasks.importasdm
  • Calibration application via casatasks.applycal

Without [casa], all simulation, imaging, metadata, and download features still work. The MS export path falls back to python-casacore if available:

pip install "almasim[ms-casacore]"

From source (development)

git clone https://github.com/MicheleDelliVeneri/ALMASim.git
cd ALMASim
pip install uv
uv sync --group dev

Backend service (Docker Compose)

The FastAPI backend and Svelte frontend require Docker Compose:

git clone https://github.com/MicheleDelliVeneri/ALMASim.git
cd ALMASim
docker compose up

The backend image is available pre-built from GHCR:

docker pull ghcr.io/micheledelliveneri/almasim-backend:latest

Quick Start

Query ALMA metadata

from almasim.services.metadata.tap.service import query_by_science_type, InclusionFilters

df = query_by_science_type(
    include=InclusionFilters(science_keyword=["Galaxies"], band=[6])
)
print(df[["ALMA_source_name", "Band", "spatial_resolution"]].head())

Run a simulation from a metadata row

from pathlib import Path

from almasim import SimulationParams, run_simulation

row = df.iloc[0]  # first row of the DataFrame returned by the metadata query above

params = SimulationParams.from_metadata_row(
    row,                          # pandas Series from a metadata query
    idx=0,
    main_dir=Path("src/almasim"),
    output_dir=Path("output"),
    project_name="my_project",
)

result = run_simulation(params)

Use the staged API

from almasim import (
    SimulationParams,
    generate_clean_cube,
    simulate_observation,
    image_products,
    export_results,
)

# "..." stands for the remaining keyword arguments shown in the previous example
params = SimulationParams.from_metadata_row(row, idx=0, ...)

cube_result  = generate_clean_cube(params)
obs_result   = simulate_observation(params, cube_result)
img_result   = image_products(params, obs_result)
export_results(params, cube_result, obs_result, img_result)

Staged Simulation API

The pipeline is split into four composable stages:

| Stage | Function | What it does |
|-------|----------|--------------|
| 1 | generate_clean_cube() | Build sky cube from skymodel, apply background |
| 2 | simulate_observation() | Run interferometric + TP simulation, return dirty products |
| 3 | image_products() | Deconvolve, combine INT+TP, build image cubes |
| 4 | export_results() | Write cubes, ML shards, parameter summaries to disk |

run_simulation() orchestrates all four in sequence.
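
The chaining can be sketched with placeholder stages. The functions below are stand-ins with a _stub suffix, not ALMASim code, and the dictionaries they return are invented for illustration. Only the order of the calls and the way each result feeds the next stage mirror the table above.

```python
def generate_clean_cube_stub(params):
    # Stage 1 placeholder: would build the sky cube.
    return {"stage": "clean_cube", "project": params["project_name"]}


def simulate_observation_stub(params, cube_result):
    # Stage 2 placeholder: consumes the stage 1 result.
    return {"stage": "observation", "input": cube_result["stage"]}


def image_products_stub(params, obs_result):
    # Stage 3 placeholder: consumes the stage 2 result.
    return {"stage": "imaging", "input": obs_result["stage"]}


def export_results_stub(params, cube_result, obs_result, img_result):
    # Stage 4 placeholder: sees every earlier result.
    return [r["stage"] for r in (cube_result, obs_result, img_result)]


def run_all_stages(params):
    """Run the four stages in order, feeding each result to the next stage."""
    cube_result = generate_clean_cube_stub(params)
    obs_result = simulate_observation_stub(params, cube_result)
    img_result = image_products_stub(params, obs_result)
    return export_results_stub(params, cube_result, obs_result, img_result)


exported = run_all_stages({"project_name": "demo"})
```

Because every stage returns a value that the next one takes as an argument, a caller can stop after any stage, inspect the intermediate result, or rerun only the later stages.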

write_ml_dataset_shard() exports an HDF5 shard (clean cube + dirty cube + dirty visibilities + UV mask + metadata) independently of the main export path.

estimate_simulation_footprint() returns resolved pixel count, channel count, cell size, beam size, and raw output size in GiB — useful for pre-run capacity checks.
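
As a rough idea of the arithmetic behind such a capacity check, the size of a single cube follows from its shape and sample width. The helper below is a hypothetical sketch that assumes one real-valued float32 cube; the estimate ALMASim reports may count several products and complex-valued data.

```python
def raw_cube_size_gib(n_pix, n_channels, bytes_per_sample=4):
    """Size in GiB of one n_pix x n_pix x n_channels cube (float32 by default)."""
    return n_pix * n_pix * n_channels * bytes_per_sample / 2**30


size = raw_cube_size_gib(n_pix=1024, n_channels=256)  # 1024 * 1024 * 256 * 4 bytes = 1.0 GiB
```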

Full reference: Simulation docs


Skymodels

| Source type | Description |
|-------------|-------------|
| point | Point source (PSF and CLEAN validation) |
| gaussian | 2-D Gaussian (compact extended source) |
| extended | TNG-backed realistic extended emission |
| galaxy-zoo | Galaxy Zoo image morphology prior |
| hubble-100 | Hubble Top-100 image morphology prior |
| molecular | Molecular cloud structured emission |
| diffuse | Correlated diffuse emission field |

All skymodels accept explicit source_offset_x_arcsec / source_offset_y_arcsec to shift the science target from phase center.
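
An angular offset only becomes a shift in the cube once it is divided by the cell size. The helper below is a hypothetical sketch of that conversion. It says nothing about ALMASim's sign convention (for example, whether positive x points east or west), which should be checked in the Skymodels docs.

```python
def offset_to_pixels(offset_arcsec, cell_size_arcsec):
    """Convert an angular offset to a (possibly fractional) pixel offset."""
    if cell_size_arcsec <= 0:
        raise ValueError("cell_size_arcsec must be positive")
    return offset_arcsec / cell_size_arcsec


dx_pix = offset_to_pixels(1.5, 0.25)  # 1.5 arcsec at 0.25 arcsec per pixel is 6 pixels
```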

Additive background sky (independent of the main source):

| Mode | Effect |
|------|--------|
| blank_field_dsfg | Faint dusty star-forming galaxies |
| dusty_diffuse | Correlated low-spatial-frequency dusty background |
| combined | Both of the above |

Full reference: Skymodels docs


Compute Backends

Select via SimulationParams.compute_backend:

| Backend | Use case |
|---------|----------|
| sync | Notebooks, examples, debugging |
| local | Local CPU parallelism |
| dask | Distributed execution, cluster scheduling |
| slurm | HPC job submission |
| kubernetes | Cluster-native environments |
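
A backend-agnostic service layer usually means that callers pick a backend by name and never touch the executor itself. The sketch below shows one way to do that with a small registry. All names in it (run_sync, run_local, BACKENDS, run_tasks) are hypothetical, only two backends are modelled, and a thread pool stands in for the process pool that real CPU-bound work would want.

```python
from concurrent.futures import ThreadPoolExecutor


def run_sync(func, items):
    """Run tasks one after another in the calling thread."""
    return [func(item) for item in items]


def run_local(func, items, max_workers=4):
    """Run tasks concurrently; a thread pool stands in for a process pool here."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, items))  # map() preserves input order


# Registry keyed by the backend name a caller would select.
BACKENDS = {"sync": run_sync, "local": run_local}


def run_tasks(backend_name, func, items):
    """Dispatch to the chosen backend by name."""
    try:
        backend = BACKENDS[backend_name]
    except KeyError:
        raise ValueError(f"unknown compute backend: {backend_name!r}") from None
    return backend(func, items)


def square(x):
    return x * x


results_sync = run_tasks("sync", square, [1, 2, 3])
results_local = run_tasks("local", square, [1, 2, 3])
```

Both calls return the same list, which is the point of the abstraction: switching backend changes how the work is scheduled, not what the caller gets back.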

Full reference: Compute docs


Metadata and Downloads

Query metadata via TAP

from almasim.services.metadata.tap.service import (
    query_by_science_type,
    InclusionFilters,
    ExclusionFilters,
)

df = query_by_science_type(
    include=InclusionFilters(
        science_keyword=["Galaxies"],
        band=[6, 7],
        public_only=True,
        science_only=True,
    ),
    exclude=ExclusionFilters(solar=True),
)

Download products

from pathlib import Path

from almasim.services.download import resolve_products, run_download_job

products = resolve_products(df["member_ous_uid"].tolist())
run_download_job(products, destination=Path("downloads"), extract_tar=True)
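
Parallel downloading in general can be sketched with a thread pool, since the work is I/O-bound. The example below is not ALMASim's downloader: download_all and fake_fetch are hypothetical names, and the fetch function is injected so the sketch runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def download_all(urls, fetch, max_workers=4):
    """Call fetch(url) for every URL in parallel; return {url: result or exception}."""

    def safe_fetch(url):
        try:
            return fetch(url)
        except Exception as exc:  # record the failure and keep the other downloads going
            return exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(urls, pool.map(safe_fetch, urls)))


def fake_fetch(url):
    """Stand-in for a real HTTP download; returns the URL length as a fake payload size."""
    if url.endswith("broken"):
        raise OSError("simulated failure")
    return len(url)


outcome = download_all(
    ["https://example.org/a", "https://example.org/broken"], fake_fetch
)
```

Catching exceptions per URL means one failed product does not abort the whole batch; the caller can inspect the returned mapping and retry only the entries that hold an exception.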

Full reference: Metadata docs · Downloads docs


Backend Service

The FastAPI backend exposes library services over HTTP and drives the Svelte frontend.

| Endpoint group | Purpose |
|----------------|---------|
| /api/v1/metadata | TAP queries and metadata management |
| /api/v1/simulation | Simulation job submission and status |
| /api/v1/download | Product resolution and download jobs |
| /api/v1/imaging | Deconvolution and combination products |
| /api/v1/visualizer | Output browsing and product inspection |
| /health | Health check |
| /docs | Interactive OpenAPI docs (Swagger UI) |

Start locally for development:

cd backend
uv run uvicorn app.main:app --reload --port 8000

Full reference: Frontend docs


Examples

All examples use the sync compute backend and require no running scheduler.

| Script | Description |
|--------|-------------|
| examples/query_metadata_cli.py | Query TAP, export metadata and product CSVs |
| examples/download_products_cli.py | Resolve and download ALMA products |
| examples/archive_ms_cli.py | Unpack ASDMs and apply calibration |
| examples/staged_pipeline_cli.py | Full pipeline: query → simulate → ML shard |
| examples/imaging_cli.py | Synthetic imaging + iterative deconvolution |

# Installable CLI (Typer)
almasim metadata query \
  --science-keyword Galaxies --band 6 \
  --save-csv examples/output/metadata.csv

# Extract member_ous_uid values and resolve DataLink products
almasim products resolve \
  --metadata-csv examples/output/metadata.csv \
  --save-member-ous-uid-list examples/output/member_ous_uids.txt \
  --save-products-csv examples/output/resolved_products.csv

# Download selected product types and extract archives
almasim products download \
  --products-csv examples/output/resolved_products.csv \
  --product-filter all \
  --destination examples/output/downloads \
  --extract-tar

# Split archive stages so download/extract/unpack/calibrate can run independently
almasim products download \
  --products-csv examples/output/resolved_products.csv \
  --destination examples/output/downloads

almasim products extract \
  --source-root examples/output/downloads

almasim products unpack \
  --input-root examples/output/downloads \
  --output-root examples/output/archive_ms/raw_ms \
  --postprocess-backend slurm \
  --slurm-workers 8

almasim products calibrate \
  --input-root examples/output/downloads \
  --raw-ms-root examples/output/archive_ms/raw_ms \
  --output-root examples/output/archive_ms/calibrated_ms \
  --postprocess-backend slurm \
  --slurm-workers 8

# Download + unpack ASDM + calibrate using Slurm-backed parallel post-processing
almasim products download \
  --products-csv examples/output/resolved_products.csv \
  --destination examples/output/downloads \
  --extract-tar \
  --unpack-ms \
  --generate-calibrated-visibilities \
  --postprocess-backend slurm \
  --slurm-queue normal \
  --slurm-workers 8

# Run WSClean through ALMASim (all WSClean flags are forwarded unchanged)
almasim clean -- \
  -name examples/output/imaging/demo \
  -size 1024 1024 \
  -scale 0.1asec \
  -niter 20000 \
  examples/output/archive_ms/calibrated_ms/uid___A001_X*.ms

# Run staged simulation from metadata query
almasim simulation run \
  --science-keyword Galaxies \
  --band 6 \
  --row-idx 0 \
  --project-name demo \
  --ml-shard-path examples/output/demo.h5

# Query metadata for Band 6 galaxy observations
python examples/query_metadata_cli.py \
  --science-keyword Galaxies --band 6 \
  --save-csv examples/output/metadata.csv

# Run a staged simulation from the first metadata row
python examples/staged_pipeline_cli.py \
  --metadata-csv examples/output/metadata.csv \
  --row-idx 0 --project-name demo \
  --ml-shard-path examples/output/demo.h5

# Iterative deconvolution demo
python examples/imaging_cli.py \
  --output-dir examples/output/imaging --cycles 180 --gain 0.12

Notebook equivalents: staged_pipeline_notebook.ipynb · query_metadata_notebook.ipynb · download_products_notebook.ipynb

End-to-end archive pipeline (Marimo)

examples/e2e_archive_pipeline.py is a reactive Marimo notebook that covers the full archive workflow interactively: query ALMA metadata → resolve DataLink products → download → unpack ASDMs → apply calibration.

# Install dev dependencies (includes marimo)
uv sync --group dev

# Interactive editing mode — cells re-run automatically as you edit
marimo edit examples/e2e_archive_pipeline.py

# Read-only app mode — run the pipeline step-by-step via the UI
marimo run examples/e2e_archive_pipeline.py

Steps 4 (unpack) and 5 (calibrate) require CASA tools (Linux x86-64 only):

pip install "almasim[casa]"

The notebook saves query filter presets as .query.json files so they can be reloaded across sessions.
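
Saving and reloading a preset of this kind is a JSON round trip. The sketch below assumes a flat dictionary of filter values; the helper names save_preset and load_preset, the file name, and the keys are hypothetical, and the schema the notebook actually writes may differ.

```python
import json
import tempfile
from pathlib import Path


def save_preset(path, filters):
    """Write a filter dictionary to a .query.json file."""
    Path(path).write_text(json.dumps(filters, indent=2, sort_keys=True), encoding="utf-8")


def load_preset(path):
    """Read a filter dictionary back from a .query.json file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    preset_path = Path(tmp) / "galaxies_band6.query.json"
    save_preset(preset_path, {"science_keyword": ["Galaxies"], "band": [6]})
    restored = load_preset(preset_path)
```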


Documentation

Full documentation: micheledelliveneri.github.io/ALMASim

| Section | Topics |
|---------|--------|
| Quick Start | Installation, first simulation |
| Simulation | Staged API, SimulationParams, outputs |
| Interferometry | UV sampling, baselines, multi-config |
| Noise | PWV-aware noise model |
| Background Sky | Additive astrophysical background |
| Skymodels | Source models reference |
| Imaging | Deconvolution, TP+INT combination |
| Metadata | TAP queries, filters |
| Downloads | Product download workflow |
| Compute Backends | Sync, Dask, Slurm, Kubernetes |
| Frontend | Svelte UI workflows |

Build docs locally:

uv sync --group dev
uv run sphinx-build -b html docs/source docs/build/html

Contributing

git clone https://github.com/MicheleDelliVeneri/ALMASim.git
cd ALMASim
uv sync --group dev
uv run pytest --ignore=illustris_python
uv run ruff check .
uv run ruff format .

A release is published automatically when a version tag is pushed:

# 1. Bump version in pyproject.toml and src/almasim/__version__.py
# 2. Commit and tag
git tag v2.1.11
git push origin v2.1.11

The release pipeline then:

  1. Validates that the tag matches pyproject.toml
  2. Runs the full lint + test suite
  3. Publishes wheel and sdist to PyPI via OIDC trusted publisher
  4. Creates a GitHub Release with auto-generated changelog and attached artifacts
  5. Builds and pushes the backend Docker image to GHCR

One-time PyPI setup: register a trusted publisher on PyPI with owner MicheleDelliVeneri, repo ALMASim, workflow release.yml, environment pypi.


License

ALMASim is released under the GNU General Public License v3.
