ALMASim is a library-first Python environment for simulating ALMA observations, exploring ALMA metadata, downloading science products, and building ML-ready radio/mm-wave datasets.
It provides reusable services in src/almasim that can be driven by CLI scripts, Jupyter notebooks, a FastAPI backend, or direct Python code — all through the same staged API.
- Key Capabilities
- Architecture
- Installation
- Quick Start
- Staged Simulation API
- Skymodels
- Compute Backends
- Metadata and Downloads
- Backend Service
- Examples
- Documentation
- Contributing
- License
Simulation
- Build clean sky cubes from point, Gaussian, extended, molecular-cloud, diffuse, Galaxy Zoo, and Hubble-100 source models
- Simulate single-pointing ALMA interferometric observations with multi-configuration support (12m, 7m, TP)
- PWV-aware per-channel noise model
- Additive astrophysical background sky — faint dusty galaxies, diffuse emission, or combined
- Optional serendipitous source injection
- Iterative CLEAN-style deconvolution with resumable state
- TP+INT feather-style image combination
Data Products
- Dirty cube, dirty visibilities, beam cube, UV mask cube, U/V coordinate cubes
- Interferometric, total-power, and combined TP+INT image cubes
- ML-ready HDF5 shards (clean cube + dirty cube + dirty visibilities + UV mask + metadata)
- Native MeasurementSet (.ms) export via CASA tools or python-casacore
Metadata and Archive
- Query ALMA observations via TAP with rich inclusion/exclusion filters
- Normalise TAP columns into stable application fields
- Resolve DataLink products, download ALMA data products with parallel support
- Unpack raw ASDMs into MeasurementSets
- Apply delivered calibration to produce calibrated science MSs
Compute
- Synchronous, local multiprocess, Dask, Slurm, and Kubernetes backends
- Backend-agnostic simulation service layer
src/almasim/ ← installable library (pip install almasim)
services/
simulation.py ← staged pipeline entry points
interferometry/ ← UV sampling, baselines, noise, TP
imaging/ ← deconvolution, TP+INT combination
metadata/ ← TAP queries, normalisation
products/ ← MS export, HDF5 shards, cube export
compute/ ← backend abstraction
archive/ ← ASDM unpack, calibration apply
astro/ ← spectral lines, redshift, parameters
skymodels/ ← source model implementations
backend/ ← FastAPI service (Docker: ghcr.io/…/almasim-backend)
frontend/ ← Svelte UI (requires Docker Compose)
examples/ ← CLI scripts and Jupyter notebooks
The library layer owns all domain logic. The backend is a thin adapter over library services. CLI scripts and notebooks call the same staged services directly.
pip install almasim
The package installs the almasim command via the Python entry point:
pip install almasim
almasim --help
If you are working from a local clone, install the project into a virtual environment and use the CLI directly:
git clone https://github.com/MicheleDelliVeneri/ALMASim.git
cd ALMASim
pip install uv
uv sync --group dev
uv run almasim --help
For an editable local install that exposes almasim on your shell path:
git clone https://github.com/MicheleDelliVeneri/ALMASim.git
cd ALMASim
pip install -e .
almasim --help
Once installed, the CLI exposes the main workflows:
almasim metadata --help
almasim products --help
almasim simulation --help
almasim clean --help
casatools and casatasks wheels are Linux-only. Install the optional [casa] extra on a supported Linux system:
pip install "almasim[casa]"
The [casa] extra enables:
- Native MeasurementSet export via casatools
- ASDM-to-MS conversion via casatasks.importasdm
- Calibration application via casatasks.applycal
Without [casa], all simulation, imaging, metadata, and download features still work. The MS export path falls back to python-casacore if available:
pip install "almasim[ms-casacore]"
git clone https://github.com/MicheleDelliVeneri/ALMASim.git
cd ALMASim
pip install uv
uv sync --group dev
The FastAPI backend and Svelte frontend require Docker Compose:
git clone https://github.com/MicheleDelliVeneri/ALMASim.git
cd ALMASim
docker compose up
The backend image is available pre-built from GHCR:
docker pull ghcr.io/micheledelliveneri/almasim-backend:latest
from almasim.services.metadata.tap.service import query_by_science_type, InclusionFilters
df = query_by_science_type(
include=InclusionFilters(science_keyword=["Galaxies"], band=[6])
)
print(df[["ALMA_source_name", "Band", "spatial_resolution"]].head())
from almasim import SimulationParams, run_simulation
from pathlib import Path
params = SimulationParams.from_metadata_row(
row, # pandas Series from a metadata query
idx=0,
main_dir=Path("src/almasim"),
output_dir=Path("output"),
project_name="my_project",
)
result = run_simulation(params)
from almasim import (
SimulationParams,
generate_clean_cube,
simulate_observation,
image_products,
export_results,
)
params = SimulationParams.from_metadata_row(row, idx=0, ...)
cube_result = generate_clean_cube(params)
obs_result = simulate_observation(params, cube_result)
img_result = image_products(params, obs_result)
export_results(params, cube_result, obs_result, img_result)
The pipeline is split into four composable stages:
| Stage | Function | What it does |
|---|---|---|
| 1 | generate_clean_cube() | Build sky cube from skymodel, apply background |
| 2 | simulate_observation() | Run interferometric + TP simulation, return dirty products |
| 3 | image_products() | Deconvolve, combine INT+TP, build image cubes |
| 4 | export_results() | Write cubes, ML shards, parameter summaries to disk |
run_simulation() orchestrates all four in sequence.
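The relationship between the four stages and run_simulation() can be pictured with a minimal, library-independent sketch. The stage bodies below are hypothetical stand-ins that return plain dictionaries rather than ALMASim result objects; only the call order and the way each result feeds the next stage mirror the staged API shown above.

```python
from typing import Any, Dict, List

# Hypothetical stand-ins: each stage returns a small dict instead of real cubes.
def generate_clean_cube(params: Dict[str, Any]) -> Dict[str, Any]:
    return {"stage": "clean_cube", "project": params["project_name"]}

def simulate_observation(params: Dict[str, Any], cube_result: Dict[str, Any]) -> Dict[str, Any]:
    return {"stage": "observation", "input": cube_result["stage"]}

def image_products(params: Dict[str, Any], obs_result: Dict[str, Any]) -> Dict[str, Any]:
    return {"stage": "imaging", "input": obs_result["stage"]}

def export_results(params, cube_result, obs_result, img_result) -> List[str]:
    # A real exporter would write files; here we only report what would be written.
    return [r["stage"] for r in (cube_result, obs_result, img_result)]

def run_simulation(params: Dict[str, Any]) -> Dict[str, Any]:
    """Chain the four stages; each stage consumes the previous stage's result."""
    cube_result = generate_clean_cube(params)
    obs_result = simulate_observation(params, cube_result)
    img_result = image_products(params, obs_result)
    written = export_results(params, cube_result, obs_result, img_result)
    return {"cube": cube_result, "obs": obs_result, "img": img_result, "written": written}

result = run_simulation({"project_name": "demo"})
print(result["written"])  # prints ['clean_cube', 'observation', 'imaging']
```

Because every stage returns its result explicitly, a notebook can keep cube_result and rerun only the later stages with different settings.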
write_ml_dataset_shard() exports an HDF5 shard (clean cube + dirty cube + dirty visibilities + UV mask + metadata) independently of the main export path.
estimate_simulation_footprint() returns resolved pixel count, channel count, cell size, beam size, and raw output size in GiB — useful for pre-run capacity checks.
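For intuition about the GiB figure, the arithmetic behind a raw cube size can be sketched by hand. This is not the library's formula: raw_cube_size_gib is a hypothetical helper that assumes a square image, one value per voxel, and 4-byte (float32) storage.

```python
def raw_cube_size_gib(n_pix: int, n_channels: int,
                      bytes_per_value: int = 4, n_cubes: int = 1) -> float:
    """Size in GiB of n_cubes cubes of shape (n_channels, n_pix, n_pix)."""
    n_values = n_pix * n_pix * n_channels * n_cubes
    return n_values * bytes_per_value / 2**30  # 1 GiB = 2**30 bytes

size = raw_cube_size_gib(n_pix=1024, n_channels=256)
print(f"{size:.2f} GiB")  # prints 1.00 GiB
```

A 1024 x 1024 image with 256 channels is already 1 GiB per float32 cube, so the total grows quickly once dirty, beam, and UV cubes are counted as well.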
Full reference: Simulation docs
| Source type | Description |
|---|---|
| point | Point source: PSF and CLEAN validation |
| gaussian | 2-D Gaussian: compact extended source |
| extended | TNG-backed realistic extended emission |
| galaxy-zoo | Galaxy Zoo image morphology prior |
| hubble-100 | Hubble Top-100 image morphology prior |
| molecular | Molecular cloud structured emission |
| diffuse | Correlated diffuse emission field |
All skymodels accept explicit source_offset_x_arcsec / source_offset_y_arcsec to shift the science target from phase center.
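As an illustration of what an arcsecond offset means on the pixel grid, the sketch below converts an offset into a pixel position. The helper offset_to_pixels and its sign convention (positive x maps to increasing column index) are assumptions made for this example; ALMASim's own convention may differ.

```python
from typing import Tuple

def offset_to_pixels(offset_x_arcsec: float, offset_y_arcsec: float,
                     cell_size_arcsec: float, n_pix: int) -> Tuple[int, int]:
    """Return the (column, row) pixel of a source offset from the image centre."""
    center = n_pix // 2  # phase centre assumed to sit on the central pixel
    dx = round(offset_x_arcsec / cell_size_arcsec)
    dy = round(offset_y_arcsec / cell_size_arcsec)
    return center + dx, center + dy

# 1.5" east-west and -0.5" north-south on a 256-pixel image with 0.1" cells
print(offset_to_pixels(1.5, -0.5, cell_size_arcsec=0.1, n_pix=256))  # prints (143, 123)
```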
Additive background sky (independent of the main source):
| Mode | Effect |
|---|---|
| blank_field_dsfg | Faint dusty star-forming galaxies |
| dusty_diffuse | Correlated low-spatial-frequency dusty background |
| combined | Both of the above |
Full reference: Skymodels docs
Select via SimulationParams.compute_backend:
| Backend | Use case |
|---|---|
| sync | Notebooks, examples, debugging |
| local | Local CPU parallelism |
| dask | Distributed execution, cluster scheduling |
| slurm | HPC job submission |
| kubernetes | Cluster-native environments |
Full reference: Compute docs
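The idea of selecting a backend by name behind one interface can be sketched with the standard library alone. Everything here is hypothetical (BACKENDS, run_tasks, and the two runner functions are illustration names, not ALMASim's compute layer), and a thread pool stands in for a local worker pool to keep the example short.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List

def run_sync(fn: Callable, items: List) -> List:
    # Plain loop: easiest to debug, no parallelism.
    return [fn(x) for x in items]

def run_pool(fn: Callable, items: List, workers: int = 4) -> List:
    # Thread pool used as a stand-in for a local pool; map() preserves input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))

# Name-to-runner registry; only two of the documented names are sketched here.
BACKENDS: Dict[str, Callable] = {"sync": run_sync, "local": run_pool}

def run_tasks(backend: str, fn: Callable, items: Iterable) -> List:
    """Dispatch the same work to whichever backend was named."""
    try:
        runner = BACKENDS[backend]
    except KeyError:
        raise ValueError(f"unknown compute backend: {backend!r}") from None
    return runner(fn, list(items))

def square(x: int) -> int:
    return x * x

print(run_tasks("sync", square, range(5)))   # prints [0, 1, 4, 9, 16]
print(run_tasks("local", square, range(5)))  # prints [0, 1, 4, 9, 16]
```

Calling code only sees run_tasks, so switching backends is a one-string change, which is the property a backend-agnostic service layer aims for.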
from almasim.services.metadata.tap.service import (
query_by_science_type,
InclusionFilters,
ExclusionFilters,
)
df = query_by_science_type(
include=InclusionFilters(
science_keyword=["Galaxies"],
band=[6, 7],
public_only=True,
science_only=True,
),
exclude=ExclusionFilters(solar=True),
)
from pathlib import Path
from almasim.services.download import resolve_products, run_download_job
products = resolve_products(df["member_ous_uid"].tolist())
run_download_job(products, destination=Path("downloads"), extract_tar=True)
Full reference: Metadata docs · Downloads docs
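To give a feel for how inclusion filters like those in the metadata query above could translate into a TAP query, the sketch below assembles an ADQL WHERE clause. It is not how ALMASim builds its queries: build_where is a hypothetical helper, and the column names (science_keyword, band_list, data_rights, science_observation) are assumed to follow the ALMA archive's ObsCore table.

```python
from typing import Iterable

def build_where(science_keywords: Iterable[str], bands: Iterable[int],
                public_only: bool = True, science_only: bool = True) -> str:
    """Assemble an ADQL WHERE clause from simple inclusion filters."""
    clauses = []
    keywords = list(science_keywords)
    if keywords:
        # Double any single quote so keyword text cannot break the string literal.
        parts = ["science_keyword LIKE '%{}%'".format(k.replace("'", "''")) for k in keywords]
        clauses.append("(" + " OR ".join(parts) + ")")
    band_list = [int(b) for b in bands]
    if band_list:
        # Substring match is a simplification: '%1%' would also match band 10.
        parts = [f"band_list LIKE '%{b}%'" for b in band_list]
        clauses.append("(" + " OR ".join(parts) + ")")
    if public_only:
        clauses.append("data_rights = 'Public'")
    if science_only:
        clauses.append("science_observation = 'T'")
    return " AND ".join(clauses)

print(build_where(["Galaxies"], [6, 7]))
```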
The FastAPI backend exposes library services over HTTP and drives the Svelte frontend.
| Endpoint group | Purpose |
|---|---|
| /api/v1/metadata | TAP queries and metadata management |
| /api/v1/simulation | Simulation job submission and status |
| /api/v1/download | Product resolution and download jobs |
| /api/v1/imaging | Deconvolution and combination products |
| /api/v1/visualizer | Output browsing and product inspection |
| /health | Health check |
| /docs | Interactive OpenAPI docs (Swagger UI) |
Start locally for development:
cd backend
uv run uvicorn app.main:app --reload --port 8000
Full reference: Frontend docs
All examples use the sync compute backend and require no running scheduler.
| Script | Description |
|---|---|
| examples/query_metadata_cli.py | Query TAP, export metadata and product CSVs |
| examples/download_products_cli.py | Resolve and download ALMA products |
| examples/archive_ms_cli.py | Unpack ASDMs and apply calibration |
| examples/staged_pipeline_cli.py | Full pipeline: query → simulate → ML shard |
| examples/imaging_cli.py | Synthetic imaging + iterative deconvolution |
# Installable CLI (Typer)
almasim metadata query \
--science-keyword Galaxies --band 6 \
--save-csv examples/output/metadata.csv
# Extract member_ous_uid values and resolve DataLink products
almasim products resolve \
--metadata-csv examples/output/metadata.csv \
--save-member-ous-uid-list examples/output/member_ous_uids.txt \
--save-products-csv examples/output/resolved_products.csv
# Download selected product types and extract archives
almasim products download \
--products-csv examples/output/resolved_products.csv \
--product-filter all \
--destination examples/output/downloads \
--extract-tar
# Split archive stages so download/extract/unpack/calibrate can run independently
almasim products download \
--products-csv examples/output/resolved_products.csv \
--destination examples/output/downloads
almasim products extract \
--source-root examples/output/downloads
almasim products unpack \
--input-root examples/output/downloads \
--output-root examples/output/archive_ms/raw_ms \
--postprocess-backend slurm \
--slurm-workers 8
almasim products calibrate \
--input-root examples/output/downloads \
--raw-ms-root examples/output/archive_ms/raw_ms \
--output-root examples/output/archive_ms/calibrated_ms \
--postprocess-backend slurm \
--slurm-workers 8
# Download + unpack ASDM + calibrate using Slurm-backed parallel post-processing
almasim products download \
--products-csv examples/output/resolved_products.csv \
--destination examples/output/downloads \
--extract-tar \
--unpack-ms \
--generate-calibrated-visibilities \
--postprocess-backend slurm \
--slurm-queue normal \
--slurm-workers 8
# Run WSClean through ALMASim (all WSClean flags are forwarded unchanged)
almasim clean -- \
-name examples/output/imaging/demo \
-size 1024 1024 \
-scale 0.1asec \
-niter 20000 \
examples/output/archive_ms/calibrated_ms/uid___A001_X*.ms
# Run staged simulation from metadata query
almasim simulation run \
--science-keyword Galaxies \
--band 6 \
--row-idx 0 \
--project-name demo \
--ml-shard-path examples/output/demo.h5
# Query metadata for Band 6 galaxy observations
python examples/query_metadata_cli.py \
--science-keyword Galaxies --band 6 \
--save-csv examples/output/metadata.csv
# Run a staged simulation from the first metadata row
python examples/staged_pipeline_cli.py \
--metadata-csv examples/output/metadata.csv \
--row-idx 0 --project-name demo \
--ml-shard-path examples/output/demo.h5
# Iterative deconvolution demo
python examples/imaging_cli.py \
--output-dir examples/output/imaging --cycles 180 --gain 0.12
Notebook equivalents: staged_pipeline_notebook.ipynb · query_metadata_notebook.ipynb · download_products_notebook.ipynb
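To show what a cycle count and a gain typically control in CLEAN-style deconvolution, here is a one-dimensional Högbom-style sketch in pure Python. It assumes --cycles is an iteration cap and --gain a loop gain in the usual CLEAN sense; hogbom_clean_1d is a hypothetical teaching example, not the algorithm implemented in examples/imaging_cli.py.

```python
from typing import List, Tuple

def hogbom_clean_1d(dirty: List[float], psf: List[float], gain: float = 0.12,
                    cycles: int = 180, threshold: float = 1e-6) -> Tuple[List[float], List[float]]:
    """Iteratively subtract a scaled PSF at the residual peak.

    psf must have odd length, be centred, and have a peak of 1.0.
    """
    residual = list(dirty)
    model = [0.0] * len(dirty)
    half = len(psf) // 2
    for _ in range(cycles):
        # Locate the strongest residual (by absolute value).
        peak_idx = max(range(len(residual)), key=lambda i: abs(residual[i]))
        peak = residual[peak_idx]
        if abs(peak) < threshold:
            break
        flux = gain * peak            # remove only a fraction of the peak per cycle
        model[peak_idx] += flux
        for k, p in enumerate(psf):   # subtract the shifted, scaled PSF
            j = peak_idx + k - half
            if 0 <= j < len(residual):
                residual[j] -= flux * p
    return model, residual

psf = [0.5, 1.0, 0.5]
dirty = [0.0] * 4 + [1.0, 2.0, 1.0] + [0.0] * 4  # point source of flux 2 at index 5
model, residual = hogbom_clean_1d(dirty, psf)
print(round(model[5], 3))  # prints 2.0
```

A smaller gain needs more cycles to reach the same residual level but is less likely to overshoot on extended emission.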
examples/e2e_archive_pipeline.py is a reactive Marimo notebook that covers the full archive workflow interactively: query ALMA metadata → resolve DataLink products → download → unpack ASDMs → apply calibration.
# Install dev dependencies (includes marimo)
uv sync --group dev
# Interactive editing mode — cells re-run automatically as you edit
marimo edit examples/e2e_archive_pipeline.py
# Read-only app mode — run the pipeline step-by-step via the UI
marimo run examples/e2e_archive_pipeline.py
Steps 4 (unpack) and 5 (calibrate) require CASA tools (Linux x86-64 only):
pip install "almasim[casa]"
The notebook saves query filter presets as .query.json files so they can be reloaded across sessions.
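Saving and reloading a filter preset as JSON can be sketched as follows. The preset layout (include and exclude keys) and the helper names save_preset and load_preset are assumptions for illustration; the notebook's actual .query.json schema is not documented here.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict

def save_preset(path: Path, filters: Dict[str, Any]) -> None:
    # sort_keys gives stable files that diff cleanly under version control.
    path.write_text(json.dumps(filters, indent=2, sort_keys=True), encoding="utf-8")

def load_preset(path: Path) -> Dict[str, Any]:
    return json.loads(path.read_text(encoding="utf-8"))

# Hypothetical preset mirroring the inclusion/exclusion filters used earlier.
preset = {
    "include": {"science_keyword": ["Galaxies"], "band": [6, 7]},
    "exclude": {"solar": True},
}

with tempfile.TemporaryDirectory() as tmp:
    preset_path = Path(tmp) / "galaxies_band6.query.json"
    save_preset(preset_path, preset)
    restored = load_preset(preset_path)

print(restored == preset)  # prints True
```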
Full documentation: micheledelliveneri.github.io/ALMASim
| Section | Topics |
|---|---|
| Quick Start | Installation, first simulation |
| Simulation | Staged API, SimulationParams, outputs |
| Interferometry | UV sampling, baselines, multi-config |
| Noise | PWV-aware noise model |
| Background Sky | Additive astrophysical background |
| Skymodels | Source models reference |
| Imaging | Deconvolution, TP+INT combination |
| Metadata | TAP queries, filters |
| Downloads | Product download workflow |
| Compute Backends | Sync, Dask, Slurm, Kubernetes |
| Frontend | Svelte UI workflows |
Build docs locally:
uv sync --group dev
uv run sphinx-build -b html docs/source docs/build/html
git clone https://github.com/MicheleDelliVeneri/ALMASim.git
cd ALMASim
uv sync --group dev
uv run pytest --ignore=illustris_python
uv run ruff check .
uv run ruff format .
A release is published automatically when a version tag is pushed:
# 1. Bump version in pyproject.toml and src/almasim/__version__.py
# 2. Commit and tag
git tag v2.1.11
git push origin v2.1.11
The release pipeline then:
- Validates that the tag matches pyproject.toml
- Runs the full lint + test suite
- Publishes wheel and sdist to PyPI via OIDC trusted publisher
- Creates a GitHub Release with auto-generated changelog and attached artifacts
- Builds and pushes the backend Docker image to GHCR
One-time PyPI setup: register a trusted publisher on PyPI with owner MicheleDelliVeneri, repo ALMASim, workflow release.yml, environment pypi.
ALMASim is released under the GNU General Public License v3.