MsRaster

MsRaster is an application for 2-dimensional visualization and flagging of visibility and spectrum data.

Infrastructure

Like Interactive Clean, MsRaster utilizes the Bokeh backend for plotting. Bokeh has built-in plot tools allowing the user to zoom, pan, select regions, inspect data values, and save the plot. Additional libraries are used in MsRaster for data I/O, logging, plotting, and interactive dashboards:

../_images/bokeh_logo.svg ../_images/xradio_logo.webp ../_images/graphviper_logo.jpeg ../_images/hvplot.png ../_images/holoviews.png ../_images/panel.png
  • XRADIO (Xarray Radio Astronomy Data I/O) implements the MeasurementSet v4.0.0 schema using Xarray to provide an interface for radio astronomy data

  • toolviper is used for creating the Dask.distributed client and for logging

  • graphviper is used for Dask-based MapReduce to calculate statistics

  • Holoviz library hvPlot allows easy visualization of Xarray data objects as interactive Bokeh plots

  • Holoviz library Holoviews allows the ability to easily layout and overlay plots

  • Holoviz library Panel streamlines the development of apps and dashboards for the raster plots

Implementation

MsRaster creates raster plots for the user to view interactively or save to disk. The app can be used in three ways from Python:

  • create raster plots exported to .png files

  • create interactive Bokeh raster plots shown in a browser window

  • show a GUI dashboard in a browser window for selecting plot parameters to create interactive Bokeh raster plots

Data Exploration

XRADIO allows the user to explore MeasurementSet data with a summary of its metadata, and to make plots of antenna positions and phase center locations of all fields. These features can be accessed in MsRaster, as well as the ability to list MSv4 data groups and antenna names to aid in selection.

Raster Plots

MsRaster gives the user flexibility to set plot axes and the complex component, select data, aggregate along one or more data dimensions, iterate along a data dimension, style the plot, and layout multiple plots in a grid. All parameters available from the MsRaster function calls are also available in the interactive GUI.

Installation

Requirements

To save plots, additional packages are required:

  • Selenium

  • geckodriver for firefox, or chromedriver for chromium

Install

You may want to use the conda environment manager from miniforge to create a clean, self-contained runtime where casagui and the MsRaster dependencies can be installed:

conda create --name casagui python=3.12 --no-default-packages
conda activate casagui

Install required packages:

pip install casagui xradio graphviper hvplot

To install xradio with python-casacore for MSv2 conversion:

pip install "xradio[python-casacore]"

Note

On macOS it is required to pre-install python-casacore using conda install -c conda-forge python-casacore.

It is also possible to use casatools as the backend for reading the MSv2. See the XRADIO casatools setup guide for more information.

Install packages to save plots to file (choose one):

conda install -c conda-forge selenium firefox geckodriver
conda install -c conda-forge selenium python-chromedriver-binary
pip install selenium chromedriver-binary

Dask.distributed Scheduler

For parallel processing workflows, you can set up a local Dask cluster using ToolVIPER. Dask.distributed is a centrally managed, distributed, dynamic task scheduler.

Prior to using MsRaster, you may elect to start a Dask Client (local machine) or a Dask LocalCluster (cluster). For a local client, set the number of cores and memory limit per core to use. The logging level for the main thread and the worker threads may also be set (default “INFO”). When plotting small datasets, however, this adds overhead which may make plotting slower than without the client.

ToolVIPER has an interface to create and access a local or distributed client. See the client tutorial and help(local_client) or help(distributed_client) for more information.

Warning

If Python scripts are used to make plots, the client should not be created in the main thread. For more details, see Standalone Python scripts in the Dask Scheduling documentation.

Using MsRaster to Create Plots

In this simple example with no GUI, we import MsRaster, construct an MsRaster object, create a raster plot with default parameters, and show the interactive Bokeh plot in a browser tab:

from casagui.apps import MsRaster
msr = MsRaster(ms=myms)
msr.plot()
msr.show()

Construct MsRaster Object

>>> msr = MsRaster(ms=None, log_level='info', show_gui=False)
  • ms (str): path to MSv2 (usually .ms extension) or MSv4 (usually .zarr extension) file. Required when show_gui=False.

  • log_level (str): logging threshold. Options include ‘debug’, ‘info’, ‘warning’, ‘error’, ‘critical’.

  • show_gui (bool): whether to launch the interactive GUI in a browser tab.

MsRaster can be constructed with the ms path to a MSv2 or MSv4 file. If a MSv2 path is supplied, it will automatically be converted to the MSv4 zarr format in the same directory as the MSv2 with the extension .ps.zarr. For more information on the MSv4 data format, see the XRADIO Measurement Set Tutorial.

Warning

MSv2 files will be converted to zarr using the xradio default partitioning: data_description (spectral window and polarization setup), observation mode, and field. If the MSv2 to be converted has numerous fields, such as a mosaic, it is best to convert the MSv2 to zarr without field partitioning prior to using MsRaster.

The log_level can be set to the desired level, with log messages output to the Python console. There is currently no log file option.

At construction, we decide whether to create the plots using MsRaster functions (showgui=False) or using the interactive GUI (showgui=True). When the GUI is shown, using MsRaster functions does not update the GUI widget selections or the plot. The ms parameter is required when showgui=False, but optional when this path can be set in the GUI.

Explore MS Data

Several MsRaster functions are available for exploring the MeasurementSet. The .zarr file is opened as an XRADIO ProcessingSet, which is an Xarray DataTree composed of MSv4s (see Measurement Set Tutorial for more information). XRADIO has added custom functions for data exploration in its ProcessingSetXdt class, which are available in MsRaster.

Some functions require that the MSv4 data groups name is specified. Data groups are a set of related visibility/spectrum data, uvw, flags, and weights. Use data_groups() to get a list of the data groups available in the MS:

>>> msr.data_groups()
{'base', 'model', 'corrected'}

Use summary() to view ProcessingSetXdt metadata for a specified MSv4 data groups name, displayed in a tabular format. These column names and values can be used to do selection for a plot:

>>> msr.summary(data_group='base', columns=None)
                    name                                                                            intents              shape polarization                        scan_name                         spw_name     field_name   source_name line_name                       field_coords  start_frequency  end_frequency
0  sis14_twhya_selfcal_0 [CALIBRATE_BANDPASS#ON_SOURCE, CALIBRATE_PHASE#ON_SOURCE, CALIBRATE_WVR#ON_SOURCE]  (20, 210, 384, 2)     [XX, YY]                              [4] ALMA_RB_07#BB_2#SW-01#FULL_RES_0  [J0522-364_0] [J0522-364_0]        []  [fk5, 5h22m57.98s, -36d27m30.85s]     3.725331e+11   3.727669e+11
1  sis14_twhya_selfcal_1 [CALIBRATE_BANDPASS#ON_SOURCE, CALIBRATE_PHASE#ON_SOURCE, CALIBRATE_WVR#ON_SOURCE]  (20, 171, 384, 2)     [XX, YY]                             [33] ALMA_RB_07#BB_2#SW-01#FULL_RES_0      [3c279_6]     [Unknown]        []  [fk5, 12h56m11.17s, -5d47m21.52s]     3.725331e+11   3.727669e+11
2  sis14_twhya_selfcal_2    [CALIBRATE_AMPLI#ON_SOURCE, CALIBRATE_PHASE#ON_SOURCE, CALIBRATE_WVR#ON_SOURCE]  (20, 190, 384, 2)     [XX, YY]                              [7] ALMA_RB_07#BB_2#SW-01#FULL_RES_0      [Ceres_2] [J1037-295_2]        []   [fk5, 6h10m15.95s, 23d22m06.91s]     3.725331e+11   3.727669e+11
3  sis14_twhya_selfcal_3                               [CALIBRATE_PHASE#ON_SOURCE, CALIBRATE_WVR#ON_SOURCE]  (80, 210, 384, 2)     [XX, YY] [10, 14, 18, 22, 26, 30, 34, 38] ALMA_RB_07#BB_2#SW-01#FULL_RES_0  [J1037-295_3]    [TW Hya_3]        [] [fk5, 10h37m16.08s, -29d34m02.81s]     3.725331e+11   3.727669e+11
4  sis14_twhya_selfcal_4                                                         [OBSERVE_TARGET#ON_SOURCE] (270, 210, 384, 2)     [XX, YY]         [12, 16, 20, 24, 28, 36] ALMA_RB_07#BB_2#SW-01#FULL_RES_0     [TW Hya_5]     [3c279_4]        [] [fk5, 11h01m51.80s, -34d42m17.37s]     3.725331e+11   3.727669e+11

If a subset of columns is of interest, set the columns parameter to one or more column names:

>>> msr.summary(data_group='base', columns='intents')
>>> msr.summary(data_group='base', columns=['spw_name', 'shape'])
>>> msr.summary(data_group='base', columns=['spw_name', 'start_frequency', 'end_frequency'])

To view the summary information grouped by MSv4, set columns='by_ms' (only the first MS is shown here):

>>> msr.summary(data_group='base', columns='by_ms')
MSv4 name: sis14_twhya_selfcal_0
intent: ['CALIBRATE_BANDPASS#ON_SOURCE', 'CALIBRATE_PHASE#ON_SOURCE', 'CALIBRATE_WVR#ON_SOURCE']
shape: 20 times, 210 baselines, 384 channels, 2 polarizations
polarization: ['XX' 'YY']
scan_name: ['4']
spw_name: ALMA_RB_07#BB_2#SW-01#FULL_RES_0
field_name: ['J0522-364_0']
source_name: ['J0522-364_0']
line_name: []
field_coords: (fk5) 5h22m57.98s -36d27m30.85s
frequency range: 3.725331e+11 - 3.727669e+11

The antennas() function returns a list of antenna names, which can be used to select baseline, antenna1, or antenna2. This function has been combined with the XRADIO ProcessingSetXdt function to plot the antenna positions, optionally labeled by name:

>>> antenna_names = msr.antennas(plot_positions=False, label_antennas=False)
>>> print(antenna_names)
['DA42_A050', 'DA44_A068', 'DA45_A070', 'DA46_A067', 'DA48_A046', 'DA49_A029', 'DA50_A045',
'DV02_A077', 'DV05_A082', 'DV06_A037', 'DV08_A021', 'DV10_A071', 'DV13_A072', 'DV15_A074',
'DV16_A069', 'DV17_A138', 'DV18_A053', 'DV19_A008', 'DV20_A020', 'DV22_A011', 'DV23_A007']

When plotting antenna positions with label_antennas=False, the antenna names can be shown by hovering over the antenna position. Sample antenna plot with label_antennas=True:

../_images/antenna_positions.png

MsRaster includes the XRADIO ProcessingSetXdt function plot_phase_centers() to plot the phase center locations of all fields in a data group. You may optionally label the fields by name:

>>> msr.plot_phase_centers(data_group='base', label_fields=False)

When label_fields=False, the central field is highlighted in red based on the closest phase center calculation, and the legend shows the central field name. Otherwise, all fields are named and the central field is red. Sample plot with label_fields=True:

../_images/phase_centers.png

Style Raster Plots

By default, MsRaster uses the Viridis colormap for unflagged data and the Reds colormap for flagged data, and shows colorbars for both types of data. Use the set_style_params() function to change these settings. These style settings will then be used for all plots for the MsRaster object:

>>> msr.set_style_params(unflagged_cmap='Viridis', flagged_cmap='Reds', show_colorbar=True,
show_flagged_colorbar=True)

Use colormaps() to get the list of available colormaps, which are the same ones used in Interactive Clean:

>>> msr.colormaps()
['Blues', 'Cividis', 'Greens', 'Greys', 'Inferno', 'Magma', 'Oranges', 'Plasma', 'Purples',
'Reds', 'Turbo', 'Viridis']

It is expected that in the future, additional style settings will be available, such as fonts for plot labels. It is also possible that in the future these settings will be able to be saved in a config.py file.

Create Raster Plot

Use plot() to create each raster plot, which can then be shown (see Show Raster Plot) or saved (see Save Raster Plot):

>>> msr.plot(x_axis='baseline', y_axis='time', vis_axis='amp', selection=None,
aggregator=None, agg_axis=None, iter_axis=None, iter_range=None, subplots=None,
color_mode=None, color_range=None, title=None, clear_plots=True)
  • x_axis and y_axis (str): select the axes to plot from the data dimensions ‘time’, ‘baseline’ (‘antenna_name’ for spectrum data), ‘frequency’, and ‘polarization’. For spectrum data, ‘baseline’ is equivalent to ‘antenna_name’ since it is the default x_axis.

    • Default x_axis: baseline

    • Default y_axis: time

  • vis_axis (str): The complex component of the visibility data has options ‘amp’, ‘phase’, ‘real’, and ‘imag’. For spectrum data, only ‘amp’ and ‘real’ are valid, with ‘amp’ equivalent to ‘real’ since it is the default vis_axis.

    • Default vis_axis: amp

  • selection (None, dict): select data using a dictionary of keyword/value pairs. Selections may be made from the XRADIO ProcessingSetXdt or MSv4 by value:

    • ProcessingSetXdt: use the summary column names as keys and the column value(s) as valid options (see summary()). Since the ProcessingSetXdt summary is a Pandas DataFrame, selection can also be done as a Pandas query with keyword 'query'. The MSv4 data groups name for correlated data may also be selected (see data_groups()). Some selections will be made automatically if not user-selected (see note below). Example selections:

      • selection = {‘data_group’: ‘corrected’, ‘intents’: ‘OBSERVE_TARGET#ON_SOURCE’, ‘field_name’: ‘TW Hya_5’}

      • selection = {‘query’: ‘field_name=[“3c279_6”, “TW Hya_5”], scan_name=[“33”, “36”]’}

      • selection = {‘query’: ‘start_frequency > 100e9 AND end_frequency < 200e9’}

    • MSv4: select the data dimensions. Some selections will be made automatically if not user-selected (see note below):

      • 'time': ???

      • 'frequency': ???

      • 'baseline', 'antenna1' and antenna2': select antenna names or baselines as ‘ant1_name & ant2_name’ (see antennas()).

      • 'polarization': use summary(columns=’polarization’) for valid options (see summary()).

    • Default selection: None (use automatic selection).

Note

  • Automatic selection: Since MsRaster creates a 2D plot from 4D data, some selections must be done automatically if not selected by the user. These selections include:

    • spw_name: for consistent data shapes, the first spectral window (by spectral_window_id) is selected.

    • data_group: default ‘base’.

    • data dimensions: the dimensions not chosen for x_axis, y_axis, aggregation agg_axis or iteration iter_axis are automatically selected. After user selections are applied, the set of dimension values are sorted, and the first is used. Time and frequency are numeric values. For baseline, names are formed as ‘ant1_name & ant2_name’ strings and sorted, then the first pair in the list is selected. For polarization, the names are converted to casacore Stokes enum values, sorted, and the first in the list is selected and converted back to a name for value selection.

  • aggregator (None, str): the reduction function to apply along the agg_axis dimension(s) of all correlated data in the data group (VISIBILITY/SPECTRUM, WEIGHT, and FLAG). Aggregator options include ‘max’, ‘mean’, ‘min’, ‘std’, ‘sum’, and ‘var’. The reduction is applied after the vis_axis component is applied. Flags and weights are not used in the aggregation but are aggregated.

    • Default aggregator: None

  • agg_axis (None, str, list): which dimension(s) to apply the aggregator across. Cannot include x_axis or y_axis. Ignored if aggregator=None. The entire dimension(s) are aggregated. If agg_axis is a single dimension, the remaining dimension is selected automatically as described above. If agg_axis=None, all remaining dimensions are aggregated.

    • Default agg_axis: None

  • iter_axis (None, str): dimension along which to iterate to create multiple plots. Cannot be x_axis or y_axis. Select the iteration index range with iter_range.

    • Default iter_axis: None

  • iter_range (None, tuple): (start, end) inclusive index values for creating iteration plots. Use (0, -1) for all iterations. Warning: you may trigger a MemoryError with many iterations. Iterated plots can be shown or saved in a grid (see subplots) or saved individually (see Save Raster Plot).

    • Default iter_range: None (equivalent to (0, 0), first iteration only)

  • subplots (None, tuple): set a plot layout of (rows, columns). Use with iter_axis and iter_range, or clear_plots=False. If the layout size is less than the number of plots, subplots will limit the plots in the layout. Similarly, if the number of plots is less than the subplots size, the plots will appear in the layout according to the number of columns but fewer rows. The last subplots setting will be used in Show Raster Plot and Save Raster Plot.

    • Default subplots: None (single plot, equivalent to (1, 1))

  • color_mode (None, str): Whether to limit the colorbar range. Options include None (use data limits), ‘auto’ (calculate limits for amplitude), and ‘manual’ (use color_range). ‘auto’ is equivalent to None if vis_axis is not ‘amp’. ‘manual’ is equivalent to None if color_range=None. Automatic limits for amplitudes are calculated from statistics for the unflagged data in the spectral window, which use GraphVIPER MapReduce for fast computation. The range is clipped to 3-sigma limits to brighten weaker data values.

    • Default color_mode: None (use data limits)

  • color_range (None, tuple): (min, max) of colorbar to use if color_mode='manual', else ignored.

    • Default color_range: None (use data limits)

  • title (None, str): Plot title. Options include None, ‘ms’ to generate a title from the ms name appended with iter_axis value (if any), or a custom title string.

    • Default title: None (no plot title).

  • clear_plots (bool): whether to clear the list of previous plot(s). Set to False to create multiple plots and show them in a layout with subplots. This option is not currently available in the interactive GUI.

    • Default clear_plots: True (remove previous plots)

Examples:

  • Aggregation: time vs. baseline averaged over frequency, with the first polarization selected automatically:

    >>> msr.plot(x_axis='baseline', y_axis='time', aggregator='mean', agg_axis='frequency')
    
  • Aggregation: time vs. baseline averaged over frequency, with polarization selection:

    >>> msr.plot(x_axis='baseline', y_axis='time', selection={'polarization': 'YY'},
    aggregator='mean', agg_axis='frequency')
    
  • Aggregation: time vs. baseline averaged over frequency and polarization (equivalent to agg_axis=['frequency', 'polarization']):

    >>> msr.plot(x_axis='baseline', y_axis='time', aggregator='mean')
    
  • Iteration layout: time vs. baseline for the first six frequencies in 2 rows, with the first polarization selected automatically:

    >>> msr.plot(iter_axis='frequency', iter_range=(0, 5), subplots=(2, 3))
    
  • Multiple plots layout: time vs. baseline for ‘amp’ and ‘phase’ in one row, with the first frequency and polarization selected automatically:

    >>> msr.plot(vis_axis='amp')
    >>> msr.plot(vis_axis='phase', subplots=(1, 2), clear_plots=False)
    
  • Aggregation and iteration layout: time vs. frequency for the max across baselines, iterated over all polarizations, in a 2x2 layout with a clipped colorbar range:

    >>> msr.plot(x_axis='frequency', y_axis='time', vis_axis='amp', aggregator='max',
    agg_axis='baseline', iter_axis='polarization, iter_range=(0, -1), subplots=(2, 2),
    color_mode='auto')
    

Show Raster Plot

After a plot is created (see Create Raster Plot), it may be shown in a browser tab with show(), which has no parameters:

>>> msr.show()

If multiple plots were created (with iter_axis or clear_plots=False) in a subplots layout, the plots are connected so that the Bokeh plot tools such as pan and zoom act on all plots in the layout.

To show the plot, Holoviews renders the plot to a Bokeh figure, then Bokeh show() saves the plot to an HTML file in a temporary directory and automatically opens the file in the browser.

Save Raster Plot

After a plot is created (see Create Raster Plot), it may be saved to a file with save().

>>> msr.save(filename='', fmt='auto', width=900, height=600)
  • filename (str): Name of file to save. Default ‘’: the plot will be saved as {ms}_raster.{ext}. If fmt is not set, plot will be saved as .png.

  • fmt (str): Format of file to save (‘png’, ‘svg’, ‘html’, or ‘gif’). Default ‘auto’: inferred from filename extension.

  • width (int): width of exported plot.

  • height (int): height of exported plot.

As in Show Raster Plot, multiple plots can be saved in a layout using the subplots parameter in the Create Raster Plot plot() function. This layout plot will be saved to a single file with filename.

However, if iteration plots were created and subplots is a single plot (None or (1, 1)), the iteration plots will be saved individually with a plot index appended to the filename according to the iter_range index range: {filename}_{index}.{ext}.

When save() is called, as with show() the plot is exported to an HTML file. If the file format is .png, the exported plot will be rendered as a PNG, which currently requires Selenium and either firefox and geckodriver or chromium browser and chromedriver (see Install).

Warning

If you have rendered the plot with show(), it is much faster to save the plot with the Bokeh save tool than with save().

Iteration save() examples:

  • Create plots for the first four frequencies, and save them in a 2x2 layout:

    >>> msr.plot(iter_axis='frequency', iter_range=(0, 3), subplots=(2, 2))
    >>> msr.save(filename='iter_freq_layout.png')
    
  • Create plots for a range of frequencies, and save them individually with an index (in this example: iter_freq_10.png, iter_freq_11.png, iter_freq_12.png, iter_freq_13.png):

    >>> msr.plot(iter_axis='frequency', iter_range=(10, 13))
    >>> msr.save(filename='iter_freq.png')
    
  • Create plots for all polarizations, and save them individually with an index (in this case, iter_pol_0.png, iter_pol_1.png, etc. depending on the number of polarizations):

    >>> msr.plot(iter_axis='polarization', iter_range=(0, -1))
    >>> msr.save(filename='iter_pol.png')
    

Use Interactive GUI

As mentioned in Construct MsRaster Object, use showgui=True to launch the interactive GUI. In this case, an ms path can be supplied but is not required.

>>> msr = MsRaster(ms=None, log_level='info', show_gui=True)

The GUI will immediately launch in a browser tab. If ms is set, a plot with default parameters is created and shown in the GUI.

../_images/msraster_gui.png

Each section of the plot inputs on the right side of the GUI corresponds to the MsRaster plot parameters:

  • Construct MsRaster Object parameters:

    • Select file: ms

  • Style Raster Plots parameters:

    • Plot style: unflagged_cmap, flagged_cmap, show_colorbar, show_flagged_colorbar

  • Create Raster Plot parameters:

    • Plot axes: x_axis, y_axis, vis_axis

    • Selection: selection

    • Aggregation: aggregator, agg_axis

    • Iteration: iter_axis, iter_range, subplots

    • Plot title: title

Plot: the Plot button will change from outline to solid when the plot inputs change. Click Plot to render the plot.

Note

Iterated plots in a layout will be shown in a Bokeh plot in a new browser tab. There is currently no way to create multiple plots in the GUI except by iteration.