Working with large data volumes

Authors: Ashley Smith

Abstract: Some strategies for requesting and handling larger data volumes

Note that this code can take a long time to run. If you are just testing it, adjust it to a smaller job first, for example by requesting a shorter time range.

%load_ext watermark
%watermark -i -v -p viresclient,pandas,xarray,matplotlib
2021-01-24T15:37:48+00:00

CPython 3.7.6
IPython 7.11.1

viresclient 0.7.1
pandas 0.25.3
xarray 0.15.0
matplotlib 3.1.2
from viresclient import SwarmRequest
import datetime as dt
import xarray as xr
import glob

Set up the request parameters - magnetic data and model evaluations

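The request can fetch the measurements (F, B_NEC) and model values (returned as F_CHAOS, B_NEC_CHAOS). Here the custom model is named “CHAOS”, but you can call it anything. As written, the cell below requests only F, and the model options are commented out; uncomment one of them to include model values.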

It is also possible to fetch data from all satellites at once with request.set_collection("SW_OPER_MAGA_LR_1B", "SW_OPER_MAGB_LR_1B", "SW_OPER_MAGC_LR_1B"). Each satellite is then identified in the returned data by the Spacecraft column.
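As a small sketch of the multi-satellite option, the three collection names can be generated from the spacecraft letters and unpacked into the call. The variable names `spacecraft` and `collections` are illustrative only, and the final call is left commented out because it needs a configured `SwarmRequest` instance.

```python
# Build the low-rate magnetic collection names for Swarm Alpha, Bravo and Charlie
spacecraft = ["A", "B", "C"]
collections = [f"SW_OPER_MAG{sc}_LR_1B" for sc in spacecraft]
print(collections)

# With a SwarmRequest instance named `request`, as created in this notebook:
# request.set_collection(*collections)
```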

request = SwarmRequest()
request.set_collection("SW_OPER_MAGA_LR_1B")  # Swarm Alpha
request.set_products(
    measurements=["F"],
    # Choose between the full CHAOS model (will be a lot slower - the MMA part could use some optimisation(?))
    # models=["CHAOS = 'CHAOS-6-Core' + 'CHAOS-6-Static' + 'CHAOS-6-MMA-Primary' + 'CHAOS-6-MMA-Secondary'"],
    # ...or just the core part:
    # models=["CHAOS = 'CHAOS-Core'"],
    sampling_step="PT60S"
)
# Quality Flags
# https://earth.esa.int/web/guest/missions/esa-eo-missions/swarm/data-handbook/level-1b-product-definitions#label-Flags_F-and-Flags_B-Values-of-MDR_MAG_LR
# NB: Swarm Charlie needs different handling, because its ASM broke and so its Flags_F values are bad
request.set_range_filter("Flags_F", 0, 1)
request.set_range_filter("Flags_B", 0, 1)
<viresclient._client_swarm.SwarmRequest at 0x7f3e400d2810>
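To illustrate what the two range filters are meant to do, here is a minimal local sketch in plain Python. It assumes the bounds are inclusive and that multiple filters are combined so a record must pass all of them. The records and the `in_range` helper are hypothetical; the real filtering happens on the server.

```python
# Hypothetical records standing in for rows of returned data (values are made up)
records = [
    {"F": 22867.5, "Flags_F": 0, "Flags_B": 0},
    {"F": 22570.2, "Flags_F": 1, "Flags_B": 0},
    {"F": 22294.3, "Flags_F": 64, "Flags_B": 0},
    {"F": 22010.3, "Flags_F": 1, "Flags_B": 255},
]

def in_range(value, minimum, maximum):
    """Return True if minimum <= value <= maximum (inclusive bounds assumed)."""
    return minimum <= value <= maximum

# Keep only records that pass both filters, mirroring
# set_range_filter("Flags_F", 0, 1) and set_range_filter("Flags_B", 0, 1)
kept = [
    r for r in records
    if in_range(r["Flags_F"], 0, 1) and in_range(r["Flags_B"], 0, 1)
]
print(len(kept))
```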

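Fetch the data to see what the output will look like. The request below covers 2014-01-01 to 2020-01-02 and takes a couple of minutes to process, so shorten the time range if you are just testing.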

data = request.get_between(
    start_time=dt.datetime(2014,1,1),
    end_time=dt.datetime(2020,1,2)
)
[1/1] Processing:  100%|██████████|  [ Elapsed: 02:05, Remaining: 00:00 ]
      Downloading: 100%|██████████|  [ Elapsed: 00:01, Remaining: 00:00 ] (126.087MB)
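One common way to keep individual requests small is to split a long interval into shorter pieces and request each piece separately. The sketch below shows only the splitting step, using the standard library. The helper name `split_interval` and the 365-day step are assumptions for illustration, not necessarily the approach used in this notebook.

```python
import datetime as dt

def split_interval(start, end, step):
    """Yield consecutive (chunk_start, chunk_end) pairs covering start to end."""
    chunk_start = start
    while chunk_start < end:
        # The last chunk is clipped so it never runs past `end`
        chunk_end = min(chunk_start + step, end)
        yield chunk_start, chunk_end
        chunk_start = chunk_end

chunks = list(
    split_interval(
        dt.datetime(2014, 1, 1), dt.datetime(2020, 1, 2), dt.timedelta(days=365)
    )
)
for chunk_start, chunk_end in chunks:
    print(chunk_start, "->", chunk_end)
    # With a configured request, each piece could then be fetched with:
    # data = request.get_between(start_time=chunk_start, end_time=chunk_end)
```

Because each chunk ends exactly where the next begins, the pieces cover the full interval without gaps or overlaps.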
data.as_dataframe(expand=True).head()
                         Radius Spacecraft  Longitude           F   Latitude
Timestamp
2014-01-01 00:00:00  6878309.22          A -14.116674  22867.5503  -1.228938
2014-01-01 00:01:00  6878724.84          A -14.204757  22570.1752  -5.030220
2014-01-01 00:02:00  6879101.12          A -14.290944  22294.3361  -8.830961
2014-01-01 00:03:00  6879437.14          A -14.373717  22010.3060 -12.631126
2014-01-01 00:04:00  6879732.52          A -14.451429  21709.9609 -16.430683
data.as_xarray()
<xarray.Dataset>
Dimensions:     (Timestamp: 3061714)
Coordinates:
  * Timestamp   (Timestamp) datetime64[ns] 2014-01-01 ... 2020-01-01T23:59:00
Data variables:
    Spacecraft  (Timestamp) object 'A' 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A' 'A'
    Radius      (Timestamp) float64 6.878e+06 6.879e+06 ... 6.82e+06 6.82e+06
    Longitude   (Timestamp) float64 -14.12 -14.2 -14.29 ... -86.52 -86.44 -86.31
    F           (Timestamp) float64 2.287e+04 2.257e+04 ... 2.605e+04 2.744e+04
    Latitude    (Timestamp) float64 -1.229 -5.03 -8.831 ... -40.84 -44.68 -48.52
Attributes:
    Sources:         ['SW_OPER_MAGA_LR_1B_20140101T000000_20140101T235959_050...
    MagneticModels:  []
    RangeFilters:    ['Flags_B:0,1', 'Flags_F:0,1']
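As a rough sanity check on the size of the result, the number of samples expected at a 60-second sampling step can be compared with the Timestamp dimension reported above. This is only a sketch: the difference between the two numbers may come from the flag filters, from gaps in the data, or both, and the output here does not say which.

```python
import datetime as dt

start = dt.datetime(2014, 1, 1)
end = dt.datetime(2020, 1, 2)
sampling_step = dt.timedelta(seconds=60)  # corresponds to sampling_step="PT60S"

# Number of 60-second steps that fit in the requested interval
expected = (end - start) // sampling_step

# Size of the Timestamp dimension shown in the xarray output above
returned = 3061714

fraction = returned / expected
print(expected, f"{fraction:.3f}")
```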