SDF.data_model: Access, Generate and Modify SDF Objects

The Python representation of SDF objects can be found in the SDF.data_model submodule and aims to mirror the behavior of built-in data structures like dict, set and list as much as possible.

XMLWritable

The abstract class XMLWritable is the interface for all classes in SDF.data_model that directly represent an XML element. It declares two abstract methods, to_xml_element and from_xml_element.
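To illustrate the idea of such an interface, here is a minimal, self-contained sketch. It does not use SDF itself: the classes XMLWritableSketch and OwnerSketch, the method signatures and the tag name "owner" are all assumptions made for this example, and the real XMLWritable may be defined differently.

```python
from abc import ABC, abstractmethod
from xml.etree.ElementTree import Element, tostring


class XMLWritableSketch(ABC):
    """Hypothetical stand-in for an interface of XML-backed classes."""

    @abstractmethod
    def to_xml_element(self) -> Element:
        """Build an XML element from this object (assumed signature)."""

    @classmethod
    @abstractmethod
    def from_xml_element(cls, element: Element) -> "XMLWritableSketch":
        """Build an object from an XML element (assumed signature)."""


class OwnerSketch(XMLWritableSketch):
    """Hypothetical concrete class that wraps a single string."""

    def __init__(self, text: str) -> None:
        self.text = text

    def to_xml_element(self) -> Element:
        element = Element("owner")  # tag name chosen for illustration only
        element.text = self.text
        return element

    @classmethod
    def from_xml_element(cls, element: Element) -> "OwnerSketch":
        return cls(element.text or "")


owner = OwnerSketch("Me")
xml_bytes = tostring(owner.to_xml_element())
round_tripped = OwnerSketch.from_xml_element(owner.to_xml_element())
```

Because both methods are abstract, a subclass that omits either of them cannot be instantiated, which is what makes the base class usable as an interface.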

Name, Date, Owner, Comment

These classes represent atomic values: Name, Owner and Comment are just wrappers around strings (str), a Date wraps a timestamp (datetime.datetime). Users don’t have to interact with these classes directly, as each class containing a Name, Date, Owner or Comment has a property with that name (in lowercase) that returns the appropriate value.

Inheritance diagram of SDF.data_model.Name, SDF.data_model.Owner, SDF.data_model.Comment, SDF.data_model.Date, SDF.data_model.NameElement

Fig. 10 Inheritance Diagram

>>> from SDF.data_model import Workspace
>>> from datetime import datetime

>>> ws = Workspace(name="My Workspace", owner="Me")
>>> ws.name
'My Workspace'
>>> ws.owner
'Me'
>>> ws.owner = "You"
>>> ws.owner
'You'
>>> ws.date = datetime.now()  # works
>>> ws.comment = "My comment"  # works

Details:

  • Names are immutable, because they are often used as keys in dict-like data structures

  • Names cannot be empty and cannot contain multiple lines

  • Name elements (<name>...</name>) are implemented in the class NameElement, which subclasses both Name and XMLWritable

  • Owner strings are normalized: runs of multiple whitespace characters are collapsed into one space

  • Multiline Comments are de-indented: relative indentation is preserved, and the least-indented line ends up with no indentation
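The normalization rules listed above can be approximated with the standard library. The following is a sketch under stated assumptions, not the code used by SDF.data_model: check_name, normalize_owner and dedent_comment are hypothetical helpers, and the library may treat edge cases (for example leading or trailing whitespace) differently.

```python
import re
import textwrap


def check_name(name: str) -> str:
    """Reject empty or multi-line names (hypothetical helper)."""
    if not name or "\n" in name:
        raise ValueError(f"invalid name: {name!r}")
    return name


def normalize_owner(owner: str) -> str:
    """Collapse every run of whitespace into a single space."""
    return re.sub(r"\s+", " ", owner)


def dedent_comment(comment: str) -> str:
    """Remove the indentation common to all lines, keeping relative indentation."""
    return textwrap.dedent(comment)


owner = normalize_owner("Jane \t  Doe")
comment = dedent_comment("    first line\n      nested line\n    last line")
```

Here owner becomes 'Jane Doe', and comment keeps the two extra spaces in front of 'nested line' while the other lines lose their indentation.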

Samples

Each Sample has a name and a comment. Workspaces and Datasets can have multiple Samples, so both have a samples property, which behaves like a dict. Users will therefore most likely never interact with Sample objects directly.

>>> from SDF.data_model import Workspace

>>> ws = Workspace("My Workspace")
>>> ws.samples["sample 1"] = "Comment 1"
>>> ws.samples["sample 2"] = "Comment 2"
>>> ws.samples["sample 1"]
'Comment 1'

Inheritance diagram of SDF.data_model.Sample, SDF.data_model.SampleSet

Fig. 11 Inheritance Diagram
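One way to picture a container that stores Sample objects but "behaves like a dict" of names and comments is a small collections.abc.MutableMapping subclass. This is only a sketch: SampleSketch and SampleSetSketch are hypothetical stand-ins, and the real Sample and SampleSet classes may be implemented differently.

```python
from collections.abc import MutableMapping
from dataclasses import dataclass
from typing import Dict, Iterator


@dataclass
class SampleSketch:
    """Hypothetical stand-in for a sample: a name plus a comment."""
    name: str
    comment: str


class SampleSetSketch(MutableMapping):
    """Stores sample objects internally, exposes name -> comment like a dict."""

    def __init__(self) -> None:
        self._samples: Dict[str, SampleSketch] = {}

    def __getitem__(self, name: str) -> str:
        return self._samples[name].comment

    def __setitem__(self, name: str, comment: str) -> None:
        self._samples[name] = SampleSketch(name, comment)

    def __delitem__(self, name: str) -> None:
        del self._samples[name]

    def __iter__(self) -> Iterator[str]:
        return iter(self._samples)

    def __len__(self) -> int:
        return len(self._samples)


samples = SampleSetSketch()
samples["sample 1"] = "Comment 1"
samples["sample 2"] = "Comment 2"
```

Implementing these five methods is enough for MutableMapping to supply the rest of the dict interface (keys, items, get, update and so on), so callers never need to construct the sample objects themselves.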

Parameters

Parameters can be either single Parameters or ParameterSets. Both are subclasses of ParameterType.

Single Parameters

Single parameters (<par name=... value=... [unit=...] />) are represented by the Parameter class. Its attributes name, value and unit are strings (unit can be None). The constructor accepts a value of any type, but converts it to a str internally. The Parameter class has the special property parsed_value, which tries to parse the value string as a Python literal.

Users will interact with Parameter objects directly, but won’t have to create them manually, as the section about ParameterSets shows.

>>> import numpy as np
>>> from SDF.data_model import Parameter

>>> p1 = Parameter("par1", 1)
>>> p1.name
'par1'
>>> p1.value
'1'
>>> p1.parsed_value
1
>>> p1.unit is None
True

>>> p2 = Parameter("par2", np.arange(4), "N/m")
>>> p2.value
'[0, 1, 2, 3]'
>>> p2.parsed_value
[0, 1, 2, 3]
>>> p2.unit
'N/m'

Inheritance diagram of SDF.data_model.Parameter

Fig. 12 Inheritance Diagram

Details

  • There is no guarantee that parsed_value returns the originally passed value, as there are too many possible cases to cover

  • numpy.ndarray values are represented as (possibly nested) lists

  • bytes values are often represented as ASCII strs

  • lists cannot be used as parameter values, since that would make parsing much more complicated. Use tuple or numpy.ndarray instead.
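The "parse the value string as a Python literal" behavior, and the reason the original value is not always recovered, can be illustrated with ast.literal_eval. This is a minimal sketch assuming a literal_eval-style parser that falls back to the raw string; parse_value is a hypothetical helper, not part of SDF.data_model, and the actual implementation may differ.

```python
import ast


def parse_value(value: str):
    """Parse a value string as a Python literal, else return it unchanged."""
    try:
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        # Not a valid literal (for example a plain word or a unit string)
        return value


as_int = parse_value("1")
as_list = parse_value("[0, 1, 2, 3]")
as_tuple = parse_value("(1, 2, 3)")
as_text = parse_value("value1")

# Round trips are lossy: the string "1" and the integer 1 share one
# string form, so the parser cannot tell which one was passed originally.
from_str = parse_value(str("1"))
from_int = parse_value(str(1))
```

In this sketch both from_str and from_int come back as the integer 1, which is one concrete case of a parsed value that differs from the value originally passed.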

Parameter Sets

Sets of parameters (<par name=...>...</par>) are represented by the ParameterSet class. It mostly resembles a dict with str keys (the names) and Parameter or ParameterSet values.

Workspace and Dataset objects have a parameters attribute.

Inheritance diagram of SDF.data_model.ParameterSet

Fig. 13 Inheritance Diagram

There are two ways to add parameters to a ParameterSet: a type-safe way and a more user-friendly, dict-like way.

Dict-like

ParameterSets can be handled similarly to the built-in dict class.

>>> from SDF.data_model import Workspace
>>> ws = Workspace("My Workspace")

>>> # single parameters
>>> ws.parameters["par1"] = 1.8
>>> ws.parameters["par1"].parsed_value
1.8
>>> ws.parameters["par2"] = 3.1, "um"
>>> ws.parameters["par2"].parsed_value
3.1
>>> ws.parameters["par2"].unit
'um'

>>> # parameter sets
>>> ws.parameters["parset1"] = [("name1", "value1"), ("name2", "value2")]  # tuples are single parameters
>>> ws.parameters["parset1"]["name2"].value
'value2'
>>> ws.parameters["parset2"] = {"name3": (1, 2, 3), "name4": (3.14, "mm")}
>>> ws.parameters["parset2"]["name3"].parsed_value
(1, 2, 3)
>>> ws.parameters["parset2"]["name4"].unit
'mm'

Type-safe

Instead of relying on our parsing mechanism, users can explicitly create Parameter and ParameterSet objects and add them to a ParameterSet using its inherited add method.

>>> from SDF.data_model import Workspace, Parameter, ParameterSet
>>> ws = Workspace("My Workspace")

>>> # single parameters
>>> par1 = Parameter("par1", "value1")
>>> ws.parameters.add(par1)
>>> ws.parameters["par1"].value
'value1'

>>> # parameter sets
>>> parset1 = ParameterSet("parset1")
>>> parset1.add(Parameter("name2", 123, "V/m"))
>>> ws.parameters.add(parset1)
>>> ws.parameters["parset1"]["name2"].unit
'V/m'

Instruments

Instruments are implemented like ParameterSets (both share the same base class). They cannot be added to other Instruments or ParameterSets, but users can add Parameters and ParameterSets to them in the same way as with ParameterSets.

Workspaces and Datasets have an instruments property, which behaves like a set of Instruments or like a ParameterSet with Instrument instances as first-level children.

>>> from SDF.data_model import Workspace, Instrument, Parameter
>>> ws = Workspace("My Workspace")

>>> # dict-like
>>> ws.instruments["inst1"] = {"par1": "val1", "par2": ("val2", "unit2")}
>>> ws.instruments["inst1"]["par1"].value
'val1'

>>> # type-safe
>>> inst2 = Instrument("inst2")
>>> inst2.add(Parameter("par3", "val3"))
>>> ws.instruments.add(inst2)
>>> ws.instruments["inst2"]["par3"].value
'val3'

Inheritance diagram of SDF.data_model.Instrument, SDF.data_model.InstrumentSet

Fig. 14 Inheritance Diagram

Data Blocks

Data blocks (<data ...>...</data>) are represented by the abstract class Data. Currently there are three implementations:

  • ArrayData1D

  • ArrayData2D

  • ImageData

Users usually don’t interact with these wrappers directly, since the Dataset classes abstract them and provide direct access to the objects they wrap.

Inheritance diagram of SDF.data_model.ArrayData1D, SDF.data_model.ArrayData2D, SDF.data_model.ImageData

Fig. 15 Inheritance Diagram

SDF Objects

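The abstract class SDFObject implements the behavior and properties shared by Dataset and Workspace. As described in the sections above, these include:

  • name, owner, date and comment

  • samples

  • parameters

  • instruments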

Datasets

Besides the properties inherited from SDFObject, datasets contain a single Data object, and optional metadata specific to the respective Data type (type_for_xml).

The current Dataset implementations (ArrayDataset1D, ArrayDataset2D and ImageDataset) provide access to the object wrapped by Data via their data property. This, however, is only a common attribute of the current implementations and not a requirement for future ones.

>>> import numpy as np
>>> from SDF.data_model import ArrayDataset1D
>>> ds_1d = ArrayDataset1D("Dataset 1", np.arange(10), unit="s", comment="Comment 1")
>>> ds_1d.data
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> ds_1d.comment
'Comment 1'

>>> from SDF.data_model import ArrayDataset2D
>>> ds_2d = ArrayDataset2D("Dataset 2", np.random.random((2, 30)),
...     samples=dict(cell1="wild type", cell2="mutant"))
>>> ds_2d.data.shape
(2, 30)
>>> ds_2d.samples["cell1"]
'wild type'

>>> from PIL import Image
>>> from SDF.data_model import ImageDataset
>>> img = Image.fromarray(np.random.randint(0, 256, (20, 20), dtype=np.uint8))  # random 8-bit grayscale image, 20x20
>>> ds_img = ImageDataset("Dataset 3", img, owner="Santa")
>>> ds_img.data.size
(20, 20)
>>> ds_img.owner
'Santa'

Inheritance diagram of SDF.data_model.ArrayDataset1D, SDF.data_model.ArrayDataset2D, SDF.data_model.ImageDataset

Fig. 16 Inheritance Diagram

Details

Since there are many optional properties, the constructor only accepts the name and data object as positional arguments. Other arguments (owner, parameters, …) must be passed by keyword.
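In plain Python, this kind of constructor is usually written with a bare * in the signature, which makes every following argument keyword-only. The class below is a hypothetical stand-in (DatasetSketch is not part of SDF.data_model), shown only to illustrate the calling convention described above.

```python
from typing import Any, Optional


class DatasetSketch:
    """Hypothetical class: name and data are positional, the rest keyword-only."""

    def __init__(self, name: str, data: Any, *,
                 owner: Optional[str] = None,
                 comment: Optional[str] = None) -> None:
        self.name = name
        self.data = data
        self.owner = owner
        self.comment = comment


# Optional properties are passed by keyword.
ds = DatasetSketch("Dataset 1", [0, 1, 2], owner="Santa")

# Passing an optional property positionally raises a TypeError.
try:
    DatasetSketch("Dataset 1", [0, 1, 2], "Santa")
    positional_owner_rejected = False
except TypeError:
    positional_owner_rejected = True
```

Keyword-only arguments keep calls readable when many optional properties exist, and they allow new optional properties to be added later without changing the meaning of existing calls.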

Workspaces

Besides the properties inherited from SDFObject, Workspaces can wrap multiple child datasets and workspaces, which allows SDF files to be hierarchical.

>>> import numpy as np
>>> from SDF.data_model import Workspace, ArrayDataset1D

>>> ws2 = Workspace("Child workspace")
>>> ds1 = ArrayDataset1D("First dataset", np.array([1, 2, 3]))
>>> ds2 = ArrayDataset1D("Second dataset", np.array([4, 5, 6]))

>>> # ws1 is initialized with ds1 and ws2 as children
>>> ws1 = Workspace("Parent workspace", datasets=[ds1], workspaces=[ws2])
>>> ws2 in ws1.workspaces
True
>>> ws1.workspaces["Child workspace"] is ws2  # access by name
True
>>> ds1 in ws1.datasets
True
>>> ds2 in ws1.datasets
False

>>> # items can be added or removed later
>>> ws1.datasets.add(ds2)
>>> ws1.datasets.remove(ds1)

Inheritance diagram of SDF.data_model.Workspace

Fig. 17 Inheritance Diagram
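A hierarchy of workspaces and datasets is naturally processed with a recursive traversal. The following sketch uses plain dataclasses instead of the SDF classes, so WorkspaceNode, DatasetNode and walk are hypothetical names, and the path format is chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class DatasetNode:
    """Hypothetical stand-in for a dataset: only a name."""
    name: str


@dataclass
class WorkspaceNode:
    """Hypothetical stand-in for a workspace with child datasets and workspaces."""
    name: str
    datasets: List[DatasetNode] = field(default_factory=list)
    workspaces: List["WorkspaceNode"] = field(default_factory=list)


def walk(ws: WorkspaceNode, prefix: str = "") -> Iterator[str]:
    """Yield a path for the workspace, its datasets, then its child workspaces."""
    path = f"{prefix}/{ws.name}"
    yield path
    for ds in ws.datasets:
        yield f"{path}/{ds.name}"
    for child in ws.workspaces:
        yield from walk(child, path)


child = WorkspaceNode("Child workspace", datasets=[DatasetNode("Second dataset")])
parent = WorkspaceNode(
    "Parent workspace",
    datasets=[DatasetNode("First dataset")],
    workspaces=[child],
)
paths = list(walk(parent))
```

The resulting list starts with the parent workspace, followed by its own dataset, and then the child workspace with its dataset nested one level deeper.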

Full Inheritance Diagram

Inheritance diagram of SDF.data_model.ArrayData1D, SDF.data_model.ArrayData2D, SDF.data_model.ArrayDataset1D, SDF.data_model.ArrayDataset2D, SDF.data_model.Comment, SDF.data_model.Data, SDF.data_model.Dataset, SDF.data_model.Date, SDF.data_model.ElementSet, SDF.data_model.ImageData, SDF.data_model.ImageDataset, SDF.data_model.Instrument, SDF.data_model.InstrumentSet, SDF.data_model.Name, SDF.data_model.Owner, SDF.data_model.AnonymousParameterSet, SDF.data_model.Parameter, SDF.data_model.ParameterSet, SDF.data_model.Sample, SDF.data_model.SampleSet, SDF.data_model.SDFObject, SDF.data_model.SourceParameters, SDF.data_model.Workspace, SDF.data_model.XMLWritable, SDF.data_model.NameElement

Fig. 18 Inheritance Diagram