The expected format of the components of a data product in the pipeline is described in this file on SCRC Teams.
We suggest that namespaces for Data Products should contain only ASCII letters, ASCII digits, underscores and dashes (`A-Za-z0-9_-`). Data Product names within a namespace can also include forward slashes (`/`) to denote structure. Component names in TOML files should use the same characters as namespaces (`A-Za-z0-9_-`); these are exactly the characters allowed in TOML bare keys. Component names in HDF5 files, like Data Product names, can also include forward slashes (`/`) to denote sub-components. None of these conventions are currently enforced, but we suggest that everyone follows them until we find a reason to change them.
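Because none of these conventions are enforced yet, a simple check could flag non-conforming names. This is a minimal Python sketch: the function names are hypothetical, and rejecting empty `/`-separated parts (as in `a//b`) is an assumption not stated above.

```python
import re

# Namespaces and TOML component names: one or more of A-Za-z0-9_-
# (the characters TOML allows in bare keys).
BARE_NAME = re.compile(r"[A-Za-z0-9_-]+")

# Data product names and HDF5 component names may also use "/" to denote
# structure. Assumption: every "/"-separated part must be non-empty.
STRUCTURED_NAME = re.compile(r"[A-Za-z0-9_-]+(?:/[A-Za-z0-9_-]+)*")


def is_bare_name(name: str) -> bool:
    """Hypothetical check for namespaces and TOML component names."""
    return BARE_NAME.fullmatch(name) is not None


def is_structured_name(name: str) -> bool:
    """Hypothetical check for data product names and HDF5 component names."""
    return STRUCTURED_NAME.fullmatch(name) is not None
```

For example, `is_structured_name("human/infection/SARS-CoV-2/latent-period")` is true, while `is_bare_name` rejects the same string because it contains slashes.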
Data Products in this repository should be stored in folders according to their namespace, data product name and version number. So version `v0.0.1` of the `human/infection/SARS-CoV-2/latent-period` Data Product in the `SCRC` namespace should be found in `SCRC/human/infection/SARS-CoV-2/latent-period` and called `v0.0.1.toml`. Following this convention will make it easy to browse the repository.
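The folder convention amounts to joining the namespace, the data product name and the version. A minimal sketch, using a hypothetical helper `data_product_path` and assuming `/`-separated names:

```python
from pathlib import PurePosixPath


def data_product_path(namespace: str, data_product: str, version: str) -> PurePosixPath:
    """Hypothetical helper: <namespace>/<data product name>/<version>.toml."""
    # Split the data product name on "/" so each part becomes a folder;
    # PurePosixPath ignores empty parts, so a stray leading "/" cannot
    # turn the result into an absolute path.
    return PurePosixPath(namespace, *data_product.split("/"), f"{version}.toml")
```

With the example above, `data_product_path("SCRC", "human/infection/SARS-CoV-2/latent-period", "v0.0.1")` gives `SCRC/human/infection/SARS-CoV-2/latent-period/v0.0.1.toml`.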
TOML files can currently store three types of information: a point estimate, a distribution, or a set of samples.
```toml
[latent-period]
type = "point-estimate"
value = 123.12
```

```toml
[latent-period]
type = "distribution"
distribution = "gamma"
shape = 1
scale = 2
```

```toml
[latent-period]
type = "samples"
samples = [1.0, 2.0, 3.0, 4.0, 5.0]
```
In the examples above, each file has a single component called `latent-period` in the data product. If a data product has only one component, we suggest giving it the same name as the last part of the data product's name in the namespace, so for `human/infection/SARS-CoV-2/latent-period` this would be `latent-period`. This will be the default if no component name is given in a function call.
You can have multiple components of any kind in a single data product. For example:
```toml
[latent-period]
type = "point-estimate"
value = 123.12

[asymptomatic-period]
type = "point-estimate"
value = 200.1
```
The only further constraint is that all of the component names (here `latent-period` and `asymptomatic-period`) must be different.
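A TOML parser will already reject a file that defines the same table twice, but a writer can catch duplicate component names before producing the file. This hypothetical `write_point_estimates` sketch takes a list of `(name, value)` pairs rather than a dict, so that repeated names can still be detected.

```python
def write_point_estimates(estimates: list[tuple[str, float]]) -> str:
    """Hypothetical writer: serialise several point-estimate components into
    one TOML document, refusing duplicate component names."""
    names = [name for name, _ in estimates]
    duplicates = sorted({n for n in names if names.count(n) > 1})
    if duplicates:
        raise ValueError(f"duplicate component names: {duplicates}")
    # One [table] per component; repr() of a float is valid TOML for it.
    blocks = [
        f'[{name}]\ntype = "point-estimate"\nvalue = {value!r}\n'
        for name, value in estimates
    ]
    return "\n".join(blocks)
```

Passing `[("latent-period", 123.12), ("asymptomatic-period", 200.1)]` reproduces the two-component example, while repeating a name raises `ValueError`.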