Functions you may use in your components:

List, load and save datasets

platiagro.list_datasets() → List[str]

Lists dataset names from object storage.

Returns

A list of all dataset names.

Return type

list

from platiagro import list_datasets

list_datasets()
['iris', 'boston', 'imdb']
platiagro.get_dataset(name: str, run_id: Optional[str] = None, operator_id: Optional[str] = None) → Union[pandas.core.frame.DataFrame, BinaryIO]

Retrieves the dataset response object from MinIO.

If run_id is given, gets the dataset from the specified run. If the dataset does not exist for the given run_id/operator_id, returns the ‘original’ dataset.

Parameters
  • name (str) – the dataset name.

  • run_id (str, optional) – the run id of the training pipeline. Defaults to None.

  • operator_id (str, optional) – the operator uuid. Defaults to None.

Returns

urllib3.response.HTTPResponse object

Raises

FileNotFoundError – If dataset does not exist in the object storage.
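The fallback described above (the run-specific copy first, then the ‘original’ dataset) can be pictured with a small in-memory sketch. The STORE dictionary, its key layout, and the resolve_dataset helper are hypothetical stand-ins for the object storage and are not part of platiagro; the run and operator values are made up for illustration.

```python
from typing import Optional

# Hypothetical stand-in for the object storage: keys are
# (dataset name, run_id, operator_id); (name, None, None) is the original.
STORE = {
    ("iris", None, None): "original iris",
    ("iris", "run-1", "op-1"): "iris as written by op-1 in run-1",
}


def resolve_dataset(name: str, run_id: Optional[str] = None,
                    operator_id: Optional[str] = None) -> str:
    """Return the run-specific copy if present, else the original dataset."""
    if run_id is not None and (name, run_id, operator_id) in STORE:
        return STORE[(name, run_id, operator_id)]
    if (name, None, None) in STORE:
        return STORE[(name, None, None)]
    # Mirrors the documented error for a dataset missing from storage.
    raise FileNotFoundError(f"dataset {name!r} not found")


print(resolve_dataset("iris", run_id="run-1", operator_id="op-1"))
print(resolve_dataset("iris", run_id="run-2", operator_id="op-9"))
```

The second call asks for a run that wrote nothing, so the sketch falls back to the original entry.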

from platiagro import get_dataset

dataset = "iris"

get_dataset(dataset)
<urllib3.response.HTTPResponse object at 0x7f9c4711f2e0>
platiagro.load_dataset(name: str, run_id: Optional[str] = None, operator_id: Optional[str] = None, page: Optional[int] = None, page_size: Optional[int] = None) → Union[pandas.core.frame.DataFrame, BinaryIO]

Retrieves the contents of a dataset.

If run_id is given, loads the dataset from the specified run. If the dataset does not exist for the given run_id/operator_id, returns the ‘original’ dataset.

Parameters
  • name (str) – the dataset name.

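  • run_id (str, optional) – the run id of the training pipeline. Defaults to None.

  • operator_id (str, optional) – the operator uuid. Defaults to None.

  • page (int, optional) – the page to load. Defaults to None.

  • page_size (int, optional) – the size of each page. Defaults to None.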

Returns

The contents of a dataset: either a pandas.DataFrame or a BinaryIO buffer.

Raises

FileNotFoundError – If dataset does not exist in the object storage.

from platiagro import load_dataset

dataset = "iris"

load_dataset(dataset)
        col0  col1  col2  col3  col4         col5
0  01/01/2000   5.1   3.5   1.4   0.2  Iris-setosa
1  01/01/2001   4.9   3.0   1.4   0.2  Iris-setosa
2  01/01/2002   4.7   3.2   1.3   0.2  Iris-setosa
3  01/01/2003   4.6   3.1   1.5   0.2  Iris-setosa
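Because load_dataset may return either a table or a binary buffer, calling code may want to branch on what came back. The sketch below is one way to do that; the summarize helper is hypothetical, and a plain list of rows stands in for a pandas.DataFrame so the example runs without pandas installed.

```python
import io


def summarize(content) -> str:
    """Describe a loaded dataset without assuming which type came back."""
    if hasattr(content, "read"):
        # File-like result (BinaryIO): consume the bytes.
        raw = content.read()
        return f"binary buffer with {len(raw)} bytes"
    # Tabular result: len() of a pandas.DataFrame is its row count.
    return f"table with {len(content)} rows"


# io.BytesIO stands in for a BinaryIO result; a list of rows stands in
# for a DataFrame.
print(summarize(io.BytesIO(b"\x89PNG")))
print(summarize([("5.1", "3.5"), ("4.9", "3.0")]))
```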
platiagro.save_dataset(name: str, data: Optional[Union[pandas.core.frame.DataFrame, BinaryIO]] = None, df: Optional[pandas.core.frame.DataFrame] = None, metadata: Optional[Dict[str, str]] = None, run_id: Optional[str] = None, operator_id: Optional[str] = None)

Saves a dataset and its metadata to the object storage.

Parameters
  • name (str) – the dataset name.

  • data (pandas.DataFrame, BinaryIO, optional) – the dataset contents as a pandas.DataFrame or a BinaryIO buffer. Defaults to None.

  • df (pandas.DataFrame, optional) – the dataset contents as a pandas.DataFrame. df exists only for compatibility with existing components. Use “data” for all types of datasets. Defaults to None.

  • metadata (dict, optional) – metadata about the dataset. Defaults to None.

  • run_id (str, optional) – the run id. Defaults to None.

  • operator_id (str, optional) – the operator uuid. Defaults to None.

Raises

PermissionError – If the dataset is read-only.

import pandas as pd
from platiagro import save_dataset
from platiagro.featuretypes import DATETIME, NUMERICAL, CATEGORICAL

dataset = "test"

df = pd.DataFrame({"col0": ["01/01/2000", "01/01/2001"], "col1": [1.0, -1.0], "col2": [1, 0]})
save_dataset(dataset, df, metadata={"featuretypes": [DATETIME, NUMERICAL, CATEGORICAL]})
platiagro.stat_dataset(name: str, run_id: Optional[str] = None, operator_id: Optional[str] = None) → Dict[str, str]

Retrieves the metadata of a dataset.

Parameters
  • name (str) – the dataset name.

  • run_id (str, optional) – the run id of the training pipeline. Defaults to None.

  • operator_id (str, optional) – the operator uuid. Defaults to None.

Returns

The metadata.

Return type

dict

Raises

FileNotFoundError – If dataset does not exist in the object storage.

from platiagro import stat_dataset

dataset = "test"

stat_dataset(dataset)
{'columns': ['col0', 'col1', 'col2'], 'featuretypes': ['DateTime', 'Numerical', 'Categorical']}
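The metadata returned above keeps column names and feature types in two parallel lists. Assuming the lists are positional (the first feature type belongs to the first column, which the example suggests but does not state), they can be paired like this:

```python
# Metadata in the shape shown in the example output above.
metadata = {
    "columns": ["col0", "col1", "col2"],
    "featuretypes": ["DateTime", "Numerical", "Categorical"],
}

# Pair each column with its feature type (assumes positional lists).
type_by_column = dict(zip(metadata["columns"], metadata["featuretypes"]))

# Pick out the numerical columns, e.g. to scale them later.
numerical = [c for c, t in type_by_column.items() if t == "Numerical"]

print(type_by_column)
print(numerical)
```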
platiagro.download_dataset(name: str, path: str)

Downloads the given dataset to the path.

Parameters
  • name (str) – the dataset name.

  • path (str) – destination path.

from platiagro import download_dataset

dataset = "test"
path = "./test"

download_dataset(dataset, path)
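Whether download_dataset creates missing parent directories is not stated here, so a cautious caller may want to create them first. The prepare_destination helper below is a hypothetical sketch of that step using only the standard library.

```python
import os
import tempfile


def prepare_destination(path: str) -> str:
    """Create the parent directory of `path` if needed and return `path`."""
    parent = os.path.dirname(os.path.abspath(path))
    os.makedirs(parent, exist_ok=True)
    return path


# Demonstrated on a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    target = prepare_destination(os.path.join(tmp, "data", "test"))
    parent_exists = os.path.isdir(os.path.dirname(target))
    print(parent_exists)
```

The returned path could then be handed to download_dataset(dataset, path).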

Load and save models

platiagro.load_model(experiment_id: Optional[str] = None, operator_id: Optional[str] = None) → Dict[str, object]

Retrieves a model from object storage.

Parameters
  • experiment_id (str, optional) – the experiment uuid. Defaults to None.

  • operator_id (str, optional) – the operator uuid. Defaults to None.

Returns

A dictionary of models.

Return type

dict

Raises
  • TypeError – when experiment_id is undefined in args and env.

  • TypeError – when operator_id is undefined in args and env.

from platiagro import load_model

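class Predictor(object):
    def __init__(self):
        # load_model() returns a dict of models; "model" is the key
        # used in the save_model(model=model) call
        self.model = load_model()["model"]

    def predict(self, X):
        return self.model.predict(X)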
platiagro.save_model(**kwargs)

Serializes and saves models.

Parameters

**kwargs – the models as keyword arguments.

Raises
  • TypeError – when a figure is not a matplotlib figure.

  • TypeError – when experiment_id is undefined in args and env.

  • TypeError – when operator_id is undefined in args and env.

from platiagro import save_model

model = MockModel()  # placeholder for any trained model object
save_model(model=model)
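The relationship between the two functions (keyword arguments go in, a dictionary keyed by the same names comes out) can be sketched with an in-memory stand-in. Everything below is hypothetical: the _storage dictionary replaces the object storage, the *_sketch functions are not platiagro functions, and pickle is used only because the actual serialization format is not stated here.

```python
import pickle
from typing import Dict

# Hypothetical in-memory stand-in for the object storage.
_storage: Dict[str, bytes] = {}


def save_model_sketch(**kwargs) -> None:
    """Serialize every keyword argument under its own name."""
    for name, obj in kwargs.items():
        _storage[name] = pickle.dumps(obj)


def load_model_sketch() -> Dict[str, object]:
    """Return all stored objects as a dictionary keyed by name."""
    return {name: pickle.loads(blob) for name, blob in _storage.items()}


# Plain Python values play the role of trained models here.
save_model_sketch(model={"weights": [0.5, -1.2]}, labels=["a", "b"])
loaded = load_model_sketch()
print(sorted(loaded))
print(loaded["model"]["weights"])
```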

List and save metrics

platiagro.list_metrics(experiment_id: Optional[str] = None, operator_id: Optional[str] = None, run_id: Optional[str] = None) → List[Dict[str, object]]

Lists metrics from object storage.

Parameters
  • experiment_id (str, optional) – the experiment uuid. Defaults to None.

  • operator_id (str, optional) – the operator uuid. Defaults to None.

  • run_id (str, optional) – the run id. Defaults to None.

Returns

A list of metrics.

Return type

list

Raises
  • TypeError – when experiment_id is undefined in args and env.

  • TypeError – when operator_id is undefined in args and env.

from platiagro import list_metrics

list_metrics()
[{'accuracy': 0.7}]
platiagro.save_metrics(experiment_id: Optional[str] = None, operator_id: Optional[str] = None, run_id: Optional[str] = None, **kwargs)

Saves metrics of an experiment to the object storage.

Parameters
  • experiment_id (str, optional) – the experiment uuid. Defaults to None.

  • operator_id (str, optional) – the operator uuid. Defaults to None.

  • run_id (str, optional) – the run id. Defaults to None.

  • **kwargs – the metrics dict.

Raises
  • TypeError – when experiment_id is undefined in args and env.

  • TypeError – when operator_id is undefined in args and env.

import numpy as np
import pandas as pd
from platiagro import save_metrics
from sklearn.metrics import confusion_matrix

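# y_test, y_pred and labels come from your own evaluation step
data = confusion_matrix(y_test, y_pred, labels=labels)
# use a separate name so the imported confusion_matrix function is not shadowed
cm_df = pd.DataFrame(data, columns=labels, index=labels)
save_metrics(confusion_matrix=cm_df)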
save_metrics(accuracy=0.7)
save_metrics(reset=True, r2_score=-3.0)
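The calls above, together with the list-of-dicts shape returned by list_metrics, suggest an append-style store. The sketch below illustrates that reading under two loudly stated assumptions that are not confirmed here: each metric becomes its own {name: value} entry, and reset=True discards earlier entries before saving. The *_sketch functions and the _metrics list are hypothetical.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for the stored metrics of one run.
_metrics: List[Dict[str, object]] = []


def save_metrics_sketch(reset: bool = False, **kwargs) -> None:
    """Append one {name: value} entry per metric; reset clears first."""
    if reset:
        _metrics.clear()
    for name, value in kwargs.items():
        _metrics.append({name: value})


def list_metrics_sketch() -> List[Dict[str, object]]:
    """Return a copy of everything saved so far."""
    return list(_metrics)


save_metrics_sketch(accuracy=0.7)
after_first = list_metrics_sketch()
print(after_first)

save_metrics_sketch(reset=True, r2_score=-3.0)
after_reset = list_metrics_sketch()
print(after_reset)
```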

List, save and delete figures

platiagro.list_figures(experiment_id: Optional[str] = None, operator_id: Optional[str] = None, run_id: Optional[str] = None, deployment_id: Optional[str] = None, monitoring_id: Optional[str] = None) → List[str]

Lists all figures from object storage as data URI scheme.

Parameters
  • experiment_id (str, optional) – the experiment uuid. Defaults to None.

  • operator_id (str, optional) – the operator uuid. Defaults to None.

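  • run_id (str, optional) – the run id. Defaults to None.

  • deployment_id (str, optional) – the deployment id. Defaults to None.

  • monitoring_id (str, optional) – the monitoring id. Defaults to None.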

Returns

A list of data URIs.

Return type

list

from platiagro import list_figures

list_figures()
['data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAMAAAAC6CAIAAAB3B9X3AAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsMAAA7DAcdvqGQAAAG6SURBVHhe7dIxAQAADMOg+TfdicgLGrhBIBCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhDB9ho69eEGiUHfAAAAAElFTkSuQmCC']
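Each entry returned by list_figures is a data URI, so recovering the image bytes means splitting off the header and base64-decoding the rest. The decode_data_uri helper below is a hypothetical sketch built on the standard library; it only handles base64-encoded URIs like the one shown above.

```python
import base64
from typing import Tuple


def decode_data_uri(uri: str) -> Tuple[str, bytes]:
    """Split a base64 data URI into its MIME type and raw bytes."""
    header, _, payload = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data URI")
    mime = header[len("data:"):-len(";base64")]
    return mime, base64.b64decode(payload)


# A tiny hand-made URI; real entries from list_figures() are much longer.
mime, raw = decode_data_uri("data:image/png;base64,aGVsbG8=")
print(mime, raw)
```

The decoded bytes could then be written to a file opened in binary mode.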
platiagro.save_figure(figure: Union[bytes, str], extension: Optional[str] = None, experiment_id: Optional[str] = None, operator_id: Optional[str] = None, run_id: Optional[str] = None, deployment_id: Optional[str] = None, monitoring_id: Optional[str] = None)

Saves a figure to the object storage.

Parameters
  • figure (bytes, str) – the figure as base64-encoded bytes or a base64 string.

  • extension (str, optional) – the file extension, used when a base64 value is sent. Defaults to None.

  • experiment_id (str, optional) – the experiment uuid. Defaults to None.

  • operator_id (str, optional) – the operator uuid. Defaults to None.

  • run_id (str, optional) – the run id. Defaults to None.

  • deployment_id (str, optional) – the deployment id. Defaults to None.

  • monitoring_id (str, optional) – the monitoring id. Defaults to None.

import numpy as np
import seaborn as sns
from platiagro import save_figure

data = np.random.rand(10, 12)
plot = sns.heatmap(data)
save_figure(figure=plot.figure)
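Since the figure parameter is documented as a base64 value, an image that already exists as raw bytes has to be encoded first. The to_base64_str helper below is a hypothetical sketch; the placeholder bytes stand in for real image contents, and the exact form expected for extension (for example with or without a leading dot) is not stated here.

```python
import base64


def to_base64_str(raw: bytes) -> str:
    """Encode raw image bytes as an ASCII base64 string."""
    return base64.b64encode(raw).decode("ascii")


# Placeholder bytes; in practice these would be the contents of an image file.
raw_image = b"hello"
encoded = to_base64_str(raw_image)
print(encoded)

# The encoded value and its extension would then be passed along, e.g.
# save_figure(figure=encoded, extension="png")
```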
platiagro.delete_figures(experiment_id: Optional[str] = None, operator_id: Optional[str] = None, run_id: Optional[str] = None, deployment_id: Optional[str] = None, monitoring_id: Optional[str] = None) → List[str]

Deletes figures from the object storage.

Parameters
  • experiment_id (str, optional) – the experiment uuid. Defaults to None.

  • operator_id (str, optional) – the operator uuid. Defaults to None.

  • run_id (str, optional) – the run id. Defaults to None.

  • deployment_id (str, optional) – the deployment id. Defaults to None.

  • monitoring_id (str, optional) – the monitoring id. Defaults to None.

from platiagro import delete_figures

delete_figures()

Get feature types

platiagro.infer_featuretypes(df: pandas.core.frame.DataFrame, nrows: int = 100)

Infers feature types from DataFrame columns.

Parameters
  • df (pandas.DataFrame) – the dataset.

  • nrows (int) – the number of rows to inspect.

Returns

A list of feature types.

Return type

list

import pandas as pd
from platiagro import infer_featuretypes

df = pd.DataFrame({"col0": ["01/01/2000", "01/01/2001"], "col1": [1.0, -1.0], "col2": [1, 0]})
result = infer_featuretypes(df)
platiagro.validate_featuretypes(featuretypes: List[str])

Verifies whether all feature types are valid.

Parameters

featuretypes (list) – the list of feature types.

Raises

ValueError – when an invalid feature type is found.

from platiagro import validate_featuretypes
from platiagro.featuretypes import DATETIME, NUMERICAL, CATEGORICAL

featuretypes = [DATETIME, NUMERICAL, CATEGORICAL]
validate_featuretypes(featuretypes)

featuretypes = ["float", "int", "str"]
validate_featuretypes(featuretypes)
ValueError: featuretype must be one of DateTime, Numerical, Categorical
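The check performed in the example above can be approximated in a few lines. This is a re-implementation sketch, not the library's code: the validate_featuretypes_sketch name is hypothetical, and the three allowed values and the error message are copied from the outputs shown in this section.

```python
from typing import List

# String values of the three feature types, as shown in the outputs above.
VALID_FEATURETYPES = ("DateTime", "Numerical", "Categorical")


def validate_featuretypes_sketch(featuretypes: List[str]) -> None:
    """Raise ValueError if any entry is not a known feature type."""
    for featuretype in featuretypes:
        if featuretype not in VALID_FEATURETYPES:
            valid = ", ".join(VALID_FEATURETYPES)
            raise ValueError(f"featuretype must be one of {valid}")


# A valid list passes silently.
validate_featuretypes_sketch(["DateTime", "Numerical", "Categorical"])

# Python type names are not feature types, so this raises.
try:
    validate_featuretypes_sketch(["float", "int", "str"])
except ValueError as error:
    message = str(error)
    print(message)
```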

Download artifact

platiagro.download_artifact(name: str, path: str)

Downloads the given artifact to the path.

Parameters
  • name (str) – the artifact name.

  • path (str) – destination path.

Raises

FileNotFoundError – If the artifact does not exist in the object storage.

from platiagro import download_artifact

download_artifact(name="glove_s100.zip", path="/tmp/glove_s100.zip")
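Because a missing artifact raises FileNotFoundError, a component that can work without the artifact may prefer to catch the error and continue. The sketch below shows one such wrapper; download_or_none and fake_download are hypothetical, and fake_download merely imitates download_artifact so the example runs without platiagro or network access.

```python
from typing import Callable, Optional


def download_or_none(download: Callable[[str, str], None],
                     name: str, path: str) -> Optional[str]:
    """Call `download(name, path)`; return `path`, or None if not found."""
    try:
        download(name, path)
    except FileNotFoundError:
        return None
    return path


def fake_download(name: str, path: str) -> None:
    """Stand-in for download_artifact that only knows one artifact."""
    if name != "glove_s100.zip":
        raise FileNotFoundError(name)


found = download_or_none(fake_download, "glove_s100.zip", "/tmp/glove_s100.zip")
missing = download_or_none(fake_download, "missing.zip", "/tmp/missing.zip")
print(found)
print(missing)
```

In a real component, download_artifact would be passed in place of fake_download.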