Functions you may use in your components:

List, load and save datasets

platiagro.list_datasets() → List[str]

Lists dataset names from object storage.

Returns: A list of all dataset names.
Return type: list

from platiagro import list_datasets

list_datasets()
['iris', 'boston', 'imdb']

platiagro.get_dataset(name: str, run_id: Optional[str] = None, operator_id: Optional[str] = None) → Union[pandas.core.frame.DataFrame, BinaryIO]

Retrieves the dataset response object from MinIO.

If run_id is given, gets the dataset from the specified run. If the dataset does not exist for the given run_id/operator_id, returns the ‘original’ dataset.

Parameters

name (str) – the dataset name.
run_id (str, optional) – the run id of training pipeline. Defaults to None.
operator_id (str, optional) – the operator uuid. Defaults to None.

Returns

urllib3.response.HTTPResponse object

Raises

FileNotFoundError – If dataset does not exist in the object storage.

from platiagro import get_dataset

dataset = "iris"

get_dataset(dataset)
<urllib3.response.HTTPResponse object at 0x7f9c4711f2e0>
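
The fallback rule described above (use the dataset from the run if it exists, otherwise the ‘original’ one) can be pictured with a minimal sketch. It does not call platiagro or MinIO: the in-memory dictionary, the key layout and the name resolve_dataset are hypothetical and only illustrate the lookup order.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory stand-in for the object storage:
# keys are (name, run_id, operator_id), values are dataset contents.
Storage = Dict[Tuple[str, Optional[str], Optional[str]], bytes]


def resolve_dataset(
    storage: Storage,
    name: str,
    run_id: Optional[str] = None,
    operator_id: Optional[str] = None,
) -> bytes:
    """Returns the run-specific dataset if present, else the original one."""
    if run_id is not None:
        run_key = (name, run_id, operator_id)
        if run_key in storage:
            return storage[run_key]
    # Fall back to the dataset that was saved without a run.
    original_key = (name, None, None)
    if original_key in storage:
        return storage[original_key]
    raise FileNotFoundError(f"dataset {name!r} does not exist")


storage: Storage = {
    ("iris", None, None): b"original",
    ("iris", "run-1", "op-1"): b"from run-1",
}

from_run = resolve_dataset(storage, "iris", "run-1", "op-1")
fallback = resolve_dataset(storage, "iris", "run-2", "op-1")
```

Here from_run comes from the run-specific entry, while fallback resolves to the original dataset because nothing was stored for run-2.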

platiagro.load_dataset(name: str, run_id: Optional[str] = None, operator_id: Optional[str] = None, page: Optional[int] = None, page_size: Optional[int] = None) → Union[pandas.core.frame.DataFrame, BinaryIO]

Retrieves the contents of a dataset.

If run_id is given, loads the dataset from the specified run. If the dataset does not exist for the given run_id/operator_id, returns the ‘original’ dataset.

Parameters

name (str) – the dataset name.
run_id (str, optional) – the run id of training pipeline. Defaults to None.
operator_id (str, optional) – the operator uuid. Defaults to None.

Returns

The contents of a dataset. Either a pandas.DataFrame or a BinaryIO buffer.

Raises

FileNotFoundError – If dataset does not exist in the object storage.

from platiagro import load_dataset

dataset = "iris"

load_dataset(dataset)
        col0  col1  col2  col3  col4         col5
0  01/01/2000   5.1   3.5   1.4   0.2  Iris-setosa
1  01/01/2001   4.9   3.0   1.4   0.2  Iris-setosa
2  01/01/2002   4.7   3.2   1.3   0.2  Iris-setosa
3  01/01/2003   4.6   3.1   1.5   0.2  Iris-setosa
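
Because load_dataset may return either a pandas.DataFrame or a BinaryIO buffer, a component that accepts any dataset may need to branch on the result. The helper below is a minimal sketch, not part of platiagro: describe_contents is a hypothetical name, and it treats anything with a read method as a buffer and anything else as a table with a length.

```python
import io


def describe_contents(data) -> str:
    """Summarizes a value shaped like the result of load_dataset."""
    if hasattr(data, "read"):
        # Binary datasets arrive as a file-like buffer.
        raw = data.read()
        return f"binary buffer, {len(raw)} bytes"
    # Tabular datasets support len(), which gives the number of rows.
    return f"table with {len(data)} rows"


# Stand-ins for the two possible results; with platiagro installed one
# would pass load_dataset("iris") instead.
binary_summary = describe_contents(io.BytesIO(b"abc"))
table_summary = describe_contents([[5.1, 3.5], [4.9, 3.0]])
```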

platiagro.save_dataset(name: str, data: Optional[Union[pandas.core.frame.DataFrame, BinaryIO]] = None, df: Optional[pandas.core.frame.DataFrame] = None, metadata: Optional[Dict[str, str]] = None, run_id: Optional[str] = None, operator_id: Optional[str] = None)

Saves a dataset and its metadata to the object storage.

Parameters

name (str) – the dataset name.
data (pandas.DataFrame, BinaryIO, optional) – the dataset contents as a pandas.DataFrame or a BinaryIO buffer. Defaults to None.
df (pandas.DataFrame, optional) – the dataset contents as a pandas.DataFrame. df exists only for compatibility with existing components. Use “data” for all types of datasets. Defaults to None.
metadata (dict, optional) – metadata about the dataset. Defaults to None.
run_id (str, optional) – the run id. Defaults to None.
operator_id (str, optional) – the operator uuid. Defaults to None.

Raises

PermissionError – If the dataset is read-only.

import pandas as pd
from platiagro import save_dataset
from platiagro.featuretypes import DATETIME, NUMERICAL, CATEGORICAL

dataset = "test"

df = pd.DataFrame({"col0": ["01/01/2000", "01/01/2001"], "col1": [1.0, -1.0], "col2": [1, 0]})
save_dataset(dataset, df, metadata={"featuretypes": [DATETIME, NUMERICAL, CATEGORICAL]})

platiagro.stat_dataset(name: str, run_id: Optional[str] = None, operator_id: Optional[str] = None) → Dict[str, str]

Retrieves the metadata of a dataset.

Parameters

name (str) – the dataset name.
run_id (str, optional) – the run id of training pipeline. Defaults to None.
operator_id (str, optional) – the operator uuid. Defaults to None.

Returns

The metadata.

Return type

dict

Raises

FileNotFoundError – If dataset does not exist in the object storage.

from platiagro import stat_dataset

dataset = "test"

stat_dataset(dataset)
{'columns': ['col0', 'col1', 'col2'], 'featuretypes': ['DateTime', 'Numerical', 'Categorical']}
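
The metadata above lists columns and feature types in the same order, so the two lists can be paired by position. A minimal sketch, using a literal copy of the dictionary shown above instead of a live call to stat_dataset:

```python
# Literal copy of the metadata returned in the example above.
metadata = {
    "columns": ["col0", "col1", "col2"],
    "featuretypes": ["DateTime", "Numerical", "Categorical"],
}

# Pair each column name with its feature type by position.
featuretype_by_column = dict(zip(metadata["columns"], metadata["featuretypes"]))

# Select the names of the numerical columns only.
numerical_columns = [
    column
    for column, featuretype in featuretype_by_column.items()
    if featuretype == "Numerical"
]
```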

platiagro.download_dataset(name: str, path: str)

Downloads the given dataset to the path.

Parameters

name (str) – the dataset name.
path (str) – destination path.

from platiagro import download_dataset

dataset = "test"
path = "./test"

download_dataset(dataset, path)

Load and save models

platiagro.load_model(experiment_id: Optional[str] = None, operator_id: Optional[str] = None) → Dict[str, object]

Retrieves a model from object storage.

Parameters

experiment_id (str, optional) – the experiment uuid. Defaults to None.
operator_id (str, optional) – the operator uuid. Defaults to None.

Returns

A dictionary of models.

Return type

dict

Raises

TypeError – when experiment_id is undefined in args and env.
TypeError – when operator_id is undefined in args and env.

from platiagro import load_model

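class Predictor(object):
    def __init__(self):
        # load_model() returns a dictionary of models, keyed by the
        # keyword used in save_model (assumed here to be "model").
        self.model = load_model()["model"]

    def predict(self, X):
        return self.model.predict(X)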

platiagro.save_model(**kwargs)

Serializes and saves models.

Parameters

**kwargs – the models as keyword arguments.

Raises

TypeError – when a figure is not a matplotlib figure.
TypeError – when experiment_id is undefined in args and env.
TypeError – when operator_id is undefined in args and env.

from platiagro import save_model

# MockModel stands in for any trained model object.
model = MockModel()
save_model(model=model)
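
Since save_model takes the models as keyword arguments and load_model returns a dictionary of models, the keyword used when saving is presumably the key used when loading. The sketch below illustrates that round trip with an in-memory dictionary and pickle. Both are assumptions made for the illustration: this page does not state how platiagro serializes models, and save_model_sketch and load_model_sketch are hypothetical names.

```python
import pickle
from typing import Dict

# Hypothetical in-memory stand-in for the object storage.
_fake_storage: Dict[str, bytes] = {}


def save_model_sketch(**kwargs: object) -> None:
    """Serializes each keyword argument under its own name."""
    for name, model in kwargs.items():
        _fake_storage[name] = pickle.dumps(model)


def load_model_sketch() -> Dict[str, object]:
    """Returns every saved model, keyed by the name it was saved under."""
    return {name: pickle.loads(blob) for name, blob in _fake_storage.items()}


# Plain Python objects play the role of trained models here.
save_model_sketch(model={"weights": [0.1, 0.2]}, encoder=["a", "b"])
models = load_model_sketch()
```

After the round trip, models["model"] and models["encoder"] hold copies of the saved objects.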

Save metrics

platiagro.list_metrics(experiment_id: Optional[str] = None, operator_id: Optional[str] = None, run_id: Optional[str] = None) → List[Dict[str, object]]

Lists metrics from object storage.

Parameters

experiment_id (str, optional) – the experiment uuid. Defaults to None.
operator_id (str, optional) – the operator uuid. Defaults to None.
run_id (str, optional) – the run id. Defaults to None.

Returns

A list of metrics.

Return type

list

Raises

TypeError – when experiment_id is undefined in args and env.
TypeError – when operator_id is undefined in args and env.

from platiagro import list_metrics

list_metrics()
[{'accuracy': 0.7}]
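
The result is a list of dictionaries, so looking up one metric by name takes a small amount of flattening. A minimal sketch, assuming each entry maps metric names to values as in the output above. flatten_metrics is a hypothetical helper, and the second entry in the sample list is made up for the illustration.

```python
from typing import Dict, List


def flatten_metrics(metrics: List[Dict[str, object]]) -> Dict[str, object]:
    """Merges a list of metric dictionaries into a single dictionary.

    When a name appears more than once, the later entry wins.
    """
    flat: Dict[str, object] = {}
    for entry in metrics:
        flat.update(entry)
    return flat


# Sample shaped like the output of list_metrics().
sample = [{"accuracy": 0.7}, {"r2_score": -3.0}]
flat = flatten_metrics(sample)
```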

platiagro.save_metrics(experiment_id: Optional[str] = None, operator_id: Optional[str] = None, run_id: Optional[str] = None, **kwargs)

Saves metrics of an experiment to the object storage.

Parameters

experiment_id (str, optional) – the experiment uuid. Defaults to None.
operator_id (str, optional) – the operator uuid. Defaults to None.
run_id (str, optional) – the run id. Defaults to None.
**kwargs – the metrics dict.

Raises

TypeError – when experiment_id is undefined in args and env.
TypeError – when operator_id is undefined in args and env.

import pandas as pd
from platiagro import save_metrics
from sklearn.metrics import confusion_matrix

# y_test, y_pred and labels come from your own model evaluation.
data = confusion_matrix(y_test, y_pred, labels=labels)
# Use a distinct name so the imported confusion_matrix function is not shadowed.
cm = pd.DataFrame(data, columns=labels, index=labels)
save_metrics(confusion_matrix=cm)
save_metrics(accuracy=0.7)
save_metrics(reset=True, r2_score=-3.0)

List, save and delete figures

platiagro.list_figures(experiment_id: Optional[str] = None, operator_id: Optional[str] = None, run_id: Optional[str] = None, deployment_id: Optional[str] = None, monitoring_id: Optional[str] = None) → List[str]

Lists all figures from object storage as data URIs.

Parameters

experiment_id (str, optional) – the experiment uuid. Defaults to None.
operator_id (str, optional) – the operator uuid. Defaults to None.
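run_id (str, optional) – the run id. Defaults to None.
deployment_id (str, optional) – the deployment id. Defaults to None.
monitoring_id (str, optional) – the monitoring id. Defaults to None.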

Returns

A list of data URIs.

Return type

list

from platiagro import list_figures

list_figures()
['data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAMAAAAC6CAIAAAB3B9X3AAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsMAAA7DAcdvqGQAAAG6SURBVHhe7dIxAQAADMOg+TfdicgLGrhBIBCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhDB9ho69eEGiUHfAAAAAElFTkSuQmCC']
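
Each item is a base64 data URI, so it can be turned back into raw image bytes with the standard library. A minimal sketch: decode_data_uri is a hypothetical helper, and it handles only URIs of the form data:<mime>;base64,<payload>, as in the output above.

```python
import base64
from typing import Tuple


def decode_data_uri(uri: str) -> Tuple[str, bytes]:
    """Splits a base64 data URI into its MIME type and decoded bytes."""
    header, _, payload = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")
    mime = header[len("data:"):-len(";base64")]
    return mime, base64.b64decode(payload)


# A short made-up payload; a real item from list_figures() works the same way.
example_uri = "data:image/png;base64," + base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii")
mime, raw = decode_data_uri(example_uri)
```

The decoded bytes can then be written to a file opened in binary mode.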

platiagro.save_figure(figure: Union[bytes, str], extension: Optional[str] = None, experiment_id: Optional[str] = None, operator_id: Optional[str] = None, run_id: Optional[str] = None, deployment_id: Optional[str] = None, monitoring_id: Optional[str] = None)

Saves a figure to the object storage.

Parameters

figure (bytes, str) – the figure as base64-encoded bytes or a base64 string.
extension (str, optional) – the file extension when base64 is sent. Defaults to None.
experiment_id (str, optional) – the experiment uuid. Defaults to None.
operator_id (str, optional) – the operator uuid. Defaults to None.
run_id (str, optional) – the run id. Defaults to None.
deployment_id (str, optional) – the deployment id. Defaults to None.
monitoring_id (str, optional) – the monitoring id. Defaults to None.

import numpy as np
import seaborn as sns
from platiagro import save_figure

data = np.random.rand(10, 12)
plot = sns.heatmap(data)
save_figure(figure=plot.figure)
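
The parameter list describes figure as base64 bytes or a base64 string, with extension giving the file type. For an image that already exists as raw bytes, the encoding step could look like the sketch below. The helper to_base64 and the sample bytes are hypothetical, and the final call is left as a comment because it is an assumption about how the two parameters combine, not something confirmed on this page.

```python
import base64


def to_base64(image_bytes: bytes) -> str:
    """Encodes raw image bytes as a base64 string."""
    return base64.b64encode(image_bytes).decode("ascii")


# Made-up bytes standing in for the contents of a PNG file.
png_bytes = b"\x89PNG\r\n\x1a\n"
encoded = to_base64(png_bytes)

# Assumed usage, with platiagro installed:
# save_figure(figure=encoded, extension="png")
```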

platiagro.delete_figures(experiment_id: Optional[str] = None, operator_id: Optional[str] = None, run_id: Optional[str] = None, deployment_id: Optional[str] = None, monitoring_id: Optional[str] = None) → List[str]

Deletes figures from the object storage.

Parameters

experiment_id (str, optional) – the experiment uuid. Defaults to None.
operator_id (str, optional) – the operator uuid. Defaults to None.
run_id (str, optional) – the run id. Defaults to None.
deployment_id (str, optional) – the deployment id. Defaults to None.
monitoring_id (str, optional) – the monitoring id. Defaults to None.

from platiagro import delete_figures

delete_figures()

Get feature types

platiagro.infer_featuretypes(df: pandas.core.frame.DataFrame, nrows: int = 100)

Infer featuretypes from DataFrame columns.

Parameters

df (pandas.DataFrame) – the dataset.
nrows (int) – the number of rows to inspect.

Returns

A list of feature types.

Return type

list

import pandas as pd
from platiagro import infer_featuretypes

df = pd.DataFrame({"col0": ["01/01/2000", "01/01/2001"], "col1": [1.0, -1.0], "col2": [1, 0]})
result = infer_featuretypes(df)

platiagro.validate_featuretypes(featuretypes: List[str])

Verifies whether all feature types are valid.

Parameters: featuretypes (list) – the list of feature types.
Raises: ValueError – when an invalid feature type is found.

from platiagro import validate_featuretypes
from platiagro.featuretypes import DATETIME, NUMERICAL, CATEGORICAL

featuretypes = [DATETIME, NUMERICAL, CATEGORICAL]
validate_featuretypes(featuretypes)

featuretypes = ["float", "int", "str"]
validate_featuretypes(featuretypes)
ValueError: featuretype must be one of DateTime, Numerical, Categorical
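
The check behind this error can be pictured as a membership test against the three accepted values named in the message. This is a sketch of the idea, not platiagro's implementation: VALID_FEATURETYPES and validate_featuretypes_sketch are hypothetical names.

```python
from typing import List, Tuple

# The three values named in the error message above.
VALID_FEATURETYPES: Tuple[str, ...] = ("DateTime", "Numerical", "Categorical")


def validate_featuretypes_sketch(featuretypes: List[str]) -> None:
    """Raises ValueError when any feature type is not an accepted value."""
    if any(ft not in VALID_FEATURETYPES for ft in featuretypes):
        valid = ", ".join(VALID_FEATURETYPES)
        raise ValueError(f"featuretype must be one of {valid}")


# Accepted values pass silently.
validate_featuretypes_sketch(["DateTime", "Numerical", "Categorical"])

# Python type names are rejected.
try:
    validate_featuretypes_sketch(["float", "int", "str"])
    error_message = ""
except ValueError as error:
    error_message = str(error)
```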

Download artifact

platiagro.download_artifact(name: str, path: str)

Downloads the given artifact to the path.

Parameters

name (str) – the artifact name.
path (str) – destination path.

Raises

FileNotFoundError – If the artifact does not exist in the object storage.

from platiagro import download_artifact

download_artifact(name="glove_s100.zip", path="/tmp/glove_s100.zip")
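
Because download_artifact raises FileNotFoundError, a component that can work without the artifact may want to catch that error. A minimal sketch: try_download is a hypothetical wrapper that receives the download function as an argument, so it can be shown here with stand-in functions instead of a live object storage.

```python
from typing import Callable


def try_download(download: Callable[..., None], name: str, path: str) -> bool:
    """Calls download(name=..., path=...) and reports whether it succeeded."""
    try:
        download(name=name, path=path)
    except FileNotFoundError:
        return False
    return True


# Stand-ins for download_artifact, used only to exercise the wrapper.
def fake_download_ok(name: str, path: str) -> None:
    return None


def fake_download_missing(name: str, path: str) -> None:
    raise FileNotFoundError(name)


found = try_download(fake_download_ok, "glove_s100.zip", "/tmp/glove_s100.zip")
missing = try_download(fake_download_missing, "glove_s100.zip", "/tmp/glove_s100.zip")
```

With platiagro installed, the same wrapper would be called as try_download(download_artifact, name, path).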
