Functions you may use in your components:
List, load and save datasets
- platiagro.list_datasets() → List[str]
Lists dataset names from object storage.
- Returns
A list of all dataset names.
- Return type
list
from platiagro import list_datasets
list_datasets()
['iris', 'boston', 'imdb']
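A common use of the returned list is to check that a dataset exists before trying to load it. The following is a minimal sketch, not part of the SDK: `dataset_exists` is a hypothetical helper, and it receives the listing function as an argument so that it can be tried without an object storage connection. `fake_list_datasets` is a stand-in that returns the names from the example output.

```python
from typing import Callable, List


def dataset_exists(name: str, lister: Callable[[], List[str]]) -> bool:
    """Return True when `name` is among the names returned by `lister`."""
    return name in lister()


def fake_list_datasets() -> List[str]:
    # Stand-in for platiagro.list_datasets (illustration only).
    return ["iris", "boston", "imdb"]


print(dataset_exists("iris", fake_list_datasets))     # True
print(dataset_exists("titanic", fake_list_datasets))  # False
```

In a component, the real function would be passed instead: `dataset_exists("iris", list_datasets)`.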
- platiagro.get_dataset(name: str, run_id: Optional[str] = None, operator_id: Optional[str] = None) → Union[pandas.core.frame.DataFrame, BinaryIO]
Retrieves the dataset response object from MinIO.
If run_id is given, gets the dataset from the specified run. If the dataset does not exist for the given run_id/operator_id, returns the ‘original’ dataset.
- Parameters
name (str) – the dataset name.
run_id (str, optional) – the run id of training pipeline. Defaults to None.
operator_id (str, optional) – the operator uuid. Defaults to None.
- Returns
urllib3.response.HTTPResponse object
- Raises
FileNotFoundError – If dataset does not exist in the object storage.
from platiagro import get_dataset
dataset = "iris"
get_dataset(dataset)
<urllib3.response.HTTPResponse object at 0x7f9c4711f2e0>
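Because the returned object is a raw HTTP response, the caller has to read its body and release the connection. The sketch below shows one way to do that, assuming only the `read()`, `close()` and `release_conn()` methods that `urllib3.response.HTTPResponse` provides. `read_all` and `FakeResponse` are hypothetical names; the stand-in class exists only so the example runs without object storage.

```python
from typing import Any


def read_all(response: Any) -> bytes:
    """Read the full body of an HTTPResponse-like object, then free the connection."""
    try:
        return response.read()
    finally:
        # Always release the connection, even if reading fails.
        response.close()
        response.release_conn()


class FakeResponse:
    """Stand-in exposing the three methods used above (illustration only)."""

    def __init__(self, body: bytes) -> None:
        self._body = body
        self.released = False

    def read(self) -> bytes:
        return self._body

    def close(self) -> None:
        pass

    def release_conn(self) -> None:
        self.released = True


fake = FakeResponse(b"col0,col1\n1,2\n")
content = read_all(fake)
print(content.decode("utf-8"))
```

In a component, the call would look like `read_all(get_dataset("iris"))`.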
- platiagro.load_dataset(name: str, run_id: Optional[str] = None, operator_id: Optional[str] = None, page: Optional[int] = None, page_size: Optional[int] = None) → Union[pandas.core.frame.DataFrame, BinaryIO]
Retrieves the contents of a dataset.
If run_id is given, loads the dataset from the specified run. If the dataset does not exist for the given run_id/operator_id, returns the ‘original’ dataset.
- Parameters
name (str) – the dataset name.
run_id (str, optional) – the run id of training pipeline. Defaults to None.
operator_id (str, optional) – the operator uuid. Defaults to None.
- Returns
The contents of a dataset: either a pandas.DataFrame or a BinaryIO buffer.
- Raises
FileNotFoundError – If dataset does not exist in the object storage.
from platiagro import load_dataset
dataset = "iris"
load_dataset(dataset)
col0 col1 col2 col3 col4 col5
0 01/01/2000 5.1 3.5 1.4 0.2 Iris-setosa
1 01/01/2001 4.9 3.0 1.4 0.2 Iris-setosa
2 01/01/2002 4.7 3.2 1.3 0.2 Iris-setosa
3 01/01/2003 4.6 3.1 1.5 0.2 Iris-setosa
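Since load_dataset may return either tabular contents or a binary buffer, and raises FileNotFoundError for a missing dataset, calling code usually has to handle all three cases. The following sketch illustrates one possible approach. `load_or_none`, `describe` and `fake_load_dataset` are hypothetical names, and treating every non-buffer result as tabular data is an assumption made for the example.

```python
import io
from typing import Any, Callable, Optional


def load_or_none(loader: Callable[[str], Any], name: str) -> Optional[Any]:
    """Call `loader(name)` and return None when the dataset is missing."""
    try:
        return loader(name)
    except FileNotFoundError:
        return None


def describe(contents: Any) -> str:
    """Tell a binary buffer apart from tabular contents."""
    if contents is None:
        return "missing"
    if isinstance(contents, io.IOBase):
        return "binary buffer"
    # Assumption: anything else is tabular (a pandas.DataFrame in practice).
    return "tabular data"


def fake_load_dataset(name: str) -> Any:
    # Stand-in for platiagro.load_dataset (illustration only).
    if name == "images.zip":
        return io.BytesIO(b"PK")
    raise FileNotFoundError(name)


print(describe(load_or_none(fake_load_dataset, "images.zip")))  # binary buffer
print(describe(load_or_none(fake_load_dataset, "unknown")))     # missing
```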
- platiagro.save_dataset(name: str, data: Optional[Union[pandas.core.frame.DataFrame, BinaryIO]] = None, df: Optional[pandas.core.frame.DataFrame] = None, metadata: Optional[Dict[str, str]] = None, run_id: Optional[str] = None, operator_id: Optional[str] = None)
Saves a dataset and its metadata to the object storage.
- Parameters
name (str) – the dataset name.
data (pandas.DataFrame, BinaryIO, optional) – the dataset contents as a pandas.DataFrame or a BinaryIO buffer. Defaults to None.
df (pandas.DataFrame, optional) – the dataset contents as a pandas.DataFrame. df exists only for compatibility with existing components. Use “data” for all types of datasets. Defaults to None.
metadata (dict, optional) – metadata about the dataset. Defaults to None.
run_id (str, optional) – the run id. Defaults to None.
operator_id (str, optional) – the operator uuid. Defaults to None.
- Raises
PermissionError – If the dataset is read-only.
import pandas as pd
from platiagro import save_dataset
from platiagro.featuretypes import DATETIME, NUMERICAL, CATEGORICAL
dataset = "test"
df = pd.DataFrame({"col0": ["01/01/2000", "01/01/2001"], "col1": [1.0, -1.0], "col2": [1, 0]})
save_dataset(dataset, df, metadata={"featuretypes": [DATETIME, NUMERICAL, CATEGORICAL]})
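The metadata in the example lists one feature type per column, so it can help to check that correspondence before saving. The following sketch shows one way to do it. `build_metadata` is a hypothetical helper, not an SDK function, and the feature type strings are the values that appear in the stat_dataset output on this page.

```python
from typing import Dict, List, Sequence


def build_metadata(columns: Sequence[str], featuretypes: Sequence[str]) -> Dict[str, List[str]]:
    """Build a metadata dict, checking there is one feature type per column."""
    if len(columns) != len(featuretypes):
        raise ValueError(
            f"expected {len(columns)} feature types, got {len(featuretypes)}"
        )
    return {"featuretypes": list(featuretypes)}


columns = ["col0", "col1", "col2"]
metadata = build_metadata(columns, ["DateTime", "Numerical", "Categorical"])
print(metadata)
```

The resulting dictionary could then be passed as `metadata=metadata` to save_dataset.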
- platiagro.stat_dataset(name: str, run_id: Optional[str] = None, operator_id: Optional[str] = None) → Dict[str, str]
Retrieves the metadata of a dataset.
- Parameters
name (str) – the dataset name.
run_id (str, optional) – the run id of training pipeline. Defaults to None.
operator_id (str, optional) – the operator uuid. Defaults to None.
- Returns
The metadata.
- Return type
dict
- Raises
FileNotFoundError – If dataset does not exist in the object storage.
from platiagro import stat_dataset
dataset = "test"
stat_dataset(dataset)
{'columns': ['col0', 'col1', 'col2'], 'featuretypes': ['DateTime', 'Numerical', 'Categorical']}
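The returned metadata keeps column names and feature types in two parallel lists. A short sketch of how they might be combined, using the example output above as input (the variable names are illustrative):

```python
# Metadata copied from the stat_dataset example output.
metadata = {
    "columns": ["col0", "col1", "col2"],
    "featuretypes": ["DateTime", "Numerical", "Categorical"],
}

# Pair each column with its feature type.
type_by_column = dict(zip(metadata["columns"], metadata["featuretypes"]))

# Select the columns of a given type, for example the numerical ones.
numerical_columns = [col for col, ft in type_by_column.items() if ft == "Numerical"]

print(type_by_column)
print(numerical_columns)  # ['col1']
```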
- platiagro.download_dataset(name: str, path: str)
Downloads the given dataset to the path.
- Parameters
name (str) – the dataset name.
path (str) – destination path.
from platiagro import download_dataset
dataset = "test"
path = "./test"
download_dataset(dataset, path)
Load and save models
- platiagro.load_model(experiment_id: Optional[str] = None, operator_id: Optional[str] = None) → Dict[str, object]
Retrieves a model from object storage.
- Parameters
experiment_id (str, optional) – the experiment uuid. Defaults to None.
operator_id (str, optional) – the operator uuid. Defaults to None.
- Returns
A dictionary of models.
- Return type
dict
- Raises
TypeError – when experiment_id is undefined in args and env.
TypeError – when operator_id is undefined in args and env.
from platiagro import load_model
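class Predictor(object):
    def __init__(self):
        # load_model returns a dictionary of models; "model" matches the
        # keyword used in the save_model(model=...) example.
        self.model = load_model()["model"]

    def predict(self, X):
        return self.model.predict(X)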
- platiagro.save_model(**kwargs)
Serializes and saves models.
- Parameters
**kwargs – the models as keyword arguments.
- Raises
TypeError – when a figure is not a matplotlib figure.
TypeError – when experiment_id is undefined in args and env.
TypeError – when operator_id is undefined in args and env.
from platiagro import save_model
model = MockModel()
save_model(model=model)
List and save metrics
- platiagro.list_metrics(experiment_id: Optional[str] = None, operator_id: Optional[str] = None, run_id: Optional[str] = None) → List[Dict[str, object]]
Lists metrics from object storage.
- Parameters
experiment_id (str, optional) – the experiment uuid. Defaults to None.
operator_id (str, optional) – the operator uuid. Defaults to None.
run_id (str, optional) – the run id. Defaults to None.
- Returns
A list of metrics.
- Return type
list
- Raises
TypeError – when experiment_id is undefined in args and env.
TypeError – when operator_id is undefined in args and env.
from platiagro import list_metrics
list_metrics()
[{'accuracy': 0.7}]
- platiagro.save_metrics(experiment_id: Optional[str] = None, operator_id: Optional[str] = None, run_id: Optional[str] = None, **kwargs)
Saves metrics of an experiment to the object storage.
- Parameters
experiment_id (str, optional) – the experiment uuid. Defaults to None.
operator_id (str, optional) – the operator uuid. Defaults to None.
run_id (str, optional) – the run id. Defaults to None.
**kwargs – the metrics dict.
- Raises
TypeError – when experiment_id is undefined in args and env.
TypeError – when operator_id is undefined in args and env.
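import pandas as pd
from platiagro import save_metrics
from sklearn.metrics import confusion_matrix
# Illustrative values; in a component these come from your model evaluation.
labels = ["setosa", "versicolor"]
y_test = ["setosa", "versicolor", "setosa"]
y_pred = ["setosa", "setosa", "setosa"]
data = confusion_matrix(y_test, y_pred, labels=labels)
# Use a distinct variable name so the imported confusion_matrix function is not shadowed.
confusion_df = pd.DataFrame(data, columns=labels, index=labels)
save_metrics(confusion_matrix=confusion_df)
save_metrics(accuracy=0.7)
save_metrics(reset=True, r2_score=-3.0)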
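Metrics are passed to save_metrics as keyword arguments, so they can be collected in a dictionary first and unpacked in a single call. The sketch below computes an accuracy value in plain Python to illustrate this. The `accuracy` helper and the sample labels are hypothetical and only stand in for a real evaluation step.

```python
from typing import Dict, Sequence


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    hits = sum(1 for truth, guess in zip(y_true, y_pred) if truth == guess)
    return hits / len(y_true)


# Illustrative labels: three of the four predictions are correct.
y_test = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]

metrics: Dict[str, float] = {"accuracy": accuracy(y_test, y_pred)}
print(metrics)  # {'accuracy': 0.75}

# In a component, the dictionary would be unpacked into keyword arguments:
# save_metrics(**metrics)
```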
List, save and delete figures
- platiagro.list_figures(experiment_id: Optional[str] = None, operator_id: Optional[str] = None, run_id: Optional[str] = None, deployment_id: Optional[str] = None, monitoring_id: Optional[str] = None) → List[str]
Lists all figures from object storage as data URI scheme.
- Parameters
experiment_id (str, optional) – the experiment uuid. Defaults to None.
operator_id (str, optional) – the operator uuid. Defaults to None.
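run_id (str, optional) – the run id. Defaults to None.
deployment_id (str, optional) – the deployment id. Defaults to None.
monitoring_id (str, optional) – the monitoring id. Defaults to None.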
- Returns
A list of data URIs.
- Return type
list
from platiagro import list_figures
list_figures()
['data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAMAAAAC6CAIAAAB3B9X3AAAAAXNSR0IArs4c6QAAAARnQU1BAACxjwv8YQUAAAAJcEhZcwAADsMAAA7DAcdvqGQAAAG6SURBVHhe7dIxAQAADMOg+TfdicgLGrhBIBCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhCJQCQCkQhEIhDB9ho69eEGiUHfAAAAAElFTkSuQmCC']
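Each returned item is a base64 data URI, which has to be decoded before the image can be written to disk or processed further. The following sketch uses only the standard library. `decode_data_uri` is a hypothetical helper, and the sample URI is built locally from the PNG file signature rather than from a full image.

```python
import base64
from typing import Tuple


def decode_data_uri(uri: str) -> Tuple[str, bytes]:
    """Split a base64 data URI into its media type and decoded bytes."""
    header, _, payload = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")
    media_type = header[len("data:"):-len(";base64")]
    return media_type, base64.b64decode(payload)


# Build a tiny sample URI: only the 8-byte PNG signature, not a real image.
png_signature = b"\x89PNG\r\n\x1a\n"
sample_uri = "data:image/png;base64," + base64.b64encode(png_signature).decode("ascii")

media_type, raw = decode_data_uri(sample_uri)
print(media_type)  # image/png
print(len(raw))    # 8
```

With real data, each element of the list returned by list_figures could be passed to `decode_data_uri` and the bytes written to a file opened in binary mode.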
- platiagro.save_figure(figure: Union[bytes, str], extension: Optional[str] = None, experiment_id: Optional[str] = None, operator_id: Optional[str] = None, run_id: Optional[str] = None, deployment_id: Optional[str] = None, monitoring_id: Optional[str] = None)
Saves a figure to the object storage.
- Parameters
figure (bytes, str) – the figure as base64 bytes or a base64 string.
extension (str, optional) – the file extension, used when a base64 figure is sent. Defaults to None.
experiment_id (str, optional) – the experiment uuid. Defaults to None.
operator_id (str, optional) – the operator uuid. Defaults to None.
run_id (str, optional) – the run id. Defaults to None.
deployment_id (str, optional) – the deployment id. Defaults to None.
monitoring_id (str, optional) – the monitoring id. Defaults to None.
import numpy as np
import seaborn as sns
from platiagro import save_figure
data = np.random.rand(10, 12)
plot = sns.heatmap(data)
save_figure(figure=plot.figure)
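The signature also accepts a figure that is already base64-encoded, together with its file extension. The sketch below shows how raw image bytes might be encoded for that purpose. `to_base64_figure` is a hypothetical helper, and the exact form expected for `extension` (for example "svg" versus ".svg") is an assumption not confirmed by this page.

```python
import base64


def to_base64_figure(raw: bytes) -> str:
    """Encode raw image bytes as a base64 ASCII string."""
    return base64.b64encode(raw).decode("ascii")


# A minimal SVG document used as sample image content.
raw_svg = b'<svg xmlns="http://www.w3.org/2000/svg" width="1" height="1"/>'
encoded = to_base64_figure(raw_svg)
print(encoded[:16])

# In a component (the extension value is an assumption):
# save_figure(figure=encoded, extension="svg")
```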
- platiagro.delete_figures(experiment_id: Optional[str] = None, operator_id: Optional[str] = None, run_id: Optional[str] = None, deployment_id: Optional[str] = None, monitoring_id: Optional[str] = None) → List[str]
Deletes figures from the object storage.
- Parameters
experiment_id (str, optional) – the experiment uuid. Defaults to None.
operator_id (str, optional) – the operator uuid. Defaults to None.
run_id (str, optional) – the run id. Defaults to None.
deployment_id (str, optional) – the deployment id. Defaults to None.
monitoring_id (str, optional) – the monitoring id. Defaults to None.
from platiagro import delete_figures
delete_figures()
Get feature types
- platiagro.infer_featuretypes(df: pandas.core.frame.DataFrame, nrows: int = 100)
Infer featuretypes from DataFrame columns.
- Parameters
df (pandas.DataFrame) – the dataset.
nrows (int) – the number of rows to inspect.
- Returns
A list of feature types.
- Return type
list
import pandas as pd
from platiagro import infer_featuretypes
df = pd.DataFrame({"col0": ["01/01/2000", "01/01/2001"], "col1": [1.0, -1.0], "col2": [1, 0]})
result = infer_featuretypes(df)
- platiagro.validate_featuretypes(featuretypes: List[str])
Verifies whether all feature types are valid.
- Parameters
featuretypes (list) – the list of feature types.
- Raises
ValueError – when an invalid feature type is found.
from platiagro import validate_featuretypes
from platiagro.featuretypes import DATETIME, NUMERICAL, CATEGORICAL
featuretypes = [DATETIME, NUMERICAL, CATEGORICAL]
validate_featuretypes(featuretypes)
featuretypes = ["float", "int", "str"]
validate_featuretypes(featuretypes)
ValueError: featuretype must be one of DateTime, Numerical, Categorical
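Because invalid feature types are reported through a ValueError, a component that receives feature types from user input may want to catch the exception instead of failing. The following is a minimal sketch of that pattern. `is_valid` and `fake_validate_featuretypes` are hypothetical; the stand-in only mimics the documented behaviour and is not the SDK implementation.

```python
from typing import Callable, Sequence


def is_valid(validator: Callable[[Sequence[str]], None], featuretypes: Sequence[str]) -> bool:
    """Return False instead of raising when the validator rejects the list."""
    try:
        validator(featuretypes)
    except ValueError:
        return False
    return True


def fake_validate_featuretypes(featuretypes: Sequence[str]) -> None:
    # Stand-in for platiagro.validate_featuretypes (illustration only).
    allowed = {"DateTime", "Numerical", "Categorical"}
    if any(ft not in allowed for ft in featuretypes):
        raise ValueError("featuretype must be one of DateTime, Numerical, Categorical")


print(is_valid(fake_validate_featuretypes, ["DateTime", "Numerical"]))  # True
print(is_valid(fake_validate_featuretypes, ["float", "int", "str"]))    # False
```

In a component, the real function would be passed instead: `is_valid(validate_featuretypes, featuretypes)`.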
Download artifact
- platiagro.download_artifact(name: str, path: str)
Downloads the given artifact to the path.
- Parameters
name (str) – the artifact name.
path (str) – destination path.
- Raises
FileNotFoundError – If the artifact does not exist in the object storage.
from platiagro import download_artifact
download_artifact(name="glove_s100.zip", path="/tmp/glove_s100.zip")