This short tutorial shows how to store data in the property prediction service with Python and requests (see also Access to REST-Services in Python).
For IDL you can take a similar approach by adapting these instructions to Access to REST-Services in IDL.
...
The prediction service provides a web interface for requesting, inserting and modifying prediction data in the database. Operations are performed by sending URL requests, where each operation is defined as a so-called route. The property prediction service comes with a graphical user interface at http://localhost:8004/ui/ which provides visual access to all available routes. All routes involving the insertion or modification of data are listed under the Edit section.
For this tutorial we use only two routes: one to add a new machine learning configuration, and one to add the resulting predictions.
- /algoconfig/{name}
- /prediction/bulk
Each route can accept up to three different parameter types, which are described in detail in the following article: "Ingest property data in database (REST API)".
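As a rough illustration of how such parameters combine in a request URL, the following sketch fills the {name} path parameter of the /algoconfig/{name} route; the query parameter name and the body are invented for this example, the authoritative parameter list is given in the article referenced above:

```python
from urllib.parse import urlencode

# illustrative values only; the real parameter names are defined by the service
name = 'my_ml_configuration'                     # path parameter: fills {name} in /algoconfig/{name}
query = {'version': 'latest'}                    # query parameter (hypothetical name)
body = {'config_data': {'learning_rate': 0.01}}  # body parameter, sent as a JSON document

# the path and query parts form the request URL; the body travels as JSON payload
url = 'http://localhost:8004/algoconfig/%s?%s' % (name, urlencode(query))
print(url)
```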
Implementation
...
Ingest
In our example we run a machine learning algorithm which produces a set of flare predictions to store in our database. The algorithm consists of a training phase and a testing (prediction) phase. Within the training phase the algorithm learns and tunes its parameters, which can then be stored in the database as a configuration for later use. Within the testing phase we then use this configuration to compute flare predictions, which are also stored in the database.
The following code shows the two corresponding functions.
```python
import requests


def train_model(model, train_data, validation_data, max_epoches, batch_size, environment):
    # train the model (e.g. until max_epoches are reached or the validation loss increases)
    model.train(train_data, validation_data, max_epoches, batch_size)

    # store the model parameters within the database as a new configuration
    post_data = {
        "algorithm_run_id": environment['runtime']['run_id'],
        "config_data": model.get_parameters(),
        "description": ""
    }
    response = requests.post('http://localhost:8004/algoconfig/%s' % environment['algorithm']['cfg_name'],
                             json=post_data).json()
    if response['has_error']:
        raise ValueError('An error occurred while storing the algorithm\'s configuration:\n%s' % response['error'])
    return response['data']


def test_model(model, test_data, environment):
    # test the model (e.g. predict the test_data)
    model.run(test_data)

    # store the predictions within the database
    prediction_data = []
    for prediction in model.get_predictions():
        prediction_data.append({
            "time_start": prediction['time_start'],
            "time_duration": prediction['time_duration'],
            "probability": prediction['probability'],
            "intensity_min": prediction['intensity_min'],
            "intensity_max": prediction['intensity_max'],
            "meta": {
                "harp": prediction['harp'],
                "nar": prediction['nar']
            },
            "data": prediction['data']
        })
    post_data = {
        "algorithm_config": environment['algorithm']['cfg_name'],
        "algorithm_run_id": environment['runtime']['run_id'],
        "prediction_data": prediction_data,
        # get_fc_id() resolves the id of each source row and is defined elsewhere
        "source_data": [get_fc_id(row) for row in test_data]
    }
    response = requests.post('http://localhost:8004/prediction/bulk', json=post_data).json()
    if response['has_error']:
        raise ValueError('An error occurred while storing the algorithm\'s prediction set:\n%s' % response['error'])
    return response['data']
```
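To make the expected structure concrete, here is a hypothetical payload of the kind that gets posted to /prediction/bulk; every value below is invented for illustration and not taken from a real run:

```python
import json

# hypothetical example payload for POST /prediction/bulk; all values are made up
post_data = {
    "algorithm_config": "my_ml_configuration",
    "algorithm_run_id": 42,
    "prediction_data": [{
        "time_start": "2016-01-21T17:27:59.001Z",
        "time_duration": 86400,
        "probability": 0.88,
        "intensity_min": 1e-06,
        "intensity_max": 5e-05,
        "meta": {"harp": 6063, "nar": 12473},
        "data": {}
    }],
    "source_data": [1001, 1002]
}

# requests produces the same JSON body when json=post_data is passed
body = json.dumps(post_data)
```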
The post_data structure is equivalent to the algorithm_config_data or prediction_data definitions given by the routes /algoconfig/{name} and /prediction/bulk.
Source Code
Here you can download the above code fragments as Python code.
A more complete example using the above functions is given by the article 'Request prediction data from database (REST API)'.
Troubleshooting

1. TypeError while calling the requests.post() command. This error occurs with older versions of requests (< 2.4.2).
Solution:
[A] Update requests (recommended): pip install requests --upgrade
[B] Manually set the header definitions while calling 'my_url' with 'my_json_data':
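A minimal sketch of workaround [B], assuming a URL and payload like the ones used in this tutorial; only the argument construction is shown, the actual POST call is left to the caller:

```python
import json

def build_post_args(my_url, my_json_data):
    # requests < 2.4.2 does not know the json= keyword yet, so serialize the
    # payload manually and set the content type header explicitly; use it as
    # requests.post(**build_post_args(my_url, my_json_data))
    return {
        'url': my_url,
        'data': json.dumps(my_json_data),
        'headers': {'Content-Type': 'application/json'}
    }

args = build_post_args('http://localhost:8004/algoconfig/my_ml_configuration',
                       {'config_data': {'learning_rate': 0.01}})
```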
2. Serialization of complex data structures. Simple data structures (e.g. a dictionary of dictionaries or arrays) are directly serializable by the json module and do not need any conversion (see the code example above). With more complex structures, however, serialization errors may occur: in these cases we try to serialize characters which are not supported either by the codec (e.g. utf-8) or by the database.
Solution:
[A] Using the base64 codec we can 'transform' each character into a simple ASCII representation (escaping all special characters):
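A minimal sketch of this base64 approach; the helper names are our own invention and the service itself is not contacted:

```python
import base64
import json

def encode_for_storage(value):
    # serialize to JSON first, then escape every special character by
    # converting the utf-8 bytes into an ASCII-only base64 string
    return base64.b64encode(json.dumps(value).encode('utf-8')).decode('ascii')

def decode_from_storage(text):
    # invert the transformation after reading the value back
    return json.loads(base64.b64decode(text.encode('ascii')).decode('utf-8'))

# hypothetical configuration with non-ASCII characters
config = {'kernel': 'rbf', 'note': 'größe: θ ≈ 0.5'}
roundtrip = decode_from_storage(encode_for_storage(config))
```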
...
The addresses of the routes are the same ones we looked up before.
Retrieve
To check that everything worked, we can retrieve the stored configuration sets again. This can be done with a GET request.
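A minimal sketch of such a GET request, assuming a hypothetical /algoconfig/list route with name and version query parameters; check the service UI for the exact route and parameter names of your service version:

```python
import requests

def fetch_latest_config(name):
    # retrieve the most recent configuration stored under the given name;
    # route and parameter names here are assumptions for illustration
    response = requests.get(
        'http://localhost:8004/algoconfig/list',
        params={'algorithm_config_name': name,
                'algorithm_config_version': 'latest'}
    ).json()
    if response['has_error']:
        raise ValueError(response['error'])
    # requesting 'latest' should yield exactly one entry within 'data'
    return response['data'][0]

# config = fetch_latest_config('my_ml_configuration')  # requires a running service
```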
...
Info: This page was adapted from Ingest property data in database (REST API).