
This short tutorial shows how to store data in the prediction service with Python and requests (see also Access to REST-Services in Python).

For IDL a similar approach applies; adapt these instructions with the help of Access to REST-Services in IDL.

The Prediction Service

The prediction service is a web interface that allows you to request, insert and modify prediction data in the database. Operations are performed by sending URL requests, where each operation is defined as a so-called route. The prediction service comes with a graphical user interface at http://localhost:8004/ui/ which provides visual access to all available routes. All routes involving the insertion or modification of data are listed under the Edit section.

For this tutorial we use only two routes: one to add a new machine learning configuration and one to add the resulting prediction.

  • /algoconfig/{name}
  • /prediction/bulk

Each route can take up to three different parameter types, which are described in detail in the following article: "Ingest property data in database (REST API)".
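
Before ingesting anything, it can help to verify that the service is reachable. The following minimal sketch (assuming the service runs at the default local address and using a hypothetical configuration name) sends a plain GET request to the /algoconfig/list route, which is also used further below; a well-formed JSON reply containing the has_error and data fields confirms the service is up:

import requests

# Hypothetical smoke test: query for a (possibly non-existent) configuration.
response = requests.get(
    'http://localhost:8004/algoconfig/list?algorithm_config_name=my_test_cfg'
).json()
print(response['has_error'], response.get('data'))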

Implementation

Ingest

In our example we run a machine learning algorithm that produces a flare prediction to be stored in our database. The algorithm consists of a training phase and a test (prediction) phase. During the training phase the algorithm learns and tunes its parameters, which can then be stored in the database as a configuration for later use. Afterwards, during the test phase, we use this configuration to compute flare predictions, which are also stored in the database. The following code shows the two corresponding functions.

import requests

def train_model(model, train_data, validation_data, max_epoches, batch_size, environment):
    # train model (e.g. until max_epoches are reached or validation loss increases)
    model.train(train_data, validation_data, max_epoches, batch_size)
    # store model parameters within database
    post_data = {
        "algorithm_run_id": environment['runtime']['run_id'],
        "config_data": model.get_parameters(),
        "description": ""
    }
    response = requests.post('http://localhost:8004/algoconfig/%s' % environment['algorithm']['cfg_name'], json=post_data).json()
    if response['has_error']:
        raise ValueError('An error occurred while storing the algorithm\'s configuration:\n%s' % response['error'])
    return response['data']
 
def test_model(model, test_data, environment):
    # test model (e.g. predict the test_data)
    model.run(test_data)
    (time_start, position_hg, prediction_data) = model.get_prediction()
    # store predictions within database
    post_data = [
        {
            "algorithm_config": environment['algorithm']['cfg_name'],
            "algorithm_run_id": environment['runtime']['run_id'],
            "lat_hg": position_hg[0],
            "long_hg": position_hg[1],
            "prediction_data": prediction_data,
            "source_data": [get_fc_id(row) for row in test_data],
            "time_start": time_start
        }
    ]
    response = requests.post('http://localhost:8004/prediction/bulk', json=post_data).json()
    if response['has_error']:
        raise ValueError('An error occurred while storing the algorithm\'s prediction:\n%s' % response['error'])
    return response['data']
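
As a usage sketch, the two functions could be called as follows. The structure of the environment dictionary is an assumption reconstructed from the keys accessed above; model, train_data, validation_data and test_data are placeholders:

environment = {
    'runtime': {'run_id': 42},                    # hypothetical run id
    'algorithm': {'cfg_name': 'my_flare_model'}   # hypothetical configuration name
}
config = train_model(model, train_data, validation_data,
                     max_epoches=100, batch_size=32, environment=environment)
predictions = test_model(model, test_data, environment)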

The post_data structure corresponds to the algorithm_config_data or prediction_data definitions as given by the routes /algoconfig/{name} and /prediction/bulk (a minimal sketch follows the list below):

  • Every algorithm_configuration_data needs at least an algorithm_run_id and
    config_data attribute.
  • Every specific configuration value has to be added to the config_data attribute and has
    to be a key-value pair.
  • Every prediction_data needs at least an algorithm_run_id, prediction_data, source_data and
    time_start attribute.
  • Every specific prediction value has to be added to the prediction_data attribute and has to be
    a key-value pair.
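
A minimal sketch of both payloads, assuming placeholder values (the field names follow the bullet points above; everything inside config_data and prediction_data is free-form key-value pairs):

# Minimal payload for POST /algoconfig/{name} (values are hypothetical).
minimal_config_data = {
    "algorithm_run_id": 42,
    "config_data": {"learning_rate": 0.01}
}
 
# Minimal payload for POST /prediction/bulk (a list; values are hypothetical).
minimal_prediction_data = [{
    "algorithm_run_id": 42,
    "prediction_data": {"flare_probability": 0.8},
    "source_data": [1001, 1002],
    "time_start": "2017-09-06T12:00:00"
}]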

Source Code

Here you can download the above code fragments as Python code.

ingest_prediction_data.py

A more complete example using the above functions is given in the article 'Request prediction data from database (REST API)'.

Troubleshooting

Issue 1

Description: TypeError while calling requests.post(), e.g.:

  • TypeError: request() got an unexpected keyword argument 'json'

This error occurs with versions of requests older than 2.4.2, which do not support the json keyword argument.

Solution:

[A] Update requests (recommended):

  pip install requests --upgrade

[B] Manually set the header definitions while calling 'my_url' with 'my_json_data':

import json
import requests
 
response = requests.post(
    my_url, data=json.dumps(my_json_data),
    headers={"content-type": "application/json"}
).json()
Issue 2

Description: Serialization of complex data structures:

Simple data structures (e.g. a dictionary of dictionaries or arrays) are
directly serializable by the json module and do not need any conversion
(see code example above). However, with more complex structures the
following errors may occur:

  • UnicodeDecodeError: 'utf-8' codec can't decode byte 0x... in position ...: invalid start byte
  • Error message from the prediction service:
    <class 'psycopg2.DataError'>: unsupported Unicode escape sequence
    DETAIL: \u0000 cannot be converted to text.

In both cases we are trying to serialize characters that are supported neither by the codec (e.g. utf-8) nor by the database.

Solution:

[A] Using the base64 codec we can 'transform' each character into a simple ASCII representation (escaping all special characters):

import base64
import pickle
import requests
...
p_object = pickle.dumps(my_object, 0)  # serializes my_object
d_object = base64.b64encode(p_object).decode('ascii')  # encodes serialized object
post_data = {
    "algorithm_run_id": environment['runtime']['run_id'],
    "config_data": {"my_object": d_object},
    "description": "..."
}
response = requests.post(
    'http://localhost:8004/algoconfig/%s' % environment['algorithm']['cfg_name'],
    json=post_data
).json()
  
# check whether the upload was successful (see example code above)
  
response = requests.get(
    'http://localhost:8004/algoconfig/list?algorithm_config_name=%s'
    % environment['algorithm']['cfg_name']
).json() 
d_object = response['data'][0]['config_data']['my_object']
p_object = base64.b64decode(d_object)  # decode serialized object
my_object = pickle.loads(p_object)  # deserialize my_object
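
A short note on this approach: base64-encoding the pickled object yields a pure-ASCII string, so it passes both the JSON serializer and the database untouched; the trade-offs are a payload size increase of roughly one third and the usual caveat that pickled data should only be loaded from trusted sources.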