This short tutorial shows how to store data into the prediction service with Python and requests (see also Access to REST-Services in Python).

For IDL you can take a similar approach by adapting these instructions to Access to REST-Services in IDL.

Routes


The Prediction Service

The prediction service is a web interface which allows you to request, insert and modify prediction data in the database. Operations are performed by sending URL requests, where each operation is well defined as a so-called route. The prediction service comes with a graphical user interface at http://localhost:8004/ui/ which provides visual access to all available routes. All routes involving the insertion or modification of data are listed under the Edit section.

For this tutorial we are using only two routes: one to add a new machine learning configuration and one to add the consequential predictions.

  • /algoconfig/{name}
  • /predictionset

Each route can hold up to three different parameter types, which are described in detail in the article "Ingest property data in database (REST API)".
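For orientation, here is a sketch of how such parameters surface with requests, assuming the three types are path, query and body parameters (all three appear later in this tutorial); the configuration name 'my-config-name' is purely illustrative:

Code Block
languagepy
import requests

# path parameter: the {name} part of /algoconfig/{name} is embedded in the URL
url = 'http://localhost:8004/algoconfig/%s' % 'my-config-name'

# body parameters: attached to the request as a JSON payload
response = requests.post(url, json={'algorithm_run_id': 1, 'config_data': {}}).json()

# query parameter: appended to the URL after a '?'
response = requests.get(
    'http://localhost:8004/algoconfig/list?algorithm_config_name=my-config-name'
).json()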

Requests

These two routes are both POST routes, which means they have to be called with POST requests. To create simple POST requests we recommend the Python package requests.

A POST request needs two parameters: the address itself, which is given by the route, and the data which will be attached to the request.

Code Block
languagepy
import requests

r = requests.post("http://httpbin.org/post", json={"key": "value"})
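The return value r wraps the server's response. The prediction service answers with a JSON body which, as used throughout the code below, carries has_error, error and data fields, so a response can be checked as in the following sketch (note that the httpbin example above returns a different JSON structure):

Code Block
languagepy
response = r.json()  # parse the JSON body of the response
if response['has_error']:
    print('request failed:\n%s' % response['error'])
else:
    print(response['data'])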


Implementation

Ingest

In our example we run a machine learning algorithm which produces a set of flare predictions to store in our database. The algorithm consists of a training phase and a testing (or prediction) phase. During the training phase the algorithm learns and tunes its parameters, which can then be stored in the database as a configuration for later use. Afterwards, during the testing phase, we use this configuration to compute flare predictions, which are also stored in the database. The following code shows the two corresponding functions.

Code Block
languagepy
linenumberstrue
def train_model(model, train_data, validation_data, max_epoches, batch_size, environment):
    # train model (e.g. until max_epoches are reached or validation loss increases)
    model.train(train_data, validation_data, max_epoches, batch_size)
    # store model parameters within database
    post_data = {
        "algorithm_run_id": environment['runtime']['run_id'],
        "config_data": model.get_parameters(),
        "description": ""
    }
    response = requests.post('http://localhost:8004/algoconfig/%s' % environment['algorithm']['cfg_name'], json=post_data).json()
    if response['has_error']:
        raise ValueError('An error occurred while storing the algorithm\'s configuration:\n%s' % response['error'])
    return response['data']
 
def test_model(model, test_data, environment):
    # test model (e.g. predict the test_data)
    model.run(test_data)
    # store predictions within database
    prediction_data = []
    for prediction in model.get_predictions():
        prediction_data.append({
            "time_start": prediction['time_start'],
            "time_duration": prediction['time_duration'],
            "probability": prediction['probability'],
            "intensity_min": prediction['intensity_min'],
            "intensity_max": prediction['intensity_max'],
            "meta": {
                "harp": prediction['harp'],
                "nar": prediction['nar']
            },
            "data": prediction['data']
        })
    post_data = {
        "algorithm_config": environment['algorithm']['cfg_name'],
        "algorithm_run_id": environment['runtime']['run_id'],
        "prediction_data": prediction_data,
        "source_data": [get_fc_id(row) for row in test_data]
    }
    response = requests.post('http://localhost:8004/predictionset', json=post_data).json()
    if response['has_error']:
        raise ValueError('An error occurred while storing the algorithm\'s prediction set:\n%s' % response['error'])
    return response['data']
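For completeness, a sketch of how these two functions could be wired together; model, train_data, validation_data, test_data and get_fc_id are assumed to exist, and the environment values are illustrative:

Code Block
languagepy
environment = {
    'runtime':   {'run_id': 42},                           # illustrative run id
    'algorithm': {'cfg_name': 'ml-multilayer-perceptron'}  # illustrative configuration name
}

config = train_model(model, train_data, validation_data,
                     max_epoches=100, batch_size=32, environment=environment)
predictions = test_model(model, test_data, environment)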

The post_data structure is equivalent to the algorithm_config_data or prediction_data definitions as given by the routes /algoconfig/{name} and /predictionset:

  • Every algorithm_configuration_data needs at least an algorithm_run_id and a
    config_data attribute.
  • Every specific configuration value has to be added to the config_data attribute and has
    to be a key-value pair.
  • Every prediction_set needs at least an algorithm_configuration,
    algorithm_run_id, source_data and prediction_data attribute.
  • The prediction_data attribute represents a list of predictions, where each prediction
    needs at least a time_start, time_duration, probability, intensity_min, intensity_max
    and data attribute.
  • Every specific prediction value has to be added to the data attribute and has to be a
    key-value pair.
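Put together, a minimal pair of payloads satisfying these definitions could look like the following sketch (all values are illustrative):

Code Block
languagepy
# minimal algorithm_configuration_data for /algoconfig/{name}
config_post_data = {
    'algorithm_run_id': 42,
    'config_data': {'learning_rate': 0.01}    # specific configuration values as key-value pairs
}

# minimal prediction_set for /predictionset
prediction_post_data = {
    'algorithm_config': 'ml-multilayer-perceptron',
    'algorithm_run_id': 42,
    'source_data': [101, 102],                # illustrative source data ids
    'prediction_data': [{
        'time_start': '2016-01-21T17:27:59.001Z',
        'time_duration': 86400,               # illustrative duration
        'probability': 0.88,
        'intensity_min': 1e-6,
        'intensity_max': 1e-5,
        'data': {'class': 'M'}                # specific prediction values as key-value pairs
    }]
}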



Source Code

Here you can download the above code fragments as Python code.

ingest_prediction_data.py

A more complete example using the above functions is given by the article 'Request prediction data from database (REST API)'.

Troubleshooting

Issue 1

TypeError while calling the requests.post() command, e.g.:

  • TypeError: request() got an unexpected keyword argument 'json'

This error occurs with older versions of requests < 2.4.2.
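You can check which version is installed via the package's version attribute:

Code Block
languagepy
import requests
print(requests.__version__)  # the json keyword argument requires version 2.4.2 or newer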

[A] Update requests (recommended):

  pip install requests --upgrade

[B] Manually set the header definitions while calling 'my_url' with 'my_json_data':

Code Block
languagepy
import json

response = requests.post(
    my_url, data=json.dumps(my_json_data),
    headers={"content-type": "application/json"}
).json()
Issue 2

Serialization of complex data structures:

Simple data structures (e.g. a dictionary of dictionaries or arrays) are directly serializable by the json module and do not need any conversion (see code example above). However, with more complex structures the following errors may occur:

  • UnicodeDecodeError: 'utf-8' codec can't decode byte 0x... in position ...: invalid start byte
  • Error message from the prediction service:
    <class 'psycopg2.DataError'>: unsupported Unicode escape sequence
    DETAIL: \u0000 cannot be converted to text.

In both cases we try to serialize characters which are not supported either by the codec (e.g. utf-8) or the database.
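The first error can be reproduced in isolation: a binary pickle stream (protocol 2 starts with the byte 0x80) is not valid UTF-8, so decoding it fails:

Code Block
languagepy
import pickle
payload = pickle.dumps(object(), 2)  # binary pickle stream, starts with b'\x80\x02'
payload.decode('utf-8')              # raises UnicodeDecodeError: ... invalid start byte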

[A] Using the base64 codec we can 'transform' each character into a simple ASCII representation (escaping all special characters):

Code Block
languagepy
import base64
import pickle
...
p_object = pickle.dumps(my_object, 2)  # serializes my_object, using protocol "2"
d_object = base64.b64encode(p_object).decode('ascii')  # encodes serialized object
post_data = {
    "algorithm_run_id": environment['runtime']['run_id'],
    "config_data": {"my_object": d_object},
    "description": "..."
}
response = requests.post(
    'http://localhost:8004/algoconfig/%s' % environment['algorithm']['cfg_name'],
    json=post_data
).json()



# check whether the upload was successful (see example code above)

response = requests.get(
    'http://localhost:8004/algoconfig/list?algorithm_config_name=%s'
    % environment['algorithm']['cfg_name']
).json()


d_object = response['data'][0]['config_data']['my_object']
p_object = base64.b64decode(d_object)  # decodes serialized object
my_object = pickle.loads(p_object)  # deserializes my_object
Info

In the example above we pickle my_object using protocol 2 instead of the default protocol 0 (see line 4). Protocol 2 is recommended especially for new-style classes, which are used by most modern libraries.

Info

This page was adapted from Ingest property data in database (REST API).