...

The prediction service is a web interface that allows requesting, inserting and modifying prediction data in the database. Operations are performed by sending URL requests, where each operation is well defined as a so-called route. The prediction service includes a graphical user interface at http://localhost:8004/ui/ which provides visual access to all available routes. All routes involving the insertion or modification of data are listed under the Edit section.


For this tutorial we use only two routes: one to add a new machine learning configuration and one to add the resulting predictions.

  • /algoconfig/{name}
  • /prediction/bulkpredictionset

Each route can hold up to three different parameter types, which are described in detail in the following article: "Ingest property data in database (REST API)".
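As a sketch of how such a route is addressed, a request for /algoconfig/{name} combines a path parameter with a JSON body. The helper below and its argument names are hypothetical; only the URL scheme and the body fields follow this tutorial:

```python
# Minimal sketch: assembling a request for /algoconfig/{name}.
# build_algoconfig_request is a hypothetical helper; the URL and the
# body fields (algorithm_run_id, config_data, description) follow this tutorial.
BASE_URL = "http://localhost:8004"

def build_algoconfig_request(name, run_id, config):
    # {name} is the path parameter of the route
    url = "%s/algoconfig/%s" % (BASE_URL, name)
    # the JSON body carries the remaining parameters
    body = {
        "algorithm_run_id": run_id,
        "config_data": config,
        "description": ""
    }
    return url, body

url, body = build_algoconfig_request("my_cfg", 42, {"learning_rate": 0.01})
# the pair can then be sent with: requests.post(url, json=body)
```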

...

In our example we run a machine learning algorithm which produces a set of flare predictions to store within our database. The algorithm consists of a training phase and a testing (or prediction) phase. During the training phase the algorithm learns and tunes its parameters, which can then be stored within the database as a configuration for later use. Afterwards, during the testing phase, we use this configuration to compute flare predictions, which are also stored within the database. The following code shows the two corresponding functions.

Code Block
languagepy
linenumberstrue
import requests

def train_model(model, train_data, validation_data, max_epoches, batch_size, environment):
    # train model (e.g. until max_epoches are reached or validation loss increases)
    model.train(train_data, validation_data, max_epoches, batch_size)
    # store model parameters within database
    post_data = {
        "algorithm_run_id": environment['runtime']['run_id'],
        "config_data": model.get_parameters(),
        "description": ""
    }
    response = requests.post('http://localhost:8004/algoconfig/%s' % environment['algorithm']['cfg_name'], json=post_data).json()
    if response['has_error']:
        raise ValueError('An error occurred while storing the algorithm\'s configuration:\n%s' % response['error'])
    return response['data']
 
def test_model(model, test_data, environment):
    # test model (e.g. predict the test_data)
    model.run(test_data)
    # store predictions within database
    prediction_data = []
    for prediction in model.get_predictions():
        prediction_data.append({
            "time_start": prediction['time_start'],
            "time_duration": prediction['time_duration'],
            "probability": prediction['probability'],
            "intensity_min": prediction['intensity_min'],
            "intensity_max": prediction['intensity_max'],
            "meta": {
                "harp": prediction['harp'],
                "nar": prediction['nar']
            },
            "data": prediction['data']
        })
    post_data = {
        "algorithm_config": environment['algorithm']['cfg_name'],
        "algorithm_run_id": environment['runtime']['run_id'],
        "prediction_data": prediction_data,
        "source_data": [get_fc_id(row) for row in test_data]
    }
    response = requests.post('http://localhost:8004/prediction/bulkpredictionset', json=post_data).json()
    if response['has_error']:
        raise ValueError('An error occurred while storing the algorithm\'s prediction set:\n%s' % response['error'])
    return response['data']
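Both functions read their identifiers from an environment dictionary. A minimal sketch of its assumed layout; the values below are illustrative, only the nesting and key names are taken from the code above:

```python
# Illustrative environment structure as read by train_model/test_model;
# the values are made up, the keys follow the code above.
environment = {
    "algorithm": {"cfg_name": "my_flare_model"},  # name used for /algoconfig/{name}
    "runtime": {"run_id": 1}                      # id of the current algorithm run
}
```

With a model object implementing train(), run() and get_predictions(), one would then call train_model(...) followed by test_model(...) against a running prediction service.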

The post_data structure is equivalent to the algorithm_config_data or prediction_data definitions as given by the routes /algoconfig/{name} and /prediction/bulkpredictionset:


  • Every algorithm_configuration_data needs at least an algorithm_run_id and
    config_data attribute.
  • Every specific configuration value has to be added to the config_data attribute and has
    to be a key-value pair.
  • Every prediction_dataset needs at least an algorithm_configuration,
    algorithm_run_id, prediction_data and source_data attribute.
  • The prediction_data represents a list of predictions, where each prediction needs
    at least a time_start, time_duration, probability, intensity_min, intensity_max
    and data attribute.
  • Every specific prediction value has to be added to the prediction_data attribute
    and has to be a key-value pair.
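These constraints can be checked on the client before posting. The following sketch is not part of the service; it is a hypothetical validation of the required attributes listed above, using the JSON key names from the code examples:

```python
# Hypothetical client-side check of the required attributes listed above;
# key names follow the post_data structures in the code examples.
REQUIRED_SET_KEYS = {"algorithm_config", "algorithm_run_id",
                     "prediction_data", "source_data"}
REQUIRED_PREDICTION_KEYS = {"time_start", "time_duration", "probability",
                            "intensity_min", "intensity_max", "data"}

def missing_keys(payload, required):
    # returns the required keys that the payload does not contain
    return sorted(required - payload.keys())

# an illustrative prediction entry carrying all required attributes
prediction = {"time_start": "2014-01-01T00:00:00", "time_duration": 86400,
              "probability": 0.7, "intensity_min": "M1.0",
              "intensity_max": "X1.0", "data": {}}
```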

Source Code

...

ID: 1

TypeError while calling requests.post(), e.g.:

  • TypeError: request() got an unexpected keyword argument 'json'

This error occurs with older versions of requests < 2.4.2.

[A] Update requests (recommended):

  pip install requests --upgrade

[B] Manually set the header definitions while calling 'my_url' with 'my_json_data':

Code Block
languagepy
import json
import requests

response = requests.post(
    my_url, data=json.dumps(my_json_data),
    headers={"content-type": "application/json"}
).json()
ID: 2

Serialization of complex data structures:

Simple data structures (e.g. a dictionary of dictionaries or arrays) are
directly serializable by the json module and do not need any conversion
(see code example above). However, with more complex structures the
following errors may occur:

  • UnicodeDecodeError: 'utf-8' codec can't decode byte 0x... in position ...: invalid start byte
  • Error message from the prediction service:
    <class 'psycopg2.DataError'>: unsupported Unicode escape sequence
    DETAIL: \u0000 cannot be converted to text.

In both cases we try to serialize characters which are not supported either
by the codec (e.g. utf-8) or the database.

[A] Using the base64 codec we can transform each character into a simple ASCII representation (escaping all special characters):

Code Block
languagepy
import base64
import pickle
...
p_object = pickle.dumps(my_object, 2)  # serializes my_object, using protocol 2
d_object = base64.b64encode(p_object).decode('ascii')  # encodes serialized object
post_data = {
    "algorithm_run_id": environment['runtime']['run_id'],
    "config_data": {"my_object": d_object},
    "description": "..."
}
response = requests.post(
    'http://localhost:8004/algoconfig/%s' % environment['algorithm']['cfg_name'],
    json=post_data
).json()
  
# check whether the upload was successful (see example code above)
  
response = requests.get(
    'http://localhost:8004/algoconfig/list?algorithm_config_name=%s'
    % environment['algorithm']['cfg_name']
).json() 
d_object = response['data'][0]['config_data']['my_object']
p_object = base64.b64decode(d_object)  # decodedecodes serialized object
my_object = pickle.loads(p_object)  # deserializedeserializes my_object
Info

In the example above we pickle my_object using protocol 2 instead of the default protocol 0 (see line 4). Protocol 2 is recommended especially for new-style classes, which are used by most modern libraries.
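The base64/pickle round trip can be verified offline, without the prediction service. A minimal self-contained sketch (the Sample class is made up for illustration):

```python
import base64
import pickle

class Sample:
    # a simple new-style class standing in for a real model object
    def __init__(self, value):
        self.value = value

obj = Sample(3)
# serialize with protocol 2 and escape the bytes into plain ASCII
encoded = base64.b64encode(pickle.dumps(obj, 2)).decode('ascii')
# reverse both steps to restore an equivalent object
restored = pickle.loads(base64.b64decode(encoded))
```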

Info

This page was adapted from Ingest property data in database (REST API).