
This short tutorial shows how to store data in the prediction service with Python and requests (see also Access to REST-Services in Python).

For IDL you can take a similar route by adapting these instructions to Access to REST-Services in IDL.

The Prediction Service

The prediction service provides a web interface which allows you to request, insert, and modify prediction data in the database. Operations are performed by sending URL requests, where each operation is well defined as a so-called route. The prediction service comes with a graphical user interface at http://localhost:8004/ui/ which provides visual access to all available routes; all routes involving the insertion or modification of data are listed under the Edit section.

For this tutorial we use only two routes: one to add a new machine learning configuration and one to add the resulting predictions.

  • /algoconfig/{name}
  • /prediction/bulk

Each route can hold up to three different parameter types, which are described in the article "Ingest property data in database (REST API)".
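As an illustration, the three parameter types can be sketched with requests without actually contacting the service: a path parameter (the {name} part of the URL), query parameters, and a request body. The route is taken from the list above, but the parameter values below are hypothetical.

```python
import requests

# build (but do not send) a request, so no running server is needed
prepared = requests.Request(
    'POST',
    'http://localhost:8004/algoconfig/%s' % 'my-config',   # path parameter
    params={'version': 'latest'},                          # query parameter
    json={'config_data': {'learning_rate': 0.01}},         # body (JSON payload)
).prepare()

print(prepared.url)   # path and query parameters end up in the URL
print(prepared.body)  # the body is sent as JSON
```

The path and query parameters become part of the URL, while the body travels as the JSON payload of the POST request.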

Implementation

Prepare

In our example we run a machine learning algorithm which produces a flare prediction to be stored in our database. The algorithm consists of a training phase and a test (prediction) phase. During the training phase the algorithm learns and tunes its parameters, which can then be stored in the database as a configuration for later use. Afterwards, during the test phase, we use this configuration to compute flare predictions, which are also stored in the database. The following code shows two corresponding functions.

import requests

def train_model(model, train_data, validation_data, max_epoches, batch_size, environment):
    # train the model (e.g. until max_epoches is reached or the validation loss increases)
    model.train(train_data, validation_data, max_epoches, batch_size)
    # store the model parameters within the database
    post_data = {
        "algorithm_run_id": environment['runtime']['run_id'],
        "config_data": model.get_parameters(),
        "description": {},
    }
    requests.post(
        'http://localhost:8004/algoconfig/%s' % environment['algorithm']['cfg_name'],
        json=post_data
    )

def test_model(model, test_data, environment):
    # test the model (e.g. predict the test_data)
    (time_start, position_hg, prediction_data) = model.get_prediction()
    # store the predictions within the database
    post_data = [
        {
            "algorithm_config": environment['algorithm']['cfg_name'],
            "algorithm_run_id": environment['runtime']['run_id'],
            "lat_hg": position_hg[0],
            "long_hg": position_hg[1],
            "prediction_data": prediction_data,
            # get_fc_id is a helper defined elsewhere to reference the input rows
            "source_data": [get_fc_id(row) for row in test_data],
            "time_start": time_start
        }
    ]
    requests.post('http://localhost:8004/prediction/bulk', json=post_data)

Integration

Given the two functions above, we can now define our algorithm's workflow.

import json
import requests

# Setup environment
with open("params.json") as params_file:
    environment = json.load(params_file)

# Download regions and properties
response = requests.get(
    'http://localhost:8002/region/%s/list?cadence=%s&time_start=between(%s,%s)'
    % (
        environment['algorithm']['dataset'],
        environment['algorithm']['cadence'],
        environment['algorithm']['time_start'],
        environment['algorithm']['time_end']
    )
).json()
if response['has_error']:
    raise ValueError('No data found!')
else:
    # if we already have an algorithm configuration stored within the database,
    # we do not need to extract any training or validation data
    (train_data, validation_data, test_data) = createDataPartitions(response['data'])

# Setup model
model = MyMLAlgorithm(environment['algorithm']['params'])

# Check whether we have to train our algorithm or if we can download an already existing configuration
response = requests.get(
    'http://localhost:8004/algoconfig/list?algorithm_config_name=%s&algorithm_config_version=%s'
    % (environment['algorithm']['cfg_name'], 'latest')
).json()
if not response['has_error'] and response['result-count'] > 0:
    # as we requested the latest configuration we expect only one result within 'data'
    algo_cfg = response['data'][0]
    model.set_parameters(algo_cfg)
else:
    train_model(
        model, train_data, validation_data,
        environment['algorithm']['max_epoches'], environment['algorithm']['batch_size'],
        environment
    )


The problem now is that ml_result is not structured in a way the prediction service understands, so we have to restructure it to meet the following requirements:

  • Every configuration set needs at least a name attribute.
  • Every specific configuration has to be added to the config_data attribute as a key-value pair.
  • Every specific prediction has to be added to the prediction_data attribute as a key-value pair.

This would then look like this:

# define data to store
post_data = {
    'name': 'ml-multilayer-perceptron',
    'timestamp': '2016-01-21T17:27:59.001Z',
    'config_data': ml_configurations,
    'prediction_data': {
        'time-left': '32h12m04s',
        'position': {
            'lat_hg': 14.0,
            'long_hg': -3.4
        },
        'probability': '88%',
        'class': 'M'
    }
}
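The requirements above can also be checked programmatically before sending the data. The following is a minimal sketch; the helper name validate_config_set is hypothetical and not part of the service:

```python
def validate_config_set(post_data):
    # every configuration set needs at least a 'name' attribute
    if 'name' not in post_data:
        raise ValueError("configuration set needs a 'name' attribute")
    # config_data and prediction_data have to hold key-value pairs
    for attribute in ('config_data', 'prediction_data'):
        if attribute in post_data and not isinstance(post_data[attribute], dict):
            raise ValueError('%s has to be a set of key-value pairs' % attribute)
    return True
```

Calling validate_config_set(post_data) right before the POST catches malformed data locally instead of relying on an error response from the service.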

Ingest

Ingesting this post_data is now very simple. We first have to add the new dataset to the prediction service and then add a new configuration set.

# add dataset
print('creating dataset...')
requests.post("http://localhost:8004/dataset/bulk", json=ml_datasets)

# add configuration set
print('storing data...')
requests.post("http://localhost:8004/configset/%s" % ml_datasets[0]['name'], json=post_data)

The addresses of the routes are the same ones we looked up before.

Retrieve

To check whether everything worked, we retrieve all the configuration sets stored under this dataset. This can be done with a GET request.

# retrieving data
print('downloading all data...')
ml_config_sets = requests.get("http://localhost:8004/configset/%s/list" % ml_datasets[0]['name']).json()

print(ml_config_sets)
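Assuming the list route returns the same envelope as the region service used above (a has_error flag plus a data list), the response could be unpacked like this. The helper name and the sample response below are hypothetical:

```python
def extract_config_sets(response):
    # raise on a failed request, otherwise return the stored configuration sets
    if response.get('has_error'):
        raise ValueError('Could not retrieve configuration sets!')
    return response['data']

# hypothetical sample of what the service might return
sample_response = {
    'has_error': False,
    'data': [{'name': 'ml-multilayer-perceptron', 'prediction_data': {'class': 'M'}}]
}
config_sets = extract_config_sets(sample_response)
print(config_sets[0]['name'])  # ml-multilayer-perceptron
```

If the stored configuration set appears in the returned data, the ingest was successful.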

Source Code

Here you can download the full Python source code.

prediction_request_ingest.py

params_file