This short tutorial shows how to store data in the property prediction service with Python and requests (see also Access to REST-Services in Python).
For IDL you can take a similar approach by adapting these instructions to Access to REST-Services in IDL.
...
The prediction service provides a web interface for requesting, inserting and modifying prediction data in the database. Operations are performed by sending URL requests, where each operation is defined as a so-called route. The property prediction service comes with a graphical user interface at http://localhost:8004/ui/ which provides visual access to all available routes. All routes involving the insertion or modification of data are listed under the Edit section.
For this tutorial we use only two routes: one to add a new machine learning configuration, and one to add the resulting predictions.
- /algoconfig/{name}
- /prediction/bulk
Each route can accept up to three different parameter types, which are described in detail in the following article: "Ingest property data in database (REST API)".
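As a rough illustration of how such parameters combine in a request URL, the following sketch fills the {name} path parameter of the /algoconfig/{name} route; the query parameter name and the body are invented for this example, the authoritative parameter list is given in the article referenced above:

```python
from urllib.parse import urlencode

# illustrative values only; the real parameter names are defined by the service
name = 'my_ml_configuration'                     # path parameter: fills {name} in /algoconfig/{name}
query = {'version': 'latest'}                    # query parameter (hypothetical name)
body = {'config_data': {'learning_rate': 0.01}}  # body parameter, sent as a JSON document

# the path and query parts form the request URL; the body travels as JSON payload
url = 'http://localhost:8004/algoconfig/%s?%s' % (name, urlencode(query))
print(url)
```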
Implementation
...
Ingest
In our example we run a machine learning algorithm which produces a set of flare predictions to store in our database. The algorithm consists of a training phase and a testing (prediction) phase. Within the training phase the algorithm learns and tunes its parameters, which can then be stored in the database as a configuration for later use. Within the testing phase we then use this configuration to compute flare predictions, which are also stored in the database.
The following code shows the two corresponding functions.
```python
import requests


def train_model(model, train_data, validation_data, max_epoches, batch_size, environment):
    # train the model (e.g. until max_epoches are reached or the validation loss increases)
    model.train(train_data, validation_data, max_epoches, batch_size)

    # store the model parameters within the database as a new configuration
    post_data = {
        "algorithm_run_id": environment['runtime']['run_id'],
        "config_data": model.get_parameters(),
        "description": ""
    }
    response = requests.post('http://localhost:8004/algoconfig/%s' % environment['algorithm']['cfg_name'],
                             json=post_data).json()
    if response['has_error']:
        raise ValueError('An error occurred while storing the algorithm\'s configuration:\n%s' % response['error'])
    return response['data']


def test_model(model, test_data, environment):
    # test the model (e.g. predict the test_data)
    model.run(test_data)

    # store the predictions within the database
    prediction_data = []
    for prediction in model.get_predictions():
        prediction_data.append({
            "time_start": prediction['time_start'],
            "time_duration": prediction['time_duration'],
            "probability": prediction['probability'],
            "intensity_min": prediction['intensity_min'],
            "intensity_max": prediction['intensity_max'],
            "meta": {
                "harp": prediction['harp'],
                "nar": prediction['nar']
            },
            "data": prediction['data']
        })
    post_data = {
        "algorithm_config": environment['algorithm']['cfg_name'],
        "algorithm_run_id": environment['runtime']['run_id'],
        "prediction_data": prediction_data,
        # get_fc_id() resolves the id of each source row and is defined elsewhere
        "source_data": [get_fc_id(row) for row in test_data]
    }
    response = requests.post('http://localhost:8004/prediction/bulk', json=post_data).json()
    if response['has_error']:
        raise ValueError('An error occurred while storing the algorithm\'s prediction set:\n%s' % response['error'])
    return response['data']
```
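To make the expected structure concrete, here is a hypothetical payload of the kind that gets posted to /prediction/bulk; every value below is invented for illustration and not taken from a real run:

```python
import json

# hypothetical example payload for POST /prediction/bulk; all values are made up
post_data = {
    "algorithm_config": "my_ml_configuration",
    "algorithm_run_id": 42,
    "prediction_data": [{
        "time_start": "2016-01-21T17:27:59.001Z",
        "time_duration": 86400,
        "probability": 0.88,
        "intensity_min": 1e-06,
        "intensity_max": 5e-05,
        "meta": {"harp": 6063, "nar": 12473},
        "data": {}
    }],
    "source_data": [1001, 1002]
}

# requests produces the same JSON body when json=post_data is passed
body = json.dumps(post_data)
```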
The post_data structure is equivalent to the algorithm_config_data or prediction_data definitions given by the routes /algoconfig/{name} and /prediction/bulk.
Source Code
Here you can download the above code fragments as Python code.
A more complete example using the above functions is given by the article 'Request prediction data from database (REST API)'.
Troubleshooting

1. TypeError while calling the requests.post() command. This error occurs with older versions of requests (< 2.4.2).
Solution:
[A] Update requests (recommended): pip install requests --upgrade
[B] Manually set the header definitions while calling 'my_url' with 'my_json_data':
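A minimal sketch of workaround [B], assuming a URL and payload like the ones used in this tutorial; only the argument construction is shown, the actual POST call is left to the caller:

```python
import json

def build_post_args(my_url, my_json_data):
    # requests < 2.4.2 does not know the json= keyword yet, so serialize the
    # payload manually and set the content type header explicitly; use it as
    # requests.post(**build_post_args(my_url, my_json_data))
    return {
        'url': my_url,
        'data': json.dumps(my_json_data),
        'headers': {'Content-Type': 'application/json'}
    }

args = build_post_args('http://localhost:8004/algoconfig/my_ml_configuration',
                       {'config_data': {'learning_rate': 0.01}})
```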
2. Serialization of complex data structures. Simple data structures (e.g. a dictionary of dictionaries or arrays) are directly serializable by the json module and do not need any conversion (see the code example above). With more complex structures, however, serialization errors may occur: in these cases we try to serialize characters which are not supported either by the codec (e.g. utf-8) or by the database.
Solution:
[A] Using the base64 codec we can 'transform' each character into a simple ASCII representation (escaping all special characters):
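A minimal sketch of this base64 approach; the helper names are our own invention and the service itself is not contacted:

```python
import base64
import json

def encode_for_storage(value):
    # serialize to JSON first, then escape every special character by
    # converting the utf-8 bytes into an ASCII-only base64 string
    return base64.b64encode(json.dumps(value).encode('utf-8')).decode('ascii')

def decode_from_storage(text):
    # invert the transformation after reading the value back
    return json.loads(base64.b64decode(text.encode('ascii')).decode('utf-8'))

# hypothetical configuration with non-ASCII characters
config = {'kernel': 'rbf', 'note': 'größe: θ ≈ 0.5'}
roundtrip = decode_from_storage(encode_for_storage(config))
```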
...
The addresses of the routes are the same ones we looked up before.
Retrieve
To check that everything worked, we can retrieve the stored configuration sets again. This can be done with a GET request.
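A minimal sketch of such a GET request, assuming a hypothetical /algoconfig/list route with name and version query parameters; check the service UI for the exact route and parameter names of your service version:

```python
import requests

def fetch_latest_config(name):
    # retrieve the most recent configuration stored under the given name;
    # route and parameter names here are assumptions for illustration
    response = requests.get(
        'http://localhost:8004/algoconfig/list',
        params={'algorithm_config_name': name,
                'algorithm_config_version': 'latest'}
    ).json()
    if response['has_error']:
        raise ValueError(response['error'])
    # requesting 'latest' should yield exactly one entry within 'data'
    return response['data'][0]

# config = fetch_latest_config('my_ml_configuration')  # requires a running service
```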
...
Info: This page was adapted from Ingest property data in database (REST API).