File names for trained models:
To save the trained model under a user-chosen name, it is sufficient to write that name in the "description" field of the configuration JSON file.
SHAUN: I think we need to make this as human-understandable as possible, since there will be many similarly trained models for each algorithm (e.g., "algorithm_abovec_24_0_00_Br_all" for a system trained on flares above C1.0, over 24-hr windows, at 0-hr latency, issued from 00:00 UT SHARPs, using all available Br properties).
"algorithm":{ "phase": "training", <--- do not touch
"config_name": "HybridLasso", <--- do not touch
"description": "HybridLasso_test", <---- HERE
"HybridLasso": true, <--- do not touch
In the prediction step, the name used in the training phase must be reported in the configuration JSON file, this time in the "config_name" field:
"algorithm": {
"phase": "execution", <--- do not touch
"config_name": "HybridLasso_test", <---- HERE
"description": "HybridLasso"
}
Writing of the trained model works locally; let's see whether further fixes are needed once we run it on the cluster.
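As a minimal sketch of how the two phases fit together, the "algorithm" blocks could look like the example below. Only the fields quoted above are taken from the page; the descriptive name "HybridLasso_abovec_24_0_00_Br_all" is a hypothetical example of the naming convention SHAUN suggests.

import json

# Training phase: "description" carries the human-readable name under which the
# trained model is saved.
training_algorithm_block = {
    "phase": "training",
    "config_name": "HybridLasso",
    # Hypothetical descriptive name following the convention discussed above
    "description": "HybridLasso_abovec_24_0_00_Br_all",
    "HybridLasso": True,
}

# Execution (prediction) phase: the name given at training time is reported back
# in "config_name" so the infrastructure can load that trained model.
execution_algorithm_block = {
    "phase": "execution",
    "config_name": "HybridLasso_abovec_24_0_00_Br_all",
    "description": "HybridLasso",
}

print(json.dumps({"algorithm": training_algorithm_block}, indent=2))
print(json.dumps({"algorithm": execution_algorithm_block}, indent=2))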
Core Training Configuration Workflow
- Integrate prediction algorithm X into the infrastructure (personally I don’t mind which, but Michele’s suggestion of 1 supervised and 1 unsupervised is good)
- The training time range to be used will be from 00:00:00 UT on 14-Sep-2012 (start of SHARP NRT data availability) to 23:59:59 UT on 31-Dec-2015
- Concern was raised previously about number of features for training, so a large training time range seems to be preferred
- This allows for the testing time range to be from 00:00:00 UT on 1-Jan-2016 to <most recently processed time stamp>, but this will be performed under WP5
- Run the integrated prediction algorithm with the following training configuration settings:
- Use only 00:00 UT time stamps (to avoid SDO 24-hr periodic orbital effects)
- 24-hr forecast window
- 0-hr latency
- Create separate training configuration files, using Blos properties and Br properties, for flaring levels of (see the generation sketch after this workflow list):
- C-class only (i.e., >= C1.0 and < M1.0)
- M-class only (i.e., >= M1.0 and < X1.0)
- X-class only (i.e., >= X1.0)
- Above M-class (i.e., >= M1.0)
- Above C-class (i.e., >= C1.0)
- Run all 5 training configuration parameter files
- Write variables of all 5 trained prediction models into Prediction Configuration DB
- Personally, I’m not sure if separate entries or a grouped JSON entry is better
- Integrate next prediction algorithm and repeat steps 2–5
- Prediction DB can be filled for each integrated prediction algorithm by launching all 5x trained prediction models for that algorithm on the chosen testing time range
- NOTE: the SDO/HMI image alignment bug from 13-Apr-2016 onwards will limit the availability of properties to make predictions from, until the replacement HMI data are available (UPSud is monitoring and downloading when available)
- Broader WP5 validation can be explored by choosing different durations of forecast window and repeating steps 2–5 and 7 for all integrated prediction algorithms
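To illustrate how the training configuration files above (5 flaring levels x {Blos, Br}) could be generated, here is a minimal sketch. The fields "algorithm", "phase", "config_name", "description", "dataset", and "cadence" appear elsewhere on this page; the output directory and file naming are hypothetical, and the real configuration files will contain more fields than shown here.

import json
from pathlib import Path

FLARE_LEVELS = ["cclass", "mclass", "xclass", "abovem", "abovec"]
FIELD_TYPES = ["Blos", "Br"]  # line-of-sight vs radial magnetic field properties

def make_training_configs(algorithm: str, out_dir: str = "training_configs") -> None:
    """Write one training configuration file per flaring level and field type."""
    Path(out_dir).mkdir(exist_ok=True)
    for level in FLARE_LEVELS:
        for field in FIELD_TYPES:
            # 24-hr window, 0-hr latency, 00:00 UT issue time, all properties
            name = f"{algorithm}_{level}_24_0_00_{field}_all"
            config = {
                "algorithm": {
                    "phase": "training",
                    "config_name": algorithm,
                    "description": name,  # human-readable trained-model name
                },
                # Assumed dataset block; only "cadence" appears on this page.
                "dataset": {"cadence": "24h"},
            }
            with open(Path(out_dir) / f"{name}.json", "w") as f:
                json.dump(config, f, indent=2)

make_training_configs("HybridLasso")  # e.g. 10 files: 5 flaring levels x 2 field types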
File name for trained model:
- C-class only (i.e., >= C1.0 and < M1.0): algorithm_cclass_24_0_00_Blos_all and algorithm_cclass_24_0_00_Br_all
- M-class only (i.e., >= M1.0 and < X1.0): algorithm_mclass_24_0_00_Blos_all and algorithm_mclass_24_0_00_Br_all
- X-class only (i.e., >= X1.0): algorithm_xclass_24_0_00_Blos_all and algorithm_xclass_24_0_00_Br_all
- Above M-class (i.e., >= M1.0): algorithm_abovem_24_0_00_Blos_all and algorithm_abovem_24_0_00_Br_all
- Above C-class (i.e., >= C1.0): algorithm_abovec_24_0_00_Blos_all and algorithm_abovec_24_0_00_Br_all
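The convention above can be summarized by a small helper. This is only an illustration of the pattern (algorithm, flaring level, forecast window, latency, issuing time, magnetic field type, property set), not a function provided by the infrastructure.

def trained_model_name(algorithm: str,
                       flare_level: str,        # e.g. "cclass", "abovem", "abovec"
                       window_hr: int = 24,     # forecast window
                       latency_hr: int = 0,     # forecast latency
                       issue_time: str = "00",  # 00:00 UT SHARP time stamps
                       field: str = "Br",       # "Br" or "Blos"
                       properties: str = "all") -> str:
    """Compose a trained-model name following the convention used on this page."""
    return f"{algorithm}_{flare_level}_{window_hr}_{latency_hr}_{issue_time}_{field}_{properties}"

# Reproduces one of the names listed above
assert trained_model_name("algorithm", "abovec") == "algorithm_abovec_24_0_00_Br_all"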
- There is no explicit reason why we should use all properties, so another set of 10x configuration files should be prepared and run for a reduced "optimized"/"feature-selected" property set (one possible selection approach is sketched after this list)
- NOTE: there may be no point doing X-class, given the rarity of their occurrence in the training time period.
"algorithm":{ "phase": "training", <--- do not touch
"config_name": "HybridLasso", <--- do not touch
"description": "HybridLasso_test", <---- HERE
"HybridLasso": true, <--- do not touch
In the prediction step, the name that was used in the training phase needs to be reported in the configuration json file, this time in the "config_name" field :
"algorithm": {
"phase": "execution", <--- do not touch
"config_name": "HybridLasso_test", <---- HERE
"description": "HybridLasso"
}
The writing of the trained model is working locally, let see if other fixings are needed once we run them on the cluster.
Example Configuration JSON
- Run all 10 training configuration parameter files
- Write variables of all 10 trained prediction models into Prediction Configuration DB
- Integrate next prediction algorithm and repeat steps 2–5
- Prediction DB can be filled for each integrated prediction algorithm by launching all 10x trained prediction models for that algorithm on the chosen testing time range
- NOTE: the SDO/HMI image alignment bug from 13-Apr-2016 onwards will limit the availability of properties to make predictions from, until the replacement HMI data are available (UPSud is monitoring and downloading when available)
- Broader WP5 validation can be explored by choosing different durations of forecast window and repeating steps 2–5 and 7 for all integrated prediction algorithms
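As noted above, a reduced "feature-selected" property set could be produced in several ways; the sketch below is one hypothetical approach using an L1-penalized logistic regression from scikit-learn, and is not necessarily how the infrastructure will derive its optimized property set.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

def select_properties(X: np.ndarray, y: np.ndarray, names: list[str]) -> list[str]:
    """Keep only the properties given a non-zero weight by an L1-penalized fit.

    X: property matrix (n_samples, n_properties); y: binary flare/no-flare labels.
    """
    Xs = StandardScaler().fit_transform(X)
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(Xs, y)
    return [name for name, coef in zip(names, clf.coef_[0]) if abs(coef) > 0.0]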
Example Configuration JSON
On this page we summarize the set of parameters for training the model.
Please update/change whatever is needed.
dataset": { "cadence":"24h",
...
"first_flare_class":false
},
SHAUN: Question to Marco/Dario - Does the following 'flare' structure need to be separate from the 'dataset' structure?
"flare":{"class":1, <-- flare_class = {'A': 0.01, 'B': 0.1, 'C': 1, 'M': 10, 'X': 100} is this conversion table ok?
...
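To make the proposed conversion table concrete, here is a minimal sketch. The helper name is illustrative, and the reading that "class": 1 denotes the >= C1.0 threshold is an assumption based on the table above, not a confirmed part of the schema.

# Conversion table proposed above: numeric value of a GOES flare class letter,
# in units where C1.0 = 1 (i.e. peak flux in 1e-6 W/m^2).
FLARE_CLASS = {"A": 0.01, "B": 0.1, "C": 1, "M": 10, "X": 100}

def flare_class_value(goes_class: str) -> float:
    """Convert a GOES class string such as 'M1.5' to its numeric value (here 15.0)."""
    letter, magnitude = goes_class[0].upper(), float(goes_class[1:])
    return FLARE_CLASS[letter] * magnitude

# With this convention, "class": 1 in the 'flare' block would mean >= C1.0,
# and "class": 10 would mean >= M1.0.
print(flare_class_value("C1.0"))  # 1.0
print(flare_class_value("M1.5"))  # 15.0
print(flare_class_value("X2.2"))  # 220.0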