
File names for trained models:

To save the trained model under a user-chosen name, it is sufficient to write that name in the "description" field of the configuration JSON file.  SHAUN: I think we need to make this as human-understandable as possible, since there will be many similarly trained models for each algorithm (e.g., "algorithm_abovec_24_0_00_Br_all" for a system trained on flares above C1.0, over 24-hr windows, at 0-hr latency, issued from 00:00 UT SHARPs, using all available Br properties).

"algorithm": { "phase": "training",                   <--- do not touch
               "config_name": "HybridLasso",          <--- do not touch
               "description": "HybridLasso_test",     <---- HERE
               "HybridLasso": true                    <--- do not touch
             }

In the prediction step, the name that was used in the training phase needs to be reported in the configuration JSON file, this time in the "config_name" field:

"algorithm": {
                   "phase": "execution",                         <--- do not touch
                   "config_name": "HybridLasso_test",  <---- HERE
                   "description": "HybridLasso"
}

Writing of the trained model works locally; let's see whether further fixes are needed once we run it on the cluster.
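As a sketch of how this naming convention could be applied in code (the helper name, model directory, and ".pkl" extension are assumptions, not part of the infrastructure):

```python
# Illustrative only: resolve which model file a run should write (training)
# or read (execution), following the "description"/"config_name" convention.
MODEL_DIR = "trained_models"  # assumed location, not project code

def model_path(config):
    """Return the model file path implied by the configuration JSON."""
    algo = config["algorithm"]
    if algo["phase"] == "training":
        name = algo["description"]   # user-chosen name, e.g. "HybridLasso_test"
    elif algo["phase"] == "execution":
        name = algo["config_name"]   # must match the training-phase "description"
    else:
        raise ValueError(f"unknown phase: {algo['phase']}")
    return f"{MODEL_DIR}/{name}.pkl"

train_cfg = {"algorithm": {"phase": "training",
                           "config_name": "HybridLasso",
                           "description": "HybridLasso_test",
                           "HybridLasso": True}}
pred_cfg = {"algorithm": {"phase": "execution",
                          "config_name": "HybridLasso_test",
                          "description": "HybridLasso"}}

print(model_path(train_cfg))  # trained_models/HybridLasso_test.pkl
print(model_path(pred_cfg))   # trained_models/HybridLasso_test.pkl
```

Both phases resolve to the same file, which is the point of reporting the training "description" as the execution "config_name".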

Core Training Configuration Workflow

  1. Integrate prediction algorithm X into the infrastructure (personally I don’t mind which, but Michele’s suggestion of 1 supervised and 1 unsupervised is good)
  2. The training time range to be used will be from 00:00:00 UT on 14-Sep-2012 (start of SHARP NRT data availability) to 23:59:59 UT on 31-Dec-2015
    1. Concern was raised previously about number of features for training, so a large training time range seems to be preferred
    2. This allows for the testing time range to be from 00:00:00 UT on 1-Jan-2016 to <most recently processed time stamp>, but this will be performed under WP5
  3. Run the integrated prediction algorithm with the following training configuration settings:
    1. Use only 00:00 UT time stamps (to avoid SDO 24-hr periodic orbital effects)
    2. 24-hr forecast window
    3. 0-hr latency
    4. create separate training configuration files, using Blos properties and Br properties individually, for flaring levels of:
      1. C-class only (i.e., >= C1.0 and < M1.0)
      2. M-class only (i.e., >= M1.0 and < X1.0)
      3. X-class only (i.e., >= X1.0)
      4. Above M-class (i.e., >= M1.0)
      5. Above C-class (i.e., >= C1.0)
  4. Run all 10 training configuration parameter files
  5. Write the variables of all 10 trained prediction models into the Prediction Configuration DB
    1. Personally, I'm not sure whether separate entries or a grouped JSON entry is better
  6. Integrate the next prediction algorithm and repeat steps 2–5
  7. The Prediction DB can be filled for each integrated prediction algorithm by launching all 10x trained prediction models for that algorithm on the chosen testing time range
    1. NOTE: the SDO/HMI image alignment bug from 13-Apr-2016 onwards will limit the availability of properties to make predictions from, until the replacement HMI data are available (UPSud is monitoring and downloading when available)
  8. Broader WP5 validation can be explored by choosing different durations of forecast window and repeating steps 2–5 and 7 for all integrated prediction algorithms
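The 5 flaring levels x 2 property sets described above can be enumerated programmatically; a minimal sketch, where the function name and defaults are illustrative and only the "algorithm" block from the snippets on this page is filled in:

```python
# Illustrative enumeration of the 10 training configurations
# (5 flaring levels x 2 property sets), following the naming convention
# algorithm_level_window_latency_issue_props_all used on this page.
LEVELS = ["cclass", "mclass", "xclass", "abovem", "abovec"]
PROPERTY_SETS = ["Blos", "Br"]

def make_training_configs(algorithm, window_hr=24, latency_hr=0, issue_ut="00"):
    configs = {}
    for level in LEVELS:
        for props in PROPERTY_SETS:
            name = f"{algorithm}_{level}_{window_hr}_{latency_hr}_{issue_ut}_{props}_all"
            # only the "algorithm" block is shown; the rest of the
            # configuration file (dataset, flare, ...) is omitted here
            configs[name] = {"algorithm": {"phase": "training",
                                           "config_name": algorithm,
                                           "description": name}}
    return configs

configs = make_training_configs("HybridLasso")
print(len(configs))  # 10
```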

File names for trained models:

      1. C-class only (i.e., >= C1.0 and < M1.0): algorithm_cclass_24_0_00_Blos_all and algorithm_cclass_24_0_00_Br_all
      2. M-class only (i.e., >= M1.0 and < X1.0): algorithm_mclass_24_0_00_Blos_all and algorithm_mclass_24_0_00_Br_all
      3. X-class only (i.e., >= X1.0): algorithm_xclass_24_0_00_Blos_all and algorithm_xclass_24_0_00_Br_all
      4. Above M-class (i.e., >= M1.0): algorithm_abovem_24_0_00_Blos_all and algorithm_abovem_24_0_00_Br_all
      5. Above C-class (i.e., >= C1.0): algorithm_abovec_24_0_00_Blos_all and algorithm_abovec_24_0_00_Br_all
    1. There is no explicit reason why we should use all properties, so another set of 10x configuration files should be prepared and run for a reduced "optimized"/"feature selected" property set

    1. NOTE: there may be no point doing X-class, given the rarity of their occurrence in the training time period.
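Conversely, a trained-model name can be decoded back into its parameters, e.g. when browsing the Prediction Configuration DB; this parser and its output field names are illustrative assumptions, not project code:

```python
# Illustrative decoder for the naming convention
# algorithm_level_window_latency_issue_props_subset used on this page.
def parse_model_name(name):
    algorithm, level, window, latency, issue, props, subset = name.split("_")
    return {"algorithm": algorithm,
            "flaring_level": level,    # cclass, mclass, xclass, abovem, abovec
            "window_hr": int(window),  # forecast window in hours
            "latency_hr": int(latency),
            "issue_ut": issue,         # SHARP time stamp used, e.g. "00"
            "properties": props,       # Blos or Br
            "subset": subset}          # "all" or a feature-selected set

parsed = parse_model_name("algorithm_abovec_24_0_00_Br_all")
print(parsed["flaring_level"], parsed["window_hr"])  # abovec 24
```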


Example Configuration JSON



In this page we summarize the set of parameters for training the model.

Please update/change whatever is needed.


"dataset": { "cadence": "24h",

...

             "first_flare_class": false
           },

SHAUN: Question to Marco/Dario - Does the following 'flare' structure need to be separate from the 'dataset' structure?
"flare":{"class":1,       <-- flare_class = {'A': 0.01, 'B': 0.1, 'C': 1, 'M': 10, 'X': 100} is this conversion table ok?

...
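A minimal sketch of how the conversion table in the comment above could be used to turn GOES class strings into numeric thresholds (the helper function is illustrative, not part of the configuration schema):

```python
# Conversion table from the comment above; the parsing helper is an
# illustrative assumption for turning e.g. "C1.0" into a number, so that
# thresholds like ">= C1.0" become simple numeric comparisons.
FLARE_CLASS = {"A": 0.01, "B": 0.1, "C": 1, "M": 10, "X": 100}

def goes_to_numeric(goes_class):
    """Convert a GOES class string, e.g. 'M2.5' -> 25.0, on the scale above."""
    letter, magnitude = goes_class[0], float(goes_class[1:])
    return FLARE_CLASS[letter] * magnitude

# ">= C1.0 and < M1.0" (C-class only) as a numeric check:
assert goes_to_numeric("C1.0") <= goes_to_numeric("C9.9") < goes_to_numeric("M1.0")
```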