...
"algorithm": {
"phase": "execution", <--- do not touch
"config_name": "HybridLasso_test", <---- HERE
"description": "HybridLasso"
}
Writing the trained model works locally; let's see whether other fixes are needed once we run on the cluster.
...
- Integrate prediction algorithm X into the infrastructure (personally I don’t mind which, but Michele’s suggestion of 1 supervised and 1 unsupervised is good)
- The training time range to be used will be from 00:00:00 UT on 14-Sep-2012 (start of SHARP NRT data availability) to 23:59:59 UT on 31-Dec-2015
- Concern was raised previously about number of features for training, so a large training time range seems to be preferred
- This allows for the testing time range to be from 00:00:00 UT on 1-Jan-2016 to <most recently processed time stamp>, but this will be performed under WP5
- Run integrated prediction algorithm with the following training configuration settings:
- Use 24-hr forecast window
- 0-hr latency
- use only 00:00 UT time stamps (to avoid SDO 24-hr periodic orbital effects)
- create individual training configuration files separately using Blos properties and Br properties for flaring levels of:
- C-class only (i.e., >= C1.0 and < M1.0): algorithm_cclass_24_0_00_Blos_all and algorithm_cclass_24_0_00_Br_all
- M-class only (i.e., >= M1.0 and < X1.0): algorithm_mclass_24_0_00_Blos_all and algorithm_mclass_24_0_00_Br_all
- X-class only (i.e., >= X1.0): algorithm_xclass_24_0_00_Blos_all and algorithm_xclass_24_0_00_Br_all
- Above M-class (i.e., >= M1.0): algorithm_abovem_24_0_00_Blos_all and algorithm_abovem_24_0_00_Br_all
- Above C-class (i.e., >= C1.0): algorithm_abovec_24_0_00_Blos_all and algorithm_abovec_24_0_00_Br_all
- There is no explicit reason why we should use all properties (but it is worth doing for completeness), so another set of 10x configuration files should be prepared and run for a reduced set of "optimized"/"feature selected" properties
- C-class only (i.e., >= C1.0 and < M1.0): algorithm_cclass_24_0_00_Blos_opt and algorithm_cclass_24_0_00_Br_opt
- M-class only (i.e., >= M1.0 and < X1.0): algorithm_mclass_24_0_00_Blos_opt and algorithm_mclass_24_0_00_Br_opt
- X-class only (i.e., >= X1.0): algorithm_xclass_24_0_00_Blos_opt and algorithm_xclass_24_0_00_Br_opt
- Above M-class (i.e., >= M1.0): algorithm_abovem_24_0_00_Blos_opt and algorithm_abovem_24_0_00_Br_opt
- Above C-class (i.e., >= C1.0): algorithm_abovec_24_0_00_Blos_opt and algorithm_abovec_24_0_00_Br_opt
- NOTE: there may be no point doing X-class only for either the all or the "optimized"/"feature selected" property set cases, given the rarity of their occurrence in the training time period.
- NOTE: can run configurations with a combination of Blos and Br properties (possible filename tag Bmix) that would need to be run for all and "optimized"/"feature selected" property sets both with their own flaring level scenarios – therefore creating another 10 configuration files (or 8 if "X-class only" cases left out)
- Run all 20 training configuration parameter files (or 16 if "X-class only" cases left out)
- Write variables of all 20 trained prediction models into Prediction Configuration DB (or 16 if "X-class only" cases left out)
- Integrate next prediction algorithm and repeat steps 2–5
- Prediction DB can be filled for each integrated prediction algorithm by launching all 20x (or 16x) trained prediction models for that algorithm on the chosen testing time range
- NOTE: the SDO/HMI image alignment bug from 13-Apr-2016 onwards will limit the availability of properties to make predictions from, until the replacement HMI data are available (UPSud is monitoring and downloading when available)
- Broader WP5 validation can be explored by choosing different durations of forecast window and repeating steps 2–5 and 7 for all integrated prediction algorithms
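As a rough aid for preparing the configuration files, the naming pattern used in the list above can be enumerated programmatically. This is only a sketch: the helper name `config_names`, its parameters, and the `algorithm` prefix as a literal placeholder are assumptions for illustration, not part of the engine.

```python
from itertools import product

# Tags taken from the configuration names listed above.
FLARE_LEVELS = ["cclass", "mclass", "xclass", "abovem", "abovec"]
FIELDS = ["Blos", "Br"]
PROPERTY_SETS = ["all", "opt"]


def config_names(algorithm="algorithm", window=24, latency=0, issuing="00",
                 include_xclass=True):
    """List configuration names as <algorithm>_<level>_<window>_<latency>_<issuing>_<field>_<set>.

    Hypothetical helper: "algorithm" is a placeholder for the name of the
    integrated prediction algorithm.
    """
    levels = [lvl for lvl in FLARE_LEVELS if include_xclass or lvl != "xclass"]
    return [
        f"{algorithm}_{level}_{window}_{latency}_{issuing}_{field}_{pset}"
        for pset, level, field in product(PROPERTY_SETS, levels, FIELDS)
    ]


if __name__ == "__main__":
    for name in config_names():
        print(name)
```

With the defaults this yields the 20 names, and `config_names(include_xclass=False)` yields the 16 that remain if the "X-class only" cases are left out.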
...
Please update/change whatever is needed.
"flare_history_window": 24 <--- we added this field to set the time interval (in hours) over which we check for the occurrence of a flare in the past.
It works if at least one item of the following dictionary is set to true
"flare_history_features": {"flare_past": true, "flare_index_past": true}
"dataset": { "cadence":"24h",
...
"window":24,
"latency":0, <-- SHAUN: Question to Marco/Dario - Do these need to be present to filter same-format predictions (e.g., for ensemble forecasting)?
"issuing":"00" <-- SHAUN: Question to Marco/Dario - Do these need to be present to filter same-format predictions (e.g., for ensemble forecasting)?
},
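For the "flare_history_window" and "flare_history_features" settings described above, the rule that the window only applies when at least one feature flag is true could be checked as in the sketch below. How the engine actually reads these fields is not shown here, so the function names and the flat dictionary layout are assumptions.

```python
def flare_history_enabled(config):
    """True if at least one entry of "flare_history_features" is set to true."""
    features = config.get("flare_history_features", {})
    return any(bool(flag) for flag in features.values())


def flare_history_window_hours(config):
    """Return the history window in hours, or None if no history feature is active."""
    if not flare_history_enabled(config):
        return None
    return config.get("flare_history_window")


if __name__ == "__main__":
    # Values copied from the fragment above; the surrounding layout is assumed.
    example = {
        "flare_history_window": 24,
        "flare_history_features": {"flare_past": True, "flare_index_past": True},
    }
    print(flare_history_window_hours(example))
```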
Cristina: D. Shaun Bloomfield we added the "latency" variable to the flarecast engine (the default value was set to "0"). In order to add also the "issuing" variable, could you please tell me what quantity is represented by it?
Shaun: Cristina Campi I was thinking that this would be a good way to capture the UT time of the SHARPs being used. In this sense, it corresponds to the current implementation of "cadence":"24h" in the "dataset" structure, but would be more human-interpretable for the description of the training configuration (and appearance in its filename).
Cristina: D. Shaun Bloomfield, OK, I am sorry but I am not sure I understood correctly: does it substitute the cadence field we have used so far (i.e. it can assume values like "24h", "12h" and so on) or is it a list of UT times to be used (for example "00,03,06,09,12,15,18,21" for a 3-hour cadence)?
Shaun: Cristina Campi, due to the SDO orbital periodic effects I don't think that we can ever combine properties across different UT times. I see it as replacing the "cadence" tag, with "issuing":"00" being implemented in the engine as a request for cadence=24h in the property DB reading.
Cristina: D. Shaun Bloomfield For now I am going to use "issuing" to replace the "cadence" field (sorry to keep bothering you, just to be sure I understood for future development/use: for now we set "issuing" to "00" and it corresponds to the "24h" cadence. If we want a "12h" cadence which value do we need to set? "issuing: 12"? How does it interact with the starting time?)
Shaun: Cristina Campi not a problem, these are important thoughts and questions to have! Just a random thought off the top of my head - I would think that a 12-hr cadence forecast would have to be a combination of, e.g., a "window":12, "issuing":"00" trained system and a second "window":12, "issuing":"12" trained system. Right now, if "issuing" is set to "XX" and we leave "window":24 then this means that the engine would need to filter the full property DB for the occurrence of timestamps of XX:00 UT and build a 24-hr forecast training based on just that different SHARP timestamp. For now, we do not need to worry about this because our focus is on 24-hr forecasts with 0-hr latency from 00:00 UT - I just want there to be the algorithmic structure available to parameterize these changes in case we have time to do it.
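To make the "issuing" idea from the discussion above concrete, here is a minimal sketch of filtering property timestamps down to a single UT hour. The function name and the in-memory list of `datetime` objects are assumptions; the real engine would apply the equivalent filter when reading the property DB.

```python
from datetime import datetime, timedelta


def filter_by_issuing(timestamps, issuing="00"):
    """Keep only timestamps falling exactly at <issuing>:00:00 UT.

    `issuing` is the two-digit UT hour string used in the configuration,
    e.g. "00" or "12".
    """
    hour = int(issuing)
    return [
        t for t in timestamps
        if t.hour == hour and t.minute == 0 and t.second == 0
    ]


if __name__ == "__main__":
    # Two days of 3-hourly timestamps, starting at the beginning of the
    # training time range.
    start = datetime(2012, 9, 14, 0, 0, 0)
    stamps = [start + timedelta(hours=3 * i) for i in range(16)]
    print(filter_by_issuing(stamps, "00"))
    print(filter_by_issuing(stamps, "12"))
```

Under this reading, a 12-hr cadence forecast would be built from two separately trained systems, one filtered with "issuing":"00" and one with "issuing":"12", rather than from a single mixed set of timestamps.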
Please select here all the properties you want to take into account
...