
Elements retrieved from the 'property database' are structured as in the example below.

I'd ask WP2 members to edit and comment on the lines below, adding the following information for each key:

(1) if the key contains the value of the predictor (hence it must be used by ML algorithms),

(2) if the key contains relevant information which 'can' be used 'as' a predictor (for example the standard deviation 'sigma' can be used but it is not mandatory), or

(3) if the key does not contain relevant information and should not be used for learning/prediction.

Below I have annotated some lines as an example:

{
  "data":[
    {
      "data":{
        "alpha_exp_cwt_blos":{
          "alpha":-0.9320463538169861,             <---- value of the predictor               USE
          "fit_r":-0.9955066442489624,             <---- is a parameter ?                     DON'T USE
          "sigma":0.0149855138733983               <---- standard deviation of the predictor  IT CAN BE USED
        },
        "alpha_exp_cwt_br":{
          "alpha":-0.8825041055679321,              
          "fit_r":-0.9950829744338989,
          "sigma":0.01484755612909794
        },
        "alpha_exp_cwt_btot":{
          "alpha":-1.000222444534302,
          "fit_r":-0.9861923456192017,
          "sigma":0.02839041128754616
        },
        "alpha_exp_fft_blos":{
          "alpha":-1.027253031730652,
          "fit_r":-0.8860970735549927,
          "sigma":0.0676979273557663
        },
        "alpha_exp_fft_br":{
          "alpha":-0.9775007963180542,
          "fit_r":-0.8845617175102234,
          "sigma":0.06493799388408661
        },
        "alpha_exp_fft_btot":{
          "alpha":-0.8710989952087402,
          "fit_r":-0.8559461832046509,
          "sigma":0.06629730015993118
        },
        "beff_blos":{
          "beff":0.0,                               <---- USE
          "err_sep_length":0.0,                     
          "err_signed_flux":0.0,                    
          "sep_length":0.0,                         
          "signed_flux":0.0                         
        },
        "beff_br":{
          "beff":0.0,                               <---- USE
          "err_sep_length":0.0,
                     <---- USE
          "err_signed_flux":0.0,
          "sep_length":0.0,
          "signed_flux":0.0
        },
        "decay_index_blos":{
          "l_over_minhmin":0.0,                     <---- USE (Ratio of L to hmin for an MPIL segment of which hmin is lowest, calculated from Blos) 
          "maxl_over_hmin":0.0,                      <---- USE (Ratio of L to hmin for an MPIL segment of which L is longest, calculated from Blos)
          "max_l_over_hmin":0.0,                     <---- USE (Maximum of L/hmin ratios of all MPILs, calculated from Blos)
          "tot_l_over_hmin":0.0                      <---- USE (Sum of L/hmin ratios of all MPILs, calculated from Blos)
        },
        "decay_index_br":{
          "l_over_minhmin":0.0,                     <---- USE (Ratio of L to hmin for an MPIL segment of which hmin is lowest, calculated from Br)
          "maxl_over_hmin":0.0,                      <---- USE (Ratio of L to hmin for an MPIL segment of which L is longest, calculated from Br)
          "max_l_over_hmin":0.0,                     <---- USE (Maximum of L/hmin ratios of all MPILs, calculated from Br)
          "tot_l_over_hmin":0.0                      <---- USE (Sum of L/hmin ratios of all MPILs, calculated from Br)
        },
        "flare_association":{
          "f_etime_tau":null,
          "f_mag":"",
          "f_ptime_tau":null,
          "f_stime_tau":null
        },
        "ising_energy_blos":{
          "ising_energy":17.4415225982666,           <---- USE
          "num_neg":3497.0,
          "num_pos":49.0
        },
        "ising_energy_br":{
          "ising_energy":0.0,                        <---- USE
          "num_neg":284.0,
          "num_pos":0.0
        },
        "ising_energy_part_blos":{
          "ising_energy_part":0.002879713661968708,  <---- USE
          "num_neg":4.0,
          "num_pos":7.0
        },
        "ising_energy_part_br":{
          "ising_energy_part":0.0003076923021581024, <---- USE
          "num_neg":1.0,
          "num_pos":1.0
        },
        "mpil_blos":{
          "max_length":0.0,                     <---- USE (Length of the longest MPIL segment calculated from Blos)
          "tot_length":0.0,                     <---- USE (Sum of MPIL segments calculated from Blos)
          "tot_usflux":0.0                      <---- USE (Total unsigned flux around MPIL segments calculated from Blos)
        },
        "mpil_br":{
          "max_length":0.0,                     <---- USE (Length of the longest MPIL segment calculated from Br)
          "tot_length":0.0,                     <---- USE (Sum of MPIL segments calculated from Br)
          "tot_usflux":0.0                      <---- USE (Total unsigned flux around MPIL segments calculated from Br)
        },
        "nn_currents":{
          "err_inet":91278491054615040000,
          "err_ipn_nn":4.715529918670654,
          "err_its":[
            64917928477104010000,
            64167164345472520000,
           ... many zeros here ....
          ],
          "err_its_pot":[
            64917928477104010000,
            64167164345472520000,
           ... many zeros here ....
          ],
          "err_tot_neg":0.0,
          "err_tot_pos":0.0,
          "err_tot_us_cur":0.0,
          "flimb":0.1071751043200493,
          "iimb":[
            1.0,
            1.0
          ],
          "ipn_nn":1.0,
          "its":[
            -75480620025799250000,
            -27374923626785540000,
           ... many zeros here ....
          ],
          "its_pot":[
            -3359994009907888000,
            881065811530219500,
           ... many zeros here ....
          ],
          "net_curr":0.0,
          "num_currents":2.0,
          "tot_neg":0.0,
          "tot_pos":0.0,
          "tot_us_cur":0.0                  <---- USE (total unsigned non-neutralized current)
        },
        "r_value_blos_logr":0.0,            <---- USE (R-value for LOS component)
        "r_value_br_logr":0.0,              <---- USE (R-value for radial component)
        "srs":{
          "area":0,                         <---- USE (Total area of active region sunspots)
          "dlong_hg":0,                     <---- USE (Longitudinal extent of active region sunspots)
          "mcint_com":"",                   <---- IT CAN BE USED IF AVAILABLE (McIntosh active region sunspot class: third component)
          "mcint_pen":"",                   <---- IT CAN BE USED IF AVAILABLE (McIntosh active region sunspot class: second component)
          "mcint_zur":"",                   <---- IT CAN BE USED IF AVAILABLE (McIntosh active region sunspot class: first component)
          "mtwil_class":"",                 <---- IT CAN BE USED IF AVAILABLE (Mount Wilson active region sunspot magnetic class)
          "n_spots":0                       <---- USE (Total number of active region sunspots)
        },
        "wlsg_blos":{
          "tot_len_pil":0.0,                <---- DON'T USE
          "value_int":0.0,                  <---- USE (Value calculated as in Falconer et al., 2008)
          "value_tot":0.0                   <---- CAN BE USED.
        },
        "wlsg_br":{
          "tot_len_pil":0.0,
          "value_int":0.0,
          "value_tot":0.0
        }, 
        "flow_field_bvec":{
          "v_mean":                        <---- USE
          "v_median":                      <---- USE
          "vz_mean":                       <---- USE
          "vz_max":                        <---- USE
          "diver":                         <---- USE
          "conver":                        <---- USE
          "shear":                         <---- USE
          "diver_mean":                    <---- USE
          "conver_mean":                   <---- USE
          "shear_mean":                    <---- USE
          "diver_max":                     <---- USE
          "conver_max":                    <---- USE
          "shear_max":                     <---- USE
          "w_diver":                       <---- USE
          "w_conver":                      <---- USE
          "w_shear":                       <---- USE
          "w_diver_mean":                  <---- USE
          "w_conver_mean":                 <---- USE
          "w_shear_mean":                  <---- USE
          "w_diver_max":                   <---- USE
          "w_conver_max":                  <---- USE
          "w_shear_max":                   <---- USE
        },
        "helicity_energy_bvec":{
          "pos_dhdt_in":
          "abs_neg_dhdt_in":
          "abs_tot_dhdt_in":               <---- USE
          "tot_uns_dhdt_in":               <---- USE
          "pos_dhdt_sh":
          "abs_neg_dhdt_sh":
          "abs_tot_dhdt_sh":               <---- USE
          "tot_uns_dhdt_sh":               <---- USE
          "abs_tot_dhdt":                  <---- USE
          "abs_tot_dhdt_in_plus_sh":       <---- USE
          "tot_uns_dhdt":                  <---- USE
          "pos_dedt_in":               
          "abs_neg_dedt_in":
          "abs_tot_dedt_in":               <---- USE
          "tot_uns_dedt_in":               <---- USE
          "pos_dedt_sh":
          "abs_neg_dedt_sh":
          "abs_tot_dedt_sh":               <---- USE
          "tot_uns_dedt_sh":               <---- USE
          "abs_tot_dedt":                  <---- USE
          "abs_tot_dedt_in_plus_sh":       <---- USE
          "tot_uns_dedt":                  <---- USE
        }
      },
      "gs_slf":{
        "g_s": 0.0,                         <---- USE
        "slf": 0.0,                         <---- USE
        "d_l_f": 0.0,
        "weight_cent": 0.0,
        "lead_cent": 0.0,
        "foll_cent": 0.0,
        "fit_coeff": [0.,0.]
      },
      "frdim_Blos":{
        "frdim": 0.0,                       <---- USE
        "frdim_err": 0.0
      },
      "frdim_Br":{
        "frdim": 0.0,                       <---- USE
        "frdim_err": 0.0
      },
      "frdim_Btot":{
        "frdim": 0.0,                       <---- USE
        "frdim_err": 0.0
      },
      "sfunction_Blos":{
        "zq": [0.,..0.,],                   <---- USE zq(5)
        "zq_err": [0.,..0.,],
        "q": [0.,..0.,],
        "sf": [0.,..0.,],
        "rd": [0.,..0.,]
      },
      "sfunction_Br":{
        "zq": [0.,..0.,],                   <---- USE zq(5)
        "zq_err": [0.,..0.,],
        "q": [0.,..0.,],
        "sf": [0.,..0.,],
        "rd": [0.,..0.,]
      },
      "sfunction_Btot":{
        "zq": [0.,..0.,],                   <---- USE zq(5)
        "zq_err": [0.,..0.,],
        "q": [0.,..0.,],
        "sf": [0.,..0.,],
        "rd": [0.,..0.,]
      },
      "mf_spectrum_Blos":{
        "dq": [0.,..,0],                    <---- USE dq(0) and dq(7)
        "dq_err": [0.,...,0.],
        "q": [0.,...,0.],
        "alpha": [0.,...,0.],
        "alpha_err": [0.,...,0.],
        "falpha": [0.,...,0.],
        "falpha_err": [0.,...,0.]
      },
      "mf_spectrum_Br":{
        "dq": [0.,..,0],                    <---- USE dq(0) and dq(7)
        "dq_err": [0.,...,0.],
        "q": [0.,...,0.],
        "alpha": [0.,...,0.],
        "alpha_err": [0.,...,0.],
        "falpha": [0.,...,0.],
        "falpha_err": [0.,...,0.]
      },
      "mf_spectrum_Btot":{
        "dq": [0.,..,0],                    <---- USE dq(0) and dq(7)
        "dq_err": [0.,...,0.],
        "q": [0.,...,0.],
        "alpha": [0.,...,0.],
        "alpha_err": [0.,...,0.],
        "falpha": [0.,...,0.],
        "falpha_err": [0.,...,0.]
      },
      "helicity_Br":{
        "pos_dhdt_s":
        "abs_neg_dhdt_s":
        "abs_tot_dhdt_s":                   <---- USE
        "tot_uns_dhdt_s":                   <---- USE
        "pos_dhdt_m":
        "abs_neg_dhdt_m":
        "abs_tot_dhdt_m":                   <---- USE
        "tot_uns_dhdt_m":                   <---- USE
        "abs_tot_dhdt":                     <---- USE
        "abs_tot_dhdt_s_plus_m":            <---- USE
        "tot_uns_dhdt":                     <---- USE
      },
      "null_point":{
        "n_np":                             <---- USE
        "n_np_2_10":                        <---- USE
        "n_np_10_100":                      <---- USE
      },
      "fc_id":"flarecast-production_01-00000000-0000-0000-0000-000000000046",
      "lat_hg":18.4696006445173,
      "long_carr":89.92517737846,
      "long_hg":22.05396917846,
      "meta":{
        "harp":1678,
        "nar":null,
        "npr":null
      },
      "sharp_kw":{
        "usflux":{
          "ave": 688538714166526000,
          "kurtosis": "-NaN",               <---- DON'T USE FOR NOW
          "max": 3372120798528537000,
          "median": 558996418845474800,
          "skewness": "-NaN",               <---- DON'T USE FOR NOW
          "stddev": "Infinity",             <---- DON'T USE FOR NOW
          "total": 1.210726444314045e+22
        },
        "sflux":{
          "ave": 546770433660158000,
          "kurtosis": "-NaN",               <---- DON'T USE FOR NOW
          "max": 1617281274022461000,
          "median": 506645131194007600,
          "skewness": "-NaN",               <---- DON'T USE FOR NOW
          "stddev": 220851805385392100,     <---- DON'T USE FOR NOW
          "total": 324234860769562300000
        }
      },
      "time_start":"2013-09-23T00:00:02+00:00"
    }
  ]
}

 

*MPIL: Magnetic polarity inversion line

*hmin: Minimum height at which decay index of potential magnetic field becomes greater than the critical value (i.e., 1.5 in this calculation) for torus instability

*L: Length of an MPIL segment
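
As an illustration of how the USE annotations above might be consumed, here is a minimal Python sketch that pulls a few annotated scalar keys out of the inner "data" object of one element. The names USE_PATHS, to_float and extract_features are hypothetical, the path list is only a small sample of the keys marked USE, and treating null, empty strings, "-NaN" and "Infinity" as missing values is an assumption rather than agreed project behaviour. Array-valued keys such as zq and dq are not handled here.

```python
import math

# Hypothetical sample of keys marked USE in the example; extend as needed.
USE_PATHS = [
    ("alpha_exp_cwt_blos", "alpha"),
    ("beff_blos", "beff"),
    ("ising_energy_blos", "ising_energy"),
    ("r_value_blos_logr",),
]


def to_float(value):
    """Coerce a JSON value to float; missing or non-finite values become NaN."""
    if value is None or isinstance(value, bool):
        return math.nan
    if isinstance(value, (int, float)):
        number = float(value)
    elif isinstance(value, str):
        try:
            number = float(value.strip())  # accepts "-NaN" and "Infinity"
        except ValueError:
            return math.nan                # e.g. the empty string
    else:
        return math.nan
    # Assumption: infinite values are treated like NaN.
    return number if math.isfinite(number) else math.nan


def extract_features(properties, paths=USE_PATHS):
    """Return {"group.key": float} for each path, walking nested dicts."""
    features = {}
    for path in paths:
        node = properties
        for key in path:
            node = node.get(key) if isinstance(node, dict) else None
        features[".".join(path)] = to_float(node)
    return features
```

Here `properties` is assumed to be the inner "data" object of a single element; a key that is absent simply yields NaN, so it can be filtered out later.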


3 Comments

  1. Federico Benvenuto Federica Sciacchitano Vittorio Latorre

    Everything in "sharp_kw" that is not a NaN can be used.
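
    A minimal sketch of that rule in Python, assuming "sharp_kw" has the two-level layout shown in the example (keyword, then statistic). The function name usable_sharp_stats is hypothetical, and dropping "Infinity" along with NaN is an assumption that goes slightly beyond the rule as stated.

```python
import math


def usable_sharp_stats(sharp_kw):
    """Return {"keyword_stat": float} for every finite statistic in sharp_kw."""
    usable = {}
    for keyword, stats in sharp_kw.items():
        for stat, value in stats.items():
            try:
                number = float(value)   # also parses "-NaN" and "Infinity"
            except (TypeError, ValueError):
                continue                # null or non-numeric text: skip
            if math.isfinite(number):   # assumption: Infinity is skipped too
                usable[f"{keyword}_{stat}"] = number
    return usable
```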

  2. Are there some features which often take NaN values? If yes, please read the following and let us know.

    Concerning NaN occurrences, our handling of the data (WP2 outcomes) works as follows:

    After reading all the feature values you marked as USE, we store them in an n x p matrix (n = number of point-in-time active regions, p = number of features).

    In this matrix:

    • each row contains a "list" of feature values associated with a given point-in-time active region,
    • each column contains a "list" of values of a given feature.

    After having built this matrix, we need to clean it since it can contain:

    • inconsistent data, such as all-zero rows (even ones associated with a flare!) or all-NaN columns
    • scattered NaN values, which the large majority of the algorithms cannot handle

    Our strategy for handling these problems is to DISCARD first the bad columns and then the bad rows, if any, according to the following rules:

    First, clean bad-columns :

    1) if a column contains only NaN values, it (and hence the associated feature) is discarded

    2) if the values of a column are all the same, the column (and hence the associated feature) is discarded

    Second, clean bad-rows :

    3) if a row contains at least one NaN value, it (and hence the corresponding point-in-time active region) is discarded

    4) if a row contains only zero values, it (and hence the corresponding point-in-time active region) is discarded
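
    The four rules above can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the code actually used in WP2: the function name clean_feature_matrix is hypothetical, the matrix is a list of equal-length rows of floats, and a column that mixes NaN with a single repeated value is not treated as constant.

```python
import math


def clean_feature_matrix(rows, feature_names):
    """Apply rules 1-4: drop bad columns first, then bad rows.

    Returns (cleaned_rows, kept_feature_names, kept_row_indices).
    """
    n_cols = len(feature_names)
    columns = [[row[j] for row in rows] for j in range(n_cols)]

    def is_bad_column(col):
        all_nan = all(math.isnan(v) for v in col)   # rule 1
        constant = all(v == col[0] for v in col)    # rule 2 (NaN never equals NaN)
        return all_nan or constant

    keep = [j for j, col in enumerate(columns) if not is_bad_column(col)]
    kept_names = [feature_names[j] for j in keep]

    cleaned, kept_rows = [], []
    for i, row in enumerate(rows):
        values = [row[j] for j in keep]
        has_nan = any(math.isnan(v) for v in values)  # rule 3
        all_zero = all(v == 0 for v in values)        # rule 4
        if not (has_nan or all_zero):
            cleaned.append(values)
            kept_rows.append(i)
    return cleaned, kept_names, kept_rows
```

    Returning the kept row indices makes it possible to see which point-in-time active regions were lost, and therefore how costly a NaN-prone feature is.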

    Please be aware that if an algorithm often produces NaN values for a given feature, we risk discarding many point-in-time active regions.

    It would therefore be better to discard the 'bad' feature column at the very beginning.

    In conclusion, if you know in advance that some features often take NaN values, please let us know.
     

     

    1. One of my AR feature extraction algorithms, the solar region summary (SRS) property extraction algorithm, can produce parameters whose value is an empty string.
      Four SRS parameters (mtwil_class, mcint_zur, mcint_pen and mcint_com), which hold the McIntosh sunspot classification and the Hale magnetic classification, will be empty strings if a given HARP has no corresponding NOAA active region number (NAR).
      However, as I understand it, we may not consider HARPs without a NAR for flare prediction in the framework of FLARECAST.
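
      A minimal Python sketch of how a consumer might guard against this, assuming each element has the "meta" and "data" layout shown in the example at the top of the page. The names SRS_CLASS_KEYS and srs_classes are hypothetical, and mapping an empty string to None is an assumption about how missing classes should be represented.

```python
SRS_CLASS_KEYS = ("mtwil_class", "mcint_zur", "mcint_pen", "mcint_com")


def srs_classes(element):
    """Return the four SRS class strings, or None when the HARP has no NAR."""
    if element.get("meta", {}).get("nar") is None:
        return None  # no NOAA active region number: skip this element
    srs = element.get("data", {}).get("srs", {})
    # An empty string means the class is unavailable; expose it as None.
    return {key: (srs.get(key) or None) for key in SRS_CLASS_KEYS}
```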