In the present paper we report a case study on how to optimally select, from a given pool, the subset of variables on which to base a model for the prediction of short-term solar power generation. We conduct our study without assuming any prior knowledge of the set of variables, and in particular of what each variable represents, so that the choice is made according to a formal criterion.
A central task when forecasting energy-market quantities, such as spot prices or power generation, is the selection of an appropriate pool of variables on which to base a prediction algorithm. This task must be addressed carefully; nevertheless, the literature on the topic appears to be very limited. In fact, any regression model needs a suitable set of regression variables in order to produce accurate results. Even if one employs the most recent techniques based on neural networks and machine learning algorithms, the model has to be tuned on the most appropriate set of regression variables: too small a set of variables may lead to a poorly performing method, whereas too rich a set may lead to overfitting.
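The trade-off between too few and too many regression variables can be illustrated with a small numerical experiment. The following is a minimal sketch on synthetic data and is not the selection procedure studied in this paper: the data-generating process (three informative variables among fifty), the sample sizes, the use of ordinary least squares, and the helper name `fit_and_score` are all assumptions made purely for illustration.

```python
import numpy as np


def fit_and_score(X_train, y_train, X_test, y_test, k):
    """Fit ordinary least squares on the first k columns.

    Returns the pair (training MSE, test MSE).
    """
    coef, *_ = np.linalg.lstsq(X_train[:, :k], y_train, rcond=None)
    train_mse = float(np.mean((X_train[:, :k] @ coef - y_train) ** 2))
    test_mse = float(np.mean((X_test[:, :k] @ coef - y_test) ** 2))
    return train_mse, test_mse


# Hypothetical setup: 50 candidate variables, only the first 3 carry signal.
rng = np.random.default_rng(0)
n_train, n_test, n_vars = 60, 2000, 50
true_coef = np.zeros(n_vars)
true_coef[:3] = [2.0, -1.5, 1.0]

X_train = rng.normal(size=(n_train, n_vars))
X_test = rng.normal(size=(n_test, n_vars))
y_train = X_train @ true_coef + rng.normal(size=n_train)
y_test = X_test @ true_coef + rng.normal(size=n_test)

# Compare a too-small set (1), the informative set (3), and the full pool (50).
results = {
    k: fit_and_score(X_train, y_train, X_test, y_test, k) for k in (1, 3, 50)
}
for k, (train_mse, test_mse) in results.items():
    print(f"{k:2d} variables: train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
```

In this setting the training error can only decrease as variables are added, whereas the error on unseen data is expected to be lowest for the three informative variables and to deteriorate both for the single-variable model and for the model using the full pool.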