Browsing by Subject "Prognose"

Comparison of omics technologies for hybrid prediction
(2019) Westhues, Matthias; Melchinger, Albrecht E.
One of the great challenges for plant breeders is dealing with the vast number of putative candidates, which cannot be tested exhaustively in multi-environment field trials. Using pedigree records has helped breeders narrow down the number of candidates substantially. With pedigree information, only a subset of candidates needs to be subjected to exhaustive tests of their phenotype, whereas the phenotype of the majority of untested relatives is inferred from their common pedigree. A caveat of pedigree information is its inability to capture Mendelian sampling and to accurately reflect relationships among individuals. This shortcoming was mitigated with the advent of marker assays covering regions harboring causal quantitative trait loci. Today, the prediction of untested candidates using information from genomic markers, called genomic prediction, is a routine procedure in larger plant breeding companies. Genomic prediction has revolutionized the prediction of traits with complex genetic architecture but, just like pedigree information, cannot properly capture physiological epistasis, which refers to complex interactions among genes and endophenotypes, such as RNA, proteins and metabolites. Given their intermediate position in the genotype-phenotype cascade, endophenotypes are expected to represent some of the information missing from the genome, thereby potentially improving predictive abilities. In a first study we explored the ability of several predictor types to forecast genetic values for complex agronomic traits recorded on maize hybrids. Pedigree and genomic information were included as the benchmark for evaluating the merit of metabolites and gene expression data in genetic value prediction. Metabolites sampled from maize plants grown in field trials were poor predictors for all traits. Conversely, root metabolites sampled from plants grown under controlled conditions were moderate to competitive predictors for the traits fat and dry matter yield.
Gene expression data outperformed other individual predictors for the prediction of genetic values for protein and for the economically most relevant trait, dry matter yield. A genome-wide association study suggested that gene expression data integrated SNP interactions. This might explain the superior performance of this predictor type in the prediction of protein and dry matter yield. Small RNAs were probed for their potential as predictors, given their involvement in transcriptional, post-transcriptional and post-translational regulation. Regardless of the trait, small RNAs could not outperform other predictors. Combinations of predictors did not considerably improve the predictive ability of the best single predictor for any trait but improved the stability of their performance across traits. By assigning different weights to each predictor, we evaluated each predictor's optimal contribution for attaining maximum predictive ability. This approach revealed that pedigree, genomic information and gene expression data contribute equally when maximizing predictive ability for grain dry matter content. When attempting to maximize predictive ability for grain yield, pedigree information was superfluous. For genotypes having only genomic information, gene expression data were imputed by using genotypes having both genomic and gene expression data. Previously, this single-step prediction framework was only used for qualitative predictors. Our study revealed that this framework can be employed for improving the cost-effectiveness of quantitative endophenotypes in hybrid prediction. We hope that these studies will further promote exploring endophenotypes as additional predictor types in breeding.
Divergence, convergence, and the history-augmented Solow model
(2017) Kufenko, Vadim; Prettner, Klaus; Geloso, Vincent
We test the history-augmented Solow model with respect to its predictions on the patterns of divergence and convergence between today's industrialized countries of the OECD. We show that the dispersion of incomes increased after the Industrial Revolution, peaked during the Second World War, and decreased afterwards. This pattern is fully consistent with the transitional dynamics implied by the history-augmented Solow model.
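The rise-then-fall pattern in income dispersion described above can be illustrated with a minimal sketch. It assumes that dispersion is measured as the standard deviation of log incomes across countries, which is one common choice and not necessarily the measure used in the paper. The income figures and period labels are hypothetical and serve only as an illustration.

```python
import math

def log_income_dispersion(incomes):
    """Population standard deviation of log incomes across countries."""
    logs = [math.log(x) for x in incomes]
    mean = sum(logs) / len(logs)
    return math.sqrt(sum((v - mean) ** 2 for v in logs) / len(logs))

# Hypothetical per-capita incomes of three countries in three periods
# (illustrative numbers only, not data from the paper).
panel = {
    "early": [1000.0, 1100.0, 1200.0],
    "middle": [1500.0, 3000.0, 6000.0],
    "late": [20000.0, 25000.0, 30000.0],
}

# Dispersion per period: divergence shows up as a rising value,
# convergence as a falling one.
dispersion = {period: log_income_dispersion(v) for period, v in panel.items()}
```

Working in logs makes the measure insensitive to the overall income level, so a period in which all countries grow at the same rate leaves the dispersion unchanged.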
Encompassing tests for value at risk and expected shortfall multi-step forecasts based on inference on the boundary
(2020) Schnaitmann, Julie; Liu, Xiaochun; Dimitriadis, Timo
We propose forecast encompassing tests for the Expected Shortfall (ES) jointly with the Value at Risk (VaR) based on flexible link (or combination) functions. Our setup allows testing encompassing for convex forecast combinations and for link functions which preclude crossings of the combined VaR and ES forecasts. As the tests based on these link functions involve parameters which are on the boundary of the parameter space under the null hypothesis, we derive and base our tests on nonstandard asymptotic theory on the boundary. Our simulation study shows that the encompassing tests based on our new link functions outperform tests based on unrestricted linear link functions for one-step and multi-step forecasts. We further illustrate the potential of the proposed tests in a real data analysis for forecasting VaR and ES of the S&P 500 index.
Forecasting DAX Volatility: A Comparison of Time Series Models and Implied Volatilities
(2016) Weiß, Harald; Wagenhals, Gerhard
This study provides a comprehensive comparison of different forecasting approaches for the German stock market. Additionally, this thesis presents an application of the MCS approach to evaluate DAX volatility forecasts based on high-frequency data. Furthermore, the effects of the 2008 financial crisis on the prediction of DAX volatility are analysed. The empirical analysis is based on data that contain all recorded transactions of DAX options and DAX futures traded on the EUREX from January 2002 to December 2009. The volatility prediction models employed in this study to forecast DAX volatility are selected based on the results of the general features of the forecasting models, and the analysis of the considered DAX time series. Within the class of time series models, the GARCH, the Exponential GARCH (EGARCH), the ARFIMA, and the Heterogeneous Autoregressive (HAR) model are chosen to fit the DAX return and realised volatility series. Additionally, the Britten-Jones and Neuberger (2000) approach is applied to produce DAX implied volatility forecasts because it is based on a broader information set than the BS model. Finally, the BS model is employed as a benchmark model in this study. As the empirical analysis in this study demonstrates that DAX volatility changes considerably over the long sample period, it investigates whether structural breaks induce long memory effects. The effects are separately analysed by performing different structural break tests for the prediction models. A discussion of the impact on the applied forecasting methodology, and how it is accounted for, is also presented. Based on the MCS approach, the DAX volatility forecasts are separately evaluated for the full sample and the subperiod that excludes the two most volatile months of the financial crisis. 
Because the objective of this work is to provide information to investment and risk managers regarding which forecasting method delivers superior DAX volatility forecasts, the volatilities are predicted for one day, two weeks, and one month. Finally, the evaluation results are compared with previous findings in the literature for each forecast horizon.
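As an illustration of one of the time series models named above, the following is a minimal sketch of a Heterogeneous Autoregressive (HAR) regression in its commonly used form: next-day realised volatility is regressed on the previous day's value and on its averages over the last 5 and 22 trading days. The window lengths, the function names and the plain least-squares fit are assumptions made for this sketch. The thesis may use a different specification or estimator.

```python
import numpy as np

def har_design(rv, weekly=5, monthly=22):
    """Build HAR regressors: intercept, lagged daily value, lagged weekly and monthly means."""
    rv = np.asarray(rv, dtype=float)
    rows, targets = [], []
    for t in range(monthly, len(rv)):
        rows.append([
            1.0,                        # intercept
            rv[t - 1],                  # previous day's realised volatility
            rv[t - weekly:t].mean(),    # mean over the last `weekly` days
            rv[t - monthly:t].mean(),   # mean over the last `monthly` days
        ])
        targets.append(rv[t])
    return np.array(rows), np.array(targets)

def fit_har(rv, weekly=5, monthly=22):
    """Estimate the HAR coefficients by ordinary least squares."""
    X, y = har_design(rv, weekly, monthly)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def forecast_next(rv, beta, weekly=5, monthly=22):
    """One-step-ahead forecast from the most recent observations."""
    rv = np.asarray(rv, dtype=float)
    x = np.array([1.0, rv[-1], rv[-weekly:].mean(), rv[-monthly:].mean()])
    return float(x @ beta)
```

Given a one-dimensional series `rv` of daily realised volatilities with more than 22 observations, `forecast_next(rv, fit_har(rv))` returns the one-day-ahead forecast. Longer horizons such as two weeks or one month would need either iterated forecasts or a regression on multi-day averages of future volatility.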
Genomic prediction in rye
(2017) Bernal-Vasquez, Angela-Maria; Piepho, Hans-Peter
Technical progress in the genomic field is accelerating developments in plant and animal breeding programs. The access to high-dimensional molecular data has facilitated acquisition of knowledge of genome sequences in many economically important species, which can be used routinely to predict genetic merit. Genomic prediction (GP) has emerged as an approach that allows predicting the genomic estimated breeding value (GEBV) of an unphenotyped individual based on its marker profile. The approach can considerably increase the genetic gain per unit time, as not all individuals need to be phenotyped. The accuracy of the predictions is influenced by several factors and requires proper statistical models able to overcome the problem of having more predictor variables than observations. Plant breeding programs run for several years and genotypes are evaluated in multi-environment trials. Selection decisions are based on the mean performance of genotypes across locations and, later on, across years. Under these conditions, linear mixed models offer a suitable and flexible framework to undertake the phenotypic and genomic prediction analyses using a stage-wise approach, allowing refinement of each particular stage. In this work, outlier detection methods, phenotypic analyses and GP models were evaluated and compared. In particular, it was studied whether identification and removal of possible outlying observations at the plot level has an impact on the predictive ability, whether an enhancement of phenotypic models by spatial trends leads to an improvement of GP accuracy, and finally whether the use of the kinship matrix can enhance the dissection of GEBVs from genotype-by-year (GY) interaction effects. Here, the methods related to the mentioned objectives are compared using experimental datasets from a rye hybrid breeding program.
Outlier detection methods widely used in many German plant breeding companies were assessed in terms of control of the family-wise error rate and their merits evaluated in a GP framework (Chapter 2). The benefit of implementing the methods based on a robust scale estimate was that, in routine analysis, such procedures reliably identified spurious data. This outlier detection approach per trial at the plot level is conservative and ensures that adjusted genotype means are not severely biased due to outlying observations. Whenever possible, breeders should manually flag suspicious observations based on subject-matter knowledge. Further, removing the flagged outliers identified by the recommended methods did not reduce predictive abilities estimated by cross validation (GP-CV) using data of a complete breeding cycle. A crucial step towards an accurate calibration of the genomic prediction procedure is the identification of phenotypic models capable of producing accurate adjusted genotype mean estimates across locations and years. Using a two-year dataset connected through a single check, a three-stage GP approach was implemented (Chapter 3). In the first stage, spatial and non-spatial models were fitted per location and year to obtain adjusted genotype-tester means. In the second stage, adjusted genotype means were obtained per year, and in the third stage, GP models were evaluated. Akaike information criterion (AIC) and predictive abilities estimated from GP-CV were used as model selection criteria in the first and in the third stage. These criteria were used in the first stage because a choice had to be made between the spatial and non-spatial models, and in the third stage because the predictive abilities allow a comparison of the results of the complete analysis obtained by the alternative stage-wise approaches presented in this thesis.
The second stage was a transitional stage where no model selection was needed for a given method of stage-wise analysis. The predictive abilities displayed a different ranking pattern for the models than the AIC, but both approaches pointed to the same best models. The highest predictive abilities obtained for the GP-CV at the last stage did not coincide with the models that AIC and predictive ability of GP-CV selected in the first stage. Nonetheless, GP-CV can be used to further support model selection decisions that are usually based only upon AIC. There was a trend for models accounting for row and column variation to have better accuracies than the counterpart models without row and column effects, thus suggesting that row-column designs may be a potential option to set up breeding trials. While bulking multi-year data allows increasing the training set size and covering a wider genetic background, it remains a challenge to separate GEBVs from GY effects when there are no common genotypes across years, i.e., when years are poorly connected or totally disconnected. First, in an approach considering the two-year dataset connected through a single check, adjusted genotype means were computed per year and submitted to the GP stage (Chapter 3). The year adjustment was done in the GP model by assuming that the mean across genotypes in a given year is a good estimate of the year effect. This assumption is valid because the genotypes evaluated in a year are a sample of the population. Results indicated that this approach is more realistic than relying on the adjustment of a single check. A further approach entailed the use of kinship to dissect GY effects from GEBVs (Chapter 4). It was not obvious which method best models the GY effect, thus several approaches were compared and evaluated in terms of predictive abilities in forward validation (GP-FV) scenarios.
It was found that for training sets formed by several disconnected years' data, the use of kinship to model GY effects was crucial. In training sets where two or three complete cycles were available (i.e., there were some common genotypes across years within a cycle), using kinship or not yielded similar predictive abilities. It was further shown that predictive abilities are higher for scenarios with a high degree of relatedness between training and validation sets, and that predicting a selection of top-yielding genotypes was more accurate than predicting the complete validation set when kinship was used to model GY effects. In conclusion, stage-wise analysis is recommended, and it is stressed that the careful choice of phenotypic and genomic prediction models should be made case by case based on subject-matter knowledge and specificities of the data. The analyses presented in this thesis provide general guidelines for breeders to develop phenotypic models integrated with GP. The methods and models described are flexible and allow extensions that can be easily implemented in routine applications.
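The problem of having more predictor variables (markers) than observations, mentioned in the abstract above, is commonly handled by shrinking all marker effects towards zero. The following is a minimal sketch using ridge regression on marker codes, which is a simplification and not the stage-wise linear mixed model analysis of the thesis. The function names and the fixed penalty `lam` are assumptions. In a mixed-model setting the penalty would correspond to the ratio of residual to marker-effect variance and would be estimated from the data.

```python
import numpy as np

def ridge_marker_effects(Z, y, lam):
    """Ridge estimates of marker effects, solved in the n x n dual form.

    Z   : (n individuals x p markers) matrix of marker codes, p may exceed n
    y   : adjusted genotype means of the n phenotyped individuals
    lam : ridge penalty (assumed known here)
    """
    Z = np.asarray(Z, dtype=float)
    y = np.asarray(y, dtype=float)
    mu = y.mean()
    col_means = Z.mean(axis=0)
    Zc = Z - col_means                      # centre marker codes
    n = Z.shape[0]
    # Dual solution: only an n x n system is solved, regardless of p.
    alpha = np.linalg.solve(Zc @ Zc.T + lam * np.eye(n), y - mu)
    effects = Zc.T @ alpha                  # back-transform to marker effects
    return mu, col_means, effects

def predict_genomic_values(Z_new, mu, col_means, effects):
    """Predicted values for individuals with marker data only."""
    return mu + (np.asarray(Z_new, dtype=float) - col_means) @ effects
```

Solving the system in its n-dimensional form keeps the computation cheap when thousands of markers are scored on a few hundred genotypes, and it gives the same marker effects as the usual p-dimensional ridge solution.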
Genomic selection in synthetic populations
(2017) Müller, Dominik; Melchinger, Albrecht E.
The foundation of genomic selection was laid at the beginning of this century. Since then, it has developed into a very active field of research. Although it was originally developed in dairy cattle breeding, it rapidly attracted the attention of the plant breeding community and has, by now (2017), developed into an integral component of the breeding armamentarium of international companies. Despite its practical success, there are numerous open questions that are highly important to plant breeders. The recent development of large-scale and cost-efficient genotyping platforms was the prerequisite for the rise of genomic selection. Its functional principle is based on information shared between individuals. Genetic similarities between individuals are assessed by the use of genomic fingerprints. These similarities provide information beyond mere family relationships and allow for pooling information from phenotypic data. In practice, first a training set of phenotyped individuals has to be established and is then used to calibrate a statistical model. The model is then used to derive predictions of the genomic values for individuals lacking phenotypic information. Using these predictions can save time by accelerating the breeding program and cost by reducing resources spent for phenotyping. A large body of literature has been devoted to investigating the accuracy of genomic selection for unphenotyped individuals. However, training individuals are themselves oftentimes selection candidates in plant breeding, and there is no conceptual obstacle to applying genomic selection to them, making use of information obtained via marker-based similarities. It is therefore also highly important to assess prediction accuracy and possibilities for its improvement in the training set. Our results demonstrated that it is possible to increase accuracy in the training set by shrinkage estimation of marker-based relationships to reduce the associated noise.
The success of this approach depends on the marker density and the population structure. The potential is largest for broad-based populations and under a low marker density. Synthetic populations are produced by intermating a small number of parental components, and they have played an important role in the history of plant breeding for improving germplasm pools through recurrent selection as well as for actual varieties and research on quantitative genetics. The properties of genomic selection have so far not been assessed in synthetics. Moreover, synthetics are an ideal population type to assess the relative importance of three factors by which markers provide information about the state of alleles at QTL, namely (i) pedigree relationships, (ii) co-segregation and (iii) LD in the source germplasm. Our results show that the number of parents is a crucial factor for prediction accuracy. For a very small number of parents, prediction accuracy in a single cycle is highest and mainly determined by co-segregation between markers and QTL, whereas prediction accuracy is reduced for a larger number of parents, where the main source of information is LD within the source germplasm of the parents. Across multiple selection cycles, information from pedigree relationships rapidly vanishes, while co-segregation and ancestral LD are a stable source of information. Long-term genetic gain of genomic selection in synthetics is relatively unaffected by the number of parents, because information from co-segregation and from ancestral LD compensate for each other. Altogether, our results provide an important contribution to a better understanding of the factors underlying genomic selection, of the cases in which it works, and of the information that contributes to prediction accuracy.
Prediction of hybrid performance in maize using molecular markers
(2008) Schrag, Tobias; Melchinger, Albrecht E.
Maize breeders develop a large number of inbred lines in each breeding cycle, but, owing to resource constraints, evaluate only a small proportion of all possible crosses among these lines in field trials. Therefore, predicting the performance of hybrids by utilising the data available from related crosses to identify untested but promising hybrids is extremely important. The objectives of this thesis research were to develop and evaluate methods for marker-based prediction of hybrid performance (HP) in unbalanced data as typically generated in commercial maize hybrid breeding programs. For HP prediction, a promising approach uses the sum of effects across quantitative trait loci (QTL) as predictor. However, comparison of this approach with established prediction methods based on general combining ability (GCA) was lacking. In addition, prediction of specific combining ability (SCA) is also possible with this approach, but was so far not used for HP prediction. The objectives of the first study in this thesis were to identify QTL for grain yield and grain dry matter content, combine GCA with marker-based SCA estimates for HP prediction, and compare marker-based prediction with established methods. Hybrids from four Dent × Flint factorial mating experiments were evaluated in field trials and their parental inbreds were genotyped with amplified fragment length polymorphism (AFLP) markers. Efficiency for prediction of hybrids, of which both parents were testcross evaluated (Type 2), was assessed by leave-one-out cross-validation. The established GCA-based method predicted HP better than the approach exclusively based on markers. However, with greater relevance of SCA, combining GCA with marker-based SCA estimates was superior compared with HP prediction based on GCA only. Linkage disequilibrium between markers was expected to reduce the prediction efficiency due to inflated QTL effects and reduced power. 
Thus, in the second study, multiple linear regression (MLR) with forward selection was employed for HP prediction. In addition, adjacent markers in strong linkage disequilibrium were combined into haplotype blocks. An approach based on total effects of associated markers (TEAM) was developed for multi-allelic haplotype blocks. Genome scans to search for significant QTL involve multiple testing of many markers, which increases the rate of false-positive associations. Thus, the TEAM approach was enhanced by controlling the false discovery rate. Considerable loss of marker information can be caused by a few missing observations if the prediction method depends on complete marker data. Therefore, the TEAM approach was improved to cope with missing marker observations. A modification of the cross-validation procedure reflected that often only a subset of parental lines is crossed with all lines from the opposite heterotic group in a factorial mating design. The prediction approaches were evaluated with the same field data as in the previous study. The results suggested that with haplotype blocks instead of original marker data, similar or higher efficiencies for HP prediction can be achieved. Marker-based HP prediction of inter-group crosses between lines that were marker genotyped but not testcross evaluated had not been investigated hitherto. Heterosis, which considerably contributes to maize grain yield, had so far not been incorporated into marker-based HP prediction. Combined analyses of field trials from multiple experiments of a breeding program provide valuable data for HP prediction. With a mixed linear model analysis of such unbalanced data from nine factorial mating experiments, best linear unbiased prediction (BLUP) values for HP, GCA, SCA, line per se performance, and heterosis of 400 hybrids were obtained in the third study.
The prediction efficiency was assessed in cross-validation for prediction of hybrids, of which none (Type 0) or one (Type 1) parental inbred was testcross evaluated. An extension of the established HP prediction method based on BLUP of GCA and SCA, but not using marker data, resulted in prediction efficiency intermediate for Type 1 and very low for Type 0 hybrids. Combining line per se with marker-based heterosis estimates (TEAM-LM) mostly resulted in the highest prediction efficiencies of grain yield and grain dry matter content for both Type 0 and Type 1 hybrids. For the heterotic trait grain yield, the highest prediction efficiencies were generally obtained with marker-based TEAM approaches. In conclusion, this thesis research provided methods for the marker-based prediction of HP. The experimental results suggested that marker-based HP prediction is an efficient tool which supports the selection of superior hybrids and has great potential to accelerate commercial hybrid breeding programs in a very cost-effective manner. The significance of marker-based HP prediction is further enhanced by recent advances in production of doubled haploid lines and high-throughput technologies for rapid and inexpensive marker assays.
Schätzung betrieblicher Kostenfunktionen mit künstlichen neuronalen Netzen
(2015) Simen, Jan-Philipp; Troßmann, Ernst
In this thesis a concept for estimating cost relationships with artificial neural networks is developed. The resulting open-source software application Cenobi (http://sourceforge.net/projects/cenobi/) is able to assess the impact of cost drivers on activity cost, plot non-linear cost functions, do forecasting and budgeting, calculate incremental cost, do unit costing, job costing etc., calculate cost driver rates and analyse cost variances. An object-oriented implementation of neural networks optimized by genetic algorithms provides the basis for these calculations.
Targeting the poor and smallholder farmers: empirical evidence from Malawi
(2009) Houssou, Nazaire; Zeller, Manfred
This paper develops low-cost, reasonably accurate, and simple models for improving the targeting efficiency of development policies in Malawi. Using a stepwise logistic regression (weighted) along with other techniques applied in credit scoring, the research identifies a set of easily observable and verifiable indicators for correctly predicting whether a household is poor or not, based on the 2004-05 Malawi Integrated Household Survey data. The predictive power of the models is assessed using out-of-sample validation tests and receiver operating characteristic curves, whereas the models' robustness is evaluated by bootstrap simulation methods. Finally, sensitivity analyses are performed using the international and extreme poverty lines. The models developed have proven their validity in an independent sample derived from the same population. Findings suggest that the rural model calibrated to the national poverty line correctly predicts the status of about 69% of poor households when applied to an independent subset of surveyed households, whereas the urban model correctly identifies 64% of poor households. Increasing the poverty line improves the models' targeting performance, while reducing the poverty line does the opposite. In terms of robustness, the rural model yields a more robust result, with a prediction margin of ±10 percentage points, compared to the urban model. While the best indicator sets can potentially yield a sizable impact on poverty if used in combination with a direct transfer program, some non-poor households would also be targeted as a result of the models' leakage. One major feature of the models is that a household's score can be easily and quickly computed in the field. Overall, the models developed can be potential policy tools for Malawi.
Testing forecast rationality for measures of central tendency
(2020) Schmidt, Patrick W.; Patton, Andrew J.; Dimitriadis, Timo
Rational respondents to economic surveys may report as a point forecast any measure of the central tendency of their (possibly latent) predictive distribution, for example the mean, median, mode, or any convex combination thereof. We propose tests of forecast rationality when the measure of central tendency used by the respondent is unknown. We overcome an identification problem that arises when the measures of central tendency are equal or in a local neighborhood of each other, as is the case for (exactly or nearly) symmetric distributions. As a building block, we also present novel tests for the rationality of mode forecasts. We apply our tests to survey forecasts of individual income, Greenbook forecasts of U.S. GDP, and random walk forecasts for exchange rates. We find that the Greenbook and random walk forecasts are best rationalized as mean, or near-mean, forecasts, while the income survey forecasts are best rationalized as mode forecasts.
The camp view of inflation forecasts
(2009) Schmid, Kai Daniel; Sauter, Oliver; Geiger, Felix
Analyzing sample moments of survey forecasts, we derive disagreement and uncertainty measures for the short- and medium-term inflation outlook. The latter provide insights into the development of inflation forecast uncertainty in the context of a changing macroeconomic environment since the beginning of 2008. Motivated by the debate on the role of monetary aggregates and cyclical variables describing a Phillips-curve logic, we develop a macroeconomic indicator spread which is assumed to drive forecasters' judgments. Empirical evidence suggests procyclical dynamics between disagreement among forecasters, individual forecast uncertainty and the macro-spread. We call this approach the camp view of inflation forecasts and show that camps form up whenever the spread widens.