\newpage
The methods published in this thesis were validated with simulated data and with data sampled from reference genome sequences. Nevertheless, their usefulness must also be demonstrated on a variety of real metagenomes. Subsequently, taxator-tk was applied in two metagenome studies in completely different settings. For the publication by @BulgarelliStructure2015, taxonomic profiles were generated for metagenome contigs to study complex microbial communities associated with plant roots (rhizosphere). These taxonomic profiles were shown to be consistent with profiles based on independent 16S amplicon sequencing of the same communities. Furthermore, taxator-tk discovered members of clades, for instance Archaea and Cyanobacteria, which the 16S primers seemed to have missed in the amplification step. Such biases of the primers used to amplify regions of the 16S gene were also confirmed independently [@EloefadroshMetagenomics2016]. Unlike the amplicon profiles, the taxonomic profiles based on shotgun metagenome data were also not influenced by variation in 16S copy number among the corresponding genomes.

In a second study of a benzene-degrading enrichment community [@DongReconstructing2017], taxator-tk was used to derive bin-specific sequence data for training a full model of the composition-based classifier PhyloPythiaS [@PatilTaxonomic2011]. This allowed the genomes of four species to be recovered, two of them with over 97% completeness. Here, we used the same approach as the program PhyloPythiaS+ [@GregorPhylopythias2016] to define the model and to seed the genome bins with training data, but we replaced its marker-gene-based homology search with taxator-tk, which offered better coverage of the genomic reference sequences for this task.
The completeness and potential contamination of the derived genomes were checked independently based on single-copy marker genes. The near-complete genomes were then used to study benzene degradation pathways by linking them to metabolomic experiments and to propose a benzene oxidation pathway with direct sulfate reduction.
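
Estimates of this kind rest on the expectation that each single-copy marker gene occurs exactly once per genome: missing markers lower the completeness, and surplus copies suggest contamination by other genomes. The following Python sketch illustrates this general idea under simplified, CheckM-style definitions. These definitions are an assumption and not necessarily the exact procedure used in the cited study. The function name `completeness_contamination` and the marker names are hypothetical.

```python
from collections import Counter


def completeness_contamination(marker_hits, expected_markers):
    """Estimate bin completeness and contamination in percent.

    Simplified, CheckM-like definitions (an assumption for illustration):
      completeness  = expected markers found at least once / expected markers
      contamination = copies beyond the first, summed / expected markers
    """
    expected = set(expected_markers)
    if not expected:
        raise ValueError("at least one expected marker is required")
    # Count occurrences of each expected single-copy marker in the bin.
    counts = Counter(m for m in marker_hits if m in expected)
    found = sum(1 for m in expected if counts[m] > 0)
    # Every copy beyond the first hints at sequence from another genome.
    surplus = sum(c - 1 for c in counts.values() if c > 1)
    n = len(expected)
    return 100.0 * found / n, 100.0 * surplus / n


# Hypothetical marker set and marker hits for one genome bin.
markers = ["m1", "m2", "m3", "m4", "m5", "m6", "m7", "m8", "m9", "m10"]
hits = ["m1", "m2", "m3", "m4", "m5", "m6", "m7", "m8", "m9", "m2"]
comp, cont = completeness_contamination(hits, markers)
print(f"completeness={comp:.1f}% contamination={cont:.1f}%")
```

In this toy bin, marker `m10` is missing and `m2` occurs twice, so the sketch prints `completeness=90.0% contamination=10.0%`. Under this simplified definition, contamination can exceed 100% when many markers are present in multiple copies.
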
When working with metagenomic data and comparing the results of different binning programs, for instance in [@DrogeTaxatortk2014], we observed that the current metagenome analysis toolbox contains many programs that address similar problems but give different results. One possible explanation is that metagenomics is an interdisciplinary field with contributions from biotechnology, ecology and medicine, each with a different focus on ecosystems and data (see @fig:metagenomes_environments). As a result, metagenomics lacks a systematic, cross-disciplinary view of software for data processing and analysis. To improve this situation, the Critical Assessment of Metagenomic Interpretation (CAMI) challenge (http://cami-challenge.org) compared computer programs for metagenome analysis tasks such as metagenome assembly, taxonomic profiling and genome binning. As part of my thesis work, I contributed both by taking part in the conception and implementation of the binning evaluation framework and by submitting taxator-tk for comparison [@SczyrbaCritical2017].