×

Analysing the scalability of multiobjective evolutionary algorithms when solving the motif discovery problem. (English) Zbl 1315.90044

Summary: In this paper we analyse the scalability of seven multiobjective evolutionary algorithms when they solve large instances of a known biological problem, the motif discovery problem (MDP). The selected algorithms are a population-based and a trajectory-based algorithms (DEPT and MO-VNS, respectively), three swarm intelligence algorithms (MOABC, MO-FA, and MO-GSA), a genetic algorithm (NSGA-II), and SPEA2. The MDP is one of the most important sequence analysis problems related to discover common patterns, motifs, in DNA sequences. A motif is a nucleic acid sequence pattern that has some biological significance as being DNA binding sites for a regulatory protein, i.e., a transcription factor (TF). A biologically relevant motif must have a certain length, be found in many sequences, and present a high similarity among the subsequences which compose it. These three goals are in conflict with each other, therefore a multiobjective approach is a good way of facing the MDP. In addition, in recent years, scientists are decoding genomes of many organisms, increasing the computational workload of the algorithms. Therefore, we need algorithms that are able to deal with these new large DNA instances. The obtained experimental results suggest that MOABC and MO-FA are the algorithms with the best scalability behaviours.

MSC:

90C29 Multi-objective and goal programming
92D10 Genetics and epigenetics
92D20 Protein sequences, DNA sequences

Software:

SPEA2; GSA; ABC
PDFBibTeX XMLCite
Full Text: DOI

References:

[1] Ao, W., Gaudet, J., Kent, W.J., Muttumu, S., Mango, S.E.: Environmentally induced foregut remodeling by PHA-4/FoxA and DAF-12/NHR. Science 305(5691), 1743-1746 (2004) · doi:10.1126/science.1102216
[2] Bailey, T.L., Elkan, C.: Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Mach. Learn. 21(1-2), 51-80 (1995)
[3] Che, D., Song, Y., Rashedd, K.: MDGA: motif discovery using a genetic algorithm. In: Proceedings of the 2005 Conference on Genetic and, Evolutionary Computation (GECCO’05), pp. 447-452 (2005)
[4] Congdon, C.B., Fizer, C.W., Smith, N.W., Gaskins, H.R., Aman, J., Nava, G.M., Mattingly, C.: Preliminary results for GAMI: a genetic algorithms approach to motif inference. In: Proceedings of the 2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB’05), pp. 97-104 (2005)
[5] Deb, K.: Multi-Objective Optimization Using Evolutionary Algorithms. Wiley, New York (2001) · Zbl 0970.90091
[6] Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 6, 182-197 (2002) · doi:10.1109/4235.996017
[7] D’haeseleer, P.: What are DNA sequence motifs? Nat. Biotechnol. 24(4), 423-425 (2006) · doi:10.1038/nbt0406-423
[8] Eskin, E., Pevzner, P.A.: Finding composite regulatory patterns in DNA sequences. Bioinformatics 18(Suppl 1), S354-S363 (2002) · doi:10.1093/bioinformatics/18.suppl_1.S354
[9] Favorov, A.V., Gelfand, M.S., Gerasimova, A.V., Ravcheev, D.A., Mironov, A.A., Makeev, V.J.: A Gibbs sampler for identification of symmetrically structured, spaced DNA motifs with improved estimation of the signal length. Bioinformatics 21(10), 2240-2245 (2005) · doi:10.1093/bioinformatics/bti336
[10] Fogel, G.B., Porto, V.W., Varga, G., Dow, E.R., Craven, A.M., Powers, D.M., Harlow, H.B., Su, E.W., Onyia, J.E., Su, C.: Evolutionary computation for discovery of composite transcription factor binding sites. Nucleic Acids Res. 36(21), e142 (2008) · doi:10.1093/nar/gkn738
[11] Fogel, G.B., Weekes, D.G., Varga, G., Dow, E.R., Harlow, H.B., Onyia, J.E., Su, C.: Discovery of sequence motifs related to coexpression of genes using evolutionary computation. Nucleic Acids Res. 32(13), 3826-3835 (2004) · doi:10.1093/nar/gkh713
[12] Frith, M.C., Hansen, U., Spouge, J.L., Weng, Z.: Finding functional sequence elements by multiple local alignment. Nucleic Acids Res. 32(1), 189-200 (2004) · doi:10.1093/nar/gkh169
[13] González-Álvarez, D.L., Vega-Rodríguez, M.A., Gómez-Pulido, J.A., Sánchez-Pérez, J.M.: A multiobjective variable neighborhood search for solving the motif discovery problem. Int. Workshop Soft Comput. Models Ind. Appl. (SOCO’10) 73, 39-46 (2010)
[14] González-Álvarez, D.L., Vega-Rodríguez, M.A., Gómez-Pulido, J.A., Sánchez-Pérez, J.M.: Solving the motif discovery problem by using differential evolution with pareto tournaments. In: Proceedings of the 2010 IEEE Congress on, Evolutionary Computation (CEC’10), pp. 4140-4147 (2010)
[15] González-Álvarez, D.L., Vega-Rodríguez, M.A., Gómez-Pulido, J.A., Sánchez-Pérez, J.M.: Applying a multiobjective gravitational search algorithm (MO-GSA) to discover motifs. International Work Conference on Artificial Neural Networks (IWANN’11), LNCS 6692/2011, 372-379 (2011)
[16] González-Álvarez, D.L., Vega-Rodríguez, M.A., Gómez-Pulido, J.A., Sánchez-Pérez, J.M.: Finding motifs in DNA sequences applying a multiobjective artificial bee colony (MOABC) algorithm. In: Proceedings of 8th European Conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics (EVOBIO’11), LNCS 6623/2011, pp. 89-100 (2011)
[17] González-Álvarez, D.L., Vega-Rodríguez, M.A., Gómez-Pulido, J.A., Sánchez-Pérez, J.M.: Predicting DNA motifs by using evolutionary multiobjective optimization. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 42(6), 913-925 (2011) · doi:10.1109/TSMCC.2011.2172939
[18] González-Álvarez, D.L., Vega-Rodríguez, M.A., Gómez-Pulido, J.A., Sánchez-Pérez, J.M.: Comparing multiobjective swarm intelligence metaheuristics for DNA motif discovery. Eng. Appl. Artif. Intell. 26(1), 314-326 (2012) · doi:10.1016/j.engappai.2012.06.014
[19] Hertz, G.Z., Stormo, G.D.: Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics 15(7-8), 563-577 (1999) · doi:10.1093/bioinformatics/15.7.563
[20] Karaboga, D.: An idea based on honey bee swarm for numerical optimization. Technical report-tr06, Erciyes University, Turkey (2005)
[21] Karaboga, D., Basturk, B.: On the performance of artificial bee colony (ABC) algorithm. Appl. Soft Comput. J. 8(1), 687-697 (2008) · doi:10.1016/j.asoc.2007.05.007
[22] Kaya, M.: MOGAMOD: multi-objective genetic algorithm for motif discovery. Expert Syst. Appl. 36(2), 1039-1047 (2009) · doi:10.1016/j.eswa.2007.11.008
[23] Liu, F.F.M., Tsai, J.J.P., Chen, R.M., Chen, S.N., Shih, S.H.: FMGA: finding motifs by genetic algorithm. Fourth IEEE Symposium on Bioinformatics and, Bioengineering (BIBE’04), pp. 459-466 (2004)
[24] Lones, M.A., Tyrrell, A.M.: Regulatory motif discovery using a population clustering evolutionary algorithm. IEEE/ACM Trans. Comput. Biol. Bioinform. 4(3), 403-414 (2007) · doi:10.1109/tcbb.2007.1044
[25] Mladenovic, N., Hansen, P.: Variable neighborhood search. Comput. Operat. Res. 24, 1097-1100 (1997) · Zbl 0889.90119 · doi:10.1016/S0305-0548(97)00031-2
[26] Paul, T.K., Iba, H.: Identification of weak motifs in multiple biological sequences using genetic algorithm. In: Proceedings of the 2006 Conference on Genetic and, Evolutionary Computation (GECCO’06), pp. 271-278 (2006)
[27] Pavesi, G., Mauri, G., Pesole, G.: An algorithm for finding signals of unknown length in DNA sequences. Bioinformatics 17(Suppl 1), S207-S214 (2001) · doi:10.1093/bioinformatics/17.suppl_1.S207
[28] Rashedi, E., Nezamabadi-pour, H., Saryazdi, S.: GSA: a gravitational search algorithm. Inf. Sci. 179(13), 2232-2248 (2009) · Zbl 1177.90378 · doi:10.1016/j.ins.2009.03.004
[29] Regnier, M., Denise, A.: Rare events and conditional events on random strings. Discrete Math. Theor. Comput. Sci. 6, 191-214 (2004) · Zbl 1059.05008
[30] Roth, F.P., Hughes, J.D., Estep, P.W., Church, G.M.: Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation. Nat. Biotechnol. 16(10), 939-945 (1998) · doi:10.1038/nbt1098-939
[31] Shao, L., Chen, Y.: Bacterial foraging optimization algorithm integrating tabu search for motif discovery. IEEE International Conference on Bioinformatics and, Biomedicine (BIBM’09), pp. 415-418 (2009)
[32] Shao, L., Chen, Y., Abraham, A.: Motif discovery using evolutionary algorithms. International Conference of Soft Computing and, Pattern Recognition (SOCPAR’09), pp. 420-425 (2009)
[33] Sheskin, D.J.: Handbook of Parametric and Nonparametric Statistical Procedures, 4th edn. Chapman & Hall/CRC Press, New York (2007) · Zbl 1118.62001
[34] Sinha, S., Tompa, M.: YMF: a program for discovery of novel transcription factor binding sites by statistical overrepresentation. Nucleic Acids Res. 31(13), 3586-3588 (2003) · doi:10.1093/nar/gkg618
[35] Srinivas, N., Deb, K.: Multi-objective function optimization using non-dominated sorting genetic algorithms. Evol. Comput. 2(3), 221-248 (1995) · doi:10.1162/evco.1994.2.3.221
[36] Stine, M., Dasgupta, D., Mukatira, S.: Motif discovery in upstream sequences of coordinately expressed genes. 2003 Congr. Evol. Comput. (CEC’03) 3, 1596-1603 (2003) · doi:10.1109/CEC.2003.1299863
[37] Storn, R., Price, K.: Differential evolution—a simple and efficient heuristic for global optimization over continuous spaces. J. Glob. Optim. 11(4), 341-359 (1997) · Zbl 0888.90135 · doi:10.1023/A:1008202821328
[38] Thijs, G., Lescot, M., Marchal, K., Rombauts, S., De Moor, B., Rouzé, P., Moreau, Y.: A higher-order background model improves the detection of promoter regulatory elements by Gibbs sampling. Bioinformatics 17(12), 1113-1122 (2001) · doi:10.1093/bioinformatics/17.12.1113
[39] Tompa, M., Li, N., Bailey, T.L., Church, G.M., De Moor, B., Eskin, E., Favorov, A.V., Frith, M.C., Fu, Y., Kent, W.J., Makeev, V.J., Mironov, A.A., Noble, W.S., Pavesi, G., Pesole, G., Regnier, M., Simonis, N., Sinha, S., Thijs, G., van Helden, J., Vandenbogaert, M., Weng, Z., Workman, C., Ye, C., Zhu, Z.: Assessing computational tools for the discovery of transcription factor binding sites. Nat. Biotechnol. 23(1), 137-144 (2005) · doi:10.1038/nbt1053
[40] van Helden, J., Andre, B., Collado-Vides, J.: Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. J. Mol. Biol. 281(5), 827-842 (1998) · doi:10.1006/jmbi.1998.1947
[41] van Helden, J., Rios, A.F., Collado-Vides, J.: Discovering regulatory elements in non-coding sequences by analysis of spaced dyads. Nucleic Acids Res. 28(8), 1808-1818 (2000) · doi:10.1093/nar/28.8.1808
[42] Weicker, N., Szabo, G., Weicker, K., Widmayer, P.: Evolutionay multiobjective optimization for base station transmitter placement with frequency assignment. IEEE Trans. Evol. Comput. 7(2), 189-203 (2003) · doi:10.1109/TEVC.2003.810760
[43] Workman, C.T., Stormo, G.D.: ANN-Spec: a method for discovering transcription factor binding sites with improved specificity. Pacific Symposium on Biocomputing, pp. 467-478 (2000)
[44] Yang, X.S.: Firefly algorithms for multimodal optimization. 5th International Symposium of Stochastic Algorithms: Foundations and Applications (SAGA’09), LNCS 5792, pp. 169-178 (2009) · Zbl 1260.90164
[45] Zitzler, E., Deb, K., Thiele, L.: Comparison of multiobjective evolutionary algorithms: empirical results. Evol. Comput. 8(2), 173-195 (2000) · doi:10.1162/106365600568202
[46] Zitzler, E., Laumanns, M., Thiele, L.: SPEA2: improving the strength pareto evolutionary algorithm. Technical report tik-report 103, Swiss Federal Institute of Technology Zurich, Switzeland (2001)
[47] Zitzler, E., Thiele, L.: Multiobjective evolutionary algorithms: a comparative case study and the strength pareto approach. IEEE Trans. Evol. Comput. 3(4), 257-271 (1999) · doi:10.1109/4235.797969
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.