Feature selection for consistent biclustering via fractional 0-1 programming.

*(English)*Zbl 1123.90073Summary: Biclustering consists in simultaneous partitioning of the set of samples and the set of their attributes (features) into subsets (classes). Samples and features classified together are supposed to have a high relevance to each other which can be observed by intensity of their expressions. We define the notion of consistency for biclustering using interrelation between centroids of sample and feature classes. We prove that consistent biclustering implies separability of the classes by convex cones. While previous works on biclustering concentrated on unsupervised learning and did not consider employing a training set, whose classification is given, we propose a model for supervised biclustering, whose consistency is achieved by feature selection. The developed model involves solution of a fractional 0-1 programming problem. Preliminary computational results on microarray data mining problems are reported.

PDF
BibTeX
XML
Cite

\textit{S. Busygin} et al., J. Comb. Optim. 10, No. 1, 7--21 (2005; Zbl 1123.90073)

Full Text:
DOI

##### References:

[1] | A. Ben-Dor, L. Bruhn, I. Nachman, M. Schummer, and Z. Yakhini, ”Tissue classification with gene expression profiles,” Journal of Computational Biology, vol. 7, pp. 559–584, 2000. |

[2] | A. Ben-Dor, N. Friedman, and Z. Yakhini, ”Class discovery in gene expression data,” in Proc. Fifth Annual Inter. Conf. on Computational Molecular Biology (RECOMB), 2001. |

[3] | S. Busygin, G. Jacobsen, and E. Krámer, ”Double Conjugated Clustering Applied to Leukemia Microarray Data,” SDM 2002 Workshop on Clustering High Dimensional Data and its Applications, 2002. |

[4] | Y. Cheng and G.M. Church, ”Biclustering of Expression Data,” in: Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology, 2000, pp. 93–103. |

[5] | I.S. Dhillon, ”Co-Clustering Documents and Words Using Bipartite Spectral Graph Partitioning,” in: Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining(KDD), August 26–29, 2001, San Francisco, CA. |

[6] | P. Hansen, M. Poggi de Aragão, and C.C. Ribeiro, ”Hyperbolic 0–1 programming and query optimization in information retrieval,” Math. Program., vol. 52, pp. 256–263, 1991. · Zbl 0737.90044 |

[7] | S. Hashizume, M. Fukushima, N. Katoh, and T. Ibaraki, ”Approximation algortihms for combinatorial fractional programming problems,” Mathematical Programming, vol. 37, pp. 255–267. · Zbl 0616.90078 |

[8] | L.-L. Hsiao, F. Dangond, T. Yoshida, R. Hong, R.V. Jensen, J. Misra, W. Dillon, K.F. Lee, KE. Clark, P. Haverty, Z. Weng, G. Mutter, M.P. Frosch, M.E. MacDonald, E.L. Milford, C.P. Crum, R. Bueno, R.E. Pratt, M. Mahadevappa, J.A. Warrington, G. Stephanopoulos, G. Stephanopoulos, and S.R. Gullans, ”A Compendium of Gene Expression in Normal Human Tissues,” Physiol. Genomics, vol. 7, pp. 97–104, 2001. |

[9] | T.R. Golub, D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P. Mesirov, H. Coller, M.L. Loh, J.R. Downing, M.A. Caligiuri, C.D. Bloomfield, and E.S. Lander, ”Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring,” Science, vol. 286, pp. 531–537, 1999. |

[10] | Y. Kluger, R. Basri, J.T. Chang, and M. Gerstein, ”Spectral biclustering of microarray data: Coclustering genes and conditions,” Genome Res, vol. 13, pp. 703–716, 2003. |

[11] | J.-C. Picard and M. Queyranne, ”A network flow solution to some nonlinear 0–1 programming problems, with applications to graph theory,” Networks, vol. 12, pp. 141–159, 1982. · Zbl 0485.90081 |

[12] | O.A. Prokopyev, H.-X. Huang, and P.M. Pardalos, ”On complexity of unconstrained hyperbolic 0–1 programming problems,” Oper. Res. Lett., vol. 33, pp. 312–318, 2005a. · Zbl 1140.90469 |

[13] | O.A. Prokopyev, C. Meneses, C.A.S. Oliveira, and P.M. Pardalos, ”On Multiple-Ratio Hyperbolic 0–1 Programming Problems,” to appear in Pacific Journal of Optimization, 2005b. · Zbl 1105.90094 |

[14] | S. Saipe, ”Solving a (0,1) hyperbolic program by branch and bound,” Naval Res. Logist. Quarterly, vol. 22, pp. 497–515, 1975. · Zbl 0335.90042 |

[15] | M. Tawarmalani, S. Ahmed, and N. Sahinidis, ”Global optimization of 0–1 Hyperbolic Programs,” J. Global Optim., vol. 24, pp. 385–416, 2002. · Zbl 1046.90054 |

[16] | J. Weston, S. Mukherjee, O. Chapelle, M. Pontil, T. Poggio, and V. Vapnik, Feature selection for SVMs. NIPS, 2001. |

[17] | T.-H. Wu, ”A note on a global approach for general 0–1 fractional programming,” European J. Oper. Res., vol. 101, pp. 220–223, 1997. · Zbl 0916.90252 |

[18] | E.P. Xing and R.M. Karp ”CLIFF: Clustering of high-dimensional microarray data via iterative feature filtering using normalized cuts,” Bioinformatics Discovery Note, vol. 1, pp. 1–9, 2001. |

[19] | CAMDA 2001 Conference. http://bioinformatics.duke.edu/camda/camda01/. |

[20] | HuGE Index.org Website. http://www.hugeindex.org. |

[21] | ILOG Inc. CPLEX 9.0 User’s Manual, 2004. |

This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. It attempts to reflect the references listed in the original paper as accurately as possible without claiming the completeness or perfect precision of the matching.