Using error decay prediction to overcome practical issues of deep active learning for named entity recognition.

*(English)* Zbl 07289237

Summary: Existing deep active learning algorithms achieve impressive sampling efficiency on natural language processing tasks. However, they exhibit several weaknesses in practice, including (a) inability to use uncertainty sampling with black-box models, (b) lack of robustness to labeling noise, and (c) lack of transparency. In response, we propose a transparent batch active sampling framework by estimating the error decay curves of multiple feature-defined subsets of the data. Experiments on four named entity recognition (NER) tasks demonstrate that the proposed methods significantly outperform diversification-based methods for black-box NER taggers, and can make the sampling process more robust to labeling noise when combined with uncertainty-based methods. Furthermore, the analysis of experimental results sheds light on the weaknesses of different active sampling strategies, and when traditional uncertainty-based or diversification-based methods can be expected to work well.

##### MSC:

68T05 | Learning and adaptive systems in artificial intelligence

##### Keywords:

active learning; transparency; robustness to labeling noise; black-box models; clustering; named entity recognition

*H.-S. Chang* et al., Mach. Learn. 109, No. 9–10, 1749–1778 (2020; Zbl 07289237)

