PPDB

swMATH ID: 33749
Software Authors: J. Ganitkevitch, B. Van Durme, C. Callison-Burch
Description: PPDB: The Paraphrase Database. We present the 1.0 release of our paraphrase database, PPDB. Its English portion, PPDB:Eng, contains over 220 million paraphrase pairs, consisting of 73 million phrasal and 8 million lexical paraphrases, as well as 140 million paraphrase patterns, which capture many meaning-preserving syntactic transformations. The paraphrases are extracted from bilingual parallel corpora totaling over 100 million sentence pairs and over 2 billion English words. We also release PPDB:Spa, a collection of 196 million Spanish paraphrases. Each paraphrase pair in PPDB contains a set of associated scores, including paraphrase probabilities derived from the bitext data and a variety of monolingual distributional similarity scores computed from the Google n-grams and the Annotated Gigaword corpus. Our release includes pruning tools that allow users to determine their own precision/recall tradeoff.
Homepage: https://www.cis.upenn.edu/~ccb/ppdb/pdf/ppdb-naacl-2013.pdf
Related Software: LEXenstein; BERT; LSBert; Meteor; BabelNet; DKPro Similarity; word2vec; YAP3; WordNet
Referenced in: 1 Publication

Referenced in 1 Serial

1 Artificial Intelligence

Referenced in 1 Field

1 Computer science (68-XX)
