Sally

swMATH ID: 8470
Software Authors: Rieck, Konrad; Wressnegger, Christian; Bikadorov, Alexander
Description: Sally: a tool for embedding strings in vector spaces. Strings and sequences are ubiquitous in many areas of data analysis. However, only a few learning methods can be applied directly to this form of data. We present Sally, a tool for embedding strings in vector spaces that allows a wide range of learning methods to be applied to string data. Sally implements a generalized form of the bag-of-words model, where strings are mapped to a vector space that is spanned by a set of string features, such as words or n-grams of words. The implementation of Sally builds on efficient string algorithms and enables processing millions of strings and features. The tool supports several data formats and is capable of interfacing with common learning environments, such as Weka, Shogun, Matlab, or Pylab. Sally has been successfully applied to learning with natural language text, DNA sequences, and monitored program behavior.
Homepage: http://dl.acm.org/citation.cfm?id=2503345
Keywords: string embedding; bag-of-words models; learning with sequential data
Related Software: PyLab; WEKA; SHOGUN; Matlab; hglm; KeBABS; Harry
Referenced in: 1 Publication
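
The generalized bag-of-words embedding mentioned in the description can be illustrated with a minimal Python sketch. This is not Sally's code or interface: the function names (`ngrams`, `embed`, `dot`) and the parameters (`n`, `words`, `normalize`) are hypothetical and chosen only for illustration. Each string is mapped to a sparse vector whose dimensions are the n-grams it contains, with occurrence counts as values and optional L2 normalization.

```python
from collections import Counter
import math


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token sequence as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def embed(text, n=2, words=False, normalize=True):
    """Map a string to a sparse vector (dict) indexed by its n-grams.

    Hypothetical helper, not part of Sally. With words=False the
    features are character n-grams; with words=True they are
    n-grams of whitespace-separated words.
    """
    tokens = text.split() if words else list(text)
    counts = Counter(ngrams(tokens, n))
    if not normalize or not counts:
        return dict(counts)
    # Scale the count vector to unit Euclidean length.
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {gram: c / norm for gram, c in counts.items()}


def dot(u, v):
    """Inner product of two sparse vectors stored as dicts."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller vector
    return sum(w * v.get(gram, 0.0) for gram, w in u.items())


if __name__ == "__main__":
    a = embed("the quick brown fox", n=2, words=True)
    b = embed("the quick red fox", n=2, words=True)
    print(dot(a, b))  # similarity of the two normalized vectors
```

Storing the vectors as dictionaries keeps only the features that actually occur in a string, which is one simple way to cope with the very large feature spaces that n-grams induce.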

Standard Articles

1 Publication describing the Software
Sally: a tool for embedding strings in vector spaces. Rieck, Konrad; Wressnegger, Christian; Bikadorov, Alexander (2012)

Referenced in 1 Field

1 Computer science (68-XX)
