Releases · finalfusion/finalfusion-python

05 Jun 07:30

sebpuetz

0.7.1

be3bf9b

Latest

This release marks a major change to finalfusion-python: the entire package has been rewritten in Python and is no longer a wrapper around finalfusion-rust.

The API is now almost on par with finalfusion-rust and in some places even goes beyond that.

Vocab, Storage, Metadata and Norms are now accessible as properties on Embeddings
Any of the chunks above can be loaded by themselves from a finalfusion file
All chunks can be constructed from within Python
- It's possible to add, remove or change embeddings
Storage types integrate directly with numpy arrays
Reading from and writing to all common embedding formats (word2vec, GloVe, fastText) is supported
The API for vocabularies and subword indexers has been made more ergonomic:
- vocab words and the word -> index mapping are accessible as properties
- SubwordVocabs expose the subword indexer through vocab.subword_indexer
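
To illustrate what "words and the word -> index mapping are accessible as properties" can look like, here is a minimal standalone sketch. `ToyVocab` and its property names `words` and `word_index` are hypothetical and chosen for illustration; this is not finalfusion's own class, and the library's actual names and behavior should be checked in its documentation.

```python
from typing import Dict, List


class ToyVocab:
    """Hypothetical stand-in for a vocabulary chunk (not finalfusion's class)."""

    def __init__(self, words: List[str]) -> None:
        if len(set(words)) != len(words):
            raise ValueError("words must be unique")
        self._words = list(words)
        # Build the word -> index mapping once, at construction time.
        self._word_index = {word: idx for idx, word in enumerate(self._words)}

    @property
    def words(self) -> List[str]:
        """Words in index order."""
        return self._words

    @property
    def word_index(self) -> Dict[str, int]:
        """Mapping from each word to its row index."""
        return self._word_index

    def __len__(self) -> int:
        return len(self._words)


vocab = ToyVocab(["the", "cat", "sat"])
second_word = vocab.words[1]          # look up a word by index
sat_index = vocab.word_index["sat"]   # look up an index by word
```

Exposing both directions as plain properties means callers can use ordinary list and dict operations instead of dedicated lookup methods.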

In addition to the overhauled API, finalfusion-python now comes with executables:

ffp-convert to convert between embedding formats
ffp-similar and ffp-analogy for similarity and analogy queries
ffp-bucket-to-explicit to convert from bucket subword to explicit subword embeddings

Check out the documentation at https://finalfusion-python.readthedocs.io for more information!


08 Mar 12:47

danieldk

0.6.2

e5f538e

0.6.2

Bump version to 0.6.2


18 Nov 11:49

danieldk

0.6.1

0a588a2

0.6.1

Bump the version to 0.6.1


15 Nov 18:29

danieldk

0.6.0

63f4592

0.6.0

Bump the version to 0.6.0


10 Sep 09:00

danieldk

0.5.0

d58d0eb

Support for fastText, word2vec, and text embeddings

The largest change in this release is support for reading fastText, word2vec, and text embeddings, in addition to finalfusion embeddings.

Add support for reading fastText (Embeddings.read_fasttext()), text (Embeddings.read_text()), textdims (Embeddings.read_text_dims()), and word2vec (Embeddings.read_word2vec()) formats.
Each of these newly-supported formats provides a keyword argument lossy. If set, the embeddings will be read lossily, permitting invalid UTF-8 in words.
Add the embedding_similarity method, which looks up words that are similar to a given embedding. The method for traditional word-based lookups has been renamed from similarity to word_similarity.
Iteration over embeddings returned tuples (word, embedding) in previous releases. Now instances of the Embedding class are returned, which provide word, embedding, and norm properties. norm is the l2 norm that the embedding had before it was normalized to unit length.
Add support for memory mapping quantized embedding matrices.
Add the ngram_indices and subword_indices methods to the Vocab class. These methods return the subword indices for a given word, which can be used to retrieve the subword embeddings individually. The ngram_indices method returns each subword with its index, whereas subword_indices only returns the indices.
Update to pyo3 0.8.
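
Three of the ideas above (lossy reading, the norm stored alongside a normalized embedding, and n-grams paired with indices) can be sketched in plain Python. This is a conceptual illustration only, not the library's implementation: all function names are hypothetical, the `<`/`>` word brackets follow the fastText convention as an assumption, and `zlib.crc32` is a stand-in for whatever hash function the real subword indexer uses.

```python
import math
import zlib
from typing import List, Tuple


def decode_lossy(raw: bytes) -> str:
    """Decode UTF-8, replacing invalid byte sequences with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


def l2_normalize(vector: List[float]) -> Tuple[List[float], float]:
    """Return the unit-length vector together with its original l2 norm."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        # A zero vector cannot be normalized; return it unchanged.
        return list(vector), 0.0
    return [x / norm for x in vector], norm


def toy_ngram_indices(
    word: str, min_n: int = 3, max_n: int = 6, buckets: int = 1024
) -> List[Tuple[str, int]]:
    """Pair each character n-gram of the bracketed word with a bucket index."""
    bracketed = f"<{word}>"
    pairs = []
    for n in range(min_n, max_n + 1):
        for start in range(len(bracketed) - n + 1):
            ngram = bracketed[start:start + n]
            # crc32 is a deterministic stand-in hash (an assumption).
            index = zlib.crc32(ngram.encode("utf-8")) % buckets
            pairs.append((ngram, index))
    return pairs


def toy_subword_indices(
    word: str, min_n: int = 3, max_n: int = 6, buckets: int = 1024
) -> List[int]:
    """Same lookup as toy_ngram_indices, but keep only the indices."""
    return [index for _, index in toy_ngram_indices(word, min_n, max_n, buckets)]


lossy_word = decode_lossy(b"caf\xe9")         # invalid UTF-8 is replaced, not rejected
unit_vector, norm = l2_normalize([3.0, 4.0])  # norm is 5.0 before normalization
pairs = toy_ngram_indices("cat", min_n=3, max_n=4)
indices = toy_subword_indices("cat", min_n=3, max_n=4)
```

Keeping the original norm next to the unit-length vector makes it possible to recover the unnormalized embedding by multiplying the two.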


10 Sep 11:39

danieldk

travis-0.5.0-rebuild

a1dc616

travis-0.5.0-rebuild

CI: Fix crate name in Travis-CI builds


24 Jul 17:10

danieldk

0.4.0

9afa6ac

0.4.0

Bump version to 0.4.0


14 Jun 11:12

danieldk

0.3.1

c11ab24

0.3.1

Bump version to 0.3.1


12 Jun 07:14

danieldk

0.3.0

f366aa8

New convenience methods

This release has the following changes:

Add the matrix_copy method to get a numpy array copy of the embedding matrix.
Add the vocab method to get a Vocab instance, which provides the item_to_indices method to get the indices or subword indices of a word. Vocab also provides indexing to look up the word corresponding to an index (e.g. vocab[3823]).
Upgrade to finalfusion 0.6.


24 Apr 11:06

danieldk

0.2.0

c1ca7c1

Switch to numpy arrays

Return numpy arrays rather than Python lists.
Update to pyo3 0.6.
Switch from rust2vec to the finalfusion crate.
