This repository contains examples, demonstrations, and support scripts
for building custom sourmash databases, using the new
sourmash sketch fromfile command and related additions to sourmash.
See sourmash#1671 for the overall discussion about building databases.
See an example of building a private database.
Another example: building protein and DNA databases starting from genomes.
Building a DNA+protein database from the NCBI genome assembly & proteome files.
Building a DNA+protein database from an NCBI genome assembly file.
fasta-to-fromfile.py - build a fromfile CSV file from a list of FASTA files.
genbank-to-fromfile.py - build a fromfile CSV file from a list of FASTA files downloaded from GenBank.
kiln.py - support library for building fromfile CSVs.
mass-rename.py - a script to bulk-rename sourmash signatures.
mass-merge.py - a script to bulk-merge sourmash signatures by spreadsheet column attribute.
sigs-to-manifest.py - a script to extract and/or update sourmash manifests from many databases.
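
To give a rough idea of what a fromfile CSV looks like, here is a minimal sketch that pairs DNA and protein FASTA files by filename and writes one row per sample. It is not the implementation in fasta-to-fromfile.py. The column names (name, genome_filename, protein_filename) are taken from my reading of the sourmash documentation for sketch fromfile and should be checked against your sourmash version. The extension lists and the choice to use the filename stem as the sample name are assumptions made for illustration, and the helper names are hypothetical.

```python
import csv
import sys

# Assumed column layout for `sourmash sketch fromfile`; verify against the docs.
FIELDNAMES = ["name", "genome_filename", "protein_filename"]

# Illustrative extension lists (assumption, not taken from this repository).
PROTEIN_EXTS = (".faa",)
DNA_EXTS = (".fna", ".fa", ".fasta")


def classify(filename):
    """Return (sample_name, is_protein) based on the FASTA file extension."""
    # Ignore a trailing .gz so that compressed files are handled the same way.
    base = filename[:-3] if filename.endswith(".gz") else filename
    for ext in PROTEIN_EXTS:
        if base.endswith(ext):
            return base[: -len(ext)], True
    for ext in DNA_EXTS:
        if base.endswith(ext):
            return base[: -len(ext)], False
    raise ValueError(f"unrecognized FASTA extension: {filename}")


def build_rows(fasta_paths):
    """Group FASTA paths by sample name, producing one CSV row per sample."""
    rows = {}
    for path in fasta_paths:
        # Use only the final path component to derive the sample name.
        filename = str(path).replace("\\", "/").rsplit("/", 1)[-1]
        name, is_protein = classify(filename)
        row = rows.setdefault(
            name, {"name": name, "genome_filename": "", "protein_filename": ""}
        )
        key = "protein_filename" if is_protein else "genome_filename"
        row[key] = str(path)
    # Sort by sample name so the output is deterministic.
    return [rows[name] for name in sorted(rows)]


def write_fromfile_csv(rows, out):
    """Write the rows, with a header line, to an open text file object."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Usage: python this_script.py genome1.fna.gz genome1.faa.gz ... > out.csv
    write_fromfile_csv(build_rows(sys.argv[1:]), sys.stdout)
```

A sample that has only a genome or only a proteome gets an empty value in the other column. The resulting CSV can then be passed to sourmash sketch fromfile; see the sourmash documentation for the sketching parameters and output options.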