working model of spell check mechanism (basic) #1
base: master
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,19 @@ | ||
| #Makefile | ||
|
|
||
| EXECUTABLE := __init__ | ||
|
|
||
| SOURCES := *.py | ||
|
|
||
| EXT := py | ||
| CC := python | ||
|
|
||
| 0: | ||
| $(CC) $(SOURCES) | ||
| $(CC) $(EXECUTABLE).$(EXT) 0 | ||
|
|
||
| 1: | ||
| $(CC) $(SOURCES) | ||
| $(CC) $(EXECUTABLE).$(EXT) 1 | ||
|
|
||
|
|
||
| # this line required by make - don't delete |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,3 +1,24 @@ | ||
| # ReadMe-bot | ||
|
|
||
| A bot that checks GitHub READMEs for spelling and grammar errors and creates a pull request with fixes and details. | ||
| # ReadMe-bot | ||
| A bot that checks GitHub READMEs for spelling and grammar errors and creates a pull request with fixes and details. | ||
|
|
||
|
|
||
| Spell-Check | ||
| =========== | ||
|
|
||
| Spell Checker in Python | ||
|
|
||
| Use | ||
| ---- | ||
| Cloning and Running Program | ||
| <pre><code>cd Spell-Check | ||
| make 0 or make 1</code></pre> | ||
|
|
||
| Removing .pyc files if needed | ||
| <pre><code>make realclean</code></pre> | ||
|
|
||
| Note: When using generated misspellings, recurring words or letters may appear, because the random numbers are not always completely random when generated repeatedly. | ||
|
|
||
|
|
||
| # Contributors | ||
| Manas-kashyap | ||
| Xeon-xolt |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| from spellcheck import * | ||
| import sys | ||
|
|
||
| def main(): | ||
| spellchk = SpellCheck('/usr/share/dict/words') | ||
| spellchk.run(sys.argv[1]) | ||
|
|
||
|
|
||
|
|
||
| if __name__ == "__main__": | ||
| main() | ||
|
Review comment: File should end with a blank line.
Author: Okay, I will remember it.
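The comment above most likely refers to the convention that a source file ends with a newline character. A minimal Python 3 sketch for checking this follows; the helper name `ends_with_newline` is hypothetical and not part of the pull request.

```python
def ends_with_newline(path):
    """Return True if the file at `path` ends with a newline byte."""
    with open(path, "rb") as handle:
        data = handle.read()
    # An empty file has no final newline, so it is reported as False.
    return data.endswith(b"\n")
```

Running a check like this over the `*.py` files before committing would catch the issue raised in the review.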
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,49 @@ | ||
| import re, random | ||
| class Misspell: | ||
| # Give list of words to misspell | ||
| def __init__(self, wordList): | ||
|
Review comment: PEP 8: use lowercase words separated by underscores for variable names.
Author: Okay, changes applied.
| self.wList = wordList | ||
|
|
||
| def genWord(self): | ||
| return self.misspelled(self.wList[random.randint(0,len(self.wList)-1)]) | ||
|
|
||
| def misspelled(self, word): | ||
| if len(word) == 1: | ||
| return word | ||
| vowels = 'aeiouy' | ||
|
Review comment: y is a vowel?
Author: Oops, sorry, my bad.
| consonants = 'bcdfghjklmnpqrstvwxyz' | ||
| if len(word) < 9: | ||
| mistakes = 1 | ||
| elif len(word) < 12: | ||
| mistakes = 2 | ||
| elif len(word) < 17: | ||
| mistakes = 3 | ||
| else: | ||
| mistakes = 4 | ||
| newWord = word[0] | ||
| prev = word[0] | ||
| for i in word[1:]: | ||
| if mistakes != 0: | ||
| rNum = random.randint(1,10) | ||
| else: | ||
| rNum = 5 | ||
| if rNum == 2: | ||
| newWord = newWord[:len(newWord)-2] + i + prev | ||
| elif rNum == 3: | ||
| if i in vowels: | ||
| c = vowels[random.randint(0, len(vowels)-1)] | ||
| while i == c: | ||
| c = vowels[random.randint(0, len(vowels)-1)] | ||
| else: | ||
| c = i | ||
| newWord += c | ||
| elif rNum == 4: | ||
| newWord += i + i | ||
| else: | ||
| newWord += i | ||
| prev = i | ||
| mistakes -= 1 | ||
| if newWord == word: | ||
| newWord += prev | ||
| return newWord | ||
|
|
||
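The two review comments on this file (PEP 8 naming, and whether y counts as a vowel) could be addressed along the following lines. This is a minimal Python 3 sketch, not the author's final code: the names `VOWELS`, `CONSONANTS`, `swap_vowel`, `word_list` and `pick_word` are hypothetical, and the sketch covers only word selection and the vowel-substitution branch, not the full `misspelled` logic.

```python
import random

# Assumption: y is treated as a consonant only, per the review comment.
VOWELS = "aeiou"
CONSONANTS = "bcdfghjklmnpqrstvwxyz"


def swap_vowel(letter, rng=random):
    """Return a different vowel for a vowel; return other letters unchanged."""
    if letter not in VOWELS:
        return letter
    # Choosing among the other vowels avoids the retry loop in the diff.
    return rng.choice([v for v in VOWELS if v != letter])


class Misspell:
    """Hypothetical snake_case variant of the class in the diff above."""

    def __init__(self, word_list):
        self.word_list = word_list

    def pick_word(self, rng=random):
        # random.choice replaces indexing with randint(0, len - 1).
        return rng.choice(self.word_list)
```

With `y` removed from the vowel string, the two constants no longer overlap, so a letter falls into exactly one group.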
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,73 @@ | ||
| import re, collections, sys | ||
| from misspell import Misspell | ||
| class SpellCheck: | ||
|
|
||
| alphabet = 'abcdefghijklmnopqrstuvwxyz' | ||
|
|
||
| def __init__(self, path): | ||
| self.dictPath = path | ||
|
|
||
| def words(self, text): | ||
| return re.findall('[a-z]+', text.lower()) | ||
|
|
||
| def train(self, words): | ||
| occurences = {} | ||
| for l in self.alphabet: | ||
| occurences[l] = collections.defaultdict(lambda: 1) | ||
| for w in words: | ||
| occurences[w[0]][w] += 1 # Increment the occurrence count of the word | ||
| return occurences | ||
|
|
||
| def edits1(self, word): | ||
| splits = [(word[:i], word[i:]) for i in range(len(word) + 1)] | ||
| deletes = [a + b[1:] for a, b in splits if b] | ||
| transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b)>1] | ||
| replaces = [a + c + b[1:] for a, b in splits for c in self.alphabet if b] | ||
| inserts = [a + c + b for a, b in splits for c in self.alphabet] | ||
| return set(deletes + transposes + replaces + inserts) | ||
|
|
||
| def known_edits2(self, word, wDict): | ||
| return set(e2 for e1 in self.edits1(word) for e2 in self.edits1(e1) if e2 in wDict) | ||
|
|
||
| def known(self, word, wDict): | ||
| return set(w for w in word if w in wDict) | ||
|
|
||
| def correct(self, word, wDict): | ||
| candidates = self.known([word], wDict[word[0]]) or self.known(self.edits1(word), wDict[word[0]]) or self.known_edits2(word, wDict[word[0]]) or [word] | ||
| return max(candidates, key=wDict.get) # returning the element of the set with the highest probability of being the correct word | ||
|
|
||
|
|
||
|
|
||
| def run(self, option): | ||
| lWords = self.words(file(self.dictPath).read()) | ||
| try: | ||
| if option == '0': | ||
| lWords = self.train(lWords) | ||
| while True: | ||
| word = raw_input('>') | ||
| if not word.isalpha(): | ||
| continue | ||
| spellchk = self.correct(word.lower(), lWords) | ||
| if spellchk == word and spellchk not in lWords[word[0]]: | ||
| print 'NO SUGGESTION' | ||
| else: | ||
| print spellchk | ||
| print #'\n' | ||
| elif option == '1': | ||
| misspell = Misspell(lWords) | ||
| lWords = self.train(lWords) | ||
| while True: | ||
| word = misspell.genWord() | ||
| print 'Incorrect -', word | ||
| spellchk = self.correct(word, lWords) | ||
| if spellchk == word and spellchk not in lWords[word[0]]: | ||
| print 'NO SUGGESTION' | ||
|
Review comment: Use the Python logging module instead of print statements.
Review comment: When does this loop terminate? I do not see a break.
Author: Okay, I am reading about the Python logging module, and maybe I forgot to add a break.
| else: | ||
| print 'Correct -',spellchk | ||
| print #'\n' | ||
| raw_input('<enter>\n') | ||
| except KeyboardInterrupt: | ||
|
|
||
| 'exit' | ||
|
Review comment: What do you intend to do with the string 'exit'? It is not going to have any effect.
Author: Okay, removing the unnecessary things.
| except EOFError: | ||
| 'exit' | ||
Review comment: Use 4 spaces instead of tabs.
Reply: Sure, will keep that in mind.
Reply: @bhanuvrat why? Use tabs instead of spaces?
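The review comments on `run` (use logging instead of print, give the loop a way to terminate, and drop the no-op `'exit'` strings) could be combined as in the sketch below. It is written for Python 3 and is only an illustration: `interactive_loop`, its `correct` and `read_word` parameters, and the choice of an empty line as the stop signal are assumptions, not part of the pull request.

```python
import logging

logger = logging.getLogger("spellcheck")


def interactive_loop(correct, read_word=input):
    """Read words until an empty line, Ctrl-C, or end of input.

    `correct` is any callable mapping a word to a suggested spelling
    (a hypothetical stand-in for SpellCheck.correct). Returns the
    number of words processed.
    """
    processed = 0
    while True:
        try:
            word = read_word(">")
        except (KeyboardInterrupt, EOFError):
            # Leave the loop explicitly instead of evaluating a bare string.
            logger.info("input closed, exiting")
            break
        if word == "":
            # Assumed stop signal: an empty line ends the session.
            break
        if not word.isalpha():
            continue
        logger.info("suggestion for %r: %s", word, correct(word.lower()))
        processed += 1
    return processed


# Scripted input for demonstration; a real session would use input().
scripted = iter(["Helo", "42", ""])
count = interactive_loop(str.upper, read_word=lambda prompt: next(scripted))
```

To see the messages on the console, a caller would first configure logging, for example with `logging.basicConfig(level=logging.INFO)`. Passing the reader as a parameter is a design choice that makes the loop testable without a terminal.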