Python enchant and building a spell checker
I decided to run this experiment because I realized Whispering Wall would default to the end clause if everything a user enters is misspelled.
Pictured below: Watson Assistant failed when I told him "I want to work on improving watson". I probably had a misspelling, but I cannot fathom why it didn't pick something out of the sentence to use.
https://twitter.com/NelliesNoodles/status/1109236474640257024
My enchant dictionary checked the validity of each word, but a misspelled word was simply treated as invalid and placed in the errors.
The enchant dictionary also has an issue with whitespace, as I found out doing this: it raises a ValueError when asked to check an empty string. That needed to be corrected for Wiwa too.
Hoping to update her tomorrow.
I still have to build the method that returns a 'fixed it for you' sentence, but I am debating whether I want Wiwa to just run with whatever word the suggester gives and pick a script from there. She won't have a database to store correct or incorrect fixes, and the point is to keep her talking and engaging.
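One possible shape for that 'fixed it for you' method is sketched below. It is only a sketch under assumptions: `check` and `suggest` are passed in as plain functions (with pyenchant they would presumably be the dictionary's `check` and `suggest` methods), and `toy_check`, `toy_suggest`, `KNOWN` and `FIXES` are made-up stand-ins so the example runs without a dictionary installed. It always takes the first suggestion, which is the "just run with it" option described above.

```python
import re


def build_suggestion(sentence, check, suggest):
    """Replace every word that fails check() with the first suggest() result."""
    # Strip punctuation the same way the cleaner does, then split into words.
    words = re.sub(r'\W+', ' ', sentence).split()
    fixed = []
    for word in words:
        if check(word):
            fixed.append(word)
        else:
            options = suggest(word)
            # Keep the original word when there is nothing to suggest.
            fixed.append(options[0] if options else word)
    return ' '.join(fixed)


# Hypothetical stand-ins for a real dictionary, for illustration only.
KNOWN = {"i", "want", "to", "improve", "watson"}
FIXES = {"improove": ["improve", "improved"]}


def toy_check(word):
    return word.lower() in KNOWN


def toy_suggest(word):
    return FIXES.get(word.lower(), [])


print(build_suggestion("I want to improove watson", toy_check, toy_suggest))
# prints: I want to improve watson
```

Passing the two functions in keeps the sentence logic testable on its own, without touching the real dictionary.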
It's more realistic that someone would repeat back to you a word they misheard because you mispronounced it, rather than detect a flaw in your word from how you spell it in your mind when you speak it. It may be a silly concept, but the Whispering Wall is an ongoing experiment and meant for entertainment.
I left the print statements in where pytest was not being helpful.
I hope to get better at writing tests. *Cheers to that!*
This one is really tiny. I tested the prior version, which used Python's itertools and permutations, but a warning:
permutations has no limit on where it will stop building. A word with n letters has n! orderings, and O(n!) is dangerous for a beginner's machine like mine. I'll figure out how to place a limiter on a file I run someday. And I'll post it so you all can use it too.
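One way to cap that growth is `itertools.islice`, which stops pulling results from the lazy `permutations` iterator after a fixed count. This is a minimal sketch, not the prior version of the spell checker; `limited_permutations` and its `limit` parameter are names made up for the example.

```python
from itertools import islice, permutations


def limited_permutations(letters, limit=1000):
    """Return at most `limit` orderings of `letters`, joined into strings."""
    # permutations() is lazy, so islice() stops it before all n! are built.
    return [''.join(p) for p in islice(permutations(letters), limit)]


candidates = limited_permutations("hippo", limit=5)
print(len(candidates))  # 5, even though "hippo" has 120 orderings
```

The cap limits how many candidates get built, not which ones, so the correct spelling may not be among them for longer words.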
Here's the code, have at it!
####### the pytest file ########
import pytest
from spell_checker2 import WordFinder


def test_whitespace_del():
    wf = WordFinder()
    sentence = " this is a sentence with whitespace in head"
    x = wf.remove_preceeding_whitespace(sentence)
    assert x == "this is a sentence with whitespace in head"
    one_character = " n"
    x2 = wf.remove_preceeding_whitespace(one_character)
    assert x2 == 'n'


def test_cleaner():
    wf = WordFinder()
    sentence = " $ ^&#@!)() hippo"
    cleaned = wf.clean_input(sentence)
    assert cleaned == "hippo"
###### The WordFinder File ######
# spell-check and word suggestion: Attempt #2
import re
import enchant


class WordFinder(object):
    def __init__(self):
        self.ench_dictionary = enchant.Dict("en-US")
        self.get_words = self.ench_dictionary.suggest

    ####### validation and cleaning of input #########
    def check_word(self, word):
        return self.ench_dictionary.check(word)
    def validate_string(self, sentence):
        """
        Using a whitespace counter, check if the string is made
        up of only whitespace. Returns False for empty or
        all-space strings, because enchant raises ValueError
        'can not spellcheck an empty string' on them.
        """
        if sentence:
            contents = list(sentence)
            length = len(contents)
            whitespace_count = 0
            for word in contents:
                if word != " ":
                    pass
                else:
                    whitespace_count += 1
            if length == whitespace_count:
                return False
            else:
                return True
        else:
            return False
    def remove_preceeding_whitespace(self, sentence):
        """
        remove all whitespace before the first character appears
        """
        sentence_list = list(sentence)
        index = 0
        indexs_to_be_deleted = []
        for character in sentence_list:
            if not character.isspace():
                #print("breaking at:", index)
                break
            else:
                #print("placing index in list: ", index)
                indexs_to_be_deleted.append(index)
            index += 1
        #print(sentence_list)
        if len(indexs_to_be_deleted) > 0:
            placement = 0
            for integer in indexs_to_be_deleted:
                del sentence_list[integer + placement]
                placement -= 1
            #print(sentence_list)
            #new_sentence = ''.join(i) for i in sentence_list
            new_sentence = ''
            for words in sentence_list:
                new_sentence += words
            return new_sentence
        # default return original sentence, head whitespace not found
        return sentence
    def remove_non_alphanumeric(self, sentence):
        """
        Whispering wall checks for a question first, then
        processes the string, so a regex strip would happen after
        the string has been checked for any '?'.
        """
        # stack overflow answers to the rescue!
        # a lot of times there are multiple answers you can put
        # together to make what you are looking for.
        # link:
        # https://stackoverflow.com/questions/1276764/stripping-everything-but-alphanumeric-chars-from-a-string-in-python
        newstring = re.sub(r'\W+', ' ', sentence, flags=re.UNICODE)
        return newstring
    def clean_input(self, user_input):
        """
        Clean the user input outside of the while loop.
        return clean valid sentence to use enchant's
        dictionary.check() on.
        """
        # convert any input to string
        user_input = str(user_input)
        # strip out all non alphanumerics with regex
        user_input = self.remove_non_alphanumeric(user_input)
        #print("after regex: \n", user_input)
        # clean any whitespace in the head of the user_input
        ## IMPORTANT ##
        ## this needs to be done AFTER regex or a whitespace might remain
        user_input = self.remove_preceeding_whitespace(user_input)
        ######## development purposes, return only first word ###
        # make sure only one word is accessed with split
        user_input = user_input.split(" ")
        # lower case only the first word given in request
        user_input = user_input[0].lower()
        #process word with enchant.suggest
        return user_input
    ############## end cleaning ################
    ########### core loops for word processing ########
    def build_suggestion(self, sentence):
        # This is where we will replace words that are misspelled
        # and return a suggested sentence.
        # use clean on the sentence
        # if a word is spelled wrong, use the enchant suggest
        # replace word with the first one listed
        # return the suggested sentence
        pass
    def get_suggestions(self):
        user_request = input("your word:")
        while user_request not in ['EXIT', 'QUIT']:
            # Make sure request is not empty:
            # IMPORTANT #
            # Cleaning needs to be done before validation #
            # enchant raises ValueError on empty strings #
            user_request = self.clean_input(user_request)
            valid = self.validate_string(user_request)
            if user_request and valid:
                if self.check_word(user_request):
                    print("Your word is correct.\n")
                else:
                    result = self.get_words(user_request)
                    print(result, "\n")
            else:
                # print error for empty request
                print("No word given in previous request.")
            user_request = input("Your word:")
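### running in command prompt ###
# Guarded so that importing WordFinder (for example from the pytest file)
# does not start the interactive input loop.
if __name__ == "__main__":
    run_loop = WordFinder()
    run_loop.get_suggestions()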
####### manual print statement tests to find errors ####
#wf = WordFinder()
#sentence = " this is a sentence with whitespace in the head"
#wf.get_suggestions()
#wf2 = WordFinder()
#print(x)
#wf3 = WordFinder()
#sentence = " %$^&#*#&^# hippo"
#wf3.clean_input(sentence)
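For comparison, the two whitespace helpers in the listing above can be approximated with built-in string methods. This is a sketch rather than a drop-in replacement: `strip_leading_whitespace` and `has_content` are names invented here, and `has_content` treats tabs and newlines as whitespace too, whereas `validate_string` only counts the space character.

```python
def strip_leading_whitespace(sentence):
    """Roughly what remove_preceeding_whitespace does, via str.lstrip()."""
    return sentence.lstrip()


def has_content(sentence):
    """True when the string is non-empty and not made up only of whitespace."""
    return bool(sentence) and not sentence.isspace()


print(strip_leading_whitespace("   hippo"))  # hippo
print(has_content("   "))                    # False
```

A check like `has_content` would run before handing a word to the dictionary, since an empty string is what triggers the ValueError.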