Python enchant and building a spell checker

I decided to do this experiment because I realized Whispering Wall would default to the end clause if everything a user enters is spelled wrong.

Pictured below: Watson Assistant failed when I told him "I want to work on improving watson". I probably had a misspelling, but I cannot fathom why it didn't pick something out of the sentence to use.



https://twitter.com/NelliesNoodles/status/1109236474640257024

My enchant dictionary checked the validity of each word, but a misspelled word was simply marked invalid and placed in the errors.
The enchant dictionary also has an issue with whitespace, as I found out while doing this, and that needed to be corrected for Wiwa too.
I'm hoping to update her tomorrow.

I still have to build the method that returns a 'fixed it for you' sentence, but I am debating whether I want Wiwa to just run with whatever word the suggester gives and pick a script from there. She won't have a database to store correct or incorrect fixes, and the point is to keep her talking and engaging.
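Here is a rough sketch of how that 'fixed it for you' method might look. It is only a guess at the idea, not the finished Wiwa code: the `check` and `suggest` parameters are assumptions that stand in for the enchant dictionary's `check` and `suggest` methods, so the function can be tried out without a dictionary installed.

```python
import re


def build_suggestion(sentence, check, suggest):
    """
    Return the sentence with each misspelled word swapped for the
    first suggestion. `check` and `suggest` are assumed to behave
    like an enchant dictionary's check() and suggest().
    """
    # strip punctuation the same way the cleaner does, then split
    # on whitespace so no empty strings reach the checker
    words = re.sub(r'\W+', ' ', str(sentence)).split()

    fixed = []
    for word in words:
        if check(word):
            fixed.append(word)
        else:
            options = suggest(word)
            # keep the original word when nothing is suggested
            fixed.append(options[0] if options else word)
    return ' '.join(fixed)
```

With the WordFinder class further down, this could be called as `build_suggestion(sentence, wf.check_word, wf.get_words)`. Taking the first suggestion blindly will sometimes pick the wrong word, which fits the 'misheard you' idea more than a strict correction would.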

It's more realistic that someone would repeat back to you a word they misheard because you mispronounced it, rather than detect a flaw in your word from how you spell it in your mind as you speak it. It may be a silly concept, but the Whispering Wall is an ongoing experiment and meant for entertainment.


 


I left the print statements in where pytest was not being helpful.
I hope to get better at writing tests. *Cheers to that!*
This one is really tiny. I tested the prior version, which used Python's itertools and permutations, but a warning:
permutations has no limit on where it will stop building. O(n!) is dangerous for a beginner's machine like mine. I'll figure out how to place a limiter on a file I run someday. And I'll post it so you all can use it too.
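One possible way to cap the permutation problem, sketched here as an assumption and not the code from the prior version: `itertools.islice` stops pulling from the permutations generator after a fixed number of results. The function name and the default limit are made up for illustration.

```python
from itertools import islice, permutations


def limited_permutations(word, limit=1000):
    """Return at most `limit` rearrangements of the letters in `word`."""
    # permutations() is lazy, so islice stops the work early
    # without ever building all n! results in memory
    return [''.join(p) for p in islice(permutations(word), limit)]
```

A ten letter word has 3,628,800 permutations, but `limited_permutations("abcdefghij", 50)` only ever builds the first 50.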

Here's the code, have at it!

#######  the pytest file  ########

import pytest
from spell_checker2 import WordFinder

def test_whitespace_del():
    wf = WordFinder()
    sentence = "   this is a sentence with whitespace in head"
    x = wf.remove_preceeding_whitespace(sentence)

    assert x == "this is a sentence with whitespace in head"

    one_character = "   n"

    x2 = wf.remove_preceeding_whitespace(one_character)

    assert x2 == 'n'

def test_cleaner():
    wf = WordFinder()
    sentence = "  $ ^&#@!)() hippo"
    cleaned = wf.clean_input(sentence)
    assert cleaned == "hippo"


######  The WordFinder File  ######

 # spell-check and word suggestion:  Attempt #2

import re
import enchant


class WordFinder(object):
    def __init__(self):
        self.ench_dictionary = enchant.Dict("en-US")
        self.get_words = self.ench_dictionary.suggest


#######    validation and cleaning of input  #########
    def check_word(self, word):
        return self.ench_dictionary.check(word)

    def validate_string(self, sentence):
        """
        using a whitespace counter, check if the string is empty or
        made up of only whitespace. returns False in that case, so the
        caller can avoid the enchant ValueError
        'can not spellcheck an empty string'
        """

        if sentence:
            # count every whitespace character (spaces, tabs, newlines)
            whitespace_count = 0
            for character in sentence:
                if character.isspace():
                    whitespace_count += 1
            # valid only if at least one character is not whitespace
            return len(sentence) != whitespace_count
        return False


    def remove_preceeding_whitespace(self, sentence):
        """
        remove all whitespace before the first character appears
        """

        sentence_list = list(sentence)
        index = 0
        indexs_to_be_deleted = []
        for character in sentence_list:
            if not character.isspace():
                #print("breaking at:", index)
                break
            else:
                #print("placing index in list: ", index)
                indexs_to_be_deleted.append(index)
                index += 1
        #print(sentence_list)
        if len(indexs_to_be_deleted) > 0:
            placement = 0
            for integer in indexs_to_be_deleted:
                del sentence_list[integer + placement]
                placement -= 1
            #print(sentence_list)
            #new_sentence = ''.join(sentence_list)

            new_sentence = ''
            for words in sentence_list:
                new_sentence += words
            return new_sentence
        # default return original sentence, head whitespace not found
        return sentence

    def remove_non_alphanumeric(self, sentence):

        """
        Whispering wall checks for a question first, then
        processes the string, so a regex strip would happen after
        the string has been checked for any '?'.
        """

        # stack overflow answers to the rescue!
        # a lot of times there are multiple answers you can put
        # together to make what you are looking for.
        # link:

        # https://stackoverflow.com/questions/1276764/stripping-everything-but-alphanumeric-chars-from-a-string-in-python

        newstring = re.sub(r'\W+', ' ', sentence, flags=re.UNICODE)
        return newstring


    def clean_input(self, user_input):
        """
        Clean the user input outside of the while loop.
        return clean valid sentence to use enchant's
        dictionary.check() on.
        """

        # convert any input to string
        user_input = str(user_input)

        # strip out all non alphanumerics with regex
        user_input = self.remove_non_alphanumeric(user_input)
        #print("after regex: \n", user_input)
        # clean any whitespace in the head of the user_input
        ##  IMPORTANT  ##
        ## this needs to be done AFTER regex or a whitespace might remain

        user_input = self.remove_preceeding_whitespace(user_input)


        ########  development purposes, return only first word ###
        # make sure only one word is accessed with split
        user_input = user_input.split(" ")
        # lower case only the first word given in request
        user_input = user_input[0].lower()
        #process word with enchant.suggest

        return user_input
############## end cleaning ################

###########  core loops for word processing ########

    def build_suggestion(self, sentence):
        # This is where we will replace words that are misspelled
        # and return a suggested sentence.

        # use clean on the sentence
        # if a word is spelled wrong, use the enchant suggest
        # replace word with the first one listed
        # return the suggested sentence

        pass

    def get_suggestions(self):
        user_request = input("Your word:")

        while user_request not in ['EXIT', 'QUIT']:
            # Make sure request is not empty:
            #    IMPORTANT  #
            # Cleaning needs to be done before validation #
            # enchant raises ValueError on empty strings #
            user_request = self.clean_input(user_request)
            valid = self.validate_string(user_request)
            if user_request and valid:
                if self.check_word(user_request):
                    print("Your word is correct.\n")
                else:
                    result = self.get_words(user_request)
                    print(result, "\n")
            else:
                # print error for empty request
                print("No word given in previous request.")
            user_request = input("Your word:")









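###  running in command prompt  ###
# guard the loop so importing WordFinder (like the pytest file does)
# does not start the input() prompt
if __name__ == "__main__":
    run_loop = WordFinder()
    run_loop.get_suggestions()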





#######  manual print statement tests to find errors ####
#wf = WordFinder()
#sentence = "   this is a sentence with whitespace in the head"
#wf.get_suggestions()
#wf2 = WordFinder()
#print(x)
#wf3 = WordFinder()
#sentence = "  %$^&#*#&^# hippo"
#wf3.clean_input(sentence)
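For comparison, here is a shorter take on the cleaning steps using Python's built-in string methods. It is a sketch with made-up function names, not a replacement I have run inside Wiwa. It also shows why the whitespace strip has to come after the regex: the regex turns the leading run of symbols into a single space.

```python
import re


def remove_leading_whitespace(sentence):
    """Same job as remove_preceeding_whitespace, using str.lstrip()."""
    return sentence.lstrip()


def clean_first_word(user_input):
    """Return the first word of the input in lower case ('' if none)."""
    # every run of non-word characters becomes one space, so
    # "  $ ^&#@!)() hippo" turns into " hippo" with a space in front
    text = re.sub(r'\W+', ' ', str(user_input))
    # strip AFTER the regex, or that leading space would remain
    text = remove_leading_whitespace(text)
    return text.split(' ')[0].lower()
```

Both tests from the pytest file should pass against these functions as well, since the expected strings are the same.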







