NLTK Python: getting Wiwa to answer interjections
Today's short project is to give my Whispering Wall, Wiwa, a way to answer interjections.
I often have people I know play with her so I can see what I need to update and fix.
My best friend typed things such as 'Wow' and 'Really', and Wiwa did not know what to do with a one-word response.
She either tagged it as a noun and grabbed a response from the noun script, or she answered with 'Is the cat on the keyboard?' because the UH (interjection) tag was not being used. The word was therefore never processed, which left an empty list to grab a response from.
So the first step is to remember how this works, then open a file and start testing how interjections get tagged.
websites used:
https://pythonprogramming.net/natural-language-toolkit-nltk-part-speech-tagging/
# Having trouble finding out why my interjections are always tagged:
# [('WOW', 'NN')] -- noun, instead of the expected [('WOW', 'UH')] -- interjection
http://www.nltk.org/book_1ed/ch05.html
# And I really can't find why my NLTK tags are not registering it.
# I believe tokenize and tag are set up for a series of words. The tagger seems to assume any one-word statement is a noun, even if it is a word that does NOT exist. *Unless the NLTK lookup lists only one part of speech for it?*
picture of what I mean:
I don't have a lot of time today, so to simplify and solve my problem, it would be easier, and cover more territory, if my parser recognized a one-word input and dealt with it separately.
Using Python's PyEnchant (a spellchecking library that can also tell whether a word is in the English dictionary), I can check whether the input is a real word or garble-de-gook and respond accordingly: one script for one-word responses, one script for garble-de-gook (because sometimes it may just be a misspelled word). Maybe a way to find the most similar word to what they meant to type should be in the __future__?
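The "most similar word" idea could be sketched with Python's standard-library difflib. This is only a rough sketch and not part of Wiwa: the function name closest_word and the tiny KNOWN_WORDS list are made up for illustration, and a real version would need a much larger vocabulary. PyEnchant's own Dict.suggest() may turn out to be a better fit, since the dictionary is already loaded.

```python
import difflib

# Hypothetical stand-in vocabulary; a real version would use a full word list.
KNOWN_WORDS = ["wow", "really", "hello", "thanks", "okay"]


def closest_word(word, vocabulary=KNOWN_WORDS, cutoff=0.6):
    """Return the known word most similar to `word`, or None if nothing is close."""
    # get_close_matches ranks candidates by similarity ratio and drops
    # anything scoring below `cutoff` (0.0 to 1.0).
    matches = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for typed in ["realy", "helo", "xqzv"]:
        print(typed, "->", closest_word(typed))
```

A None result would mean the input is probably garble-de-gook rather than a typo, so the garble-de-gook script could still be used in that case.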
Done for the day. Screenshot:
So here's some code:
I stole pieces I had already written for Wiwa. Hopefully I'll be updating her again soon.
GitHub: Whispering Wall
####### interjections.py, huh_script.txt, script4.txt #####
### The script files are just lines of responses. I will put huh_script at the bottom
----------------------------------------------
#!/usr/bin/python3
# -*- coding: utf-8 -*-
from random import randint
import enchant
import sys
#import nltk
#from nltk.corpus import wordnet
"""
nltk requires some downloads to use, but in this I ended up
removing the use of nltk.
I could not find good documentation on interjections, or why
NLTK seemed to tag almost everything that was a one word structure
a Noun, [('FlabberNoodles', 'NN')]
"""
# make a script to cover interjections, misspellings, garble-de-gook
# These should really be moved into a function so they aren't globals
huh_script = sys.argv[1]
simple_script = sys.argv[2]
def get_script_line(arg):
    """
    Uses a txt script that has a response on each line.
    randint picks a random line from the script as the response.
    """
    with open(arg) as f:
        lines = f.readlines()
    if not lines:
        # an empty script has nothing to pick from
        raise ValueError("script file has no lines: " + arg)
    # randint is inclusive on both ends, so the last index is len(lines) - 1
    x = randint(0, len(lines) - 1)
    return lines[x]
def get_user_input():
    # At some point, maybe Wiwa can have different cues for user input?
    user_input = 'Please state the nature of your medical emergency:'
    return input(user_input)
def enchant_check(arg):
    """ Uses the PyEnchant English dictionary to
    check the validity of a word.
    Returns bool: True (is a valid word) or False (is not a US-English word)
    """
    dictionary = enchant.Dict("en_US")
    x = dictionary.check(arg)
    return x
def parse_string(string_data):
    # the dictionary will be used to check if a word exists
    if type(string_data) != str:
        # always account for errors in data type
        # saves so much trouble debugging
        # especially with a good message to lead to the problem
        message = "parse_string argument should be string. \n Got this instead:"
        x = type(string_data)
        print(message, x)
        raise TypeError
    else:
        # removed NLTK process.
        """
        tokenized = nltk.word_tokenize(string_data)
        tagged = nltk.pos_tag(tokenized)
        """
        word_list = string_data.split()
        if len(word_list) == 1:
            valid = enchant_check(word_list[0])
        else:
            valid = None
        return valid
def run_test_Wiwa():
    done = False
    while not done:
        medical_nature = get_user_input()
        if 'EXIT' in medical_nature:
            print("Well I guess I'll go back to doing nothing.")
            done = True
            exit(0)
        else:
            response = parse_string(medical_nature)
            # testing error real quick with : response = 99
            if response is not None:
                if response is True:
                    answer = get_script_line(huh_script)
                elif response is False:
                    answer = get_script_line(simple_script)
                else:
                    x = type(response)
                    # I swear I've been putting the message directly in the
                    # raise error(message) and it was printing fine
                    # now it does not want to recognize the newline: \n ??
                    message = "Value Error in run_test_wiwa" + "\n" + " response variable \nis not of (bool or None) type\n got this instead:"
                    print(message, x)
                    error_message = "expected bool or None type."
                    raise ValueError(error_message)
            else:
                # response was None. multiple words, or No words were given
                answer = "My friend Wiwa is much better equipped to talk to you."
            print(answer)
run_test_Wiwa()
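The comment near the top of interjections.py says the two script paths should be moved into a function so they aren't globals. One possible way to do that is sketched below with the standard-library argparse. The function name parse_script_paths and the argument names are made up here and are not in the Wiwa code.

```python
import argparse


def parse_script_paths(argv=None):
    """Read the two script file paths from the command line.

    Returns a (huh_script, simple_script) tuple instead of setting globals.
    """
    parser = argparse.ArgumentParser(description="One-word responder test for Wiwa")
    parser.add_argument("huh_script", help="path to the first response script")
    parser.add_argument("simple_script", help="path to the second response script")
    # argv=None tells argparse to read sys.argv[1:]; a list can be passed for testing
    args = parser.parse_args(argv)
    return args.huh_script, args.simple_script


# Demo with an explicit argument list, using the file names from this post.
huh_path, simple_path = parse_script_paths(["huh_script.txt", "script4.txt"])
print(huh_path, simple_path)
```

With something like this, run_test_Wiwa could take the two paths as parameters, and a missing argument would produce a usage message instead of an IndexError.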
#### huh_script.txt ####
"I didn't know you felt so passionate."
"While one word answers are exceptional responses... please try again."
"I do say!"
"really."
"How about we use that word in a sentence?"
"very parabolic of you"
"If only words grew on trees."
"Bless you."
"Do you need a tissue?"
"Can we discuss the many letters available on your keyboard?"