Python, Excel, pandas: write an Excel file to a pandas database
Using Python, Pandas and Excel to create my database
I started writing a new set of Python classes for the machine learning experiment. With the original code, I found that I was coding every decision myself. While it functioned fine, the point of getting the machine to do the work was lost.
So the new version started with figuring out how to make it quicker and reusable, with data that can be modified. The decision tree needed something to work with besides the hard-coded solution. This is where pandas and Excel come in.
Below the Bad Decisions section is the new code. Feel free to scroll down if that's what you're here for.
Image: Pandas in action.
Compacted terminal print of the columns of data labeled: size, seed_type, [...], ornaments, colors, names. The rows are identified by number only, 1 - 8. The values in these columns will help the new decision tree narrow the plant we are describing down to a list of names that fit its size, seed_type, bundle_type, ornaments, and colors. I have no background in plant biology or that scientific field; these are not real scientific classifications.
Bad Decisions:
Here's the old decision tree. It may not be easy to follow, but the point is that I knew what choices had to be made to get to the proper solution. By coding it in like this, the whole machine learning aspect is lost. It's also massive and just too bulky. If you want to copy this and play with it, no worries, but I am going to completely change how this class is set up for the next post on this experiment.
-- Start code block 1 --
class DecisionTree(object):

    def __init__(self, size):
        self.color = None
        self.size = size
        self.small_type = ['blueberry', 'raspberry', 'grape', 'blackberry', 'acai', 'strawberry', 'juniper', 'mulberry', 'cherry']
        self.small_growth_type = ['bundled', 'single']
        self.small_seed_type = ['outer', 'single core', 'multiple unpatterned']
        self.small_bundle_dict = {
            'red': ['grape', 'raspberry', 'mulberry', 'cherry', 'strawberry'],
            'green': ['grape'],
            'blue': ['blueberry', 'juniper'],
            'purple': ['acai', 'grape'],
            'black': ['blackberry', 'acai', 'grape']
        }
        self.medium_type = ['banana', 'apple', 'orange', 'pear', 'peach', 'plum', 'grapefruit', 'avacado', 'lime', 'kiwi', 'lemon']
        self.medium_ornaments = ['textured', 'smooth']
        self.medium_seed_type = ['single core', 'sectioned seed pods', 'multiple unpatterned', 'none']
        self.large_type = ['watermelon', 'papaya', 'pineapple', 'Jack fruit', 'musk-cantalope-honeydew', 'dragonfruit', 'mango']
        self.large_ornaments = ['striped', 'fruitlets-spikes', 'ridged', 'smooth']
        self.colors = ['red', 'green', 'yellow', 'orange', 'purple', 'blue', 'black', 'white', 'grey']

    def info(self):
        message = """
        This class, DecisionTree, will classify the category of fruit, first by size:
            melons tend to be the largest fruit, > 8 inches.
            apples, oranges, kiwi, pear -- medium, < 8 inches and > 2
            berries & grape types -- small, < 2 inches
        From size we can then choose the other qualifiers.
        For small types, bundles indicate a raspberry-like growth pattern, or grapes.
            Single stem growth indicates fruits that develop from their own unshared stem:
            strawberries, blueberries, juniper.
            Seed type can further narrow the fruit down, as cherries and acai have single seeds,
            strawberries have outer seeds, and the other types have multiple seeds.
            Then we can choose by color at that point.
        For medium types,
            Ornaments: what does the surface of this fruit display that is significant?
            Is it smooth like an apple or plum, or does it have a bumpy or hairy texture like peaches or lemons?
            Seed type: does it have a single core seed, multiple seeds, or sectioned seeds?
                Single seed: avocado
                Sectioned seeds: citrus and apples
                Multiple: all others
        Large fruit types,
            Ornaments will tell us a lot about a large fruit in this list.
            Striped and smooth are traits of a watermelon; fruitlets or spikes can lead us to pineapple or jackfruit;
            ridged for musk, cantaloupe, honeydew type melons; smooth for papaya and mango.
            Not smooth can be a dragonfruit, pineapple, or melons (not watermelon).
        """
        # The original built the message but never used it, so show and return it.
        print(message)
        return message
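
    def size_tree(self):
        size = int(self.size)
        # The original ranges overlapped (<= 3 and >= 1), which left the
        # "not found" branch unreachable. The cutoffs are kept as coded;
        # note that info() describes small as under 2 inches.
        if size < 1:
            print("Size not found = ", size)
            return None
        elif size <= 3:
            return 'small'
        elif size < 8:
            return 'medium'
        else:
            return 'large'

    def get_size(self, db):
        size = self.size_tree()
        db.size = size
        print("\nyour size classification is:", size)
        return size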

    def get_seed_type(self):
        print("\nTo narrow down the choices, what seed type does this fruit have?")
        seed_type = input("(S)ingle core, (M)ultiple inner seeds, (O)uter seeds, (C)ompartmentalized seeds, (N)o seeds>>")
        if seed_type in ['S', 's', 'single', 'Single']:
            # database string: 'core'
            return 'S'
        elif seed_type in ['M', 'm', 'Multiple', 'multiple']:
            # database string: 'multiple'
            return 'M'
        elif seed_type in ['O', 'o', 'Outer', 'outer']:
            # database string: 'outer'
            return 'O'
        elif seed_type in ['C', 'c', 'Compartmentalized', 'compartmentalized']:
            # not used in small: database name: 'compartment'
            return 'C'
        elif seed_type in ['N', 'n', 'No', 'no', 'None', 'none']:
            # not used in small: database name: 'seedless'
            return 'N'
        else:
            print("seed type could not be found: ", seed_type)
            return None

    def get_fruit_color_bundle(self):
        # red is the one group with lots of berries.
        # narrow it down with seed type.
        print("\nby the most similar, which color best describes this fruit?")
        color = input("(R)ed , (B)lue, (Y)ellow, (P)urple, (Br)own, (O)range, (G)reen : ")
        blues = ['B', 'b', 'blue', 'Blue']
        reds = ['R', 'r', 'red', 'Red']
        yellows = ['Y', 'y', 'Yellow', 'yellow']
        purples = ['P', 'p', 'Purple', 'purple']
        browns = ['Br', 'br', 'BR', 'Brown', 'brown']
        oranges = ['O', 'o', 'Orange', 'orange']
        greens = ['G', 'g', 'Green', 'green']
        list_names = []
        color_of = 'None'
        if color in reds:
            seed = self.get_seed_type()
            color_of = 'red'
            if seed is None:
                print("this seed type is not currently found in our red fruits. So sorry.")
            elif seed == 'S':
                list_names.append('cherry')
            elif seed == 'O':
                list_names.append('strawberry')
            elif seed == 'M':
                list_names.append('raspberry')
                list_names.append('grape')
                list_names.append('mulberry')
        elif color in greens:
            color_of = 'green'
            list_names.append('grape')
        elif color in purples:
            print('purple berry')
            color_of = 'purple'
            # this would include blackberries, it's closest to purple
            list_names.append('blackberry')
            list_names.append('acai')
            list_names.append('juniper')
        elif color in blues:
            color_of = 'blue'
            list_names.append('blueberry')
        else:
            color_of = 'undefined'
            # Print what the user typed, not the placeholder 'undefined'.
            print("\nthis color of small bundled fruit was not found, it may be a grape:", color)
            list_names.append('unknown')
        print("returning: ", list_names, color_of)
        items = [list_names, color_of]
        print(items)
        return items

    def get_fruit_color_single(self):
        # red is a color group with lots of berries.
        # narrow it down with seed type.
        print("\nby the most similar, which color best describes this fruit?")
        color = input("(R)ed , (B)lue, (Y)ellow, (P)urple, (Br)own, (O)range, (G)reen : ")
        blues = ['B', 'b', 'blue', 'Blue']
        reds = ['R', 'r', 'red', 'Red']
        yellows = ['Y', 'y', 'Yellow', 'yellow']
        purples = ['P', 'p', 'Purple', 'purple']
        browns = ['Br', 'br', 'BR', 'Brown', 'brown']
        oranges = ['O', 'o', 'Orange', 'orange']
        greens = ['G', 'g', 'Green', 'green']
        color_of = 'Undefined'
        names_list = []
        if color in reds:
            seed = self.get_seed_type()
            color_of = 'red'
            if seed is None:
                print("this seed type is not currently found in our red fruits. So sorry.")
            elif seed == 'S':
                names_list.append('cherry')
            elif seed == 'O':
                names_list.append('strawberry')
            elif seed == 'M':
                # append() takes exactly one item; extend() adds several at once.
                names_list.extend(['raspberry', 'grape', 'mulberry'])
        elif color in greens:
            color_of = 'green'
            names_list.append('grape')
        elif color in purples:
            color_of = 'purple/black'
            # this would include blackberries, it's closest to purple
            names_list.extend(['blackberry', 'acai', 'grape'])
        elif color in blues:
            color_of = 'blue'
            names_list.append('blueberry')
        else:
            print("\nthis color of small single stem fruit was not found, it may be a wild unidentified berry:", color)
        items = [names_list, color_of]
        return items

    def growth_tree_small(self, test=False):
        S_strings = ['single', 'Single', 'S']
        B_strings = ['bundled', 'bundle', 'Bundle', 'Bundled', 'B']
        message1 = "\ngrowth type choices: bundled or single. B= bundled, S= single"
        message2 = "bundled means from a stem, multiple flowers form on smaller stems to become a fruit."
        message3 = "single means from a stem, a single flower is responsible for a single fruit."
        print(message1)
        print(message2)
        print(message3)
        growth = input("Is this fruit (B)undled or (S)ingle?")
        if growth in B_strings:
            new_small_fruits = ['raspberry', 'grape', 'blackberry', 'acai', 'cherry', 'mulberry']
            if test:
                print(new_small_fruits)
            return self.get_fruit_color_bundle()
        elif growth in S_strings:
            new_small_fruits = ['strawberry', 'blueberry', 'juniper']
            if test:
                print(new_small_fruits)
            return self.get_fruit_color_single()
        else:
            print("growth pattern not found: ", growth)
            return [['None'], 'None']
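
    def seed_tree_medium(self, seed):
        # Unfinished stub: the medium branch was never written.
        message1 = "seed types are : "
        new_med_fruits = self.medium_type

    def seed_tree_large(self, seed):
        # Unfinished stub: the large branch was never written.
        new_large_fruits = self.large_type

    def from_size(self):
        # size_tree() takes no arguments; the original passed test=True,
        # which raises a TypeError.
        size_type = self.size_tree()
        if size_type == 'small':
            self.growth_tree_small()
        elif size_type == 'medium':
            print("size is not small, it is = ", size_type)
        else:
            pass
-- End code block 1 --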
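Part of the bulk in the class above comes from repeating the same lists of accepted answers in every method. As a rough sketch of one way to shrink that, the spellings could live in a single lookup table. `COLOR_ALIASES` and `normalize_color` are hypothetical names made up for this illustration; they are not part of the class above.

```python
# Hypothetical lookup table: one lowercase key per accepted answer.
COLOR_ALIASES = {
    "r": "red", "red": "red",
    "b": "blue", "blue": "blue",
    "y": "yellow", "yellow": "yellow",
    "p": "purple", "purple": "purple",
    "br": "brown", "brown": "brown",
    "o": "orange", "orange": "orange",
    "g": "green", "green": "green",
}


def normalize_color(answer):
    """Return the canonical color name, or None if the answer is not recognized."""
    # Lowercasing first means 'R', 'r', 'Red' and 'RED' all hit the same key.
    return COLOR_ALIASES.get(answer.strip().lower())


print(normalize_color("Br"))     # brown
print(normalize_color(" RED "))  # red
print(normalize_color("teal"))   # None
```

With something like this, each method would only need to compare against one canonical string per color instead of rebuilding seven lists.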
Get that Excel Data!
Now for the code that gets Python, pandas and Excel to work together.
The idea is that, on the next try, I can ask the database to eliminate choices based on the traits I tell it a fruit has. Size would be first. From there we ask for the seed_type, and then, for small fruits, whether they grow in bundles like you see in raspberries or grapes. These traits are important in identifying these small morsels.
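Here is a minimal sketch of that elimination step, assuming the database is a pandas DataFrame with the column names shown in the first image. The rows are made-up stand-ins for the spreadsheet, and `narrow_down` is a hypothetical helper written for this illustration, not code from the new class.

```python
import pandas as pd

# Made-up rows standing in for the Excel data; the real sheet has more columns.
db = pd.DataFrame({
    "size": ["small", "small", "small", "medium"],
    "seed_type": ["outer", "core", "multiple", "compartment"],
    "bundle_type": ["single", "bundled", "bundled", "single"],
    "names": ["strawberry", "cherry", "raspberry", "orange"],
})


def narrow_down(frame, **traits):
    """Keep only the rows whose columns match every trait that was given."""
    for column, value in traits.items():
        # Each trait removes the rows that do not match it.
        frame = frame[frame[column] == value]
    return frame["names"].tolist()


candidates = narrow_down(db, size="small", bundle_type="bundled")
print(candidates)  # ['cherry', 'raspberry']
```

Each extra trait shrinks the list further, which is the behavior the hard-coded tree was imitating by hand.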
The plant data isn't the important bit, however. The important bit is to get the machine to tell me which fruit I am describing by the traits I give it. I haven't decided how probability will work its way in quite yet, but one step at a time.
Note that Windows paths use backslashes, a real pain with Python. I found this excellent article about it if you're interested: https://pythonconquerstheuniverse.wordpress.com/2008/06/04/gotcha-%E2%80%94-backslashes-in-windows-filenames/
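As a rough illustration of the problem and two common ways around it, here is a short sketch. The folder is the one from my code comments and `fruit.xlsx` is the file name used in the code further down; the variable names are only for this example.

```python
import os
from pathlib import PureWindowsPath

# A raw string (r"...") keeps every backslash literal. Without the r prefix,
# "\U" in "C:\Users" is read as the start of a unicode escape and Python
# refuses to compile the line.
folder = r"C:\Users\sarah\Desktop\python_stuffs\panda"

# PureWindowsPath joins the pieces with backslashes, whatever OS runs the script.
excel_path = PureWindowsPath(folder) / "fruit.xlsx"
print(excel_path)

# os.path.join uses the separator of the machine the script is running on,
# so no backslash ever has to be typed by hand.
local_path = os.path.join(os.getcwd(), "fruit.xlsx")
print(local_path)
```

Either approach avoids hand-writing doubled backslashes like `"\\fruit.xlsx"`.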
Image: the Excel file.
Pandas neatly grabs this file and loads its data into a database for me, as seen in the first image.
The code:
--start code block 2--
# Create a pandas database (a DataFrame) that grabs my un-scientific plant data from an Excel file.
# Use this data in my Python file to find a fruit based on traits.
# my os path = C:\Users\sarah\Desktop\python_stuffs\panda\
# references:
# 1) https://www.dataquest.io/blog/excel-and-pandas/
# installs: pip install pandas xlrd (xlrd reads the Excel file for pandas;
#           recent pandas versions use openpyxl for .xlsx files instead)
# imports: pandas, os (to deal with backslashes and paths to files)
import os

import pandas as pd


class PandaFruit(object):

    def __init__(self):
        self.excel_file = "fruit.xlsx"
        # Filled in by load_file(). The original used a local variable here,
        # so the loaded data was thrown away when the method returned.
        self.db = None

    def load_file(self):
        # Build the path from the current working directory so no
        # backslashes have to be written by hand.
        winpath = os.getcwd()
        print(winpath)
        new_file = os.path.join(winpath, self.excel_file)
        print("\n newfile = ", new_file, "\n")
        self.db = pd.read_excel(new_file)
        print(self.db)
        return True


class TestPandaFruit(object):

    def __init__(self):
        self.class_to_test = PandaFruit()

    def test1(self):
        # test PandaFruit load_file method.
        # database should be created from excel file, and printed to terminal.
        test = self.class_to_test.load_file()
        assert test is True


new_test = TestPandaFruit()
new_test.test1()