Python, Excel, pandas: load an Excel file into a pandas database

Using Python, Pandas and Excel to create my database

I started writing a new set of Python classes for the machine learning experiment. With the original code I found that I was hard-coding every decision that was being made. While it worked fine, the point of getting the machine to do the work was lost.

So with the new version I set out to make it quicker, reusable, and driven by data I can modify. The decision tree needed something to work with besides the hard-coded solution. This is where pandas and Excel come in.

Below the Bad Decisions section is the new code. Feel free to scroll down if that's what you're here for.

Image: Pandas in action.
Compacted terminal print of the data columns, labeled: size, seed_type, [...], ornaments, colors, names. The rows are identified by number only, 1 - 8. The values in these columns will help the new decision tree narrow down which plant we are describing to a list of names that fit the given size, seed_type, bundle_type, ornaments, and colors. I have no background in plant biology or that scientific field; these are not real scientific classifications.






Bad Decisions:


Here's the old decision tree. It may not be easy to follow, but the point is that I already knew which choices had to be made to reach the right answer. By coding them in like this, the whole machine learning aspect is lost. It's also massive and just too bulky. If you want to copy this and play with it, no worries, but I am going to completely change how this class is set up for the next post on this experiment.

-- Start code block 1 --


class DecisionTree(object):
    def __init__(self, size):
        self.color = None
        self.size = size
        self.small_type = ['blueberry', 'raspberry', 'grape', 'blackberry', 'acai', 'strawberry', 'juniper', 'mulberry', 'cherry']
        self.small_growth_type = ['bundled', 'single']
        self.small_seed_type = ['outer', 'single core', 'multiple unpatterned']
        self.small_bundle_dict = {
            'red': ['grape', 'raspberry','mulberry', 'cherry', 'strawberry'],
            'green': ['grape'],
            'blue': ['blueberry', 'juniper'],
            'purple': ['acai', 'grape'],
            'black': ['blackberry', 'acai', 'grape']
            }
        self.medium_type = ['banana', 'apple', 'orange', 'pear', 'peach', 'plum', 'grapefruit', 'avacado', 'lime', 'kiwi', 'lemon']
        self.medium_ornaments = ['textured', 'smooth']
        self.medium_seed_type = ['single core', 'sectioned seed pods', 'multiple unpatterned', 'none']
        self.large_type = ['watermelon', 'papaya', 'pineapple', 'Jack fruit', 'musk-cantalope-honeydew', 'dragonfruit', 'mango']
        self.large_ornaments = ['striped', 'fruitlets-spikes', 'ridged', 'smooth']
        self.colors =  ['red', 'green', 'yellow', 'orange', 'purple', 'blue', 'black', 'white', 'grey']
        
    def info(self):
        message = """
        This class, DecisionTree, will classify the category of fruit, first by size:
        melons tend to be the largest fruit, > 8 inches.
        apples, oranges, kiwi, pear -- medium < 8 inches and > 2
        berries & grape types, small -- < 2 inch
        From size we can then choose the other qualifiers:
        for small types, bundles indicate a raspberry like growth pattern, or grapes
        single stem growth indicates fruits that develop from their own unshared stem
        strawberries, blueberries, juniper
        seed type can further narrow the fruit down, as cherries and acai have single seeds,
        strawberries have outer seeds, and the other types have multiple seeds.
        Then we can choose by color at that point
        For medium types,
        Ornaments :  what does the surface of this fruit display that is significant?
        Is it smooth like an apple or plum, or does it have a bumpy, or hairy texture like peaches or lemons?
        seed type:  Does it have a single core seed, multiple seeds, or sectioned seeds.
        Single seed:  Avocado
        sectioned seeds: citrus and apples
        multiple, all others
        Large fruit types,
        Ornaments will tell us a lot about a large fruit in this list
        Stripe and smooth are traits of a watermelon, fruitlets or spikes can lead us to pineapple or jackfruit
        ridged for musk, cantaloupe, honeydew type melons, smooth for papaya, and mango.
        Not smooth can be a dragonfruit, pineapple, melons (not watermelon)
        """
        # The message was built but never handed back to the caller.
        return message

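    def size_tree(self):
        # Thresholds follow the info() text: small is under 2 inches,
        # medium is 2 up to 8 inches, large is 8 inches and over.
        size = int(self.size)
        if 0 < size < 2:
            return 'small'
        elif 2 <= size < 8:
            return 'medium'
        elif size >= 8:
            return 'large'
        else:
            print("Size not found = ", size)
            return None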

    def get_size(self, db):
        size = self.size_tree()
        db.size = size
        print("\nyour size classification is:", size)
        return size

    def get_seed_type(self):
        print("\nTo narrow down the choices, what seed type does this fruit have?")
        seed_type = input("(S)ingle core, (M)ultiple inner seeds, (O)uter seeds, (C)ompartmentalized seeds, (N)o seeds>>")
        if seed_type in ['S', 's', 'single', 'Single']:
            # database string: 'core'
            return 'S'
        elif seed_type in ['M', 'm', 'Multiple', 'multiple']:
            # database string: 'multiple'
            return 'M'
        elif seed_type in ['O', 'o', 'Outer', 'outer']:
            # database string: 'outer'
            return 'O'
        elif seed_type in ['C', 'c', 'Compartmentalized', 'compartmentalized']:
            # not used in small:   database name: 'compartment'
            return 'C'
        elif seed_type in ['N', 'n', 'No', 'no', 'None', 'none']:
            # not used in small:   database name: 'seedless'
            return 'N'
        else:
            print("seed type could not be found: ", seed_type)
            return None

    def get_fruit_color_bundle(self):
        # red is the one group with lots of berries.
        # narrow it down with seed type.
        print("\nby the most similar, which color best describes this fruit?")
        color = input("(R)ed , (B)lue, (Y)ellow, (P)urple, (Br)own, (O)range, (G)reen :  ")
        blues = ['B', 'b', 'blue', 'Blue']
        reds = ['R', 'r', 'red', 'Red']
        yellows = ['Y', 'y', 'Yellow', 'yellow']
        purples = ['P', 'p', 'Purple', 'purple']
        browns = ['Br', 'br', 'BR', 'Brown', 'brown']
        oranges = ['O', 'o', 'Orange', 'orange']
        greens = ['G', 'g', 'Green', 'green']
        list_names = []
        color_of = 'None'
        if color in reds:
            seed = self.get_seed_type()
            color_of = 'red'
            if seed is None:
                print("this seed type is not currently found in our red fruits. So sorry.")
            elif seed == 'S':
                list_names.append('cherry')
            elif seed == 'O':
                list_names.append('strawberry')
            elif seed == 'M':
                list_names.append('raspberry')
                list_names.append('grape')
                list_names.append('mulberry')
                
        elif color in greens:
            color_of = 'green'
            list_names.append('grape')
        elif color in purples:
            print('purple berry')
            color_of = 'purple'
            # this would include blackberries, it's closest to purple
            list_names.append('blackberry')
            list_names.append('acai')
            list_names.append('juniper')
           
        elif color in blues:
            color_of = 'blue'
            list_names.append('blueberry')
        else:
            color_of = 'undefined'
            # Report what the user typed, not the placeholder assigned above.
            print("\nthis color small fruit bundled not found it may be a grape:", color)
            list_names.append('unknown')
        print("returning: ", list_names, color_of)
        items = [list_names, color_of]
        print(items)
        return items

    def get_fruit_color_single(self):
        # red is the a color group with lots of berries.
        # narrow it down with seed type.
        print("\nby the most similar, which color best describes this fruit?")
        color = input("(R)ed , (B)lue, (Y)ellow, (P)urple, (Br)own, (O)range, (G)reen :  ")
        blues = ['B', 'b', 'blue', 'Blue']
        reds = ['R', 'r', 'red', 'Red']
        yellows = ['Y', 'y', 'Yellow', 'yellow']
        purples = ['P', 'p', 'Purple', 'purple']
        browns = ['Br', 'br', 'BR', 'Brown', 'brown']
        oranges = ['O', 'o', 'Orange', 'orange']
        greens = ['G', 'g', 'Green', 'green']
        color_of = 'Undefined'
        names_list = []
        if color in reds:
            seed = self.get_seed_type()
            color_of = 'red'
            if seed is None:
                print("this seed type is not currently found in our red fruits. So sorry.")
            elif seed == 'S':
                names_list.append('cherry')
            elif seed == 'O':
                names_list.append('strawberry')
            elif seed == 'M':
                # append() takes exactly one item; extend() adds several at once.
                names_list.extend(['raspberry', 'grape', 'mulberry'])
                
        elif color in greens:
            color_of = 'green'
            names_list.append('grape')
            
        elif color in purples:
            color_of = 'purple/black'
            # append() takes exactly one item; extend() adds several at once.
            names_list.extend(['blackberry', 'acai', 'grape'])
            # this would include blackberries, it's closest to purple
           
        elif color in blues:
            color_of = 'blue'
            names_list.append('blueberry')
            
        else:
            print("\nthis color small fruit single stem not found it may be a wild unidentified berry:", color)
        items = [names_list, color_of]
        return items
        

    def growth_tree_small(self, test=False):
        S_strings = ['single', 'Single', 'S']
        B_strings = ['bundled', 'bundle', 'Bundle', 'Bundled', 'B']
        message1 = "\ngrowth type choices: bundled or single. B= bundled, S= single"
        message2 = "bundled means from a stem, multiple flowers form on smaller stems to become a fruit."
        message3 = "single means from a stem, a single flower is responsible for a single fruit."
        print(message1); print(message2); print(message3)
        growth = input("Is this fruit (B)undled or (S)ingle?")
        
        if growth in B_strings:
            new_small_fruits = ['raspberry', 'grape', 'blackberry', 'acai', 'cherry', 'mulberry']
            if test:
                print(new_small_fruits)
            return self.get_fruit_color_bundle()
        elif growth in S_strings:
            new_small_fruits = ['strawberry', 'blueberry', 'juniper']
            if test:
                print(new_small_fruits)
            return self.get_fruit_color_single()
        else:
            print("growth pattern not found: ", growth)
            return [['None'], 'None']

    def seed_tree_medium(self, seed):
        message1 = "seed types are : "
        new_med_fruits = self.medium_type

    def seed_tree_large(self, seed):
        new_large_fruits = self.large_type

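    def from_size(self):
        # size_tree() takes no arguments, so the old test=True call raised a TypeError.
        size_class = self.size_tree()
        if size_class == 'small':
            # Hand the [names, color] result back instead of dropping it.
            return self.growth_tree_small()
        elif size_class == 'medium':
            print("size is not small, it is = ", size_class)
        else:
            pass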
        


-- End code block 1 --
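Part of the bulk above comes from the alias lists (reds, blues, and so on) repeated in both color methods. As a rough sketch of one way to trim that, a single dictionary could map every accepted answer to one color name. The names COLOR_ALIASES and normalize_color are hypothetical; they are not part of the class above.

```python
# Hypothetical helper, not part of the original DecisionTree class:
# one dictionary stands in for the alias lists repeated in both color methods.
COLOR_ALIASES = {
    "r": "red", "red": "red",
    "b": "blue", "blue": "blue",
    "y": "yellow", "yellow": "yellow",
    "p": "purple", "purple": "purple",
    "br": "brown", "brown": "brown",
    "o": "orange", "orange": "orange",
    "g": "green", "green": "green",
}


def normalize_color(answer):
    """Return the canonical color name, or None if the answer is not recognized."""
    # Lower-casing and stripping means 'R', 'r', ' Red ' all hit the same key.
    return COLOR_ALIASES.get(answer.strip().lower())


print(normalize_color("R"))
print(normalize_color("Brown"))
print(normalize_color("teal"))
```

With something like this, each color method would only need one lookup before branching on the color name.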


Get that Excel Data!

Now the code for python, pandas and excel to work together.
The idea is that, on the next try, I can ask the database to eliminate choices based on the traits I tell it a fruit has. Size would be the first; from there we ask for the seed_type, then, for small fruits, whether they grow in bundles like you see in raspberries or grapes. These traits are important in identifying these small morsels.

The plant data isn't the important bit, however. The important bit is to get the machine to tell me which fruit I am describing by the traits I give it. I haven't decided how probability will work its way in quite yet, but one step at a time.
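The elimination idea described above might look something like this in pandas. This is only a minimal sketch: the rows are made up for illustration (they are not the contents of fruit.xlsx), the column names are taken from the first image, and narrow_down is a hypothetical helper name.

```python
import pandas as pd

# Made-up rows for illustration only; the real data lives in fruit.xlsx.
fruit_df = pd.DataFrame(
    {
        "size": ["small", "small", "small", "medium"],
        "seed_type": ["core", "outer", "multiple", "compartment"],
        "bundle_type": ["bundled", "single", "bundled", "single"],
        "colors": ["red", "red", "red", "orange"],
        "names": ["cherry", "strawberry", "raspberry", "orange"],
    }
)


def narrow_down(df, **traits):
    """Keep only the rows that match every trait given, then list their names."""
    remaining = df
    for column, value in traits.items():
        # Each trait removes the rows that do not match it.
        remaining = remaining[remaining[column] == value]
    return remaining["names"].tolist()


candidates = narrow_down(fruit_df, size="small", colors="red")
print(candidates)

single_match = narrow_down(fruit_df, size="small", seed_type="outer")
print(single_match)
```

Each extra trait shrinks the list, so the questions the old class asked one by one become plain column filters.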

Note that Windows paths use backslashes, a real pain with Python. I found this excellent article about it if you're interested: https://pythonconquerstheuniverse.wordpress.com/2008/06/04/gotcha-%E2%80%94-backslashes-in-windows-filenames/
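For anyone who doesn't want to read the whole article, here is a small sketch of the usual ways around the backslash problem. The folder and file names in raw_path are invented for the example.

```python
import os
from pathlib import Path

# A raw string keeps backslashes literal, so the "\n" and "\t" in this
# made-up path are not turned into a newline and a tab.
raw_path = r"C:\new_folder\table.xlsx"

# os.path.join picks the right separator for whatever system this runs on.
joined = os.path.join(os.getcwd(), "fruit.xlsx")

# pathlib does the same job with the / operator.
as_path = Path.cwd() / "fruit.xlsx"

print(raw_path)
print(joined)
print(as_path)
```

Either of the last two saves typing any backslashes by hand.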

Image: the Excel file.
Pandas neatly grabs this file and loads its data into a database for me, as seen in the first image.

The code:



-- Start code block 2 --



# Create a pandas database (a DataFrame) from my un-scientific plant data in an Excel file
# Use this data in my python file to find a fruit based on traits
# my os path= C:\Users\sarah\Desktop\python_stuffs\panda\
#  references:
#   1) https://www.dataquest.io/blog/excel-and-pandas/

# installs:  pip install -->  pandas, xlrd (for reading the excel file to pandas)
#            note: recent pandas versions read .xlsx files with openpyxl instead of xlrd
# imports: pandas,  os (to build file paths without fighting backslashes)

import pandas as pd
import os


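class PandaFruit(object):
    def __init__(self):
        self.excel_file = "fruit.xlsx"
        # Holds the DataFrame once load_file() has run.
        self.db = None

    def load_file(self):
        # os.path.join adds the right separator, so no hand-written backslashes.
        new_file = os.path.join(os.getcwd(), self.excel_file)
        print("\n newfile = ", new_file, "\n")

        # Keep the DataFrame on the instance so other methods can use it later.
        self.db = pd.read_excel(new_file)
        print(self.db)
        return True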


class TestPandaFruit(object):
    def __init__(self):
        self.class_to_test = PandaFruit()

    def test1(self):
        # test PandaFruit load_file method.  
        # database should be created from excel file, and printed to terminal.
        test = self.class_to_test.load_file()
        assert test is True
        

new_test = TestPandaFruit()
new_test.test1()







-- End code block 2 --





