playing with dictionaries python


list => key look up

Update: 9-2-2021  Took the extra newlines out of the code block, changed title, might tinker a bit.

Update: 9-10-19 Putting the python into a code block for easier viewing.  I was doing this experiment while working with LearnMorePythonTheHardWay by Zed Shaw.  The book really pushed me to dissect things and figure them out and will be forever grateful for it. I highly recommend the learn python the hard way books to anyone seeking to learn python or even code for the first time.

Update: 1-4-18  A little thing.  Added a node ident... and a _length()  not much. *added a get_ident too...* Thinking about making this doubly linked.  Later it would be more flexible.  Still not so sure how I feel about if manipulating data makes it prone to corruption. Someone told me it doesn't... but it just seems like anything in this world would be prone to damage/corruption the more it gets tinkered with... It's just the nature of the universe.    Anyways...   Now our nodes will have a unique identity regardless of the stuff we put in them.   Hello Poland.  Not sure why you like looking at this one, but Cheers!!!

Update: 12-30-17     I did a little tinkering with it, added a couple methods.  It will now look up names and make a list of values that match a name given and return it.... or look up a value and return a list of matching names...  I did a couple runs with it, and those will be at the bottom. If any of the lookers wanted to try it out. It uses the Singly Linked List to store the results it returns.  That will be the list it returns.


*NOTE*
Way back in 2017, I wrote this bit below.  I was tackling computer science, and I didn't even know it at the time.  These were my thoughts, and my conjecture trying to figure out what was going on in a project inside the LearnMorePython the Hard Way book.

What is hash() doing??
hash() ....  It's doing this...  I have a huge bundle of places to put multiple different things.  The fastest way to find individual things,  if say it's a 256 bucket data structure, would be if the things were evenly distributed.   So if I have 20 different things that start with 'new'... newyork, newjersey, newhouse, newark, newton...  we want them to spread out evenly.  Because hash does all the work of transforming and remembering an integer string, (random but not really- set seeded algorithm decides what the integer will be based on the bytes of the string involved) We can get greatly diverse number identifications for each of these 'new----' strings.

One might be -72934928349  and another 387389999,  but hash will also be able to retrieve the same string with those codes,  if 387389999 is newpants,  it'll be 387389999 every time you use hash in that run of the 'script'. 
 
Likewise, when we go to put them in buckets, they are likely to go into different buckets.  Modulo, %256 is going to get a number between 0-255  on either of those, and they will likely be different. So one might go in bucket 49,  the other bucket 244.



Why I wrote this bit of code in 2017

I explored making a dictionary on my own... because I was just not understanding why in the world we needed to do all this crazy technical stuff.  If I want to store a bunch of keys and values,  why do I need these hashes and these Doubly linked lists inside doubly-linked lists?  I mean, I know he (Zed Shaw) is an expert and has decades of experience so there is most definitely a reason.  But on the other hand, I'm sitting here like....  why not just tinker with it.  Learning experience.... and maybe I can see what's going on?   And It'll keep me from feeling so overwhelmed by it all.

So here's a simple little one so far.

I'm going to tinker as I go.  It doesn't do much yet.  But hell... Not sure what the real purpose of the data structure is yet.  Apparently, there is a much larger purpose I'm not aware of.  The Wikipedia page for hash tables has this plethora of information on making these keys have unique values so that when stuff is looked up in the table there aren't conflicting look-ups. For storage of say 20,000 names, of which say, there are 200 John Smiths, and birthdays of these John Smiths there are 10 people born on December 10th, then how do we know if we have the correct record of the John Smith we seek?

That's why they make SSNs and that's why they don't let people create usernames that already exist.  I guess I just don't know. If it has a unique hash when you type in John Smith and there are 200 you still don't know which one you're getting.... OH GEESUM CROW.... yeah.  That's why I do these diversions/experiments.    Why can't people explain it in a way I understand?


Anyways.  My distraction for the day.  A little dictionary of my own.


--- Start code block ---


class DicaNode(object):



    def __init__(self, name, value, nxt):
        self.ident = None
        self.name = name
        self.value = value
        self.next = nxt



    def __repr__(self):
        nval = self.next and self.next.name or None     
        return f"[{self.name} : {self.value}], [{nval}]"



class Dictionary(object):

    def __init__(self):
        self.start = None
        self.end = None

    def push(self, node):
        end = self.end
        begin = self.start
        if begin == None:
            node.ident = 1
            self.start = node
            self.end = node         

        elif begin == end:
            node.ident = 2
            self.start.next = node
            self.end.next = node
            self.end = node

         

        else:

          # I am going to change this to node.ident = self.end.ident + 1
          # if nodes are removed there will be a possible duplicate ident
          # this way.  But if we add 1 to the end's ident... the new one will always
          # be a new integer.

            node.ident = self._length() + 1
            self.end.next = node
            self.end = node

         

    def _length(self):

          begin = self.start  
          count = 0
          while begin:
             count +=1
             begin = begin.next
          return count

       

    def add(self, key, value):
        node = DicaNode(key, value, None)
        self.push(node)

    ### so if we have 30 of the same name... they will conflict... 
    ### hash gives them unique address's.
    ### harry entered with value 30, is not the same as harry with value 67.....
    ### each node should have a unique id?

    def count_names(self, name):
        # Give the name a unique ID, why not just keep the node as a unique node?
        # I think wikipedia said something about data memory being used up 
        # if I have 20 joe smith's.... I have to have 20 index's for him and his values
        # 20 unique identifiers for each node.  
        # if I look up joe smith however I'm going to only
        # get one of the 20 different things

        begin = self.start
        end = self.end
        count = 0
        if begin == end:
            if self.begin.name == name:
                found = f" 1 - {name} found!"
                print(found)
                return name

            else:
                return None

        else:          

            while begin:
                if begin.name == name:
                    count += 1
                begin = begin.next



        found = f" {count} -- {name} found. "
        print(found)



    def show(self):
        ## print method
        node = self.start
        if node:
            while node:
                print(node)
                print(node.ident)
                node = node.next



############## New methods  12-30 #############


    def get_values_of_name(self, name):
        print("get_names")
        storage = SLList()
        begin = self.start
        end = self.end

     

        if begin == end:
            if self.begin.name == name:
                storage.push(begin.value)
                return storage

            else:
                return None

        else:
            print("else")
            while begin:
                if begin.name == name:
                   storage.push(begin.value)
                begin = begin.next

        return storage



    def get_names_of_values(self, value):

        print("get values")
        storage = SLList()
        begin = self.start
        end = self.end     

        if begin == end:
            if self.begin.value == value:
                storage.push(begin.name)
                return storage

            else:
                return None

        else:

            print("else")
            while begin:
                if begin.value == value:
                   storage.push(begin.name)
                begin = begin.next

        return storage



  def get_ident(self, nodeid):

        try:
            int(nodeid)
            begin = self.start

            while begin:
                if begin.ident == nodeid:
                    stats = f"[{begin.name} : {begin.value}] ({begin.ident})"
                    return stats
                else:
                    begin = begin.next



        except ValueError:
            print(" node id must be a valid positive whole number integer")







     

###TESTING :::

colors = Dictionary()

colors.add('blue', 'navy')

colors.add('orange', 'burnt')

colors.add('yellow', 'sunshine')

colors.add('blue', 'dark ocean')

colors.show()

colors.count_names('blue')



###  new method tests ::::

blues = colors.get_values_of_name('blue')

blues.dump()



dates = Dictionary()

dates.add('harry', 'dec12')

dates.add('beth', 'dec23')

dates.add('tim', 'dec14')

dates.add('timN', 'dec12')

dates.add('sarah', 'aug7')

birthdays = dates.get_names_of_values('dec12')

birthdays.dump()



### test idents

print(dates.get_ident(2))

print(dates.get_ident('dec12'))







--- End code block ---




Comments

Popular posts from this blog

JavaScript Ascii animation with while loops and console.log

playing with trigonometry sin in pygame

playing with color in powershell python