Posts Tagged ‘ programming ’

Choosing data structures

I decided earlier in the week that I wanted to write an implementation of a general Markov chain in C; this is not particularly hard, but it does raise some interesting questions along the way—in particular, about data structures. A Markov chain really consists of two nested data structures: the set of the nodes in the list, and the set of outbound edges from each node; these structures serve vastly different purposes, as the order in which the nodes in the Markov chain are accessed depends on how edges are selected, but edges need to be sorted in a standard order so as to match how a Markov process functions. Accordingly, I found myself basing my decision on a few criteria:

Markov nodes:

  • Order of access not absolutely required to be stable
  • Key most likely in a totally ordered set
  • Minimizing access time a handy thought

Edge nodes:

  • Order of access paramount. Needs to remain stable.
  • Edges absolutely in a totally ordered set
  • Access time less important than order.

Based on these criteria, I arrived at two vastly different structures. For the Markov nodes, that “minimizing access time” led directly to thinking about trees, and thinking about how motion through a Markov chain happens led to considering a splay tree. A splay tree, if you’re unaware, is a binary search tree which has only one operation: splay. When you splay the tree, you modify its structure, bringing the node you are looking for (or the closest match) to the root of the tree. This has the handy property of keeping all the most frequently used nodes right in the top few levels of the tree, and still maintains an amortized O(log(n))—even with the slight probability that repeated splay operations will lead to it becoming a linked list.

To the contrary, the data structure I selected for the edges is a linked list, sorted from largest key to smallest key; in this case, I required that the structure remain stable, and furthermore that it have a particular order, with every node in a known position with regards to its value. Although search is an O(n) operation, it is doubtful a graph will have so many connections as to make that prohibitive, while at the same time maintaining a safe, regular, easy to traverse structure when running find_next_state().

By the by, the current code can be found in my Bitbucket repository, in markov.c and markov.h, respectively. They’re currently under rapid development and are nowhere near release-ready; I’m expecting the API to change significantly over the next few days.

Branching out into new areas

I’ve started learning how to program in Java; this is a New Thing for me. Well, not entirely new. I’ve got a deep knowledge of how the system works together, thanks to debugging my Linux installation over the years, and I know how to format source code thanks to LaTeX. Even simply arranging my thoughts in such a super-logical manner isn’t entirely new to me; I spent a fairly fruitful semester of junior high learning how to program my calculator. The greatest change is really just learning the language.

It’s been a while since I worked on learning a new computer language — LaTeX has been in my repertoire for a long time now, long enough that I can essentially parse it into its final structure as I’m writing it. At this point, what I really need is just a condensed version of the course I’m in; this is how this works, this is where that fits. I have all the pieces, I just need them to fit together. Unfortunately, I’m in the class for those who have no basis in CS at all … but it gives me a nice, light class to take. I’ll just breeze through it.