Posts Tagged ‘ mathematics ’

Choosing data structures

I decided earlier in the week that I wanted to write an implementation of a general Markov chain in C; this is not particularly hard, but it does raise some interesting questions along the way—in particular, about data structures. A Markov chain really consists of two nested data structures: the set of the nodes in the list, and the set of outbound edges from each node; these structures serve vastly different purposes, as the order in which the nodes in the Markov chain are accessed depends on how edges are selected, but edges need to be sorted in a standard order so as to match how a Markov process functions. Accordingly, I found myself basing my decision on a few criteria:

Markov nodes:

  • Order of access not absolutely required to be stable
  • Key most likely in a totally ordered set
  • Minimizing access time a handy thought

Edge nodes:

  • Order of access paramount. Needs to remain stable.
  • Edges absolutely in a totally ordered set
  • Access time less important than order.

Based on these criteria, I arrived at two vastly different structures. For the Markov nodes, that “minimizing access time” led directly to thinking about trees, and thinking about how motion through a Markov chain happens led to considering a splay tree. A splay tree, if you’re unaware, is a binary search tree which has only one operation: splay. When you splay the tree, you modify its structure, bringing the node you are looking for (or the closest match) to the root of the tree. This has the handy property of keeping all the most frequently used nodes right in the top few levels of the tree, and still maintains an amortized O(log(n))—even with the slight probability that repeated splay operations will lead to it becoming a linked list.

To the contrary, the data structure I selected for the edges is a linked list, sorted from largest key to smallest key; in this case, I required that the structure remain stable, and furthermore that it have a particular order, with every node in a known position with regards to its value. Although search is an O(n) operation, it is doubtful a graph will have so many connections as to make that prohibitive, while at the same time maintaining a safe, regular, easy to traverse structure when running find_next_state().

By the by, the current code can be found in my Bitbucket repository, in markov.c and markov.h, respectively. They’re currently under rapid development and are nowhere near release-ready; I’m expecting the API to change significantly over the next few days.

It’s all a puzzle.

Mathematics and I have a curious relationship.

I’m a very audiovisual-oriented thinker. If I can truly visualise a problem, I can solve it. This is where physics and I have problems, because there are too many specialised methods, and visualisation can break down fairly easily. On the other hand, equations arrange themselves in my mind, graphs plot in my head, and all of this just fits.

That’s not to say math is easy. Quite the contrary; as I’m writing this in my notebook, I’m avoiding working on my homework, so I can develop some peace of mind before taking up the project again. No, what happens is exhaustion — mental exhaustion and unfamiliar concepts forming a vile brew that corrupts everything I’m working on.

So I go away, leave it behind, do something else, calm down, break the pieces up and set them aside in my mind, where they fade out and leave only a few pure concepts to be explored once I have some time to really look and see what I need. My physics professor last semester said it well: « Go home, drink a beer, relax for a while, and then try it again. »

Now, he was talking about the midterm exam, which I was going to do horribly on because for whatever reason, physics is no longer the area where I have strong analytic skills. (Just as an aside, I do miss physics, but I think Mathematics is a more sure route to my interests, and my enjoyment.)

I look at this state of « blockedness » as a symptom of too much work; really, there is such a thing. After classes, the last thing I needed to do was start working immediately. Unfortunately, today, I did that, and I set back my homework by at least an hour, if not more.

Editorial post-script: I got back to the rez and promptly solved the two remaining problems in about an hour, tops. This is being typed and posted a few weeks later, because I’m a lazy bastard.