Saturday 10 November 2012

Happy Pi Day! (belated)

Pi Day is an international holiday celebrating the mathematical constant π. It is celebrated on March 14, i.e., 3/14 in month/day notation. It is typically celebrated by telling everyone you know, "Hey, it's Pi Day!" More enterprising people bake lots of pies, take pictures of them, and then post the pictures on the Internet.

After I moved to the UK, where the date would be written 14/3, that choice for Pi Day seemed wrong, reeking of American cultural hegemony. There had to be a better way.

So I came up with one. Why not celebrate Pi Day on the 314th day of the year? In most years, this is November 10. This should be easy for everyone to remember, because it is the day after my birthday. In leap years, like 2012, Pi Day occurs a day earlier, on November 9.
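For anyone who would rather check the calendar arithmetic than trust me, here is a minimal Python sketch. The function name `alternative_pi_day` is made up for illustration; it simply counts 313 days forward from January 1.

```python
from datetime import date, timedelta


def alternative_pi_day(year):
    """Return the date of the 314th day of the given year."""
    # January 1 is day 1, so day 314 is 313 days later.
    return date(year, 1, 1) + timedelta(days=313)


for year in (2012, 2013):
    print(year, alternative_pi_day(year).strftime("%d %B"))
```

The leap day in February is what pushes the date back by one in years like 2012.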

Happy belated Pi Day!

(Those amused by the juxtaposition of this post with the preceding one on this blog are welcome to their amusement.)

Wednesday 3 October 2012

A Very British Thanksgiving

A humble note to the great British nation:

It has become cliche to say that the UK and the US enjoy a special relationship. Despite the obvious differences in language, size of automobiles, average waist circumference, availability of socialized medicine, and so on, it is undeniable that the two cultures are more similar than they are different. But these two sister countries are still divided by a cultural chasm, one that prevents us from truly having common ground, a chasm deeper than politics, football, or religion. This chasm is nothing less than the holiday of Thanksgiving.

Although you will no doubt be familiar with the holiday from your exposure to American books, films, and television, it is impossible to truly understand the spirit of Thanksgiving without having experienced it. The time has come for a new social movement to celebrate the goals of peace, cultural understanding, and consuming a seven kilogram turkey in one sitting. The time has come for the British people to finally adopt Thanksgiving as a national holiday.

Thanksgiving is an exceedingly simple holiday, celebrated on the fourth Thursday in November. Here's what you do: On Thursday, you prepare a large roast and serve it to your family. On Friday, you skive off work. That's it. First you eat a big roast, then you skive off work. I ask you, can you imagine a holiday more intrinsically suited to British culture? Frankly, I'm disappointed that you people didn't think of it first.

Now, as you become more experienced at celebrating Thanksgiving, there are many ways in which you can make the celebration more elaborate, if you prefer a more authentically American experience. For example, for Thanksgiving dinner, it is common to invite members of one's extended family, some of whom travel long distances to attend. This tradition will no doubt yield the same hilarious results in the UK as it does in the US.

Or you might like to try your hand at the American custom of "Black Friday". On the Friday after Thanksgiving---the skive day, remember---some people like to spend the day shopping, for their Christmas presents, ostensibly. The shops accommodate this by opening their doors early in the morning and advertising a range of one-day-only special offers, for which people---yes! Americans!---queue as early as 4 or 5 o'clock in the morning. Now, as much as you may welcome the opportunity of queueing, I do not recommend that you attempt to follow this tradition literally in the UK, as at 5am Friday on a British high street, you are likely to find yourself lonely. And, most probably, wet. Instead, I recommend that you wake up at a more reasonable hour, like noon, and do your shopping then.

Whichever of these more advanced Thanksgiving traditions you choose to adopt, it is important to emphasize that whilst all of these traditions can be a fun way to add spice to the holiday, none of them are essential to the true spirit of Thanksgiving. Anyone to whom these enhanced traditions appear onerous should simply content themselves with the roast, the long lie-in, and the knowledge that they are doing their part to increase cultural understanding across the Atlantic.

This year, why not start the festive season off right, with a very British Thanksgiving? Remember: Thursday, you make a roast. Friday, you skive. That's all there is to it.

In fact, why don't you try it out this Thursday? You know, for practice.

Notes

1 Thanksgiving would more properly be termed a North American holiday, as it is celebrated in Canada as well. However, I understand from my reading of Wikipedia that in Canada Thanksgiving is celebrated on a Monday rather than a Thursday. This makes no sense at all. For this reason I have made the decision to ignore the existence of Canadian Thanksgiving for the purposes of this essay.

Tuesday 2 October 2012

Note to Self

Probably best not to attempt using the phrase "posterior analysis" as a term of art. Fortunately I caught this before attempting to send it to anyone else...

Friday 24 August 2012

Spot the Scot

This is the final weekend of the Edinburgh Fringe Festival, an enormous and insane annual event which draws around half a million people to a city of around half a million people. Walking round the city this week, I thought of a game to pass the time when stuck in a Festival crowd.

The game is called "Spot the Scot". To play, start by walking down the streets of Edinburgh. Then, pick a group of people coming toward you on the street, not too distant, but far enough that you can't hear them. Give them a good look over, and guess whether they are actually Scottish or not. As they pass you, eavesdrop to find out if you were right.

I have found this game thoroughly enjoyable, and I highly recommend it. Feel free to post strategies or high scores in the comments.

Saturday 18 August 2012

Lies and Taxes

No matter one's political persuasion, it is hard not to think, as Willard Foxton argues in an interesting essay, that the income tax code in the UK (and in the US too, for that matter) is too complex. In a more cynical mood I would be tempted to say that the tax law is so complex because complex tax laws benefit the rich, and the rich make the laws.

In the UK there have been several scandals over tax avoidance, perhaps most notably one of the two biggest Scottish football teams blowing up due to an offshore tax evasion scheme. At first I was unable to understand from the news reports why other football teams, and their fans, seemed so rabidly angry at the Rangers. But of course: football does not have a salary cap, so if a team unfairly spends less money on tax, it can spend more money on players. By cheating at their taxes, the Rangers were also cheating at football.

Taxes are political footballs as well, especially in the US. In the US there is an additional crazy phenomenon that creating a new program makes you an irresponsible tax-and-spend liberal who is taking money out of the pockets of working families, while cutting taxes makes you a deficit hawk. (I mean, uh, not to get too overtly political or anything?) Therefore if you are an American politician---of either party---and you want to create a new program for a noble goal, e.g., to pay for college scholarships for middle class families, why not make it a tax credit? That way, you get the noble program, and you can say that you're cutting taxes too!

Thursday 16 August 2012

Principles of Research Code

Ali Eslami has just written a terrific page on organizing your experimental code and output. I pretty much agree with everything he says. I've thought quite a bit about this and would like to add some background.
Programming for research is very different from programming for industry. There are several reasons for this, which I will call Principles of Research Code. These principles underlie all of the advice in Ali's post and in this post. These principles are:
  1. As a researcher, your product is not code. Your product is knowledge. Most of your research code you will completely forget once your paper is done.
  2. Unless you hit it big. If your paper takes off, and lots of people read it, then people will start asking you for a copy of your code. You should give it to them, and best to be prepared for this in advance.
  3. You need to be able to trust your results. You want to do enough testing that you do not, e.g., find a bug in your baselines after you publish. A small amount of paranoia comes in handy.
  4. You need a custom set of tools. Do not be afraid to write infrastructure and scripts to help you run new experiments quickly. But don't go overboard with this.
  5. Reproducibility. Ideally, your system should be set up so that five years from now, when someone asks you about Figure 3, you can immediately find the command line, experimental parameters, and code that you used to generate it.
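As one way of meeting Principle 5, here is a minimal Python sketch that writes the command line, parameters, and code version next to the results. Everything in it is an illustrative assumption rather than a description of my actual setup: the function name `record_run`, the file name `run_info.json`, and the idea of passing in the version control changeset id as a plain string.

```python
import json
import sys
import time
from pathlib import Path


def record_run(output_dir, params, code_version):
    """Save what is needed to regenerate the results in output_dir.

    code_version is assumed to be whatever identifier your version
    control system gives the committed code (for example a changeset id).
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "command_line": sys.argv,  # how the script was invoked
        "params": params,          # experimental parameters
        "code_version": code_version,
        "timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
    }
    path = output_dir / "run_info.json"
    # Plain text (JSON) so the record can be checked by eye later.
    with open(path, "w") as f:
        json.dump(record, f, indent=2, sort_keys=True)
    return path
```

Calling something like this at the start of every experiment script means the record exists even if the run later crashes.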
Principle 1 implies that the primary thing that you need to optimise for in research code is your own time. You want to generate as much knowledge as possible as quickly as possible. Sometimes being able to write fast code gives you a competitive advantage in research, because you can run on larger problems. But don't spend time optimising unless you're in a situation like this.
Also, I have some more practical suggestions to augment what Ali has said. These are:
  1. Version control: Ali doesn't mention this, probably because it is second nature to him, but you need to keep all of your experimental code under version control. To not do this is courting disaster. Good version control systems include SVN, git, and Mercurial. I now use Mercurial, but it doesn't really matter what you use. Always commit all of your code before you run an experiment. This way you can reproduce your experimental results by checking out the version of your code from the time that you ran an experiment.
  2. Random seeds: Definitely take Ali's advice to take the random seed as a parameter to your methods. Usually what I do is pick a large number of random seeds, save them to disk, and use them over and over again. Otherwise debugging is a nightmare.
  3. Parallel option sweeps: It takes some effort to get set up on a cluster like ECDF, but if you invest that effort, you get some nice benefits like the ability to run a parameter sweep in parallel.
  4. Directory trees: It is good to have your working directory in a different part of the directory space from your code, because then you don't get annoying messages from your version control system asking you why you haven't committed your experimental results. So I end up with a directory structure like
    ~/hg/projects/loopy_crf/code/synth_experiment.py
    ~/results/loopy_crf/synth_experiment/dimensions_20_iterations_300
    
    Notice how I match the directory names to help me remember what script generated the results.
  5. Figures list. The day after I submit a paper, I add enough information to my notebook to meet Principle 5. That is, for every figure in the paper, I make a note of which output directory and which data file contains the results that made that figure. Then for those output directories, I make sure to have a note of which script and options generated those results.
  6. Data preprocessing. Lots of times we have some complicated steps to do data cleaning, feature extraction, etc. It's good to save these intermediate results to disk. It's also good to use a text format rather than binary, so that you can do a quick visual check for problems. One tip that I use to make sure I keep track of what data cleaning I do is to use Makefiles to run the data cleaning step. I have a different Makefile target for each intermediate result, which gives me instant documentation.
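Suggestions 2 and 4 above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: the function names, the JSON seed file, and the default of 20 seeds are all hypothetical, and `run_experiment` is a stand-in for a real method.

```python
import json
import random
from pathlib import Path


def load_or_create_seeds(path, count=20):
    """Load the saved seeds, or pick new ones once and save them to disk."""
    path = Path(path)
    if path.exists():
        return json.loads(path.read_text())
    seeds = [random.SystemRandom().randrange(2**32) for _ in range(count)]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(seeds))  # text format, easy to inspect
    return seeds


def results_dir(root, project, script, **params):
    """Build an output directory named after the script and its options."""
    name = "_".join(f"{key}_{value}" for key, value in params.items())
    return Path(root) / project / script / name


def run_experiment(seed):
    """Stand-in for a real method: the seed is an explicit parameter."""
    rng = random.Random(seed)
    return rng.random()
```

For example, `results_dir("~/results", "loopy_crf", "synth_experiment", dimensions=20, iterations=300)` gives a path matching the layout shown in suggestion 4, and rerunning `run_experiment` with a saved seed repeats the same random draws, which is what makes debugging bearable.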
If you want to read even more about this, I gave a guest lecture last year on a similar topic (slides, podcast).

Sunday 12 August 2012

Software I Like

I've just made an update to my list of software I like, motivated by my experiences setting up a new computer.

Saturday 11 August 2012

Reflecting on a New Computer

Presently I am still enjoying the honeymoon phase of my new laptop. To avoid the slightest appearance of ostentation, I will refrain from going into details of exactly what laptop I got, except to say that it is of course a Mac, and it's REALLY REALLY cool!

Apple provides a Migration Assistant that apparently will copy all of your files and settings from your old computer, so that your new Mac looks exactly like your old one. My feeling about this is: Why would anyone want that? For me, one of the pleasures of a new computer is that it's *clean*, unburdened with hundreds of files scattered around my home directory that I never use but are too important (or too numerous) to simply delete.

So for years, whenever I have got a new computer, I have never copied my files over en masse. Instead, I copy over a small set of files that I know I need, and leave the rest on a backup. Then the next day, I find that I need a file from the backup that I hadn't realized I needed, so I go back and copy it over, and so on.

This process stabilizes after a week or so, and my electronic life feels much less cluttered.

I suppose that I could just blow away my home directory every year for the same feeling, but somehow it is hard to convince myself to do this.

I wish that I could use the same process for physical papers, but sadly paper information cannot be stored as compactly as its electronic equivalent.

Super-Mac-Geek-Alert: For several years, I have been using Keychain to store secure notes such as password hints for bank logins, etc. I thought I was very clever to avoid impressive but costly tools like 1Password. Then I tried to copy the Keychain to my new computer. Painful. I think from now on I'll keep these notes on a small encrypted disk image.

Friday 3 August 2012

Converting Fahrenheit into Celsius... The Smart Way

or, from small beginnings...

One of the minor challenges of moving from the US to the UK is temperature. People in the UK always discuss the weather, and when they do, they use Celsius. My brain still works in Fahrenheit, so I need to convert typical daily outdoor temperatures in my head, and quickly enough that I can carry on a conversation.

You probably learned a formula in school for doing this. Completely useless. Forget all about it—but you already have, haven't you? You might remember, if you're clever, that the formula involves 9, 5, and 32 in some combination. But is it 9/5 or 5/9? Do you add 32, or do you subtract it? And do you do that before or after you multiply? And now people are wondering why you've been staring at them for two minutes when all they asked is how hot it was when you were in Seattle last week.

The problem is that the equation to convert C to F is too similar to the equation for the reverse, and both equations are too difficult to compute mentally. What we need is a simpler equation, that is easy to remember, and easy to work out quickly in your head.

So here's the trick. You memorise the following correspondences:

0 °C = 32 °F
10 °C = 50 °F
20 °C = 68 °F
30 °C = 86 °F

Then, to convert any temperature that is near these, approximate each extra 1 °C as 2 °F. This will allow you to convert almost any naturally occurring outdoor temperature in the UK in either direction to within 1° accuracy.

Let's try it. As I write, the current temperature in Edinburgh is 14 °C. This is 10 °C plus 4 °C extra. From memory, convert the 10 °C to 50 °F. Then convert the 4 °C extra to 8 °F extra and add it back on. This gives you 14 °C = 58 °F. This is not exact, but close enough that you know to wear a jumper. The exact formula is

  14 * 9 / 5 + 32 = 57.2 °F
Good luck doing that in your head.
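If you would rather let a computer check the trick, here is a minimal Python sketch. The names `ANCHORS`, `approx_c_to_f`, `approx_f_to_c`, and `exact_c_to_f` are made up for illustration, and I am assuming you always work from the nearest memorised temperature.

```python
# The four memorised correspondences, Celsius -> Fahrenheit.
ANCHORS = {0: 32, 10: 50, 20: 68, 30: 86}


def approx_c_to_f(celsius):
    """Mental trick: start at the nearest anchor, then 1 °C is about 2 °F."""
    anchor = min(ANCHORS, key=lambda a: abs(celsius - a))
    return ANCHORS[anchor] + 2 * (celsius - anchor)


def approx_f_to_c(fahrenheit):
    """The same trick in reverse: 2 °F is about 1 °C."""
    anchor = min(ANCHORS, key=lambda a: abs(fahrenheit - ANCHORS[a]))
    return anchor + (fahrenheit - ANCHORS[anchor]) / 2


def exact_c_to_f(celsius):
    """The school formula, for comparison."""
    return celsius * 9 / 5 + 32


print(approx_c_to_f(14), exact_c_to_f(14))
```

Because the true slope is 1.8 °F per °C and the trick uses 2, the error grows by 0.2 °F for each degree Celsius away from the anchor. Staying within 5 °C of an anchor therefore keeps the error to at most 1 °F.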

It tickles me that (maths alert) this is a piecewise linear approximation to a linear function. Mathematically, you would have to believe that a piecewise linear function would be more complicated, but mentally, it's not. Maybe there's a deep psychological principle here that scientists will figure out someday.

Until then, quite chilly today, isn't it?