So this past week I spent a couple of days building an API in Clojure that scrapes data from the ATP World Tour website. Here are a GitHub link and a Clojars link to the project. Instead of aimlessly discussing the API, I'll talk about some of the things I learned about Clojure while building it.
- How map actually works. I never really understood map beyond thinking of it as a neat way to avoid nested for loops. Here is how I think of it now: say you have a list of items you want to pass through a function f(x). map iterates through that list, calls f on each item, and returns a new (lazy) sequence of the results.
- Testing is amazing. Initially I tested features by printing results with println from a (defn -main [& args] ...). Automated tests are far more effective, because you always end up moving features around, trying to optimize things, and so on, and every once in a while that breaks your program (or features within it). You get explicit errors when the program itself crashes, but nothing tells you when a feature you knew used to work correctly quietly stops working. That is what tests are for. Honestly, before Clojure I was never big on testing or test-driven development; even a stint with Ruby on Rails hadn't changed my mind until now.
- Midje: it makes testing in Clojure a lot easier. Check it out. Also, lein midje :autotest, which reruns your tests whenever a file changes, is amazing.
- Java interop can be useful! Certain common Java methods are already wrapped by Clojure functions (e.g. subs wraps substring), but those that are not can still be called directly. I found myself using the .indexOf method as the basis for a large chunk of my tests, since it helped determine whether a substring was part of a string. In hindsight, regex checks could replace it entirely, but at the time it was a useful learning tool that demonstrated Java's utility.
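- Be careful about how you name your files and namespaces. Seriously. Clojure expects a namespace's name to match its file path, with hyphens in the namespace becoming underscores in the file name (e.g. my-app.core lives in src/my_app/core.clj). Typos here were an incredible headache when I was trying to get tests up and running, and the resulting errors were utterly confusing.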
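To tie a few of these points together, here is a minimal sketch using plain clojure.test rather than Midje (so it runs with no extra dependencies). It shows map applied over a collection, a substring check via Java's String.indexOf, and the regex alternative. The namespace and helper names (atp-sketch, surnames, contains-via-index-of?, contains-via-regex?) are hypothetical illustrations, not part of the actual API.

```clojure
(ns atp-sketch
  (:require [clojure.string :as str]
            [clojure.test :refer [deftest is testing run-tests]]))

;; map applies the function to every item and returns a lazy seq of results.
(defn surnames
  "Hypothetical helper: the last word of each full name."
  [full-names]
  (map #(last (str/split % #"\s+")) full-names))

;; Java interop: String.indexOf returns -1 when the substring is absent.
(defn contains-via-index-of? [s sub]
  (not= -1 (.indexOf ^String s ^String sub)))

;; Regex alternative: re-find returns nil when there is no match.
;; Pattern/quote escapes any regex metacharacters in sub.
(defn contains-via-regex? [s sub]
  (boolean (re-find (re-pattern (java.util.regex.Pattern/quote sub)) s)))

(deftest helpers-test
  (testing "map over a collection"
    (is (= ["Federer" "Nadal"] (surnames ["Roger Federer" "Rafael Nadal"]))))
  (testing "both substring checks agree"
    (is (contains-via-index-of? "ATP World Tour" "World"))
    (is (not (contains-via-index-of? "ATP World Tour" "WTA")))
    (is (= (contains-via-index-of? "ATP World Tour" "Tour")
           (contains-via-regex? "ATP World Tour" "Tour")))))

(comment
  (run-tests 'atp-sketch))
```

On Clojure 1.8 or later, clojure.string/includes? does the same substring check without any interop.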
Those were some of my adventures in Clojure this past week. Hope you got something out of it. I plan to write a more comprehensive guide at some point down the line. Cheers!
I've been quite busy this past week with real-world commitments, so I decided to post some Elo rankings instead. Enjoy, tennis fans! If enough of you like this kind of stuff, I wouldn't mind putting it in the sidebar (or Glicko-2 ratings, whichever people prefer). Ratings are sorted by serve rating (ratingS) and then by return rating (ratingR).
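Sorting by serve rating with return rating as the tie-breaker is a one-liner in Clojure with sort-by and juxt. This is a sketch: the player maps below are made-up illustrative data, not real ratings, and highest-first order is my assumption.

```clojure
(ns rankings-sketch)

;; Hypothetical illustrative data, not actual ratings.
(def players
  [{:name "Player A" :ratingS 1650.0 :ratingR 1550.0}
   {:name "Player B" :ratingS 1700.0 :ratingR 1500.0}
   {:name "Player C" :ratingS 1650.0 :ratingR 1600.0}])

(defn rank-players
  "Sort by ratingS, breaking ties with ratingR; highest first (ASSUMED order)."
  [ps]
  ;; juxt builds a [ratingS ratingR] key per player; vectors of equal length
  ;; compare element by element, and the reversed comparator gives descending order.
  (sort-by (juxt :ratingS :ratingR) #(compare %2 %1) ps))
```

For the data above, (map :name (rank-players players)) yields Player B, then Player C, then Player A.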
Last time we examined the Glicko-2 model and understood how it worked mathematically. This time we shall focus on implementing the Glicko-2 system for tennis.
Let's establish a few ground rules. 1) We will use the constants mentioned last week (a base rating of 1500 and an initial deviation of 400 / ln(10) ≈ 173.72); other constants remain unchanged. 2) Although Glicko-2 supports batch updates (e.g. one update per tournament), after trying both methods I found no significant benefit to batch updates; in fact, updating after each individual match gave better accuracy. 3) Each player needs two ratings, one on serve and one on return.
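Here is a minimal sketch of the setup these ground rules imply: the constants, a player record holding separate serve and return ratings, and the conversion from the Glicko scale to the Glicko-2 scale (mu = (r - 1500) / 173.7178, phi = RD / 173.7178). The starting volatility of 0.06 is an assumption (Glickman's suggested default; the post doesn't state one), and pairing one player's serve rating against the other's return rating is likewise my assumption. All names are hypothetical.

```clojure
(ns glicko2-setup)

;; Constants from the ground rules: base rating 1500, initial deviation 400 / ln(10).
(def base-rating 1500.0)
(def glicko2-scale (/ 400.0 (Math/log 10.0))) ; about 173.7178
(def initial-rd glicko2-scale)
(def initial-vol 0.06) ; ASSUMED default volatility, not stated in the post

(defn fresh-rating []
  {:rating base-rating :rd initial-rd :vol initial-vol})

(defn new-player
  "Hypothetical player record with separate serve and return ratings."
  [player-name]
  {:name player-name :serve (fresh-rating) :return (fresh-rating)})

(defn ->glicko2
  "Convert a rating from the Glicko scale to the Glicko-2 scale (mu, phi)."
  [{:keys [rating rd]}]
  {:mu  (/ (- rating base-rating) glicko2-scale)
   :phi (/ rd glicko2-scale)})

(defn service-matchup
  "ASSUMPTION: when a serves, a's serve rating faces b's return rating."
  [a b]
  [(:serve a) (:return b)])
```

With these constants a brand-new player starts at mu = 0 and phi = 1 on the Glicko-2 scale.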
The Glicko-2 rating system is the second generation of rating systems developed by Mark Glickman to estimate a player's skill in chess. Glicko itself is, in my opinion, a more sophisticated version of Elo. I truly love the system, both for its simplicity and for the information it provides: unlike Elo, Glicko-2 tracks a rating deviation (how uncertain a rating is) and a volatility (how erratic a player's performances are) for each player. With that said, let's get right into the math and then explain how to implement it for tennis.
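As a companion to the math, here is a minimal sketch of the standard Glicko-2 update from Glickman's description, applied to a single match. It computes g(phi), the expected score E, the estimated variance v, the improvement delta, the new volatility (via the Illinois-style iteration Glickman describes), and finally the new phi and mu. Assumptions: tau = 0.5 (Glickman suggests values between 0.3 and 1.2), a convergence tolerance of 1e-6, and ratings stored on the Glicko scale as maps with :rating, :rd and :vol. Function names are hypothetical.

```clojure
(ns glicko2-sketch)

(def scale 173.7178)   ; 400 / ln(10): converts between Glicko and Glicko-2 scales
(def tau 0.5)          ; ASSUMED system constant
(def epsilon 0.000001) ; convergence tolerance for the volatility iteration

(defn g [phi]
  (/ 1.0 (Math/sqrt (+ 1.0 (/ (* 3.0 phi phi) (* Math/PI Math/PI))))))

(defn expected
  "Expected score of a player (mu) against an opponent (mu-j, phi-j)."
  [mu mu-j phi-j]
  (/ 1.0 (+ 1.0 (Math/exp (- (* (g phi-j) (- mu mu-j)))))))

(defn new-volatility
  "Solve for the new volatility with the Illinois-style iteration."
  [phi sigma v delta]
  (let [a (Math/log (* sigma sigma))
        f (fn [x]
            (let [ex (Math/exp x)
                  d  (+ (* phi phi) v ex)]
              (- (/ (* ex (- (* delta delta) (* phi phi) v ex)) (* 2.0 d d))
                 (/ (- x a) (* tau tau)))))
        b0 (if (> (* delta delta) (+ (* phi phi) v))
             (Math/log (- (* delta delta) (* phi phi) v))
             (loop [k 1]
               (if (neg? (f (- a (* k tau))))
                 (recur (inc k))
                 (- a (* k tau)))))]
    (loop [A a, B b0, fA (f a), fB (f b0)]
      (if (<= (Math/abs (- B A)) epsilon)
        (Math/exp (/ A 2.0))
        (let [C  (+ A (/ (* (- A B) fA) (- fB fA)))
              fC (f C)]
          (if (<= (* fC fB) 0.0)
            (recur B C fB fC)             ; root bracketed: move A to old B
            (recur A C (/ fA 2.0) fC))))))) ; otherwise halve fA

(defn update-rating
  "New {:rating :rd :vol} for a player after one match.
   score is 1.0 for a win and 0.0 for a loss."
  [{:keys [rating rd vol]} opponent score]
  (let [mu        (/ (- rating 1500.0) scale)
        phi       (/ rd scale)
        mu-j      (/ (- (:rating opponent) 1500.0) scale)
        phi-j     (/ (:rd opponent) scale)
        gj        (g phi-j)
        e         (expected mu mu-j phi-j)
        v         (/ 1.0 (* gj gj e (- 1.0 e)))
        delta     (* v gj (- score e))
        sigma-new (new-volatility phi vol v delta)
        phi-star  (Math/sqrt (+ (* phi phi) (* sigma-new sigma-new)))
        phi-new   (/ 1.0 (Math/sqrt (+ (/ 1.0 (* phi-star phi-star)) (/ 1.0 v))))
        mu-new    (+ mu (* phi-new phi-new gj (- score e)))]
    {:rating (+ 1500.0 (* scale mu-new))
     :rd     (* scale phi-new)
     :vol    sigma-new}))
```

For example, two evenly matched newcomers ({:rating 1500.0 :rd 350.0 :vol 0.06}) move symmetrically: the winner's rating rises by the same amount the loser's falls, and both deviations shrink.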
The Elo rating system originated in the mid-1900s and has since been used predominantly for chess rankings. On occasion Elo has been used for other sports or for video games (e.g. League of Legends used the system until just recently). The system itself is very basic; it's based entirely on wins and losses against other players. Simply put, your new rating depends on your opponent's rating, your expected performance given both ratings, and your actual result in the game (a win, loss, draw, or anything in between).
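Those three ingredients map directly onto the standard Elo formulas: the expected score is E_A = 1 / (1 + 10^((R_B - R_A) / 400)), and the new rating is R_A + K(S_A - E_A), where S_A is the actual result. Here is a minimal sketch; K = 32 is an assumed value, since the post doesn't specify one, and the function names are hypothetical.

```clojure
(ns elo-sketch)

(def k-factor 32.0) ; ASSUMED K-factor; not specified in the post

(defn expected-score
  "Expected score of a player rated ra against an opponent rated rb."
  [ra rb]
  (/ 1.0 (+ 1.0 (Math/pow 10.0 (/ (- rb ra) 400.0)))))

(defn update-elo
  "New rating after one game. score: 1.0 win, 0.5 draw, 0.0 loss,
   or any value in between."
  [ra rb score]
  (+ ra (* k-factor (- score (expected-score ra rb)))))
```

Two players rated 1500 each expect 0.5, so a win gives (update-elo 1500 1500 1.0) => 1516.0; beating a stronger opponent earns more than 16 points because the expected score was lower.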
Before we begin, note that I'm only going to cover a few ways to approach this problem; there are other routes to predictive modeling. Be warned, long post ahead! Let's break this down into two parts. First, determining an individual player's skill at a point in time. Second, given player one's and player two's skills at a point in time, determining the probability that player one wins the match. In short, we want to estimate each player's skill and then turn a pair of skills into a probability of winning.