Meet the New Website, Same as the Old Website, Roughly

I’ve finally taken the plunge and switched my website to a more modern blogging software (WordPress) and a dedicated media gallery (Gallery 2). Hopefully through the magic of redirects most shouldn’t notice much difference. I just hope planet hasn’t decided to dump all my posts onto the front page, if it has then I apologise.

I’ve also taken this opportunity to move all my extensions to addons.mozilla.org. Most are currently still in the sandbox, hopefully they will come public soon. I’m also using the nice new AMO API service to populate details on the add-on homepages, means there is only one place I need to make most changes to.

Trimming the Fat

As I mentioned in my last post and as I know everyone is well aware, tinderbox is starting to feel the strain. You already can’t just glance at it to see what is going on. The Firefox tree is currently showing build and test results from a total of 23 machines. The interesting thing is that not all of these machines are actually doing any compiling. The rest are merely running tests on builds produced by the other machines. That isn’t to say that those test results are less important but I wonder whether it is worth treating these differently.

Thinking about this difference I’ve spent a short time knocking together an experimental view of tinderbox. The idea is simple, the test boxes that aren’t actually compiling don’t get a dedicated column. We still want those results though, and luckily there is somewhere to put them, on the build that they are testing. So every actual compile ends up as a box within which can be multiple results from testers. This leaves us a very manageable 9 columns, 3 per platform.

This is quite early stages, the physical perf and test numbers aren’t visible yet, there are a couple of bugs I know about, it doesn’t auto update and I wouldn’t recommend you really use this, but I am interested to know what people think before I proceed any further with it. There are a few questions in my mind like how to colour the main box of the build (currently the worse result from the compile and all the test results), how to colour the header (I suspect the worst of the most recent results from all of the machines reporting in that column), where to even put the full perf/test numbers (there isn’t really room in some of the fast nightly boxes). More importantly is this even readable? I think it is good for some things, often tinderbox gets confusing when you see a talos result fail long after the nightly box went green, here that talos red would be back on the nightly box that failed. I can imagine problems trying to figure out why the header is red when it’s actually the talos results from build a couple of times ago that is causing it.

Anyway without further ado, take a look at Tindermerge and let me know your thoughts.

Visualising our Labours

Well it has been a mad few weeks. Between landing the new Get Add-ons pane, sheriffing the Firefox 3 beta 3 freeze, decimating old parts of the code and working through follow-up issues from the Get Add-ons pane and not to mention two 10 hour flights it really has been busy. Thankfully I’ve not just been totally buried under code, I’ve had some small spare time to tinker with a few other things (I need to to keep my sanity).

We’re getting better and better at adding quality control to our code. The QA team do their awesome work, we have some 50,000 automated unit tests now (depending how you count) and our tinderbox now has more machines running performance tests than it does machines building the Fox. One problem that all this brings though is a bit of information overload. The tinderbox is now so large that you can no longer glance to see if you can check in or not. Goodness knows how many pages you need to view to truly monitor all of the performance graphs. So, spurred on by Jonathan’s performance dashboard, I’ve been tinkering with some ways to present some of this data.

MultiperfFirst off there is the old-style performance machines, pre-talos. These often find regressions first, because they cycle fast. Apparently that is going to improve but until then these old machines have useful data. Unfortunately the graphs they produce are … well lets just say pig ugly! It didn’t take too long to knock up something very basic, but nicer looking and allowing multiple results on the same graph. You can see it graphing the live data (yes it will appear blank for the age it takes to draw). Unfortunately the graph kit I’m using seems a little slow and not all that functional, but that could be changed. Then again I have just found out that new work on the graphs server pretty much blows this out of the water!

So that’s all pretty and good but doesn’t bring a much new to the table. Graphs are good but what we want to be able to do is relate the performance changes to the code that caused it. At this point Johnathan was kind enough to point me at Timeplot. This is a rather neat tool that lets you visualise numerical data on the same graph as defined events. That’s kinda perfect for this. It took me a little longer to hook this one up given the multiple sources and the kinda weird API that can’t normally load from anything other than a remote text file, but here is the result. You can see the performance data of course (Ts from the linux perf box), then each vertical band is a checkin which you can click on for more details.

CheckinplotIn the end the timeplot is not massively useful. When you get down to it we tend to check in far more frequently than we gather performance information. I’ll be interested to see if the fast run talos boxes will improve matters. If we had many of them then we can start correlating the regressions across them and thus far reduce the range in which to look for potential causes. I’m interested in other ideas people might have for visualising these kinds of data, I think finding the right way to visualise our information is the key to managing things in the future.

Another thing that needs help is the tinderbox of course. While I was in Mountain View last week Johnathan and I did a quick brainstorm of what the tinderbox is used for. We think it is being used for too many different things and if we split the display out into just those tasks that individuals needed then it would be far more manageable. Maybe you have your own comments on uses we didn’t come up with.

Just finally, after playing with timeplot I was still in a tinkering mood. One thing I periodically have to do is look back over what work I did on specific days. Bugmail is a good indicator for this. Now though I am working in my own Mercurial repository which means I have a full checkin log of my own to look over. Given that it is interleaved with the full Mozilla CVS checkin log it still a little arduous to scan through. But know I have the tools and after not long I can know skim over my previous checkins and even see in depth what each involved (yes, sorry boss it doesn’t look like I did much coding today!).