My Hack Week: The new Banshee

A couple of weeks ago I really started investing a lot of time into what will become the new Banshee. There are a few big goals for this effort, including a full client UI/services stack split, insane performance improvements, and greatly improved library management and playback support.

Banshee will be split into two core stacks: a “headless” services stack and the thick client UI stack (what one would traditionally call “Banshee”).

Headless Services

One of the biggest areas of focus for Banshee will soon be on platform integration and the ability to run different client UIs against Banshee’s core.

There will be a headless services stack, which will include the source manager (which in turn provides access to the library, playlists, and other sources) and the playback controller. Everything here will be exposed over D-Bus, and can be extended through Mono.Addins.

So for instance, F-Spot could integrate the Banshee library into its slide show functionality; PiTiVi could access the same to offer music to a user for video editing; Elisa could be extended to pull its content from Banshee; Joe’s Banshee Music Server will talk to this stack to serve a user their content from any browser in the world.

The services stack is highly efficient – serving data is extremely fast, and consumes very little memory – and yes, it’s running on Mono.

The Thick Client

Banshee’s primary UI won’t be changing too drastically, but there are some key things to note.

There will at last be a multi-select artist/album browser, even though I’m still not a fan of this feature. It is, however, the number one most requested feature. There will also be a play queue, which battles the browser for the title of most requested feature.

Most importantly, I am dropping the GtkTreeView for our track view. The biggest problem in Banshee over the past two years has been the memory use and speed of the track view. This is because it is not possible to write custom tree models in Gtk# – which lacks GInterface support – and thus we have had to rely on the simple GtkListStore model. This is not good when you need to map directly to a database, for example. It requires loading everything from the database ahead of time, which means the startup time is O(N), and memory consumption the same.

Now, before you start jumping to conclusions or throwing out solutions to this problem that would still allow us to use the GtkTreeView, let me cover the available options:

  • Write the model in C, bind that. While this would work, it would mean we’d have to rely on a native library for Banshee (and I am trying to work away from libbanshee, which currently only does GStreamer work). Extending this model would be painful in the managed realm as well, as we really want this to be a strong extension point.
  • Fix Gtk# to support GInterfaces. Well, this one is obvious. If Gtk# properly supported GInterfaces, I would have written a managed custom tree model long ago and been done with it. Read on, and I’ll tell you why I’m “glad” GInterfaces aren’t supported. However, this does seriously need to be fixed. It’s the biggest hole in our binding.
  • Use P/Invoke hacks to implement the custom model in C#. While possible, it’s quite ugly. It’s not so bad for simple models that aren’t meant to be reused, but when you need to implement sorting, filtering, drag and drop, and other features, this gets out of control quickly. It’s just a gross hack, not to be taken seriously. It says more for the native interop in Mono than anything else.

The solution? I started writing a new widget, similar to GtkTreeView, but only for lists at the moment. I have no intentions of supporting trees, as this widget is designed specifically with our Banshee needs in mind (but is meant to be used in other managed applications that need list-based data binding).

The first thing you will notice is that startup time is ridiculously fast. The view needs only to know two things from the model: a row count, and a method for fetching a row at an index. Simple.
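To make that two-member contract concrete, here is a minimal sketch in Python rather than C#. The names ListModel, RangeModel, count, row_at, and visible_rows are all hypothetical and are not Banshee's actual API. The point is only that the view asks for the rows on screen, so the cost of showing a page does not depend on the size of the library.

```python
from typing import Protocol


class ListModel(Protocol):
    """Hypothetical model contract: a row count and an index lookup."""

    def count(self) -> int: ...

    def row_at(self, index: int) -> object: ...


class RangeModel:
    """Toy model standing in for a database; rows are produced on demand."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.fetches = 0  # how many rows the view actually asked for

    def count(self) -> int:
        return self.size

    def row_at(self, index: int) -> str:
        self.fetches += 1
        return f"Track {index}"


def visible_rows(model: ListModel, first: int, rows_on_screen: int) -> list:
    """Fetch only the rows currently scrolled into view."""
    last = min(first + rows_on_screen, model.count())
    return [model.row_at(i) for i in range(first, last)]


model = RangeModel(1_000_000)
page = visible_rows(model, first=500_000, rows_on_screen=30)
print(page[0], model.fetches)  # prints: Track 500000 30
```

Even with a million rows in the toy model, only the 30 visible rows are ever fetched.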

Columns are managed by a ColumnController, which is not a part of the view (one of the things that irritates me about the GtkTreeView). Columns in the ColumnController provide cell renderers, much like those in core GTK. With this all pieced together, when a row comes into view, the model is queried. The view then iterates the columns in the ColumnController and passes the object returned from the model to each column’s renderer for binding.

Objects returned by the model have attributes on either the class or its properties that bind them to certain columns. The renderer is then responsible for rendering the bound object or property thereof. Pretty simple, but wildly effective, and insanely fast.
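Here is a rough sketch of how the column controller, the renderers, and the binding could fit together. It is written in Python, which has no C#-style attributes, so a decorator stands in for them. Every name here (column, Track, bound_value, TextRenderer, render_row, and this toy ColumnController) is an assumption made for illustration, not the real implementation.

```python
def column(name):
    """Hypothetical marker: tag a getter as the data source for a column."""
    def mark(func):
        func.column_name = name
        return property(func)
    return mark


class Track:
    """Example row object whose properties are bound to named columns."""

    def __init__(self, title, artist):
        self._title = title
        self._artist = artist

    @column("Title")
    def title(self):
        return self._title

    @column("Artist")
    def artist(self):
        return self._artist


def bound_value(row, column_name):
    """Find the property on the row's class that is tagged for this column."""
    for attr in vars(type(row)).values():
        if isinstance(attr, property) and \
                getattr(attr.fget, "column_name", None) == column_name:
            return attr.fget(row)
    raise KeyError(column_name)


class TextRenderer:
    """Stand-in cell renderer: turns the bound value into display text."""

    def render(self, value):
        return str(value)


class ColumnController:
    """Holds the columns separately from the view."""

    def __init__(self):
        self.columns = []  # list of (column name, renderer) pairs

    def add(self, name, renderer):
        self.columns.append((name, renderer))


def render_row(controller, row):
    """What the view does for one visible row: bind it to each column."""
    return [renderer.render(bound_value(row, name))
            for name, renderer in controller.columns]


controller = ColumnController()
controller.add("Artist", TextRenderer())
controller.add("Title", TextRenderer())
cells = render_row(controller, Track("Hypothetical Song", "Example Band"))
print(cells)  # prints: ['Example Band', 'Hypothetical Song']
```

Because the columns live outside the view, reordering or hiding one only touches the controller; the row objects and the model stay the same.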

Also, the widths of the columns are percentage-based, like ETable in Evolution. It’s slick. I hate horizontal scrolling. There is support for a minimum absolute width and for completely fixed-width columns.
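One way such a layout could be computed is sketched below in Python. This is an assumption about the general technique, not the algorithm the widget actually uses, and the Column fields (percent, min_width, fixed_width) are hypothetical names. Fixed columns are taken off the top, any column that would fall below its minimum is pinned there, and the space left over is shared out by percentage.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Column:
    name: str
    percent: int = 0                   # share of the flexible space (assumed > 0)
    min_width: int = 0                 # minimum absolute width in pixels
    fixed_width: Optional[int] = None  # set for completely fixed columns


def layout(columns, total_width):
    """Return a {column name: pixel width} mapping for the given view width."""
    widths = {c.name: c.fixed_width for c in columns if c.fixed_width is not None}
    flexible = [c for c in columns if c.fixed_width is None]
    remaining = total_width - sum(widths.values())

    # Pin one too-narrow column at a time, then re-check the others,
    # because each pin shrinks the space left for the rest.
    pinned = True
    while pinned and flexible:
        pinned = False
        share_total = sum(c.percent for c in flexible)
        for c in flexible:
            if remaining * c.percent / share_total < c.min_width:
                widths[c.name] = c.min_width
                remaining -= c.min_width
                flexible.remove(c)
                pinned = True
                break

    # Share what is left among the columns that were not pinned.
    share_total = sum(c.percent for c in flexible)
    for c in flexible:
        widths[c.name] = int(remaining * c.percent / share_total)
    return widths


columns = [
    Column("#", fixed_width=40),
    Column("Title", percent=50),
    Column("Artist", percent=30),
    Column("Album", percent=20, min_width=150),
]
widths = layout(columns, total_width=640)
print(widths)
```

In this example the Album column would only get 120 pixels from its 20% share, so it is pinned at 150 and the Title and Artist columns split the rest. The widths never add up to more than the view, which is the whole point: no horizontal scrolling.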

I will save a more detailed technical overview for some developer doc on our Wiki. Now on to a screencast.

This first one shows the new view and browser operating on a very sane collection with a decent amount of cover art. But it’s not a very large collection and doesn’t do the performance of the view justice. It’s nice however. No comments on the music selection – this database was carefully crafted by hand from a collection of databases so I could have a “nice” database with full collections of albums and proper metadata – that is, it’s not mine!

Theora Screencast

This second one shows the startup performance. It really is O(1). However, sorting and filtering get slow when you reach such a high row count. With the use of indexes in sqlite and a cache table though, filtering and sorting work against the cache, not the primary tables, once an initial sort or filter is applied. That is, it gets faster as you narrow your results, as is to be expected.
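The cache table idea can be sketched with Python's built-in sqlite3 module. The schema and the helper names (fill_cache, row_at, row_count, the tracks and cache tables) are made up for illustration and are not Banshee's real schema. The first sort reads the primary table; after that, narrowing the results only has to walk the rows already in the cache.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE tracks (id INTEGER PRIMARY KEY, artist TEXT, title TEXT);
    CREATE INDEX tracks_artist ON tracks (artist);
    -- position is the row index the view asks for (1-based rowid)
    CREATE TABLE cache (position INTEGER PRIMARY KEY, track_id INTEGER);
""")
db.executemany(
    "INSERT INTO tracks (artist, title) VALUES (?, ?)",
    [("Beta", "Two"), ("Alpha", "One"), ("Beta", "Three"), ("Gamma", "Four")],
)


def fill_cache(track_ids):
    """Replace the cache with the given ids, in display order."""
    db.execute("DELETE FROM cache")
    # With the table empty, sqlite assigns positions 1, 2, 3, ... again.
    db.executemany("INSERT INTO cache (track_id) VALUES (?)",
                   [(track_id,) for track_id in track_ids])


def row_count():
    return db.execute("SELECT COUNT(*) FROM cache").fetchone()[0]


def row_at(index):
    """Fetch one row by its position in the current sort/filter."""
    return db.execute(
        "SELECT t.artist, t.title FROM cache c "
        "JOIN tracks t ON t.id = c.track_id WHERE c.position = ?",
        (index + 1,),
    ).fetchone()


# The initial sort works against the primary table.
fill_cache(r[0] for r in db.execute(
    "SELECT id FROM tracks ORDER BY artist, title").fetchall())
first_sorted = row_at(0)

# Narrowing works against the cache, which is already sorted.
fill_cache(r[0] for r in db.execute(
    "SELECT c.track_id FROM cache c JOIN tracks t ON t.id = c.track_id "
    "WHERE t.artist = ? ORDER BY c.position", ("Beta",)).fetchall())
first_narrowed = row_at(0)
narrowed_count = row_count()

print(first_sorted, first_narrowed, narrowed_count)
```

Copying the ids through Python keeps the sketch short; a real implementation would presumably keep that work inside sqlite.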

Theora Screencast

I don’t really think anyone has a million songs, but this new model/view can still cope, even if it is slightly sluggish. However, my goal for performance optimization is to be fast at 100k songs. I think that’s a pretty high target, and we’re almost there. Just some more SQL tuning and I think we’ll have it. The limitation currently is that sorting and filtering start to get “sluggish” at about 75k rows on my machine.

Oh, and this database was generated by a tool I wrote yesterday that uses a dictionary to randomly create a music library. It came up with some rather interesting band names. Other projects might find it useful for performance testing.
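For anyone who wants to try something similar, a generator along these lines is easy to sketch. This is not the tool mentioned above, just a minimal Python illustration with a tiny inline word list standing in for a real dictionary file; WORDS, random_name, and generate_library are hypothetical names.

```python
import random

# Stand-in for a real dictionary file; any word list would do.
WORDS = ["amber", "static", "velvet", "harbor", "lantern",
         "copper", "meadow", "signal", "paper", "orbit"]


def random_name(rng, max_words=3):
    """Build a name out of one to max_words random dictionary words."""
    count = rng.randint(1, max_words)
    return " ".join(rng.choice(WORDS).capitalize() for _ in range(count))


def generate_library(track_count, artist_count, seed=0):
    """Return a list of fake tracks spread across a pool of fake artists."""
    rng = random.Random(seed)  # seeded so a test run can be repeated
    artists = [random_name(rng) for _ in range(artist_count)]
    return [
        {
            "artist": rng.choice(artists),
            "album": random_name(rng),
            "title": random_name(rng),
        }
        for _ in range(track_count)
    ]


library = generate_library(track_count=1000, artist_count=50)
print(len(library), library[0])
```

Seeding the generator means the same fake library can be rebuilt for every benchmark run, which makes before and after numbers easier to compare.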

I’ll close on this point just by leaving a few stats, all tests based on a 25,000 song database.

Metric             Old Banshee    New Banshee
Startup Time       32 Seconds     0.8 Seconds
RSS Consumption    81 MB          8.5 MB

Now, these tests weren’t performed very scientifically, but the point is that the differences are vast. What’s more, the Old Banshee numbers grow as O(N), while the New Banshee numbers are O(1) – the same startup time and memory consumption hold for the database with almost a million rows. It would probably take an hour or so to load that with Old Banshee, and I don’t think I have the memory for it.

Great, but where can I get this?

All of this work is being performed on a separate branch, currently called “list-view” in Banshee trunk. I do not recommend anyone use this new work yet, however. I can reasonably say that it will not munge your existing data, but it will migrate it into new tables.

There is still a lot of work to be done in the coming weeks, but I hope to merge this back into Banshee trunk proper sometime around GUADEC.

That is all for now.