Too much silence

October was a very busy month for me. It started out in Seattle at the Helix Summit at RealNetworks, moved immediately to the GNOME summit in Boston, and ended with the Mono Summit, also in Boston. I’ve so far only spent 6 nights at my apartment in Raleigh this month.

During all that time, I think I managed to [rather quietly] make two or three Banshee releases.

I’ve got a month of blogging to catch up on, and have a massive post/update on Banshee in the works (and it has been for some time, so please forgive me). It’ll be full of goodies, so keep an eye out for it in the next day or so.

Cracking down on heap abuse (part 2)

Also during the Mono Summit, Ben ran mono --profile on Banshee and alerted me to the fact that taglib-sharp was seriously abusing the heap. I had known it was less than optimal, knew about where the issue was, and knew the fix – but I really hadn’t had much time to address it.

Seeing real numbers was a big motivating factor, however. The heap/memory numbers I’ll present here are total heap allocations, not total heap growth that is never GCed (leaked). That is, the numbers aren’t a heap reservation that is present at any one time, so it’s not something a user would really notice: these are accumulations of very small allocs/frees throughout the lifetime of the test program.

The Problem

In taglib-sharp, which is a fully managed C# port of TagLib (C++), there are four custom collection classes that provide some extra useful operations. This is where I focused my optimization work: all formats use these collections, so every format benefits from the work. Also, it was blindingly obvious what the problem was once I looked at the code.

Take the TagLib.ByteVector class. It serializes many formats of data into a collection of bytes. However, it was using the System.Collections.ArrayList class to store these bytes! Because byte is a value type and ArrayList stores object references, each byte added to the collection must be boxed for storage and unboxed for retrieval. Boxing allocates a heap-based object to act as a container for the byte value. So for each byte that’s pushed into the collection, another object is allocated on the heap! Yikes.
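
To make the cost concrete, here is a minimal sketch of the boxing behavior (illustrative only, not the actual taglib-sharp code):

using System;
using System.Collections;

class BoxingSketch
{
    static void Main()
    {
        // ArrayList stores object references, so every byte added gets
        // boxed into a freshly allocated heap object.
        ArrayList bytes = new ArrayList();
        for(int i = 0; i < 256; i++) {
            bytes.Add((byte)i);      // boxing: one heap allocation per byte
        }

        byte first = (byte)bytes[0]; // unboxing: cast back from object to byte
        Console.WriteLine(first);
    }
}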

The Solution

Luckily, we have generics. Generics to the rescue – and I cannot stress this enough. Simply by replacing ArrayList with List<byte>, the excessive memory problem goes away, since the type to be stored in the collection is known at compile time, and thus boxing/unboxing is no longer necessary. While the fix is very simple, what we get in return is sheer love.
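
The change, roughly, looks like this (a sketch of the idea rather than the exact taglib-sharp patch):

using System;
using System.Collections.Generic;

class GenericsSketch
{
    static void Main()
    {
        // List<byte> stores the bytes directly in its internal byte[] backing
        // array, so there is no boxing and no per-element heap allocation.
        List<byte> bytes = new List<byte>();
        for(int i = 0; i < 256; i++) {
            bytes.Add((byte)i);  // stored as a plain byte
        }

        byte first = bytes[0];   // no cast, no unboxing
        Console.WriteLine(first);
    }
}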

This was the first time I had really looked at all the custom collections code in taglib-sharp. I spent a few hours rewriting/refactoring it all to implement generic interfaces, and created a TagLib.ListBase<T> base collections class that provides common operations for all custom collections used in TagLib. In the process I made some other minor optimizations and clean ups, and now the code is much easier to read and is rather consolidated. Fun.
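
For the curious, the general shape of such a base class is something like this sketch (hypothetical names and a reduced set of operations; the real TagLib.ListBase<T> does more than this):

using System.Collections.Generic;

// Hypothetical sketch, not the actual TagLib.ListBase<T>: a generic base
// class that gives every custom taglib-sharp collection the same common
// operations on top of a List<T> backing store.
public abstract class ListBaseSketch<T>
{
    protected List<T> data = new List<T>();

    public int Count {
        get { return data.Count; }
    }

    public T this[int index] {
        get { return data[index]; }
        set { data[index] = value; }
    }

    public void Add(T item)                    { data.Add(item); }
    public void AddRange(IEnumerable<T> items) { data.AddRange(items); }
    public bool Contains(T item)               { return data.Contains(item); }
    public void Clear()                        { data.Clear(); }
    public T [] ToArray()                      { return data.ToArray(); }
}

// A byte collection in the spirit of ByteVector would then derive from the
// base class and add its own format-specific operations.
public class ByteVectorSketch : ListBaseSketch<byte>
{
}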

The Tests

To make sure I didn’t break anything in the process, I finally set up a pretty extensive NUnit test suite for taglib-sharp and implemented tests for every collection class. I also ported our old format tests from entagged-sharp to taglib-sharp and identified a few problems (which Brian has now fixed).
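
A collection test in that suite looks roughly like this (a hypothetical example, assuming ByteVector exposes Add, Count, and an indexer; it is not copied from the actual suite):

using NUnit.Framework;
using TagLib;

[TestFixture]
public class ByteVectorRoundTripTest
{
    // Hypothetical example of a collection test; the real suite covers
    // every custom collection class in taglib-sharp.
    [Test]
    public void AddAndRetrieve()
    {
        ByteVector vector = new ByteVector();
        for(int i = 0; i < 256; i++) {
            vector.Add((byte)i);
        }

        Assert.AreEqual(256, vector.Count);
        Assert.AreEqual((byte)0, vector[0]);
        Assert.AreEqual((byte)255, vector[255]);
    }
}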

In the process of setting up the tests, and to see exactly how much better we are with generics, I wrote some performance stress tests as well.
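
The heap numbers below come from mono --profile; the timing half of the stress test is nothing more exotic than a loop like this sketch (the file path and iteration count are placeholders):

using System;
using System.Diagnostics;

class TagStressTest
{
    static void Main(string [] args)
    {
        string path = args[0];        // e.g. the 4.1MB M4A sample file
        const int iterations = 10000;

        Stopwatch timer = Stopwatch.StartNew();
        for(int i = 0; i < iterations; i++) {
            // Parse the file's metadata from scratch on every iteration
            TagLib.File.Create(path);
        }
        timer.Stop();

        Console.WriteLine("Total time ({0} iterations): {1}", iterations, timer.Elapsed);
        Console.WriteLine("Average time (1 iteration): {0} seconds",
            timer.Elapsed.TotalSeconds / iterations);
    }
}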

Before my optimization work:

  • Using ByteVector.FromUri() on a 4.1MB M4A file (this call simply loads the entire file into the ByteVector, creating about 4,300,000 bytes in the vector – this is not how files are read in “real life” in taglib-sharp, but is a raw stress test of the ByteVector collection):
    1. Total Heap Allocations: 103,246 KB (Yes, that’s 103 MB!)
    2. Execution time: 1.7 seconds
  • Using File.Create() on the same 4.1MB M4A file 10,000 consecutive times (this is how you actually process a file for metadata in taglib-sharp, and shows more “real life” optimizations):
    1. Total time (10,000 iterations): 00:01:17.1391680
    2. Average time (1 iteration): 0.0077027999 seconds

After my optimization work:

  • Using ByteVector.FromUri()
    1. Total Heap Allocations: 16,421 KB (Wow, dropped to 16 MB!)
    2. Execution time: 0.04 seconds (whoa!)
  • Using File.Create()
    1. Total time (10,000 iterations): 00:00:41.0676870
    2. Average time (1 iteration): 0.0040969793 seconds

So what does that mean? The average tag reading operation is now nearly twice as fast and uses substantially less memory while it runs. This speed up translates directly to applications.

These numbers are also just from optimizing TagLib.ByteVector. After optimizing the other three collection classes, things speed up even more. For instance, before the optimization, Banshee took around 9.5 minutes to import 5100 audio files on my machine. After the optimization it took about 4 minutes, a 5.5 minute speed up. Keep in mind there are other factors during a real import, such as directory walking overhead, mimetype detection, and disk caching, all of which account for about 0.5 minutes of the overall importing process.

Also, each format may use the collections differently, so some may benefit more than others. I’ve now moved on to identifying optimizations that can be done at the format level. For instance, I found out that taglib-sharp’s OGG/Vorbis support is very, very fast. However, the MPEG4 format is very, very slow in comparison. This led me into further investigation and comparisons.

Comparing taglib-sharp to other players in the game (sort of)

In the 0.10.x series of Banshee, we used entagged-sharp, a fully managed metadata/tag reader. It was pretty good, but it did not have write support, had a number of issues dealing with poor/illegal/improper string encodings in ID3 tags, and only had partial MPEG4/AAC and ASF/WMA support. I had originally planned on moving to GStreamer for tag reading, but there were a number of problems with this:

  • A hard dependency on GStreamer for tag reading
  • More native<->managed code/interop
  • Missing/poor demuxer support for some formats (missing in terms of “it’s not in -base or -good, so it doesn’t exist for us in practicality”). This was the biggest factor in my decision not to go with GStreamer for tag reading.
  • GStreamer 0.10 really isn’t suited to easily, efficiently, and safely do strict tag reading. The way GStreamer 0.10 works makes it rather difficult/expensive to perform a blocking operation, which is what’s often needed for tag reading.

Anyway, I’m not trying to put down GStreamer – I just came to the conclusion that for strictly reading tags, it’s probably not the best solution. I use its tag reading support in the playback pipeline to get live metadata updates from streams, etc.

For 0.11.x, I ended up choosing a new player in the game, taglib-sharp. It had full read/write support and supported MPEG4 and ASF very well. It also handled poor/illegal string encodings “properly” – or at least much better than entagged-sharp.

taglib-sharp does everything we need it to do in terms of functionality, so now it’s just a matter of improving its performance. All that said, I constructed a test today that compares taglib-sharp, entagged-sharp, and GStreamer. Each tag reader is used 10,000 times on each file in the test suite.

Total is the time taken for the test to run 10,000 times.
Avg is the time taken for the test to run once.
All times are in seconds.

              File         Reader      Avg.     Total
-----------------------------------------------------
       sample.flac      GStreamer  0.003094   30.9390
                           TagLib  0.000324    3.2374
                         Entagged  0.000288    2.8834

     sample_v1.mp3      GStreamer  0.004671   46.7052
                           TagLib  0.000472    4.7197
                         Entagged  0.000269    2.6885

     sample_v2.mp3      GStreamer  0.004011   40.1102
                           TagLib  0.001756   17.5608
                         Entagged  0.000444    4.4426

        sample.mpc      GStreamer  0.003685   36.8494
                           TagLib  0.000161    1.6140
                         Entagged  0.000300    3.0025

        sample.ogg      GStreamer  0.013235  132.3549
                           TagLib  0.000615    6.1522
                         Entagged  0.002869   28.6909

   sample_both.mp3      GStreamer  0.004087   40.8693
                           TagLib  0.001033   10.3278
                         Entagged  0.000471    4.7082

        sample.m4a         TagLib  0.002908   29.0760
                         Entagged  0.000580    5.7995

        sample.wma         TagLib  0.000802    8.0176
                         Entagged  0.000374    3.7391

The results show that GStreamer is vastly slower than either taglib-sharp or entagged-sharp. This may not be entirely fair, however, as the GStreamer code was under development a few months ago, and I have a feeling the bottleneck there isn’t so much the demuxing process as having to wait on the processing thread in order to create a blocking operation. The GStreamer tests also tend to crash every so often. I had to remove MPEG4 from the test as I don’t have an MPEG4 demuxer for GStreamer, but MPEG4 is substantially slower in taglib-sharp than it is in entagged-sharp. My ASF/WMA test was also removed as my ASF demuxer in GStreamer would get completely stuck.

taglib-sharp’s OGG/Vorbis and MPC support is really fast, but it loses out to entagged-sharp on everything else at the moment. So, long story slightly longer: those other formats will be the focus of my future optimizations.

You can try my tests for yourself (and see my GStreamer tag reading code if you want to complain/flame/advise/etc – just know I never completed it as I did a 180 and moved to taglib-sharp). I’ve disabled the GStreamer test since it’s unstable and doesn’t work on all formats that taglib-sharp and entagged-sharp do.

Anyway, I’ll commit the generics/optimizations patch against taglib-sharp this evening, and it will then be available in Banshee.

Cracking down on heap abuse (part 1)

Last week during the Mono Summit, we discovered a few memory and performance issues. Apparently every System.Type returned by System.Reflection.Assembly.GetTypes() is effectively leaked, never to be GCed. Making this call on assemblies with many types can be quite a detrimental operation (such as reflecting against mscorlib).

Update: I knew I’d get corrected on the above statement. By “leaked” I apparently mean that System.Type is just retained forever by the runtime. It’s a special structure that must be manually managed by the runtime, and is allocated in a special way.

Joe looked into Beagle’s usage of Assembly.GetTypes() and managed to reduce leakage on the heap by a whopping 7MB simply by only reflecting against Beagle assemblies (vs. all currently loaded in the app domain, including mscorlib).

On the flight back from Boston on Friday, I decided to look into the issue in Banshee as well. In a few places, I was making the offending call against Banshee.Base (looking for custom branding implementations and Banshee.IO backend implementations). Our plugin system was also making the call on plugin assemblies to actually find plugins, and taglib-sharp was using it to find format implementations in the assembly. I was able to replace the call very easily with a static type table in a few cases where the types I’m looking for are already known at compile time, but it does take some of the “elegance” out of the design.
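
In the compile-time-known cases, the replacement boils down to listing the types by hand instead of discovering them; a minimal sketch (placeholder type names, not the actual Banshee code):

using System;

// Placeholder backend types standing in for the real Banshee.IO backends.
class UnixIOBackend { }
class ManagedIOBackend { }

public static class IOBackendTable
{
    // Sketch of a static type table: because the candidate types are known
    // at compile time, they can be listed directly instead of discovered by
    // reflecting over the whole assembly with Assembly.GetTypes().
    public static readonly Type [] Types = new Type [] {
        typeof(UnixIOBackend),
        typeof(ManagedIOBackend)
    };
}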

The plugins case is different, where you don’t know types until they’re found via reflection at runtime (using Assembly.GetTypes()). Miguel and I discussed a solution for this, and it’s working out well. Basically, each plugin assembly should provide a known static class with a known static method that returns an array of the types that should be loaded:

public static class PluginModuleEntry
{
    public static Type [] GetTypes()
    {
        return new Type [] {
            typeof(Banshee.Plugins.Daap.DaapPlugin)
        };
    }
}

If this implementation cannot be found in the assembly, Banshee’s plugin factory will fall back on the Assembly.GetTypes() method, so older plugins are of course still compatible; plugin authors are encouraged to drop this code in their assemblies to keep heap abuse down.
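
On the loading side, the fallback presumably looks something like this sketch (hypothetical code, not the actual Banshee plugin factory; the real entry point class may live in a namespace):

using System;
using System.Reflection;

public static class PluginTypeLoader
{
    // Prefer the static PluginModuleEntry.GetTypes() entry point if the
    // assembly provides one; otherwise fall back to the expensive
    // Assembly.GetTypes() call so older plugins keep working.
    public static Type [] LoadTypes(Assembly assembly)
    {
        Type entry = assembly.GetType("PluginModuleEntry");
        if(entry != null) {
            MethodInfo method = entry.GetMethod("GetTypes",
                BindingFlags.Public | BindingFlags.Static);
            if(method != null) {
                return (Type [])method.Invoke(null, null);
            }
        }

        return assembly.GetTypes();
    }
}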

Now, in terms of the actual problem (Assembly.GetTypes() leaking to begin with), I’m not sure what that’s about or if/when it will be fixed. Nevertheless, Beagle uses 7MB less memory (I hope that’s right), and Banshee uses 1MB less memory when we avoid the calls.