Also during the Mono Summit, Ben ran
mono --profile on Banshee and alerted me to the fact that taglib-sharp was seriously abusing the heap. I had known it was less than optimal, knew about where the issue was, and knew the fix – but I really hadn’t had much time to address it.
Seeing real numbers, however, was a big motivating factor. The heap/memory numbers I’ll present here are total heap allocations – not total heap growth that is never GCed (leaked). That is, the numbers aren’t a heap reservation present at any one time, so they aren’t something a user would really notice – they are accumulations of very small allocs/frees throughout the lifetime of the test program.
In taglib-sharp, which is a fully managed C# port of TagLib (C++), there are four custom collections classes that provide some extra useful operations. This is where I have focused my optimization work as all formats use these collections, thus every format benefits from the work. Also, it was blindingly obvious what the problem was once I looked at the code.
The core of the problem was the TagLib.ByteVector class, which serializes many formats of data into a collection of bytes. However, it was using the System.Collections.ArrayList class to store those bytes! Because byte is a value (stack-based) type and ArrayList stores heap-based object references, each value-type member of the collection must be boxed for storage and unboxed for retrieval. This boxing/unboxing operation essentially allocates a heap-based object to act as a container for the stack-based byte data. So for each byte pushed into the collection, another object is allocated on the heap! Yikes.
Luckily, we have generics. Generics to the rescue – and I cannot stress this enough. Simply by replacing the ArrayList with a List<byte>, the excessive memory problem goes away, since the type to be stored in the collection is known at compile time, and boxing/unboxing is no longer necessary. While the fix is very simple, what we get in return is sheer love.
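To make the boxing problem concrete, here is a minimal sketch (not taglib-sharp code) contrasting the two approaches. With ArrayList, every Add of a byte allocates a box on the heap; with List<byte>, the bytes live directly in the list’s internal byte[] backing store:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

class BoxingDemo
{
    static void Main()
    {
        // ArrayList stores object references, so each byte is boxed
        // into a heap object on Add and unboxed (cast back) on read.
        ArrayList boxed = new ArrayList();
        boxed.Add((byte)0x42);            // allocates a box on the heap
        byte fromBoxed = (byte)boxed[0];  // unbox on retrieval

        // List<byte> knows its element type at compile time, so the
        // bytes are stored directly in an internal byte[]; no boxing.
        List<byte> unboxed = new List<byte>();
        unboxed.Add(0x42);
        byte fromList = unboxed[0];

        Console.WriteLine(fromBoxed == fromList); // True
    }
}
```

Same values, same API shape – but the generic version performs no per-element heap allocation, which is exactly why the fix is so cheap and the payoff so large.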
This was the first time I had really looked at all the custom collections code in taglib-sharp. I spent a few hours rewriting/refactoring it all to implement generic interfaces, and created a
TagLib.ListBase<T> base collection class that provides common operations for all custom collections used in TagLib. In the process I made some other minor optimizations and cleanups, and now the code is much easier to read and rather consolidated. Fun.
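The real TagLib.ListBase<T> has more operations than this, but the shape of the refactoring can be sketched roughly like so: wrap a List<T>, implement IList<T> once in the base class, and let each custom collection derive from it (names here beyond ListBase<T> and ByteVector are illustrative):

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// A minimal sketch of a generic base collection in the spirit of
// TagLib.ListBase<T>. It wraps a List<T> and exposes IList<T> so
// all derived collections share one boxing-free implementation.
public class ListBase<T> : IList<T>
{
    private readonly List<T> data = new List<T>();

    public T this[int index]
    {
        get { return data[index]; }
        set { data[index] = value; }
    }

    public int Count { get { return data.Count; } }
    public bool IsReadOnly { get { return false; } }

    public void Add(T item) { data.Add(item); }
    public void AddRange(IEnumerable<T> items) { data.AddRange(items); }
    public void Clear() { data.Clear(); }
    public bool Contains(T item) { return data.Contains(item); }
    public void CopyTo(T[] array, int index) { data.CopyTo(array, index); }
    public int IndexOf(T item) { return data.IndexOf(item); }
    public void Insert(int index, T item) { data.Insert(index, item); }
    public bool Remove(T item) { return data.Remove(item); }
    public void RemoveAt(int index) { data.RemoveAt(index); }

    public IEnumerator<T> GetEnumerator() { return data.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

// A ByteVector-like collection then just derives from the base:
public class ByteVector : ListBase<byte> { }
```

The design win is consolidation: every format-specific collection gets the same generic, allocation-friendly core for free.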
To make sure I didn’t break anything in the process, I finally set up a pretty extensive NUnit test suite for taglib-sharp and implemented tests for every collection class. I also ported our old format tests from entagged-sharp to taglib-sharp and identified a few problems (which Brian has now fixed).
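The collection tests look roughly like this – a hedged sketch, assuming NUnit and taglib-sharp are referenced; the actual fixture and test names in the suite may differ:

```csharp
using NUnit.Framework;
using TagLib;

// Illustrative NUnit test for a taglib-sharp collection class;
// ByteVector exposes Add, Count, and an indexer, which is what
// a basic round-trip test exercises.
[TestFixture]
public class ByteVectorTests
{
    [Test]
    public void AddAndIndex()
    {
        ByteVector vector = new ByteVector();
        vector.Add(0x01);
        vector.Add(0x02);

        Assert.AreEqual(2, vector.Count);
        Assert.AreEqual(0x02, vector[1]);
    }
}
```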
In the process of setting up the tests, and to see exactly how much better we are with generics, I wrote some performance stress tests as well.
Before my optimization work:
ByteVector.FromUri() on a 4.1MB M4A file (this call simply loads the entire file into the ByteVector, creating about 4,300,000 bytes in the vector – this is not how files are read in “real life” in taglib-sharp, but is a raw stress test of the collection itself):
- Total Heap Allocations: 103,246 KB (Yes, that’s 103 MB!)
- Execution time: 1.7 seconds
File.Create() on the same 4.1MB M4A file 10,000 consecutive times (this is how you actually process a file for metadata in taglib-sharp, and shows more “real life” optimizations):
- Total time (10,000 iterations): 00:01:17.1391680
- Average time (1 iteration): 0.0077027999 seconds
After my optimization work:
- Using ByteVector.FromUri()
- Total Heap Allocations: 16,421 KB (Wow, dropped to 16 MB!)
- Execution time: 0.04 seconds (whoa!)
- Using File.Create()
- Total time (10,000 iterations): 00:00:41.0676870
- Average time (1 iteration): 0.0040969793 seconds
So what does that mean? The average tag reading operation is now nearly twice as fast (the average File.Create() call dropped from about 7.7ms to 4.1ms) and uses substantially less memory along the way. This speedup translates directly to applications.
These numbers also come from optimizing only TagLib.ByteVector. After optimizing the other three collection classes, things speed up even more. For instance, before the optimization Banshee took around 9.5 minutes to import 5,100 audio files on my machine. After the optimization it took about 4 minutes – a 5.5 minute speed up. Keep in mind there are other factors during a real import, such as directory walking overhead, mimetype detection, and disk caching, all of which account for about 0.5 minutes of the overall importing process.
Also, each format may use the collections differently, so some may benefit more than others. I’ve now moved on to identifying optimizations that can be done at the format level. For instance, I found out that taglib-sharp’s OGG/Vorbis support is very, very fast. However, the MPEG4 format is very, very slow in comparison. This led me into further investigation and comparisons.
Comparing taglib-sharp to other players in the game (sort of)
In the 0.10.x series in Banshee, we had used entagged-sharp, a fully managed metadata/tag reader. It was pretty good, but it did not have write support, had a number of issues dealing with poor/illegal/improper string encodings in ID3 tags, and only had partial MPEG4/AAC and ASF/WMA support. I had originally planned on moving to GStreamer for tag reading, but there were a number of problems with this:
- A hard dependency on GStreamer for tag reading
- More native<->managed interop code
- Missing/poor demuxer support for some formats (missing in terms of “it’s not in -base or -good, so it doesn’t exist for us in practicality”) – this was the biggest factor in my decision to not go with GStreamer for tag reading
- GStreamer 0.10 really isn’t suited to easily, efficiently, and safely do strict tag reading. The way GStreamer 0.10 works makes it rather difficult/expensive to perform a blocking operation, which is what’s often needed for tag reading.
Anyway, I’m not trying to put down GStreamer – I just came to the conclusion that for strictly reading tags, it’s probably not the best solution. I use its tag reading support in the playback pipeline to get live metadata updates from streams, etc.
For 0.11.x, I ended up choosing a new player in the game, taglib-sharp. It had full read/write support and supported MPEG4 and ASF very well. It also handled poor/illegal string encodings “properly” – or at least much better than entagged-sharp.
taglib-sharp does everything we need it to do in terms of functionality, so now it’s just a matter of improving its performance. All that said, I constructed a test today that compares taglib-sharp, entagged-sharp, and GStreamer. Each tag reader is used 10,000 times on each file in the test suite.
Total is the time taken for the test to be run 10,000 times.
Avg. is the time taken for the test to be run once.
File             Reader     Avg. (s)   Total (s)
sample.flac      GStreamer  0.003094    30.9390
                 TagLib     0.000324     3.2374
                 Entagged   0.000288     2.8834
sample_v1.mp3    GStreamer  0.004671    46.7052
                 TagLib     0.000472     4.7197
                 Entagged   0.000269     2.6885
sample_v2.mp3    GStreamer  0.004011    40.1102
                 TagLib     0.001756    17.5608
                 Entagged   0.000444     4.4426
sample.mpc       GStreamer  0.003685    36.8494
                 TagLib     0.000161     1.6140
                 Entagged   0.000300     3.0025
sample.ogg       GStreamer  0.013235   132.3549
                 TagLib     0.000615     6.1522
                 Entagged   0.002869    28.6909
sample_both.mp3  GStreamer  0.004087    40.8693
                 TagLib     0.001033    10.3278
                 Entagged   0.000471     4.7082
sample.m4a       TagLib     0.002908    29.0760
                 Entagged   0.000580     5.7995
sample.wma       TagLib     0.000802     8.0176
                 Entagged   0.000374     3.7391
The results show that GStreamer is vastly slower than either taglib-sharp or entagged-sharp. This may not be entirely fair, however, as the GStreamer code was under development a few months ago, and I have a feeling the bottleneck there isn’t so much the demuxing process as having to wait on the processing thread in order to create a blocking operation. The GStreamer tests also tended to crash every so often. I had to remove MPEG4 from the test as I don’t have an MPEG4 demuxer for GStreamer, but it’s substantially slower in taglib-sharp than it is in entagged-sharp. My ASF/WMA test was also removed as my ASF demuxer in GStreamer would get completely stuck.
taglib-sharp’s OGG/Vorbis and MPC support is really good, but it loses out to entagged-sharp on everything else at the moment. So, long story slightly longer: those formats will be the focus of my future optimizations.
You can try my tests for yourself (and see my GStreamer tag reading code if you want to complain/flame/advise/etc – just know I never completed it as I did a 180 and moved to taglib-sharp). I’ve disabled the GStreamer test since it’s unstable and doesn’t work on all formats that taglib-sharp and entagged-sharp do.
Anyway, I’ll commit the generics/optimizations patch against taglib-sharp this evening, and it will then be available in Banshee.