Audio profile configuration for the masses

Welcome to the second part of the “Things you may not know about Banshee” series of posts, where I highlight some cool features about Banshee that have been introduced in the 0.11.x series. I’m making up for all the blogging I haven’t done in the last 4-5 months.

Just before the GNOME Summit last year, I started working on a user-friendly way to perform audio profile configuration, for example selecting the desired bitrate for an MP3 stream. I had a few goals in mind at the time:

  1. Must be audio framework agnostic. Banshee supports both GStreamer and Helix, and this needs to work with both frameworks without any issues. The user should be able to configure profiles using the same interface and not know which audio framework will be doing the heavy lifting. Essentially, the audio profiles framework should not actually need to know anything about specific audio frameworks. Ever.

  2. Must provide a straightforward, sensible user interface for configuring complex pipelines. The primary point here is that the user should never have to edit a raw pipeline. A user should not have to “know GStreamer” to change their desired encoding bitrate.

    The current GNOME audio profiles editor :-(

  3. Multiple configurations should be supported on the same profile. Profiles are things like “MP3”, “Ogg Vorbis”, “FLAC”. The profile contains the pipeline and interface description. Configurations are sets of values that can be merged into and saved from a profile. This allows a user to configure a 128 Kbps MP3 encoding setting for their iPod and a 192 Kbps encoding setting for ripping CDs to their local library. Each configuration uses the same base profile, but its settings are different.

  4. Never show profiles that the user won’t be able to use. Not all users have the necessary components installed to be able to encode AAC or MP3 for example. Profiles for these formats should be provided, but if they won’t be able to run, they should not be shown. This means profiles should be tested against their default configuration values before ever presenting a user interface.
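Goals 3 and 4 can be sketched roughly as follows. This is a Python illustration only, not Banshee's actual code (Banshee is written in C#). The names `Profile`, `merged`, `usable_profiles` and the `can_run` probe are hypothetical, and the default bitrate of 160 is an arbitrary placeholder.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Profile:
    """A base profile such as "MP3": default values plus named configurations."""
    name: str
    defaults: Dict[str, object]
    # Configuration name -> values that override the profile defaults.
    configurations: Dict[str, Dict[str, object]] = field(default_factory=dict)

    def merged(self, config_name: str) -> Dict[str, object]:
        """Merge a saved configuration into a copy of the default values."""
        values = dict(self.defaults)
        values.update(self.configurations.get(config_name, {}))
        return values


def usable_profiles(profiles: List[Profile],
                    can_run: Callable[[Profile, Dict[str, object]], bool]
                    ) -> List[Profile]:
    """Keep only the profiles whose default values pass the probe."""
    return [p for p in profiles if can_run(p, dict(p.defaults))]


mp3 = Profile("MP3", {"bitrate": 160, "channels": 2})
mp3.configurations["ipod"] = {"bitrate": 128}
mp3.configurations["cd-rip"] = {"bitrate": 192}
aac = Profile("AAC", {"bitrate": 128, "channels": 2})

# Hypothetical probe: pretend only the MP3 encoder is installed.
installed = {"MP3"}
visible = usable_profiles([mp3, aac], lambda p, values: p.name in installed)

print(mp3.merged("ipod"))          # {'bitrate': 128, 'channels': 2}
print(mp3.merged("cd-rip"))        # {'bitrate': 192, 'channels': 2}
print([p.name for p in visible])   # ['MP3']
```

Both configurations share the one base profile, and the AAC profile is filtered out before any interface would be shown.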

With all this in mind, I set out to write the beast. The user interface is defined in XML. Variables define a UI control type, possible values, etc. “Processes” are also defined in the XML with an audio framework ID, for instance “gstreamer.” For GStreamer, the process is the pipeline definition.

However, as of early this morning, the process definition is now an S-Expression. Before, it was simply a pipeline string that had $variables in it, which would be expanded based on the user configuration.
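For comparison, plain `$variable` expansion behaves roughly like the Python sketch below. The pipeline string and the values are made-up placeholders, and Python's `string.Template` merely stands in for whatever expansion code Banshee used.

```python
from string import Template

# Hypothetical pipeline string in the older "$variable" style.
pipeline_template = Template(
    "audioconvert ! lame mode=$mode bitrate=$bitrate ! id3mux"
)

# Values that would come from the user's saved configuration.
configuration = {"mode": 1, "bitrate": 128}

pipeline = pipeline_template.substitute(configuration)
print(pipeline)  # audioconvert ! lame mode=1 bitrate=128 ! id3mux
```

Substitution like this can fill in values, but it has no way to add or drop an element depending on a value.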

Since the GNOME Summit, I have been working with this profile stuff on and off. It’s been functional since Banshee 0.11.2 (a few months ago), but has been evolving in various ways since then. During this time it was clear that more expressiveness was needed for generating the actual process/pipeline definition. For example, in GStreamer if a user chooses to use VBR in LAME, the xingmux element should be added to the pipeline. However, xingmux is in gst-plugins-bad, and chances are not many users actually have xingmux. This means xingmux should only be appended to the pipeline if VBR is enabled and xingmux is actually available. Other reasons for needing more expressiveness are arguments for GStreamer elements that may be mutually exclusive. If I use mode X, I must provide arguments A and B but not C. If I use mode Y, I must provide arguments C but neither A nor B.

Last night I decided I needed to write an S-Expression evaluator to make this expressiveness a reality. 10 hours later, we now have SExpEngine, and it can do some really cool things. Functions are very easy to add to it and there are a number of built-in functions for logic, conditionals, comparisons, casting, arithmetic, and strings. It also supports variables, which can either be value types or a callback method that returns a tree.
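To give a feel for what such an evaluator involves, here is a deliberately tiny sketch in Python. It is not SExpEngine, which is C# and far more complete. The function names are hypothetical, and the behaviour of `+` (concatenate when any operand is a string, otherwise add) and of `if` without an else branch (yield an empty string) are my assumptions.

```python
import re
from typing import Callable, Dict, List

# A token is a parenthesis, a double-quoted string, or a bare symbol/number.
TOKEN = re.compile(r'\s*(\(|\)|"[^"]*"|[^\s()"]+)')


def tokenize(source: str) -> List[str]:
    return TOKEN.findall(source)


def parse(tokens: List[str]):
    """Consume tokens from the front and build a nested list tree.

    Unbalanced input raises IndexError; a real parser would report it.
    """
    token = tokens.pop(0)
    if token == "(":
        node = []
        while tokens[0] != ")":
            node.append(parse(tokens))
        tokens.pop(0)  # drop the closing parenthesis
        return node
    return token


def add(*values):
    # Assumed behaviour: concatenate if any operand is a string, else sum.
    if any(isinstance(v, str) for v in values):
        return "".join(str(v) for v in values)
    return sum(values)


FUNCTIONS: Dict[str, Callable] = {
    "+": add,
    "-": lambda a, b: a - b,
    "=": lambda a, b: a == b,
}


def evaluate(node, variables: Dict[str, object]):
    if isinstance(node, list):
        head, *args = node
        if head == "if":
            # Special form: only the chosen branch is evaluated.
            if evaluate(args[0], variables):
                return evaluate(args[1], variables)
            return evaluate(args[2], variables) if len(args) > 2 else ""
        return FUNCTIONS[head](*(evaluate(a, variables) for a in args))
    if node.startswith('"'):
        return node[1:-1]           # string literal
    if node.startswith("$"):
        return variables[node[1:]]  # variable lookup
    return int(node)                # integer literal


def run(source: str, variables: Dict[str, object]):
    return evaluate(parse(tokenize(source)), variables)


expression = ('(+ "lame " (if (= $vbr_mode 0) (+ "bitrate=" $bitrate) '
              '(+ "vbr-quality=" (- 9 $vbr_quality))))')
print(run(expression, {"vbr_mode": 0, "bitrate": 128, "vbr_quality": 8}))
# lame bitrate=128
print(run(expression, {"vbr_mode": 4, "bitrate": 128, "vbr_quality": 8}))
# lame vbr-quality=1
```

Adding a function is just another entry in the `FUNCTIONS` table, which is the property that makes this kind of engine easy to extend.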

I added a function to allow process S-Expressions to test sub-parts of a pipeline before merging them into the resulting/final pipeline (think xingmux, from above). Additional GStreamer functionality can be added to build variations of a pipeline based on available elements, differences in GStreamer versions, etc. S-Expressions mean configurability (woo – fake words), reliability, and compatibility.
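The "test before merging" idea looks roughly like this in Python. The `build_pipeline` helper and the set-based probe are hypothetical stand-ins; a real probe would ask the audio framework whether the element exists.

```python
from typing import Callable, Iterable, List


def build_pipeline(required: Iterable[str],
                   optional: Iterable[str],
                   is_available: Callable[[str], bool]) -> str:
    """Join the required parts, appending each optional part only if it passes the probe."""
    parts: List[str] = list(required)
    for element in optional:
        if is_available(element):
            parts.append(element)
    return " ! ".join(parts)


# Hypothetical stand-ins for a real framework query.
basic_install = {"audioconvert", "lame"}
full_install = basic_install | {"xingmux"}

without_xingmux = build_pipeline(["audioconvert", "lame"], ["xingmux"],
                                 lambda e: e in basic_install)
with_xingmux = build_pipeline(["audioconvert", "lame"], ["xingmux"],
                              lambda e: e in full_install)

print(without_xingmux)  # audioconvert ! lame
print(with_xingmux)     # audioconvert ! lame ! xingmux
```

Because the probe is passed in, the pipeline-building code itself stays ignorant of which framework answers the question.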

The result is something I’m quite happy with. For example, here is the S-Expression for the GStreamer LAME process:

(+ "audioconvert ! "
  "audio/x-raw-int,rate=" $sample_rate ",channels=" $channels " ! "
  "audioconvert ! "
  "lame mode=" $mode " "
  (if (= $vbr_mode 0)
    (+ "bitrate=" $bitrate)
    (+ "vbr-mode=" $vbr_mode
      " vbr-quality=" (- 9 $vbr_quality)
      (if (gst-element-is-available "xingmux")
        " ! xingmux")))
  (if (gst-element-is-available "id3v2mux")
    " ! id3v2mux"
    " ! id3mux"))

To make sense of the variables, take a look at the full XML LAME profile.
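If S-Expressions are not your thing, the same logic reads roughly like this as ordinary Python. This is my own paraphrase for illustration: the function name, the `element_is_available` callback and the sample values are hypothetical, and I am assuming the xingmux test sits inside the VBR branch while the ID3 muxer choice applies to both branches.

```python
from typing import Callable


def lame_pipeline(sample_rate: int, channels: int, mode: int,
                  vbr_mode: int, bitrate: int, vbr_quality: int,
                  element_is_available: Callable[[str], bool]) -> str:
    """Mirror the LAME S-expression as plain string building."""
    pipeline = (
        "audioconvert ! "
        f"audio/x-raw-int,rate={sample_rate},channels={channels} ! "
        "audioconvert ! "
        f"lame mode={mode} "
    )
    if vbr_mode == 0:
        pipeline += f"bitrate={bitrate}"
    else:
        # (- 9 $vbr_quality) inverts the configured value before LAME sees it.
        pipeline += f"vbr-mode={vbr_mode} vbr-quality={9 - vbr_quality}"
        if element_is_available("xingmux"):
            pipeline += " ! xingmux"
    if element_is_available("id3v2mux"):
        pipeline += " ! id3v2mux"
    else:
        pipeline += " ! id3mux"
    return pipeline


# Constant bitrate, with neither optional element installed.
cbr = lame_pipeline(44100, 2, 1, 0, 128, 8, lambda name: False)
# VBR with xingmux installed but no id3v2mux.
vbr = lame_pipeline(44100, 2, 1, 4, 128, 8, lambda name: name == "xingmux")

print(cbr)
print(vbr)
```

With these sample values the first call ends in `lame mode=1 bitrate=128 ! id3mux` and the second in `lame mode=1 vbr-mode=4 vbr-quality=1 ! xingmux ! id3mux`.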

Now, getting back to the “okay, why should I, as a user, care” side of things, I’ll close the post with a screencast (ooooh, fancy, I’ve never done one of these!) that shows all of the profile stuff in action. For the sake of also demoing how the S-Expression evaluates into a proper GStreamer pipeline, I ran Banshee in debug mode for most of the screencast, which shows a text view and a “Test S-Expr” button. Rest assured, if you’re running Banshee like a normal user, you’ll never see this part of the profile configuration dialog :-).

Banshee's audio profiles
What I hope one day can replace the GNOME audio profiles editor so applications other than Banshee can take advantage of the sweetness. Click the screenshot to watch the screencast. (Ogg/Theora).

I’m still working a lot of things out with this, but I hope to some day make this work outside of Banshee. It’s written with that in mind. At the very least, I’d like to make the XML profile and S-Expression format some kind of standard.

48 Replies to “Audio profile configuration for the masses”

  1. @bla

    Helix (or more correctly RealPlayer) is the only sane way to get proper Realmedia file support – which is used for so many web radios for example.

  2. @Michael

    open-source Helix is practically useless. And closed-source RealPlayer is plain evil. Hence why support any of the two projects?

  3. Wow, that’s a horribly misinformed statement. While GStreamer is prevalent for obvious reasons in the GNOME community, we support Helix for a number of reasons. In SLED and SUSE we are able to provide users with hassle-free AAC and MP3 decoding and MP3 encoding by partnering with RealNetworks.

    So, I have to ask the obvious opposite… what’s the point in hard-coding this to work with just a specific platform?

  4. Wow, this is fantastic stuff Aaron.

    I think you have come up with a very elegant solution to an ugly problem.

    Editing profiles as GStreamer command line arguments just made no sense. There is really no reason why a user could not edit the basics of the audio configuration with a couple of controls and had to resort to learning the GStreamer command line interface.

    This shows that Banshee, beyond being a media player is a work of love, it shows craftsmanship.

    Keep up the good work!


  5. Before this goes any further – my blog is not the place to discuss the merits of open vs proprietary codecs. The only difference between RealPlayer and HelixPlayer is that RealPlayer is shipped with proprietary codecs. Just like GStreamer, all of the Helix framework is open source.

  6. @bla:

    On the technical side: I am glad that Banshee supports multiple backend engines. For one thing, it will make it simple to use Windows or MacOS native encoders and decoders when Banshee is ported to those operating systems.

    But also, in the short term, it means that Banshee can be deployed to commercial users and get codecs for encoding and decoding (encoding mp3 for instance) with all the licenses required in certain countries.


  7. Miguel, totally hit the nail on the head. Fantastic work!

    I hope a few comments are allowed:

    – Xing MP3: is anyone still using this encoder over LAME? Is it just for the sake of completeness?

    – It’s good that you set --vbr-new for LAME. But why offer the choice between stereo, joint stereo and dual? This has confused many people in the past into assuming joint stereo is inferior, when in fact it improves sound quality considerably. I think “auto” combined with the channel switch is enough.

    rationale of 1 and 2: offer less opportunity to achieve suboptimal results

    – A Wavpack profile would help a lot. I don’t think FLAC “needs” a config, but a single slider (faster—-smaller) would not be excessive. And Speex?

  8. Supporting both GStreamer and Helix backends in the same app smells much like that KDE “Phonon” crack.

    Besides that, Helix has been removed from Fedora Core. Neither Debian nor Ubuntu use it for anything. (yeah, it’s available in the distributions, but it is not installed by default and nothing links against it, not even banshee). Suse is the only one left who supports Helix. Given Suse’s market share today it’s unlikely they’ll be able to change helix’s position in any way in the future.

    Maybe it’s time to finally drop support for Helix and stop investing time in supporting software nobody uses?

    “hassle-free” MP3 is available for GStreamer as good as for Helix, BTW.

  9. Tobias,

    1) We ship a Xing GStreamer encoder in SUSE, as that’s what we have the license to. LAME is a much better choice if it is available of course. One thing I might consider is allowing profiles to disable other profiles. So for example, if a user has LAME, they see only the LAME profile and not Xing. If they have Xing but no LAME, they’ll see the Xing profile.

    2) I’m not really up to speed on all that LAME has to offer and its optimal configurations. I have no problem with just hard coding ‘auto’ if that’s the best choice across the board. Although, it is under the “advanced” expander :)

    I’m also not up to speed on wavpack, nor do I have the element, so if you can provide a gst-inspect dump of the elements needed and a sample pipeline for using it, I’ll gladly add the profile (please don’t post that to this blog though). The same goes for Speex.

    I’ll add options for FLAC as well. Thanks for the feedback!

  10. @Miguel:

    Instead of thinking of porting Banshee to Windows, you’d better think of porting GStreamer to it.

  11. bla,

    Did you even read the previous replies? I really want to stop talking about Helix vs GStreamer – this is not the place, that was not the intended topic for the post, and I’m not removing Helix support. However, you should know that Helix is used in almost every mobile consumer device that does multimedia, and in lots of other software and dedicated media devices. Helix is king in the hardware and mobile world – and I hate to spill the beans – but that’s a far larger market than GNOME is at the present time. I could walk up to random strangers in the street and chances are their phone has Helix on it. I don’t think I can say the same for GStreamer.

    My point is that Helix is not obsolete, by any means. The framework that powers Banshee’s Helix backend is the same framework that can play 3GP video and other content on your phone – and it’s all still open source, codecs aside.

    I’m not advocating for or against any specific framework – I’m simply saying that the DESIGN of the PROFILES framework need not be tied to one AUDIO framework. So, stop trolling and get your facts straight.

    And don’t also get me wrong about GStreamer – I love it, I’m not knocking it, and the community around it is fantastic.

  12. Haven’t thought about it but speex could be a nice thing to have for re-encoding podcast shows when transferring to a DAP!

    The same perhaps also justifies a “mono” setting in mp3, when you transfer songs to your mobile phone for example, where only size matters?

    @bla: stop bashing. everyone is free to do what he/she wants and I’d rather have the *option* to use helix in banshee than a (for me, as I don’t have a windows license) totally useless windows port. But I’m sure some day there will be a kick-ass banshee port for windows. You could actually help getting there sooner =)

  13. @bla:

    From your disparaging comments about Phonon and Helix, I get the impression that to you this is a matter of winners and losers.

    I do not see this as a zero sum game. What works for you will not work for me. For instance, neither Fedora nor Ubuntu are sold with licenses to mp3 encoding. People do it on “their own” by installing some third party software, which in some deployments is not permitted.

    I have never been a fan of the idea that I should use technologies because they have the largest market share, or that I should adopt one thing over another on that principle. If that were the case, we should all be speaking Chinese.

    Zealotry is hardly a technique for argumentation.


  14. Are you using MONO in your work on the audio profiles? If so, I would prefer not to see the gnome-audio-profiles-properties contaminated with a mono dependency.

  15. @aaron:

    maybe helix still has some market share on mobile phones. But does Banshee run on a mobile phone? No! Helix is dead on free desktops, such as GNOME.

  16. Brian: yes, and I’m not sure why it matters. Details of that can all be worked out later. Right now it’s not ready for use outside Banshee, but Mono is, like it or not, a blessed dependency in GNOME, as is Gtk#.

    bla: you’re trying to miss the point so you can appear to have some merit, but you’ve failed. Stop posting.

  17. Hi,

    I think I’ve found a bug in the gstreamer pipeline generation, but my assumptions are only based on the screenshot in your blog :)
    The LAME VBR quality is set to 8, but the generated expression contains
    … vbr-quality=1 …

  18. gst-lame’s vbr-quality: 0 – Best
    lame: -q 0: use slowest & best possible version of all algorithms
    banshee: “best” or “9” results in “0” which is correct. it is only slightly confusing and perhaps I would flip the numbers so that “best” is actually also “0” there, it’s better for people who actually know the encoder settings

  19. I flipped them intentionally. If you look at the whole XML definition, you’ll see the variable range definition and then note the S-Expression does (- 9 $vbr_quality). I think having the UI show worst = 0, best = 9 makes more sense from a usability standpoint. Most people are in fact not going to be familiar with the inner workings and expectations of LAME. However, I was waiting to see if someone caught that ;-).

  20. that is really nice, audio profiles have been the ugliest corner of gnome for a long time.

    however, wouldn’t it be more sensible to expose that horrible S-expr text box only AFTER a user has clicked the “advanced” button?


  21. Hi Aaron, this is absolutely great and I’m looking forward to seeing it sometime soon in 0.11.4?

    Flipping the quality-indicators for LAME’s VBR confused the hell out of me and will do so with every other seasoned LAME-user, I’m afraid. :(

    I’m sorry about this user ‘bla’ stirring some little flames on the blog. Those comments about Xing and Helix are completely unjustified; having Helix in openSUSE is an excellent solution to the MP3-dilemma of free distributions and Xing especially has evolved into one fine and very fast MP3-codec since Realnetworks acquired it. bla, read that up on the premier audiocoding site,, if you don’t take my word for it:

  22. Now I see which Xing encoder this is. It’s the one that Real acquired and improved heavily through the Helix community. It’s fast and from what I read the quality seems to be up to par.

    For the Lame -V parameter I would say it will generate more confusion to invert the scale. The maximal quality is the rightmost slider position, which is clearly labeled as such. People looking up guides with V-numbers and experts will not appreciate the switch.

  23. Please no controls for the lossless codecs. It’s bad enough having an advanced button there for those three people who need to transcode their files to mono for storage. That an option exists in the encoder does not necessarily mean that it needs a slider in the UI, hidden by default or not.

    That said, this kicks thorough ass. The lame encoder should probably be using its presets rather than going through the slider though. And the text at the top that says “good trade-off”? That’s probably best left on the main “choose an encoder” dialogue, where it’s most pertinent.

    – Chris

  24. I actually noticed the inverted scale when you first posted the vid in the forums. Why even display the number?

  25. If you want to replace the gnome-media profiles, you’ll really need something to better suit _our_ needs. Don’t forget, Banshee is here b/c of Helix, but the rest of GNOME really barely cares. Something that would really make me like this would be if it did things like video encoding also. There’s a gstreamer bug about that, I’m sure you can find that.

    Anyway, as for your goals, I get the impression that you still need to know “pipelines” if you want to use this stuff. Is it really all that useful then? Don’t get me wrong, it looks fancy, but extendible and fancy aren’t necessarily the same thing. I see some sliders, can I (from within the profile, or better, within the element) say which properties should be exposed as sliders? That would be really neat. If it’s all hardcoded (“mp3/lame bitrate” is a slider), then it’s really not all that useful except for some limited cases.

    Using XML instead of gconf values is a good idea btw, gnome-media’s profiles should’ve done that too. Too late now. :-). Nice work so far, let’s see where this goes. If you want to integrate with gnome-media, maybe try to use gnome-media’s profiles as a “fallback”, so that changes there also get propagated to your application. That way, you get your own fancy stuff, while still being a good part of the GNOME desktop.

  26. Aaron: True, Mono is blessed for GNOME in new modules, but existing GNOME modules need to go through the proposal process again if they gain a new dependency on GTK# and/or Mono, which would be the case for Gnome-Media.

  27. Aaron, the timing of your post is fortuitous, because after being presented with a strange error message when inserting a music CD, I have been trying to understand these strange “audio profile” thingies.

    After watching this video, I *think* I understand that they’re collections of settings with which to encode saved audio files. As such, most people won’t need to bother with them, rather choosing from thoughtful presets such as “Ogg Vorbis (Compact)” and “Ogg Vorbis (High Quality)”.

    Therein lies a couple of small problems. First, while the interface you’ve implemented seems very nifty, it appears to allow only one set of settings per encoder. Is that correct? Whereas if I’m importing a symphony I might want higher-quality settings than if I’m importing dozens of language-teaching songs, even if I’m using Vorbis for both. And second, the interface offers no hints on *why* I’d want to choose one codec (or, in the case of MP3, encoder!) over the others.

    I think both these problems could be solved by changing the “Output format:” menu to a menu of presets, rather than of codecs. The last item in this menu (after a separator) would be “Custom…” (replacing the “Edit…” button). The resulting dialog would let you set up a new profile (based on the most-recently-chosen profile), including its name (auto-generated from the other settings until edited manually), format (with room in the listbox for a brief description of what each format is best suited for), and format-specific options. (I overuse parentheses, can you tell?)

    Meanwhile, a few minor layout points:

    * Labels for option menus, sliders, and text fields, among others, should end in a colon.

    * Since most of the formats are lossy, you can declutter the menu by removing “(Lossy)” from their names. The presence of “(Lossless)” for the lossless ones is enough to convey that the others are lossy.

    * The baseline of the “VBR Quality:” label should line up with the baseline of the text at either end of the slider. (If this is impossible in GTK, have you reported it as a GTK bug?)

    * It looks unstable for a window to resize in *both* directions because of the choice made in a control (in this case, the “VBR Mode” and “Variable Bitrate” menus). I suggest widening the dialog so that its width doesn’t need to change during use.

    * If (a) a given option menu will only ever contain two options, (b) you’re not aiming for consistency with a more populous menu elsewhere, and (c) you have enough room, use radio buttons instead.

    * It looks hierarchically wrong for the visibility of controls with bold labels (e.g. “Channels”, “Sample Rate”) to be determined by a disclosure control with a non-bold label (“Advanced”). In these cases, the dialogs have so few controls, and if someone’s got this far in the first place they’re sufficiently brave, that you might as well just show all the controls all the time. No need for a disclosure control at all.

    Thanks muchly for helping make Gnome cooler, and feel free to e-mail me if you want any design help. :-)

  28. Aaron, you forgot one thing… Gnome has an (unofficial) policy of things like settings and whatnot instantly applying. If the user has the advanced mode open, obviously they want to see things like that. Have the results window update in realtime every time the user changes a setting. That can’t be enough of a CPU drain to matter and it just adds more polish to banshee.

    Are there any plans to write this in C and replace the current settings dialog with this one? It is so much better and makes more sense. Nice work!

  29. Ronald: having the same for video would be cool, yeah… but I think that could be done just the same. As for the auto generating…. I don’t think this should be done because you don’t really want every property exposed in the GUI, do you?

  30. Michael: probably not, but you also don’t want to hardcode it in the application. What if I install a new plugin/encoder that it doesn’t know? How will I change my quality-setting property? (Note that this is much better than what we have right now…)

    As for re-use outside Banshee, it should really be done in C. I would like to see such stuff in sound-juicer and gnome-sound-recorder instead of the current dialog.

  31. Well I think from the 100 or something properties a gst element provides you pick the 10 most likely to be used (from which you perhaps even only need 3 but a few more commonly used “outsiders” are fine) and just build a GUI using those.

    If someone can provide the encoder plugin then surely it would be no problem to also update gnome-media or whatever for this plugin. This would be no more “hard coding” than what we have today.

    I don’t really care for the language as this is generally the type of application that has a “can even be done in python” blessing ;)

  32. Jeff, the S-expression field and button are *for debugging purposes*. Yes, the button could be replaced by making the field auto-update, but really, that’s probably not a good use of Aaron’s time since people who don’t read this page will almost certainly never see it. :-D

    (BTW, the winkies in my previous comment are a bug in this site’s auto-formatting. A double-quote is being converted to an entity, and the “;” character ending that entity is then wrongly being interpreted as the eyes of a smiley.)

  33. @Ronald McDonald

    Banshee is not “here b/c of Helix”.

    Banshee is here because Aaron wanted to write a media player with Mono and Gtk# (it used to be called Sonance). Sonance was a great application way before Novell got interested in it.

    Novell got interested in Sonance at the time because it could be X11 licensed (important to our customers) and because we felt that innovation using Mono was going to happen faster than it would if we depended on a C or C++ code base.

    We turned out to have made the right decision on both counts, and Banshee has only gotten better over time.


  34. I am a Jokosher dev and I have talked to others about doing something like this, but specifically for Gstreamer. Currently we have lame, vorbis, flac and wave hardcoded in Jokosher as well as pulling the profiles in from Gnome Audio Profiles. Of course there are no Python bindings for G-A-P but talking through gconf works well enough for now.

    G-A-P doesn’t have any options because that would require auto-generated GUIs, which usually look like crap. I would just like to say that this is perhaps the nicest auto-generated GUI I have ever seen (even though it’s not fully auto-generated because you have to create an XML file describing it).

    If you made this available either as part of GNOME (as you suggested) or as an optional package, and stored the pipelines in gconf like G-A-P does, I would love to make Jokosher support it. Keep up the good work Aaron!

  35. About LAME’s quality-scale: Perhaps it would be good to give the values in a fashion that many of its users are familiar with, i.e. “V 0” – “V 9”

  36. Great stuff Aaron! The audio profiles have long been a half done project so it’s nice to see some serious progress in the area. Hopefully it can evolve into a generic audio profile system that others, like Jokosher, can also hook into.

  37. This is the kind of usability love GNOME must continue to get. What needs to be done to make this standard for GNOME?

  38. Nice work Aaron – it looks awesome. It’s exactly the sort of love that the audio-profiles UI has needed for a long time now.

  39. This is totally great. :-)

    I just ran into this profile dialog a couple of days ago in Sound Juicer, and was, well “surprised”.


  40. hi Aaron,

    looks great! I am the buzztard developer and have the same issues with the gnome-audio-profiles as the jokosher guys. The things I had in mind were:
    * why are they in gconf (they aren’t settings)
    * which gobject properties in the pipeline should be settings
    * figure out which profiles can be used (element availability check)

    This seems to match the things you fixed. Some questions still:
    * why do you include the GParamSpec limits in the xml profile (like min max value, or enum values)?
    * any idea of how to get profiles ready for i18n?
    * for the element availability check, it might be a good idea to show profiles with missing elements as grayed items and use the libgimmi-code approach once it’s established

    I’d like to see this replacing the current audio profiles once, as generic media encoding profiles.

    Stefan / ensonic

  41. The current gnome-audio-profiles already does the availability check (2.17.2). A disadvantage of being framework-agnostic is that you can no longer do that, unless you either want to link to all frameworks or want to resort to ugly hacks like calling gst-inspect on the commandline.

    Doesn’t xml have provision for internationalization (see e.g gconf schema files, which are essentially xml)?

    My idea for “which properties should be exposed” was to actually introduce a private GstParamSpecFlag (like readable/writable) flag for gstelements.

  42. @Ronald,

    I was also thinking about another GParamSpecFlag, we already have ‘Controllable’ which flags run-time changeable properties, likewise we could have ‘Editable’ which marks properties that make sense to be set by the user initially (and remain that way for the whole processing).

Comments are closed.