This Time Self-Hosted

Bundling libraries for despair and insecurity

When I started my work reporting bundled libraries almost a year ago, my idea had a lot to do with sharing code, and only a little to do with the security issues related to bundled libraries. I had of course first-hand experience with the problem, since xine-lib had (and still in part has) a lot of bundled libraries. When I took over its maintainership in Gentoo, it was largely breaching policy, and the number of issues I had with that was huge. With time, and coordination with upstream (to the point of me becoming upstream), the issues were addressed, and nowadays most of xine-lib's bundled libraries are ignored in favour of the system copies (where possible; some were modified so heavily that the system copy is not usable, which is something we're still fighting with). The 1.2 branch of xine-lib no longer has an FFmpeg copy at all, always using the system copy (or, if needed, a properly-built static copy).

But lately I have started to see that what is obvious to me about the problems with bundled copies of libraries is not obvious to all developers, and even less obvious to the “power users” who proxy-maintain ebuilds and just want them to work, rather than to comply with Gentoo policies and standards. This is why I think that sunrise and other overlays should always be scrutinised carefully before being added to a system.

At any rate, in this post I'm going to explain why you should not use bundled internal copies of libraries in packages added to Gentoo, and why in particular such packages should not be deemed stable at all.

The first issue to discuss is why upstreams bundle libraries, since knowing the reasoning behind it is often helpful in deciding whether it makes sense to keep them or not. The first, most obvious answer is: to reduce dependencies. For a long time this was the major reason behind xine-lib's use of internally bundled libraries. As it turns out, with time this reason became moot: distributions started packaging xine-lib directly, reducing the number of users wishing to build it from sources; and even those wanting to build xine-lib from sources would, most of the time, find all the needed libraries in their distribution. When this is the sole reason for bundling, upstream should be quite open to adding a configure option (or something similar) to use the system copy, optionally or by default, with fallback to the bundled copy.
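As a sketch of that approach (the library, version bound, and directory names here are hypothetical), a configure-time check can prefer the system copy found through pkg-config and fall back to the bundled snapshot only when no suitable system copy exists:

```shell
#!/bin/sh
# Hypothetical configure-time check: prefer the system copy of a library
# (here zlib, located through pkg-config) and fall back to the bundled
# snapshot only when no suitable system copy is available.
if pkg-config --exists 'zlib >= 1.2.3' 2>/dev/null; then
    ZLIB_CFLAGS=$(pkg-config --cflags zlib)
    ZLIB_LIBS=$(pkg-config --libs zlib)
    ZLIB_SOURCE="system"
else
    ZLIB_CFLAGS="-Icontrib/zlib"
    ZLIB_LIBS="contrib/zlib/libz.a"
    ZLIB_SOURCE="bundled"
fi
echo "zlib: using $ZLIB_SOURCE copy"
```

A real autotools project would express the same logic with PKG_CHECK_MODULES plus an AC_ARG_WITH switch so the user can force either copy, but the decision tree is the same.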

A second reason might be that the library's API is unstable; this is probably the first reason why FFmpeg is often bundled in software rather than used from the system copy. While this concern makes more sense than the previous one, it's still mostly a moot point, since it really just requires fixing the issue at the source: get the original project to maintain API compatibility, to provide an API compatibility layer, or to finalise its API. Even when that cannot be helped, because the API is in flux, maintained software need not fear an API break; it might be a bit of a hassle, but in general it's feasible to keep the software and the library in sync.

The third, and most worrisome, case is when the library is modified, slightly or heavily, after bundling; here using the system copy might be quite a burden, because it will lack the specific changes made by the project. In this case there is a lot of work involved, sometimes more work than distributions can take on, and it requires coordination between the library's upstream and the bundling project's upstream. This is what happened with xine-lib and FFmpeg: the copy in the xine-lib sources was heavily modified to suit both the build system and the interface requirements of xine, which also made it very difficult to update the internal snapshot. All the needed interface changes were then pushed upstream to FFmpeg, and the build-system changes were made moot by embedding FFmpeg's own build system (with the needed changes pushed upstream) in autotools; after that, FFmpeg was removed entirely from xine-lib's sources.

Now, on the other hand, the disadvantages of bundled libraries are probably worse: code duplication means there is more code to process (both at build time, to compile it, and at load time, to keep it in memory), more space used by the binaries, and duplicated bugs that need to be fixed twice. Many times in xine-lib, problems decoding something with FFmpeg were solved by just using a newer FFmpeg; why keep an internal copy, then?

The most important issue, though, is security: when a vulnerability is found in a library like zlib, fixing the library alone is not enough. While that fixes the majority of the software on a system, it does nothing for the programs that bundle their own copy, both closed-source and open-source. For instance, take dzip: it uses an ancient internal version of zlib; if somebody knows the format well enough, it's far from impossible to craft a dzip file containing a deflated stream that can execute malicious code.
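One way to see how a fixed system zlib leaves such programs behind: zlib embeds a version banner ("inflate X.Y.Z Copyright …") in every binary it is statically linked into, so a quick heuristic can reveal which zlib version a binary really carries. The helper name below is made up for illustration:

```shell
#!/bin/sh
# Heuristic check for a bundled zlib: zlib's version banner survives static
# linking, so searching a binary for it reveals the version it actually
# carries, which may lag far behind the fixed system copy.
# The function name is hypothetical; the banner pattern is zlib's own.
bundled_zlib_version() {
    LC_ALL=C grep -ao 'inflate [0-9][0-9.]* Copyright' "$1" 2>/dev/null \
        | head -n 1
}
```

Running something like `bundled_zlib_version /usr/bin/dzip` on a binary with a bundled copy would keep reporting the old version long after the system zlib was updated.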

For this latter issue alone, I'd say that any software bundling libraries is not good enough to go stable on its own. Of course, sometimes one has to bend the rules because of past mistakes: for instance, even though Ruby bundles stuff, we cannot stop newer versions from going stable, since that problem is not a regression. But it should stop other broken software from entering Portage, or at least the stable tree.

But it's not just security; subtle bugs can be quite a problem too. For instance, you might remember all Java applications failing when libX11 was built with XCB support some time ago. The failure was due to some stricter checks in libxcb compared to what libX11 had been checking before, but the source of the problem was Xinerama: Sun bundled an internal copy of the libXinerama sources in the JRE sources, and even though libXinerama was later fixed regarding that particular issue (the crash with XCB), the copy in the JRE was never updated before the issue became a nuisance for users.

A very similar issue, also involving X11 (just by chance, it’s not that all the issues involve X11) is this particular bug in Xorg that is triggered when launching SDL-based applications, because libSDL bundles ancient versions of X11 libraries.

As I said earlier, unbundling is rarely easy; there are subtle issues to check, for instance whether there are any changes at all beyond possible build-system-related ones (made, for instance, to avoid using a full-fledged ./configure), but altogether it's usually not impossible. Of course one has to stop thinking “Oh my, what if the library changes and the software breaks?”, otherwise the task becomes impossible. Software changes, software bitrots. Bundling internal copies of libraries is not going to stop that. When the compiler gets upgraded, your software is going to break, and you should fix it; if the C library cleans up its includes, your software might not compile or might misbehave: deal with it. Sometimes the bundled libraries implement protocols and formats that need to interoperate with some other piece of software; if that changes, the bundled libraries are just going to break further.
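Before attempting an unbundling, it helps to measure how far the bundled copy has drifted from the pristine upstream release it claims to be. A minimal sketch, where the function name and the excluded build-system patterns are illustrative assumptions:

```shell
#!/bin/sh
# Sketch: count the diff lines between a pristine upstream tree ($1) and a
# bundled copy ($2), ignoring build-system files. A result of 0 suggests
# unbundling is mostly mechanical; a large one means the copy was modified
# and upstream coordination is needed (as with xine-lib and FFmpeg).
bundled_drift() {
    diff -ru --exclude='Makefile*' --exclude='configure*' "$1" "$2" \
        | wc -l
}
```

For example, `bundled_drift zlib-1.2.3/ package/src/zlib/` printing 0 would mean only build files differ between the two trees.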

Your software is rarely special enough that you can be exempted from following the rules. Even OpenOffice is using lots of system libraries nowadays!

Bundling and modifying libraries is just like forking a project, and forking is not always the best approach; sometimes, when a project's upstream is dead, forking is your only hope, but even in those cases there are nice ways to bundle libraries. If you look at the way nmap bundles libdnet, you can see that they not only document all the changes they made to the library, but also provide split-out, commented patches for the various changes, making it possible to adapt the library to their needs.

For proprietary software packages, of course, the matter is different, since you cannot usually unbundle the libraries yourself; but it's a good idea to ask upstream nicely whether they can use system copies instead of internal ones. Mind you, some might be happy to fix their packages so they are no longer vulnerable. Although I guess lots of them might actually prefer to keep things as they are, since fixing them is a cost. One more reason not to trust them, to me.

So, bottom line: if you're working on an ebuild for a new piece of software to submit for Portage addition, please check carefully whether the software bundles libraries, and if it does, don't let it enter Portage that way. And if you're a developer who wants to push some ebuild to the tree, also remember to ensure that it complies with our policies and doesn't bundle libraries.
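A crude first pass when reviewing a new package is simply to look inside the source tree for directory names that commonly indicate bundled third-party libraries. The helper name and the list of library names below are just an illustrative sample, not an official QA check:

```shell
#!/bin/sh
# Heuristic: list directories in a source tree ($1) whose names commonly
# indicate a bundled third-party library. Any hit deserves a closer look
# (and a diff against the pristine upstream release) before the ebuild
# enters the tree. The name list is a small illustrative sample.
find_bundled_dirs() {
    find "$1" -type d \( -name zlib -o -name libpng -o -name expat \
        -o -name ffmpeg -o -name pcre \) -print
}
```

Running `find_bundled_dirs ./some-package-1.0/` over an unpacked tarball gives a starting list of suspects; absence of hits proves nothing, since bundled code can live under any directory name.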

Comments 8
  1. I’ve always enjoyed your posts, they tend to be much more technical than most other blogs, even developers’ blogs. Not many people put thought into the low level things you are working on either. Maybe just add a tag or category for the deep tech stuff, so people can filter those out if they just want overviews, news, or philosophy :).

  2. Full ack. Libraries in general are annoying, and lots of subtle bugs can arise with different versions. What worries me the most is proprietary software. It isn’t just libraries either, take for instance VMWare Server 2.0 which bundles its own Java servlet container. Fixes to proprietary software stand little chance of ever getting pulled upstream to reduce the Gentoo dev’s workload either.

  3. I’d prefer if the whole of your posts is available on feeds and aggregations. But then I enjoy the technical stuff…

  4. I second Kevin’s post. The details in your blog are very interesting. But I prefer to read aggregations by RSS-feed. So it’s perhaps a better idea to mark your posts as suggested. Otherwise the whole RSS-stuff becomes useless ;-). Thank you for your work and for writing about it!

  5. I’d like to agree with what the others have said. I enjoy reading the rich and gory details included in your posts. I do find it somewhat inconvenient to have to open another tab in order to continue reading now. That aside, I think that tagging would accomplish the goal of allowing some to filter out the longer articles. Or maybe a server-side feed that excluded the specific tag for the most flexibility? Anyway, thanks as well for the work that you do, and the time you put into your articles. I always enjoy reading them! (Note, Minor Annoyance: I ended up rewriting this comment because I clicked ‘Preview!’ with JavaScript disabled. It seemed to try to degrade to a URL like …/2009/01/02/bundling-…/comments/new/preview but that wasn’t so functional.)

  6. Okay I decided to abandon the test with extended content, I’ll set the feed to publish the full content once again, and I’ll think about adding a FeedBurner-powered feed excluding the Technical category (that is there already). Unfortunately it’s difficult to set up feeds to _exclude_ tags, so while the Technical category feed already allows to exclude non-technical articles, I’ll end up using FeedBurner for that.

  7. As someone who might be described as a budding ‘power user’… when writing my first from-scratch ebuild the other day I did think of your anti-bundling campaign. Frankly, it is not a difficult idea to understand and appreciate the benefits of. Perhaps it is just too easy for people to have higher priorities until they get bitten by this issue? If a package includes a version of a library which is known to be exploitable, isn’t that call for a GLSA? I don’t know if you’re already doing this, but that might make people take it a little more seriously? Going back to my ebuild, after figuring out how to write the ebuild in the first place, using the options provided in its configure.in I made it use external libraries. However, I just didn’t have the time or energy to go digging for symbols expressed by other included libraries – I feel like I have failed you Diego 😉

  8. Re: I’ve been told quite a few times that my posts tend to be too long,You post well-written material. If it is long, great! Why do people want to complain about it?
