In the land of smartcards

Even though I did post that I wanted to get onto hardware signatures, I ended up getting a USB smartcard reader for a job that requires me to deal with some kind of smartcards; I cannot go much further on the matter right now though, so I’ll skip over most of the notes here.

Now, since I got the reader, but not yet most of the specifics I need to actually go on with the job, I’ve been playing with actually getting the reader to work with my system. Interestingly enough, as usual, the first problem is very Gentoo-specific: the init script does not work properly, and I’m now working on fixing that up.

But then the problem is to actually find a smart card to test with; in my haste I forgot about getting at least one or two smartcards to play with when I ordered the device, and now it’d be stupidly expensive to order them. Of course I’ll go around this time and get myself the Italian electronic ID card (CIE), but even that does not come cheap (€25, and a full morning wasted), and I cannot just do that right now.

So I went around to see what I had at home with a smartcard chip; after discarding my old, expired MasterCard (even though I thought about it before, I was warned against trying that), I decided to try with a GSM SIM card, which I had lying around (I had to get a new one to switch my current phone plan to a business subscriber plan; before I was using a consumer pre-paid plan).

Now, although I was able to test that the reader detects and initialises the card correctly (although it is not in the pcsc-tools database!), I wanted to see if it was actually possible to access it fully; luckily the page of a Gentoo user sent me to some software, written by an Italian programmer, that should do just that: monosim which, as you’d expect, is written in C# and Mono, which is good given I’m currently doing the same for another customer of mine.

Unfortunately, it seems like the mono problem comes up again: upstream never considered the fact that the libpcsclite.so ABI changes between different architectures, even on the same operating system. Not that I find that a good idea in general, since I always try to stick with properly-sized parameters (thanks stdint.h), but it happens, and we should get ready to actually resolve the problems when they appear.
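To make the point about properly-sized parameters a bit more concrete, here is a minimal sketch. The two record types and the to_fixed() helper are hypothetical, made up for this illustration; they are not taken from pcsc-lite or monosim. The assumption is simply a C interface that exposes the native unsigned long, whose size differs between 32-bit and 64-bit Linux, against one that uses the fixed-width types from stdint.h.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical record using the platform's native 'unsigned long'.
// Its layout differs between ILP32 (4-byte long) and LP64 (8-byte long),
// so a binding that hardcodes one layout breaks on the other.
struct NativeRecord {
    unsigned long length;
    unsigned long protocol;
};

// The same record with fixed-width fields from <cstdint> (stdint.h in C).
struct FixedRecord {
    std::uint32_t length;
    std::uint32_t protocol;
};

static_assert(sizeof(FixedRecord) == 8, "two 32-bit fields, no padding expected");

// Narrow a native value to the fixed-width field, checking that it fits.
inline std::uint32_t to_fixed(unsigned long value) {
    assert(value <= std::numeric_limits<std::uint32_t>::max());
    return static_cast<std::uint32_t>(value);
}
```

A binding written against FixedRecord can describe the layout once; one written against NativeRecord has to know which architecture it is running on.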

Now, I really don’t even want to get started with all the mess that RMS has uncovered lately; just like I did a few years back, I replace the idealistic problems from Stallman with technical limitations, see for instance my post about “the java crap” (which – by the way – hasn’t finished being a problem, outlasting the idealistic problems).

And I’m still waiting for Berkeley DB to finish its testsuite, after more than twelve (12!) hours, on an eight core system, with parallel processes (I get five TCL processes to hog up the same amount of cores at almost any time). I don’t even want to think how long it would take on a single-core system. Once that’s done, I can turn the system down for some extraordinary maintenance.

How _not_ to fix GCC 4.4 bugs

So with GCC 4.4 and glibc 2.10, C++ support went, once again, stricter. Now, leaving aside all my possible comments on a language for which there is still an absolute vacuum of actual implementations years after publishing, let’s just look at what the problem is this time around.

The main issue, in which glibc 2.10 is also involved, is that the C-style string functions now return pointers with the same constant modifier as they are given; so if you look for a character in a constant string, it’ll return a constant string pointer, and vice-versa if you give it a variable string, it’ll return you a variable string pointer (more here if you’re bored).

This means that the following code will build fine if only one of GCC 4.4 or glibc 2.10 is in use, but will fail when both of those (or more recent) versions are used together:

#include <cstring>

int main() {
  char *foo = strchr("ciao", 'a');
}

% g++-4.3.3 test-const.cc -o test-const
% g++-4.4.0 test-const.cc -o test-const
test-const.cc: In function ‘int main()’:
test-const.cc:4: error: invalid conversion from ‘const char*’ to ‘char*’

The error from the compiler is quite real: you’re mixing up different types of variables. In this particular instance you’re not doing anything wrong with it, but take, for instance, the following code rather than the one shown earlier:

#include <cstring>

int main() {
  char *foo = strchr("ciao", 'a');
  *foo = 'e';
}

This code is trying to change something that it shouldn’t; in particular, given the pointer is now pointing inside a literal, which is then inside the .rodata section of the ELF and in a shared, non-writeable area of memory, when executed this will cause a segmentation fault (a crash, for those not used to this terminology). But it can get less obvious and more sneaky, since instead of a literal, you could have a parameter, declared constant.

Of course, whenever you have a variable or a parameter that is declared constant, but is not actually residing in read-only memory areas (like .rodata), you’re just a cast away from having it non-constant. But then you’d be seeing the cast, and that would be like a yellow light sign. On the other hand, with the old method of just having the function cast away the constant modifier, it was less obvious at first sight.
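To show where the cast used to hide, here is a rough sketch of the two styles of declaration. The find_char functions are hypothetical, simplified stand-ins for strchr (they do not handle searching for the terminating NUL, for instance); they are written out only so the signatures are visible, and they are not the glibc implementation.

```cpp
#include <cassert>

// Hypothetical stand-ins for strchr, written out only to show the signatures.

// Old C-style signature: takes a const pointer, hands back a non-const one.
// The constant modifier is dropped inside the function, out of the caller's sight.
inline char *find_char_legacy(const char *s, char c) {
    assert(s != nullptr);
    for (; *s != '\0'; ++s)
        if (*s == c)
            return const_cast<char *>(s);  // the hidden cast
    return nullptr;
}

// Const-preserving pair: the result is exactly as constant as the argument.
inline const char *find_char(const char *s, char c) {
    assert(s != nullptr);
    for (; *s != '\0'; ++s)
        if (*s == c)
            return s;
    return nullptr;
}

inline char *find_char(char *s, char c) {
    // Safe: the caller handed in a non-const pointer to begin with.
    return const_cast<char *>(find_char(static_cast<const char *>(s), c));
}
```

With the overloaded pair, a caller holding a const char * parameter gets a const char * back, so any attempt to write through it needs a cast at the call site, where a reviewer can see it.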

Okay, so we know what the problem is and why the error was introduced; let’s get down to business with how not to fix it. I have seen more than a few patches out there that, to make software build on GCC 4.4/glibc 2.10, simply cast away the constant modifier, C-style:

#include <cstring>

int main() {
  char *foo = (char*)strchr("ciao", 'a');
}

This is wrong. You should not do that. Why? Because you’re hiding a problem; in more than half the cases, the solution is simply to change the declaration of the variable:

#include <cstring>

int main() {
  const char *foo = strchr("ciao", 'a');
}

Of course, this does not cover all the cases; there are a few where the pointer is actually used to change memory areas. In those cases, since fixing the issue might be overkill, I’d highly suggest a different syntax:

#include <cstring>

int main() {
  char *foo = const_cast<char*>(strchr("ciao", 'a'));
  *foo = 'e';
}

This uses the explicit const_cast keyword from C++, and the very fact that it’s an eyesore in the code should be enough to scream “Workaround!”, which is what it is in truth.
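One caveat worth spelling out: the snippet above only demonstrates the syntax, since it still writes into a literal and would crash just like the earlier example. The workaround is sound only when the memory behind the constant pointer is actually writable. A minimal sketch of such a case follows; replace_first is a hypothetical helper, and the assumption is that some legacy interface forces the const on the parameter while the caller guarantees a writable buffer.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: replaces the first occurrence of 'from' with 'to'
// in a buffer the caller owns and is allowed to modify.
// The pointer arrives as const only because of an (assumed) legacy interface.
inline bool replace_first(const char *text, char from, char to) {
    assert(text != nullptr);
    const char *hit = std::strchr(text, from);
    if (hit == nullptr)
        return false;
    // Workaround: only valid because the caller guarantees writable storage.
    *const_cast<char *>(hit) = to;
    return true;
}
```

Called on a local array such as char buf[] = "ciao"; this turns the buffer into "cieo"; called on the literal "ciao" itself it would still be undefined behaviour, const_cast or not.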

So please, don’t just cast it away C-style, give it a bit more thought, please!

Does as needed link make software faster?

Christian Ruppert (idl0r) asked me today whether --as-needed makes software faster or smaller. I guess this is one of the most confusing points about --as-needed; focus of tons of hearsay, and with different assertions all around the place. So I’ll do my best to explain this once and for all.

In perfect conditions, that is, if --as-needed were not doing anything at all, then it wouldn’t be changing anything. The flag is not magical; it does not optimise the output at all. You would get the exact same result if libtool wasn’t pushing all crap down the line, and if all the build systems only requested the correct dependencies.

When it does matter is when overlink is present. To understand what the term overlink refers to, check my old post that explains a bit what --as-needed does, and shows the overlink case, the perfect link case, and what really happens.

Now, of course you’ll find reports of users saying that --as-needed makes software faster or smaller. Is this true, or false? It’s not easy to give one straight answer because it depends on what is happening with and without the flag. If without the flag there are libraries loaded, directly and indirectly, that are not used (neither directly nor indirectly), then with the flag the process spawned from the executable will have fewer libraries loaded in its address space, and will thus be both faster to load (no need to read, map in memory, and relocate those libraries) and smaller in memory (some libraries are “free” in memory, but most will have relocations, especially if immediate bindings (“now” bindings) are used, like happens for setuid executables).

Indeed, the biggest improvements you can have when comparing the with and without cases are in a system, or in software, that uses immediate bindings. In that case, all the symbols from shared objects are bound at load time, rather than lazily at runtime, so the startup time for the processes is cut down noticeably. This does not only involve hardened systems, or setuid binaries, but also applications using plugins, that may be requesting immediate bindings (to reject the plugin, rather than aborting at runtime, in case of missing symbols).
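For the plugin case, requesting immediate bindings boils down to passing RTLD_NOW to dlopen(). The following is a minimal sketch; the PluginLoad type and the load_plugin_now() name are made up for this example, and on older glibc versions it needs to be linked with -ldl.

```cpp
#include <dlfcn.h>
#include <cassert>
#include <string>

// Result of a load attempt; 'handle' is null when loading failed.
struct PluginLoad {
    void *handle;
    std::string error;
};

// Open a plugin with immediate ("now") binding: every symbol the plugin
// references must resolve here, or the load is rejected with a message.
inline PluginLoad load_plugin_now(const char *path) {
    assert(path != nullptr);
    PluginLoad result{nullptr, std::string()};
    result.handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (result.handle == nullptr) {
        const char *msg = dlerror();
        result.error = (msg != nullptr) ? msg : "unknown dlopen failure";
    }
    return result;
}
```

With RTLD_LAZY instead, a missing function symbol would only surface when the function is first called, which is exactly the abort at runtime that the host application is trying to avoid; the price is that every library the plugin drags in, needed or not, has its symbols resolved up front.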

Should websites do public service?

Today I finally put online my new website based on the fsws framework. While still not ready for release, right now it can generate in a single call (but with dual pass!) the whole site, the page sitemap (compliant with the specification) and even the robots.txt file (my reason to generate it with the rest of the site is that it keeps a pointer to the sitemap, and at the same time, you can ignore a whole subtree much more easily, by just setting parameters on the various pages).

The nice thing about fsws is in its very lightweight output: the whole site I wrote for my friend is less than 300k, and requires almost no server-side handling at all. The only thing that I’m forced to do is some playing with Apache’s mod_rewrite to change the content type of the pages, because Internet Explorer (who else?) fails to handle properly-served XHTML content (and asks to save the pages instead of opening them).

But together with this particular quirk, I also keep another piece of code, that works much like a web application, even if it’s self-contained inside the webserver configuration: a sanity check for the browser, based on the user agent, just like my antispam filter in this blog. It checks for both older browser versions and particular user agent signatures that indicate the presence of adware, spyware and viruses on the requesting user’s system.

When these signatures are identified, all the requests for actual pages are redirected to an error-like page that warns the user about the problem and asks him (or her) to update or change browser, or to install and use an antivirus. Now, since the site is entirely static and there is no user interaction with the server-side components besides the HTTP server itself, there is no real need for me to discard requests coming from unsafe clients, so my only reason to actually implement this type of code is public service.

I haven’t implemented the same trick on my website, yet. I’m still a bit conflicted about its usage. On one hand, applying it means that some internet users will be unable to even view my site, which, being also my professional site, might be a not so sound business move; on the other hand, if most of the sites out there (with the obvious exclusion of those providing tools like browsers and antivirus) were to refuse requests from IE6 and other old browsers, maybe their widespread use would be put to a stop.

And to what extent should I (we) be refusing requests? Having a minimum base version for any browser is a good start, but there is more to that. As I noticed, there are quite a few Windows spyware, adware, and trojans (especially dialers) that register themselves as part of the Internet Explorer user agent. I have no idea why they do that, maybe it’s to pay some kind of commission to the trojan’s authors, but we could be using this kind of information to notify the users about malware presence on their systems.

Unfortunately, there doesn’t seem to be a comprehensive database of user agent identifiers, although with a bit of searching over a sample you can easily find a lot of useful data; and also, since the whole check right now is handled through a simple redirection, I have no way to provide the user with any kind of feedback about what kind of malware is in their system. I guess that using some quick JavaScript inside the error page itself would be able to solve this.
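The real check lives in the webserver configuration, but the logic is simple enough to sketch separately. This is only an illustration of the decision being made, not the rules I actually deploy: the “ExampleAdware” token is a made-up placeholder rather than a real signature, and the minimum version is an assumed cut-off.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

enum class Verdict { Ok, OutdatedBrowser, SuspectedMalware };

// Classify a User-Agent string. The "ExampleAdware" token is a made-up
// placeholder, not a real signature; the minimum version is an assumption.
inline Verdict classify_user_agent(const std::string &ua) {
    static const char *const suspicious_tokens[] = {"ExampleAdware"};
    for (const char *token : suspicious_tokens)
        if (ua.find(token) != std::string::npos)
            return Verdict::SuspectedMalware;

    const std::string marker = "MSIE ";
    const std::string::size_type pos = ua.find(marker);
    if (pos != std::string::npos) {
        assert(pos + marker.size() <= ua.size());
        // atoi stops at the first non-digit, so "6.0; ..." yields 6.
        const int major = std::atoi(ua.c_str() + pos + marker.size());
        const int minimum_major = 7;  // assumed cut-off: refuse IE6 and older
        if (major < minimum_major)
            return Verdict::OutdatedBrowser;
    }
    return Verdict::Ok;
}
```

Returning a verdict rather than a plain yes/no is what would allow the error page to say why the request was refused, which a bare redirection cannot do.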

More XSL translated websites

I have written before that, over CMS- or Wiki-based websites, I prefer static websites, and that with a bit of magic with XSL and XML you can get results that look damn cool. I also have worked on the new xine site which is entirely static and generated from XML sources and libxslt.

When I wrote the xine website, I also reused some of the knowledge from my own website even though the two of them are pretty different in many aspects: my website used one XML file per page, with an index page, and a series of extra stylesheets that convert some even higher level structures into the mid-level blocks that are then translated to XHTML; the xine website used a single XML file with XInclude to merge in many fragments, with one single document for everything, similarly to what DocBook does.

Using the same framework, but made a bit more generic, I wrote the XSL framework (that I called locally “Flameeyes’s Static Website” or fsws for short) that is generating the website for a friend of mine, an independent movie director (which is hosted on vanguard too). I have chosen to go down this road because he needed something cheap, and he didn’t care much about interaction (there’s Facebook for that, mostly). In this framework I implemented some transformation code that implements part of the flickr REST API, and also a shorthand to blend in Youtube videos.

Now, I’m extending the same framework, keeping it abstract from the actual site usage, allowing different options for setting up the pages, to rewrite my own website with a cleaner structure. Unfortunately it’s not as easy as I thought, because while my original framework is extensible enough, and I was able to add in enough of my previous stylesheets’ fragments into it without changing it all over, there are a few things that I could probably share again between different sites without needing to recreate them each time, but that require me to make extensive changes.

I hope that once I’m done with the extension, I’ll be able to publish fsws as a standard framework for the creation of static websites; for now I’m going to extend it just locally, and for a selected number of friends, until I can easily say “Yes it works” – the first thing I’ll be doing then would be the xine website. But I’m sure that at least this kind of work is going to help me get a better understanding of XSLT that I can use for other purposes too.

Oh and in the meantime I’d like to pay credit to Arcsin whose templates I’ve been using both for my and others’ sites… I guess I know who I’ll be contacting if I need some specific layout.

PulseAudio and quirks

Seems like even my previous post about PulseAudio got one of the PA-bashers to think I’m a nuisance for their “cause”, whatever that is. For this reason I’d like to try to explain some of the quirks regarding PulseAudio, distributions, quirks and so on. Let’s call this a bit of a backstage analysis of what’s going on about Linux and audio, from somebody that has little vested interest in trying to roll the thing for PulseAudio.

The first problem to address relates to the comments that KDE people find PulseAudio a problem; I guess this has to be decomposed in a series of multiple problems: Lennart is a GTK/GNOME guy, so he obviously provided the original tools for GTK/GNOME. For a while I was interested in writing the equivalents for KDE (3) but I never had the time; now that I also moved to GNOME independently, I sincerely have no intention to write KDE tools for PA… but one has to wonder why nobody in KDE went out of his/her way to try doing this before. It’s not like it had to be part of KDE proper, it would have been okay to be an unofficial standalone application.

There is also another problem: most of the KDE guys who do see problems with PulseAudio are most likely using Phonon with xine-lib backend, configured to use the PulseAudio output plugin. Given I’m the one who wrote most of it originally, I can say that it sucks big time. Unfortunately I have had no time to work on that lately, I hope I might have that time in the future, but the two years I spent between hospitals seriously indebted me to the point I’m doing about 18 hours of work a day on average. For those who do want to use xine-lib with Pulse, I’d like to suggest the long route: set up the ALSA Pulse plugin, and then let xine just use ALSA.

There is of course another problem for KDE: while GNOME historically had no problem with forcing in dependencies that are Linux-specific or that work most of the time just on Linux (think about HAL adoption for instance), and relied on the actual vendors to do the eventual porting, KDE strives to work most of the time on multiple operating systems, including as of KDE 4 also Mac OS X and Windows. Now you might like this or not, but it’s their choice; and the problem is that while there is some kind of PulseAudio support for Windows, at least OS X support is in pretty bad shape (also on my radar).

For what concerns distribution support, it is true that Lennart usually just cares about Fedora; you have to accept this as part of the deal given Red Hat is – as far as I know at least, Lennart feel free to correct me if I’m wrong – the one vendor paying his bills. Now of course we’d all love to support all the distributions at the same time, but the only way that’s possible is if multiple maintainers do coordinate; I’ve been doing my best to pass all the patches upstream when I’ve added them to Gentoo, and I see Colin Guthrie from Mandriva doing the same. One thing I can “blame” Lennart for (and I told this to him before, too!) is not creating a Git branch with the cherry-picked patches he applies on the Fedora packaging for us to pick up… and the fact that he likes neither making releases nor giving others access to do so.

To be honest, there is little difference in this from what other projects do with distributions like Ubuntu when they are paid by Canonical. I think this is obvious, everybody looks at their little garden first. But this is not something that should concern us I guess. Gentoo has been quite out of the loop for what concerns PulseAudio, and I’m sorry, that was mostly my fault. I’m doing my best to let us update as soon as possible, but it’s not just that simple, as I already explained.

Then let me just say something about Lennart’s refusal to support system mode (which is available and advertised in Gentoo since PulseAudio entered the tree): I can’t blame him for that. First, his design for PulseAudio is based on providing something that works for the desktop use case. Something along the lines of Windows’s or OSX’s audio subsystems, neither of which provide anything akin to system mode. And indeed PulseAudio, by design, can handle the same situations, including multi-user setups with fast user switching. The fact that a system mode exists at all is due to the fact that I for one needed something like it on my setup, hacked it around for Gentoo, and then Lennart made my life easier implementing some extra bits on PulseAudio proper, but it was certainly not his idea.

What people complain about usually is the need for an X session (not strictly true, PulseAudio will start just fine in SSH — it would probably be possible to even fix it up so that it would tunnel audio just like you can tunnel X!), and the fact that audio does not continue to work when X exits (also not strictly true, if your audio player is running in screen it would be working just fine; it’s the fact that the media player crashes that makes your audio stop). Additionally people complain about the security problem of wanting to have all the processes to run under the same user, rather than allowing them to be on different users, like mpd.

Well, some complaints are valid, others are not: it is true that PulseAudio does not work in multi-seat-multi-user environments, at least not with a single audio device; it is unfortunate and I don’t know if it’ll ever work in that situation without a system mode. It is also true that running processes as different users for privileges separation does not work without system mode. But both these options are walking quite away from the desktop design that PulseAudio is implementing; sure they are valid use cases, just like embedded systems (Palm Pre uses PulseAudio if you didn’t notice that before), but they are not what Lennart is interested in himself; at the same time I don’t think he’d be stopping anyone from improving the system mode support for those, as long as it wouldn’t require the desktop setup to make compromises.

Because the idea is, as usual in any software design, that you have to make compromises; Lennart wants the best experience for what concerns desktop systems, and his compromise is that system mode is not part of his plan, and it shouldn’t be hindering him. At the same time, while he does get upset when people ask for support about it, and he wrote why it’s not supported, he hasn’t removed it (yet; if I was him, at this point I could have just removed it out of spite!). So colouring him as the master of evil does not seem the very best idea, especially as that makes me picture him in the part of Warren in the Trio, from Buffy’s season six.

Oh and a final note: it shouldn’t be a surprise that Lennart and Fedora don’t care about running mpd and other services as different users; there are probably quite a few reasons for this. I cannot speak for Fedora, given I’m not involved in it, but my suppositions are that firstly the ALSA dmix plugin is somewhat scary from a security point of view (for me too) because it uses shared memory between processes from different users to do the mixing, and secondly that Fedora does a lot to use SELinux even on standard desktops. This is much tighter than separating privileges with different users since it forces the processes to behave as instructed. Unfortunately on Gentoo the SELinux support seems to have gone for good, at least to me.

How to improve releases quality: working on PulseAudio 0.9.16

Just when I said that I was resuming my work as a PulseAudio maintainer in Gentoo, Lennart released a 0.9.16-test1 tarball. This was my cue to enter the scene upstream: the first test at packaging this in Gentoo failed, for a series of different reasons, some of which are internal (we don’t have the latest version of udev available yet, I hope we will by the time PulseAudio 0.9.16 final is released), but most are due to upstream changes that didn’t take into consideration some corner cases that Gentoo, as usual, gets to deal with.

So you won’t see the test1 (rc1) ebuild in the tree at all, you’ll probably have to wait for test2, and even that will require some work. For now I’ve fixed all the build- and run-time issues I’ve seen in the released tarball and git repository; plus I’ve been able to get it to properly build fully on both (Gentoo/)FreeBSD and OpenSolaris (with Prefix). I haven’t been able to experiment with actually having it playing yet, but it’ll come there at one point.

Unfortunately there are still a few shady details that I or someone else has to take care of. For instance, the tests still fail consistently: last time I tried them I got two failures on Yamato, one related to IPv6 enabled in PulseAudio build, but not enabled for the kernel, resulting in the IP ACL test asserting out (now I’ve fixed it, by warning of the case, and ignoring it as a failure); the other is the mixing test, which fails for everybody because it doesn’t know anything about the 24-bit and 24-bit-in-32-bit sample types; this I extended to support 24-bit, but was unable to do anything about the 24-bit-in-32-bit because I couldn’t grok it properly.

On non-Linux operating systems (FreeBSD and OpenSolaris), I had to work on a few more issues, like implicit declarations (there still is one in OpenSolaris), shadowed names, and of course there is some slight porting to be done, which I have nowhere near finished yet: the shm (Shared Memory) support in FreeBSD is imperfect, and for neither operating system have I implemented the “get process name” function.

Okay I’m not able to provide a 100% porting to all the operating systems out there, but I still think I can do a bit to help out by making sure that PulseAudio won’t need to be extensively patched by all the porters out there. And until Lennart actually gets around to merging my patches, you can find all of them on Gitorious so you can test them.

Plugins design: interface style

In my previous post of plugins versus builtins I’ve given a few reasons to make use of plugins, explaining the bad side of them, which left up in the air the fact that you can very well decide to support both plugins and builtins – I think VLC does that by the way – and let the user (or the distributor) choose at build-time whether to link them in statically or link them externally.

To make this possible, you’ve obviously got to deal with a few different design decisions; while plugins have more or less the same basic idea for design, their implementation may be quite different. Since usually you need more than one function from a plugin (at least setup, action and teardown), you have different methods to handle them.

For instance, you can have one symbol per function, and use dlsym() (or similar) on each of them to fill in a structure to call the various interfaces, or you can have a single structure already filled in with the pointers to the (static) functions, or finally you can have a single initialization function that fills in the structure and returns it to the calling application.

While the difference between these approaches may seem to be minimal, it actually can be quite a difference: calling into the dynamic loader multiple times can waste time, so having a single entrypoint makes perfect sense from a design point of view. But the remaining two approaches are also very different.

From a logic point of view of a theoretical programmer, having a structure, an object, already pre-filled with the pointers of the static member functions is the best choice. This is, after all, how the C++ vtables work: they are little more than a structure with functions’ addresses. But if your background is more practical than theoretical, you might have already noticed a problem here; when using position independent code (PIC or PIE) all the objects containing pointers (doesn’t matter whether they point to functions or data) need to be relocated at runtime (that’s what .data.rel is used for).

When relocating objects, the .data.rel section of an executable object (shared or not), will become dirty, causing it not to be shared among processes any longer. As I’ve written in the previous post, prelink does not work on them which means it won’t be able to alleviate the effect of relocation on the object (like it would be for normal shared libraries). Also, since plug-ins are isolated shared objects, they don’t share the sections of the main program – I already said this before – the relocation of small “vtable-like” objects will cause the copy-on-write of a full page (4KiB) for each plugin loaded, even when the object itself is just a few bytes big.

My preferred solution is to have a single interface function, which, once called, will fill in a heap-allocated structure with the pointers to the rest of the interface functions. This option allows you to have just a single symbol per plugin that needs to be looked up, and at the same time sidesteps the problem of relocation (the pointers will be calculated at runtime, just like a relocation, but without the memory hit).
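A minimal sketch of what I mean, with made-up names: plugin_ops, example_plugin_init() and the three example functions are all hypothetical, chosen just for this illustration and not taken from any existing project.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical plugin interface: setup, action and teardown entry points.
struct plugin_ops {
    int  (*setup)(void);
    int  (*action)(int input);
    void (*teardown)(void);
};

namespace {
// The implementation has internal linkage: no exported symbols.
int  example_setup(void)       { return 0; }
int  example_action(int input) { return input * 2; }
void example_teardown(void)    {}
}  // namespace

// The only exported symbol. The host looks it up once (e.g. via dlsym) and
// calls it; the table is filled at runtime on the heap, so the plugin carries
// no pre-initialised pointer table that would need relocating.
extern "C" plugin_ops *example_plugin_init(void) {
    plugin_ops *ops = static_cast<plugin_ops *>(std::malloc(sizeof(plugin_ops)));
    if (ops == nullptr)
        return nullptr;
    ops->setup    = example_setup;
    ops->action   = example_action;
    ops->teardown = example_teardown;
    return ops;
}

// Host-side release of the table once the plugin is no longer needed.
inline void example_plugin_release(plugin_ops *ops) {
    assert(ops != nullptr);
    ops->teardown();
    std::free(ops);
}
```

When the same code is linked in as a builtin, the host can call example_plugin_init() directly instead of looking it up, so the rest of the application does not need to care which of the two it is dealing with.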

The next post in the series will try to focus more on the style of the initialisation function, to be shared between plugins and builtins.

Planning for PulseAudio

Thanks to Betelgeuse I finally have audio again on Yamato (again, thanks! — on a different note, this actually made me find out that there absolutely is a bug in ALSA that causes mmap to kill PulseAudio both with the ICE1712 and the HDA drivers), so I’m resuming my duty as PulseAudio maintainer. This is the reason why PulseAudio jumped to version 0.9.15-r50 in ~arch. So what’s up with that?

My current plans with respect to PulseAudio are trying to get 0.9.15 in stable to replace the ancient 0.9.9. What has stopped PulseAudio from going stable up to this point has been exactly two dependencies: OpenRC and libtool 2.2. Originally, the idea was to keep PulseAudio only compatible with OpenRC and no longer with baselayout 1; it was supposed to go stable pretty soon and the baselayout 1 init script was so scarily incomplete that we simply preferred not to have to support it.

Unfortunately, there is still no date for OpenRC to go stable, if it’ll go at all in its current form. At the same time, Lennart has seriously warned against system wide mode (even though there are still valid use cases for which Gentoo often is used!) so keeping the new versions off from stable for a “minor” feature that is not even recommended to be used sounds like a bad plan.

For this reason I’ve now split the ebuild in two versions: one will keep the system mode support, with the system mode warnings, the init script and all the niceties, and the other won’t, and won’t depend on OpenRC at all; the latter is what is supposed to go stable and what stable users should locally unmask if they want PulseAudio.

Let me state again: if you want newer PulseAudio and you’re in stable, explicitly request the -r1 version, not the -r50!

Unfortunately while I should be able to ask for stable right away for what concerns time and bugs, there are a few dependencies, which include libtool 2.2 which is not stable yet (and I think it should be, the tinderbox hasn’t found many libtool 2.2 bugs lately and quite a few packages started requiring that, rather than just a generic libtool that 1.5 is compatible with).

I still have no real plans for the realtime support; while Lennart released rtkit (does anybody find it concerning that Linux started having packages with names vaguely similar to those from Apple’s OS X?), it needs a patched kernel, which means I should probably be pestering our kernel team to get those patches included before we can actually provide it, even optionally.

This week I hope to be able to work on mpd too, so that the Gentoo packaging plays nice with PulseAudio (right now the fact that you have to run it with a different user forces you to use a systemwide instance).

Slimming down the portage tree

So while I’ve got the tinderbox turned off I’ve been taking care of a few different QA duties that I’m probably not supposed to do but that I’m sure help Gentoo. I’m actually pretty sure that these kinds of tasks might be even more interesting for users than what I’ve been doing with the tinderbox.

While the tinderbox’s main goal is to be able to find the broken software that is in the tree, this usually produces just a lot of work for other developers (bugs to fix) and a few extra side-effects like identifying smaller QA violations and some very broken packages that I have been last-riting and that will be removed over the course of the next two months.

On the other hand, the manual analysis I’ve been doing tonight aims to check the actual stuff that is added to the tree, like binary files or big files directories. For those wondering why I’m on a crusade against binary files in the tree, I have to say that first of all, CVS makes it difficult to handle binary files, and this makes them unsuitable for being added to the tree. Additionally, binary files in the tree often mean there is something else broken with the packages: compressed big patches (that are still big), huge, messed up files directories with unused content, and stuff like that.

I’ve been able to shave a few kilobytes off the tree by moving a few files to the mirrors and removing the big files from the tree; but I’ve also started sending last rites for the packages that have this kind of issue and that I don’t see being fixed any time soon. Interestingly enough, it turns out there is quite some overlap between the packages that fail, those that have QA issues, and those that are polluting the tree with too-big files directories and so on.

So please don’t get mad at me again if I masked for removal a package you use: if you want to keep it in the tree, please get it fixed.