This Time Self-Hosted

Let’s actually get some metadata!

This blog entry is intended to be a technical proposal, so you might need some knowledge of how Gentoo works to follow it well. I hope to be clear enough that even non-technical users can read this, but I can’t guarantee it.

So you might or might not know (but if you are a Gentoo developer, and you don’t know, please leave your badge and CVS commit access on the table at the right of the door) that together with ebuilds, Manifests, ChangeLogs and the files/ subdirectory, a package’s directory usually contains a metadata.xml file. Right now this file contains very little information, although it’s very important information: which herd the package belongs to, who the maintainers of the package are, and an optional long description that couldn’t fit in the DESCRIPTION variable of the ebuild. Optionally, you can provide the long description in different languages, although I see very little going on with that.
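
To make those three pieces of information concrete, here is a minimal sketch that reads them with Python’s standard library. The sample file content (herd name, maintainer address, description text) is made up for illustration, and treating a missing lang attribute as English is an assumption of the sketch, not something the format guarantees.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata.xml content: herd, name, address and text are invented.
SAMPLE = """\
<pkgmetadata>
  <herd>printing</herd>
  <maintainer>
    <email>someone@example.org</email>
    <name>Some Developer</name>
  </maintainer>
  <longdescription lang="en">
    A longer description that would not fit in DESCRIPTION.
  </longdescription>
</pkgmetadata>
"""


def read_metadata(xml_text):
    """Return (herds, maintainer e-mails, long descriptions keyed by language)."""
    root = ET.fromstring(xml_text)
    herds = [(h.text or "").strip() for h in root.findall("herd")]
    maintainers = [
        m.findtext("email", default="").strip()
        for m in root.findall("maintainer")
    ]
    # Assumption: a missing lang attribute is treated as English.
    descriptions = {
        d.get("lang", "en"): " ".join((d.text or "").split())
        for d in root.findall("longdescription")
    }
    return herds, maintainers, descriptions


herds, maintainers, descriptions = read_metadata(SAMPLE)
print(herds, maintainers, descriptions)
```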

These are indeed metadata for the package, as they are data that describe the package. Good, then. Are there problems with this? I don’t think so.

So if we don’t have problems to solve we should be all set, right? No, I don’t think so. As Doug pointed out on #gentoo-dev a while ago, it’s very difficult to understand what generic USE flags do in a particular package. Sure, you can all imagine what the jpeg or png USE flags do in general, but what do they do in the particular case of CUPS, for instance? Do they allow jpeg or png files to be printed? Do they allow the HTTP interface to use jpeg or png files?

This is not limited to CUPS and those USE flags. What does the minimal USE flag do for libcdio? And what does it do for vcdimager? And for other packages? While the general sense of “minimal” is clear – it cuts down the stuff that the package installs just to install the core of it, like a library for instance – the specific behaviour of the flag on a given package might require looking at the package.

Doug proposed to use use.local.desc more often, but the problem with that is that the file is already way too long, and you can’t really do much explanation in it without making it very, very big. And it’s impossible to restrict the description of a USE flag to specific versions.

Here is the problem then: how can we represent enough information for this? Well, one way is to write extensive documentation for the package and add it to the Gentoo documentation. This works well for complex packages like, say, Apache, but it doesn’t scale well for simple packages that may not have documentation at all because they are not supposed to be directly used by users.

A possible solution is to consider the package-specific description of a USE flag as metadata of that package and, well, write that information down in metadata.xml. I suppose the main problem here is that a lot of Gentoo developers despise XML in any form. While I actually hate any configuration file in XML, it’s quite easy to use XML as an interchange format, to represent information that needs to be accessed through many different means.
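
As a rough sketch of what this could look like, the snippet below reads per-package USE flag descriptions out of a metadata.xml. The `<use>` and `<flag name="...">` elements are hypothetical: they are not in the current DTD, and the description texts are placeholders, not statements about what the flags really do in any package.

```python
import xml.etree.ElementTree as ET

# Hypothetical extension: <use> and <flag> are NOT part of the current DTD,
# and the description texts are placeholders invented for this sketch.
SAMPLE = """\
<pkgmetadata>
  <herd>printing</herd>
  <use>
    <flag name="jpeg">Placeholder: what jpeg enables in this package</flag>
    <flag name="png">Placeholder: what png enables in this package</flag>
  </use>
</pkgmetadata>
"""


def package_use_descriptions(xml_text):
    """Map each documented USE flag name to its package-specific description."""
    root = ET.fromstring(xml_text)
    return {
        flag.get("name"): " ".join((flag.text or "").split())
        for flag in root.findall("./use/flag")
    }


local_descs = package_use_descriptions(SAMPLE)
for name, text in sorted(local_descs.items()):
    print(f"{name}: {text}")
```

Any tool that can parse XML could read the same data, which is the point of treating it as an interchange format.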

Anyway, this is just a proposal; feel free to comment if you want to say something about it. Please avoid XML bashing in general, although feel free to comment if you think that its use is inappropriate in this context.

I suppose that for this proposal to be accepted, there are three main obstacles to overcome: the first is modifying the DTD to allow some tags to document the USE flags; the second is deciding whether documenting a flag in metadata.xml is enough to skip IUSE.invalid warnings (changing repoman if needed); and the third is implementing support for fetching the description of the USE flag in tools like ufed.
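
The second and third points boil down to a lookup order, which can be sketched as follows. This is not how repoman or ufed actually work; the function names, the dictionaries and the "package, then global, then undocumented" order are assumptions of this sketch, and the description texts are placeholders.

```python
def parse_use_desc(lines):
    """Parse 'flag - description' lines, skipping blank lines and # comments."""
    result = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        flag, sep, text = line.partition(" - ")
        if sep:
            result[flag] = text
    return result


def describe_flag(flag, package_descs, global_descs):
    """Prefer the package-specific description, then fall back to the global one."""
    if flag in package_descs:
        return package_descs[flag], "package"
    if flag in global_descs:
        return global_descs[flag], "global"
    # A flag documented nowhere is what a QA check would want to warn about.
    return None, "undocumented"


# Hypothetical data: a global description file excerpt and one package's entries.
GLOBAL = parse_use_desc([
    "# placeholder excerpt of a global description file",
    "jpeg - Placeholder global text for jpeg",
    "png - Placeholder global text for png",
])
PACKAGE = {"jpeg": "Placeholder package-specific text for jpeg"}

for flag in ("jpeg", "png", "foo"):
    print(flag, describe_flag(flag, PACKAGE, GLOBAL))
```

With this order, a description in metadata.xml counts as documentation for the flag, and the warning would fire only for flags that appear in neither place.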

I just hope this doesn’t have to become a GLEP to actually be used.

Comments 9
  1. I completely agree. There needs to be per-package descriptions for USE flags. I’ve always thought it silly to list the USE flags per-package on gentoo-portage.com when the only descriptions are “global.”

  2. That does sound like a really good idea and would help people a lot. XML ain’t pretty but might be the right tool for that job.

  3. Nice idea. In my opinion this should be made mandatory for new packages and added to existing packages as they are touched (bumped etc). The syntax should however incorporate versioning, which will probably make it not very human readable anymore. Like this:

    <use flag="brokenlib" version="&lt;=cat-bar/foo-0.9">Do stuff using the old brokenlib libs (replaced by myass-libs in foo-1.0)</use>

    This will however cause problems as the package atoms may break XML syntax (see above example). It gets even more complicated if one really wants to avoid redundancy with IUSE in the ebuilds (we probably have to accept that redundancy). Your proposal should be discussed on gentoo-dev ML.

  4. I definitely agree. As a Gentoo user, I’ve often searched (way too often in vain) the forums for some particular package’s USE flag descriptions. Reading the ebuild file sometimes helps, but sometimes one just gets to know that the “foo” USE flag enables the --with-foo configure option for the package, which is hardly helpful (you have to unpack the source and dig into the package’s README and so on; at that point, why not just compile it manually, anyway?). The descriptions in use.local.desc are too short and generic, plus it makes much more sense to ship package-specific info with the package, not in a centralized place. Just my 0.02.

  5. We just happened to discuss this again with leio and genstef on #gentoo-dev yesterday. I think most of us agree here, we do need better documentation of what _exactly_ a given USE flag does to a given package. However, I don’t think that metadata is the right place to store this. Usage of USE flags may change from one ebuild (i.e. release) to the other, and although one can argue this won’t change often, it can (thus it will) change. This means we have to provide that information in the ebuild. Make it a set of variables whose names are the USE flag and a common prefix, or one variable containing the descriptions in XML, or whatever else. But it does need to be on a per-ebuild basis, like DESCRIPTION and HOMEPAGE, for example.

  6. As for USE flags being a per-ebuild property: yes, they change, and not so rarely, but metadata.xml already provides means to limit entries to single versions. I think that if we want to make the documentation proper, we’d have to put it outside the ebuilds. I can think of a couple of examples where proper documentation of USE flags might end up being longer than the current ebuild. Also, you’d end up needlessly duplicating information on every revbump even if the data doesn’t change, wasting space. And finally, in ebuild variables you’re limited to what bash can handle, and you’d find yourself using escapes quite often. In XML you either use a CDATA section or you use the escapes for the only three characters that need to be escaped: <, > and &. I find that easier to write.

  7. Having just read the post from Doug Goldstein concerning this topic I must say: It is great to see that Gentoo is not just alive but living!

  8. That would be a fantastic feature to have. I can’t count the times I’ve sat and wondered what the hell a specific use flag did in a package.

  9. Good point about being able to restrict some content of metadata.xml to specific ebuild versions. Since this is such a good idea, how about we also move HOMEPAGE, DESCRIPTION and LICENSE to metadata.xml then?
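
The escaping concern raised in comments 3 and 6 is easy to check with a short sketch using Python’s standard library. The element and attribute names are taken from the example in comment 3 and are only a suggested syntax; the description text is a placeholder.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

atom = "<=cat-bar/foo-0.9"  # package atom from comment 3
text = "Placeholder description with < and & in it"

# quoteattr() escapes &, < and > and adds the surrounding quotes;
# escape() does the same escaping for element text.
element = f'<use flag="brokenlib" version={quoteattr(atom)}>{escape(text)}</use>'
parsed = ET.fromstring(element)
print(element)
print(parsed.get("version"), "|", parsed.text)

# A raw "<" inside an attribute value is not well-formed XML.
try:
    ET.fromstring(f'<use flag="brokenlib" version="{atom}">x</use>')
    unescaped_ok = True
except ET.ParseError:
    unescaped_ok = False
print("unescaped atom parses:", unescaped_ok)

# A CDATA section carries the same text without any escapes.
cdata = ET.fromstring(f"<use><![CDATA[{text}]]></use>")
print(cdata.text)
```

The escaped atom round-trips unchanged, so the package atoms are awkward to read in the raw file but do not break the syntax as long as whoever writes the file escapes them.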
