This Time Self-Hosted

Ruby-NG: Package in a Bottle (or, learn how to write a new Ruby ebuild)

I have to say that in the months we’ve been working on the new eclasses, I never got around to properly describing how to use them. My hope was to write this documentation straight into the next-generation development manual for Gentoo, but since that project is far from materialising, I’ll just rely on my blog a little while longer.

As described in my previous blog posts, the idea behind the “new” eclasses (they have been in tree for a few months by now) is to both handle “proper” Gentoo phases when packaging gems and, at the same time, manage dependency and support tracking for multiple Ruby implementations (namely Ruby 1.8, Ruby 1.9 and JRuby, right now). How do we achieve this? With two not-too-distinct measures: first of all, we avoid using RubyGems as a package manager (we still use the gem format in some cases, and we always use the loader when it makes sense); then we leverage the EAPI=2 USE-based dependencies.

Why should we not use RubyGems package management for our objective? With the old gems.eclass we used to encapsulate the RubyGems install operation inside our ebuilds, but it was all done at once, directly in the install phase of the ebuild. We couldn’t have separate phases (and related triggers) such as prepare, compile, test and install. In particular, we had no way to run the packages’ tests at install time, which is one of the most useful features of Gentoo as a basis for solid systems. There are also other problems related to the way RubyGems handles packages, including dependencies that we might want to ignore (like runtime dependencies injected by build-time tools) and others that are missing from the specification. All in all, Portage does the job better.

As for the USE-based dependencies: when we merge a package for a set of implementations (one, two, three or any other number), we need its dependencies (at least the non-optional ones) installed for the same set of implementations, otherwise it cannot work (this is a rehash of the same-ABI/any-ABI dependencies problem I wrote about a year and a half ago). Our solution is to transform the implementations into USE flags (strictly speaking they are RUBY_TARGETS flags, but thanks to USE_EXPAND we handle them exactly like USE flags). At that point, when one of them is enabled for a package, its dependencies need to have the same flag enabled; we don’t care, though, if a dependency has a flag enabled that is not enabled in the first package.
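To make the rule concrete, here is a minimal POSIX sh sketch of the check it implies: every RUBY_TARGETS flag enabled on the package must also be enabled on the dependency, while extra flags on the dependency are fine. The function names are made up for illustration (the `has` helper mimics the one Portage provides to ebuilds); in practice Portage does this resolution itself.

```sh
#!/bin/sh
# Sketch only: targets_satisfied is a hypothetical helper, not Portage code.
# Flag sets are passed as space-separated strings.

# has NEEDLE ITEM...: succeed if NEEDLE appears among the ITEMs
# (same idea as Portage's own `has` helper).
has() {
    needle=$1
    shift
    for item in "$@"; do
        if [ "$item" = "$needle" ]; then
            return 0
        fi
    done
    return 1
}

# targets_satisfied "PKG_TARGETS" "DEP_TARGETS": succeed if every target
# enabled on the package is also enabled on the dependency.
targets_satisfied() {
    for target in $1; do
        # $2 is unquoted on purpose so that it splits into one word per flag.
        has "$target" $2 || return 1
    done
    return 0
}

# The dependency may have more targets than the package, but not fewer.
targets_satisfied "ruby18 ruby19" "ruby18 ruby19 jruby" && echo "ok"
targets_satisfied "ruby18 jruby" "ruby18" || echo "jruby missing on dependency"
```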

This creates a bit of a problem, though, as you end up with two sets of dependencies: those that are used through Ruby itself (same-ABI dependencies) and those that are not (any-ABI dependencies), such as the C libraries being wrapped, the tools invoked at runtime through system calls, and so on and so forth. To handle this, we added two functions that deal with the dependencies, ruby_add_bdepend and ruby_add_rdepend, both of which “split the atoms” (yes, this phrase sounds nerdy enough), appending the USE-based dependencies to each. They also have a second interface, in which the first parameter is a space-separated (quoted) list of USE flags that the dependency is conditional on.
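Roughly, the atom splitting works like the following sketch: for every implementation the ebuild supports, each atom is wrapped in a RUBY_TARGETS conditional and required with the same flag, and the optional first argument nests the whole group inside further USE conditionals. `sketch_add_rdepend` and `split_atoms` are hypothetical names, and the exact strings the real ruby_add_rdepend produces may differ; `USE_RUBY` holds the list of supported implementations, as in the eclass.

```sh
#!/bin/sh
# Hypothetical approximation of the atom splitting done by ruby_add_rdepend;
# the real eclass's output may differ in detail.
USE_RUBY="ruby18 ruby19 jruby"
RDEPEND=""

# split_atoms ATOM...: for each implementation in USE_RUBY, require every
# atom with the same RUBY_TARGETS flag, guarded by that flag on this package.
split_atoms() {
    out=""
    for impl in $USE_RUBY; do
        for atom in "$@"; do
            out="$out ruby_targets_$impl? ( $atom[ruby_targets_$impl] )"
        done
    done
    # Drop the leading space.
    printf '%s\n' "${out# }"
}

# sketch_add_rdepend ["USE FLAGS"] "ATOMS": with two arguments, the first is
# a space-separated list of USE flags the dependency is conditional on.
sketch_add_rdepend() {
    if [ $# -ge 2 ]; then
        flags=$1
        atoms=$2
    else
        flags=""
        atoms=$1
    fi
    # $atoms is unquoted on purpose: one word per atom.
    dep=$(split_atoms $atoms)
    # Nest the whole group inside each USE conditional.
    for flag in $flags; do
        dep="$flag? ( $dep )"
    done
    RDEPEND="${RDEPEND:+$RDEPEND }$dep"
}
```

For example, with `USE_RUBY="ruby18"` and an empty RDEPEND, `sketch_add_rdepend "test" "dev-ruby/rspec"` leaves `test? ( ruby_targets_ruby18? ( dev-ruby/rspec[ruby_targets_ruby18] ) )` in RDEPEND.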

This is not the only deviation from the standard syntax that ruby-ng.eclass introduces, and the other one is definitely more substantial: instead of the standard src_(unpack|prepare|compile|test|install) functions, there are two sets of new functions to define, each_ruby_$phase and all_ruby_$phase. This ties into the idea of supporting multiple implementations, as there are actions that you want to take in almost the same way for every supported implementation (such as running the tests), and others that you want to execute just once (for instance generating and installing the documentation). So for every phase you get one each function and one all function.
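A minimal sketch of the dispatching idea, assuming nothing about the eclass internals: for a given phase, call each_ruby_$phase once per enabled implementation and then all_ruby_$phase once, skipping whichever one the ebuild didn't define. `run_phase`, `ENABLED_IMPLS` and `RUBY_IMPL` are hypothetical names, and the prepare-phase special case is left out.

```sh
#!/bin/sh
# Sketch only: run_phase, ENABLED_IMPLS and RUBY_IMPL are hypothetical names,
# not the eclass's internals.
ENABLED_IMPLS="ruby18 ruby19"

# run_phase PHASE: call each_ruby_PHASE once per enabled implementation, then
# all_ruby_PHASE once; the ebuild may leave either function undefined.
run_phase() {
    phase=$1
    # command -v succeeds for shell functions, so it detects defined phases.
    if command -v "each_ruby_$phase" >/dev/null 2>&1; then
        for impl in $ENABLED_IMPLS; do
            RUBY_IMPL=$impl
            "each_ruby_$phase"
        done
    fi
    # This sketch does not choose an implementation for the all function.
    unset RUBY_IMPL
    if command -v "all_ruby_$phase" >/dev/null 2>&1; then
        "all_ruby_$phase"
    fi
}

# Example ebuild-side functions: tests run per implementation, docs once.
each_ruby_test() { echo "running tests with $RUBY_IMPL"; }
all_ruby_install() { echo "installing documentation once"; }

run_phase test
run_phase install
```

Running it prints the test line once for each of the two implementations and the documentation line once.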

There are more subtleties, of course: in the each functions, ${RUBY} is the command that calls the current implementation, while in the all functions it’s set to the first available implementation (this is important, as we might not support the system’s default implementation). The end result is that you can call neither scripts nor commands directly; instead, you should use the ${RUBY} -S ${command} form (at least for commands in the search path, like rake), so that the correct implementation gets called.
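Here is one way to read the rule for the all functions, as a sketch: pick the first implementation, in the ebuild’s USE_RUBY order, that is also enabled in RUBY_TARGETS, and always go through `${RUBY} -S` for search-path scripts. `first_available` and `ruby_invocation` are hypothetical, the /usr/bin/<implementation> paths are an assumption, and the command line is printed rather than executed.

```sh
#!/bin/sh
# Sketch only: function names and the /usr/bin/<impl> layout are assumptions.
USE_RUBY="ruby18 ruby19 jruby"   # implementations the ebuild supports, in order
RUBY_TARGETS="ruby19 jruby"      # implementations the user enabled

# first_available: print the first implementation in USE_RUBY that is also
# enabled in RUBY_TARGETS; fail if there is none.
first_available() {
    for impl in $USE_RUBY; do
        for target in $RUBY_TARGETS; do
            if [ "$impl" = "$target" ]; then
                printf '%s\n' "$impl"
                return 0
            fi
        done
    done
    return 1
}

# ruby_invocation COMMAND [ARGS...]: print the command line that runs a
# search-path script such as rake through ${RUBY}, not through its shebang.
ruby_invocation() {
    printf '%s -S %s\n' "$RUBY" "$*"
}

RUBY="/usr/bin/$(first_available)"
ruby_invocation rake test
```

With the values above it prints `/usr/bin/ruby19 -S rake test`: ruby18 is supported by the ebuild but not enabled, so it is skipped.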

Oh, and of course, most of the time you cannot share the working directory between multiple implementations, especially for compiled extensions (those written in C). To solve this problem, at the end of the prepare phase we create an implementation-private copy of the source directory, and we use that in the various each functions; to be on the safe side, we also keep a separate source directory for the all functions, so that the results of one build won’t cause problems in the others. To avoid hurting performance too much here, we use exactly two tricks: the first is to use hardlinks when copying the source directories (this way the files themselves, content, inode and metadata alike, are shared among the directories, and only the directory tree is duplicated); the second is to invert the order of the all/each calls in the prepare phase.
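The hardlink trick can be sketched with GNU cp, whose -l option links files instead of copying their contents. `copy_sources` and the all/<implementation> directory layout are illustrative assumptions, not the eclass’s actual code.

```sh
#!/bin/sh
# Sketch only: copy_sources and the directory layout are hypothetical.
# Requires GNU cp for -l (hardlink instead of copy).

# copy_sources WORKDIR IMPL...: clone WORKDIR/all into WORKDIR/IMPL for each
# implementation. -R recreates the directory tree, -l hardlinks the files,
# -p preserves modes and timestamps on the newly created directories.
copy_sources() {
    workdir=$1
    shift
    for impl in "$@"; do
        cp -pRl "$workdir/all" "$workdir/$impl" || return 1
    done
}

work=$(mktemp -d)
mkdir -p "$work/all/lib"
echo 'puts "hello"' > "$work/all/lib/foo.rb"
copy_sources "$work" ruby18 ruby19

# Both paths name the same inode: only the directory entries were duplicated.
if [ "$work/all/lib/foo.rb" -ef "$work/ruby19/lib/foo.rb" ]; then
    echo "contents shared"
fi
```

One consequence worth keeping in mind: appending to a file with `>>` in one copy writes through the shared inode, so every other copy (and the generic directory) sees the change too; an edit that must stay private to one implementation should write a new file instead.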

In every other phase the all function runs after the implementation-specific ones, but in prepare, all_ruby_prepare runs before the each_ruby_prepare functions, which are in turn preceded by the copying. This means that the changes made in all_ruby_prepare are applied to the single generic directory, which is then copied (hardlinked) to the others.
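The inverted ordering of the prepare phase can be sketched like this: all_ruby_prepare edits the generic directory, the hardlinked copies are made, and only then does each_ruby_prepare run inside each copy. `prepare_phase`, `IMPLS`, `RUBY_IMPL` and the directory names are hypothetical, and the example ebuild functions just create marker files.

```sh
#!/bin/sh
# Sketch only: prepare_phase, IMPLS, RUBY_IMPL and the layout are hypothetical.
# Requires GNU cp for -l (hardlink instead of copy).
IMPLS="ruby18 ruby19"

all_ruby_prepare() {
    # A fix every implementation needs, applied once to the generic tree.
    echo 'fixed' > COMMON_FIX
}

each_ruby_prepare() {
    # An implementation-specific tweak, written as a new file in this copy.
    echo "$RUBY_IMPL" > IMPL_NOTE
}

# prepare_phase WORKDIR: all first, then the copies, then each; the reverse
# of the other phases, where all_ruby_* runs last.
prepare_phase() {
    workdir=$1
    (cd "$workdir/all" && all_ruby_prepare) || return 1
    for impl in $IMPLS; do
        cp -pRl "$workdir/all" "$workdir/$impl" || return 1
    done
    for impl in $IMPLS; do
        RUBY_IMPL=$impl
        (cd "$workdir/$impl" && each_ruby_prepare) || return 1
    done
}

work=$(mktemp -d)
mkdir "$work/all"
prepare_phase "$work"
```

After it runs, COMMON_FIX is shared (hardlinked) by all the copies, while each IMPL_NOTE exists only in its own implementation’s directory.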

So this covers most of the functionality of ruby-ng.eclass, but we added another, tightly related eclass at the same time: ruby-fakegem.eclass. As the name lets you guess, this is the core of our ditching RubyGems as a package manager entirely. Not only does it give us support for unpacking the (newer) .gem files, but it also provides default actions for testing, documentation and installation; and of course it provides the basic tools to create fake RubyGems specifications, as well as to wrap gem-provided binaries. An interesting note here: all the modern .gem files are non-compressed tarballs that include a compressed metadata YAML file and a compressed tarball with the actual source files; in the past, there were a few gems that instead used a base64/MIME encoding to stick the two component files together. For ease of maintenance, and for sanity, we’ve decided to support only the tarball format; the older gems can be fixed, worked around or replaced.
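Since only the tarball format is supported, unpacking a gem by hand comes down to two tar invocations and a gunzip. `unpack_gem` is a hypothetical helper, not the eclass’s function, and it only extracts the two members described above (any other members a gem may carry are ignored).

```sh
#!/bin/sh
# Sketch only: unpack_gem is a hypothetical helper, not ruby-fakegem code.
# A tarball-format .gem is an uncompressed tar holding metadata.gz (gzipped
# YAML specification) and data.tar.gz (the actual sources).

# unpack_gem GEMFILE DESTDIR
unpack_gem() {
    gem=$1
    dest=$2
    mkdir -p "$dest/sources" || return 1
    # Extract just the two members of the outer, non-compressed tarball.
    tar -xf "$gem" -C "$dest" metadata.gz data.tar.gz || return 1
    # The specification is a gzipped YAML document.
    gzip -dc "$dest/metadata.gz" > "$dest/metadata" || return 1
    # The source files live in the inner, gzip-compressed tarball.
    tar -xzf "$dest/data.tar.gz" -C "$dest/sources"
}
```

Calling `unpack_gem foo-1.0.gem work` would leave the YAML specification in work/metadata and the gem’s files under work/sources.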

The boilerplate code in ruby-fakegem assumes that most gems handle documentation generation and tests through rake; this is indeed the most common situation, even though the details definitely vary from project to project. As I said before, Ruby’s motto is definitely “there are many ways to skin a cat”, and there are so many different testing frameworks, with different task names, that the exact same code cannot work for all the gems unless you parametrise it. The same goes for building the documentation, even though the framework is almost always the same (RDoc; although quite a few packages use YARD nowadays, and a few use Hanna, which we don’t have in tree, nor will support, as it requires a specific, older version of the RDoc gem). The result is that we have two variables to deal with this, RUBY_FAKEGEM_TASK_TEST and RUBY_FAKEGEM_TASK_DOC, which you can set in the ebuild (before inheriting the eclass) to call the correct tasks.
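As a sketch of how the two task variables could drive the default steps: the eclass-side code fills in a default only when the ebuild hasn’t set a value, and because those assignments run when the eclass is sourced, in this sketch that is also why the ebuild has to set them before the inherit line. The defaults `test` and `rdoc`, the rule that an empty value skips the step, and the function names are assumptions of this sketch; the commands are printed rather than run.

```sh
#!/bin/sh
# Sketch only: defaults, empty-means-skip and function names are assumptions.
# ${VAR-default} keeps any value the ebuild set (even an empty one).
RUBY_FAKEGEM_TASK_TEST=${RUBY_FAKEGEM_TASK_TEST-test}
RUBY_FAKEGEM_TASK_DOC=${RUBY_FAKEGEM_TASK_DOC-rdoc}
# Normally set by the each/all machinery; a placeholder path here.
RUBY=${RUBY:-/usr/bin/ruby18}

# rake_task TASK: print the rake invocation for TASK, or nothing if empty.
rake_task() {
    if [ -n "$1" ]; then
        printf '%s -S rake %s\n' "$RUBY" "$1"
    fi
}

sketch_test() { rake_task "$RUBY_FAKEGEM_TASK_TEST"; }
sketch_doc() { rake_task "$RUBY_FAKEGEM_TASK_DOC"; }

sketch_test
sketch_doc
```

An ebuild for a gem whose tests live in, say, a `spec` task would then set something like `RUBY_FAKEGEM_TASK_TEST="spec"` before inheriting ruby-fakegem.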

Now, admittedly this goes a bit beyond the normal ebuild syntax, but we found it much easier to deal with common parameters through variables set before the inherit step, rather than having to write the same boilerplate code over and over… or having to deduce them directly from the source code (which would definitely have wasted much more time). Together with the two variables above, we have two more to handle documentation: RUBY_FAKEGEM_DOCDIR, which tells the eclass where the generated documentation is placed so that it can be properly installed, and RUBY_FAKEGEM_EXTRADOC, which provides a quick way to install “Read Me”, “Change logs” and similar standalone documentation files.
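The documentation install step can be sketched as copying RUBY_FAKEGEM_DOCDIR as a whole plus each file named in RUBY_FAKEGEM_EXTRADOC. `install_docs` and its destination argument are hypothetical; a real ebuild would install through Portage helpers such as dodoc. Globbing is switched off so the names are taken literally, in keeping with RUBY_FAKEGEM_BINWRAP being the only glob-expanded variable.

```sh
#!/bin/sh
# Sketch only: install_docs is hypothetical; real ebuilds use dodoc and co.

# install_docs DEST: copy the generated docs directory and the extra files.
# The body runs in a subshell so that set -f (no globbing) does not leak out.
install_docs() (
    set -f
    dest=$1
    mkdir -p "$dest" || exit 1
    if [ -n "$RUBY_FAKEGEM_DOCDIR" ]; then
        cp -R "$RUBY_FAKEGEM_DOCDIR" "$dest/" || exit 1
    fi
    # Split on whitespace, but take each name literally.
    for file in $RUBY_FAKEGEM_EXTRADOC; do
        cp "$file" "$dest/" || exit 1
    done
)
```

With `RUBY_FAKEGEM_DOCDIR=doc/html` and `RUBY_FAKEGEM_EXTRADOC="README ChangeLog"`, `install_docs out` would produce out/html plus out/README and out/ChangeLog.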

Finally, there are two more variables used to handle further installation details. RUBY_FAKEGEM_EXTRAINSTALL is used to install particular files or directories from the sources to the system; this is useful when you have things like Rails or Rudy wanting to use, at runtime, some of the example or template files they are shipped with; they are simply installed in the tree as if they were part of the gem itself. RUBY_FAKEGEM_BINWRAP is the sole glob-expanded variable in the eclass, and tells it to call the “binary wrapper” (not really for binaries but rather for scripts; the name comes from the bin/ directory it refers to) for the given files, defaulting to all the files in the gem’s bin/ directory. It can be tweaked because in some cases, like most of Rudy’s dependencies, the files in the bin/ directory are not scripts that are useful to install, but rather examples and other things that we don’t want to push into the system’s paths. It also comes in handy when you want to rename the default scripts for whatever reason (for instance, because the package is slotted).
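Both variables can be sketched in a few lines under these assumptions: RUBY_FAKEGEM_BINWRAP is a glob relative to the gem’s bin/ directory that defaults to `*`, and every path in RUBY_FAKEGEM_EXTRAINSTALL is copied verbatim under the install directory. `binwrap_list` and `extra_install` are hypothetical, and the sketch only lists the scripts that would get a wrapper, without generating one.

```sh
#!/bin/sh
# Sketch only: binwrap_list, extra_install and the default are assumptions.

# binwrap_list SRCDIR: print the bin/ scripts that would get a wrapper.
binwrap_list() (
    cd "$1/bin" 2>/dev/null || exit 0
    # Deliberately unquoted: this is the one glob-expanded variable.
    for script in ${RUBY_FAKEGEM_BINWRAP-*}; do
        # An unmatched pattern is left as-is by the shell, so check it exists.
        if [ -f "$script" ]; then
            printf '%s\n' "$script"
        fi
    done
)

# extra_install SRCDIR DESTDIR: copy each RUBY_FAKEGEM_EXTRAINSTALL path,
# keeping its relative location. DESTDIR must be absolute (we cd first).
extra_install() (
    set -f
    dest=$2
    cd "$1" || exit 1
    for path in $RUBY_FAKEGEM_EXTRAINSTALL; do
        mkdir -p "$dest/$(dirname "$path")" || exit 1
        cp -R "$path" "$dest/$path" || exit 1
    done
)
```

In this sketch, setting RUBY_FAKEGEM_BINWRAP to an empty string wraps nothing, which is what you’d want when bin/ only holds examples.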

What I have written here is obviously only part of the process that goes into making ebuilds for the new eclasses, but it should give enough details for now for other interested parties to start working on them, or even porting existing ones. Just one note before I leave you to re-read this long and boring post: for a lot of packages, the gem does not provide documentation, or a way to generate it, or tests, or part of the data files needed for the tests to run. In those cases you really need to use a tarball, which might come directly from GitHub if the repository is tagged, or might require you to toy with commit IDs to find the correct commit. Yup, it’s that fun!
