This Time Self-Hosted

Log analysis, yet again

So I’m again trying to find a solution to the log analysis problem; the main issue at this point is that the tinderbox is generating something along the lines of 200MB of logs a week, probably also because, thanks to Zac, it’s much more efficient than it was before. With that much data to sift through, running grep from within Emacs is no longer feasible.

What I’m considering now is storing most of the data directly in a database (PostgreSQL, since that’s what I’m already using here) and then pulling it back out through a simple (web) interface. The reason I’m going for a web interface is that it’s likely what takes the least time to design, and it makes it quick to report and copy content.

On the storage side, the main question for me is whether the database should also contain specific details of the problem or just the presence of such a problem and a pointer to the log file. In the former case, the web application could easily be extended to something more than a glorified grep, but it’d have to store a non-trivial amount of data. Some log files are well over 10MB, so it gets a bit tricky to handle those properly.
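To make the second option (presence plus a pointer to the log) concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module only so that it runs anywhere; the real thing would sit on PostgreSQL. The table and column names, the `open_db` and `scan_log` helpers, and above all the two regular expressions are made up for illustration and are not the actual messages found in the build logs.

```python
import re
import sqlite3

# Hypothetical patterns: the real QA messages in the build logs will differ.
PATTERNS = {
    "elf-in-usr-share": re.compile(r"ELF file.* in /usr/share"),
    "test-failure": re.compile(r"tests? failed", re.IGNORECASE),
}

# One row per detected problem; the log text itself stays on disk.
SCHEMA = """
CREATE TABLE IF NOT EXISTS problems (
    id       INTEGER PRIMARY KEY,
    package  TEXT NOT NULL,
    kind     TEXT NOT NULL,
    log_path TEXT NOT NULL,
    line_no  INTEGER NOT NULL
)
"""


def open_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def scan_log(conn, package, log_path, lines):
    """Store a (kind, log_path, line_no) pointer for every matching line.

    `lines` can be any iterable of strings, for example an open file,
    so a large log is read line by line rather than loaded whole.
    Returns the number of problems recorded.
    """
    found = 0
    for line_no, line in enumerate(lines, start=1):
        for kind, pattern in PATTERNS.items():
            if pattern.search(line):
                conn.execute(
                    "INSERT INTO problems (package, kind, log_path, line_no) "
                    "VALUES (?, ?, ?, ?)",
                    (package, kind, log_path, line_no),
                )
                found += 1
    conn.commit()
    return found
```

With this layout the database stays small however big the logs get, and the web interface can open the log at the stored line number when details are needed. Storing the parsed details as well would mean adding columns (or a second table) for the matched text.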

Thinking a bit further on the interface, it should really be a way to report bugs directly: if the application can find that the merge found ELF files in /usr/share, filing the bug directly is just a matter of finding who exactly maintains a particular package (which is quite easy), and it wouldn’t even require copy-pasting if the data is available directly in the database, already parsed. Obviously, it would still require manual confirmation before opening the bug, and before doing so, it should also implement an easy search function to show possible duplicates.
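As a rough idea of what the duplicate check could look like, the sketch below ranks already-known bug summaries by textual similarity to the summary about to be filed. It assumes the existing summaries have been fetched from Bugzilla by some other means; the `possible_duplicates` name, the `known_bugs` mapping and the 0.6 threshold are illustrative choices, not anything Bugzilla provides.

```python
from difflib import SequenceMatcher


def possible_duplicates(summary, known_bugs, threshold=0.6):
    """Return (bug_id, score) pairs for known summaries similar to `summary`.

    `known_bugs` maps a bug number to its summary line. Scores range
    from 0.0 (nothing in common) to 1.0 (identical); the best match
    comes first.
    """
    scored = []
    for bug_id, known_summary in known_bugs.items():
        score = SequenceMatcher(
            None, summary.lower(), known_summary.lower()
        ).ratio()
        if score >= threshold:
            scored.append((bug_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Plain string similarity is crude, but since the summaries would be generated from the same template it should be enough to put likely duplicates in front of whoever confirms the bug.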

While my first guess was to write a stupid CGI (or to use one of Ruby’s integrated web servers in a script) to show the results from the database in the browser, I’m now more interested in the idea of having a more complete application to deal with this. Pavel also suggested allowing other developers to access the interface to report the bugs, so that even if I’m not around to do the filing someone else can. Unfortunately that also brings up a problem: if I were to allow developers to file bugs with their account I’d have to make them give their login information to the tinderbox (and I don’t like that, not even if it’s me running it); on the other hand I’d rather not make them file bugs with my own account, so I guess it’d require setting up a no-mail account for the tinderbox (no-mail since it’d be pointless to have mail coming in for a tinderbox account), and then making the users CC their own address by default.

Now comes the problem: I can probably start working on such an interface myself, using Ruby on Rails, which is something I’m somewhat fluent in; on the other hand, I know of no Ruby interface for the Bugzilla RPC protocol, but there is a well-tested pybugz extension for Python (which I’m definitely not fluent in). Before I start hacking on anything at all, I really need to see if somebody could help me with such a task in the long run, since the choice is going to change quite a few bits of the interface: if I were to use Ruby on Rails, the ORM will most likely call for an abstracted interface to the database, which is good for some things but not for everything.

If somebody is up to writing the interface in Python to my specs, using pybugz, that’d be fine; otherwise I’d like to see if somebody has already worked on a pybugz-like interface for Ruby instead. At worst I could settle for just opening the bug with pre-filled fields and then attaching the build log afterwards. To attach the log I need to know the number of the just-filed bug, though, and that’s not feasible by just providing a link to the pre-filled bug (although it would still be quite an improvement to my workflow, if I had that!).
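The pre-filled fallback could be as small as the sketch below, which builds a link to the bug entry form. It assumes a stock Bugzilla whose `enter_bug.cgi` page accepts `product`, `component`, `short_desc`, `comment` and `cc` as query parameters; check that against the actual installation before relying on it. The `prefilled_bug_url` helper, the example host and the product and component values are placeholders.

```python
from urllib.parse import urlencode


def prefilled_bug_url(base_url, product, component, summary, comment, cc=None):
    """Build a link that opens the bug entry form with fields filled in.

    Nothing is submitted: the developer follows the link, reviews the
    form, and files the bug from their own browser session.
    The parameter names are assumed from stock Bugzilla.
    """
    params = {
        "product": product,
        "component": component,
        "short_desc": summary,
        "comment": comment,
    }
    if cc:
        params["cc"] = cc
    return base_url.rstrip("/") + "/enter_bug.cgi?" + urlencode(params)


# Placeholder host, product and component, for illustration only.
url = prefilled_bug_url(
    "https://bugzilla.example.org/",
    "ExampleProduct",
    "ExampleComponent",
    "app-misc/foo installs ELF files in /usr/share",
    "Found by the tinderbox; build log to be attached.",
)
```

Because the bug is submitted from the developer's browser, the tinderbox never sees any credentials. The log attachment still has to happen as a second step once the bug number is known.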

So, can anybody provide any insight or volunteer to help me out?

Comments 2
  1. “… if I were to allow developers to file bugs with their account I’d have to make them give their login information to the tinderbox (and I don’t like that not even if it’s me running it);”

     An alternative approach: encrypt each tinderbox user’s Bugzilla account information with a different encryption key. On login, the user provides their decryption key/password, and then either the account information is kept decrypted in memory for the session and discarded on logout, or the decryption key is loaded on every Bugzilla access and discarded immediately after use.

  2. At the expense of one extra click, you can avoid all the work with the login information by using the “Remember values as bookmarkable template” button, or rather the logic that makes it work. Consider the link below, which opens the bug entry page with a bug subject of “Tinderbox example” and a body of “Test bug”. Your webapp can derive an appropriate pre-fill URL and present it to the user. The user then clicks it, reviews the data, and hits submit. It is submitted directly from his browser, so his login credentials are used without ever sending them to you. It is not clear if this is what you meant in the last paragraph when you mentioned pre-filled fields, so I thought I should point it out explicitly.

     https://bugs.gentoo.org/ent…

     As regards attaching files: you could have the developer paste back into the webapp the bug number given to them, and allow build logs to be attached from your account (either your main account or a tinderbox specific one). This is probably less prone to abuse than letting people file arbitrary bugs as you, especially if you only allow trusted developers access to ask the webapp to post attachments, and it can only post build logs with a fixed description. I think it should even be possible to do this from curl (once you give it a copy of your cookie), so you could put together a shell script that, given a list of bugs and logs, goes through attaching files to each one. Landfill will be your friend for testing this one. ;)

     For this relatively simple usage, even simple scraping of the bugzilla responses via regular expressions is workable, if unclean.
