Andi Kleen's blog
http://halobates.de/blog
Tilting at windmills and other endeavors
Wed, 02 May 2012 11:58:27 +0000

epoll and garbage collection
http://halobates.de/blog/p/189
Tue, 01 May 2012 13:01:34 +0000, therapsid

epoll, xor lists, pointer compression. What do they have in common? None of them plays well with automatic garbage collection.

I recently wrote my first program using epoll, and I realized that epoll does not play well with garbage collection. It is one of the few (only?) system calls where the kernel saves a pointer for user space. It supports stuffing other data in there too, like the fd, but for anything non-trivial you typically need some state per connection, thus a pointer.

A garbage collector relies on identifying all pointers to an object, so that it can decide whether the object is still used or not.

Now normally user programs should have some other reference, to implement timeouts and similar. But if they don’t, and the only reference is stored in the kernel, and the garbage collector comes in at the wrong time, the connection object will be garbage collected even though the connection is still active. This is because the garbage collector cannot see into the kernel.
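One way to stay on the safe side in a garbage-collected language is to hand the kernel only the fd and keep the per-connection state in a user-space table that the collector can see. The following is a minimal Python sketch of that pattern, assuming Linux (select.epoll). The Connection class, the connections table and the helper names are made up for illustration and are not taken from the program mentioned above.

```python
import select
import socket


class Connection:
    """Per-connection state (hypothetical, for illustration only)."""

    def __init__(self, sock):
        self.sock = sock
        self.received = b""


# User-space table that the garbage collector can see: fd -> Connection.
connections = {}


def add_connection(ep, sock):
    conn = Connection(sock)
    connections[sock.fileno()] = conn           # the strong reference lives here
    ep.register(sock.fileno(), select.EPOLLIN)  # the kernel only stores the fd
    return conn


def remove_connection(ep, fd):
    ep.unregister(fd)
    conn = connections.pop(fd)                  # now the object may be collected
    conn.sock.close()


def poll_once(ep, timeout=1.0):
    handled = []
    for fd, events in ep.poll(timeout):
        conn = connections[fd]                  # map the fd back to its state
        if events & select.EPOLLIN:
            conn.received += conn.sock.recv(4096)
        handled.append(conn)
    return handled


def demo():
    a, b = socket.socketpair()
    ep = select.epoll()
    try:
        add_connection(ep, a)
        b.sendall(b"ping")
        handled = poll_once(ep)
        data = handled[0].received
        remove_connection(ep, a.fileno())
        return data
    finally:
        b.close()
        ep.close()
```

The table costs one lookup per event, but the lifetime of each connection object is then decided entirely by references in user space.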

Something to watch out for.

The weekend error anomaly
http://halobates.de/blog/p/181
Fri, 20 Apr 2012 03:11:02 +0000, therapsid

I run mcelog.org, which describes the standard way in Linux to handle machine check errors. Most of the hits on the website are just people typing an error log into a search engine; mcelog.org ranks quite high on Linux machine check related terms.

The log files give me some indication of how many errors are occurring on Linux systems in the field. Most of the errors are corrected memory errors on systems with ECC memory: a bit flipped, but the ECC code corrected it and no actual data corruption occurred. (In this sense they are not actually errors; a more correct term would be “events”.) Other errors like network errors or disk errors are not logged by mcelog.

I noticed that there seem to be fewer memory errors on weekends. Normally the distribution of hits is fairly even over the week. But on Saturday and Sunday it drops to half.

It’s interesting to speculate why this weekend anomaly happens.

ECC memory is normally only on server systems, which should be running 24h. In principle errors should be evenly distributed over the whole week.

Typing the error into Google is not an automated procedure. A human has to read the log files and do it manually.

If people are more likely to do this on work days one would expect that they would catch up on the errors from the weekend on Monday. So Monday should have more hits. But that’s not in the data: Monday is not different from other weekdays.
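The comparison above boils down to counting hits per weekday and checking the weekend and Monday against the other days. A minimal Python sketch follows, assuming the hit timestamps have already been parsed out of the web server log. The sample data is synthetic and only there to exercise the functions; it is not the real mcelog.org data.

```python
from collections import Counter
from datetime import datetime, timedelta


def hits_per_weekday(timestamps):
    """Return seven counts, Monday first and Sunday last."""
    counts = Counter(ts.weekday() for ts in timestamps)
    return [counts.get(day, 0) for day in range(7)]


def weekend_ratio(per_day):
    """Mean hits on Saturday/Sunday divided by mean hits on Monday..Friday."""
    weekday_mean = sum(per_day[:5]) / 5
    weekend_mean = sum(per_day[5:]) / 2
    return weekend_mean / weekday_mean


def synthetic_hits():
    """Made-up week: 4 hits on each weekday, 2 on each weekend day."""
    monday = datetime(2012, 4, 16)  # a Monday
    hits = []
    for day in range(7):
        per_day = 4 if day < 5 else 2
        for hour in range(per_day):
            hits.append(monday + timedelta(days=day, hours=hour))
    return hits


per_day = hits_per_weekday(synthetic_hits())
print(per_day, weekend_ratio(per_day))
```

A Monday catch-up effect would show up as the first count being clearly larger than the next four.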

It’s also sticky for each system (or rather each human googling). Presumably the person will google the error only once, no matter how many errors their system has, and after that “cache” the knowledge of what the error means. So the mcelog.org hits are more an indication of “first memory error in the career of a system administrator” (assuming the admin has perfect memory, which may be a bold assumption). But given a large enough supply of new sysadmins this should still be a reasonable indication of the true number of errors (at least on the systems likely to be handled by rookie administrators).

The hour distribution is more even, with 9-10 slightly higher. I’m not sure which time zone that is, or what it says about the geographical distribution of errors and rookie admins.

One way to explain the weekend anomaly could be that servers may be busier on weekdays and may have more errors when they are busy. Are these two assumptions true? I don’t know. It would be interesting to know if this shows up in other people’s large-scale error collections too.

I wonder if it’s possible to detect solar flares in these logs. I need to find a good data source for them. Are there any other events generating lots of radiation that may affect servers? I hope there will never be a nuke or a supernova blast in the data.

Longterm 2.6.35.13 kernel released
http://halobates.de/blog/p/125
Thu, 28 Apr 2011 18:31:25 +0000, therapsid

A new longterm 2.6.35.13 Linux kernel has been released. This version contains security fixes and everyone using 2.6.35 is encouraged to update.

tarball, patch, incremental patch against 2.6.35.12, ChangeLog

A Linux longterm 2.6.35.12 kernel has been released.
http://halobates.de/blog/p/131
Thu, 31 Mar 2011 19:15:14 +0000, therapsid

This release contains security fixes and everyone is encouraged to update. Thanks to all contributors.

Announcement

Full tarball: linux-2.6.35.12.tar.gz

SHA1: 71bd9d5af3493c80d78303901d4c28b3710e2f40

Patch against 2.6.35: patch-2.6.35.12.gz

SHA1: 30305ebca67509470a6cc2a80767769efdd2073e

Patch against 2.6.35.11: patch-2.6.35.11-12.gz

SHA1: e2c30774474f0a3f109a8cb6ca2a95f647c196b8

ChangeLog since 2.6.35

Git tree: git://git.kernel.org/pub/scm/linux/kernel/git/longterm/linux-2.6.35.y.git

The Linux longterm kernel 2.6.35.11 has been released
http://halobates.de/blog/p/63
Sun, 06 Feb 2011 20:39:03 +0000, therapsid

The 2.6.35.11 longterm kernel is out. Patches and tarball at kernel.org or from the git tree. Full ChangeLog.

Thanks to all contributors.

More ergonomics – desktop lighting: redshift
http://halobates.de/blog/p/119
Sun, 23 Jan 2011 05:10:08 +0000, therapsid

A cool new desktop ergonomics tool I started using recently is redshift. redshift is a clone of the f.lux package, which also has Mac and Windows versions. (There’s also another fork of redshift called redshift-gui, but I’m using the plain redshift, which works fine for me.)

Side note: some corporate firewalls seem to block the f.lux page because its name, minus the dot, reads “flux” (like the botnet?).

The basic idea is to adjust the color temperature of your screen to your inner clock and vice versa. Your inner clock, running in a circadian rhythm, regulates your sleeping schedule, body temperature and lots of other things. The human clock (or rather the clock of nearly all animals) does not have an exact 24-hour period, but runs slightly slower. To keep your rhythm aligned to the day it regularly needs to be re-adjusted by a zeitgeber. This is normally done by daylight reaching your eyes, through its brightness and color temperature (red for evening etc.).

Now the problem with people like me (and likely you) who stare into computer screens for a long time is that the lighting does not change there, unlike natural light. The color temperature stays cold like at noon, even in the middle of the night. This doesn’t give your internal clock any cues.

You configure f.lux/redshift with your position, and they compute the current position of the sun and adjust the color temperature of the screen based on that. This generally means that the screen gets more and more red towards the evening. During the night it stays red.

(I guess that’s not fully natural because real night is dark, but then a dark screen wouldn’t be very practical. Perhaps we had a few hundred thousand years to adjust to staring into red fire at night though, but I don’t want to get into evolutionary psychology style just-so stories here.)

This should make you more sleepy towards the evening and keep your sleep rhythm nearer normal day/night.
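The adjustment step described above can be sketched as a simple interpolation from the sun’s elevation to a screen color temperature. This Python sketch takes the elevation as an input rather than computing it from position and time. The temperatures and the transition band are illustrative assumptions, not values taken from f.lux or redshift.

```python
def screen_temperature(elevation_deg,
                       day_temp=6500,    # Kelvin, assumed daytime value
                       night_temp=3500,  # Kelvin, assumed night value
                       high=3.0,         # degrees: above this it is "day"
                       low=-6.0):        # degrees: below this it is "night"
    """Map solar elevation (degrees above the horizon) to a temperature."""
    if elevation_deg >= high:
        return day_temp
    if elevation_deg <= low:
        return night_temp
    # Inside the twilight band, blend linearly between night and day.
    fraction = (elevation_deg - low) / (high - low)
    return round(night_temp + fraction * (day_temp - night_temp))


for elevation in (40.0, -1.5, -30.0):
    print(elevation, screen_temperature(elevation))
```

With these assumed numbers the screen stays at the night value for the whole night, which matches the behaviour described above: it gets redder towards the evening and then stays red.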

Does it actually work? I’m not sure, but I like it at least. (But perhaps that’s just because I don’t mind my screen having a red tint; you can tell what times of the day I prefer for work.) Informally at least, my sleep rhythm didn’t get worse from using it, and may be slightly better. I also like the visual indication of the time.

I guess I should do a comparison study and keep some data on my sleep rhythm while using redshift and while not using it, but so far I haven’t had the energy to really set up such an experiment.

The f.lux page has some more theory and references.

One problem right now is that the X server gamma adjustment does not cover the mouse, so the mouse pointer in cold colors looks a bit out of place towards the evening. I wonder how hard it would be to fix that?

I also sometimes have to turn it off when viewing pictures or movies, but only rarely.

And when you travel you have to remember to adjust the coordinates.

Interesting side question: how to set up the tool when you use the laptop during an intercontinental flight? Could it help with jetlag when used the right way? But how?

Computer ergonomics II – xterms colors
http://halobates.de/blog/p/106
Tue, 18 Jan 2011 14:43:33 +0000, therapsid

After the basic setup, the colors of the xterms are very important. My personal theory is that always looking at the same color tires the eyes. To avoid that I use some scripts to randomize the colors based on some basic patterns.

First I have the blackterm. This is a white on black setup, but with the white and black actually being randomized shades of grey. This way all the xterms look slightly different and can be easily visually distinguished. blackterm is my basic terminal that I’ve used for a long time.

Then I got the whiteterm. According to conventional wisdom black on white is easier to handle for the brain, although I am not sure I ever quite believed it. But still it’s useful I guess. whiteterm works the same way as blackterm and actually uses shades of grey, just with the white and black switched.

I typically try to combine both the black terms and the white terms in a single desktop, giving priority to one of them depending on mood.

Then the final piece is the randterm. The idea and data file are originally from David Holland. randterm randomizes the foreground and background colors based on a data file. Some of the combinations are not so great (but I already eliminated most of the really awful ones), but most are ok. Sometimes when I don’t like a combination I start it a few times until I get one that matches my current mood. Typically I add one or two randterms to the stable of black and white terminals to make the desktop more visually interesting.

For all of those I have large (18 pixels) and small (10 pixels) variants. The scripts default to large and can be switched to small with the --small argument.
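The grey randomization behind blackterm and whiteterm can be sketched in a few lines. This is a Python sketch under stated assumptions, not the actual scripts: the grey ranges are guesses, and font selection for --small is left out because the exact font names are not given here. Only the standard xterm -fg and -bg options are used.

```python
import random


def grey(level):
    """Format a 0..255 grey level as an X11 color such as #1a1a1a."""
    return "#{0:02x}{0:02x}{0:02x}".format(level)


def random_greys(rng):
    """Pick a (light, dark) pair of shades; the ranges are assumptions."""
    light = grey(rng.randint(200, 255))
    dark = grey(rng.randint(0, 48))
    return light, dark


def blackterm_args(rng):
    """Light text on a dark background, both randomized shades of grey."""
    light, dark = random_greys(rng)
    return ["xterm", "-fg", light, "-bg", dark]


def whiteterm_args(rng):
    """Same idea with foreground and background switched."""
    light, dark = random_greys(rng)
    return ["xterm", "-fg", dark, "-bg", light]


print(" ".join(blackterm_args(random.Random())))
```

The returned list can be passed to subprocess.Popen to actually start the terminal. Since every call draws new shades, each window comes out slightly different.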

Overall in a desktop menu you end up with 6 choices to create your xterm menu or shortcuts:

Color           Large      Small
White-on-Black  blackterm  blackterm --small
Black-on-White  whiteterm  whiteterm --small
Random          randterm   randterm --small

With these choices you can set up a colorful desktop that is easy on the eyes.

The same idea can be applied to other programs. I used to do the same with Emacs windows, although I have since gone back to mostly plain black on white Emacs.

One problem with the different schemes is terminal programs that use their own color schemes, like vim or lynx or emacs -nw. They typically have color schemes for dark or light xterms that you can statically configure, but do not really deal well with having both dark and light and even randomized ones. Often I just turn off the colors. An easy way to do that is to start them with TERM=vt100 … The programs often fall back to using bold instead of colors then, which works for me.

At least for vim I tended to also have aliases to start vim with a dark color set and a light color set, but you always have to remember to use the right alias then and it doesn’t quite work with programs that start the editor directly through $EDITOR. So usually I just turn them off or rely on bold only.

I guess the best solution would be wrappers that query the xterm for its current colors and select the right color scheme, but I haven’t implemented that so far.

Computer ergonomics I – xterms fonts
http://halobates.de/blog/p/98
Sun, 16 Jan 2011 05:20:45 +0000, therapsid

If you do your computer work mostly in text terminal windows like me, setting up the right ergonomics for them is very important. It always pays off to invest some time in setting up the environment you spend the most time in (just as you should spend some resources on finding the right mattress, which you spend one third of your life on).

First I use good old xterm(1). xterm is much faster than the default terminals now used in desktops. Gnome Terminal has lots of problems: First it is quite slow, which can make a big difference for jobs that generate a lot of output like a large compilation (see the warning from the SBCL folks at the bottom of the page). And it does have strange semantics with cut-n-paste. And various other issues. I haven’t used the KDE terminal program for some time, but I remember it also being somewhat problematic. On the other hand xterm just works. xterm tends to be included with distributions, but on some newer versions you need to install the package explicitly.

The defaults of xterm are quite reasonable, except for the too small scrollback buffer. The latter can be set with the -sl option or by putting an XTerm*saveLines: line into your .Xdefaults. I usually also disable the scrollbar with +sb to reduce clutter.

Then you need a good font for them. The default fonts used in xterms are not great and tend to be too small, at least for me. The best xterm font I know of is the sgi-screen font. This is a bitmap font that has been properly designed by a good font designer, not TTF. If you ever used an SGI IRIX workstation, that’s the font used in the text terminals. Luckily SGI freed it some time ago and the font is included with OpenSUSE. The rpm installs fine on other distributions (with --nodeps), but you may need to set up the font path manually. For example in Fedora this can be done with ln -s /usr/share/fonts/misc/sgi/ /etc/X11/fontpath.d

[Screenshot: sgi-screen font, lorem ipsum in vi]

One trap with xterms is that if you set the font using the -fn argument, xterm will automatically try to generate a bold font for it. sgi-screen has bold versions of each font, but xterm cannot find them directly for some reason. I work around this by always specifying a bold font too (with -fb).

Setting the right font size is important too. I tend to use large and small xterms for different purposes. For a primary work xterm you want a larger font (I use the 18 pixels sgi-screen usually). To watch a logfile or nurse a long running compile job I use a smaller font, like a 10 pixel sgi-screen.

Setting the right window size is also important. Typesetters know that overly long lines are hard to read. Because of that I try to avoid xterms wider than 80 characters.

More on xterms later.

Update: fixed link to sgi-fonts.

The Korean Tamagotchi
http://halobates.de/blog/p/93
Wed, 12 Jan 2011 10:16:00 +0000, therapsid

My Samsung mobile dumb phone has the annoying tendency to complain when it runs out of battery. For some reason this always happens in the middle of the night and has already woken me up several times. The alarm happens multiple times, so if you don’t give it its juice on time it’ll wake you up again later. It’s a bit like a small kid or a Tamagotchi (are they actually still used?).

It’s very annoying. It’s interesting to think about how the designers could have avoided the problem. Check the current time zone and raise battery alarms a few hours in advance, during normal waking hours?
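The idea of raising the alarm early can be sketched as a small scheduling rule. This Python sketch assumes the phone can estimate when the alarm would otherwise go off, and it assumes a fixed sleep window of 23:00 to 07:00 local time. Both the window and the one-hour margin are made-up values for illustration.

```python
from datetime import datetime, time, timedelta


def alarm_time(would_fire_at,
               sleep_start=time(23, 0),       # assumed bedtime
               sleep_end=time(7, 0),          # assumed wake-up time
               margin=timedelta(hours=1)):    # warn this long before bedtime
    """Return when to sound the low-battery alarm (local, naive datetimes)."""

    def asleep(moment):
        t = moment.time()
        return t >= sleep_start or t < sleep_end  # window wraps past midnight

    if not asleep(would_fire_at):
        return would_fire_at  # already within waking hours, leave it alone

    # Find the start of the sleep window that contains the original time.
    bedtime = datetime.combine(would_fire_at.date(), sleep_start)
    if bedtime > would_fire_at:
        bedtime -= timedelta(days=1)
    return bedtime - margin


print(alarm_time(datetime(2011, 1, 12, 3, 0)))
```

An alarm that would have gone off at 03:00 is moved to 22:00 the evening before, while one in the afternoon is left untouched.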

Big kernel lock semantics
http://halobates.de/blog/p/85
Fri, 24 Dec 2010 11:10:22 +0000, therapsid

My previous post on BKL removal received some comments about confusing and problematic semantics of the big kernel lock. Does the BKL really have weird semantics?

Again some historical background: the original Unix and Linux kernels were written for single processor systems. They employed a very simple model to avoid races when accessing data structures in the kernel. Each process running in kernel code owns the CPU until it yields explicitly (I ignore interrupts here, which complicate the model again, but are not essential to the discussion). That’s a classical cooperative multi-tasking or coroutine model. It worked very well; after all, both Unix and Linux were very successful.

Coroutines are an established programming model: in many scripting languages they are used to implement iterators or generators. They are also used for lots of other purposes, with many libraries being available to implement them for C.
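The cooperative model can be illustrated with generators. This is a small Python sketch, not kernel code: each task runs undisturbed until it reaches an explicit yield, and only then does the scheduler switch to another task. The task and scheduler names are made up for the example.

```python
from collections import deque


def task(name, steps, log):
    """A task that owns the CPU between two explicit yield points."""
    for i in range(steps):
        log.append(f"{name}{i}")  # nothing can interleave with this step
        yield                     # explicit yield: let another task run


def run_round_robin(tasks):
    """Run the tasks in turn until all of them have finished."""
    queue = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            next(current)          # run up to the next yield
            queue.append(current)  # not finished, schedule it again
        except StopIteration:
            pass                   # finished, drop it


def demo():
    log = []
    run_round_robin([task("a", 2, log), task("b", 3, log)])
    return log


print(demo())  # ['a0', 'b0', 'a1', 'b1', 'b2']
```

Because switches happen only at the yield, the code between two yields needs no locking at all, which is the property the old single processor kernels relied on.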

Later, when Linux (and Unix) were ported to multi-processor systems, this model was changed and replaced with explicit locks. In order to convert the whole kernel step by step the big kernel lock was introduced, which emulates the old cooperative multi-tasking model for code running under it. Only a single process can hold the big kernel lock at a time, but it is automatically dropped when the process sleeps and reacquired when it wakes up again.

So essentially BKL semantics are not “weird”: they are just classic Unix/Linux semantics.

Good literature on the topic is the older Unix Systems for Modern Architectures book by Curt Schimmel, which provides a survey of the various locking schemes to convert a single processor kernel into an SMP system.

When a subsystem is converted away from the BKL, the BKL is often replaced by a mutex used as a code lock: a single mutex protects the complete subsystem. This locking model is not fully equivalent to the BKL: the BKL is dropped when code sleeps, while the mutex will continue serializing even over the sleep region. Often this doesn’t matter: a lot of common kernel utility functions may sleep, but in practice do so only rarely, in exceptional cases. But sometimes sleeping is actually common (for example when doing disk IO), and in this case the straightforward mutex conversion can seriously limit parallelism. The old BKL code was able to run the sleeping regions in parallel. The mutex version is not. For example, I believe this was a performance regression in the early versions of Frederic Weisbecker’s heroic reiserfs BKL conversion.
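The difference can be demonstrated in user space. The following Python sketch is an analogy with threads, not kernel code: the “BKL style” worker drops the lock around its sleep and retakes it afterwards, while the “mutex style” worker holds the lock across the sleep. The worker names, thread count and sleep time are arbitrary choices for the demonstration.

```python
import threading
import time


def run_workers(worker, count=3):
    """Start `count` threads running `worker` and return the elapsed time."""
    threads = [threading.Thread(target=worker) for _ in range(count)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start


def bkl_style(lock, sleep_s):
    def worker():
        lock.acquire()       # enter the protected region
        lock.release()       # the BKL is dropped when the process sleeps
        time.sleep(sleep_s)  # stands in for waiting on disk IO
        lock.acquire()       # ... and retaken when it wakes up again
        lock.release()
    return worker


def mutex_style(lock, sleep_s):
    def worker():
        with lock:           # the mutex stays held across the sleep
            time.sleep(sleep_s)
    return worker


def compare(sleep_s=0.1):
    bkl = run_workers(bkl_style(threading.Lock(), sleep_s))
    mutex = run_workers(mutex_style(threading.Lock(), sleep_s))
    return bkl, mutex


bkl_elapsed, mutex_elapsed = compare()
print(f"BKL style: {bkl_elapsed:.2f}s, mutex style: {mutex_elapsed:.2f}s")
```

With three workers the mutex version takes at least three sleep periods, because the sleeps are serialized, while the BKL-style version lets them overlap.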

With all this I don’t want to say that removing the BKL is bad: I have done some of this work myself and it’s generally a good thing. But it does not necessarily give you immediate advantages and may even result in slower code at first.
