Thursday, March 19, 2015

System data collection: overhead of long-lived and at-interval collectors

In systems monitoring, data points are harvested with the help of collection agents, or collectors. Collectors associate metric values with a point in time and periodically report the result to a time series database.

For my pet projects, I use OpenTSDB's tcollector, which fundamentally supports two modes of running a collection agent:
  • at-interval - a program that takes the measurement is spawned repeatedly in a subprocess at a constant interval; it reports its measurements and exits
  • long-lived - tcollector spawns a single subprocess for the collection agent, which outputs data points at a constant interval in a continuous loop
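To make the distinction concrete, here is a minimal sketch of what the long-lived variant might look like in Python. The metric name, the acpi invocation, and the output parsing are my assumptions for illustration, not tcollector requirements; as far as the framework is concerned, the collector just keeps printing plain-text data point lines to stdout.

```python
#!/usr/bin/env python
import subprocess
import sys
import time

INTERVAL = 60  # report once a minute

def parse_acpi_temp(line):
    """Extract degrees Celsius from output like 'Thermal 0: ok, 55.0 degrees C'."""
    return float(line.split(",")[1].split()[0])

def collect_once():
    # Hypothetical measurement: shell out to acpi and parse the first line.
    out = subprocess.check_output(["/usr/bin/acpi", "-t"]).decode()
    return parse_acpi_temp(out.splitlines()[0])

def main():
    # Long-lived mode: one process, looping forever and printing
    # "metric timestamp value" lines for tcollector to pick up on stdout.
    while True:
        print("cpu.temp.celsius %d %.1f" % (int(time.time()), collect_once()))
        sys.stdout.flush()
        time.sleep(INTERVAL)
```

The at-interval variant would take the same measurement once and exit, paying the fork-exec cost of a fresh subprocess on every run.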

Long-lived collectors are generally preferred: the fork-exec cycle that at-interval collectors go through repeatedly is an expensive operation, so at-interval collectors will always incur more overhead.

I wrote a simple collector reporting the CPU temperature in Celsius as read from the /usr/bin/acpi command. I first ran it as an at-interval collector every minute, and subsequently switched it to a functionally equivalent long-lived version. Here is the result:
As reported by the collector itself, the long-lived version keeps the executing CPU an entire three degrees Celsius cooler.

Wednesday, October 16, 2013

Goodbye #Dublin, you town of the hurdled ford

It starts like this: Meeting friends at the Spire. Watching performers on Grafton Street. Pub crawls. Spotting seals in the Liffey. Cliff walk in Howth.

Photo: Natalia Czachowicz
Then you can see it: Drifting the sea of redbrick houses on the Dart between Pearse Station and Clontarf Road. Wandering about the concrete jungle of East Wall. Sandymount Beach at low tide. The palpitating sea in Dalkey Harbour. Watching from the Druid's Chair as rain clouds approach.

Then it gets intense: Infiltrating a herd of deer in Phoenix Park. The nighttime cheese’n’garlic chips after pints. Picnic lunch in Merrion Square at the peak of summer. Coffee and scone on a commute along the Grand Canal.

Then it affects you: The cold and hot water taps. Getting scared by a Viking Tour approaching from behind. Awaiting Dublin Bus for ages in pouring rain on O’Connell Street just to see three of them come all at once.

Then you enjoy it: An all-day 5-item breakfast. The dim light and saxony carpet in a local pub. Pretending you’re in the subtropics in the greenhouse at the Drumcondra Botanic Gardens when Spain is out of the question. The taste of Guinness.

Then you start to care: Admiring the truly capitalist spirit of Moore Street. Finding new beginnings along Lower Baggot Street. Sharing bad news with Patrick Kavanagh on his bench. The hope that the Luas lines meet some day. That “finally!” feeling after landing at Dublin Airport.

And then you leave.


Tuesday, September 17, 2013

Things #relo companies don't want you to know

Milton Friedman once explained to us what the four ways to spend money are. They're aptly summarized in the quadrant on the right.

When a company agrees to pay for your relocation and provides you with a capped allowance, it may be argued that you fit into the category circled in red, i.e. you'll spend someone else's money on yourself. In this case, it's reasonable to assume that you'll want your stuff moved at the level of service you expect, but you won't care particularly about the price as long as it doesn't exceed the allowance. Guess what, moving companies understand that very well, and some of them will try to make an easy buck at your expense.

I'm relocating from Ireland to Northern California later in the year. At the end of August I had a visit from a surveyor working for local movers to estimate the volumetric weight of the goods to be moved. He was shown around the apartment, asked a few questions, and seemed to have an encouraging attitude towards taking everything possible. For instance, he insisted that we take planters with us, despite not being able to take the plants themselves due to strict import and quarantine regulations. He also suggested that clothes airers are very hard to come by in the United States and should definitely be taken along. WTF?! Ok, whatever, ignored that. To finish it off, he put the shipment down as air freight, which is at least 30% more expensive than sea freight, without any mention of the cost.

When the costings estimate came in, I was told that my allowance for the move of goods, flights and temporary accommodation was almost exceeded by the shipment alone. That seemed a little dear for the limited amount of stuff my partner and I had decided to move. At that point I did a quick Fermi estimation of the total weight of all the goods, and the figure in the costings estimate appeared inflated by a factor of about five. When I raised the concern with the move coordinator I received some serious pushback - the coordinating mover defended the survey results, putting my confusion down to ignorance. Yet they never requested any additional evidence. A rather underwhelming response.

I wasn't going to take that lying down.

Before I go on, let me summarize the rules of the house-move game: surveyors estimate cost using dimensional weight as the billing technique. They look at how much space in cubic feet each item takes, multiply the volume by 7 lbs per cubic foot for sea freight or 10 lbs per cubic foot for air freight, and present you with a volumetric weight estimate. So a load with a volume of 50 cuft corresponds to 350 lbs of volumetric weight by sea. If the actual weight is greater than the volumetric weight, you're charged by actual weight instead. Simple as that.
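The whole billing rule fits in a few lines; this sketch just encodes the rates above (7 lbs/cuft sea, 10 lbs/cuft air) so you can sanity-check an estimate yourself:

```python
SEA_LBS_PER_CUFT = 7
AIR_LBS_PER_CUFT = 10

def billable_weight(volume_cuft, actual_lbs, lbs_per_cuft=SEA_LBS_PER_CUFT):
    """Return the weight you're billed for: the greater of the
    volumetric weight and the actual weight."""
    volumetric_lbs = volume_cuft * lbs_per_cuft
    return max(volumetric_lbs, actual_lbs)

# A 50 cuft sea shipment weighing 200 lbs is billed at its
# volumetric weight of 350 lbs; the same load by air comes to 500 lbs.
```

Run your own numbers through this before the surveyor runs theirs.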

Unhappy about the original estimate, I created a model cubic foot (cuft) and took digital snaps of items listed in the survey, captioning them with their corresponding volume estimate.

You never know when a set of red chopsticks and a CD box may come in handy. Meet my model Cubic Foot.

The photos around this text include some of the juicier bits. I think you get the idea of the degree of error for most of the items. Do you think the games take 8 cubic feet, or more like 1? How about the backpacks? Is there any doubt that they take less than 10 cuft? They also weigh less than 70 lbs, you can take my word for it.

Okay, you might now say, but what about the overhead from packaging? The goods can't go in on their own. True, but it's unreasonable to assume that packaging takes 5-7 times the space of the goods... and this particular survey only reported the net volume of the goods. Crating and packaging were added on top, as a significant percentage (18%) of the total. A double-edged rip-off.

It should come as no surprise that an independent survey carried out by a competitor produced an estimate close to half the original volume. Gee, who would have guessed?

To sum up: Opportunity makes a thief. If you don't want to let this happen to you, make sure you're prepared. Here are three things you should do:
  1. Before the surveyor's visit:
    1. Browse the docs on MovingScam.
    2. Get wiki-smart on dimensional weight.
  2. During the surveyor's visit:
    1. tell them upfront that you'll request their paperwork for inspection,
    2. be very specific about what you're taking with you (they seem to hate it),
    3. for every item recorded on the list, ask how much they estimate it at.
After that, the costings estimate that they'll present you with is likely to be accurate.

Now, on the subject of costings. When a relo company gives you an estimated total cost of your move, this is what they mean.

Suppose you're relocating from Ireland to the United States, and the relo coordinator comes back with $10k for a door-to-door move. The pie chart on the left breaks down where your 10 grand goes.

The actual shipping is just over a quarter of the overall cost. Close to a third goes to the guys who come to pack your stuff.

Alright, that's how it works. I hope it helps!

Saturday, August 3, 2013

Temporal granularity in time series monitoring

Consider a computationally expensive batch job that runs once an hour. On a successful run, a data point equal to 1 is recorded; a failed job, or no run at all, results in no data being published.

The following chart illustrates how presentation of the same data at different temporal granularities may lead to two different conclusions.

The green line represents successful runs with 1-hour granularity, and the red line displays the sum of successful runs in every 8-hour interval.

On the green line, the multi-hour gaps between successful runs become invisible when looking at extended periods of time, and the occasional stacking-up of 2 successful runs in one hour around July 13th does not suggest trouble. But the red line tells a very different story, and clearly suggests that since June 20th the load cannot be met with the available capacity. Organic growth?
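The re-aggregation behind the red line is nothing exotic. Assuming the hourly success counts come in as (timestamp, value) pairs, a sketch of rolling them up into coarser buckets might look like this:

```python
from collections import defaultdict

HOUR = 3600

def rebucket(points, bucket_seconds):
    """Sum (timestamp, value) data points into fixed-width time buckets."""
    buckets = defaultdict(int)
    for ts, value in points:
        # Align each timestamp down to the start of its bucket.
        buckets[ts - ts % bucket_seconds] += value
    return sorted(buckets.items())

# Eight hourly slots with one success each, except a 3-hour gap.
points = [(h * HOUR, 1) for h in range(8) if h not in (2, 3, 4)]
# At 1-hour granularity every surviving point just reads 1 and the
# gap is easy to miss; the 8-hour bucket sum (5 instead of 8) is not.
```

The choice of bucket width is exactly the temporal granularity trade-off the chart illustrates.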

For more on temporal granularity in monitoring time series data, see this post.

Friday, July 5, 2013

Simulating service dependency outage with iptables

Distributed applications in general, and web services in particular, rely on a number of dependencies behind the front-end layer, such as databases, queueing infrastructure, distributed locking mechanisms and all kinds of middle-tier services. Reliability of a distributed system depends strongly on the availability of its critical dependencies: the more dependencies there are, the higher the likelihood that at least one of them is down at any given time. And dependencies do become unavailable, mostly for brief periods of time during active/standby failovers or due to network routing blips.

Whatever the reason for the unavailability, you want to make sure your service handles dependency failures gracefully. One way of verifying this is to simulate an outage. If you're in charge of your dependencies, you can just bring them down and see what happens, e.g. shut down a database. This might not always be feasible - a database might be a resource shared among many users, who shouldn't have to suffer from your testing. The same applies when a cloud service is the dependency - it can't be shut down at your convenience, yet it will go down, if only for a brief period, sooner or later.

For these situations, iptables comes to the rescue.

Here are the two rules I add to the service hosts to simulate a dependency outage for the respective scenarios (substitute the dependency's address for <dependency-ip>):
  • Connect timeout: all packets, including those carrying the TCP handshake, simply get lost.
    iptables -A OUTPUT -d <dependency-ip> -j DROP
  • Socket read timeout: the first two steps of the TCP handshake succeed (SYN, SYN+ACK), but the final ACK originating from the client - along with every subsequent outbound ACK - gets dropped.
    iptables -A OUTPUT -p tcp --tcp-flags ACK ACK -d <dependency-ip> -j DROP
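From the client's side, both scenarios should be caught by explicit timeouts. Here is a minimal sketch of such a guard; the host, port and timeout are illustrative (192.0.2.1 is a reserved documentation address, not a real endpoint):

```python
import socket

def dependency_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to the dependency succeeds
    within `timeout` seconds; False on timeout or network error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except (socket.timeout, OSError):
        return False

# With the DROP rules in place, the connect attempt hangs until the
# timeout fires, and the check reports the dependency as down:
# dependency_reachable("192.0.2.1", 5432)  -> False after ~2 seconds
```

If your service instead hangs indefinitely under these rules, you've found exactly the kind of bug this exercise is meant to flush out.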

Tuesday, July 2, 2013

Linux and the Unix Philosophy by Mike Gancarz

I was given Linux and the Unix Philosophy by Mike Gancarz to read. The work was published back in 2003, and I understand it to be a sequel to The UNIX Philosophy, published almost a decade earlier.

The book is mostly philosophical (surprise, surprise), and an interesting read from a historical point of view. It brought me back in time to the good old days of Linux kernel 2.0.0 and the like. While some of the truths described are timeless, many are just outdated, historical perspectives. For example, the author advises resisting the temptation to rewrite shell scripts in C... Relatively few people would have tried rewriting scripts in low-level languages even back in 2003, and nowadays, with the advent of modern dynamic scripting languages such as Python and Ruby, the point is simply moot. So is convincing people that printing source code on paper is suboptimal - very few people still haven't realized that. On the other hand, the book doesn't say much about scalability or robustness. The author rightly points out that "The philosophy of one century is the common sense of the next". Yeah, in this case it's hard to disagree.

Thursday, June 27, 2013

Calculating week range based on a timestamp

When someone asks you to create a weekly report (pulling some data from an arbitrary database), you'll first be faced with the task of coming up with a week range: the start and end date for a given week. You're given a Unix timestamp. Assuming that a week starts on Monday at midnight, and knowing that the Unix epoch started at 00:00:00 UTC on Thursday, 1 January 1970, here is a simple way of coming up with a week range:
DAY_IN_SECONDS = 24 * 60 * 60
WEEK_IN_SECONDS = 7 * DAY_IN_SECONDS

def get_week_range(timestamp):
    """Return start and end of the week surrounding the timestamp."""
    # The epoch fell on a Thursday, so Monday midnights are offset by
    # 3 days from the epoch-aligned week boundaries; shifting the
    # timestamp by 3 days before taking the remainder yields the
    # seconds elapsed since the most recent Monday midnight.
    secs_since_monday = (timestamp + 3 * DAY_IN_SECONDS) % WEEK_IN_SECONDS
    week_start = timestamp - secs_since_monday
    week_end = week_start + WEEK_IN_SECONDS - 1
    return (week_start, week_end)
Verification: get the week range of the week from a fortnight ago in an interactive shell.
$ ipython

In [1]: import weekrange

In [2]: import time

In [3]: weekrange.get_week_range(time.time() - 2 * weekrange.WEEK_IN_SECONDS)
Out[3]: (1370822400.0, 1371427199.0)

In [4]: !date -u -d@1370822400.0
Mon Jun 10 00:00:00 UTC 2013

In [5]: !date -u -d@1371427199.0
Sun Jun 16 23:59:59 UTC 2013