Don’t Read Your Logs

Photo provided by Karen Arnold under the CC0 Public Domain license.

I’ve had a number of discussions, both offline and online, about logging practices. In my view, reading individual log lines is (almost) always a sign that there are gaps in your system’s monitoring tools – especially when dealing with software at a medium or large scale. Just as I would never want to use tcpdump to analyze statsd metric packets individually, ideally, I don’t want to look at individual log lines either.

This advice follows the spirit of “disable port 22 on your machines”. While most developers would agree that using dedicated orchestration tools is preferable to manual server wrangling, I know of very few projects or companies that take the extreme stance of disabling SSH access to all machines. Nevertheless, treating manual SSH as a sign of gaps in your system’s orchestration tools is an excellent master cue for architecting scalable and maintainable systems.

Similarly, I know of none that truly send all logs to /dev/null, or block read access to all logs. But treating logs as an extreme anti-pattern provides an excellent forcing function for designing observable systems.

Logs as Metrics

Let’s start with a basic example:

log.Printf("Loaded page in %f seconds", duration)

In this case, the log line serves the same purpose as a statsd metric. We could replace it immediately with

statsd.Histogram("page.load.time_seconds", duration)

and the result would be better, because we’d be able to use the full extent of aggregation tools at our disposal. It’s possible to extract this information from a log line into a structured form, but it’s more work, and it’s unnecessary. The log line doesn’t give us any information that the structured metric doesn’t.
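To make “more work” concrete, here is a rough sketch of what recovering the duration from the log text might involve. The sample lines, the regular expression, and the function name are all assumptions made up for illustration; a real log pipeline would look different.

```python
import re
import statistics

# Hypothetical sample of raw log lines; the wording mirrors the log call above.
LOG_LINES = [
    "Loaded page in 0.120000 seconds",
    "Loaded page in 0.340000 seconds",
    "could not load object 42",
    "Loaded page in 0.200000 seconds",
]

# The pattern has to track the exact wording of the message. If someone
# rewords the log line, this silently stops matching.
PAGE_LOAD = re.compile(r"^Loaded page in (?P<seconds>\d+(?:\.\d+)?) seconds$")


def page_load_times(lines):
    """Recover page-load durations (in seconds) from free-form log lines."""
    durations = []
    for line in lines:
        match = PAGE_LOAD.match(line)
        if match:
            durations.append(float(match.group("seconds")))
    return durations


durations = page_load_times(LOG_LINES)
median_seconds = statistics.median(durations)
```

All of this parsing exists only to get back the number that the metric call would have handed to the aggregation tools directly.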

Logs as Debugger Tracing

A more common example:

log.Printf("about to make API request on %v", obj_id)
obj := c.Load(obj_id)
if obj == nil {
    log.Printf("could not load object %v", obj_id)
} else {
    log.Printf("loaded object %v", obj_id)
}

Logs are oftentimes used as a runtime pseudo-debugger. In this case, we’re using logs as a way to verify that a particular line of code was called for a particular transaction. The actual text of the log doesn’t even matter. Instead of “About to make API request”, we could have written

log.Printf("api.request_pre %v", obj_id)

or even

log.Printf("I like potato salad %v", obj_id)

As long as it’s unique to that particular line of code, it serves functionally the same purpose – it confirms that the program execution reached that point in the code.

When we use log lines this way, we’re forming a mental model of the code, and using the logs to virtually step through the code, exactly the way a traditional debugger like GDB might. Transaction-level (or request-level) tracing tools provide this same kind of visibility, with a better visual display.

Without actually counting, I’d estimate that at least 80% – if not more – of log lines that I’ve seen in most open-source projects fit this overall use case: using log lines to virtually “trace” the execution path of source code on a particular piece of problematic input.

Logs as Error Reporting

Another common pattern:

try:
    writeResult()
except Exception as e:
    log.error("error writing result! %s", e)

Here, we’re using logs as a way to capture context for an error. But again, this is a relatively inconvenient way to explore information like a stacktrace. Tools like Sentry or Crashlytics also allow exception reporting, but unlike logging tools, they allow us to classify and group exceptions. We can still view individual stacktraces, but we don’t have to sift through as much noise to identify problems. And by tracking the state of reported exceptions, we can identify regressions more easily. Structured logging systems are generally not capable of handling this – and even when the workflow is possible, it’s nowhere near as convenient as what dedicated exception tracking systems allow.

If you really can’t break the habit of logging errors, you can at least add a hook to log.error that sends the error (and a complete stacktrace) to an error reporting tool like Sentry.
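As a minimal sketch of such a hook, assuming Python’s standard logging module: the handler below forwards every record at ERROR level or above, along with its stacktrace, to a reporting callable. The `report` callable and the `ErrorReportingHandler` name are hypothetical stand-ins; a real error reporting client would supply its own capture function.

```python
import logging
import traceback


class ErrorReportingHandler(logging.Handler):
    """Forward ERROR-and-above records to an error reporting callable."""

    def __init__(self, report):
        super().__init__(level=logging.ERROR)
        # Hypothetical: stands in for a real client's capture call.
        self.report = report

    def emit(self, record):
        stack = None
        if record.exc_info:
            stack = "".join(traceback.format_exception(*record.exc_info))
        self.report({"message": record.getMessage(), "stacktrace": stack})


reported = []  # stand-in for the reporting service, so the sketch runs on its own
log = logging.getLogger("writer")
log.propagate = False
log.addHandler(ErrorReportingHandler(reported.append))


def write_result():
    raise IOError("disk full")


try:
    write_result()
except Exception:
    # exc_info=True attaches the active exception, so the handler sees the stacktrace.
    log.error("error writing result!", exc_info=True)
```

The calling code keeps its familiar log.error habit, while grouping and regression tracking happen in the tool built for them.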

Logs as Durable Records

Furthermore, you can’t even assume that the logs you try to write will actually get written! Even when operating on a small scale, logs can be a lossy pipeline, and the potential for failure only increases with scale.

For example, if you’re developing a script intended to run on your local machine, do you know how your code will behave if your disk hangs? If you run out of space on the partition? How reliable is your log rotation? What happens to your server if log rotation fails?

For a very small script, these sorts of failures may not matter to you, but they can come back to bite you, even at that small scale. For larger-scale services with tighter reliability guarantees, there are ways to mitigate these specific problems – a buffer, a log collector, a distributed indexer – but each solution comes with its own risks and problems. If you try to keep patching these by introducing more tools to make your logging reliable, at some point, you’ll discover that you’ve reinvented your own distributed database. And writing your own database is not inherently a bad idea, but to do it well, it’s the sort of task that’s best undertaken intentionally, rather than by accident.

To be fair to logging, this problem is not unique to logs. It comes from the limitations set forth by the CAP theorem, which means that every monitoring tool has to figure out its own way to deal with them. The problem with logs, however, is that the failure modes are much more subtle and easy to overlook.

For example:

statsd.Increment("api.requests_total", tags={"country": country.String()}, rate=.1)

It’s relatively obvious to see that this line of code might not always emit a metric, because even under normal operating conditions, you know (a) network operations can have problems, (b) it uses UDP, and (c) the metrics are sampled at the specified rate.

It’s a lot less obvious that this line of code might fail, because STDOUT and STDERR are generally assumed to be always available under normal operating conditions:

log.Printf("Received api request from %s", country.String())

Compared to logging, the failure modes of using runtime metrics, request tracing tools, or exception tracking tools are more visible and well-defined.
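To illustrate point (c) from the statsd example, here is a small sketch of client-side sampling. It is not the statsd client itself; `sampled_increment` and the in-memory `sent` list are hypothetical stand-ins for the network send, and the seed is fixed only so the run is repeatable.

```python
import random


def sampled_increment(sent, name, rate, rng):
    """Record one increment with probability `rate`; otherwise drop it."""
    if rng.random() < rate:
        sent.append((name, 1, rate))
        return True
    return False


rng = random.Random(0)  # seeded so that the run is repeatable
sent = []
for _ in range(10_000):
    sampled_increment(sent, "api.requests_total", 0.1, rng)

# The receiving side scales each sampled increment by 1/rate to estimate
# the true total. The result is an estimate, not an exact count.
estimated_total = sum(value / rate for _, value, rate in sent)
```

The loss is part of the contract here: the rate appears in the call, and the estimate is understood to be approximate.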

Instead of using logs as an accidental database, consider your underlying use case and which dedicated database would serve that need better. There’s no one-size-fits-all answer here; you may find that your use case is best served by a NoSQL database like CouchDB, or you may find that you’re really aiming to replicate the functionality of a message queue like Kafka, or another database altogether. If any of these (or other) tools fit your use case, they’re almost guaranteed to be a better fit, long-term, than logs.

Don’t Stop Logging, But Stop Reading Log Lines

By this point, it may sound like I’m firmly anti-logging. I do consider myself an environmentally-friendly person, but when it comes to software, I support careful uses of logging.

Logs can be useful for some purposes. However, it’s rare that they’re the only tool for monitoring your code. And it’s even rarer that they’re the best tool. When writing software that scales, you need to be able to deal with aggregate information – the firehose is too unwieldy to parse mentally. Logs that can be aggregated are better than logs that can’t. In those cases, it’s best to keep logging, but when you need to diagnose a problem, you’ll be interested in reading aggregate queries across your logs, rather than viewing raw, unaggregated logs in chronological order. The former is a powerful way to absorb a lot of information about your systems quickly. The latter is a glorified tail -f | grep.

The next time you start to write a log line, ask yourself whether another observability tool would be a better fit. Oftentimes, it will! If not, that’s fine. Just remember that, ideally, nobody should ever be reading that raw line, so take care to structure the information in a way that facilitates the kind of aggregation queries you’ll need.
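As a rough sketch of what structuring a log line for aggregation could look like, the example below writes one JSON object per line and then answers an aggregate question with a filter and a count. The `log_event` helper, the field names, and the sample events are assumptions for illustration only.

```python
import json
from collections import Counter


def log_event(stream, event, **fields):
    """Append one JSON object per line instead of a free-form sentence."""
    stream.append(json.dumps({"event": event, **fields}, sort_keys=True))


stream = []  # stands in for STDOUT or a log file
log_event(stream, "api.request", country="US", status=200)
log_event(stream, "api.request", country="FR", status=500)
log_event(stream, "api.request", country="US", status=500)
log_event(stream, "page.load", seconds=0.2)

# An aggregate question ("which countries see server errors?") becomes a
# filter and a count over fields, with no text parsing.
records = [json.loads(line) for line in stream]
errors_by_country = Counter(
    r["country"]
    for r in records
    if r["event"] == "api.request" and r["status"] >= 500
)
```

Nobody has to read the individual lines; the fields exist so that a query can.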

I Can Text You 💩, But I Can’t Write My Name

Today, Model View Culture published an article I wrote about Unicode, character encoding, and non-Latin alphabets. I’ve included an excerpt below:

I am an engineer, and I am a writer. As an engineer, I spend a lot of time thinking about how text is stored, but relatively little about what information the text actually represents. To the computer, text is an abstract entity – a stream of 0s and 1s, and any semantic meaning is in the eye of the beholder. As a writer, of course, the meaning is everything, and the mechanics of how the text is stored is merely a technical detail.

But in an economy that is increasingly digital, increasingly global, and increasingly multilingual, we can no longer maintain this distinction. The information we want to represent is intimately linked to how it is stored. We can no longer separate the two.

Read the rest at Model View Culture

 

Photo CC-BY TMAB2003.

What Would Body Cameras Do?

On December 3rd, a New York City grand jury failed to indict NYPD officer Daniel Pantaleo in the death of Eric Garner. Protesters around the world, from Oakland to New Delhi, reacted to this decision, demanding reforms to counterbalance the power wielded by law enforcement. They adopted as a slogan Garner’s chilling final words: I can’t breathe. I can’t breathe. I can’t breathe.

Garner’s death is only one of several high-profile cases of black men killed by police. Sadly, these incidents are not rare. By some counts, a black man is killed by police officers nearly every day. Race plays heavily into this risk: a black, male teenager is 21 times more likely to be shot dead by a police officer than a white one.

Of all the varied proposals for reform, perhaps the most popular among politicians is to outfit all police officers with body cameras. President Obama recently requested over $250 million from Congress to fund body cameras and police training. Proponents of this plan claim that body cameras will ensure that evidence is available in all cases of alleged police misconduct. They note that people behave differently when they know they are being watched, and conclude that body cameras will reduce misconduct, both by police officers and by civilians.

This argument draws on a common narrative: photography as documentation. This narrative is by no means new. Susan Sontag wrote, “A photograph passes for incontrovertible proof that a given thing happened. The picture may distort; but there is always a presumption that something exists, or did exist, which is like what’s in the picture”.

And yet, we must ask ourselves: is there such a thing as an impartial photograph? After all, every photograph tells a story. Every photograph is narrated in the first person.

Sontag explains, “To photograph is to appropriate the thing photographed. It means putting oneself into a certain relation to the world that feels like knowledge – and, therefore, like power”. Body cameras, mounted on the bodies of the police, serve as a permanent record of what the officers see. Body cameras, mounted on the bodies of the police, ensure that police remain in the position of power: the allegedly infallible narrator. Body cameras, mounted on the bodies of the police, reinforce the same imbalance in power structures that they are purported to keep in check. They allow officers to appropriate every interaction by legitimizing the literal viewpoint of the officer. But the objective of reform is not to appropriate civilians targeted by law enforcement; it is to appropriate law enforcement itself.

It might be a different story if we ensured that this power would be reciprocal: that citizens would be able to appropriate law enforcement, just as law enforcement appropriates black lives. But this is not the case: citizen bystanders often get harassed by officers when recording encounters, even when recording officers is legal. In fact, in Garner’s case, Pantaleo was not indicted, but Ramsey Orta, the bystander who filmed Garner’s death, was. At the same time as we provide police with an additional form of power, we rob citizens of this same tool. Police officers may tell their story, but citizens remain “the thing photographed.”

Defendants are not required to testify before a grand jury. Their attorneys usually recommend against it, as it can be incredibly risky. Defendants’ attorneys are not permitted to be present, and with no judge, defendants are completely at the mercy of the prosecutor. Yet, despite these circumstances, Pantaleo felt confident enough to testify, and during the grand jury hearing, he narrated “three different videos of the arrest that were taken by bystanders”. If he had worn a body camera, perhaps he could have stayed at home; his account would have been presented as a fourth video, with him behind the camera.

So we must ask ourselves – would body cameras have made a difference in Garner’s case? If not, what is the goal of arming officers with one more weapon? Or more bluntly, as Sydette Harry asks, ‘Why must black death be broadcast and consumed to be believed, and what is it beyond spectacle if it cannot be used to obtain justice?’

Thanks to Andrea Garcia-Vargas, Dan Mundy, Michael, and Jakob for reading drafts of this post.
Image provided by Scott Robinson under the Creative Commons 2.0 License

Beyond Culture Fit: Community Value-Add

Recently, a founder of an early-stage startup asked me for tips on evaluating culture fit when building an early team. As the founders of most successful startups will agree, picking the first few members of your team is important. They set the tone for your company as it grows.

Personally, I think that the term “culture fit” can be misleading. It implies a sort of homogeneity, which is actually the exact opposite of what most companies want. I make a point of the language here because I think it can be harmful to internalize the phrase “culture fit” when what you really want is to build a community. “Community value-add” might be a better term.

If you think of yourselves as evaluating “culture fit”, you’re placing your brain in pattern-matching mode, using the team you already have as a pattern and evaluating individuals against that pattern to test their fit. Even if you don’t intend to, this means you may implicitly be looking for someone who is like you and your cofounder(s)/teammate(s). Those people aren’t necessarily bad to have, but it can be a limiting perspective. A good community has people who can create some tension (in the appropriate ways!), because that’s what creativity and thinking “outside the box” are all about: respectfully challenging the status quo, for the sake of improving the company, product, etc.

Taken to the extreme, a company trapped in pattern-matching mode might subconsciously only hire people who fit their background and demographics. Aside from being potentially illegal (discrimination, etc.), this is actually very bad for your company and product. A healthy company needs a variety of perspectives represented in product decisions and day-to-day operations.

So, what is it you really are looking to evaluate? You’re looking for someone who is excited to be a member of your workplace community, to build your product, and isn’t afraid to challenge your assumptions when necessary, but knows how to do so respectfully and appropriately.

Finding people who are excited to work with you is best done by letting them self-identify. Give them opportunities to express their interest, and they will make themselves known.

As for the last part (finding who knows how to respectfully disagree), pose tough questions in interviews. You don’t want to try to set up “mind tricks” (this usually backfires), but do give them a chance to play tug-of-war with you.

In short, don’t expect people to fit your existing company culture. Instead, ask yourself what that person brings to your company’s community, and then ask yourself if that is a valuable addition.

Allies

In college, I served on the board of a student group that advocated sensible drug policy. During this time, our school’s chapter was named one of the top ten most successful chapters in the country. This honor was in part because we succeeded in passing a “Good Samaritan” policy to encourage students to seek medical attention for drug overdoses. It was also because we managed to unite a number of otherwise disparate groups – we co-hosted various events with the College Republicans, the College Democrats, the Arab students organization, Hillel (the center for Jewish student life), and more.

When we organized events with the College Republicans, they did not refuse to collaborate with us simply because one of our board members supported raising taxes to fund a single-payer healthcare system. When we organized events with the College Democrats, they did not refuse to collaborate with us simply because a different board member supported defunding Medicare and extending the Bush tax cuts.

Those issues were core to what these organizations believed in, and they were actively lobbying for both issues at the same time as they sponsored initiatives with us. However, they were able to recognize what was relevant to our collaboration and what wasn’t, and recognize the difference between the personal views of our members and our stance as an organization.

Effecting social change involves building a coalition, and a coalition is by nature diverse. While I would love for the leader of every company to agree that I deserve the right to marry, I also understand that one’s allies in one movement may not be allies in every other. Disagreement about other issues is not the sign of a bad coalition; it’s the sign of a broad one.

Bypassing a DNS man-in-the-middle attack against Google Drive

Boston to New York City is a frequently traveled route, so a number of different bus lines provide service between the cities. Most offer free WiFi as an amenity.

However, not all WiFi is created equal. Today I was traveling by the Go Bus, and I assumed I’d be able to do some work on the bus.

I needed to access a document on Google Drive. However, when I tried to open Drive, I was greeted with this sight.


I use OpenDNS instead of relying on my ISP’s DNS servers, and I figured that there was some error on OpenDNS’s end. So, I changed my /etc/resolv.conf to use the Google DNS servers, figuring that that would work.

No luck.

At this point, I realized that the bus network must be hijacking traffic on port 53, which was easy to test.

dig gave me the following output:

Visiting 67.215.65.130 directly gives the following page.

Saucon TDS uses OpenDNS for DNS lookups, but they redirect undesired lookups to their block page. I confirmed this by asking my neighbor across the aisle to visit drive.google.com – he happened to be using Safari, which gave him a 404-esque page instead of the big red error message that Chrome gave, but that was enough for me to confirm that the bus was, indeed, hijacking traffic on port 53.

But how to fix it? The correct IP address for drive.google.com is actually 74.125.228.1 (ironically, I looked this up using OpenDNS: http://cachecheck.opendns.com/). However, entering that IP address into your browser will give you the Google homepage, because unlike most sites, their servers check the hostname (the same is true for all Google subdomains).

The fix is actually rather simple – add an entry to /etc/hosts that maps drive.google.com to 74.125.228.1. This will skip the DNS lookup altogether, but the browser will still think that you’re going to drive.google.com “normally” (in a way, you are).
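To show why the hosts entry sidesteps the hijacked lookup, here is a small sketch that models the usual resolution order: the hosts table is consulted first, and DNS is only asked on a miss. This is a simplified model that assumes a typical default configuration, not the operating system’s actual resolver; `parse_hosts`, `resolve`, and `hijacked_dns` are hypothetical names.

```python
HOSTS_TEXT = """\
127.0.0.1   localhost
# Pin drive.google.com to the address found via the OpenDNS cache check
74.125.228.1   drive.google.com
"""


def parse_hosts(text):
    """Map each hostname in hosts-file text to its IP address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name, ip)  # first entry for a name wins
    return table


def resolve(hostname, hosts, dns_lookup):
    """Consult the hosts table first; only fall back to DNS on a miss."""
    if hostname in hosts:
        return hosts[hostname]
    return dns_lookup(hostname)


hosts = parse_hosts(HOSTS_TEXT)


def hijacked_dns(name):
    """Stand-in for the bus network: every lookup returns the block page."""
    return "67.215.65.130"


pinned = resolve("drive.google.com", hosts, hijacked_dns)
unpinned = resolve("example.com", hosts, hijacked_dns)
```

Only the pinned name escapes the hijack; every other lookup still goes through port 53 and gets whatever answer the network chooses to give.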

I write this post to illustrate how easy it is to get around this kind of traffic shaping, for anybody else who has the misfortune of running into this problem.

On principle, supporters of net neutrality oppose traffic blocks based on content (instead of volume). However, Go Bus and Saucon TDS are not simply blocking traffic – they are hijacking it. My DNS queries are made to a third party, and yet they decide to redirect them to their own DNS servers anyway. From a user perspective, this is incredibly rude. From a security perspective, it’s downright malicious. I let them know over Twitter, though I haven’t received a response yet.

Other than using a VPN (which would have required advance preparation on my part), is there a long-term solution to authenticating DNS queries? Some people advocate DNSSEC. On the other hand, Tom Ptacek (tptacek), whom I tend to trust on other security matters, strongly opposes it and recommends DNSCurve instead.

In the meantime, let’s hope that providers treat customers with respect, and stop this malicious behavior.

Flipping a Coin Over the Phone

Last week, a friend and I had to arrange an in-person meeting after work, by email. He’s based on the Upper East Side, and I’m in Chelsea. Neither one of us wanted to make the trek to the other’s office, and there was no logical place “in between” where we’d have a quiet space.

The obvious solution would be to flip a coin, which he suggested. But how do we know that the other is telling the truth?

The procedure for having Alice and Bob flip a coin over the phone is actually fairly simple. (Conveniently, my friend’s name begins with a ‘B’, so I’ll make him Bob, and I’ll be Alice).

First, Alice flips a coin, but keeps the result of the coin flip secret. Let’s say that ‘H’ is 1, and ‘T’ is 0.

Then, Alice finds a book – (almost) any book will do, as long as Bob has a copy of the book as well. She picks an arbitrary page in the book. If the coin flip was H (1) she should pick an odd page; otherwise, she should pick an even page. She notes both the page number and the first two words on that page.

Then, Alice emails Bob the first two words, and asks him to guess whether the page is even or odd. After Bob reveals his guess, Alice reveals the page number. Since Bob has a copy of the same book, he can verify that Alice is telling the truth about the parity of the page number (i.e., whether the number is odd or even).

This protocol works because it is easy to find the first two words on a page, but it is hard to find a page that begins with a given pair of words. This serves the purpose of a one-way function (a function that is hard to invert). By telling Bob the first two words, Alice is telling Bob a signature, and promising that she knows a value that produces that signature. Because Bob has a copy of the book, he is able to verify this signature.

A few interesting things to note about this technique, which is known as a commitment scheme:

  • If Alice only chooses a single, common word (like ‘the’), it would be easy for her to find both an odd page and an even page that start with that word. This would let Alice change the outcome of the coin flip after Bob makes his guess.

  • If Alice chooses too many words (such as an entire sentence), she runs the risk of providing enough context for Bob to figure out where to find the sentence (particularly if he has read the book and knows the plot).

  • The ideal book is one that both Alice and Bob possess, but which neither one has read, for the above reason.

  • The book should be a work of fiction, as nonfiction books tend to have an index that provides a mapping of words -> page numbers.

  • This technique assumes that Bob does not have access to a digital version of the book that is easily searchable.

There are certainly a few ways in which this procedure could be cheated – either to guarantee a certain outcome, or to tip the results in one’s own favor. But in cryptography, we sometimes make certain concessions (such as assuming an “honest but curious” adversary, as opposed to a truly malicious one). In this case we assume that both Alice and Bob are “honest, but temptable” – i.e., Alice or Bob might be tempted to lie about a coin flip, but neither will go to the trouble of manually finding phrases that appear on both even and odd pages in the same book.
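For comparison, here is a minimal sketch of the same three-step exchange with a cryptographic hash swapped in for the book. This is a different instantiation of the idea, not the book procedure itself: a random nonce plus the coin flip plays the role of the page number, and the SHA-256 digest plays the role of the first two words. The function names are hypothetical.

```python
import hashlib
import secrets


def commit(bit):
    """Alice: commit to a coin flip (0 or 1) without revealing it."""
    nonce = secrets.token_bytes(16)  # random padding, kept secret until the reveal
    digest = hashlib.sha256(nonce + bytes([bit])).hexdigest()
    return digest, nonce


def verify(digest, nonce, bit):
    """Bob: check that the revealed nonce and bit match the commitment."""
    return hashlib.sha256(nonce + bytes([bit])).hexdigest() == digest


alice_bit = secrets.randbelow(2)        # Alice flips the coin in secret
digest, nonce = commit(alice_bit)       # step 1: Alice sends Bob the digest
bob_guess = 1                           # step 2: Bob replies with his guess
opened_ok = verify(digest, nonce, alice_bit)  # step 3: Alice reveals, Bob checks
bob_guessed_right = bob_guess == alice_bit
```

The nonce matters for the same reason the book should be unread: without it, Bob could simply hash both possible flips and compare them against the digest.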

Image provided by Филип Романски under the Creative Commons Attribution-Share Alike 3.0 Unported license (via the Wikimedia Commons)

Dorm Room Fund

I am very excited to announce that I’ve just joined the Dorm Room Fund team in New York City!

For those who aren’t familiar with the fund, Dorm Room Fund is a venture firm run by students and for students. The fund invests exclusively in student-run companies, providing seed financing on very founder-friendly terms. The goal is to serve as a springboard for driven, entrepreneurial students, providing support both financially and in other ways.

I have always had a special interest in students and young entrepreneurs, which is why I have been a mentor for groups such as hackNY and the Thiel Fellowship. I’ve found students are some of the most exciting entrepreneurs to work with – they bring fresh eyes to problems both new and old, and inspiring levels of energy and determination.

I’m looking forward to working with the rest of the team this year, as well as meeting all sorts of students working on a variety of enterprises.

Joining the Big Red

After a great year serving as the Hacker-in-Residence at Quotidian Ventures, I’m excited to announce that I’m starting classes this fall at Cornell University, through the brand-new Cornell NYC Tech program!

As noted on the original program website (screenshot above), this program is atypical in many ways. The program focuses on preparing master’s students not just to be successful academically as engineers, but also as practitioners and entrepreneurs.

To that effect, the program extends beyond the standard requirements of a master’s degree in computer science. In addition to classes such as Cryptography and Signal and Image Processing, the semester calendar includes a number of events aimed at introducing Cornell students to individuals in the New York tech community, and at building key skills such as pitching a startup, engaging effectively with media, and hiring/interviewing. In addition, students take two business classes focused on entrepreneurship each semester, including an entrepreneurship practicum taught by Greg Pass, the former CTO of Twitter.

It also includes a semester-long master’s project. For mine, I’ll be working with a few Google engineers to contribute to the IPython project.

I’m looking forward to the coming year, and I’ll be sure to keep everyone posted once my classes and project get underway!

Why You’ve Never Read “I Have A Dream”

(I know some people will be interested in a followup post to Don’t Fly During Ramadan. I plan on writing one, but this post happens to be timely.)

Yesterday, August 28th, 2013, marked the 50th anniversary of Martin Luther King’s famous speech, “I Have a Dream”.

If you live in the US, you’ve probably heard of this speech. You’ve also probably never read it, heard the audio, or seen the video in its entirety.

Unfortunately, the speech is under copyright, and will remain so until 2083. As a result, it is illegal to republish under most circumstances.

Except for the famous, titular line, textbooks in schools almost never publish “I Have A Dream”. Documentaries can only include small, five-second clips. Take a moment and ask the people sitting near you if they’ve ever heard the opening lines:

“I am happy to join with you today in what will go down in history as the greatest demonstration for freedom in the history of our nation.”

I’ll bet you they haven’t.

Your children, grandchildren, and perhaps great-grandchildren, will probably grow up without the opportunity of experiencing this moment in history. They will learn about it in grade school, based off of secondhand accounts from teachers who have never read the speech either. And so on.

Frustratingly, the copyright holders include the estate of the person who delivered the speech, but not even the estates of the two other people who wrote it (and likely wrote most of it).

Worse, “I Have A Dream” was delivered on the steps of the Lincoln Memorial. If it had been delivered in 2013, not 1963, there would have been hundreds of cell phone recordings of the speech all across the Internet within minutes. It was truly a public performance in every sense of the word.

The original purpose of copyright, as defined by the US Constitution:

To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries

Emphasis mine.

Let’s ask ourselves: if Martin Luther King, Jr. had known that six generations of students would grow up without legal access to his speech at “the greatest demonstration for freedom in the history of our nation”, would he have been more likely to deliver it… or less?


EDIT: Hacker News readers have been kind enough to point out that schools may be able to use copies of parts of the speech under “fair use” privileges, and that copies of the text of the speech are available online.

However, the King estate does heavily enforce its copyright on the video recording of the speech (arguably the more important part), and this recording is much harder to find online (and, when it can be found, is legally questionable).

In retrospect, this post should probably have been titled “Why You’ve Never Seen/Heard ‘I Have a Dream’”, since the argument is stronger for the video recording; however, it’s important to note that most modern textbooks don’t include the text of the speech, and that the reason is the copyright restrictions and the royalties.

Perhaps a teacher has the right (and enough interest) to print out copies for their students separately, but students are unlikely to find it in most textbooks.