June 15, 2010

Google Apps Myths

I often encounter preconceptions about Google Apps that aren’t true. We just ran a pilot that moved all of our developers over to Google Apps, so the following myths are freshly busted here:

1) Google’s spam filtering for email is better than anything else.

Out of the 14 users in our pilot, 14 of them had legitimate email misfiled as spam. I still find ham in my spam folder. I dutifully mark it ‘not spam’ in the hope that this will somehow train some spam filter. I have my doubts. What’s especially funny is that the receipt email for paying Google for Google Apps is always filed as spam for me. I’ve marked it “not spam” twice so far.

We recommend to our users that they look at all their spam subject lines before allowing the spam to be automatically deleted, or that they turn off automatic spam detection entirely.

2) The Gmail web interface is the best available.

It is a very nicely designed web application, but it forces an entirely new paradigm for organizing messages onto the user. This change is especially confounding if you’re using an IMAP client, as the new paradigm is shoe-horned back into the old one, except for the parts that aren’t. Once you use Google’s migration tool to move email from another IMAP server into Google Apps, the organization of your mail is ever after different, yet similar enough in appearance to cause terrible mistakes. We had to enable experimental “Labs” features and send a support person around to every IMAP user to get them properly configured.

Gmail is clunky in Firefox, but works well in Chrome. I’m told it performs reasonably well in Safari too. It’s easy enough to install another browser everywhere, but having to run two browsers is a significant hassle.

3) Paid Google Apps are better than the free versions.

The features we’ve used from our paid Google Apps that we couldn’t have gotten for free are the IMAP copy tool (called the “migration” tool), and support. The IMAP copy tool works well, although its progress bars are damned lies. The support we’ve received so far has been pure manual-reciting and hand holding, even when what we’re asking clearly calls for neither. Technical support email replies to us are all canned answers that look like they were written by a committee of people who hate their jobs and each other.

Paid Google Apps also buys us the ability to retrieve a user’s data if they forget their password or go to work somewhere else. We mostly only care about that in terms of email, and we could get that if we used just Postini.

4) Everything will “just work” if we use Google Apps.

Two of the 14 users in our pilot reported that email in Google Apps was at times impossibly slow, and that they are frequently met with “Oops” and “Bad request” errors when using the web interface. This surprises them, as they never see these problems when they use their personal Gmail account. When we contacted support about these problems, the reply was a strange set of instructions for installing httpwatch and wireshark on our Windows boxes. Of course, we don’t have Windows boxes.

5) Google will not read our email.

This is less of an objective problem and more just me ranting, but here goes. I am not a lawyer. My subjective non-lawyer interpretation of the Google Privacy Policy is this: it reads like the reply of a guy who is being asked by his girlfriend not to go to topless bars any more. He wants to make her feel better, but he doesn’t want to lie to her and say “I will never go to a topless bar again.” So instead he says “I value our time together and I would never do anything directly to jeopardize our relationship.” He says it because it works. It makes her feel better. But it’s not a real commitment to anything.

I helped run email for around 100,000 users at a large university. We had a policy of not reading user email. If you were forced to look at user data, for example to fix a corrupt mail file, you were to do so with someone else in the room, so you wouldn’t be tempted to go exploring. Everyone understood how serious the issue was. Reading user email was reason to fire someone. Even given all this, more than one person was known to read user email all the time, and more than one person was fired for doing it. That doesn’t count whatever the network operators were snooping. Admins all ran their personal email over SSL to non-university servers. There’s no way to stop 100% of these kinds of abuses from happening; you can only limit the damage. If you think Google is different, either because they have a great reputation, or because your data is likely too obscure, I say you’re fooling yourself.

— Joe

June 3, 2010

Avery’s talk at MongoNYC

On Friday, May 21st, I presented at MongoNYC. Unfortunately I wasn’t able to attend many of the other talks, but the event sold out, the crowd was good and the after party at Slate (sponsored by Gilt, thank you very much!) was great. It was really cool to see the turnout and I had some good conversations with people. One thing is clear: people are still working out all the right idioms and uses for MongoDB, but everyone agrees that as a product it has hit a sweet spot between fast and features.

My talk was essentially an expansion of the inaugural post on this blog — basically, we have been using MongoDB since it was at 0.9 and at this point we use it everywhere, for terabyte production datastores for front-end systems, as well as one-offs and internals. I cover these uses and why mongo is the right database for them, as well as some of the pitfalls and gotchas we have encountered while deploying mongo, like the benefits and drawbacks to one-button replication.

Here is the video of the presentation, and below, my slides.

—Avery

Using Mongo At Shopwiki

May 7, 2010

Future-proofing your C++

So C++ 0x is imminent.  The draft Standard is shaping up, and many of the tastiest features are already available in experimental compiler modes.  Chances are you can’t switch over quite yet.  But chances are you want to.

So what can you do to be ready?

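  1. Start using TR1.  Many of the features of TR1 are replacing features of the 14882 Standard library which will be deprecated in C++ 0x.  Any use of auto_ptr, specifically, should be replaced with unique_ptr, which comes with <memory> once you compile in C++ 0x mode (TR1’s <tr1/memory> supplies shared_ptr, not unique_ptr).  If you return a unique_ptr that isn’t a plain local variable (a class member, for example), you’ll need to use std::move() in the return statement.  Also stop using those __gnu_cxx::hash_maps and start using unordered_map, which TR1 already provides in <tr1/unordered_map>.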
  2. Stop using ambiguous syntax.  Experienced programmers sometimes get cocksure and stop using {}’s and ()’s when they’re not strictly necessary.  If you know how “p and q or r” or if-if-else blocks resolve, great, but goddamn, the GCC 4.5 compiler complains like hell about them.  Save yourself the hundreds of lines of compiler warnings and add the braces.
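Put together, the two habits above might look something like the sketch below. It assumes a compiler with C++0x support switched on (for GCC 4.5, -std=c++0x) and uses the standard <memory> and <unordered_map> headers; on a TR1-only toolchain the container would instead come from <tr1/unordered_map> in namespace std::tr1. Widget, Holder, make_widget, count_words, classify, and p_and_q_or_r are all names invented for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical type, just something to own.
struct Widget {
    explicit Widget(int id_) : id(id_) {}
    int id;
};

// Formerly: std::auto_ptr<Widget> make_widget(int id);
std::unique_ptr<Widget> make_widget(int id) {
    return std::unique_ptr<Widget>(new Widget(id));
}

// Hypothetical owner: handing out a member needs an explicit std::move().
struct Holder {
    std::unique_ptr<Widget> held;

    std::unique_ptr<Widget> release() {
        return std::move(held);  // 'held' is left empty afterwards
    }
};

// Formerly: __gnu_cxx::hash_map<std::string, int>
std::unordered_map<std::string, int> count_words(const std::vector<std::string>& words) {
    std::unordered_map<std::string, int> counts;
    for (std::size_t i = 0; i < words.size(); ++i) {
        ++counts[words[i]];
    }
    return counts;
}

// "p and q or r", with the grouping spelled out.
bool p_and_q_or_r(bool p, bool q, bool r) {
    return (p && q) || r;
}

// if-if-else, with braces showing which 'if' owns the 'else'.
int classify(int a, int b) {
    if (a > 0) {
        if (b > 0) {
            return 2;
        } else {
            return 1;
        }
    }
    return 0;
}
```

None of the braces or parentheses change what the code does; they only say out loud what the compiler would otherwise warn you it is guessing at.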

— Jack

    May 5, 2010

    Success

    GCC 4.5 was recently released. My attempt to install it was a qualified success.

    The actual install of GCC 4.5 went smoothly.  I was missing three of the dependencies, but the configure script told me where to download them, and once I had them, GCC built without a hitch, in just under 100 minutes.  I built a little ‘Hello, world!’ with a lambda and an auto to marvel in the new C++ 0x features, and called it a day.
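Something along these lines exercises both features. It is a sketch rather than the exact program from that evening: the function name hello is made up, and it assumes g++ is invoked with -std=c++0x.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Builds the greeting using two C++0x features: 'auto' and a lambda.
std::string hello() {
    std::vector<std::string> words;
    words.push_back("Hello,");
    words.push_back("world!");

    std::string out;
    // The lambda captures 'out' by reference and appends each word,
    // putting a single space between words.
    auto append = [&out](const std::string& word) {
        if (!out.empty()) {
            out += ' ';
        }
        out += word;
    };
    std::for_each(words.begin(), words.end(), append);
    return out;
}
```

Printing hello() from main with std::cout gives the familiar greeting.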

    When I got to work the next day, I tried to start Adium and was greeted by an unfortunate error message:

    Dyld Error Message:
     Library not loaded: /usr/lib/libstdc++.6.dylib
     Referenced from: /Applications/Adium.app/Contents/MacOS/Adium
     Reason: no suitable image found.  Did find:
         /usr/lib/libstdc++.6.dylib: mach-o, but wrong architecture
         /usr/local/lib/libstdc++.6.dylib: mach-o, but wrong architecture
         /usr/lib/libstdc++.6.dylib: mach-o, but wrong architecture

    Now, the full problem is more involved, but the basic problem is clear: when I installed a shiny new GCC, I installed a shiny new libstdc++.  My shiny new libstdc++ was incompatible with my existing software, and I was sad.  Adium, Aquamacs, and git were all taken out by friendly fire.

    Before I impugn the good name of GCC, let me explain the entire problem.  Normally, replacing your old crufty libstdc++ with your new shiny libstdc++ is a good thing.  Bugs are fixed, programs run faster, all is right with the world.  But do you see where it says ‘wrong architecture’ in the error message?  That is not right.  And it is not right because I told it to be wrong.

    Back when I upgraded my OS from Mac 10.5 to Mac 10.6, I learned that while 10.6 was 64-bit capable, it was only 32-bit by default.  Apple doesn’t want to cause people problems, and keeping everything 32-bit prevents problems.  If you toggle your machine over to be fully 64-bit, like I did, you are signing the form that says you’re a big boy and can deal with all the problems that may crop up.

    This is one such problem.  When I installed Adium (and git and Aquamacs), my system architecture was 32-bit.  These applications were linked against a 32-bit libstdc++.  When I upgraded GCC, my system architecture was 64-bit.  The shiny new libstdc++ was built 64-bit.  Hence the ‘wrong architecture’ errors: my shiny new libstdc++ can’t work with my crufty old 32-bit applications.

    So that is the necessary precondition for GCC 4.5 to hose your Mac: having a 64-bit architecture and 32-bit system libraries (and applications relying on them).  If you are still running 32-bit or if you’ve already upgraded all of your system libraries to 64-bit, you’re fine.  You should be able to install GCC 4.5 without any problems.
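One quick way to see which side of that divide a given compiler lands on is to ask it how wide a pointer is. This is a minimal sketch using only the standard library; pointer_bits and arch_label are names made up for the example. It reports what the compiler targets by default, not what any already-installed library or application was built for.

```cpp
#include <cassert>
#include <climits>
#include <string>

// Width of a pointer, in bits, for the target this file was compiled for.
int pointer_bits() {
    return static_cast<int>(sizeof(void*) * CHAR_BIT);
}

// Human-readable label for the two common cases.
std::string arch_label() {
    switch (pointer_bits()) {
        case 32: return "32-bit";
        case 64: return "64-bit";
        default: return "unknown";
    }
}
```

Where the toolchain supports both targets, building once with -m32 and once with -m64 should give the two different answers.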

    As for me, it’s back to battling the sharks.  I’ve got git back, but Adium and Aquamacs need Objective C, and I forgot to include Objective C when I built GCC.  Hopefully once I’ve rebuilt it, I’ll be able to recompile Adium and Aquamacs…

    — Jack

    May 4, 2010

    RPM

    UNIX™ is a twisty little maze of 10,000 busted little programming languages, all nearly alike, and all faintly resembling the Lovecraftian offspring of C and the Bourne Shell. RPM is the Lovecraftianmost one of all, but we absolutely rely on it, and many of its related tools, every day.

    For various Good Reasons we run CentOS and Fedora on all our servers here at Shopwiki, and the software provisioning system for all these servers is therefore RPM. We aim to package all our own software using RPM, so that it can be installed, uninstalled, configured, and managed, scaling linearly, all using existing tools, in a way that integrates smoothly into the rest of the system software. When all (or nearly all) of the software in a system is similarly packaged, dependencies between software packages can also be managed.

    As an analogy, imagine a C programmer who needs to copy a string. He writes:

    while (*p++ = *q++);
    

    As he continues to write his program, he finds several other places where he needs to copy a string, and each time keys in the above line of code. Call this “strategy A”. Eventually all the typing wears him out, but he discovers a copy and paste function in his editor that allows him to save on keystrokes. Call this “strategy B”. He then discovers that his code has a bug: it leaves p and q pointing outside the end of the string. He would like to replace all instances of the above code with:

    while (*p = *q) {
      p++;
      q++;
    }
    

    He is now faced with having to devise a regular expression replacement command that will find all the string copy while loops. That’s bound to miss a few, so he spends some time debugging. Planning ahead for the next change to string copying, he decides to do all of his string copying work by putting the above while loop into a function.

    string_copy(p, q);
    

    Call this “strategy C”. This has several pleasant consequences. He can now count how much time his program spends copying strings. If he fixes a bug in string copying, or discovers a faster way to copy strings, he only need make the change in one place in his source code. The while loops do not make the intention of the code as clear as the string_copy() function invocation. The string_copy function can even be stored in a shared library, so that only one copy of the function need exist in the entire system, yet it can be used by all running programs without penalty.
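For concreteness, the function behind that call could be as small as the sketch below. It is written to compile as C++ but sticks to the C subset the analogy is about; the signature is an assumption, since the only thing given above is the call string_copy(p, q). Because the pointers are passed by value, the caller's p and q no longer move at all.

```cpp
#include <cassert>

// Strategy C: the fixed loop now lives in exactly one place.
// Copies the NUL-terminated string at q into the buffer at p.
// The caller must guarantee that p has room for it, terminator included.
void string_copy(char* p, const char* q) {
    while ((*p = *q) != '\0') {
        p++;
        q++;
    }
}
```

Every fix or speed-up to string copying from here on is a change to this one function body.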

    Finally, in “strategy D”, the programmer discovers that there is already a string copying function in the standard library, called strcpy, which runs four times faster than his own. He further discovers that calling strcpy is almost always a bug and a gaping security hole: that he should instead be writing:

    assert(p && q && n > 0);
    strncpy(p, q, n);
    p[n - 1] = '\0';
    

    This leads him back to his own string_copy function, but this time as a wrapper around strncpy. There are cases where C programs must do work that does not already have a function defined for it in some library. So the best strategy is a combination of C and D, generally with as much D as possible.
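As a sketch, that wrapper is just the three lines above with a name on them. The extra size parameter n is an assumption of this example (the capacity of the destination buffer in bytes), and the code is again written to compile as C++ while staying in C territory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Strategy C wrapped around strategy D.
// Copies at most n - 1 characters from q into the n-byte buffer at p and
// always NUL-terminates the result, truncating the string if it is too long.
void string_copy(char* p, const char* q, std::size_t n) {
    assert(p && q && n > 0);
    std::strncpy(p, q, n);
    p[n - 1] = '\0';
}
```

A call such as string_copy(buf, input, sizeof buf) then silently truncates an over-long input instead of writing past the end of buf.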

    All of this should appear obvious in the context of C programming; creating abstractions like functions is a fundamental part of programming (albeit one which gets rediscovered and promoted under a new name every four or five years). These abstraction strategies are for some reason less obvious in the context of running a CentOS or Fedora box.

    Say you have 100 servers and you need to update your web application code on all of them. You could ssh in to each server and copy over files. This would be like “strategy A”. You could decide that typing your password 100 times is tedious, so use a shell script to automate it. This would be like “strategy B”. You could make an image of one server’s storage with the software installed, and copy that image onto the other 99 machines. This would still be like “strategy B”, but with the added hassle of having to write a script to impress the proper identity onto a newly imaged machine. Also, you risk combinatorial explosion if you need images that do more than one thing.

    Just as the rudimentary abstraction mechanism for C programs is writing a new function, the rudimentary abstraction mechanism for administering CentOS and Fedora boxen is making an RPM package and adding it to a repository — this is “strategy C”. Servers can either pull changes from a repository on a regular basis, or set a flag in some monitoring system asking for attention (perhaps for more risky changes). It too has several pleasant consequences. You can find out if all the prerequisites for a given piece of software are also installed, and if they aren’t, you can automatically install them. You can uninstall software. You can find out if software on a given system is up to date with respect to a repository. Changes can be disciplined and need only be made in one place.

    Sadly, it also has unpleasant consequences. Writing an RPM spec file is not nearly as straightforward as writing a C function. The syntax of RPM spec files is flamingly awful, and fails at its few meek attempts to successfully abstract away things like “how do we build this from source?” and “which files are the build targets?” into a domain-specific language. You need to change the spec file nearly every time the source undergoes significant change. There are also parts of the system that are vestigial, or otherwise disturbing, such as the inline change log, the — let’s be generous and call it “hideous” — macro syntax, or the fact that the default work directory for the build tools is /usr/src/redhat on CentOS.

    You can often find the software you need packaged as part of the distribution. In some cases though, you need it with a different compile option set, or you need a more recent version than what is supplied with your distribution, so you repackage it and put it in your own repository. Your repository overlays the regular distribution’s repository. This is “strategy D”. As with the C programmer, the Right Thing is a combination of strategy C and D, with as much D as possible.

    There are software provisioning systems in play other than RPM, even without going to other distributions like Debian or Arch. For example, developers here install Python packages on their workstations using setuptools, and there is a strong tendency to install those packages using setuptools in production, as well. Setuptools does essentially the same thing as RPM does; it allows you to install Python programs and libraries, track their dependencies, and update them against a repository. But there are drawbacks to using it. For example, you cannot express a dependency on a particular version of the gnutls command line tools. Everything else uses RPM already; you will never generate enough benefit from using something different to overcome the benefits of having everything as RPMs.

    I feel a syntax rant coming on.

    — Joe

    April 12, 2010

    Our MongoDB Rig

    We use MongoDB at ShopWiki in numerous production roles, as well as in many of our internal tools and one-offs.

    Our front end uses it for logging all our traffic and analytics data, as well as a source for driving all the dynamic components that are /not/ search results on our site, like our various hierarchical site indices. On that account, we do not need to use memcached or similar solutions when our pages rely on the data store for render-blocking information.

    Because it takes so little effort to get started and its document-based model is so flexible, we have turned to Mongo, ever since we first got our feet wet with it, as both intermediate and persistent storage for internal tools, prototypes, and one-off projects.

    Since we’ve come to rely so heavily on MongoDB, we will talk about it in posts to come, about its benefits and pitfalls, and our various ways of utilizing it.

    —Avery
