Tuesday, March 18, 2014

How we applied security updates to 16,304 running daemons.



Last week was exceptionally busy for our operations team.  If you didn't notice, that's a sign of a job well done.


At VM Farms, we set ourselves apart from other providers by offering our customers a truly fully managed service.  This entails constant proactive management and maintenance in addition to our reactive role as a support group.

When your site has endured zero downtime, your application has remained performant, and your IT backend has stayed secure, it is because our team of system engineers has been hard at work.  Monitoring, patching, managing resources and technologies - we work continuously to ensure that our customers' applications and environments are always performing at their very best.

The last couple of weeks proved to be a truly exhausting exercise in proactive maintenance and management.  So much so that we decided to write a blog post to give our customers some additional insight into some of the operations activities we undertake on their behalf.

It all started on March 4th, when our monitoring systems notified us of a new CVE entry for the libgnutls library, a library that a large number of our customers rely on heavily.

For the uninformed, CVE stands for Common Vulnerabilities and Exposures, and is part of a public database of vulnerabilities and security bugs in common software such as Linux, Windows, OS X, and other operating systems.  The entire industry pays close attention to these announcements for actionable security concerns.

Each new vulnerability or incident is assigned a number, and makes mention of any affected package names and versions.  Upon receiving these alerts, our system automatically compares these reports with each of our customers' profiles of installed software.  This allows us to quickly identify those customers that are affected. 
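The matching step can be sketched in a few lines of Python. To be clear, the names and data shapes here are hypothetical illustrations, not our actual tooling:

```python
# Hypothetical sketch: flag customers whose installed-package inventory
# overlaps the packages named in a new advisory.
def affected_customers(advisory_packages, inventories):
    """advisory_packages: set of package names from the advisory.
    inventories: mapping of customer name -> set of installed packages."""
    return sorted(
        customer
        for customer, installed in inventories.items()
        if advisory_packages & installed
    )

hits = affected_customers(
    {"gnutls", "gnutls-utils"},
    {"acme": {"gnutls", "nginx"}, "widgetco": {"openssl", "httpd"}},
)
```

The real system obviously also tracks package versions against the advisory's fixed-in versions; the set intersection above just shows the shape of the lookup.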

For example, let's investigate CESA-2014:0246, which says:

lib/x509/verify.c in GnuTLS before 3.1.22 and 3.2.x before 3.2.12 does not properly handle unspecified errors when verifying X.509 certificates from SSL servers, which allows man-in-the-middle attackers to spoof servers via a crafted certificate.

Most Linux professionals will recognize GnuTLS as a widely used software library, and this vulnerability immediately raised eyebrows around the world.  Right away, we were able to determine the scope of the problem based on the actual software patch...



The trouble with upgrading important libraries is that they often require service restarts.  Uptime is our utmost priority, so we notified our customers that same week.  This gave them reasonable time to respond in case they had operations-scheduling conflicts that would prevent us from applying these patches.

The other problem with upgrading a widely used library is that, because so many services reference the library, they must all be restarted as well: in the right order, and only once.

The lsof command proved invaluable as we quickly made an inventory of exactly which daemons would need to be restarted.  We fixed non-production environments first, and left the most complex and exceptional configurations on production VMs for last.
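The core check looks something like this (a sketch; the exact lsof flags and output columns vary by distribution, and the sample line below is made up):

```shell
# On a live host, something like this lists daemons still mapping the
# replaced library (lsof marks unlinked-but-still-mapped files as DEL/deleted):
#   lsof -n | grep libgnutls | grep -e DEL -e deleted
# Filtering a captured sample line shows the idea:
printf 'nginx 1234 root DEL REG 8,1 /usr/lib64/libgnutls.so.26\n' |
  awk '/libgnutls/ && $4 == "DEL" {print $1, $2}'
```

Any daemon that shows up is still running against the old, vulnerable copy of the library in memory, even though the file on disk has been upgraded, which is exactly why the restarts matter.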

Because we wanted to be sure we considered every service and dependency that needed to be restarted, we used a suite of custom automation tools that augmented our SSH sessions.  This helped Kris Kostecky and me perform the upgrades and verify each of them, one by one, as a team.

When dealing with so many different customers using diverse stacks, you are bound to encounter edge cases in maintaining certain infrastructures.  Looking back, we are glad we took the time to care for each VM during this upgrade.  Each time we did encounter a gotcha, we had the ability to stop the process and take our time to properly address each issue. 

We are also glad that we allocated a team of two to work through this entire process.  This allowed us to make upgrades twice as fast, and provided additional support on the few incidents where a daemon did not restart cleanly, or a package download ran too slowly.

In total we restarted 16,304 running daemons for our customers in the span of one week, scrutinizing each and every one.

Security is a never-ending, burdensome, and critical concern to all of our operations staff.  At VM Farms we have the tools and the team needed to remain constantly vigilant.  If you'd like to know more about how we can take a load off your plate as a Systems Administrator or Developer, check out our website, or give us a call at 1-866-278-0021.

Follow @vmfarms and @ian_vmfarms on Twitter.

Wednesday, December 18, 2013

Inject a little security into your CentOS repositories

There are many aspects to securing a network and many articles, essays, and books have been written on the topic. One aspect of any security checklist is updating vulnerable system packages. Every operations person deals with this and there are many tools at your disposal to make this job more manageable.

If you're using CentOS, you can leverage Red Hat's Spacewalk project. However, this will do more than just track errata. Spacewalk will also inventory your hardware and software, install software, provision servers and take care of some monitoring. If that suits your use-case, then Spacewalk could be the option for you.

Steve Meier of the CEFS project has made the process for tracking CentOS errata via Spacewalk very easy and free. He provides a parsed errata.xml file generated from the centos-announce mailing lists and the scripts you need to import them into your Spacewalk server.


However, not everyone wants to run Spacewalk. There are many reasons this may be the case. If you are one of these people, you're left with tracking the centos-announce mailing list using your own processes.

We'd like to present another option. What if we want to leverage the power of yum to tell us when a package needs to be updated? We can do this by installing the yum-plugin-security package. You're now one step closer, but the CentOS repositories do not come with an updateinfo.xml file that includes the relevant data that the plugin uses.

This is where we got the idea to leverage the CEFS project data and utilize the functionality of the updateinfo.xml file. All we needed to do was convert the errata.xml data into the appropriate updateinfo.xml format and inject it into the applicable CentOS repositories.

VM Farms would like to announce the public release of a utility to allow anyone to generate the updateinfo.xml errata files for insertion into their CentOS repositories. Please visit our public repository to download a copy and start scanning.

Usage

The following example illustrates how you would go about using this for a CentOS 6 repo. The assumption is that you've set BUILD_PREFIX=/security and that your CentOS-6-Updates directory lives under /repositories/.
wget -q -N -P/security http://cefs.steve-meier.de/errata.latest.xml

generate_updateinfo.py /security/errata.latest.xml

/usr/bin/modifyrepo /security/updateinfo-6/updateinfo.xml /repositories/CentOS-6-Updates/repodata
Now that your repos have the data they need, you can install the yum-plugin-security package and make use of it like so:
yum install yum-plugin-security

yum list-security

Loaded plugins: changelog, fastestmirror, security
Loading mirror speeds from cached hostfile
CentOS-6-OS                                                 | 1.2 kB     00:00
CentOS-6-Updates                                            | 1.2 kB     00:00

CESA_2013__1764        security    ruby-1.8.7.352-13.el6.x86_64
CESA_2013__1764        security    ruby-irb-1.8.7.352-13.el6.x86_64
CESA_2013__1764        security    ruby-libs-1.8.7.352-13.el6.x86_64
CESA_2013__1764        security    ruby-rdoc-1.8.7.352-13.el6.x86_64
CESA_2013__1806        security    samba-client-3.6.9-167.el6_5.x86_64
CESA_2013__1806        security    samba-common-3.6.9-167.el6_5.x86_64
CESA_2013__1806        security    samba-winbind-3.6.9-167.el6_5.x86_64
CESA_2013__1806        security    samba-winbind-clients-3.6.9-167.el6_5.x86_64

Friday, July 5, 2013

How to Make Waffle Compatible with Django 1.5's Custom User Models

Waffle is a handy add-on for Django that allows fine-grained control over feature sets within an application.  At VM Farms, we use Waffle to control which users have access to specific features that are under development, but deployed within our live production environment.  Like I said, it’s handy!


Check out Waffle here.


Incidentally, our new portal is built on Django 1.5, and one of its new features is the ability to have fully customizable user models in django.contrib.auth.  Sweet!  This is great for us, as we are now able to move away from the static user model that existed within Django previously; however, despite Waffle’s best efforts to stay compatible, defining custom users can break Waffle.  Boo.


Read about the issue on Github here.


Until a fix is implemented, here is a breakdown of the workaround that we used to keep us moving forward.


The Heart of The Issue

Waffle makes the following four assumptions about user models:
  1. user.is_authenticated() method exists
  2. user.is_staff property exists
  3. user.is_superuser property exists
  4. user.groups.all() returns all Group objects attached to the user - making the assumption that you are using django.contrib.auth.models.Group and linking it to the user model as a many-to-many field

This clashes with Django 1.5’s minimum requirements for a custom user model (found here):

  1. have an integer primary key
  2. have a unique field for identification purposes (a “username”)
  3. provide a way to address the user in ‘short’ and ‘long’ forms

Some of Waffle’s requirements are satisfied if you go through the effort of making the user model compatible with django.contrib.admin, or by having the model inherit from AbstractBaseUser. However, unless your model satisfies all four of Waffle’s assumptions, it will break. Luckily, there are easy ways to satisfy all requirements without having to run to your database to alter tables.



Faking is_authenticated()

It’s easy enough to fake results for is_authenticated():
class CustomUser(models.Model):
    ... 
    def is_authenticated(self):
        return True


Of course, if you actually plan on making use of is_authenticated() as a meaningful check, you’ll have to do more than just hard code a value to be returned.  Otherwise, it’s likely that you’re not interested in checking for “authenticated users only”, and just want it to work without throwing a 500.


Faking is_staff and is_superuser Model Fields

class CustomUser(models.Model):
    ... 
    @property
    def is_staff(self):
        # All admins are staff
        return self.is_admin

    @property
    def is_superuser(self):
        # All admins are superusers
        return self.is_admin


In this example, we already make use of an is_admin boolean field to track admins, so we return that value in place of is_superuser and is_staff. You can just as easily hardcode it to return True/False if the field means nothing, and you just want Waffle to work.


Faking the Groups Many-to-Many Field

When it comes to the groups many-to-many field, we don’t actually need to fake a field - we just need to somehow make the call to [usermodel].groups.all() work.

We start by defining a fake property the same way we did above for is_superuser and is_staff.  However, this time we make it return a dummy object with an all() method under it:

class CustomUser(models.Model):
    ... 
    # Django Groups workaround for waffle
    @property
    def groups(self):
        class Groups():
            def all(self):
                return None
        return Groups()


That way, user.groups.all() is now technically a perfectly valid command, and Waffle doesn’t complain.
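Stripped of Django, the trick is plain duck typing: Waffle only ever calls attributes and methods by name, so any object with the right names will do. A standalone illustration (not the real model, just the shape of it):

```python
# Minimal stand-ins showing why the workaround satisfies Waffle's checks:
# the attributes only need to exist and answer with the right shape.
class Groups:
    def all(self):
        return None  # enough to make user.groups.all() a valid call


class FakeUser:
    is_staff = False
    is_superuser = False

    def is_authenticated(self):
        return True

    @property
    def groups(self):
        return Groups()


user = FakeUser()
```

All four of the assumptions listed earlier hold on `user`, even though there isn't a database table in sight.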

Happy Waffling!

Friday, June 28, 2013

DNS vs DDoS - When 3rd Party Externalities Affect Our Service

For the past week or so, Zerigo, our managed DNS provider, has suffered a number of DDoS attacks against its network, thereby disrupting name servers and disabling service for many of our customers.

Although we at VM Farms have designed and built our environment to deliver an uptime of 99.99%  (or better), we realize that issues affecting 3rd parties will still creep into our world and trickle down to affect our customers as well.  Not cool.

In continuing with our militant approach to maintaining a fault-tolerant infrastructure, our operations team sprang into action to move customer DNS records over to our secondary DNS provider, Route53.  All that remained was for customers to update their name server information with their registrar and access would be restored.  Piece of cake, right?  Well, not quite.

The Problem

To achieve this task, our team set out to use Zerigo's API to retrieve the zone files for each of our customers, convert them, and push them up to Amazon's Route53 service using their respective API.  The problem that our team quickly encountered was that Zerigo's API was accessed through a domain name, managed through Zerigo's DNS service, whose name servers were down.

"@#$%!" - Q*Bert

The Solution

Upon hitting this catch-22, an internet-wide scavenger hunt ensued to find a cached name server that still had the IP address information for Zerigo's API.  Searching high and low (and stopping only to look at amusing pictures of cats), a cached name server was finally located in what we can only imagine was deep under a mountain in Colorado.  Success!  With the IP information, our team was able to access the API and create an ad hoc script that would convert and move all zone files into Route53.  Access could now be restored.
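The conversion itself is mostly reshaping records. A simplified sketch of that step (the input field names are hypothetical, and the output follows the general shape of Route53's ChangeResourceRecordSets payload, not our actual script):

```python
# Hypothetical sketch: reshape simple (name, type, ttl, value) records
# into a Route53 change batch. The real script also had to handle
# multi-value record sets, zone apexes, and other edge cases.
def to_route53_changes(zone_records):
    return {
        "Changes": [
            {
                "Action": "CREATE",
                "ResourceRecordSet": {
                    "Name": rec["name"],
                    "Type": rec["type"],
                    "TTL": rec["ttl"],
                    "ResourceRecords": [{"Value": rec["value"]}],
                },
            }
            for rec in zone_records
        ]
    }


batch = to_route53_changes(
    [{"name": "www.example.com.", "type": "A", "ttl": 300, "value": "192.0.2.10"}]
)
```

Once each zone was expressed as a batch like this, pushing it through the Route53 API per hosted zone was the easy part.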

The Impact

A funny postscript to the anecdote I just told.  As we were drafting instructions for our customers to update their name servers with their respective registrars, Zerigo announced that the DDoS attack had been mitigated, and that service had been mostly restored.  As awesome as that was for everyone, we were still on our high - we wanted to be heroes.

But it wasn't all for naught.  Although we set out to work around a 3rd party outage on behalf of our customers, what we ended up with was a redundant DNS system that is now built into our platform, and an overall improvement to our fault tolerance as a managed service provider.

With that said, we are pleased to announce that the DNS management service included with our service is now redundant across multiple providers, at no additional cost to our customers!

Wednesday, June 26, 2013

We’re Growing!

Well, it’s official.  It’s been one year since our last blog post, and almost 20 months since the one before that.  Needless to say, although we absolutely love the idea of keeping our community informed of new and exciting developments with our company and within our industry, I guess you can say that we’ve been too focused on building our first-class service to really keep up the discipline.


With that said, the last year has been quite a ride for us at VM Farms.  In 12 months, we’re pleased to announce that:


  • Our team has responded to nearly 10,000 requests;
  • We are now trending on approximately 250,000 unique points of data for our customers;
  • Rudy moved back to Ontario - although he still lives over an hour away (read the latest update on our modifications to Rudebot here);
  • We launched a public beta for our self-serve cloud storage solution, Nibbler;
  • Kris had a baby;
  • We’re nearing completion of a major user interface update (expect news on that soon); and finally,
  • We doubled our customer base!


On that last point, we’re pleased to announce 4 new additions to the VM Farms team.  Please join us in welcoming Larry, Christian, Jason, and Aman.


Larry and Christian are two incredibly talented Systems Operators.  They come to us from Yesup and Route1 respectively.
Jason is an incredibly talented Python developer.  He comes to us from our friends (and neighbors) at Shopcaster.


Aman is our tech-savvy sales and marketing guy.  He comes to us from Quartet Service.

We’re very excited to have these new hires join our team.  They are a testament to the success we have enjoyed this past year.  We look forward to continuing to develop first-class products, and providing exceptional service to our customers in the years ahead.
We are also making a resolution to be more active with our blog and social feeds.  If you haven’t already, please follow us on LinkedIn and on Twitter!

Sincerely,

The VM Farms Team

Tuesday, June 26, 2012

New Relic Launches App Speed Index and Custom Dashboards


This is a guest blog post written and contributed by Bill Hodak, Director of Product Marketing at New Relic, an application performance management vendor and VM Farms partner.


New Relic is announcing the availability of two awesome new features. Thanks to our partnership, our customers have immediate access to these new features. When you login or sign up today, you’ll get one or both of these features.

New Relic and VM Farms have partnered to make New Relic Standard available to all VM Farms customers free of charge. If you’re not yet a customer, sign up today! All accounts start with 14 days of Pro, for free.


App Speed Index

Think your app is fast? Stop guessing and start knowing with the App Speed Index. The App Speed Index leverages our Big Data to provide Big Insight to our customers. New Relic collects over 55 billion performance metrics and monitors 1.5 billion page loads on behalf of our 25,000 customers and their 450,000 application instances. All of that data equates to 3.5 terabytes of new data collected, stored and analyzed each day.



With the App Speed Index, our customers will be able to classify their application into a Peer Group of similar applications (ex. eCommerce, SaaS, Gaming, and Consumer Internet applications) and benchmark their app with industry peers. Find out your percentile rank within your peer group for end user and application response times, error rates, and application availability to find out how fast you really are.

Learn more about the App Speed Index here, or check out this blog post. And don’t forget to check out our living infographic, updated daily to show how the peer groups rank by performance and availability. It even lists the fastest applications monitored by New Relic!

Custom Dashboards

Have you ever wanted to see Network I/O graphs and End User Response Time graphs on the same dashboard?  What about some custom business metrics and application response time? Now you can with Custom Dashboards. With Custom Dashboards you can build any dashboard with any data that tickles your fancy. The best part about it? No Coding Required!  With Custom Dashboards all you have to do is click and pick, drag and drop, or instant copy an existing New Relic graph and boom — you’ve got a Custom Dashboard. This feature is only available to our Pro customers. So if you’re not currently a Pro customer, sign up or upgrade today to get access to Custom Dashboards. Learn more about Custom Dashboards by reading this blog post.



Monday, October 31, 2011

How we solved the remote employee problem for less than $100 (and had a bit of fun)

UPDATE: We've given Rudy some serious upgrades and he's now mobile. Check out the new post here: http://blog.nibbler.io/2013/01/rudebot-rolling-ubiquitous-display.html

Problem: Running a startup and coordinating with a colleague who’s 4,000 km away. How do you remain nimble, keep the pace of ideas flowing, and maintain close working relationships without the overhead of initiating a conversation at a moment’s notice?




Teleworking is as common in the tech sector as the problems that accompany it. The physical separation and long periods of time between contact often result in significant wasted time: everything from warming up, to repeating explanations, to re-synchronizing efforts. A large organization may be able to navigate these issues by structuring their teams so that components of a project can be handled in multiple locations with well-defined touch points and time periods, but this can be anywhere from frustrating to unworkable for a startup.

The theory was that the next best thing to having our colleague, Rudy, in our office, was having an open video line to him all day, effectively putting him in the office. We needed to test this idea to see if it had any merit. We configured a spare laptop that was lying around, initiated a video chat, and placed it on a desk for a week.

The change in productivity was obvious to us even before the end of the first day. From our observation it was apparent that our conversational flow was no longer stilted or deflated by having to initiate a conference on the fly. More importantly, there were times where inertia or laziness had prevented that call from being made. This was no longer an issue. Duplicate meetings weren’t happening anymore. Most crucially, Rudy was always a part of on-the-fly decision making, rather than simply being informed on the occasions when decisions had to be made quickly. Ultimately, this open channel improved the sociability of our workforce. It’s a fact that your team is going to work better if there is a good bond between them. That’s not really an issue when you sit a few feet away from them, but much more challenging when communication is staggered, scheduled and strictly task oriented.

We let the experiment run to an end and were convinced we had our solution. We even had a few ideas on how to improve the experience. The biggest issue was his restricted field of view. We were constantly having to roll in front of the camera to have an interaction. We wanted to give Rudy the ability to rotate the laptop to whomever he was chatting with. This problem lent itself to a fairly simple solution that we thought we could put together rather quickly with an Arduino, a servo motor, a rotating platform, and a simple server that bridged communication between a serial port and TCP/IP port.


Tools List
  • Drill
  • 7/8" Drill Bit
  • 3/4" Hole Saw
  • Chisel
  • Scroll/Jig/Table Saw
  • Screwdriver
  • Square
  • Pencil
  • Vernier Caliper

Materials List
  • USB Cable (for powering Arduino+Servo)
  • 2'x2' 1/2" MDF
  • Arduino UNO
  • Hobby Servo Motor (with a stall torque of 5.2kg*cm or more @ 5v)
  • Plastic Circular Arm (included with servo or purchased separately)
  • 3x Jumper Wires
  • 1" piece of solid wood
  • Ball Bearing Swivel (sourced from Home Depot)
  • Wood Glue
  • Finishing Nails
  • #6-3/4" Wood Screws


Method

  1. Use whatever carpentry skills you have to build a three-sided box using 1/2" MDF. This will be the base for the rotating platform. The box should be dimensioned so that its width and length match those of the laptop you will be using. Wood glue and finishing nails will do the trick.
  2. Locate the centre of the base and drill a 7/8" hole through it.
  3. Take your swivel and align it on the base so it is centered. Fasten it to the base with the screws provided.
  4. Rotate the swivel 45 degrees and mark on one side where the screw hole for the platform aligns with the base. You are doing this so you can drill a hole there and gain access from underneath to attach the platform.
  5. Drill the second hole.
  6. Cut out a new piece of MDF that matches the dimensions of the base's length and width.
  7. Attach the swivel back to the base with the 4 wood screws.
  8. Align the newly cut platform on the swivel with the base. Flip the project over and use the secondary hole to fasten the swivel to the platform.
  9. Test your work. Make sure that the swivel rotates freely and that the platform aligns with the base when everything is square.
  10. Fastening the servo to the top platform through the base is a little tricky. If you're interested in how it was done, please get in touch with us and we'd be more than happy to explain it.

11. Plug your Arduino into your PC and upload the following code:

#include <Servo.h> 

#define MAX_CHAR 4

Servo servo;                

char angle[MAX_CHAR];
char* p = angle;
int alen = 0;

int myatoi(const char *s, int *value)
{
    if ( s != NULL && *s != '\0' && value != NULL )
    {
        char *endptr = (char*)s;
        *value = (int)strtol(s, &endptr, 10);
        if ( *endptr == '\0' )
            return 1;
    }
    return 0; /* failed to convert string to integer */
}

void setup() 
{ 
  Serial.begin(115200);
  servo.attach(9);
  memset(angle, '\0', MAX_CHAR);
}

void loop() 
{ 
  if (Serial.available() > 0)
  {
    int c;  // Serial.read() returns an int (-1 when no data is available)
    
    while ((c = Serial.read()) >= 0) 
    {  
      if ( c == '\r' || c == '\n' )
      {
        int incr = 1;
        int newpos;
        int pos = servo.read();

        // Check to make sure the input is all ints      
        if (!myatoi(angle, &newpos))
          Serial.println("Error: Bad input");
        else if (newpos >= 0 && newpos <=180)
        {
          // Seal off the string
          *p = '\0';
          Serial.println("Input angle: " + String(angle));
          if ((newpos - pos) > 0)
            incr = 1;
          else if ((newpos - pos) < 0)
            incr = -1;
            
          for(int ipos = servo.read(); ipos != newpos; ipos += incr)
          {
            servo.write(ipos);
            delay(30);
          }
        }
        else
          Serial.println("Error: Bad angle");
      }
      else
      {          
        if (alen == MAX_CHAR-1)
          Serial.println("Error: Angle more than 3 bytes");
        else
        {
          *p++ = c;
          alen++;
          continue;
        }
      }
        
      // Reset our string
      p = angle;
      memset(angle, '\0', MAX_CHAR);
      alen = 0;
        
      // Chuck anything after a newline/cr
      Serial.flush();
      Serial.println("OK: Good to go");
    }
  }
}

12. Now attach your servo's signal pin to the Arduino board using digital IO pin #9. Jumper the V+ pin on the servo to the 5v pin on the Arduino. Lastly, connect the servo's GND pin to one of the Arduino's GND pins. You can now use a serial communications program like minicom to connect to your Arduino and issue angles as commands. As you can see in the code above, the baud rate is set to 115200, so make sure you adjust your serial parameters to match.
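Host-side, the wire protocol is just ASCII digits terminated by CR/LF. A small Python helper that pre-validates a command the same way the firmware does (a sketch added for illustration, not part of the original project):

```python
def parse_angle(command):
    """Mirror the firmware's input checks: at most 3 characters,
    all digits, value between 0 and 180 inclusive.
    Returns the angle as an int, or None for invalid input."""
    body = command.rstrip("\r\n")
    if not (1 <= len(body) <= 3) or not body.isdigit():
        return None  # firmware would answer "Error: Bad input"
    value = int(body)
    if value > 180:
        return None  # firmware would answer "Error: Bad angle"
    return value
```

Validating on the host side like this avoids round-tripping bad commands over the serial link just to get an error string back.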


13. The last step to make this remotely accessible is to use a Python script that bridges a serial port to a TCP/IP socket. You can download that here. Executing it with the arguments below will most likely work for you. Replace XXXX with the path to your USB serial device and YYYY with the port you want the server to listen on.
./tcp_serial_redirect.py -b 115200 -p /dev/XXXX -P YYYY
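Once the redirect is running, any TCP client can drive the platform by sending an angle and a newline. The sketch below stands in a tiny local server for the bridge (the real one forwards the bytes to the serial port; this stand-in just answers the way the firmware would), so the round trip can be shown without the hardware:

```python
import socket
import threading

# Local stand-in for the serial bridge, so the example runs without hardware:
# it accepts one connection and replies like the firmware for a digit command.
def fake_bridge(server):
    conn, _ = server.accept()
    data = conn.recv(16)
    if data.strip().isdigit():
        conn.sendall(b"OK: Good to go\n")
    else:
        conn.sendall(b"Error: Bad input\n")
    conn.close()


server = socket.socket()
server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=fake_bridge, args=(server,), daemon=True).start()

# The client side is all you would need against the real redirect:
client = socket.create_connection(("127.0.0.1", port))
client.sendall(b"90\n")  # rotate the platform to 90 degrees
reply = client.recv(64).decode()
client.close()
```

Against the real setup you would simply point `create_connection` at the machine running tcp_serial_redirect.py on the YYYY port you chose.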