OpenVZ resource limits can be a little tricky to understand, even after you read through the documentation on the OpenVZ website. With this post, I hope to explain the /proc/user_beancounters file and to illustrate memory guarantees and limits using an example 256 MB container.

Before attempting to understand OpenVZ’s memory limits, let’s recap Linux’s memory setup. This background information is important to firmly grasp OpenVZ’s behavior. You can safely skip this section if you already understand Linux’s memory overcommit accounting.

Linux Overcommit Accounting

Memory in Linux refers to the combined RAM + swap values. Memory is allocated from this pool using one of the *alloc() functions. If malloc() is used to allocate memory, the pages are not zeroed out, and they are not backed by physical memory until the process actually touches them. This has an important consequence: the kernel can grant more malloc()’d memory than is available in RAM + swap. This behavior of allocating more than the total available memory is called overcommitting memory.

Seems like a recipe for disaster, huh? Imagine: processes are humming along allocating memory and BAM! Some process wants to use the (n+1)th byte of memory, and an out of memory (OOM) condition provokes the kernel to start killing processes.

Why does the kernel allow memory to be overcommitted? Because most applications do not actually consume all of the malloc()’d memory. If an app doesn’t use a memory location, it does not really need to subtract from your total available memory.

Consider a typical Apache process. A full 20-30% of an Apache process’s malloc()’d memory may not be consumed. With memory overcommit, this slice of memory can be used by other applications. Java applications also allocate much more than they use, as do many scientific applications.

At the system-wide level, these “malloc’d(), but not consumed” slices of memory begin to add up. By overcommitting memory, these slices don’t take up physical memory locations — resulting in more efficient use of memory resources.
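To see how far a given machine has overcommitted, you can compare the Committed_AS field of /proc/meminfo against RAM + swap (the overcommit policy itself lives in /proc/sys/vm/overcommit_memory). Here is a minimal sketch of that comparison. The function names and the sample values are made up for illustration; on a real Linux box you would feed it the contents of /proc/meminfo instead.

```python
import re

def parse_meminfo(text):
    """Return a dict of /proc/meminfo-style fields (values are mostly in kB)."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^(\w+):\s+(\d+)", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields

def overcommit_ratio(fields):
    """Committed address space as a fraction of RAM + swap."""
    pool_kb = fields["MemTotal"] + fields["SwapTotal"]
    return fields["Committed_AS"] / pool_kb

# Made-up sample; on a real system use open("/proc/meminfo").read() instead.
SAMPLE = """\
MemTotal:        1048576 kB
SwapTotal:        524288 kB
Committed_AS:    2359296 kB
"""

info = parse_meminfo(SAMPLE)
# A ratio above 1.0 means more memory has been promised than exists.
print(overcommit_ratio(info))  # 1.5
```

A ratio above 1.0 is not a problem by itself. It only becomes one when processes actually touch the memory they were promised.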

OpenVZ and memory overcommit

OpenVZ also has the ability to overcommit memory. When creating an OpenVZ container, we specify the vmguarpages and oomguarpages resource guarantees. vmguarpages represents the maximum guaranteed amount of malloc()’d memory a container may have, while oomguarpages represents the maximum guaranteed amount of consumed memory. These are called guarantees because, if your server has enough RAM + swap, container processes will never be killed due to OOM while they stay within these limits.

What if your container exceeds the vmguarpages or oomguarpages value? This is where a new limit comes in: privvmpages. privvmpages represents the absolute upper limit on container memory. When you run `free` or `cat /proc/meminfo` in a container, you see this privvmpages value take form as “Total memory”.

The memory gap between privvmpages and the two resource guarantees (vmguarpages and oomguarpages) is not safe to use on an ongoing basis if the sum of all container privvmpages exceeds the RAM + swap of the hardware node. This is because if the hardware node runs out of memory, it begins looking for containers using more than their guaranteed memory. The kernel kills a process from the container whose physpages exceeds oomguarpages the most.
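The selection rule described above can be sketched in a few lines. This is a simplified model of that one sentence, not the kernel’s actual accounting, which takes more into account. The 4 KB page size, the function names, and the three containers are all assumptions for illustration.

```python
PAGE_SIZE_KB = 4  # assumption: 4 KB pages, as on x86

def mb_to_pages(megabytes):
    """Convert megabytes to a page count, the unit the *pages parameters use."""
    return megabytes * 1024 // PAGE_SIZE_KB

def pick_oom_victim(containers):
    """Return the CTID whose physpages exceed oomguarpages by the most.

    Returns None when every container is within its guarantee.
    """
    worst_ctid, worst_excess = None, 0
    for ctid, usage in containers.items():
        excess = usage["physpages"] - usage["oomguarpages"]
        if excess > worst_excess:
            worst_ctid, worst_excess = ctid, excess
    return worst_ctid

# Hypothetical containers, each guaranteed 256 MB.
containers = {
    1001: {"physpages": mb_to_pages(200), "oomguarpages": mb_to_pages(256)},
    1002: {"physpages": mb_to_pages(300), "oomguarpages": mb_to_pages(256)},
    1003: {"physpages": mb_to_pages(280), "oomguarpages": mb_to_pages(256)},
}

print(mb_to_pages(256))             # 65536
print(pick_oom_victim(containers))  # 1002
```

Container 1001 is safe in this model because it stays inside its guarantee; 1002 is picked over 1003 because it overshoots by more.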

I’ve created a diagram that hopefully illustrates the relationship between these three key memory limits/guarantees. Also included in the diagram is kmemsize, which is used for non-swappable kernel memory–not very important as long as it scales with privvmpages.

[Figure: OpenVZ memory constraints]


One of the best uses of virtualization technology is for creating development environments. Virtual private servers (VPS) are ideal if you need a sandbox to create a new application stack, develop a new web app, or test out the app under a variety of Q/A scenarios.

How can we offer dev environments with low provisioning cost? This is one way:

  • Set up OpenVZ on a RHEL5 server.
  • Set up a web template (Apache, PHP5).
  • Set up a MySQL template
  • For each dev environment, deploy a web and MySQL container.

This four-step process glosses over many important challenges. For instance:

  • How do you access these machines over the network?
  • How to get the latest code into each deployed web container?
  • How to get the latest MySQL schema & associated dev data into the MySQL container?
  • How to get code and schema changes out of the container and into another environment (e.g.: deploying to production)?
  • How to upgrade/restart the VPS without having to coordinate with each developer?

I’m thinking we can overcome these challenges with some crafty configuration & scripts.

Network access

NAT all VPS servers for more fluid provisioning and fewer DNS updates. Web nodes occupy one network subnet, DB nodes occupy another. Use a database to track all VPS and build a simple webapp on top that provides bookmarks to each VPS along with CRUD ops.
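As a rough sketch of the tracking idea, the snippet below hands out the next free private address per role. The two subnets, the function names, and the in-memory dict standing in for the tracking database are all assumptions for illustration.

```python
import ipaddress

# Hypothetical private subnets: one per role.
SUBNETS = {
    "web": ipaddress.ip_network("192.168.0.0/24"),
    "db": ipaddress.ip_network("192.168.1.0/24"),
}

def next_free_ip(role, in_use):
    """Return the first host address in the role's subnet that is not in use."""
    taken = {ipaddress.ip_address(ip) for ip in in_use}
    for host in SUBNETS[role].hosts():
        if host not in taken:
            return str(host)
    raise RuntimeError(f"no free addresses left for role {role!r}")

def register_vps(inventory, ctid, role):
    """Record a new VPS in the inventory and return its assigned IP."""
    ip = next_free_ip(role, [vps["ip"] for vps in inventory.values()])
    inventory[ctid] = {"role": role, "ip": ip}
    return ip

inventory = {}  # stand-in for the tracking database
print(register_vps(inventory, 1001, "web"))  # 192.168.0.1
print(register_vps(inventory, 1002, "db"))   # 192.168.1.1
print(register_vps(inventory, 1003, "web"))  # 192.168.0.2
```

The webapp’s bookmark list would then just be a read over the same inventory.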

Getting code into the VPS

Using our git repo, check out a working copy with the latest dev branch. Alternatively, NFS mount staff home directories in each VPS and work off that. I’d prefer the NFS approach, as it makes the VPS more transient: we can destroy and re-create the server using an upgraded template if desired without losing code.

Getting DB schema into the VPS

Database migrations, similar to Rails (we use PHP for most apps). Moodle CMS also has a form of database migrations which happen via the web, which is a tad simpler. Doctrine ORM supports database migrations, although we use it only for new development and it won’t help our existing code. For older/legacy webapps, we’ll have to maintain a per-tool schema file to bootstrap our webapps.

Getting code and schema updates out of the VPS

Pull code changes using git into a central repo. Put all database migrations in the code, using Doctrine as described above, or a schema bootstrap file.
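To make “migrations in the code” concrete, here is a minimal sketch of a migration runner. It is not Doctrine’s API. It uses SQLite as a stand-in for MySQL so the example is self-contained, and the table names and migration list are hypothetical.

```python
import sqlite3

# Hypothetical migrations: (version, SQL) pairs kept in the code repository.
MIGRATIONS = [
    (1, "CREATE TABLE posts (id INTEGER PRIMARY KEY, subject TEXT)"),
    (2, "ALTER TABLE posts ADD COLUMN author TEXT"),
]

def migrate(conn):
    """Apply any migrations newer than the recorded schema version."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    current = row[0] or 0  # MAX() is NULL on a brand-new database
    applied = []
    for version, sql in MIGRATIONS:
        if version > current:
            conn.execute(sql)
            conn.execute("INSERT INTO schema_version VALUES (?)", (version,))
            applied.append(version)
    conn.commit()
    return applied

conn = sqlite3.connect(":memory:")
first_run = migrate(conn)
second_run = migrate(conn)
print(first_run)   # [1, 2]
print(second_run)  # []
```

Because the runner records what it has applied, running it against a freshly deployed VPS or an existing one gives the same end state.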

Painless VPS reboots

If all database changes exist in code, and all code exists on the shared NFS server, then destroying & deploying a new VPS from an updated template should not impact a developer. Test data may disappear, which may cause concern. Ideally, all important data would be saved in-code and easily imported into MySQL, through migrations or bootstrap SQL files. Alternatively, we could place the MySQL data files on an iSCSI LUN, at an additional provisioning cost.

Once we are able to quickly provision new dev environments, why not do the same for production apps? I think we could use the same code bootstrap & database migration code to deploy new application stacks on demand.


The Harvard Business Review article, “Reverse Engineering Google’s Innovation Machine” (April 2008), describes how Google is built for innovation. I highly recommend reading the article, and will focus on just a couple ideas here:

  • Budgeting for innovation
  • Superior infrastructure built for growth

Budgeting for innovation

Test if your organization values innovative activity: do they pay you for it? Google’s engineers are required to spend 20% of their time on a side project of their choosing. This means they take 20% of their time from search & advertising (Google’s bread & butter) and dedicate it to risky, innovative activity.

Google’s organizational structure is effective at managing innovation. They understand that in the long-term, it’s more risky to focus solely on search & advertising. Investment in innovative activity is a hedge against potential downturns in their core business and a likely generator of future growth.

Superior infrastructure built for growth

Growing applications, from development to production, is challenging. Growing applications easily and on-demand is downright hard. Your web application environment needs to be easy to set up, easy to share with other devs, fault-tolerant, easy to launch for users to see, and easy to scale up to meet production demands. Try building these capabilities into your infrastructure, then allowing your devs to do it without needing a sysadmin.

There are new tools that let us approach this level of superior infrastructure. They may not be as fast, or as easy, or as scalable. Tools like virtualization, re-usable web services and code libraries, and a solid architecture can bring us closer to this goal.


An Internet forum is a web application for holding discussions and posting user-generated content. This functionality is found in several types of applications, including message boards and blogs that allow for posting of comments.

One approach to creating a message board and blog web application is to abstract out the shared functionality. First, we have to define the functionality, then see where the overlap is. If the functionality differs greatly, then sharing a common web service won’t make sense. If there is a lot of shared functionality, or we design our applications with this goal of shared functionality in mind, then a web service component would result in an effective software design.

Let’s look at what might be common across blogging and message board applications:

  • Posts, including subject, author, and meta-data (reply-to, last modified, date created, etc…)
  • Post buckets, which could take the form of a forum or blog post and comments.

URIs:

  • List buckets: GET /appbase
  • List posts: GET /appbase/1
  • Retrieve post: GET /appbase/1/2

Supported methods:

  • GET
  • HEAD
  • PUT
  • DELETE
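The URI scheme and method list above could be resolved with something like the sketch below. It assumes numeric bucket and post IDs (as in the example URIs) and uses resource names of my own choosing; it only maps a request to a resource and does not implement any storage.

```python
import re

# Patterns for the three URI shapes listed above; "appbase" is the app root.
ROUTES = [
    (re.compile(r"^/appbase/?$"), "bucket_list"),
    (re.compile(r"^/appbase/(?P<bucket>\d+)/?$"), "post_list"),
    (re.compile(r"^/appbase/(?P<bucket>\d+)/(?P<post>\d+)/?$"), "post"),
]

SUPPORTED_METHODS = {"GET", "HEAD", "PUT", "DELETE"}

def resolve(method, path):
    """Map a request to (resource, params), or raise for anything unsupported."""
    if method not in SUPPORTED_METHODS:
        raise ValueError(f"unsupported method: {method}")
    for pattern, resource in ROUTES:
        match = pattern.match(path)
        if match:
            params = {name: int(value) for name, value in match.groupdict().items()}
            return resource, params
    raise LookupError(f"no route for {path}")

print(resolve("GET", "/appbase"))         # ('bucket_list', {})
print(resolve("GET", "/appbase/1"))       # ('post_list', {'bucket': 1})
print(resolve("DELETE", "/appbase/1/2"))  # ('post', {'bucket': 1, 'post': 2})
```

A forum and a blog would differ only in what they call a bucket; the routing stays the same.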

Where have I seen something like this before???  Hmmm…


Good overview of Linux-based virtualization tech: Comparison matrix

Up-to-date virtualization blog: www.virtualization.info

Amazon web services blog: aws.typepad.com/aws/


OpenVZ is a virtualization technology that allows many virtual private servers (VPS) to run on one hardware node. This post shows how to create a slimmed-down LAMP VPS using OpenVZ on a RHEL4 hardware node.

Outline:

  • Install OpenVZ
  • Install an OpenVZ template
  • Create a temporary VPS container and initialize
  • Set up NAT on your hardware node
  • Install vzyum and related packages
  • Update the temporary VPS OS with latest packages
  • Replace OpenVZ template with the temporary VPS

Start with the fairly straightforward quick installation guide available on OpenVZ’s wiki. Next, download a pre-created OS template and place the tarball in /vz/templates/cache. At this point, you should be able to create a temporary container using the OS template you chose by doing the following:

vzctl create 1001 --ostemplate centos-5-i386-minimal

Where 1001 is the CTID (you’ll use this number to manipulate your VPS), and centos-5-i386-minimal is the name of the pre-created OS template you downloaded. Note: I tried a user-contributed centos 5 template, which required the installation of an additional metadata RPM in order for OpenVZ to know the location of certain files (like the networking scripts, nameserver file, etc…)

Let’s set the IP and nameserver of the VPS:

vzctl set 1001 --ipadd 192.168.0.1 --nameserver "123.123.123.123 123.123.123.124" --save

Next, set up NAT on the hardware node so that our VPS can make outbound network connections:

iptables -t nat -A POSTROUTING -s src_nat -o eth0 -j SNAT --to ip_address

I specified a src_nat of 192.168.0.0/24 to give me 254 NAT’d IPs to play around with. ip_address is your hardware node’s IP.
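If you want to check how many container addresses a given src_nat range provides, or generate the rule for a different range, a small helper does the job. This is only a sketch: the function names are mine, 203.0.113.10 is a placeholder for the hardware node’s IP, and the rule is built as an argument list rather than executed. It spells the SNAT option out in full as --to-source.

```python
import ipaddress

def usable_hosts(cidr):
    """Count assignable host addresses, excluding network and broadcast."""
    return len(list(ipaddress.ip_network(cidr).hosts()))

def snat_rule(src_nat, ip_address, out_iface="eth0"):
    """Build (but do not run) the iptables SNAT rule as an argument list."""
    ipaddress.ip_network(src_nat)    # raises ValueError on a malformed range
    ipaddress.ip_address(ip_address)  # raises ValueError on a malformed IP
    return [
        "iptables", "-t", "nat", "-A", "POSTROUTING",
        "-s", src_nat, "-o", out_iface,
        "-j", "SNAT", "--to-source", ip_address,
    ]

print(usable_hosts("192.168.0.0/24"))  # 254
print(" ".join(snat_rule("192.168.0.0/24", "203.0.113.10")))
```

The validation step is there so that a typo in the range is caught before anything reaches the firewall.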

Time to fire up the VPS:

vzctl start 1001

Test that networking works:

vzctl enter 1001

ping yahoo.com

exit

At this point, we have a working VPS, but it’s running woefully out-of-date software. OpenVZ provides a wrapper around yum and rpm called ‘vzyum’ and ‘vzrpm’. Since I am running a centos 5 VPS, I’ll use vzyum. First I need to install the required packages on the hardware node (remember the HN is RHEL4):

rpm -Uvh python-elementtree-1.2.6-7.el4.rf.i386.rpm python-sqlite-1.0.1-1.2.el4.rf.i386.rpm python-urlgrabber-2.9.6-1.2.el4.rf.noarch.rpm vzrpm43-python-4.3.3-7_nonptl.6.i386.rpm vzrpm44-4.4.1-22.5.i386.rpm vzpkg-2.7.0-18.noarch.rpm vzrpm44-python-4.4.1-22.5.i386.rpm vzyum-2.4.0-11.noarch.rpm

Many of the packages not provided by RHEL4 were found at DAG.

Finally, let’s update our centos 5 VPS with the latest packages:

vzyum 1001 -y update

vzyum 1001 clean all

vzctl stop 1001

Now we have all the updated packages and have cleaned up any headers left over from yum.

Now to replace the pre-created OS template with our up-to-date centos 5 version:

mv /vz/templates/cache/centos-5-i386-minimal.tar.gz /vz/templates/cache/centos-5-i386-minimal.tar.gz-old

cd /vz/private/1001

tar czvf /vz/templates/cache/centos-5-i386-minimal.tar.gz .
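The archive step can also be scripted. The sketch below shows the same idea with Python’s tarfile module, which makes the gzip compression explicit through the "w:gz" mode. It runs against a throwaway directory rather than a real /vz/private/1001; the function name is my own, and a real template build would need to run as root against a stopped container so that file ownership is preserved.

```python
import os
import tarfile
import tempfile

def build_template(private_dir, template_path):
    """Archive a container's private area as a gzip-compressed tarball."""
    with tarfile.open(template_path, "w:gz") as archive:
        # arcname="." stores paths relative to the container root, like `tar ... .`
        archive.add(private_dir, arcname=".")
    return template_path

# Demo against a temporary directory standing in for the private area.
with tempfile.TemporaryDirectory() as workdir:
    root = os.path.join(workdir, "private")
    os.makedirs(os.path.join(root, "etc"))
    with open(os.path.join(root, "etc", "hostname"), "w") as handle:
        handle.write("template\n")

    tarball = build_template(root, os.path.join(workdir, "demo.tar.gz"))

    with open(tarball, "rb") as handle:
        magic = handle.read(2)  # gzip files start with bytes 1f 8b
    with tarfile.open(tarball, "r:gz") as archive:
        names = sorted(archive.getnames())

print(magic == b"\x1f\x8b")
print(names)
```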

That’s it! Now you can vzctl create a slew of VPSs based on this template. Next time I’ll discuss how to create a web node template with LAMP built-in.


Virtual Private Servers (VPS) provide a chunk of computing power, coupled with storage space and networking. Many commercial hosting providers offer “dedicated shared hosting”, meaning they split up a machine into multiple chunks and rent them out individually. Often customers receive “root-level” access to their VPS, which are designed so that root in one VPS cannot touch another VPS running on the same hardware.

Why should I care about this? I’m a part time sysadmin at best…

Purchasing, racking, installing, obtaining support, patching, and overall managing of physical machines is not a web technology, nor does it deliver academic value to our clients. Web applications built into an academic portal deliver academic value to clients. If there were some way to run our web applications on air, we could focus solely on our direct commitment to students and faculty.

As much as I would like to run our web applications on thin air, physical hardware is needed.

This is where virtual private servers come in. VPS, if managed by highly responsible system administrators, completely eliminates the need for application developers to think about the care and feeding of physical machines. Instead, they get a reliable OS container that can move from one machine to another depending on hardware failures or changes in usage patterns. They get consistent, standardized chunks of computing power, making their application run exactly the same regardless of what physical hardware supports it.

I dig the idea of utility computing. I think we’re approaching the point where the interface between a computing need and fulfillment is as simple as plugging a light into an outlet.

If I need more CPU units to satisfy high end-of-quarter usage, I’d love to have a new server up and running in a few minutes, with minimal admin. I would love to have spare CPU units ready to go at a moment’s notice. I’d also love to share some of “our” CPU units back to the spare pool during usage lulls, or better yet, let this happen dynamically without me or our users noticing.

Why would a computing department want to offer VPS as a service instead of letting each group manage their own physical machines?

  • Greater department efficiency: some clients can share the same physical machine.
  • Simplified accounting: one chunk of computing power is the same as any other.
  • Higher-level capacity planning: aggregate all future computing needs into a pool.
  • Hardware cost savings: yearly bulk purchases for entire department instead of independent purchases.

The next question: how would this impact our organization? What changes would we need to make?


MessageBoards, a redesign/rethinking of the EEE NoteBoard tool, will be coded by yours truly.

MessageBoards will provide board, forum, thread, and post-level functionality we have all grown to love–yet in the context of our academic portal tied into roster info.

I’d like to approach the way I code this tool differently, specifically:

  • Introduce month-long design/code/QA iterations.
  • Unit & integration testing.
  • Create a good set of Selenium tests.
  • Use the Doctrine framework for the underlying model.

Why bother changing the approach? We could code this tool just fine without changing the process, adding automated tests, and changing the framework.

MessageBoards is a big experiment. It may actually take longer to develop (although I hope not)–and it will give us a lot of good feedback on some new web development approaches that promise to reduce time to delivery. Delivering results faster–and with similar quality–would be a huge win.

There are a lot of variables in play, like the percent of time I dedicate to MessageBoards, and the four new approaches mentioned above. In the end, I anticipate it will be a challenge to isolate which variables added or removed value.

Another couple of variables I bet will reduce time to delivery: working in parallel with a talented UI designer and the excitement of possibly finding something that will make other programmers’ lives easier.
