A short while ago we rolled out ModSecurity on three of our Apache web servers. The benefits of ModSecurity are clear: protection against most blatant web-based attacks, like SQL Injection and Cross-site scripting. It also acts as a last line of defense against information leakage, like PHP errors and directory listings.

In practice, ModSecurity takes a lot of time to implement well, especially if you have a large site. The core rules will almost certainly block legitimate user behavior, interrupting users' business processes with a generic error message.

Here are 8 tips that can help make your ModSecurity roll-out a success.

  1. Configure the ModSecurity SecResponseBodyLimit to be at least as large as the largest text document served by the site. Even in log-only mode, ModSecurity will block response bodies that exceed this limit!
  2. Configure the ModSecurity SecRequestBodyLimit to be at least as large as PHP’s upload_max_filesize limit (SecRequestBodyInMemoryLimit only decides when a request body is spooled to disk instead of held in memory). Again, ModSecurity will block file uploads that exceed this value, even in log-only mode.
  3. Start in log-only mode. ModSecurity will tell you what it would have blocked, giving you an opportunity to add whitelist rules or otherwise tune your ruleset.
  4. Whitelist load balancer health checks. This usually involves adding a whitelist rule for the load balancer’s IP.
  5. Whitelist automation and APIs that request HTTP documents. This usually involves an IP-based, user-agent-based, or URL-based whitelist. These clients are often easy to miss amongst the torrent of alerts ModSecurity generates.
  6. Consider disabling audit logging for invalid user agents, missing Accept headers, and unacceptable HTTP headers. This will significantly reduce the number of alerts that need to be analyzed, improving the chance of finding an alert that really matters.
  7. Review apps that use WYSIWYG editors to ensure they are validating and sanitizing user input properly. ModSecurity loves to block WYSIWYG input, generating alerts ranging from SQL injection to XSS to system command injection. The way around this is to disable the offending rules for these app URLs.
  8. Create an operational plan to regularly review and act on ModSecurity alerts. Consider using ModSecurity Console to reduce the amount of time needed to analyze audit logs.
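When reviewing alerts (tips 3, 6, and 8), it helps to see which rules fire most often before deciding what to whitelist or silence. Here is a minimal sketch of that triage. It assumes the audit log contains `Message:` lines tagged with `[id "..."]`, as ModSecurity 2.x writes them; the function name and the log path in the usage note are my own placeholders.

```shell
# Tally ModSecurity alerts by rule ID.
# Reads audit log text on stdin and prints "<count> <rule id>" lines,
# most frequent rule first.
tally_rule_ids() {
    grep -o '\[id "[0-9][0-9]*"]' \
        | tr -cd '0-9\n' \
        | sort | uniq -c | sort -rn \
        | awk '{ print $1, $2 }'
}
```

Typical usage would be something like `tally_rule_ids < /var/log/httpd/modsec_audit.log` (adjust the path to wherever your audit log lives). The rules at the top of the list are usually the first candidates for tuning.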

For the past 7 months I’ve been trying lean web application development for the MessageBoard project. Now that the project is complete, I thought I’d compare lean application development with the typical waterfall method. Many of these ideas came from the book Implementing Lean Software Development, which I highly recommend reading if you are used to a conventional waterfall development process.

1. Lean development constrains time, not scope
Plan to spend a fixed amount of time working on a set of functionality. If you overestimate what you can produce in that amount of time, don’t adjust the deadline–adjust the scope. I’ve taken out sorting, filtering, and other behavior to make a monthly release deadline. Note the missing behavior and bundle it for a subsequent iteration. At the end of each month, I’d reflect on what I did to over/underestimate, then learn and adjust going forward.

There are a few compelling benefits to constraining time over scope. Estimating, previously one of the hardest tasks for me to do accurately and important during end-of-year performance evaluations, became easier because I was always working with the same unit of time (one iteration is always one month). Scheduling Q/A and planning sessions was straightforward since I was making my deadlines.

Constraining deadlines over scope results in better prioritization. I’ll start on the required functionality, saving nice-to-haves until the end. If I am approaching an end-of-month deadline, I’ll postpone (or eliminate, team willing) the least cost-effective functionality. I feel this has the effect of focusing staff time on the most valuable functionality.

2. Frequent reviews, lots of feedback
Transitioning from the design mockup to a working product is a big step where many usability and programmatic issues are discovered. Including Q/A and the designer at the beginning and end of each month-long iteration saved time in the long run because these issues were discovered earlier in the implementation phase, while the code was still flexible.

Having a designer writing HTML, CSS, and JavaScript code was also helpful. We invited our designer to the first Q/A session of each month, allowing Q/A to talk directly to the designer about user interface issues (wording, confusing behavior, potential support issues, etc.). I thought it was more effective for the two to communicate directly than for me to interpret and relay messages between them.

When releases are small and frequent, making minor course corrections becomes much easier. With the waterfall method, I wouldn’t see a designer or Q/A very much until major coding was completed. I’d receive a huge amount of feedback right at the worst time: when the code was mostly complete and rigid, and changes were very error-prone. This feedback was more than just application bugs; it included UI and requirement changes. With our three-person team frequently reviewing the product and communicating, we could quickly identify what needed to be changed (either due to oversight or a change in our perception of user needs) and then decide on a course of action.

Conclusion & Looking Forward
We’ve brought Q/A fully into the iterative development. Next, I’d like to see if bringing our design activities into the iterative process is more effective than completing the design mockups prior to coding. I think we’ve already seen the benefit of keeping the design flexible during implementation, allowing us to quickly make necessary UI changes. Perhaps with solid requirements, a good roadmap, and some rough mockups, this added flexibility will result in more usable applications. Keeping the design rough and more open to change could reduce some of the friction that comes when the team is reviewing “final” UI design documents.

MessageBoard, a PHP & MySQL web application with a Doctrine ORM back-end, was launched last Friday, right on deadline. This concludes 7 month-long phases of development.

I think the ingredients essential to delivering this application on time and within budget were:

  • Partnering with a user experience expert who designed the mockups and wrote the bulk of the HTML, CSS, and JavaScript.
  • Doctrine ORM: previously we hand-crafted SQL, which was error-prone and occasionally introduced security vulnerabilities.
  • Month-long development iterations with a tested, working product delivered at the end of every month.

Looking forward, there is an increasing need at our campus for an Academic Personnel Review web application, which will transform a paper process into a fully online process. The major motivation behind this project is to reduce the massive amount of staff time that goes into coordinating this process in each campus unit.

This application will very likely be built in Java!

Just wanted to post an update to my previous blog entry about Doctrine ORM gotchas. Since 1.0.4 was released, a seriously limiting bug in the SoftDelete template has been fixed. This bug prevented typical performance optimizations that use LEFT JOINs to reduce the number of database queries. The idea is that a page generally loads much faster by executing a few efficient JOINed queries than many single-table queries (do your joins in MySQL, not PHP!).

I had posted a workaround for this bug: adding $query->addWhere("deleted = 0 OR deleted IS NULL") to every query that uses a LEFT JOIN. This was cumbersome, and I felt it violated the principle of the SoftDelete event listener.

With this bug resolved, I’ve been more freely adding custom finders for specific pages. One page in particular (the MessageBoard thread index) went from 12 seconds to 4 seconds for a very large data set. The number of DB queries also was cut from 1200+ to about 300.

Now I just have to get the page down to 10 queries and we can call it optimized.

Doctrine ORM is a PHP library that implements the ActiveRecord pattern we have all grown to love.  I’ve been using it for the past 7 months and feel it is one of the reasons we were able to deliver the MessageBoard web application on time and within budget.

This doesn’t mean using Doctrine has been all flowers and sunshine: Doctrine will kick you in the face when you’re not paying attention.

Today I’d like to revisit all of those black eyes and bloody noses in the hope of helping fellow developers avoid the same missteps I made.  Doctrine is complex and quirky, and has some unanticipated architectural “features” that are not well documented.

Use Record::toArray() and Record::fromArray() to store/retrieve Doctrine objects from the session.

  • Save space in the session store by adding only the column attributes of Record objects to the session.
  • The session will quickly fill up otherwise, as Doctrine adds considerable bulk to model objects.

Improve performance by extending Doctrine_Table and implementing custom DQL queries for complex and frequently used queries.

  • If the controller or view will need a record’s related record, use a DQL query to join with the related table.

Optimize performance by getting to know and love Doctrine_Connection_Profiler.

  • Add the connection listener at the beginning of execution and print SQL queries at the end of execution in order to identify areas of effective performance optimization.
  • Example code that adds the listener and renders query events as HTML:
    
    // Set the connection listener
    
    $profiler = new Doctrine_Connection_Profiler();
    Doctrine_Manager::connection()->setListener($profiler);
    
    // Code goes here...
    
    // Render database connection events as HTML:
    
    $query_count = 0;
    $time = 0;
    echo "<table width='100%' border='1'>";
    foreach ( $profiler as $event ) {
        if ($event->getName() != 'execute') {
            continue;
        }
        $query_count++;
        echo "<tr>";
        $time += $event->getElapsedSecs() ;
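        echo "<td>" . $event->getName() . "</td><td>" . sprintf("%f", $event->getElapsedSecs()) . "</td>";
        // Escape the SQL so characters like < and > render correctly in HTML.
        echo "<td>" . htmlspecialchars($event->getQuery()) . "</td>";
        $params = $event->getParams();
        // Always emit the parameter cell so every row has the same number of columns.
        echo "<td>" . (empty($params) ? "" : htmlspecialchars(join(', ', $params))) . "</td>";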
        echo "</tr>";
    }
    echo "</table>";
    echo "Total time: " . $time . ", query count: $query_count <br>\n ";
    

Effectively mitigate performance issues with memcache.

  • The query and result cache can drastically offset Doctrine’s performance overhead.
  • If you already have memcached running, this is one of the most cost-effective performance tweaks you can do.
  • Note: I received mysterious fatal errors when using INDEXBY in DQL queries. After removing the INDEXBY, the errors stopped.

Play nice with fellow coders or testers by automating database migrations.

  • Add a git merge hook that runs the Doctrine migration.
  • Alternatively, check if a migration is needed on every page view while in development mode.
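The git hook idea above can be sketched as a `post-merge` hook. This is only an outline under a few assumptions: that migration classes live in a `migrations/` directory at the top of the repository, and that the project's Doctrine command-line script is invoked as `php doctrine migrate`. Both are exposed as overridable variables because your layout may differ.

```shell
#!/bin/sh
# .git/hooks/post-merge (sketch)
# Runs the migration only when the merge brought in changes under the
# migrations directory.
MIGRATIONS_DIR="${MIGRATIONS_DIR:-migrations}"       # assumed location
MIGRATE_CMD="${MIGRATE_CMD:-php doctrine migrate}"   # assumed CLI invocation

needs_migration() {
    # Reads changed file paths on stdin; succeeds if any path is
    # inside $MIGRATIONS_DIR.
    grep -q "^${MIGRATIONS_DIR}/"
}

# After a merge or pull, ORIG_HEAD points at the pre-merge commit.
if git diff --name-only ORIG_HEAD HEAD 2>/dev/null | needs_migration; then
    $MIGRATE_CMD
fi
```

Save it as `.git/hooks/post-merge` and make it executable. Checking the changed paths first keeps ordinary merges fast, since the migration command only runs when there is something new to apply.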

Play nice with other web applications by prefixing database tables.

  • Set the table name prefix by calling $this->setTableName('zzz_model_name'), where 'zzz' is the tool’s prefix.

Create a “resource-oriented” URL structure that closely follows the application’s models.

  • This is borrowed from the RESTful architecture, and not necessarily Doctrine-specific.
  • For example, an HTTP GET on http://site.com/messageboard/m34/f11/ would display forum ID 11 in messageboard 34 in the tool “messageboard”.

Use a workaround when using LEFT JOINs on models with actAs('SoftDelete') behavior.

  • SoftDelete will automatically add the WHERE condition “deleted = 0” to all queries.
  • This prevents queries with LEFT JOIN from returning a row where “deleted IS NULL”.
  • Either use INNER JOIN instead, or add the following to DQL queries: $query->andWhere('deleted = 0 OR deleted IS NULL');

Timestampable cannot be disabled temporarily, causing challenges when importing data with dates.

  • Doctrine provides no way to easily disable or override the timestamp behavior in order to import a pre-existing date.
  • Until this behavior is resolved, try using this patch to set the ‘disabled’ option of Timestampable.

Use 'cascade' => array('delete') to propagate soft deletes through model relations.

  • The faster 'onDelete' => 'CASCADE' performs the delete in MySQL, which does not set the deleted flag.

Put authorization code in one place, when possible.

Implement checkbox plus text input as two columns in a model.

  • For example:
    Require password: [checkbox] [text input]
  • This approach simplifies validation of optional attributes.

Use $model->setAttribute(Doctrine::ATTR_COLL_KEY, 'id') to key collections off of the primary key.

  • If this attribute is not set, Collections will be indexed starting from 0 and counting upwards.
  • This can simplify controller logic.

Use Doctrine_Pager only for the most basic views.

Use actAs('NestedSet') to model hierarchies that are read more frequently than written.

Use unix timestamps in Timestampable columns to ease formatting.

  • Using a datetime works fine if the view never needs to change how a datetime is displayed.
  • A unix timestamp allows for flexibly changing how dates are rendered, e.g.: “Jan 1st 2008” or “Yesterday”.
  • Example actAs() code:
    $this->actAs('Timestampable', array(
        'created' => array('name' => 'created_at',
            'type'    =>  'integer',
            'format'  =>  'U',
            'disabled' => false,
            'options' =>  array()),
        'updated' => array('name'    =>  'updated_at',
            'type'    =>  'integer',
            'format'  =>  'U',
            'disabled' => false,
            'options' =>  array())));
    

For multi-step forms, add a 'state' column to aid in validating each step.

  • In the model’s validate() function, use the state column to switch between validation logic.
  • For instance, in state 1, validate columns a and b. In state 2, validate columns a, b, c, and d. In state 3, validate the complete object.

I hope this list saves some heartache on what is really a very elegant ActiveRecord implementation in PHP!

In this post I hope to show how to configure nginx as a reverse proxy to a back-end CentOS 5 server running Apache.

When you add an nginx reverse proxy layer on top of Apache, Apache thinks that all connections originate from the server running nginx. This creates a couple of annoying problems:

  • Every entry in the Apache access logs appears to come from the IP of the nginx server.
  • Securing sessions by checking that a user’s IP address hasn’t changed becomes more difficult.
    • This is especially true when using open source software. Open source packages usually look for the client’s IP in the REMOTE_ADDR variable.

To resolve these issues, we’ll use the Apache mod_rpaf module to populate the REMOTE_ADDR using a special HTTP header inserted by nginx. A typical request would work as follows:

  • 1.2.3.4 sends HTTP request to nginx server
  • nginx determines that it needs to proxy pass the request to a back-end Apache server (e.g. by looking at the content-type or virtual host).
  • nginx adds an HTTP header “X-Forwarded-For” with the client’s real IP
  • nginx forwards (proxy_pass) the request to back-end Apache server
  • mod_rpaf in Apache detects that the request is coming from the nginx IP, then substitutes REMOTE_ADDR with the contents of X-Forwarded-For
  • Apache handles request as normal. Applications do not need to be aware of the reverse proxy.
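The substitution step in that flow can be illustrated with a few lines of shell. This is not mod_rpaf's code, just a sketch of the decision as described above: trust the header only when the connection comes from the known proxy, otherwise keep the peer address. The function name is made up, and taking the last comma-separated entry of X-Forwarded-For is my assumption about which address gets used (with the nginx configuration shown later, that is the address nginx itself saw).

```shell
# Decide which address the application should see as REMOTE_ADDR.
#   $1 = address of the connecting peer
#   $2 = trusted reverse proxy IP (what RPAFproxy_ips would hold)
#   $3 = value of the X-Forwarded-For header (may be empty)
effective_remote_addr() {
    peer=$1
    proxy=$2
    xff=$3
    if [ "$peer" = "$proxy" ] && [ -n "$xff" ]; then
        # Use the last entry in the header, trimmed of surrounding spaces.
        printf '%s\n' "$xff" \
            | awk -F',' '{ gsub(/^[ \t]+|[ \t]+$/, "", $NF); print $NF }'
    else
        # Not from the proxy: ignore the header, since anyone can forge it.
        printf '%s\n' "$peer"
    fi
}

effective_remote_addr 192.168.0.1 192.168.0.1 "1.2.3.4"
```

The important design point is the first check: a client that connects to Apache directly can send any X-Forwarded-For value it likes, so the header is only honored for requests arriving from the proxy's IP.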

To install mod_rpaf on the CentOS 5 box:

wget http://stderr.net/apache/rpaf/download/mod_rpaf-0.6.tar.gz
tar zxvf mod_rpaf-0.6.tar.gz
cd mod_rpaf-0.6

# Patch Makefile so it looks for 'apxs' instead of 'apxs2' (required
# when compiling under CentOS 5):
sed -i -e 's/apxs2/apxs/' Makefile

make rpaf-2.0
make install-2.0

Create /etc/httpd/conf.d/rpaf.conf:

# Path to mod_rpaf-2.0.so, relative to /etc/httpd/
LoadModule rpaf_module modules/mod_rpaf-2.0.so

RPAFenable On
RPAFsethostname On

# Define our reverse proxy IP.  Only substitute client IP in
# when we receive a request from this IP.
RPAFproxy_ips 192.168.0.1

# The header where the real client IP address is stored.
RPAFheader X-Forwarded-For

Next, configure nginx to reverse proxy requests for our virtual host to the back-end container. We won’t go into installing nginx here; instead, I’ve included the relevant section of /etc/nginx/nginx.conf. This configuration reverse proxies all requests for the virtual host ‘myvirtualhost.com’:

server {
    listen 80;
    server_name myvirtualhost.com;

    location / {
        proxy_pass http://192.168.0.56;
        proxy_redirect default;

        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

After restarting Apache & nginx, you should be able to successfully connect to your back-end Apache via the nginx reverse proxy layer. Inspecting the Apache environment will show a couple new headers, but other than that requests look the same as if clients were connecting directly without the proxy.

Creating a minimal LAMP stack in OpenVZ:

Start with a pre-created centos-5 OpenVZ template & install required packages:


# Create centos-5 OpenVZ container:
vzctl create 1056 --ostemplate centos-5-x86_64-minimal
vzctl set 1056 --ipadd 192.168.0.56 --nameserver 123.123.123.123 --save
vzctl start 1056

# Update software & install packages:
vzyum 1056 install yum
vzctl enter 1056
yum upgrade

# Install packages -- the ".x86_64" tells yum to only
# install the 64-bit packages and not the i386 packages.
yum install vim-enhanced.x86_64 mysql.x86_64 mysql-server.x86_64 \
httpd.x86_64 httpd-devel.x86_64 wget.x86_64 which.x86_64 \
php.x86_64 make.x86_64 gcc.x86_64 gcc-c++.x86_64

# Clean up leftover files from yum:
yum clean all

Next, tune Apache to fit our development 256MB OpenVZ container. If you have more memory dedicated to your container, consider increasing these settings. Edit the prefork MPM section of /etc/httpd/conf/httpd.conf:

<IfModule prefork.c>
StartServers         2
MinSpareServers      1
MaxSpareServers      8
ServerLimit          8
MaxClients           8
MaxRequestsPerChild  4000
</IfModule>
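If your container has a different amount of memory, a rough way to pick MaxClients is to divide the memory left over for Apache by the size of one httpd child. The sketch below just does that arithmetic. The 96 MB reserved for MySQL and the rest of the system and the 20 MB per child are illustrative assumptions, not measurements from this container, so check your own process sizes before relying on the result.

```shell
# Estimate MaxClients for the prefork MPM. All inputs are in megabytes.
estimate_max_clients() {
    total_mb=$1      # memory assigned to the container
    reserved_mb=$2   # assumed share for MySQL, the OS, and everything else
    per_child_mb=$3  # assumed resident size of one httpd child
    echo $(( (total_mb - reserved_mb) / per_child_mb ))
}

estimate_max_clients 256 96 20
```

With those assumed numbers the estimate for a 256MB container comes out at 8. ServerLimit should be at least as large as whatever MaxClients you settle on.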

Let’s lock down the MySQL root user. The mysql client needs a running server, so start mysqld first if it is not already running:

service mysqld start
mysql -u root mysql
mysql> UPDATE user SET password=PASSWORD('mynewsecurepassword')
    -> WHERE user='root';
mysql> FLUSH PRIVILEGES;
mysql> exit

Start services and ensure that services start when the machine boots up:

chkconfig --level 345 httpd on
chkconfig --level 345 mysqld on
service httpd start
service mysqld start

Finally: test!

This will give you a slimmed down LAMP stack, suitable for running small web applications on top.

ZFS is rapidly gaining the “enterprise” features you’d normally find in NetApp, EMC, or IBM products.  We recently deployed a Sunfire X4500 as our main storage server, serving out iSCSI and NFS shares to multiple servers, including an OpenVZ hardware node.  The Sunfire X4500 attracted us for two reasons: ZFS and cost of ownership.

The Sunfire X4500 came with 48 500GB drives and 6 disk controllers.  We configured the array into 7 RAIDZ2 sets, then striped the 7 sets together, leaving 4 spares and two root disks.  Since we started with Solaris 10 5/08, we went with a UFS mirror for the two root disks (Solaris 10 10/08 lets you use a ZFS root pool).  This gives us plenty of redundancy at the disk and controller level.
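To make the layout concrete, here is the arithmetic behind it as a small shell sketch.  The disk counts come from the description above; the capacity figure is a rough estimate only, since it ignores ZFS metadata overhead and uses decimal gigabytes.

```shell
# Disk layout arithmetic for the pool described above.
total_disks=48
spares=4
root_disks=2
raidz2_sets=7
disk_gb=500
parity_per_set=2   # RAIDZ2 spends two disks' worth of space on parity per set

pool_disks=$(( total_disks - spares - root_disks ))
disks_per_set=$(( pool_disks / raidz2_sets ))
usable_gb=$(( raidz2_sets * (disks_per_set - parity_per_set) * disk_gb ))

echo "disks in the pool: $pool_disks"
echo "disks per RAIDZ2 set: $disks_per_set"
echo "approximate usable capacity: ${usable_gb} GB"
```

That works out to six disks per RAIDZ2 set, which lines up with the six controllers, and each set can lose any two disks without data loss.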

Next we set up several ZFS filesystems, essentially one filesystem for each website using NFS, and one ZFS volume for each iSCSI share.  For now, we are only using one iSCSI share — this will change soon.

The filesystems exported over NFS have no capacity limits, while the iSCSI volume was thin provisioned at 1TB.  We are currently using only about 100GB of the iSCSI volume.  In the long run the thin provisioned volume will save us from having to resize the volume and extend the ext3 filesystem that sits on top.

The iSCSI volume was shared to a Red Hat RHEL5 box running Open-iSCSI.  We found Open-iSCSI to generally work well until you have a network hiccup.  In our initial tests, a network delay of 3 seconds would cause the SCSI block device to disappear temporarily, freaking ext3 out and causing it to remount read-only.

Our solution to network hiccups was to add a DM-Multipath layer between the iSCSI and ext3 layers on the RHEL box.  Multipath will sense network issues and queue I/O until the issue is resolved or a configurable timeout expires.  The queuing behavior worked perfectly as we went about unplugging and plugging network cables on the RHEL box.  Side note: Multipath is capable of much more than this, such as switching a machine from one network path to another if it has redundant network paths to the SAN.

One gotcha to look out for when using a Solaris + RHEL5 + Multipath combo: a bug on the Solaris iSCSI target side causes the SCSI ID to be reported as much longer than Multipath can handle.  We got around the bug by extracting what appeared to be the unique SCSI ID portion.

Performance was near wire speed for sequential I/O.  Random I/O was fantastic, thanks in large part to the 16GB of RAM the Sunfire was using as a giant disk cache.  The I/O performance was much better than the local SATA that the RHEL box came with.

With the iSCSI volume created and shared, the appropriate iptables/ipf rules set up, and the block device mounted, we then moved all of the OpenVZ containers to this RHEL box.  OpenVZ needed no special configuration: we simply mounted the iSCSI device on /vz using ext3.  All containers have been running smoothly since (except for a bad RAM module, but that was unrelated to iSCSI or OpenVZ).

The NFS shares have also been running smoothly, although we did run into a nasty Solaris ipf issue.  The problem is that after a few days, automountd on RHEL randomly hangs when attempting an additional mount.  Packet captures indicate that ipf is blocking either the inbound or outbound packets for the second TCP connection (we are not using UDP) to the ‘nfs’ port.  Disabling ipf immediately resolves the problem.  We are currently trying to reproduce the issue on a test server.

Even with the few issues that came up, we are still very happy with our new setup.  The Sunfire X4500, ZFS, and Solaris 10 match many of the essential features of a more expensive “enterprise” solution.

Two extremely informative ZFS references:

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide
