new plugin: acts_as_git

Courtenay : November 14th, 2008

With the help of Jamie van Dyke at Parfait and Scott Chacon at GitHub, I'm pleased to announce Acts As Git (no, I don't like the name either). It's a simple plugin which stores all changes you make to a text field in a git repository. This is ideal for something like a git-backed wiki.

Look at it here: github or check it out from

git://github.com/courtenay/acts_like_git.git

From the README:

ALG automagically saves the history of a given text or string field. It sits over the top of an ActiveRecord model; after a value is committed to the database, the plugin writes the new value to a text file and commits it to a git repository. This way you get all the advantages of using Git as version-control.

Usage:

class Post < ActiveRecord::Base
  versioning(:title) do |version|
    version.repository = '/home/git/repositories/postal.git'
    version.message = lambda { |post| "Committed by #{post.author.name}" }
  end
end

To view the complete list of changes:

>> @post = Post.find 15
<Post:15>
>> @post.title
=> 'Freddy'
>> @post.history(:title)
=> ['Joe', 'Frank', 'Freddy']
>> @post.log
=> ['bfec2f69e270d2d02de4e8c7a4eb2bd0f132bdbb', '643deb45c12982dde75ba71657792a2dbdda83e6', 
'1ce6c7368219db7698f4acc3417e656510b4138d']
>> @post.revert_to '1ce6c7368219db7698f4acc3417e656510b4138d'
>> @post.title
=> 'Joe'

It uses the excellent Grit library, and doesn't actually have a checked-out repository. The latest version of your data is still stored in the database. You can actually clone this repo and view the changes; pushing back to it won't do anything useful.

Plugin configuration style?

Courtenay : November 10th, 2008

I’m putting the final touches on a super-sweet versioning plugin, and I’ve discovered that we’re using several different metaphors for configuring the plugin options. I’d like to get some opinions/feedback on your preferred style.

The DSL

Using a DSL and passing in blocks which get instance_eval'd. I’m normally very scathing of DSLs; I think that they’re Yet Another Language for people to learn to use – it’s usually your very own write-only syntax – but it’s been super-fun implementing the backend to this.

class Monkey < ActiveRecord::Base
  versioning do
    author do
      name { user.current.name }
      message { "Committed via #{name}" }
    end
    repository "Joe's DataStore" 
  end
end
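
For anyone curious what the backend of a block DSL like this can look like, here is a minimal plain-Ruby sketch. It is not the plugin's actual code: VersioningConfig, AuthorConfig and the Versioning module are hypothetical names, and there is no ActiveRecord here, just instance_eval doing the work.

```ruby
# Hypothetical sketch of an instance_eval-driven config DSL.
# None of these class names come from the real plugin.

# Collects the settings declared inside `author do ... end`.
class AuthorConfig
  attr_reader :settings

  def initialize
    @settings = {}
  end

  # Blocks are stored, not called, so they can be run later.
  def name(&block)
    @settings[:name] = block
  end

  def message(&block)
    @settings[:message] = block
  end
end

# Collects the settings declared inside `versioning do ... end`.
class VersioningConfig
  attr_reader :settings

  def initialize
    @settings = {}
  end

  def repository(path)
    @settings[:repository] = path
  end

  def author(&block)
    nested = AuthorConfig.new
    nested.instance_eval(&block) # self is the AuthorConfig inside the block
    @settings[:author] = nested.settings
  end
end

module Versioning
  attr_reader :versioning_settings

  def versioning(&block)
    config = VersioningConfig.new
    config.instance_eval(&block) # self is the VersioningConfig inside the block
    @versioning_settings = config.settings
  end
end

class Monkey
  extend Versioning

  versioning do
    author do
      name { "courtenay" }
      message { "Committed via the DSL" }
    end
    repository "Joe's DataStore"
  end
end
```

Monkey.versioning_settings then holds a plain hash, which is more or less what the hash style gives you directly. The catch with instance_eval is that self changes inside the block, so methods from the surrounding class are no longer reachable there.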

Hashes

This seems to be the Rails plugin default:

class Monkey < ActiveRecord::Base
  versioning :author => { :name => lambda{ |u| user.current.name } }, :repository => "Joe's DataStore" 
end

Class vars / methods

Easy to monkeypatch later

class Monkey < ActiveRecord::Base
   will_version
   @@version_repository = "Joe's DataStore" 
   def version_author
     current_name
   end
end

Are there others? Which do you prefer? Currently I’m using all three in this one plugin, and it’s very un-awesome.

Ripping out your mocks

Courtenay : November 6th, 2008

I sat down with David Chelimsky at Rubyconf today to talk about rSpec and an interesting topic came up.

In my mind, there are two reasons to use a mock object: first, when you’re developing TDD style, you physically don’t have the objects yet; and second, so that you can tightly focus your unit tests. Maybe these two purposes should use different mechanisms.

His question to me then was, “Do you replace your mocks with the real objects after you’ve implemented those objects?”. I guess I hadn’t thought about that before. Do you? If so, how do you handle the extra complexity, maintaining sane associations and valid data?

On hiring Rubyists and Railsers

Courtenay : November 4th, 2008

We’re launching a new service at work in the next week or so that involves me looking through a lot of job applications: resumes and sample code.

I’d like to tell people right now, upfront, if you’re applying for a Ruby or Rails job, for anyone, there are a few ways of ensuring you get called back. They’re probably fairly simple.

Send some sample code, maybe a link to a project on Github, or a snippet of work you’ve done. Make sure you send the tests for the code. Any tests would be good, and you get bonus points for good tests. If you don’t have any tests, write them.

Don’t worry too much about sending some crazy complex code. Maybe some polymorphic associations (models), some ajax (views), a knowledge of the whole stack (simple controllers), some nested resources. Write a simple todo list application.

It’s not just a silly philosophy. Writing tests – hell, submitting tests with your job application’s code – shows that you’ve actually thought about the code, and that it actually works. You’ve permutated and permeated through the logic, and actually thought about the various ramifications of the design decisions in the code itself.

Just the pure act of sending tests with your sample code will put you above 90% of applicants, I promise.

We've stopped using rSpec ...

Courtenay : November 3rd, 2008

...for new projects.

fail

We upgraded the gems for one of our client projects, and the auto-loading / config.gems managed to completely break all our other projects, requiring upgrades, which caused weird breakages in weird places in some of the specs.

The app would refuse to deploy (rake tmp:create failed, because lib/tasks/rspec.rake was being loaded, and spec wasn't installed on the server). The annoying thing was that just having whatever.11 installed (I don't know the exact version) broke older apps on whatever.4 or whatever.0.2, so those had to be upgraded too. We wasted a day or two (three, maybe four developers) which equates to several thousand dollars in wastage. It was also really infuriating -- the culmination of a few years of frustration with rSpec's weirdnesses.

After that, I found that some of the specs had never run (who knows why). It stopped reading spec.opts and started doing some weirdness with pending options. Finally, Rick just snapped, threw out rSpec and his Model Stubbing library, and now we're playing with a combination of rr, context, and matchy, trying to get a feel for a decent workflow again. It's sad and maybe a bit exciting to be on the edge.

What are you testing with?

The awesomest filter and sort ever

Courtenay : August 26th, 2008

Update 2: seems like only one or two people knew about what can_search does :) I hope we’re all a little better educated.

Update: yes, I’m using these named scopes throughout the app in other places – they aren’t used only in this one controller.

Often you have an index action where you want to sort records, filter by a parameter, and maybe join on some other tables to get a result. Let’s say you’re looking at a videos controller (where videos are acts_as_taggable) and you want to filter by user_id, filter by tag name, order by video title, or rating. Maybe later, you’ll add a roles (hm:t) association and need to only show videos viewable by a certain user. How complex!

To solve this, we’re going to play with some things you may know, and finish up with a bam! pow! that’ll take your breath away.

Rather than build up some form of frankenquery with all sorts of conditionals and cases, joins, and other messing about, let’s use a brand-new bleeding edge feature of Rails: named scopes.

First, build up individual named scopes for each axis on which you wish to filter. Make sure to put the table name in that query.



    named_scope :by_user, lambda { |user_id| 
      { :conditions => ['videos.user_id = ?', user_id] }
    }

    named_scope :tag_name, lambda { |tag_name|
      { :joins => { :taggable => :tag },
        :conditions => ['tags.name = ?', tag_name] }
    }

    named_scope :rating, lambda { |rating| 
      { :conditions => ['ratings_count > ?', rating] }
    }

OK, I cheated on the last one, but let’s assume you have a counter_cache on ratings count.

Now, if you have more than one scope with joins in it, you’ll need to apply this patch to your rails installation, or upgrade past 2.1.1. This will allow you to have as many joins as you like in your scopes.

Now, here’s where the magic happens: in the controller. Big shout out to protocool for this method. Let’s build up a set of all the possible scopes that we might want to use, in an array form like [ named_scope, argument ]

def index
  scopes = []
  scopes << [ :by_user, params[:user_id] ] if params[:user_id]
  scopes << [ :tag_name, params[:tag_name] ] if params[:tag_name]
  scopes << [ :rating, params[:rating] ] if params[:rating]
end

Easy, right? Very readable.

How about some ordering?

  order = { 'name' => 'videos.name ASC' }[params[:order]] || 'videos.id DESC'
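
The hash lookup doubles as a whitelist: the request parameter only ever picks a key, and never ends up in the SQL itself. A small standalone sketch of the same idea, assuming a made-up second entry ('rating') and a hypothetical helper name order_for:

```ruby
# Whitelist of sort keys. 'rating' is a made-up extra entry for illustration.
ORDERS = {
  'name'   => 'videos.name ASC',
  'rating' => 'videos.ratings_count DESC'
}.freeze

DEFAULT_ORDER = 'videos.id DESC'

# Anything not in the whitelist (including nil) falls back to the default,
# so raw user input never reaches the ORDER BY clause.
def order_for(param)
  ORDERS.fetch(param.to_s, DEFAULT_ORDER)
end
```

In the controller this would be called as order_for(params[:order]).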

Now, as you know, you can chain named scopes. So you could say Video.by_user(2).tag_name('monkeys'). Let's take advantage of this, building up a chain of scopes dynamically using 'inject', starting from Video, and adding each scope we added to the array above. This is really fun magic, because it doesn't run any of the queries until the whole thing is built. I don't even know how this works, but it does. Swimmingly.

  @videos = scopes.inject(Video) {|m,v| m.scopes[v[0]].call(m, v[1]) }.paginate(:all, :order => order)

The final method looks like this:

def index
  scopes = []
  scopes << [ :by_user, params[:user_id] ] if params[:user_id]
  scopes << [ :tag_name, params[:tag_name] ] if params[:tag_name]
  scopes << [ :rating, params[:rating] ] if params[:rating]

  order = { 'name' => 'videos.name ASC' }[params[:order]] || 'videos.id DESC'

  @videos = scopes.inject(Video) {|m,v| m.scopes[v[0]].call(m, v[1]) }.paginate(:all, :order => order, :page => params[:page])
end
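
If the inject still looks like magic, here is a rough sketch of the same fold with the database taken out. FakeScope is a hypothetical stand-in, not ActiveRecord and not how named_scope is implemented; it only shows that each step returns a new chainable object and that nothing is built into SQL until you ask for it.

```ruby
# A stand-in for a lazy scope, NOT ActiveRecord: each call returns a new
# object that remembers its conditions, and nothing "runs" until to_sql.
class FakeScope
  attr_reader :conditions, :binds

  def initialize(conditions = [], binds = [])
    @conditions = conditions
    @binds = binds
  end

  def by_user(user_id)
    where('videos.user_id = ?', user_id)
  end

  def tag_name(name)
    where('tags.name = ?', name)
  end

  def to_sql
    sql = 'SELECT videos.* FROM videos'
    sql += ' WHERE ' + conditions.join(' AND ') unless conditions.empty?
    sql
  end

  private

  # Returns a fresh scope instead of mutating this one.
  def where(condition, value)
    FakeScope.new(conditions + [condition], binds + [value])
  end
end

scopes = [[:by_user, 2], [:tag_name, 'monkeys']]

# Same shape as the controller code: fold each [name, argument] pair
# onto the result of the previous one.
chain = scopes.inject(FakeScope.new) { |scope, (name, arg)| scope.send(name, arg) }
```

Calling chain.to_sql at the end is the only point where the pieces get assembled. Note that the scope names come from an array you build yourself, never straight from params.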

One final caveat. Sometimes :joins doesn’t know where to get the video id from, so if you’re using id in your app, you’ll need a slight workaround involving manually getting the pagination count, and forcing :select => 'distinct videos.*' in the paginate call.

If this works for you, it’s really easy to add new filtering, ordering, or even scoping to your query. For example, you can add some form of role hackery to your video


    named_scope :viewable_by, lambda { |user| 
      { :joins => { :permissions => :roles },
        :conditions => [ "roles.user_id = ? AND permissions.role = ?", user.id, "view" ] }
    }

In the controller, you replace the first scopes definition with this

scopes = [ [ :viewable_by, current_user ] ]

Or, you modify the scope inject statement


    @videos = scopes.inject(Video.viewable_by(current_user)) { |m,v| ... }

If you consider this a giant hack, you’re probably at least partly right. However, the alternative in building up a complex query with many possible moving parts is just hideous. And consider this: you can unit test each part of the query on its own, in the model specs.

Sanitize your users' HTML input

Courtenay : August 25th, 2008

The default Rails sanitize helper is actually quite powerful. You can see some of its usage here:

<%= sanitize @article.body, :tags => %w(table tr td), :attributes => %w(id class style) %>

However, as the docs say,

Please note that sanitizing user-provided text does not 
guarantee that the resulting markup is valid.

We were having an issue with users providing bad markup and leaving their tags unclosed.

This is <a href="http://foo.com">my dog<a/> and he’s super cool!

We solved it by running Hpricot over their input.

before_save :clean_html
def clean_html
  self.body = Hpricot(body).to_html
end

For performance reasons, you should probably run the hpricot and sanitize methods on the way into the database, rather than rendering it in the views, because it’s somewhat slow, and is a calculation that you only need to perform once.

In fact, instead of saving it in a callback, you could overload the accessor like so:

def body=(new_body)
  write_attribute :body, Hpricot(new_body).to_html
end

You’ll want to include the ActionView methods from ActionView::Helpers::SanitizeHelper to get ‘sanitize’ available in your model.

Authenticate like SSO with ActiveResource

Courtenay : July 18th, 2008

When you have multiple Rails applications that don’t share a common database and you want to share the user authentication information – or rather, use one app to provide authentication for another – there are a few options. Here’s how I solved it recently. This is the simplest way I could think of to get this working. I couldn’t find a plugin to do this, so here’s the result of my pdi.

Effectively what we’re doing is separating the user’s data - their profile info, if you like - from the credentials, and moving the latter to ActiveResource. This is something you should do in your own apps. Too frequently we stuff a bunch of data (like full name, phone number) into the user model, because it’s there. A more advanced version of this code might use the ‘profile’ as the resource name, updating the local profile with data from remote, and keeping User as a pure credential model.

Let’s assume we have App A which will act as the authenticator master. Our other application, App B, will still hold a User record, but we’ll override the authenticate method to use ActiveResource. We’ll also store some other fields like username and email, and will grab those each time the user logs in. That way, they can set an auth token in App A and they can login from cookies in app B (provided the cookie domain is shared).


class User < ActiveRecord::Base

  class Auth < ActiveResource::Base
    self.site = "http://app-a.com"
    self.format = :json
    self.element_name = 'user' # this is the name of the resource in your app
  end

  def self.authenticate(login, password)
    Auth.user = login
    Auth.password = password

    # Authenticating against the app will actually 'prove' the login/pass details.
    # We also want the user's details so we can cache them here.
    authed = Auth.find :first, :params => { :login => login }
    return false unless authed

    # Now, pull the data from remote and store it locally.
    user = User.find_or_initialize_by_login(login)
    user.attributes = authed.attributes
    user.save!
    user.activate!
    user

  rescue ActiveResource::ClientError # 406 error -- bad username/password.
    false
  end
end

Interestingly enough, find first actually runs the ‘index’ action, and returns the first record. sigh

Now, in your App A: users_controller, you want to set up a filter in the index like so:


  def index
    if params[:login]
      # for single-sign-on.
      @users = User.find(:all, :conditions => { :login => params[:login] })
    else
      @users = User.paginate(:all, :page => params[:page]) #... 
    end

    respond_to do |format|
      format.html
      format.json { render :json => @users }
    end
  end

Do you have a better way of doing this?

activerecord benchmarks: how fast is your system?

Courtenay : November 8th, 2007

Over a year ago we published some benchmarks on how fast your computers were running the complete ActiveRecord test suite. I consider this to be a great test for the fastest platform for developing Rails. (Let’s ignore the speed of your IDE or pseudo-IDE—this one’s all about waiting for your autotest. This probably isn’t a good indicator of server status)

It’s time to run this test again. Why? Because I’m buying a new computer, and I want to be as efficient with my money as possible. That means a macbook, rather than macbook-pro.

Check out Rails revision 8117 (trunk at this time), install sqlite if you haven’t already (macports: rb-sqlite), and run rake test_sqlite

Comment here with your platform, and the time reported. If you want to be more accurate, run it a few times. I’m not a professional statistician; don’t tell Zed Shaw about my shoddy procedure.

Factors that may influence your times: disk speed, processor speed, your ruby version, luck …?

Who          Hardware        Rake time (sec)  OS
-----------  --------------  ---------------  ------------
chrissturm   imac core2      18.88            leopard
octopod      mbp-sr          23.45            tiger
technomancy  mbp-sr          25.74            ubuntu gutsy
defiler      mb1             25.772           leopard
form         mb2 2.0         26.59            leopard
courtenay    macpro 2×2.6    28.49            tiger
mike         Athlon64/3000   34.63            xp
courtenay    Sempron64/2600  57.49            fc6
courtenay    powerbook 1.5   92.92            tiger

Summary

From the looks of it, most current-level professional macs whether laptop or desktop run the benchmarks at within 15% of the same time. This probably isn’t too much of a surprise, since ActiveRecord won’t run on multiple processors; but it’s nice to know that if you’re only really doing rails on your laptop, a macbook is as good as anything out there.

The move to Intel has really helped Apple get a nice standard baseline for performance, that clearly smokes the ‘old’ PPCs.

In fact, ol’ faithful, my previous fast-rails-box running linux on an amd-64, has dropped to the very lowly status of 57 seconds. It’s time to retire my trusty powerbook. I spend more time waiting than coding.

Notes:

  • mb1 : MacBook 1 (Core Duo)
  • mb2 : MacBook 2 (Core 2 Duo)
  • mb3 : MacBook 3 (Santa Rosa)
  • mbp-sr : MacBookPro (Santa Rosa)

Premcaching, updated

Courtenay : October 10th, 2007

See my previous article on premcaching, preloading data and stuffing it into memcached in a fork.

Paul McKellar just sent this snippet to re-establish the database connection in your fork.

def fork_with_new_connection(config, klass = ActiveRecord::Base, &block)
  fork do
    begin
      klass.establish_connection(config)
      yield
    ensure
      klass.remove_connection
    end
  end
end
def fire_and_forget(&block)
  config = ActiveRecord::Base.remove_connection
  pid = fork_with_new_connection(config) do
    begin
      yield
    ensure
      Process.exit!
    end
  end
  ActiveRecord::Base.establish_connection(config)
  Process.detach pid
end
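
Stripped of the ActiveRecord reconnect, the skeleton underneath is just fork, exit! and detach. A minimal sketch with no database involved; detached_fork is a hypothetical name, and it assumes a platform where fork is available (so not Windows):

```ruby
# Fork a child, run the block there, and don't make the parent wait.
# No database involved; this only shows the fork / exit! / detach skeleton.
def detached_fork
  pid = fork do
    begin
      yield
    ensure
      # exit! skips at_exit handlers and finalizers inherited from the parent.
      exit!
    end
  end
  # Returns a thread that reaps the child, so it never lingers as a zombie.
  Process.detach(pid)
end
```

The parent carries on immediately. If you do need to wait (in a test, say), call join on the thread that Process.detach returns.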

Awesome. Is anyone else using techniques like this for their crazy scaling or pagination needs?

skinny controllers, skinnier controller specs

Courtenay : August 24th, 2007

So, you're happily using mocks to remove the database from your skinny™ controller.

The code has been hacked on by about four different people and looks something like

describe CategoriesController, "showing a record" do

  before do
    @store = mock_model(Store, :categories => mock('categories proxy'))
    @product = mock_model(Product)
    @store.categories.stub!(:find_by_permalink).and_return @product
    @product.stub!(:name).and_return('foo')
  end

  it "should show successfully" do
    get :show
    response.should render_template('show')
  end

  it "should load one record" do
    @store.products.should_receive(:find_by_permalink).with('1').and_return @product
    get :show
  end

end

To be honest, it's pretty nasty, and with rSpec, if it feels nasty it's probably wrong. The controller is quite simple

class CategoriesController < ApplicationController

  before_filter :load_store

protected
  def load_store
    @store = Store.find(session[:store_id])
  end

public

  def show
    @category = @store.categories.find_by_permalink(params[:id])
  end

  def edit
    @category = @store.categories.find_by_permalink(params[:id])
  end

  def update
    @category = @store.categories.find_by_permalink(params[:id])
    @category.update_attributes(params[:category])
  end

end

Now, there are two ways of DRYing this up. They both involve a "find_category" method. The holy war involves whether you load the data in a before_filter or explicitly set @category in each action. I think the first is much cooler.

class CategoriesController < ApplicationController
  before_filter :find_category, :only => [ :show, :edit, :update ]

protected
  def store
    @store ||= Store.find(session[:store_id])
  end

  def find_category
    @category = store.categories.find_by_permalink(params[:id])
  end

public

  def show
  end

  def edit
  end

  def update
    @category.update_attributes(params[:category])
  end

end

In the new spec, we can do something like this:

describe CategoriesController, "showing a record" do

  before do
    controller.stub!(:store)
    controller.stub!(:find_category)
    controller.instance_variable_set(:@category, mock_model(Category))
  end

  it "should show successfully" do
    get :show
    response.should render_template('show')
  end

  it "should load one record" do
    controller.should_receive(:find_category)
    get :show
  end

end

describe CategoriesController, "finding a record" do
  before do
    # find_category goes through store.categories, so mock that proxy too
    @categories = mock('categories proxy')
    @store = mock_model(Store, :categories => @categories)
    controller.stub!(:store).and_return(@store)
  end

  it "should find a record by permalink" do
    controller.stub!(:params).and_return({ :id => '1' })
    @categories.should_receive(:find_by_permalink).with('1')

    controller.send(:find_category)
  end
end

First, we test the "should show.." logic. Then, in a different context, we test that the "find" works as advertised.

Got a better way?

compiling ruby for the iphone

Courtenay : August 24th, 2007

Here’s how far we’ve come.

You can install the binary ruby from Installer.app, however, it doesn’t run anything useful, and segfaults, errors out, or otherwise refuses to operate.

# ruby setup.rb
setup.rb:1585: [BUG] terminated node (0x15c444)

So, the process is, install the ARM binutils (gcc and friends) with macports. This GCC is ghetto, and doesn’t support everything that regular Darwin GCC does.

sudo port install arm-apple-darwin-binutils

Download ruby, untar it, and apply this patch:

--- configure.in.old    2007-08-23 23:22:08.000000000 -0700
+++ configure.in        2007-08-23 23:40:18.000000000 -0700
@@ -530,7 +530,7 @@
              truncate chsize times utimes fcntl lockf lstat symlink link\
              readlink setitimer setruid seteuid setreuid setresuid\
              setproctitle setrgid setegid setregid setresgid issetugid pause\
-             lchown lchmod getpgrp setpgrp getpgid setpgid initgroups\
+             lchown lchmod getpgrp getpgid setpgid initgroups\
              getgroups setgroups getpriority getrlimit setrlimit sysconf\
               dlopen sigprocmask\
              sigaction _setjmp setsid telldir seekdir fchmod mktime timegm\
@@ -630,7 +630,7 @@
 fi

 AC_FUNC_GETPGRP
-AC_FUNC_SETPGRP  
+# AC_FUNC_SETPGRP  

 AC_C_BIGENDIAN
 AC_C_CONST
@@ -1047,7 +1047,7 @@
        rhapsody*)      : ${LDSHARED='cc -dynamic -bundle -undefined suppress'}
                        : ${LDFLAGS=""}
                        rb_cv_dlopen=yes ;;
-       darwin*)        : ${LDSHARED='cc -dynamic -bundle -undefined suppress -flat_namespace'}
+       darwin*)        : ${LDSHARED='$(CC) -dynamic -bundle -flat_namespace'}
                        : ${LDFLAGS=""}
                        : ${LIBPATHENV=DYLD_LIBRARY_PATH}
                        rb_cv_dlopen=yes ;;
@@ -1379,7 +1379,7 @@
        ;;
     darwin*)
        LIBRUBY_SO='lib$(RUBY_SO_NAME).$(MAJOR).$(MINOR).$(TEENY).dylib'
-       LIBRUBY_LDSHARED='cc -dynamiclib -undefined suppress -flat_namespace'
+       LIBRUBY_LDSHARED='$(CC) -dynamiclib -undefined suppress -flat_namespace'
        LIBRUBY_DLDFLAGS='-install_name $(libdir)/lib$(RUBY_SO_NAME).dylib -current_version $(MAJOR).$(MINOR).$(TEENY) -compatibility_version $(MAJOR).$(MINOR)'
        LIBRUBY_ALIASES='lib$(RUBY_SO_NAME).$(MAJOR).$(MINOR).dylib lib$(RUBY_SO_NAME).dylib'
        ;;
@@ -1430,7 +1430,7 @@
        CFLAGS="$CFLAGS -pipe -no-precomp -fno-common"
        ;;
     darwin*)
-       CFLAGS="$CFLAGS -pipe -fno-common"
+       CFLAGS="$CFLAGS -fno-common"
        ;;
     os2-emx)
        CFLAGS="$CFLAGS -DOS2 -Zmts"

Edit mkconfig.rb and comment out the line near the end that begins with FileUtils.touch

Edit ext/Setup and uncomment the first line. Also uncomment socket, digest, digest/md5, etc, fcntl, stringio, syck and zlib.

$ autoconf
$ CC=arm-apple-darwin-cc CPP=llvm-cpp ./configure --host=arm-apple-darwin  --disable-ipv6 --prefix=/tmp

Edit ext/getaddrinfo.c and comment out “gai_strerror” from about line 207. Edit ext/socket/addrinfo.h and comment out the code related to “extern char *gai_strerror”

$ make
$ make install

Hopefully it all compiles for you and you can scp ruby from /tmp/lib and /tmp/bin into your iphone. I just can’t get it to work. Basically, the symbols from those ext/ libraries aren’t available during the final part of “make”.

Simple Presenters

Courtenay : August 23rd, 2007

The Presenter pattern, as my limited monkey-brain can see it, is a way of encapsulating a bunch of logic and display-related code for a database record.

If you want to be truly confused, go check out what the venerable Martin Fowler (who once ignored me in an elevator) has to say about it: Supervising Presenter and Passive View. As usual with Java people, it's horribly complex.

In Rails, this is the way our requests currently work:

CMV

The request comes in, hits the controller, we load up some data from the model, it gets pushed to the view, and then we use a combination of helpers and lots of conditional and other stuff that looks like PHP to malleate it until it looks good.

Unfortunately for my simple chimpanzee neurons, this all feels wrong. Here's some sample 'wrong' code I just paraphrased from a live app:

<% if @cart_item.variation.nil? or @cart_item.variation.product.deleted? or ... %>
  <img src="/images/active.png" /> This cart is no longer active.
<% end %>

OK, this one is easy to refactor. You just add a method to the CartItem model, like

def is_active?
  !(variation.nil? or variation.product.deleted? ...)
end

Or, you could write a helper method:

module CartItemHelper
   def show_active(cart_item)
     image_tag("active") + " This cart is no longer active" unless cart_item.is_active?
   end
end

Ugh. Capital Ugh. First off, the is_active? method is nice enough, but it's all view logic! This method is being used to control the logic in the presentation layer. What is it doing in the model? I think of the model as entirely database related.

This is where the "presenter" comes in. If you've used the Liquid template/layout system, you'll be familiar with Drops. Basically, a presenter contains any ruby code related to displaying fields or logic. I'll let the code do the talking:

class CartItemPresenter < Caboose::Presenter

   def name
     h(@source.product_name)
   end

   def product_link
     if @source.variation.nil?
       name
     else
       link_to name, fashion_product_path(@source.product)
     end
   end

  def is_active?
    @source.variation && @source.variation.product
  end

  def inactive_button
    return if is_active?
    image_tag('active') + " This product is not active."
  end

end

Pretty straightforward. In the controller, you finish with @cart_item = CartItemPresenter.new(@cart_item, self)

Here's what Presenter looks like.

class Presenter
  include ActionView::Helpers::TagHelper # tag, content_tag
  include ActionView::Helpers::UrlHelper # link_to, url_for
  include ActionController::UrlWriter # named routes
  attr_accessor :controller, :source # so we can be lazy

  def initialize(source, controller)
    @source = source
    @controller = controller
  end

  def html_escape(s) # I couldn't figure a better way to do this
    s.to_s.gsub(/&/, "&amp;").gsub(/\"/, "&quot;").gsub(/>/, "&gt;").gsub(/</, "&lt;")
  end
  alias :h :html_escape # the alias has to come after the method is defined

end
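
If you want to poke at the pattern outside of Rails, here is a stripped-down sketch in plain Ruby. SimplePresenter and the CartItem struct are hypothetical stand-ins, there are no url helpers, and the escaping is handed to ERB::Util.html_escape from the standard library rather than rolled by hand.

```ruby
require 'erb'

# Plain-Ruby base class; no Rails needed for this sketch.
class SimplePresenter
  def initialize(source)
    @source = source
  end

  # Delegate escaping to the standard library.
  def h(text)
    ERB::Util.html_escape(text)
  end
end

# Hypothetical stand-in for the CartItem record.
CartItem = Struct.new(:product_name, :variation)

class CartItemPresenter < SimplePresenter
  def name
    h(@source.product_name)
  end

  def is_active?
    !@source.variation.nil?
  end

  def inactive_notice
    return if is_active?
    'This product is not active.'
  end
end

item = CartItemPresenter.new(CartItem.new('<b>Hat</b>', nil))
```

Here item.name comes back escaped and item.inactive_notice returns the message, because the variation is nil. The nice side effect is that a presenter like this can be unit tested without a controller, a view, or a database.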

So, your view no longer accesses the database directly; everything goes through the presenter. Your views now will contain very little logic; in fact, they may start to feel a little more like ASP.Net.

Here's how the new stack looks:

CPV

And the view:

<%= @cart_item.product_link %>
<%= @cart_item.inactive_button %>

What do you think? Is this layer of abstraction necessary, or do you prefer keeping things together in the model (database!) and presenter (view!) ?

P.s. I often write "todo" titles for articles and save them as drafts, so I can come back later. Seems like one slipped out.

Taking a vanilla rails application from one box and up is a fun process. The exact path you’ll take depends on the nature of your data, and the ratio of database reads to writes. I’m going to cover some of the more common use cases. If you don’t want to get your hands dirty and it’s kind of an emergency, look at stage zero, then skip to the end where I tell you who you can just pay to fix it.

The path you’ll take also depends on how much money you have to play with, and how quickly your site is growing. For example, if you’re sitting on a mountain of cash, and the facebook users are coming in like lemmings, then you can just throw hardware at it. However, if things are tight, and it’s a nice linear growth curve, then you can play around with caching.

Let’s assume you have a slice or VPS (if you’re on shared hosting, the first step should be to get a dedicated box or at least a xen instance).

Stage zero: fix any ‘duh’ errors

Make sure you’re on a database that can handle the load. This doesn’t include sqlite. I’m going to suggest MySQL in this article, because it’s where I have the most experience.

Make sure you’re not serving up static files through mongrel. This will happen if you are proxying everything through the webserver.

Upgrade your webserver to something like nginx. Alternatively, you might use pound as a load balancer, pointing dynamic requests at mongrels, and static requests at lighttpd. (Interested? I can write an article on this. Let me know.)

Move off that $20/month shared box and get your own server. You can lease a phat server in a data center on 100mbit pipe for $100/month. If you want to colo, I recommend Corporate Colo in Los Angeles.

Move any slow actions into a dedicated process. For example, you have some code that takes 4 seconds to update a bunch of tables? You probably want to fire an asynchronous event to a BackgrounDRb process that handles this exclusively.

Move your uploads to a dedicated merb cluster – it’s like a cut-down rails with less magic and more speed.

Stage one: clean up your database

Take a look at your logs – are you performing over 10 database calls per request? You need to fix this. Are you performing over 90? You’re a dumbass. (yes, even I am guilty of this).

Generally you can reduce the number of queries by denormalizing; for example, you have a list of users and a count of how many comments they’ve made.

<% @users.each do |user| %>
  <%=h user.name %> (<%= user.comments.count %>)
<% end %>

You’re performing a “COUNT” for every single user, every time the page loads. Yuk! This is a “read-optimizable” situation, since there are many READS for each WRITE (comments don’t get created that often).

Add a counter_cache to the comments belongs_to :user association and change this to

<%=h user.name %> (<%= user.comments_count %>)

You can do this in other situations where you’re chaining through associations.

Your task is <%=h @list_item.task.name %>.

This call has to find the list_item task, and then grab the name. You can fix this by either adding :include to the ListItem.find, or, you can denormalize the task name.

Include isn’t always an option, and can be slow. Nothing’s faster than denormalizing. Add a “task_name” column to the ListItem model.

Your task is <%=h @list_item.task_name %>.

Then make sure to update that field if the task gets updated.

class Task < ActiveRecord::Base
  has_many :list_items
  after_save :update_list_item_names

  def update_list_item_names
    list_items.each { |li| li.update_attribute(:task_name, self.name) }
  end
end

Yes, there are faster ways of doing this, and yes, I should probably wrap that in a transaction. But you get the point. (note to self: if rails had dirty-field checking, this would be much better)

Stage two: cache the hell out of it

Next thing you want to do: caching. If you haven't already, install memcached, use the cache-fu plugin, and start saving the results of long-running or frequent queries into the cache. Set the TTL (timeout) at about 5 minutes; that way you won't need to write any expiry code (it's lazy, but you're busy!) You'll immediately notice a drop in load. If you have time, write some cache-expiring observers and up the TTL to 15 minutes or even an hour.
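
The "short TTL, no expiry code" idea boils down to something like the sketch below. It is an in-process hash, not memcached and not the cache-fu API; TtlCache and its injectable clock are hypothetical, there only to show the fetch-or-compute logic.

```ruby
# In-memory illustration of fetch-with-TTL. A real setup would talk to
# memcached; this only shows the logic.
class TtlCache
  # clock is anything that responds to #now (Time by default).
  def initialize(clock = Time)
    @store = {}
    @clock = clock
  end

  # Returns the cached value if it is still fresh; otherwise runs the
  # block, stores the result for ttl seconds, and returns it.
  def fetch(key, ttl)
    now = @clock.now
    entry = @store[key]
    return entry[:value] if entry && entry[:expires_at] > now

    value = yield
    @store[key] = { :value => value, :expires_at => now + ttl }
    value
  end
end
```

Usage would look like cache.fetch("top_users", 300) { expensive_query }, where expensive_query is whatever slow call you are wrapping. Stale entries simply get recomputed on the next read after they expire, which is why you can get away without writing any expiry code.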

Eventually you want to have memcached sitting between your application and the database. Most of your database calls' results will be stored for at least 5 minutes, and maybe forever, in memcached.

If you can, add some action caching. Action caching is like page caching, but it still runs any filters you may have. It isn't always easy, particularly if you have "current_user" dependent code in your views. I have a solution for this which I'll be releasing soon, but in the meantime, you may not be able to action cache. Any action-cached pages will be vastly beneficial to your load, and combining memcached with action caches can virtually eliminate database slowness; it's almost as good as the page cache.

If you can action cache, then you can probably page cache. A page-cached site will get you somewhere around 3,000 requests per second, and a simple GET request won't even hit your application; you're serving raw HTML through the webserver. You will soon start thinking of rails as an HTML generator, rather than an app server.

However, all these caching measures won't hide a basic problem: you are performing lots of database queries, and it's harshing your mellow.

Stage three: move the database to another server

This should be fairly painless. Get yourself a fat database server. By fat I mean, super-fast disks, plenty of RAM, and the fastest networking you can afford.

Set it up so that it's only accessible from your main box, which will now be known as the app server. Point your database.yml at the IP of the database server.

Now your app server has much less load, so you can increase the number of mongrels. Add some more RAM to the app server box, too, if you can.

How many mongrels?

Here's a simple formula to follow.

A. Take the (average or median) request time, in seconds. Say, 0.250 seconds (250ms).
B. How many requests do you want to handle at peak? (e.g. 10,000 a minute, about 166 a second)
C. Multiply A x B: 0.250 * 166 = 41.5

So you need about 40 mongrels to handle the load. At about 60MB per listener, that's 2.4GB of RAM, plus a bit of room for leakiness and swapping. Ezra at Engine Yard suggests "about 10 dogs per CPU core", which means that if we have a 4-core Opteron box with 3GB of RAM, then this is possible on one box.
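If you'd rather not do the arithmetic by hand, the formula fits in a couple of methods. This is only a sketch: mongrels_needed and ram_needed_mb are made-up helper names, and the 60MB default is the rough per-listener figure quoted above. It rounds up, so it lands on 42 rather than "about 40".

```ruby
# Rough capacity estimate from the formula above.
# mongrels_needed and ram_needed_mb are hypothetical helper names.

# A x B: average request time (seconds) times peak requests per second,
# rounded up because you can't run half a mongrel.
def mongrels_needed(avg_request_seconds, peak_requests_per_second)
  (avg_request_seconds * peak_requests_per_second).ceil
end

# Memory for that many listeners, assuming roughly 60MB each.
def ram_needed_mb(mongrels, mb_per_mongrel = 60)
  mongrels * mb_per_mongrel
end

mongrels = mongrels_needed(0.250, 166) # 0.250 * 166 = 41.5, rounded up to 42
puts "mongrels: #{mongrels}"
puts "RAM: #{ram_needed_mb(mongrels)} MB"
```

Plug in your own numbers from the logs; the median request time is usually a better input than the average if a few slow actions skew things.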

Your mileage will vary, which means, if the box is lagging, remove a few mongrels.

Stage four: add more servers as necessary

Here's where it gets interesting. Which of your servers has the most load?

If it's the app server, then set up another box as an exact copy. Now you have app1 and app2. You will need to load balance between app1 and app2. You can do this with a hardware load balancer, or you can use pound on app1 to balance to listeners on app2. (You'll have a single point of failure on app1 if you do it this way.)

If the db server is the most heavily loaded box, things start getting interesting: you'll either need some kind of replication, or you'll need to shard (partition) your data.

Replication vs Sharding

Take a look at the data in your application. If it were a person, would it be "extroverted" or "introverted"? That is, could you split the data into many sections (no friends, introverted), or is it all cross-linked (lots of friends)?

For example, you are hosting Subversion repositories. You can easily send half the records to one database and half to the other. Or, you host thousands of social networks, each with about 50 users (collectivex, I’m looking at you.)

In this case, one database box would handle all users with names A–L, and another box from M–Z.
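The A–L / M–Z split boils down to a lookup like the one sketched here. The database names, the SHARDS table, and the choice of where to send names that don't start with a letter are all assumptions for the sake of the example.

```ruby
# Hypothetical shard map: the database names and the A-L / M-Z split are
# placeholders taken from the example in the text.
SHARDS = {
  ("a".."l") => "db_a_to_l",
  ("m".."z") => "db_m_to_z"
}.freeze

# Assumption: names that don't start with a letter go to the first shard.
DEFAULT_SHARD = "db_a_to_l"

# Pick the database that holds a given user, based on the first letter
# of the username (case-insensitive).
def shard_for(username)
  first = username.to_s.strip.downcase[0]
  SHARDS.each { |range, db| return db if first && range.cover?(first) }
  DEFAULT_SHARD
end

shard_for("Alice") # => "db_a_to_l"
shard_for("zed")   # => "db_m_to_z"
```

A fixed alphabetical split is easy to reason about but won't balance evenly if your usernames cluster around a few letters.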

If you have a social networking site where anyone can be friends with anyone else, you’re going to have difficulty partitioning the data. (Astute readers will instantly think about denormalizing to make this still possible).

If you have one shared table (users) but the rest of the data can be sharded, then you will want some bastard stepchild method.

Replication: Master-Slave

So, replication (MySQL only from here on). Let’s say you have a few writes (inserts, updates) and a lot of SELECTs. Most people are just viewing things, not updating records. This is fun and easy.

You set up one database box as the “master”. This box will behave as normal. You can read and write data as before.

You then set up as many “slave” boxes as you like. These boxes will be read-only, but because you have a large number of reads, much of the load can be pushed out to these slaves. You’ll need to hack at your rails app to direct simple reads at a slave DB. Luckily, someone’s already done the work and called it acts_as_readonlyable.

The problem here is that the slaves will always be lagging, depending on load. Under light load, they may only be 100ms behind. Under heavy load, you have no guarantee of any sort of synchronization. In this case, you’ll want to use memcached heavily. Here’s some cache-fu code.
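To make the idea concrete, here's a minimal sketch of read/write splitting in plain Ruby. ReplicaRouter and the db1..db4 names are invented for illustration; this is not how acts_as_readonlyable is implemented or configured, just the general shape of the routing decision.

```ruby
# Hypothetical router: picks a connection name per query. This is NOT the
# acts_as_readonlyable API, only an illustration of the idea.
class ReplicaRouter
  def initialize(master, slaves)
    @master = master
    @slaves = slaves
    @next_slave = 0
  end

  # Reads rotate through the slaves (round-robin); everything else,
  # including all writes, goes to the master.
  def connection_for(sql)
    return @master unless read_only?(sql) && !@slaves.empty?

    slave = @slaves[@next_slave % @slaves.size]
    @next_slave += 1
    slave
  end

  private

  # Naive check: treats any statement starting with SELECT as a read.
  def read_only?(sql)
    sql.lstrip.upcase.start_with?("SELECT")
  end
end

router = ReplicaRouter.new("db1", %w[db2 db3 db4])
router.connection_for("SELECT * FROM posts")          # => "db2"
router.connection_for("UPDATE posts SET title = 'x'") # => "db1"
router.connection_for("select * from users")          # => "db3"
```

The SELECT check is deliberately naive. Anything that must see its own writes straight away should still be sent to the master.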

class Category < ActiveRecord::Base
  acts_as_cached

  def after_save
    Category.set_cache(id, self)
  end
end

When you save the category, it pushes the record (self) into memcached. That means, with a long TTL, you'll never need to do a simple “find” on category from the database, and replication lag won’t matter.

Finally, you’ll want to load-balance to the database servers, an exercise left to the reader.

      / write==[db1] master
[app1]
      \           /==[db2] slave
       \ read==LB====[db3] slave
                  \==[db4] slave

Replication: Simple Master-Master

In this situation, you have two sets of stacks. Each stack has an app box and a database box. They are almost identical; the app server is wired to one database server. There is no crossover. ASCII-tabulous diagram:

   /[app1]====[db1]  master+slave
LB              |    replicate
   \[app2]====[db2]  master+slave

Both databases in this case are masters: each one accepts writes and also replicates, as a slave, the writes made on the other. You can even set up the boxes so that if one goes down, the other fails over and takes on both IP addresses.

Because replication is asynchronous, both masters can insert rows at the same moment, so you need to assign each database a separate set of autoincrement keys. DB1 will increment values like 11, 21, 31, 41, 51, and DB2 ids will increment like 12, 22, 32, 42, 52. You set these with auto_increment_increment (the step, 10 here) and auto_increment_offset (1 for DB1, 2 for DB2).
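If the interleaving isn't obvious, this little simulation shows the ids each master would hand out on a fresh table. It assumes a step of 10 with offsets 1 and 2, and auto_increment_ids is a made-up helper name; the real work is done by the two MySQL settings.

```ruby
# Simulates the ids MySQL hands out on an empty table for a given
# auto_increment_offset and auto_increment_increment.
# The helper name is hypothetical.
def auto_increment_ids(offset, increment, count)
  (0...count).map { |n| offset + n * increment }
end

db1_ids = auto_increment_ids(1, 10, 5) # => [1, 11, 21, 31, 41]
db2_ids = auto_increment_ids(2, 10, 5) # => [2, 12, 22, 32, 42]
(db1_ids & db2_ids).empty?             # => true, the two masters never collide
```

A step of 10 with only two masters leaves room to add more boxes later without renumbering anything.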

Take care! If you have a UNIQUE index on other, non-auto-increment fields, you need to make sure that the same database will be used for CREATEs. You’ll need some algorithm, such as checking the final character or number of the unique field. You’ll also need some way of redirecting writes to a specific database, as well as dealing with load balancers. You may find MySQL Proxy useful here – you can use Lua to control the load-balancing at the db layer.
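One crude version of that algorithm, sketched under some assumptions: the unique field is something like an email address, MASTERS holds two placeholder connection names, and master_for is a made-up helper. It uses the final character of the value, as suggested above, so the same value always lands on the same master and that master's UNIQUE index can do its job.

```ruby
# Hypothetical connection names for the two masters.
MASTERS = %w[db1 db2].freeze

# Route a CREATE to a master based on the final character of the unique
# field (case-insensitive), so duplicates always hit the same database.
def master_for(unique_value)
  last_byte = unique_value.to_s.downcase.bytes.last.to_i # 0 for an empty value
  MASTERS[last_byte % MASTERS.size]
end

master_for("bob@example.com") # same master every time for this address
master_for("BOB@EXAMPLE.COM") # and for the same address in a different case
```

The final character is rarely evenly distributed (think of all the addresses ending in "m"), so a checksum of the whole value may balance better. The point is only that the choice must be deterministic.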

Master:Master replication doesn’t really scale past 10 boxes, because the databases will be so busy updating that they won’t be able to serve requests. However, 99% of rails applications won’t get to this stage.

And remember – there will be some replication lag between the boxes, so your code will need to be tolerant of this issue.

Stage six: more boxes!

At this stage, you should have most of your data stored in memcached, and it’s time to get yourself a dedicated memcached box with gigabit networking and a metric crapton of RAM.

Your data should be nicely segmented (sharded, or partitioned) into separate databases.

Stage seven: you’re going to need help

If you’ve roughly followed all of the above steps, and your site is still lagging, you either didn’t follow the instructions, or you’re beyond this and need to bring in some experts. Replicating and sharding should cover most people’s scaling needs, such that you just keep adding stacks of app+db and expanding the memcached cluster.

You can hire skilled rails consultants to clean up your code (there are plenty of #caboosers with the requisite experience), or you can use a hosting service like Engine Yard (staffed almost exclusively with caboosers) where they will have your app running on a cluster pretty much like I’ve described above, only bigger and faster. It’s going to cost you, but you get what you pay for. Hell, they even deploy your app for you.

Developers, if you’re not an avid reader of the MySQL Performance Blog, go subscribe now. They are also for hire, and I hear they’re most excellent. Pricey, but worth every damn penny.

Got any more tricks? Hook me up in the comments.

Upgrading rspec to 0.9.x

Courtenay : May 8th, 2007

Once again the team has introduced a breaking change between versions. I'm holding off migrating up from 0.8.x until others solve all the issues that will arise. To be honest, it's the one thing that kept me from rspec in the past, and despite now using it in all my projects, I really hate that they keep changing the API. Hate hate. I may just wait til 1.0, when they change it all again.

Ruy Asan has hit and solved a few issues and gotchas in his own apps, so if you're feeling the pain of rspec 0.9, check out his migration pains post.