new plugin: acts_as_git

Courtenay : November 14th, 2008

With the help of Jamie van Dyke at Parfait and Scott Chacon at GitHub, I'm pleased to announce Acts As Git (no, I don't like the name either). It's a simple plugin which stores all changes you make to a text field in a git repository. This is ideal for something like a git-backed wiki.

Look at it here: github or check it out from

git://github.com/courtenay/acts_like_git.git

From the README:

ALG automagically saves the history of a given text or string field. It sits over the top of an ActiveRecord model; after a value is committed to the database, the plugin writes the new value to a text file and commits it to a git repository. This way you get all the advantages of using Git as version-control.

Usage:

class Post < ActiveRecord::Base
  versioning(:title) do |version|
    version.repository = '/home/git/repositories/postal.git'
    version.message = lambda { |post| "Committed by #{post.author.name}" }
  end
end

To view the complete list of changes:

>> @post = Post.find 15
<Post:15>
>> @post.title
=> 'Freddy'
>> @post.history(:title)
=> ['Joe', 'Frank', 'Freddy']
>> @post.log
=> ['bfec2f69e270d2d02de4e8c7a4eb2bd0f132bdbb', '643deb45c12982dde75ba71657792a2dbdda83e6', 
'1ce6c7368219db7698f4acc3417e656510b4138d']
>> @post.revert_to '1ce6c7368219db7698f4acc3417e656510b4138d'
>> @post.title
=> 'Joe'

It uses the excellent Grit library, and doesn't actually have a checked-out repository. The latest version of your data is still stored in the database. You can actually clone this repo and view the changes; pushing back to it won't do anything useful.

Plugin configuration style?

Courtenay : November 10th, 2008

I’m putting the final touches on a super-sweet versioning plugin, and I’ve discovered that we’re using several different metaphors for configuring the plugin options. I’d like to get some opinions/feedback on your preferred style.

The DSL

Using a DSL and passing blocks in which get instance evalled. I’m normally very scathing of DSLs; I think that they’re Yet Another Language for people to learn to use – it’s usually your very own write-only syntax – but it’s been super-fun implementing the backend to this.

class Monkey < ActiveRecord::Base
  versioning do
    author do
      name { user.current.name }
      message { "Committed via #{name}" }
    end
    repository "Joe's DataStore" 
  end
end
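For the curious, here's a rough sketch of how a block-based DSL like this can be backed by instance_eval. It is not the plugin's actual code: VersioningConfig, AuthorConfig and the Versioned module are hypothetical names made up for illustration, and the blocks return hard-coded strings so the example runs outside of Rails.

```ruby
# Hypothetical classes, for illustration only (not the plugin's real internals).
class AuthorConfig
  attr_reader :settings

  def initialize
    @settings = {}
  end

  # `name { ... }` and `message { ... }` just remember the block for later.
  def name(&block)
    @settings[:name] = block
  end

  def message(&block)
    @settings[:message] = block
  end
end

class VersioningConfig
  attr_reader :author_config, :repository_name

  # `author do ... end` evaluates its block against a nested config object.
  def author(&block)
    @author_config = AuthorConfig.new
    @author_config.instance_eval(&block)
  end

  def repository(name)
    @repository_name = name
  end
end

# The class-level macro a model would call.
module Versioned
  attr_reader :versioning_config

  def versioning(&block)
    @versioning_config = VersioningConfig.new
    # instance_eval makes the config object `self` inside the block,
    # so bare calls like `repository "..."` land on it.
    @versioning_config.instance_eval(&block)
  end
end

class Monkey
  extend Versioned

  versioning do
    author do
      name { "courtenay" }
      message { "Committed via the DSL" }
    end
    repository "Joe's DataStore"
  end
end

config = Monkey.versioning_config
puts config.repository_name
puts config.author_config.settings[:name].call
```

The stored blocks are only called when a commit actually happens, which is what lets them refer to things like the current user.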

Hashes

This seems to be the Rails plugin default:

class Monkey < ActiveRecord::Base
  versioning :author => { :name => lambda{ |u| user.current.name } }, :repository => "Joe's DataStore" 
end

Class vars / methods

Easy to monkeypatch later

class Monkey < ActiveRecord::Base
   will_version
   @@version_repository = "Joe's DataStore" 
   def version_author
     current_name
   end
end

Are there others? Which do you prefer? Currently I’m using all three in this one plugin, and it’s very un-awesome.

Ripping out your mocks

Courtenay : November 6th, 2008

I sat down with David Chelimsky at Rubyconf today to talk about rSpec and an interesting topic came up.

In my mind, there are two reasons to use a mock object: first, when you’re developing TDD style, you physically don’t have the objects yet; and second, so that you can tightly focus your unit tests. Maybe these two purposes should use different mechanisms.

His question to me then was, “Do you replace your mocks with the real objects after you’ve implemented those objects?”. I guess I hadn’t thought about that before. Do you? If so, how do you handle the extra complexity, maintaining sane associations and valid data?

On hiring Rubyists and Railsers

Courtenay : November 4th, 2008

We’re launching a new service at work in the next week or so that involves me looking through a lot of job applications: resumes and sample code.

I’d like to tell people right now, upfront, if you’re applying for a Ruby or Rails job, for anyone, there are a few ways of ensuring you get called back. They’re probably fairly simple.

Send some sample code, maybe a link to a project on Github, or a snippet of work you’ve done. Make sure you send the tests for the code. Any tests would be good, and you get bonus points for good tests. If you don’t have any tests, write them.

Don’t worry too much about sending some crazy complex code. Maybe some polymorphic associations (models), some ajax (views), a knowledge of the whole stack (simple controllers), some nested resources. Write a simple todo list application.

It’s not just a silly philosophy. Writing tests – hell, submitting tests with your job application’s code – shows that you’ve actually thought about the code, and that it actually works. You’ve permutated and permeated through the logic, and actually thought about the various ramifications of the design decisions in the code itself.

Just the pure act of sending tests with your sample code will put you above 90% of applicants, I promise.

We've stopped using rSpec ...

Courtenay : November 3rd, 2008

...for new projects.

fail

We upgraded the gems for one of our client projects, and the auto-loading / config.gems managed to completely break all our other projects, requiring upgrades, which caused weird breakages in weird places in some of the specs.

The app would refuse to deploy (rake tmp:create failed, because lib/tasks/rspec.rake was being loaded, and spec wasn't installed on the server). The annoying thing was that just having whatever.11 installed (I don't know the exact version) broke older apps on whatever.4 or whatever.0.2, so those had to be upgraded too. We wasted a day or two (three, maybe four developers) which equates to several thousand dollars in wastage. It was also really infuriating -- the culmination of a few years of frustration with rSpec's weirdnesses.

After that, I found that some of the specs had never run (who knows why). It stopped reading spec.opts and started doing some weirdness with pending options. Finally, Rick just snapped, threw out rSpec and his Model Stubbing library, and now we're playing with a combination of rr, context, and matchy, trying to get a feel for a decent workflow again. It's sad and maybe a bit exciting to be on the edge.

What are you testing with?

A simple Rails slow-query logger

Courtenay : September 29th, 2008

A few years ago I wrote a simple addition to ActiveRecord that does two things: it chops out the eager loading "t1_t2 AS foo", and it shows the number of records returned for every query you run against the database. You can view the file here

Today I was profiling a site and wanted to quickly find the slow database queries, but didn't have access to mysql's config directly, so I patched that file above to record all queries over 500ms and save it to a log file. I'll warn you now, it ain't pretty, but it works pretty well.

Here's how it works: First, throw this in a file in config/initializers. I open up the rails abstract adapter

module ActiveRecord
  module ConnectionAdapters
    class AbstractAdapter

And add in a new logger.

        def slow_query; 0.5; end # number of seconds
        def slow_query_logger
          @slow_query_logger ||= Logger.new("log/slow_queries.log")
        end

Ideally of course this would all be configurable.

Next, I copy the logging code out of the latest ActiveRecord, and patch it to return the number of records. This is a bit of a hack, too, but we can either look at "num_rows" from the resultset or the actual size of an array.

              s = result && (result.respond_to?(:num_rows) ? result.num_rows : \
                 (result.respond_to?(:size) ? result.size : 0)) || 0
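If that one-liner is hard to read, here's the same logic unrolled into a helper. The name result_size and the FakeResultSet struct are made up for this sketch; a real driver result set would take the place of the struct.

```ruby
# Hypothetical helper; the logic mirrors the expression above.
def result_size(result)
  return 0 unless result
  size = if result.respond_to?(:num_rows)
           result.num_rows   # driver result sets
         elsif result.respond_to?(:size)
           result.size       # plain arrays of rows
         end
  size || 0
end

# A stand-in for a driver result set that exposes num_rows.
FakeResultSet = Struct.new(:num_rows)

puts result_size(FakeResultSet.new(42))
puts result_size([{ "id" => 1 }, { "id" => 2 }])
puts result_size(nil)
```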

Finally, I rewrite the actual log method so that it checks the benchmark against our threshold

        def log_info(sql, name, runtime, result_size = 0)
          if runtime > slow_query && slow_query_logger
            slow_query_logger.debug "Slow query: (#{runtime}) [#{result_size}] #{sql}"
          end

And add the number of results to the regular rails log, while snipping out the annoying eager-loading code.

          if @logger && @logger.debug?
            if name =~ /Load Including Associations$/
              sql = sql.scan(/SELECT /).to_s + ' ...<snip>... ' + sql.scan(/(FROM .*)$/).to_s
            end

            name = "#{name.nil? ? "SQL" : name} (#{sprintf("%f", runtime)}) [#{result_size.to_i}]"
            @logger.debug format_log_entry(name, sql.squeeze(' '))
          end
        end
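A note on the snipping: scan(...).to_s relies on Ruby 1.8, where Array#to_s joins the elements. On 1.9 and later it returns the inspect output instead, so the log line would come out mangled. Here's a sketch of the same snip using a plain regex lookup; snip_eager_load is a hypothetical name, not something in Rails.

```ruby
# Hypothetical helper: keep everything from the first "FROM " onwards and
# replace the long eager-loading column list with a marker.
def snip_eager_load(sql)
  from_clause = sql[/FROM .*\z/m]
  return sql unless from_clause   # nothing to snip
  "SELECT ...<snip>... #{from_clause}"
end

sql = "SELECT videos.id AS t0_r0, tags.id AS t1_r0 FROM videos " \
      "LEFT OUTER JOIN tags ON tags.video_id = videos.id"
puts snip_eager_load(sql)
```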

Here's the full file.

module ActiveRecord
  module ConnectionAdapters # :nodoc:
    class AbstractAdapter
      protected
        # todo: config this
        def slow_query; 0.5; end
        def slow_query_logger
          @slow_query_logger ||= Logger.new("log/slow_queries.log")
        end

        alias_method :old_log, :log

        def log(sql, name, &block)
          if block_given?
            #if @logger and @logger.level <= Logger::INFO
              result = nil
              seconds = Benchmark.realtime { result = yield }
              @runtime += seconds
              s = result && (result.respond_to?(:num_rows) ? result.num_rows : \
                 (result.respond_to?(:size) ? result.size : 0)) || 0 
              log_info(sql, name, seconds, s)
              return result
            #end
          else
            log_info(sql, name, 0, 0)
            nil
          end
          # old_log(sql, name) { yield }
        rescue Exception => e
          @last_verification = 0
          message = "#{e.class.name}: #{e.message}: #{sql}"
          log_info(message, name, 0)
          raise ActiveRecord::StatementInvalid, message
        end

        alias_method :old_log_info, :log_info
        def log_info(sql, name, runtime, result_size = 0)
          if runtime > slow_query && slow_query_logger
            slow_query_logger.debug "Slow query: (#{runtime}) [#{result_size}] #{sql}"
          end
          if @logger && @logger.debug?
            if name =~ /Load Including Associations$/
              sql = sql.scan(/SELECT /).to_s + ' ...<snip>... ' + sql.scan(/(FROM .*)$/).to_s
            end

            name = "#{name.nil? ? "SQL" : name} (#{sprintf("%f", runtime)}) [#{result_size.to_i}]"
            @logger.debug format_log_entry(name, sql.squeeze(' '))
          end
        end
      end
    end
  end
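If you want to play with the idea without patching ActiveRecord, here's a standalone sketch of the core of it: time a block with Benchmark.realtime and only write to the log when it crosses the threshold. SlowCallLogger is a made-up class for this example, and it logs to a StringIO so you can run it anywhere.

```ruby
require "benchmark"
require "logger"
require "stringio"

# Hypothetical wrapper, not part of Rails.
class SlowCallLogger
  def initialize(io, threshold = 0.5)
    @logger = Logger.new(io)
    @threshold = threshold   # seconds
  end

  # Runs the block, returns its result, and logs only when it was slow.
  def measure(label)
    result = nil
    seconds = Benchmark.realtime { result = yield }
    @logger.debug("Slow query: (#{seconds}) #{label}") if seconds > @threshold
    result
  end
end

log_output = StringIO.new
slow_logger = SlowCallLogger.new(log_output, 0.05)

slow_logger.measure("fast call") { 1 + 1 }     # under the threshold, not logged
slow_logger.measure("slow call") { sleep 0.1 } # over the threshold, logged

puts log_output.string
```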

Would this work as a plugin? As a patch to Rails itself? Or did somebody else already implement a cross-platform slow query logger?

The awesomest filter and sort ever

Courtenay : August 26th, 2008

Update 2: seems like only one or two people knew about what can_search does :) I hope we’re all a little better educated.

Update: yes, I’m using these named scopes throughout the app in other places – they aren’t used only in this one controller.

Often you have an index action where you want to sort records, filter by a parameter, and maybe join on some other tables to get a result. Let’s say you’re looking at a videos controller (where videos are acts_as_taggable) and you want to filter by user_id, filter by tag name, order by video title, or rating. Maybe later, you’ll add a roles (hm:t) association and need to only show videos viewable by a certain user. How complex!

To solve this, we’re going to play with some things you may know, and finish up with a bam! pow! that’ll take your breath away.

Rather than build up some form of frankenquery with all sorts of conditionals and cases, joins, and other messing about, let’s use a brand-new bleeding edge feature of Rails: named scopes.

First, build up individual named scopes for each axis on which you wish to filter. Make sure to put the table name in each query.



    named_scope :by_user, lambda { |user_id| 
      { :conditions => ['videos.user_id = ?', user_id] }
    }

    named_scope :tag_name, lambda { |tag_name|
      { :joins => { :taggable => :tag },
        :conditions => ['tags.name = ?', tag_name] }
    }

    named_scope :rating, lambda { |rating| 
      { :conditions => ['videos.ratings_count > ?', rating] }
    }

OK, I cheated on the last one, but let’s assume you have a counter_cache on ratings count.

Now, if you have more than one scope with joins in it, you’ll need to apply this patch to your rails installation, or upgrade past 2.1.1. This will allow you to have as many joins as you like in your scopes.

Now, here’s where the magic happens: in the controller. Big shout out to protocool for this method. Let’s build up a set of all the possible scopes that we might want to use, in an array form like [ named_scope, argument ]

def index
  scopes = []
  scopes << [ :by_user, params[:user_id] ] if params[:user_id]
  scopes << [ :tag_name, params[:tag_name] ] if params[:tag_name]
  scopes << [ :rating, params[:rating] ] if params[:rating]
end

Easy, right? Very readable.

How about some ordering?

  order = { 'name' => 'videos.name ASC' }[params[:order]] || 'videos.id DESC'
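A nice side effect of the hash lookup is that it doubles as a whitelist: whatever the user sends in params[:order] is only ever used as a key, and never ends up in the SQL itself. Here's a runnable sketch of that on its own. The ORDERINGS constant, the order_clause helper and the 'rating' entry are assumptions added for illustration.

```ruby
# Hypothetical constant and helper names; only the 'name' entry is from the post.
ORDERINGS = {
  'name'   => 'videos.name ASC',
  'rating' => 'videos.ratings_count DESC'  # assumed extra entry
}.freeze

DEFAULT_ORDER = 'videos.id DESC'

# Unknown or missing keys fall back to the default ordering.
def order_clause(param)
  ORDERINGS.fetch(param.to_s, DEFAULT_ORDER)
end

puts order_clause('name')
puts order_clause(nil)
puts order_clause('id; DROP TABLE videos')
```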

Now, as you know, you can chain named scopes. So you could say Video.by_user(2).tag_name('monkeys'). Let's take advantage of this, building up a chain of scopes dynamically using 'inject', starting from Video, and adding each scope we added to the array above. This is really fun magic, because it doesn't run any of the queries until the whole thing is built. I don't even know how this works, but it does. Swimmingly.

  @videos = scopes.inject(Video) {|m,v| m.scopes[v[0]].call(m, v[1]) }.paginate(:all, :order => order, :page => params[:page])

The final method looks like this:

def index
  scopes = []
  scopes << [ :by_user, params[:user_id] ] if params[:user_id]
  scopes << [ :tag_name, params[:tag_name] ] if params[:tag_name]
  scopes << [ :rating, params[:rating] ] if params[:rating]

  order = { 'name' => 'videos.name ASC' }[params[:order]] || 'videos.id DESC'

  @videos = scopes.inject(Video) {|m,v| m.scopes[v[0]].call(m, v[1]) }.paginate(:all, :order => order, :page => params[:page])
end
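To see what inject is doing without a database, here's a toy version you can paste into irb. FakeScope is a hypothetical stand-in for the named scope proxy: each call returns a new object with one more condition, and no SQL is built until to_sql is called. It uses public_send to call the scope by name, which is an assumption made for the sketch rather than the scopes[...] lookup used above.

```ruby
# Hypothetical stand-in for a chainable named scope; not Rails code.
class FakeScope
  attr_reader :conditions

  def initialize(conditions = [])
    @conditions = conditions
  end

  def by_user(user_id)
    FakeScope.new(conditions + ["videos.user_id = #{Integer(user_id)}"])
  end

  def rating(minimum)
    FakeScope.new(conditions + ["videos.ratings_count > #{Integer(minimum)}"])
  end

  # Nothing is "run" until here, which mimics the lazy behaviour of scopes.
  def to_sql
    sql = "SELECT videos.* FROM videos"
    sql += " WHERE " + conditions.join(" AND ") unless conditions.empty?
    sql
  end
end

params = { :user_id => "2", :rating => "3" }

scopes = []
scopes << [:by_user, params[:user_id]] if params[:user_id]
scopes << [:rating,  params[:rating]]  if params[:rating]

# Start from the base object and apply each [name, argument] pair in turn.
chained = scopes.inject(FakeScope.new) { |scope, (name, arg)| scope.public_send(name, arg) }

puts chained.to_sql
```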

One final caveat. Sometimes :joins doesn’t know where to get the video id from, so if you’re using id in your app, you’ll need a slight workaround involving manually getting the pagination count, and forcing :select => 'distinct videos.*' in the paginate call.

If this works for you, it’s really easy to add new filtering, ordering, or even scoping to your query. For example, you can add some form of role hackery to your video


    named_scope :viewable_by, lambda { |user| 
      { :joins => { :permissions => :roles },
        :conditions => [ "roles.user_id = ? AND permissions.role = ?", user.id, "view" ] }
    }

In the controller, you replace the first scope definition with this

scopes = [ [ :viewable_by, current_user ] ]

Or, you modify the scope inject statement


    @videos = scopes.inject(Video.viewable_by(current_user)) { |m,v| ... }

If you consider this a giant hack, you’re probably at least partly right. However, the alternative in building up a complex query with many possible moving parts is just hideous. And consider this: you can unit test each part of the query on its own, in the model specs.

Sanitize your users' HTML input

Courtenay : August 25th, 2008

The default Rails sanitize helper is actually quite powerful. You can see some of its usage here:

<%= sanitize @article.body, :tags => %w(table tr td), :attributes => %w(id class style) %>

However, as the docs say,

Please note that sanitizing user-provided text does not 
guarantee that the resulting markup is valid.

We were having an issue with users providing bad markup and leaving their tags unclosed.

This is <a href="http://foo.com">my dog<a/> and he’s super cool!

We solved it by running Hpricot over their input.

before_save :clean_html
def clean_html
  self.body = Hpricot(body).to_html
end

For performance reasons, you should probably run the hpricot and sanitize methods on the way into the database, rather than rendering it in the views, because it’s somewhat slow, and is a calculation that you only need to perform once.

In fact, instead of saving it in a callback, you could overload the accessor like so:

def body=(new_body)
  write_attribute :body, Hpricot(new_body).to_html
end

You’ll want to include the ActionView methods from ActionView::Helpers::SanitizeHelper to get ‘sanitize’ available in your model.

data migration tip

Courtenay : August 20th, 2008

I’m tracking all the failures that occur in a model, so the users can easily track and resolve them. The data looks something like this:


 Failures table

  id | video_id | description
 ----+----------+------------------------
  1  | 5        | Transcoding error
  2  | 23       | Bad file type

 Videos table

  id | name     | creator_id
 ----+----------+------------------------
  5  | Kitten   | 23
  6  | Monkey   | 12
  23 | Elephant | 23    

 

If we want to search for all failures by creator, we have to do a join on Failures and Video. To make this a little faster, I will denormalize the data a little, by adding a creator_id to failures table, and a callback to the Failure model to set the creator_id field. This is one of the scaling tradeoffs you need to make: slower writes, slower updates, larger table disk size, faster reads and counts (with grouping).

class Failure < ActiveRecord::Base
  before_save :denormalize_creator  
  def denormalize_creator
    self.creator_id = video && video.creator_id
  end
end

This might have some issues depending on if you’re using #build to generate your Failure object. Regardless..

The temptation (for me, anyways) is to create a migration that looks something like this:

class AddCreatorIdToFailure < ActiveRecord::Migration
  def self.up
    add_column :failures, :creator_id, :integer
    Failure.find(:all).each do |fail|
      fail.update_attribute :creator_id, fail.video.creator_id
    end
  end
  def self.down
    remove_column :failures, :creator_id
  end
end

There are a few things wrong with this method.

1. You’re loading all failure objects into memory, then performing a query on each one.
2. If you have thousands of failures, it’s going to take some time to run. If it gets stopped partway through, you’ll have to comment out that “add_column” line to get it to re-run.

So. Step one, move the update to its own migration. Then, you can re-run it as often as you like.

Step two, make the migration a bit smarter. You can do this either by rewriting it in SQL, or by using something like paginated_each (jfgi).

When you do that, it’s worth throwing some conditions and an include in there. For example,

Failure.paginated_each(:order => "id desc", :conditions => "creator_id IS NULL", :include => :video) do |fail|
  fail.update_attribute :creator_id, fail.video.creator_id
end

You can run this migration as many times as you like (it will only query the records it hasn’t updated). Ultimately, though, unless you’re doing polymorphic associations (which makes the join nigh on impossible), it’s going to be 10 – 100x faster (wild guess) doing the update in raw sql. Any takers on the best SQL for this situation?
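One possible answer, as a sketch only: a single UPDATE with a correlated subquery, run through the connection's execute. The constant and method names here are made up, and it assumes the plain (non-polymorphic) failures.video_id association from the tables above. Failures whose video is missing simply stay NULL.

```ruby
# Assumed SQL: the correlated subquery form is used because it is accepted
# by MySQL, PostgreSQL and SQLite alike.
BACKFILL_CREATOR_SQL = <<-SQL
  UPDATE failures
  SET creator_id = (SELECT videos.creator_id
                    FROM videos
                    WHERE videos.id = failures.video_id)
  WHERE creator_id IS NULL
SQL

# Hypothetical migration body: `connection` is anything that responds to
# execute(sql), such as ActiveRecord::Base.connection inside self.up.
def backfill_creator_ids(connection)
  connection.execute(BACKFILL_CREATOR_SQL)
end
```

Like the paginated version, this only touches rows where creator_id is still NULL, so it is safe to re-run.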

Since I don't have anything of value to post,

Courtenay : August 15th, 2008

Here’s a video of my cat.

Authenticate like SSO with ActiveResource

Courtenay : July 18th, 2008

When you have multiple Rails applications that don’t share a common database and you want to share the user authentication information – or rather, use one app to provide authentication for another – there are a few options. Here’s how I solved it recently. This is the simplest way I could think of to get this working. I couldn’t find a plugin to do this, so here’s the result of my pdi.

Effectively what we’re doing is separating the user’s data - their profile info, if you like - from the credentials, and moving the latter to ActiveResource. This is something you should do in your own apps. Too frequently we stuff a bunch of data (like full name, phone number) into the user model, because it’s there. A more advanced version of this code might use the ‘profile’ as the resource name, updating the local profile with data from remote, and keeping User as a pure credential model.

Let’s assume we have App A which will act as the authenticator master. Our other application, App B, will still hold a User record, but we’ll override the authenticate method to use ActiveResource. We’ll also store some other fields like username and email, and will grab those each time the user logs in. That way, they can set an auth token in App A and they can login from cookies in app B (provided the cookie domain is shared).


class User < ActiveRecord::Base

  class Auth < ActiveResource::Base
    self.site = "http://app-a.com"
    self.format = :json
    self.element_name = 'user' # this is the name of the resource in your app
  end

  def self.authenticate(login, password)
    Auth.user = login
    Auth.password = password

    # Authenticating against the app will actually 'prove' the login/pass details.
    # We also want the user's details so we can cache them here.
    authed = Auth.find :first, :params => { :login => login }
    return false unless authed

    # Now, pull the data from remote and store it locally.
    user = User.find_or_initialize_by_login(login)
    user.attributes = authed.attributes
    user.save!
    user.activate!
    user

  rescue ActiveResource::ClientError # 406 error -- bad username/password.
    false
  end
end

Interestingly enough, find first actually runs the ‘index’ action, and returns the first record. sigh

Now, in your App A: users_controller, you want to set up a filter in the index like so:


  def index
    if params[:login]
      # for single-sign-on.
      @users = User.find(:all, :conditions => { :login => params[:login] })
    else
      @users = User.paginate(:all, :page => params[:page]) #... 
    end

    respond_to do |format|
      format.html
      format.json { render :json => @users }
    end
  end

Do you have a better way of doing this?

All quiet on the Western Front

Courtenay : June 30th, 2008

It’s been a while since I blogged here. Mainly, I think it’s because as I get deep into the daily grind of building other people’s social apps, I no longer feel like any of the code I’m writing is worthy of a post. This is not to say that I don’t love my work, just that maybe the techniques we’re using aren’t that special. (Maybe I’m wrong. Lots of people at Rails Conf came up to me and said they love reading this blog.)

So, readers, what content do you want to see on the Caboose blog moving forward?

If you want a discounted room, contact me today, preferably on irc.

Your name needs to be on the list, otherwise you won’t get in.

Also, you’ll need to prove your worth, by submitting a documentation patch to Rails core. Do that, then sign up here: http://register.caboose.org.

Moving everything to github

Courtenay : May 3rd, 2008

This is a quick note to inform you that any plugins or code hosted on *.caboo.se will be moving to GitHub very soon. If I’m hosting your project, you have about a week to move your code repository, if you haven’t already.