So far, it\’s RoR

Ruby on Rails, PostgreSQL, SVN, etc….

Archive for August, 2007

Making (and verifying) the migration to PostgreSQL in Rails

Posted by Chirag Patel on August 20, 2007

Install

After installing PostgreSQL on Windows, I converted my Rails application from MySQL to PostgreSQL with the following steps:

  1. Create a new database in PostgreSQL
  2. Installed postgres-pr gem (Ruby driver) because the postgres gem (native driver) didn’t work. Will figure out why later on.
  3. Changed the appropriate Rails config stuff in database.yml
    development:
      adapter: postgresql
      database: database_name
      username: postgres
      password: password
      host: localhost
  4. I ran only my first Rails migration to test that it worked using this command: rake db:migrate VERSION=1. My first migration (001_create_heartrates.rb) creates a table in the database so it will be easy to test.

Robby explains a quick way to migrate the data as well

Test

You can test that the migration worked via 2 methods (GUI or command line)

  1. Using following commands with the psql command line utility (Start Menu -> PostgreSQL -> psql to postgres):
    psql commmand Description
    \? help
    \c database_name connect to a database (many other commands won’t work until you execute this)
    \dt show tables (execute \c first)
    \d table_name describes a particular table (e.g. table_name)
  2. Using the pgAdmin III GUI:
    • The database table is deeply nested in the tree, I missed it at first. Go to PostgreSQL Database Server 8.2 -> Databases -> database_name -> Schemas -> public -> Tables -> table_name

Posted in PostgreSQL, Ruby on Rails | Leave a Comment »

Settings for creating a new database in PostgreSQL

Posted by Chirag Patel on August 20, 2007

I wanted to quickly create a new database via pgAdmin III, so I used the following properties

Note: initdb (which ran during step 6 of the installation process) creates two initial databases, namely postgres and template1. Also, the cluster created by initdb also comes with two templates (template0 and template1). template0 is a backup for template1.

Setting Value Description
Owner: postgres I assume postgres refers to the database superuser, not the Windows login.
Encoding: UTF8 Because a PostgreSQL dialog box suggested it would be most flexible in case other clients decided to access the data (instead of SQL_ASCII)
Template: template1 schema extensions and objects are usually loaded into template1
Tablespace: pg_default The pg_default tablespace is the default tablespace of the template1 and template0 databases (and, therefore, will be the default tablespace for other databases as well, unless overridden by a TABLESPACE clause in CREATE DATABASE).Tablespaces in PostgreSQL allow database administrators to define locations in the file system where the files representing database objects can be stored. Two tablespaces are automatically created by initdb .

  1. The pg_global tablespace is used for shared system catalogs.
  2. The pg_default (described above)

One of the things that the tablespace defines is the file path for the data as shown with this command: CREATE TABLESPACE fastspace LOCATION '/mnt/sda1/postgresql/data';

Posted in PostgreSQL | 3 Comments »

Database scaling terminology

Posted by Chirag Patel on August 16, 2007

Some quick explanations on designing scalable databases (thanks to Jon Sime of the PostgreSQL mailing list and Media Matters for America)

Horizontal vs Vertical scaling

Horizontal scaling is merely a general concept which, in its simplest form, means “adding more nodes” as opposed to vertical scaling which (very roughly) means making an individual node as powerful as possible. The two are not mutually exclusive (you could add more nodes, and make each one as powerful as possible — if you’ve got the money), and the concept is not limited to databases.

Horizontal scaling: 3 types of “clustering”

  1. Partitioning (most likely pretty relevant when talking about terabytes of data) is a feature which stores the actual data of a logical table into any number of physical tables (within the same database), each of which store an explicitly defined subset of the data.
    • Note: Partitioning is referred to as clustering in O’Relly’s Building Scalable Web Sites. The book also refers to clustering and federation as types of partitioning. In other words, the terms partitioning and clustering are interchanged.
  2. Federation typically refers to a situation where you have multiple datastores (which can be different databases on one machine, entirely different servers, various mixtures of storage formats, etc.) which each store and are responsible for subsets of your data, with a middleware layer on top of them that mediates access and updates to the underlying and disparate repositories. A certain software package also uses the term to refer to individual non-local tables stored on a single remote server which are presented as if they were local tables.
  3. Replication generally indicates a setup where data is managed across a set of two or more nodes, each node either storing copies of all the data locally, or at least able to access the non-local portions transparently from another source when necessary. Each node in the cluster should be able to present a consistent version of the entire dataset to connected clients (i.e. if you connect to two different nodes and issue the same query at the same time to each, you would get the same data back from both). The issue of propagating all changes to the data from one node to all others in a fail-safe and consistent way makes replication generally very complicated. Specifically in the context of an index, replication can also mean ordering the data in a table based on the values of an indexed column.
    • Note: Replication was originally referred to as clustering by Jon Sime

If anyone has an disagreement (or agreement) of these term’s please leave a comment. I would love to hear your feedback!

Posted in Database | 1 Comment »

Installing PostgreSQL on Windows Vista

Posted by Chirag Patel on August 15, 2007

Just a few notes to be very careful when installing PostgreSQL 8.2.4 on Windows Vista:

Because PostgreSQL needs to be installed as a Windows service, it needs to create an account that will be granted Logon On as a Service permission. When I first installed it, I allowed the installation wizard to use a default account name (postgres) and it generated a password for me. The password was extremely long (25 random chars) and difficult to remember, and for that matter, I could not copy and paste their auto-generated password from the the dialog box that came up. It installed successfully, but when I tried to change the password with the GUI (pgAdmin III), it was unsuccessful. So, I uninstalled PostgreSQL and re-installed it with my own Windows login (instead of the default postgres account) and that’s when things went terribly wrong. After that point, the PostgreSQL would not allow me to reinstall (Error message: “another installation in process”) and on top of that Windows would not allow me to delete the files manually because of permissions rights on the C:\Program Files\PostgreSQL directory. I finally had to do a System Restore and got it working.

To summarize:

  1. Do not use a member of the local administrators group as your Windows login for the service (such as your own personal Windows login) in step 5 of the installation guide!! Using their default postgres is the easiest. But use your own password instead of an autogenerated one. The FAQ (topic 2.3) explains why a non-administrator account is needed for security reasons. By the way, you can start and stop the PostgreSQL service within the Services control panel and also from the PostgreSQL group under the Start Menu.
  2. I would go ahead and allow it to create a database cluster in step 6 of the installation guide. I used the HTTP port number they suggested (5432) and allowed only localhost to access the database. In this step, the account name is the superuser (i.e. root access) of the database cluster.
    Here’s PostgreSQL’s explanation of a database cluster:

    Before you can do anything, you must initialize a database storage area on disk. We call this a database cluster. (SQL uses the term catalog cluster.) A database cluster is a collection of databases that is managed by a single instance of a running database server. After initialization, a database cluster will contain a database named postgres, which is meant as a default database for use by utilities, users and third party applications. The database server itself does not require the postgres database to exist, but many external utility programs assume it exists.

  3. After I successfully installed PostgreSQL, I used these settings to create a new database

Last, see the FAQ for the installation process for further explanations.

Posted in PostgreSQL, Windows | 1 Comment »

Google web history and Google reader

Posted by Chirag Patel on August 15, 2007

In case you’re not using it already, Google’s web history is a free and handy way to find useful links that you forgot to bookmark. I use it quite a bit when when I google search bugs and installation problems that I need to look up later. It can keep track of (1) all of your Google web searches and links you visited from those searches, and optionally (2) all of your web history (not just Google searches)

Click this thumbnail for a screenshot:

web-history.jpg

It includes a calendar, that you might help you remember what day you need to be looking at.

Also, I prefer Google Reader over Bloglines mainly because of how quick it is to star RSS feeds and tag RSS feeds by keywords. Everything loads quicker in general. Also, keep an eye on fav.or.it, which will allow you to comment the RSS feeds

Please post comments if you know of better alternatives to both tools..

Posted in Other Tools | Leave a Comment »

Timestamp support for Rails and MySQL/PostgreSQL/Oracle

Posted by Chirag Patel on August 13, 2007

Troel has written a great summary of the SQL standard for the timestamp data type along with its implementation in the big 5 databases (MySQL, PostgreSQL, Microsoft SQL Server, Oracle, and DB2).

MySQL

It turns out that MySQL does not support sub-second time or date formats. There is a MICROSECOND function in MySQL allowed to run on temporal values, but sub-second precision (including millisecond) is not stored in the database.

A microseconds part is allowable in temporal values in some contexts, such as in literal values, and in the arguments to or return values from some temporal functions. Microseconds are specified as a trailing .uuuuuu part in the value. …. However, microseconds cannot be stored into a column of any temporal data type. Any microseconds part is discarded.

There is also a May 26, 2007 post on Kevin Burton’s blog in which a MySQL developer appears to confirm that MySQL will not store milliseconds (which obviously means microseconds can’t be stored either). It’s odd considering that MySQL’s DATETIME data type is 8 bytes and milliseconds can’t be stored. Interestingly, PostgreSQL’s timestamp datatype is also 8 bytes and can support microsecond resolution.

Here’s the mapping of the Rails data types (the symbols used in Rails migrations) to MySQL data types. This mapping can be found in mysql_adapter.rb in Rail’s ActiveRecord:

      def native_database_types #:nodoc:
        {
          :primary_key => "int(11) DEFAULT NULL auto_increment PRIMARY KEY",
          :string      => { :name => "varchar", :limit => 255 },
          :text        => { :name => "text" },
          :integer     => { :name => "int", :limit => 11 },
          :float       => { :name => "float" },
          :decimal     => { :name => "decimal" },
          :datetime    => { :name => "datetime" },
          :timestamp   => { :name => "datetime" },
          :time        => { :name => "time" },
          :date        => { :name => "date" },
          :binary      => { :name => "blob" },
          :boolean     => { :name => "tinyint", :limit => 1 }
        }

      end

A potential solution to the MySQL sub-second limitation is to use force Rails to use the bigint type in MySQL (along with creating your own custom timestamp format to fit in the bigint type). This is an alternative to modifying mysql_adapter.rb in Rail’s ActiveRecord, which could pose problems later on. There is a plug-in MySQL Bigint Rails Plugin that supports multiple integer types. Not only is bigint supported, it also allows tinyint and mediumint. After installing, it magically transforms the :limit parameter so that it specifies the number of bytes for the column stored instead of specifying the maximum value expected for that column. For example, the following statement will be mapped to MySQL’s mediumint instead of a regular int.

    t.column "col2", :integer, :limit => 3, :null=> false

Run 'DESCRIBE table;' in the MySQL command line to verify that it worked.

PostgreSQL

PostgreSQL, on the other hand, has 1 microsecond resolution on all of its time and timestamp types. Here’s the mapping in of Rails data types to PostgreSQL data types (line 92 of postgresql_adapter.rb in Rail’s ActiveRecord)

      def native_database_types
        {
          :primary_key => "serial primary key",
          :string      => { :name => "character varying", :limit => 255 },
          :text        => { :name => "text" },
          :integer     => { :name => "integer" },
          :float       => { :name => "float" },
          :decimal     => { :name => "decimal" },
          :datetime    => { :name => "timestamp" },
          :timestamp   => { :name => "timestamp" },
          :time        => { :name => "time" },
          :date        => { :name => "date" },
          :binary      => { :name => "bytea" },
          :boolean     => { :name => "boolean" }
        }
      end


In PostgreSQL, the timestamp data type is referred to as the "timestamp without time zone". It will use this by default, unless you explicitly specify the timestamp with time zone. In either case, the timestamp is accepted and queried in string format: (e.g. '2003-07-29 13:19:30.532').

Also, PostgreSQL’s timestamp data type accepts a parameter p indicating the the number of fractional digits retained in the seconds field. To my understanding, p does not have any influence on the total bytes stored, which is always 8. Internally, it’s an 8-byte int or an 8-byte float depending on whether or not you compiled with enable-integer-datetimes. The default is 8-byte float.

If server was compiled with enable-integer-datetimes, PostgreSQL stores the timestamp as the total number of microseconds from midnight 2000-01-01 (in an int8). If not, it’s stored (in a float8) and the precision degrades as the date moves away from midnight 2000-01-01.

To make it all work, I added a :timestamp_with_time_zone data type by overriding PostgreSQL adapter class found in postgresql_adapter.rb

To override this class, include the preceding in a file such as lib\postgre_extensions.rb and add “require ‘postgre_extensions’” to the top of each migration file. Or you can add require ‘postgre_extensions’ to the bottom of config\environment.rb

class ActiveRecord::ConnectionAdapters::PostgreSQLAdapter
  def native_database_types
    {
      :primary_key => "serial primary key",
      :string      => { :name => "character varying", :limit => 255 },
      :text        => { :name => "text" },
      :integer     => { :name => "integer" },
      :float       => { :name => "float" },
      :decimal     => { :name => "decimal" },
      :datetime    => { :name => "timestamp" },
      :timestamp   => { :name => "timestamp" },
      :timestamp_with_time_zone => { :name => "timestamp with time zone" },
      :time        => { :name => "time" },
      :date        => { :name => "date" },
      :binary      => { :name => "bytea" },
      :boolean     => { :name => "boolean" }
    }
  end
end
  

While I was at it, I tried to create a smallint type in PostgreSQL with the following line of Migrations code and the MySQL Bigint Rails Plugin (described above):

t.column :heartrate, :integer, :limit => 1, :null=> false

The author claims that it should work in other databases outside of MySQL and it did for this example! In this particular case, a smallint (2 bytes) was used instead of an integer (4 bytes). The tinyint data type (1 byte) is not available in PostgreSQL like it is in MySQL unfortunately.

Oracle

Oracle, too, appears to supports millisecond precision, but the Rails mapping goes to DATE instead of TIMESTAMP as Paul Greer pointed out in ActiveRecord’s oracle.rb:

        def native_database_types #:nodoc:
          {
            :primary_key => "NUMBER(38) NOT NULL PRIMARY KEY",
            :string      => { :name => "VARCHAR2", :limit => 255 },
            :text        => { :name => "CLOB" },
            :integer     => { :name => "NUMBER", :limit => 38 },
            :float       => { :name => "NUMBER" },
            :decimal     => { :name => "DECIMAL" },
            :datetime    => { :name => "DATE" },
            :timestamp   => { :name => "DATE" },
            :time        => { :name => "DATE" },
            :date        => { :name => "DATE" },
            :binary      => { :name => "BLOB" },
            :boolean     => { :name => "NUMBER", :limit => 1 }
          }
        end

Timestamp SQL queries

Use single quotes '' for timestamp! He are are some examples

  1. SELECT AVG(value) FROM numerics WHERE timestamp > '2007-08-23 19:48:09-04';
  2. SELECT AVG(value) FROM numerics WHERE timestamp BETWEEN '2007-08-23 19:48:09-04' AND '2007-08-23 19:48:09-04';
  3. SELECT AVG(value) FROM numerics WHERE timestamp > '2007-08-23 19:48:09-04'::timestamp;
  4. EXPLAIN SELECT AVG(value) FROM numerics WHERE timestamp > '2007-08-23 19:48:09-04';
    Traditional practice for EXPLAIN is to measure the costs in units of disk page fetches;

P. S. It’s a bad idea to title a column ‘timestamp’, because this is a datatype-identifier – but it works.

Useful Rails timestamp-related code

# First and last records from a model ordered by a specified column (e.g. timestamp)
ending = Users.find(:first, :order => 'timestamp DESC')
beginning = Users.find(:first, :order => 'timestamp ASC')

#resolve Ruby variable name (e.g. start_time) in string using the #{...} string interpolation mechanism
average = User.average(:age, :conditions => "timestamp > #{start_time} AND timestamp < #{end_time}")

#converts string to time object
Time.parse(self)

 

Posted in Database, PostgreSQL, Ruby on Rails | Leave a Comment »

Rants about Rails database migrations

Posted by Chirag Patel on August 12, 2007

Migrations as version control for databases

When I first learned about Rails migration, I immediately thought that dropping down a revision would be something you would do often. In actuality, Rails migrations is more like a version control for databases. Since databases changes require modifications to both code and data, you cannot simply use a source code version control system like Subversion or Sourcesafe. You need a more advanced revisioning system for databases… hence Rails Migrations.

Because of my inexperience with databases, I mistakenly thought that the purpose of migrations was to allow the ability to go up and down migrations at will.. But the purpose is more to restore an older version of the database like you would restore an older version of source code with Subversion. I would think that when you migrate down, all of your business logic would probably migrate down at the same time. What are thoughts on this? Any comments?

In a nutshell, Rails migrations are really just a database agnostic way of running SQL scripts in an organized, standardized way. Migrations can obviously be used outside of web development.

Relationship between model and migration

In Rails, a model internally maps itself to a database table. The table in the database must be the plural form of the model’s class. If we generate a model called User, Rails automatically looks for a table called users in the database.

You can use the Rails generator to generate both the model and a corresponding migration as shown here (You can also generate a migration by itself – for example, add_column to an existing model)

ruby script/generate model User #run Rails model generator

The model and migration are both generated:

app/models/user.rb #model

db/migrate/001_create_users.rb #migration that creates database table

And the generator also generates some test stuff:

test/unit/user_test.rb #unit test for the model

test/fixtures/users.yml #sample data for the model

The migration is generated so that it will use the plural form of the model for the database table name. The important thing to note is that after the migration is generated, the class name of the model (User) and the class name for the migration (CreateUsers) are not linked in anyway. The only link is at runtime and that is only between the model and the database table name. For example, when you do a query with the User model (User.find), then ActiveRecord will automatically query on the users table in the MySQL database (or whichever database you’re using). The migration is only used to set up the database and this is never done during runtime.

Model (app/models/user.rb)

  class User < ActiveRecord::Base


Migration (db/migrate/001_create_users.rb)

  class CreateUsers < ActiveRecord::Migration
    def self.up
      create_table :users do |t|
        t.column :user_id, :int
      end
    end

    def self.down
      drop_table :users
    end
  end

The cool part about ActiveRecord’s ORM is that the model never needs to maintain the column names. It learns them during runtime, as seen by this empty User class.

Migrations on a per table basis

One drawback of Rails migrations is that all migrations occur at the database level, not the table level. If you want to restore the database for only one table, you’ll have to do something special, which I haven’t figured out just yet.

Renaming the model when the model is referenced inside the migration

If you rename a model and the model is referenced inside a migration, it might break your migrations. When would you ever reference the model inside the migration? When you want to make changes to the actual data in the database, it’s easier to code with the model than raw SQL. In this case, you can actually define the model in the migration itself to prevent it from breaking if you rename the model. This is described in Rails Recipes by Chad Fowler

Advanced features and options of Rails migrations:

  1. Mike Perham’s writeup of the usage :limit, :null, add_index, and foreign key support
  2. Some advanced migration features (:force, :temporary, :opitons, execute native SQL)

Basics of Rails migrations:

  1. Basics of Rails migrations
  2. Ruby on Rails Migrations Explained
  3. Damon ClinkScales’ very illustrative presentation on Rails Migrations

Posted in Database, Ruby on Rails | 1 Comment »

Visualize your Rails database schema for free on Windows

Posted by Chirag Patel on August 9, 2007

Using Railroad and Graphviz

Blog update (10/22/2007): Thanks to the comment posted by Blog de Kilgore, I tried Railroad (instead of using the below method with starUML) to visualize my RoR app. Very easy to setup and use (see examples on web page).

Here’s a quick command to create a .png graphic file of your models:

    railroad -a -i -M | dot -Tpng > models.png

 

This command will directly write to a .png file without the need of Graphviz

  1. Database Models
  2. Mailer models
  3. Observer models
  4. Associations (with lines and arrows)
  5. Inheritance among classes (for regular models, mailers, and observers)

 

 

Using starUML

Rails keeps your schema is a file called db/schema.rb (which I assume is updated every time you run a Rails migration). To visualize your schema in Windows:

  1. Install and run the following plugin to convert your schema (db/schema.rb) to an XML format:

    This will generate schema.xml (uses a open standard fromat called XMI).

  2. Download the latest version of starUML for Windows (Version 5.0, released Dec 29, 2005). StarUML is a “software modeling tool and also platform that is a compelling replacement of commercial UML tools such as Rational Rose
    • Open schema.xml from starUML’s Import -> XMI (I chose “Design Model” when importing, but it doesn’t really matter)
    • starUML
    • Go to the Model Explorer (under Design Model”) and drag each table to the design surface

Thanks to Miroslav Škultéty for writing the script and cnruby for converting into a plugin!

Posted in Database, Ruby on Rails, Windows | 1 Comment »

Basics of layouts and nested layouts in Rails

Posted by Chirag Patel on August 2, 2007

Layout basics

Here’s Duane Johnson’s concise explanation of how layouts generally work in Rails. To summarize this blog article, layouts occur at 2 levels (applciation-wide or controller-wide) by default:

  • The entire Rails application (all views of all controllers) will use this layout:

views/layouts/application.rhtml

  • All views within a single controller will use this layout. For example, the layout for weclome_controller.rb will use this layout. Notice, the ‘_controller’ is left off for the layout:

views/layouts/welcome.rhtml

  • Use this code if you want to use a different layout than one of the two default layouts described above. This code will go in the action (index, in this case) corresponding to the view (index.rhtml)

def index

render :layout => ‘alternative_layout’
end

When using Rails layouts, the final page is rendered by using <%= yield %> in the layout. With <%= yield %>, the view is placed inside the layout during runtime. For example, the application-wide layout application.rhtml could contain the following:

<title>Layout Example</title>
  </head>
  <body>
    <%= yield %>
  </body>
</html>

The view (e.g. index.rhtml) for weclome_controller.rb could look something like this:

<b> Hello World </b>

This would render the following final .html page when you enter http://localhost:3000/welcome in your browser:

<title>Layout Example</title>
  </head>
  <body>
    <b> Hello World </b>
</body>
</html

Nested Layouts

To create nested layouts (master layout > sublayout > view), I found the following 2 options

  1. Using partials and qualifying methods as described in Matt McCray’s blog. Matt’s got an example Rails app that you can download and try. Also here’s a more visual layout of the code that he explains:
    • Matt McCray’s Nested Layouts
  2. Using the nested-layout plugin

I prefer the first method because its pretty simple and uses inherit Rails methods instead of a plugin that could end up being more maintenance and possibly inflexible later on.

Last, here’s a really good write-up for web designers transitioning to Rails (includes basics about MVC, rhtml, ERB, etc)

Posted in Ruby on Rails | 5 Comments »