Ezra Zygmuntowicz | 1 Sep 20:05

Re: [ANN] New site built with rails.

On Aug 31, 2005, at 9:00 AM, Doug Alcorn wrote:

> Ezra Zygmuntowicz <ezra@...> writes:
>
>
>>     Without further ado... heres a link.. <http://yakimaherald.com>.
>>
>
> I'd really like to see some project stats like LOC, Test LOC, and
> elapsed time to implement.  I know some companies like to treat stuff
> like that as proprietary competitive advantage, but I'd like to hear
> some "lessons learned" from such a large-ish project for an existing
> large-ish company.  Is the entire site Rails?  Does it switch over to
> some existing infrastructure at some point?  How many model classes
> did you end up with?
>
> This looks like a well done site.  Some type of write-up on your
> experiences would be a valuable asset to the community.
> --  
> doug@...
> _______________________________________________
> Rails mailing list
> Rails@...
> http://lists.rubyonrails.org/mailman/listinfo/rails

     I would be happy to share my experience developing this site.
Here's a decent write-up for now. I am going on vacation today and
will be back next week. When I get back I will put a full, detailed
write-up on my blog. For people just joining this thread, I am
talking about the new Rails site <http://yakimaherald.com> that I
just launched.
     If anyone has any questions they want to ask or wants to look at
any config files, please feel free to contact me on or off list. I
have good config files for lighttpd/fcgi, apache1.3 and apache2
fastcgi, as well as ruby and rails setup on Linux and OSX.
     First off, if you want to see the site as it used to be for the
last 3 or 4 years, you can see it at
<http://legacy.yakimaherald.com>. The old site was written in the worst
spaghetti style php with no comments and most variable names like $x  
and $var. It was also very unstable and temperamental and  
consistently brought our network down at least once a month. I  
inherited the site when I took a job here a year ago. So it was in  
sore need of a rebuild. I started using RoR and ruby last November. I  
had been a php developer for 4 years before then. By mid-January I  
knew that I didn't want to use php on any new projects. Since I am  
the sole developer here at the newspaper right now I was told "If  
ruby will make you happier and more productive then so be it. As long  
as you write decent documentation on whatever you build then use  
whatever makes you more productive." This was great and I started  
using ruby and rails for all new development. At the paper here we  
also do web design and application building for other local  
businesses. I built a few smaller apps with rails and built my  
confidence with the framework. So needless to say I unsubscribed from  
all the php lists I was on and have been reading ruby-talk and the  
rails list religiously for the last 6 months.

-------Development---------
     We got the approval and started rebuilding the yakimaherald.com
site on May 1st, 2005, so it has been almost 4 months from start to
finish building this app. When I say "we" I mean myself, the sole
developer, and my designer, who made the views. But during those 4
months I still had to do the daily maintenance and upkeep of the
paper's website and advertising, plus we built 2 or 3 smaller sites
with RoR. So if I had worked on nothing else except the new site
<http://yakimaherald.com>, I estimate it would have taken about
2.5-3 months to develop with just myself and one designer.
         The final app in the state it is in today weighs in at 1479
LOC for models and controllers and 867 LOC for tests. I have 8
controllers, 12 models, 9 layouts and 69 view templates. The system is
very heavy on content. There are 4 main data sources for the app:

         1. A local postgres 7.x db for cms functionality and static  
page contents. This database holds the info that reporters and  
photographers input through the admin interface. And it also holds  
the new banner management system I wrote in ruby. Config is pretty  
much vanilla postgres and it performs great for my situation. I used  
the C postgres bindings.
         2. A BaseView database, a proprietary db that many of the
world's newspapers run as their newsroom database; it holds all the
content that gets printed in the paper. This db is not SQL. It has a
proprietary scripting/templating language called LiveIQ. My rails
model that handles this db is a custom ruby lib that I wrote. It
creates a little DSL for querying the Baseview DB and converts my ruby
DSL into the LiveIQ scripting language on the fly so I can think in
ruby. All the local Yakima and central Washington content comes from
this DB. This model accounts for 307 LOC out of my total app because
of its complexity. I may be able to make this component open source
because it could definitely benefit any other newspapers that use
Baseview and are thinking about ruby and rails.
         3. Custom xml feeds from the AP news wire. This content
comes from the AP newswire subscription our paper has for the print
version. It contains thousands of news items from around the world
that get constantly updated throughout the day. These feeds are a
little rough and require a fair bit of text processing before they
are ready to go live on the web. The feeds come across the wire as a
Base64 encoded xml file. After unpacking it I have to scan for the
relevant feeds we use out of the 2,000 or more that are available. My
app processes and regenerates the online content every 1.5 hours
unless we manually trigger it sooner.
         4. The Seattle Times owns the Yakima Herald, so we get some
of our content from them. We don't have a whole lot of content from
this source yet, but we will be using more soon as we just got the
go-ahead to use their RSS feeds.
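
The AP step in item 3 (unpack the Base64 payload, then keep only the feeds the site uses) could look roughly like the sketch below. This is not the code running on the site. The XML layout (feeds/feed/item/headline), the feed names and the method name are all assumptions made up for illustration, since the real wire format is not described here.

```ruby
require "rexml/document"

# Hypothetical feed names; the real list of feeds is not given in the post.
WANTED_FEEDS = %w[national sports].freeze

# Decode a Base64 payload and return { feed name => [headlines] } for the
# wanted feeds only. Assumes an invented layout:
#   <feeds><feed name="..."><item><headline>..</headline></item></feed></feeds>
def extract_headlines(base64_payload, wanted = WANTED_FEEDS)
  # "m" is the Base64 directive of String#unpack1; assume the XML is UTF-8.
  xml = base64_payload.unpack1("m").force_encoding("UTF-8")
  doc = REXML::Document.new(xml)

  result = {}
  doc.elements.each("feeds/feed") do |feed|
    name = feed.attributes["name"]
    next unless wanted.include?(name) # skip the feeds we do not publish

    result[name] = feed.get_elements("item/headline").map { |h| h.text.to_s.strip }
  end
  result
end
```

Filtering by name before touching the items keeps the work proportional to the handful of feeds actually used rather than to everything on the wire.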

     So this app is very data and content heavy. When the index page
gets regenerated after a cache flush it is pulling local postgres
data, Baseview DB data from a server on the local LAN, custom xml
feeds from the AP wire and a few headline feeds from the Seattle
Times. This is still relatively fast. It takes about 500 milliseconds,
network latency included, which is very good for everything it is
doing to create the page. But this only happens on one hit every 1.5
hours; the rest of the time the page is a cached .html file in the
public/ dir, and these get served _fast_ by lighty. Even on dynamic
pages that involve no network latency, it can serve up to 200
requests/sec with only 5 fcgi's. So for me Rails __CAN__ scale for
largish web apps with a largish amount of users.
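
To make the "one slow hit, then static files" behaviour concrete, here is a plain-ruby sketch of the idea behind page caching. It is not Rails' caches_page implementation and not the site's code; fetch_page and its arguments are hypothetical names. In the real setup the web server serves the .html file directly, so ruby is not involved at all on a cache hit.

```ruby
require "fileutils"

# Return the HTML for +path+, rendering and storing it only on a cache miss.
# The block stands in for the expensive page build (db, LAN and feed calls).
def fetch_page(public_dir, path)
  file = File.join(public_dir, "#{path}.html")
  return File.read(file) if File.exist?(file) # hit: no rendering work

  html = yield                                # miss: pay the full cost once
  FileUtils.mkdir_p(File.dirname(file))
  File.write(file, html)
  html
end
```

Calling fetch_page(dir, "index") { build_front_page } twice runs the block only the first time (build_front_page is a placeholder); deleting the file is all it takes to force a rebuild on the next hit.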

-------Deployment-----------
     The new app runs on a brand new dual 2.5Ghz G5 Xserve running  
Tiger server with 1 gig of ram and 480Gb scsi RAID. We just got this  
in 10 days ago and I configured it myself. I am running Lighttpd  
1.3.16/fcgi and it is running great. I initially tried to run on  
apache2/fcgi but in testing I got too many random 500 internal server  
errors. Lighttpd has proven itself to me over the last few months in  
production on some smaller sites and I think it is pretty much ready  
for prime time. We are getting around 40,000+ hits a day and thanks  
to judicious caches_page and fragment caching the server is barely  
breaking a sweat. Here is a paste of the relevant section of top  
running on the Xserve right now:

  PID COMMAND    %CPU     TIME  #TH #PRTS #MREGS  RPRVT  RSHRD  RSIZE  VSIZE
26609 ruby       0.8% 12:46.00    2    16    132  22.9M  2.00M  24.2M  50.4M
26606 ruby       0.4% 12:11.00    2    16    129  22.9M  2.00M  24.2M  50.4M
26605 ruby       0.4% 14:43.20    2    16    137  23.0M  2.00M  24.1M  50.5M
26604 ruby       6.7% 17:06.96    2    16    133  23.2M+ 2.00M  24.5M+ 50.6M+
26603 ruby      12.8% 18:19.70    2    16    129  23.0M+ 2.00M  24.4M+ 50.4M+
26602 lighttpd   2.4%  4:58.44    1    10     39  4.25M   704K  4.59M  27.5M

     I have 5 dispatch.fcgi's running (the ruby processes above) and
they fluctuate from below 1% to around 16% CPU when they are working
on a complex page rebuild after a cache is swept. But for the most
part they just hover around 1-3%. And lighty is awesome: it has never
gone above 9% CPU yet and it mainly stays around 3%! And these
percentages go up to 200% since there are dual procs. So for the most
part I am using about 16% of all my processing power on this box for
my rails app at any given time.
     I have a few launchd scripts (launchd is Tiger's new xml version
of cron) running for maintenance tasks. I have launchd launch an
instance of the awesome ruby daemon daedalus at boot time. This daemon
checks every 3 minutes that lighttpd is running, and if it is not, it
launches a new instance of lighttpd/fcgi. It also wipes out the
ruby_sess files in /tmp every 6 hours. I end up with around 8-9,000
of these session files in 6 hours and my app runs much better when
these are not allowed to build up.
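
For anyone curious what an "is it still running, if not relaunch it" check involves, here is a minimal sketch. It is not how daedalus does it; it assumes the server writes a pid file, and the method names and the relaunch block are hypothetical.

```ruby
# True if the process named in +pid_file+ exists. Signal 0 delivers nothing;
# it only asks the kernel whether the pid can be signalled.
def process_running?(pid_file)
  pid = Integer(File.read(pid_file).strip)
  return false unless pid.positive?

  Process.kill(0, pid)
  true
rescue Errno::EPERM
  true  # process exists but belongs to another user
rescue Errno::ENOENT, Errno::ESRCH, ArgumentError
  false # no pid file, no such process, or garbage in the file
end

# Run the given block (the relaunch command) only when the process is gone.
def ensure_running(pid_file)
  return :alive if process_running?(pid_file)

  yield
  :restarted
end
```

A scheduler would call ensure_running every few minutes, passing a block that starts the web server again.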
      Daedalus also bashes my cached pages every 1.5 hours. I have  
many data sources that won't work with cache_sweeper because they come  
from remote computers. So this script erases the pertinent files in  
public so the cache can rebuild with the new content from all remote  
locations. We also have an intranet page where people from the  
newsroom can go and run a script to clean the cache whenever they add  
new content they want to be picked up.
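
A timed cache flush like this needs very little code. The sketch below is an illustration only, not the actual script; the method name and the glob patterns are assumptions. It deletes the cached pages that match the patterns and leaves everything else in public/ alone.

```ruby
require "fileutils"

# Delete cached files under +public_dir+ matching any of the glob +patterns+
# (relative to public_dir). Returns the list of files that were removed.
def expire_cached_pages(public_dir, patterns)
  files = patterns.flat_map { |pattern| Dir.glob(File.join(public_dir, pattern)) }
                  .select { |f| File.file?(f) } # never touch directories
  FileUtils.rm_f(files)                         # rm_f ignores already-missing files
  files
end
```

For example, expire_cached_pages("public", ["index.html", "news/*.html"]) would clear the front page and the news pages while images and stylesheets stay put; the next request for each page rebuilds it from the remote sources.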

     I also have a lot of "glue" code written in ruby to do various  
text processing and ftp'ing and other things. The classified ads are  
processed  to format them for online display. I have a bunch of admin  
tools written in ruby as well.

--------Wrap Up-----------

     All in all I am _very_ happy with my experience with Rails as
well as ruby. Rails is a super productive environment for me to
develop web apps in. But I have really fallen in love with ruby
_itself_. Ruby is so elegant, and the syntax allows me to open up
code from 7-8 months ago and see in an instant exactly what it does.
So it is much more maintainable than the Perl and shell scripts that
I have replaced. I think that anyone considering rails and ruby for a
decent size project should not be concerned with how RoR scales.
It scales great. The shared nothing architecture works great. If I
need more power eventually I can just fire up another linux box and
run fcgi's on there. Rinse, repeat.
     I am available for some consulting work. If anyone wants help
designing or implementing a ruby on rails infrastructure, you can
get in touch using the info in my signature.
     But if you are just interested in any of my config files, or have
deployment or other questions, please feel free to contact me at no
charge.

     If anyone is interested in reading more and hasn't fallen asleep
yet this far into this rambling post, I will have a more detailed
article about the development of the http://yakimaherald.com website
on my new blog next week (I will announce the address when it's
finished).

Cheers...
I hope I can help some people with any questions you might have as I  
have greatly benefitted from the very knowledgeable people of the  
ruby and rails community.

-Ezra Zygmuntowicz
WebMaster
Yakima Herald-Republic Newspaper
ezra@...
509-910-0773
