Chado Hardware Requirements


Postby randalls » Mon Dec 15, 2008 5:01 pm

I have been looking at the Chado database schema for implementation in our environment. From what I can tell, querying data from a database that uses the Chado schema requires a lot of joins, and joins can be fairly expensive on a large database. We are planning to store information on three different genomes in Chado.

My question is: what kind of hardware does a popular model organism database like FlyBase use for its web and database servers? We are thinking about acquiring a new database server and storage array for this transition. We anticipate that a large number of people will use this database, and it will have a Drupal-based web front end. What I am unsure about is how much overhead Chado adds due to its design.

Thanks,

Randall Svancara
Systems Admin
randalls
 
Posts: 3
Joined: Mon Dec 15, 2008 4:32 pm
Location: WSU

Re: Chado Hardware Requirements

Postby Josh Goodman » Tue Dec 16, 2008 11:15 am

Hi Randall,

Chado does require a lot of joins, as shown in our field mapping tables on the GMOD wiki.

Before I get into the specifics of hardware, let me give you some background about FlyBase that influences how we operate and actually use Chado. FlyBase is made up of 3 sites: one at Cambridge University in the UK, another at Harvard University, and one at Indiana University (IU). The first two sites are primarily tasked with curation and data management, while IU is primarily responsible for the website and other public services of FlyBase. The data flow starts with curators at Cambridge and Harvard inputting data into the master Chado database at Harvard. Once every ~5 weeks Harvard freezes the database and sends a dump to IU, from which we produce each release. Thus, the database servers at Harvard are geared for both reading and writing, whereas IU is strictly a read-only environment.

Another point I'd like to make is that while Chado is very good at storing and managing genome data, one of its weaknesses can be query performance. This is a general relational database issue rather than a strictly Chado one. We've gotten around it by creating a denormalized search database and by pregenerating all the HTML reports for the website. This gives us the performance and scalability that our website requires. Thus, our Chado interaction is limited to a one-time dump of all the data we need (in ChadoXML format using XORT), and we work from our highly optimized sources after that. This type of setup will obviously not work as well if you want live editing to be immediately reflected on the website and in the search database.
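
To make the denormalization idea concrete, here is a minimal Python sketch using the standard sqlite3 module rather than PostgreSQL. The table and column names follow a small subset of Chado's feature, featureloc and cvterm tables, but the schema is heavily simplified, the rows are made up, and the gene_search table name is hypothetical. It is not FlyBase's actual search database, only an illustration of flattening a multi-join lookup into one table at dump time.

```python
import sqlite3

# Simplified subset of Chado-style tables; real Chado has more columns and constraints.
SCHEMA = """
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT,
    name TEXT,
    type_id INTEGER REFERENCES cvterm(cvterm_id)
);
CREATE TABLE featureloc (
    featureloc_id INTEGER PRIMARY KEY,
    feature_id INTEGER REFERENCES feature(feature_id),
    srcfeature_id INTEGER REFERENCES feature(feature_id),
    fmin INTEGER,
    fmax INTEGER,
    strand INTEGER
);
"""

# Normalized lookup: three joins just to list genes with their locations.
NORMALIZED_SQL = """
SELECT f.uniquename AS gene_id, f.name AS symbol, src.uniquename AS located_on,
       fl.fmin AS fmin, fl.fmax AS fmax, fl.strand AS strand
FROM feature f
JOIN cvterm t ON t.cvterm_id = f.type_id
JOIN featureloc fl ON fl.feature_id = f.feature_id
JOIN feature src ON src.feature_id = fl.srcfeature_id
WHERE t.name = 'gene'
ORDER BY f.uniquename
"""


def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO cvterm VALUES (?, ?)",
                     [(1, "gene"), (2, "chromosome")])
    conn.executemany("INSERT INTO feature VALUES (?, ?, ?, ?)",
                     [(1, "chr_demo", "demo chromosome", 2),
                      (2, "DEMO0001", "geneA", 1),
                      (3, "DEMO0002", "geneB", 1)])
    conn.executemany("INSERT INTO featureloc VALUES (?, ?, ?, ?, ?, ?)",
                     [(1, 2, 1, 100, 900, 1),
                      (2, 3, 1, 1500, 2400, -1)])
    return conn


def build_search_table(conn):
    """One-time flattening step; gene_search is a hypothetical table name."""
    conn.execute("CREATE TABLE gene_search AS " + NORMALIZED_SQL)
    conn.execute("CREATE INDEX gene_search_symbol ON gene_search (symbol)")


conn = build_demo_db()
build_search_table(conn)

joined = conn.execute(NORMALIZED_SQL).fetchall()
flat = conn.execute(
    "SELECT gene_id, symbol, located_on, fmin, fmax, strand "
    "FROM gene_search ORDER BY gene_id"
).fetchall()
print(flat)
```

The flattened table answers the same question without any joins at query time. The trade-off is the one described above: it has to be rebuilt whenever the source data changes.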

Having said all that, here is what I can tell you about the hardware requirements for Chado at IU.

Disk requirements
The current release of FlyBase (FB2008_10) with 12 Drosophila genomes takes up ~40 GB of disk space once it is imported and indexed in PostgreSQL. I also generally figure another 10-20 GB of required disk space for temporary indices during loading and vacuuming. The recommendation here is to get the fastest and largest-capacity disks you can given your budget. If you are looking at a system with six or fewer disks, I would opt for a RAID 1+0 or 0+1 setup over RAID 5 for performance reasons.
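
As a rough sizing aid, the arithmetic behind these figures can be sketched in Python. The 40 GB and 20 GB numbers come from the paragraph above; the six 146 GB disks are a hypothetical array chosen only for illustration, and the capacity formulas are the standard ones for RAID 5 (one disk's worth of parity) and RAID 1+0 (mirrored pairs), ignoring filesystem and controller overhead.

```python
def required_gb(database_gb, scratch_gb):
    """Disk space needed for the loaded database plus temporary working space."""
    return database_gb + scratch_gb


def usable_capacity_gb(level, disks, disk_gb):
    """Usable capacity of an array of identical disks, before filesystem overhead."""
    if level == "raid5":
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_gb      # one disk's worth of space goes to parity
    if level == "raid10":
        if disks < 4 or disks % 2:
            raise ValueError("RAID 1+0 needs an even number of disks, at least 4")
        return (disks // 2) * disk_gb     # every disk is mirrored
    raise ValueError(f"unsupported RAID level: {level}")


# ~40 GB database plus the upper end of the 10-20 GB scratch estimate.
need = required_gb(40, 20)

# Hypothetical array: six 146 GB disks.
for level in ("raid10", "raid5"):
    have = usable_capacity_gb(level, disks=6, disk_gb=146)
    print(f"{level}: {have} GB usable, {have - need} GB headroom over {need} GB")
```

With these assumed disks, either layout leaves ample headroom, so the choice between them comes down to performance rather than capacity, which is the point made above.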

Memory
The more memory you can dedicate to PostgreSQL the better. Our servers typically have 4-6 GB of memory on a machine that does nothing but serve Chado, and they only handle one query at a time. To use that memory we tweak the work_mem setting so that queries don't result in lots of hits to the disk. Keep in mind that work_mem applies per sort or hash operation within each query and is not a server-wide cap, so if you want to put Drupal on top of Chado you will need to lower it to a level you think is reasonable given your expected query load.
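
One way to reason about that trade-off is a back-of-the-envelope budget, sketched here in Python. The 2048 MB sort budget, the 50 concurrent queries, and the two sort or hash operations per query are all assumptions picked for illustration, not measurements from FlyBase, and the helper name is hypothetical. The result is only a starting point to test against a real workload.

```python
def suggest_work_mem_mb(sort_budget_mb, concurrent_queries, ops_per_query=1):
    """Split a memory budget evenly across the sort/hash operations that may run at once.

    PostgreSQL can use up to work_mem for each sort or hash operation, so the
    worst case is roughly concurrent_queries * ops_per_query * work_mem.
    """
    if concurrent_queries < 1 or ops_per_query < 1:
        raise ValueError("concurrent_queries and ops_per_query must be at least 1")
    return sort_budget_mb // (concurrent_queries * ops_per_query)


# Dedicated server running one query at a time (the setup described above),
# with an assumed 2048 MB set aside for sorts and hashes.
single = suggest_work_mem_mb(2048, concurrent_queries=1)

# Same budget behind a web front end: assumed 50 concurrent queries,
# each with about two sort or hash operations.
shared = suggest_work_mem_mb(2048, concurrent_queries=50, ops_per_query=2)

for label, value in (("dedicated", single), ("web front end", shared)):
    # Matches the form of a work_mem line in postgresql.conf.
    print(f"# {label}\nwork_mem = {value}MB")
```

The same budget that allows a very large work_mem for a single batch query shrinks to a few tens of megabytes once many web requests share the server.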

CPU
Get as many single- or multi-core CPUs as you can afford. Most of our servers are older dual-CPU systems in the 2.5-3 GHz range and they can handle our existing load without any issues.

The Harvard group will be posting their hardware setup in a separate post.

Let us know if you have any other questions.

Josh

P.S. I'd highly suggest coming to the Jan 2009 GMOD meeting if you are just getting started with Chado. A few other FlyBase folks and I will be there.
Josh Goodman
Site Admin
 
Posts: 64
Joined: Mon Nov 26, 2007 2:39 pm

Re: Chado Hardware Requirements

Postby randalls » Tue Dec 16, 2008 11:45 am

Josh,

Thanks for all the valuable information. I really appreciate it.

I will be at the GMOD meeting. See you there.

Randall

Re: Chado Hardware Requirements

Postby Paul Leyland » Tue Dec 16, 2008 12:35 pm

Here in FB Cambridge we have a different usage pattern again, so a brief description of our hardware may be informative.

Our usage is exclusively read-only access to Chado, but we need to query the same database instance(s) as reside at Harvard, so Josh's denormalized version isn't appropriate to our needs. However, Josh's advice to use as many CPU cores and as much RAM as makes economic sense is very relevant.

Our current server has two quad-core Xeons ("model name : Intel(R) Xeon(R) CPU E5345 @ 2.33GHz" according to /proc/cpuinfo) and 8GB RAM. The system is on its own disk and the data on a RAID-5 set of six 250GB SATA disks, giving about 700GB usable storage.

A couple of days ago some new hardware arrived. It's still in the packing case, but this one will have two quad-core 3GHz Xeons, eight 1TB SAS disks in a RAID-5 array and 32GB RAM.

Both these machines are dedicated Chado servers.

Paul
Hanging on in quiet desperation is the English way.
The time is gone, the song is over.
Thought I'd something more to say.
Paul Leyland
 
Posts: 1
Joined: Wed Nov 28, 2007 4:42 am

Re: Chado Hardware Requirements

Postby randalls » Tue Dec 16, 2008 2:54 pm

Paul,

Thank you for your information as well. I will let you guys know what we decide to do.

Thanks,

Randall

