Friday, January 29, 2010

Bioinformatics for Biologist: Installing NGS software



I remember how hard it was to install Velvet, the first Next Generation Sequencing (NGS) software I used. As a biologist, I just couldn't figure out how to install it despite reading the manual many times. Once I did, installing NGS software became simple, although I still run into problems occasionally. Maq and MapView are among the hardest programs to install. Here, I'm gonna share how to install an NGS program.

Installation
NGS software usually comes compressed in a tar.gz file. Once downloaded, right-click it and select "Extract here". A folder with the same name will appear. Which directory you install in is your choice, preferably one on a hard disk with plenty of room (not Desktop or Documents), as the files generated will take up a lot of space.
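If you'd rather unpack from the Terminal instead of right-clicking, here's a minimal sketch. The archive name "mytool-1.0" is made up for the demo (it is not a real NGS package), and the first few lines only build a tiny stand-in archive so the example runs anywhere.

```shell
# Build a tiny stand-in archive so the example runs anywhere.
# "mytool-1.0" is a hypothetical name, not a real NGS package.
workdir=$(mktemp -d)
cd "$workdir"
mkdir mytool-1.0
echo "Read me first" > mytool-1.0/README.txt
tar -czf mytool-1.0.tar.gz mytool-1.0
rm -r mytool-1.0

# The actual step: unpack the archive (x = extract, z = gzip, f = file).
tar -xzf mytool-1.0.tar.gz

# A folder with the same name appears; look inside it.
ls mytool-1.0
```

With a real download, only the "tar -xzf" line is needed, with the name of the file you downloaded.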

To install, ALWAYS read the manual or README.txt for instructions. Instructions differ for each program. Some programs like Samtools are ready to run once unzipped. The two basic commands to install are:
> make
> make install

Here's an example. I just downloaded BWA 0.5.5 from http://sourceforge.net/projects/bio-bwa/files/. After unzipping it, I found no installation instructions, so I thought it was ready to use. But the command "./bwa" didn't work. So I tried "make install" and "make". Only the latter worked. During installation, a long list of messages scrolls by in the Terminal as the program is built and the required files/libraries are looked up.

Once the installation completed, I typed "./bwa" and a help message with a list of options appeared. In the bwa-0.5.5 folder, you will notice that many files were added during installation. Mission accomplished.


Troubleshooting
When an installation fails, the Terminal will show "Error" or "Stop". If you wanna know what went wrong, check the lines the Terminal printed during installation. Maybe a library or a part of the GNU build environment that the software requires isn't installed. Once the missing components are identified, go to System > Administration > Synaptic Package Manager to install them. Then try the installation again.
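Scrolling back through hundreds of lines is painful, so one option is to save the output to a file and search it. This is only a sketch: "fake_build" is a made-up stand-in that imitates a failing build with an invented error message, so the example can run without any real software. In practice you would put the real command (for example make) in its place.

```shell
# Stand-in for a failing build: prints one normal line and one error line.
# fake_build and its messages are hypothetical; use the real command in practice.
fake_build() {
    echo "compiling step 1"
    echo "error: something required is missing" >&2
    return 1
}

log=$(mktemp)

# 2>&1 merges error messages into normal output so both land in the log.
fake_build > "$log" 2>&1
status=$?

# A non-zero exit status means the command failed.
echo "exit status: $status"

# Show only the lines that mention "error" or "stop", with line numbers.
matches=$(grep -n -i -E "error|stop" "$log")
echo "$matches"
```

The line numbers tell you where in the log to start reading. The lines just before the first error usually name the missing piece.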

Tutorial
Let's try installing maq-0.7.1. I had problems installing this version even though I was certain I had all the components installed. I even tried to install an older version, 0.6.8, but that didn't work either. It took me a while to figure out what was missing.

For installation instructions, read http://maq.sourceforge.net/maq-man.shtml#install. To download, go to http://sourceforge.net/projects/maq/files/maq/

Why don't you try it on your own first? Just download the compressed file, unpack it and simply type in the Terminal:
./configure; make; make install
Alternatively, you can type three separate commands:
> ./configure
> make
> make install

If you manage to install it, congratulations! If you don't, welcome to the club!
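One small thing about the one-line version above: with ";" between the commands, every step runs even when an earlier one has failed, which can bury the real error. Joining them with "&&" stops at the first failure. Here's a sketch using two made-up helper functions (step_ok and step_fail are stand-ins, not real installers) so you can see the difference without installing anything.

```shell
# Hypothetical stand-ins for build steps: one that succeeds, one that fails.
step_ok()   { echo "$1 ok"; return 0; }
step_fail() { echo "$1 failed"; return 1; }

# With ";" every step runs, even after a failure.
out_semicolon=$(step_ok configure; step_fail make; step_ok install)

# With "&&" the chain stops at the first failure.
out_and=$(step_ok configure && step_fail make && step_ok install)

echo "--- with ; ---"
echo "$out_semicolon"
echo "--- with && ---"
echo "$out_and"
```

For a real installation the same idea would look like: ./configure && make && make install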

OK. Let's cut to the chase and tell you what I did to fix the problem. I realized that a library known as "zlib1g-dev" was missing. After installing that library, I typed:
> ./configure
> make
> sudo make install
Then type in the administrator password and voilà!

Since I've already installed the library needed, I can't show you an example of a failed installation. Oh, this is the worst tutorial ever!


Tuesday, January 26, 2010

Bioinformatics for biologist: Using terminal



One of the first things to learn in bioinformatics is how to use operating systems (OS) other than Windows, such as Linux and Mac. I'm currently dual-booting 64-bit Ubuntu with Windows Vista. What's the difference between 64-bit and normal 32-bit? A 64-bit OS can use more than 4 GB of RAM. But don't bother buying 64-bit Windows, because most NGS tools are built and tested on 64-bit Linux. My advice is to get a free 64-bit Linux distro that suits you.
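Not sure whether the Linux you already have is 64-bit? Here's a quick sketch for checking from the command line. The mapping of names below is my assumption for ordinary PC hardware (x86_64 for 64-bit, i386/i686 for 32-bit); other machines report other names.

```shell
# Ask the system which processor architecture it is running on.
arch=$(uname -m)
echo "Architecture: $arch"

# Assumed mapping for ordinary PCs: x86_64 = 64-bit, i386..i686 = 32-bit.
case "$arch" in
    x86_64) echo "This is a 64-bit system" ;;
    i?86)   echo "This is a 32-bit system" ;;
    *)      echo "Other architecture, check its documentation" ;;
esac
```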

Also, most tools run on the command line instead of a Graphical User Interface (GUI). In fact, many bioinformaticians think it's a stupid idea to create a program with a GUI. Who needs a GUI? Biologists do.

The command line is run using a shell. In Ubuntu, you get to the shell through a program called Terminal. To open Terminal, click on Applications menu -> Accessories -> Terminal.


Let's look at some of the basic operations I use daily:

cd
Change directory. A directory is what Windows calls a folder.
cd ..
Go up to the parent directory
cd /
Go to the root directory

ls
List the files/directories in the current directory.
ls *.pl
List all the files/directories ending with ".pl". For example, fastq2fasta.pl will be listed.
ls fastq*
List all the files/directories starting with "fastq".
This makes it easy to find a file even if you remember only part of its name.
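To see the wildcard at work, here's a small sketch you can run safely in a throwaway directory. fastq2fasta.pl is the example name from above; the other two file names are made up for the demo, and all three files are empty.

```shell
# Work in a throwaway directory so nothing real is touched.
demo=$(mktemp -d)
cd "$demo"

# Create three empty files; only the names matter here.
touch fastq2fasta.pl fastq_reads.txt notes.txt

# "*" stands for any run of characters.
pl_files=$(ls *.pl)        # names ending with ".pl"
fastq_files=$(ls fastq*)   # names starting with "fastq"

echo "Ending with .pl:"
echo "$pl_files"
echo "Starting with fastq:"
echo "$fastq_files"
```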


rm
Remove a file. To remove a directory and everything inside it, use "rm -r". Be careful, removed files do not go to the trash.

wc -l filename
Count the number of lines in a file. Useful to find out the number of sequences in a FASTA file, as long as every sequence sits on a single line: # sequences = # lines divided by 2
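The divide-by-2 shortcut breaks when sequences are wrapped over several lines. Here's a sketch comparing it with counting the header lines (the ones starting with ">") using grep. The tiny FASTA file is invented for the demo.

```shell
# A made-up FASTA file: three sequences, two of them wrapped onto a second line.
fasta=$(mktemp)
cat > "$fasta" <<'EOF'
>seq1
ACGTACGT
ACGT
>seq2
GGCC
>seq3
TTTTAAAA
TT
EOF

# Shortcut: total lines divided by 2 (only right for one-line sequences).
n_lines=$(wc -l < "$fasta")
n_by_lines=$((n_lines / 2))

# Safer: count the lines that start with ">", one per sequence.
n_by_headers=$(grep -c "^>" "$fasta")

echo "lines / 2    : $n_by_lines"
echo "header lines : $n_by_headers"
```

Here the shortcut overcounts because of the wrapped lines, while the header count matches the number of sequences in the file.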

split -l 20000 filename
Split a file into smaller files of 20000 lines each. Type "split --help" for more split options.
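Here's a sketch of what split leaves behind. The input file "reads.txt" and the prefix "part_" are names I picked for the demo. Without a prefix, split names the pieces xaa, xab and so on.

```shell
# Work in a throwaway directory.
work=$(mktemp -d)
cd "$work"

# Make a demo file with 50000 lines (just the numbers 1 to 50000).
seq 1 50000 > reads.txt

# Cut it into pieces of 20000 lines each, named part_aa, part_ab, ...
split -l 20000 reads.txt part_

# List the pieces and check how many lines the last one holds.
ls part_*
n_parts=$(ls part_* | grep -c .)
last_lines=$(wc -l < part_ac)
echo "pieces: $n_parts, lines in last piece: $last_lines"
```

The last piece simply holds whatever is left over, so it is usually shorter than the others.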

more filename
Read the content of a file from the beginning. This is very useful when the file is very big (>100MB), since opening it in a text editor may crash the system. Press ENTER to read more lines and 'q' to quit.

top
Display information about the system, running processes and RAM usage. To exit, press 'q'.

sudo
Run a command as the root user (administrator). Any command starting with "sudo" asks for the administrator password.
sudo rm -rf ~/.local/share/Trash/files/*
Administrative privilege used to empty the trash.

When a command is long, you'll probably prefer to paste it into the Terminal rather than type it. Note that the shortcuts in Terminal are different. To copy something from Terminal, highlight it and press shift+ctrl+c. To paste into Terminal, press shift+ctrl+v. Alternatively, you can right-click your mouse and choose the copy/paste option.

When a command or application finishes running, you will see the directory address with the "~$" sign again. To stop a running application, press ctrl+c. Note that ctrl+z only pauses it without ending it.

Find out more on https://help.ubuntu.com/community/UsingTheTerminal


Thursday, January 21, 2010

It's 2010! I'm back!




Sunset at Tian'anmen

It's been two weeks since I came back. I haven't checked my blog until now. I'm sure somebody will be curious about where I've been for the past four months. I was traveling, and I spent almost three months of that on an attachment in China, mainly doing bioinformatics analysis.

Hmmm... what should I write in the first post of 2010?! A notification that I'm back, of course. :p Considering how busy I've been lately, I don't think I'll have much time to update my blog regularly. It's a pity. Oh well, I guess I will just make a quick post every time I think of something.

