Friday, February 5, 2010

Programming: friend or foe



Here I go again, trying to write Python scripts (more like modifying scripts :p). Finally, it's done. It took me 2 scripts and 2 hours. I imagine a real programmer could do all that in 5 minutes with 1 script. Sigh. Programming is my enemy!

A few months ago, I believed Python could be mastered in two months. I know, it's silly. Now I'm more realistic about my goals. I would like to master a simpler language like awk instead of Python.

Here are the programming languages I want to learn:
- awk
- Python
- R (for statistics)

A little explanation

Awk
Awk is a programming language for manipulating text files. It works best when the file is organized into lines and columns. I like awk because I can get things done with a one-line command. With other languages like Perl and Python, I usually save the script in a file before running it in the terminal.
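To make the lines-and-columns idea concrete: the awk one-liner `awk '{ print $1, $3 }' data.txt` prints the first and third whitespace-separated columns of every line. Below is a rough Python sketch of the same idea, just for comparison. The function name `select_columns` and the file name `data.txt` are made up for illustration, and the sketch only mimics awk's default field splitting, not everything awk can do.

```python
def select_columns(lines, columns, sep=None):
    """Return the requested columns of each line, joined by a single space.

    Loosely mirrors awk '{ print $1, $3 }' (hypothetical helper, not a real library call).
    Columns are 1-based like awk's $1, $2, ...; column 0 means the whole line, like $0.
    sep=None splits on runs of whitespace, similar to awk's default field splitting.
    """
    result = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines (awk would still print an empty-field line)
        fields = line.split(sep)
        picked = []
        for c in columns:
            if c == 0:
                picked.append(line)          # like awk's $0
            elif c <= len(fields):
                picked.append(fields[c - 1])
            else:
                picked.append("")            # awk treats a missing field as empty
        result.append(" ".join(picked))
    return result
```

To use it on a file, something like `for row in select_columns(open("data.txt"), [1, 3]): print(row)` would do. The awk version is still much shorter, which is the appeal of one-liners.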

Python
People say that Python is the easiest programming language for bioinformatics. Some people disagree; they prefer Perl. Others say the norm for bioinformaticians is to know at least two languages: one of C/C++/Java and one of Perl/Python.

Python scripts are 'readable' and easy to understand. It's object-oriented (nope, I don't really understand what that means). Biopython is a collection of Python tools for computational molecular biology. Users can import modules or functions from the Biopython library, which saves a lot of work. However, you need a good grasp of Python before using Biopython.

R statistics
R is a free programming language developed from S. Most people use R to create graphs of their data. The days when we used Microsoft Word and Excel to plot graphs are soooOOOooo over!

After looking at the long manual, I gave up. But lately I discovered ggplot2, a plotting package for R. It's quick and easy. *evil laugh*

Programming is fun when you can find scripts and tutorials easily online. Somebody has probably done it before. A little trial and error will give you the results you want. Like I always say, the things we need to learn, we learn by doing them.

2 comments:

BadboyZ February 25, 2010 at 10:16 AM  

If I were you, I would stick to figuring out how to use Python. Nowadays there are heaps of resources that help with refactoring your code.

Sure, I use awk a lot myself, but never for anything serious, and it's best used together with a good knowledge of `sed`. Lastly, you could accomplish anything awk+sed can do with a Perl one-liner :)

Some people do Perl, others do Python. I do both; I am fonder of Python, but I am more skilled in Perl. If that makes any sense, you will realize that Perl is not as structured as Python. Python gets you thinking about programming more logically, and when it comes to advancing your object-oriented programming skills, you're better off with Python.

About your mention of R: R is great, and the availability of Bioconductor makes it super awesome for encapsulating those really complicated stats functions. You are underutilizing R if you think plotting is its main strength. Personally, I would massage my data into tab- or comma-separated format and then just plot it with Excel if the number of data points does not exceed 100,000. My other options are Python's matplotlib and gnuplot. Ooooh, if you really like plotting stuff, take a look at the Orange framework.

The golden rule of writing bioinformatics software is "Before you start writing something, make sure somebody has not done it already." Just plug into a library framework, e.g. BioPerl, Biopython, Bioconductor, etc., and get to work.

Melissa Wong March 5, 2010 at 8:33 PM  

Thanks, BadboyZ, for the useful advice and suggestions. I must admit I'm not a fast learner and never will be a self-taught genius. I'm a bit jealous of people who teach themselves using the internet. Just a little bit ;-)


