Thursday, October 3, 2013

Applying the 80/20 rule in bioinformatics analysis



I have always been a firm believer in quality before speed. Ever since I read about the 80/20 rule last month, I have started questioning how productive the things I do really are. It allows me to reflect on my strengths and weaknesses. Just two weeks ago, I recorded my personal fastest time to complete an analysis from scratch. It took me less than a day to assemble some 454 sequences I downloaded from the NCBI SRA! I spent half a day on installation and half an hour on the assembly (well, it was a small dataset). I had thought I would need more than a week to trim and filter reads, install software, and try different programs and parameters. Instead, I settled for a "good enough" assembly and saved the time for more important tasks.

What is the 80/20 rule? It is also known as the Pareto principle, named after the Italian economist Vilfredo Pareto. In 1906, he observed that 80% of the wealth was owned by 20% of the population. This concept has been widely applied in business. Although the real ratio often deviates from 80/20, the rule generally implies that most results come from a small share of the effort, while a large share of the effort has very little impact. By investing our effort in the 20% of work that is really important and reducing effort on the 80% that matters least, we can increase our output tremendously.

Identifying the 80/20 rule in bioinformatics analysis. First, I need to break down my workflow into steps, identify which areas consume the most time, and work out how to improve my efficiency. The flowchart below shows my typical bioinformatics analysis, which involves four basic steps: 1) identifying appropriate bioinformatics tools; 2) reading manuals and publications; 3) installation; 4) running the analysis.



Running the analysis is the 20% of the work that gives me 80% of the results. The most exhausting stages are choosing the appropriate tools, reading publications and manuals, and installation. So naturally, I want to reduce the time I spend in these areas. But how?

  • Do I really need to install the software locally? Is a web server available? Are there ready-made scripts by good Samaritans available online? How big is my dataset?
  • Choosing the appropriate software. I usually have several software options. I start with the most popular, well-supported and well-documented software. Is the software up to date, with a recent release? If not, there is a good chance that I will not be able to install it on my system.
  • Reading the manual. Some manuals are 200 pages long while others are as short as a README text file. By skimming through the manual and reading only what I need, I can save a lot of time. I always read the introduction, the installation instructions and the section on the analysis I need.
  • Installation. I always make sure I have installed the required packages and dependencies before installing the software. Some software developers do not list these in the manual, so be sure to search for them online.
  • The software may not work properly for various reasons, such as an incomplete installation or wrong input files. To identify the problem, I usually do a test run on a small dataset. My latest practice is to save the screen output to a log file and look for errors there.
  • It's time to call for help if the step above still doesn't solve the problem. The fastest way is of course to google the problem. I have realized that I find a solution faster if I know the specific problem, the error message and the right terms to search for. Posting on a forum or writing to the developer is the last resort, as it may take a few days before I get a reply.
  • Do something productive while waiting for the analysis to complete. I'm drafting this blog post as I'm running Genscan locally. Estimating the running time is very helpful in managing my time, but I often don't know how long it will take. I have a little Unix trick that plays a sound when the command finishes running. That way, I get notified when the analysis completes. Whatever you plan to do during the waiting time, be prepared to be interrupted once the analysis has completed.
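
The last few habits (test runs on a small dataset, saving the screen output to a log file, and getting a sound when a command finishes) can be combined in one small script. The following is only a rough sketch in Python, not the Unix trick mentioned above. The function names, the keyword list and the log file name are all made up for illustration, and the demo command is a stand-in for a real tool.

```python
import subprocess
import sys
import time

# Hypothetical keywords; adjust them to the messages your tool actually prints.
ERROR_KEYWORDS = ("error", "exception", "failed", "not found")


def run_and_log(cmd, log_path):
    """Run cmd, echo its output to the screen, and save a copy in log_path.

    Returns (exit_code, elapsed_seconds).
    """
    start = time.monotonic()
    with open(log_path, "w") as log:
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge error messages into the same stream
            text=True,
        )
        for line in proc.stdout:
            sys.stdout.write(line)  # still visible on screen
            log.write(line)         # and kept for later inspection
        exit_code = proc.wait()
    return exit_code, time.monotonic() - start


def find_error_lines(log_path, keywords=ERROR_KEYWORDS):
    """Return (line_number, text) pairs for log lines containing a keyword."""
    hits = []
    with open(log_path) as log:
        for number, line in enumerate(log, start=1):
            if any(word in line.lower() for word in keywords):
                hits.append((number, line.rstrip("\n")))
    return hits


def ring_bell():
    """Write the ASCII bell character; many terminals beep or flash on it."""
    sys.stdout.write("\a")
    sys.stdout.flush()


if __name__ == "__main__":
    # Stand-in for a small test run; replace with the real command.
    demo_cmd = [sys.executable, "-c",
                "print('reading input'); print('ERROR: file missing')"]
    code, seconds = run_and_log(demo_cmd, "test_run.log")
    print(f"exit code {code}, finished in {seconds:.1f} s")
    for number, text in find_error_lines("test_run.log"):
        print(f"log line {number}: {text}")
    ring_bell()
```

The elapsed time from a small test run also gives a rough feel for how long the full analysis might take, although run time rarely scales neatly with dataset size. Whether the bell is audible depends on the terminal settings.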
Feel free to post any suggestions in the comment box.

