Choose the Statistical Package that Will Make Your Data Talk
In recent years, the need for statistical testing in bioscience has grown rapidly, and so has the development of statistical software. It is now common for researchers to use some form of statistics in their basic research. Among skilled biostatisticians, R is the most popular software for data analysis, but not all data require such advanced computing to draw conclusions.
To help you decide which software is best for you to use for your data, I have compiled a list of statistical packages that may come in handy for a beginner in the field of statistical testing.
Excel
Tests: If you need simple descriptive statistics (like calculating means and standard deviations) or basic inferential analysis (hypothesis testing, comparing two or more groups), Excel can do the work just fine. To perform statistical testing in Excel you can use the Data Analysis add-in, the Analysis ToolPak (just enable it from the options), or go to the Formulas tab, where under More Functions you’ll find formulas for plenty of parametric tests and other simpler analyses. But be careful if your data do not fit the Gaussian distribution: nonparametric tests are not built into Excel.
Availability: It comes with the Microsoft Office package so it is readily available.
Skill requirements: Almost none if you have ever used Excel for any sort of computation. It only requires the understanding of statistical tests as it guides you to data array selection and other requirements.
Personal recommendation: Use it if you need to do simple statistics quickly. Very often your data will already be exported into an Excel spreadsheet, making it easy to do some quick hypothesis testing or determine the dispersion of the data.
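To make the "simple statistics" above a bit more concrete, here is a minimal Python sketch of what such a calculation boils down to: a mean, a sample standard deviation (the quantities Excel's AVERAGE and STDEV.S return) and a Welch's t statistic for comparing two groups. This is not how Excel does it internally. The function names and the measurements are made up for illustration, and the sketch stops at the t statistic and degrees of freedom rather than computing a p-value.

```python
import math
import statistics


def describe(values):
    """Return (mean, sample standard deviation) for a list of numbers."""
    return statistics.mean(values), statistics.stdev(values)


def welch_t(group_a, group_b):
    """Return Welch's t statistic and approximate degrees of freedom
    for two independent groups with possibly unequal variances."""
    mean_a, sd_a = describe(group_a)
    mean_b, sd_b = describe(group_b)
    # Squared standard error of each group mean
    se2_a = sd_a ** 2 / len(group_a)
    se2_b = sd_b ** 2 / len(group_b)
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1)
    )
    return t, df


# Hypothetical measurements, invented for this example
control = [4.1, 3.9, 4.3, 4.0, 4.2]
treated = [4.8, 5.1, 4.9, 5.3, 5.0]

print(describe(control))
print(welch_t(control, treated))
```

A large absolute t value relative to the degrees of freedom suggests the group means differ. Turning it into a p-value requires the t distribution, which is exactly the part a statistical package handles for you.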
IBM SPSS Statistics
Its name originates from Statistical Package for the Social Sciences, the field it was originally developed for. Its use is now much wider and it is commonly used in biomedical research. After IBM purchased the SPSS software, it was officially renamed IBM SPSS Statistics.
Tests: SPSS is equipped with most of the statistical tests you will need in regular life science research. It includes descriptive statistics, parametric and nonparametric tests, various regression methods, and correlation analysis. Also, a good feature for those in the biomarker discovery field is the ROC curve analysis.
In case you need more sophisticated pattern analysis, SPSS provides good coverage of multivariate analysis methods (factor analysis, cluster analysis, discriminant analysis) as well as decision tree algorithms (convenient for those who are creating classifiers for diagnostic purposes).
Availability: It is a product of IBM and is available under their license. To my knowledge, there are no available demo versions – which means you have to buy it.
Skill requirements: The interface is rather user-friendly. It uses pull-down menus through which most of its features are available. You can also program it with command syntax language (of course, this is for advanced users), which enables even more options. However, unless you are looking for complex data patterns and connections, it does not require any coding and I would call it easy to use.
Personal recommendation: SPSS is good for almost every type of statistical testing. I use it for extensive data analysis or for a preliminary analysis if the data is non-normally distributed.
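For a feel of what the nonparametric tests and ROC curve analysis mentioned above compute, here is a small Python sketch (not SPSS's implementation) of the Mann-Whitney U statistic, a rank-based way of comparing two groups that does not assume normality. Dividing U by the number of pairs gives the area under the ROC curve for a marker that separates the two groups. The function names and biomarker values are hypothetical, and no p-value is computed.

```python
def mann_whitney_u(group_a, group_b):
    """Return the U statistic for group_a: the number of (a, b) pairs
    in which a is larger than b, with ties counted as half a pair."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def auc_from_u(u, n_a, n_b):
    """Convert U into the area under the ROC curve (between 0 and 1)."""
    return u / (n_a * n_b)


# Hypothetical biomarker levels, invented for this example
cases = [2.9, 3.4, 3.1, 4.0]
controls = [2.1, 2.5, 3.0, 2.2]

u = mann_whitney_u(cases, controls)
print(u, auc_from_u(u, len(cases), len(controls)))
```

An AUC close to 1 means the marker is almost always higher in cases than in controls, while 0.5 means it does no better than chance.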
R
R is the famous open source package for statistical programming. It was first developed in the 1990s as an implementation of the S programming language and has continued to develop ever since. It now has a fast-growing community of users and developers, with new versions being released frequently. R is free both to use and to develop packages for, and this flexibility is what contributed most to its popularity.
Tests: With R you can do just about any kind of statistical testing imaginable on earth at this moment. R packages are being designed all the time and are available to the users as soon as they are finished. You can use R for things as simple as a Student’s t test, and as advanced as CHAID decision tree algorithms.
Availability: R is open source software, and you can also download the various R packages for free. The different graphical user interfaces are freely available as well.
Skill requirements: Here is the catch with R: it is a programming language! The only way to use it is to learn how to code in R, which does not sound like fun for most life scientists (unless you want to specialize in biostatistics, in which case you already know how to use R and are reading this article out of pure procrastination). On the other hand, the Web offers numerous tutorials explaining how to perform various tests in R (including those on Bitesize Bio!), many online courses are available for learning the language, and there is a stable R community.
Personal recommendation: This is the best statistical package out there, if you ask me. It offers the most recently developed algorithms and tests, it allows a lot of manipulation, and it is free! If you need something more complex than Excel and do not have access to licensed programs like SPSS, R is the next best option (it’s free and abundant in tutorials). However, I would not recommend it in cases where time is a limiting factor, since learning to code is a rather time-consuming process.
IBM SPSS Modeler
Known in the past as Clementine, SPSS Modeler is powerful statistical software directed mainly towards users in the sales industry. However, the abundance of features that it offers with a relatively simple graphical interface makes it very applicable for life science research as well.
Tests: This is sophisticated software intended for data mining and, as the name says, modeling of data. It is almost as well-equipped as R. This basically means that you can find every test that existed as of the software’s release date, but it is primarily oriented towards advanced statistical analysis. It’s perfect if you want to build predictive models from your data, and it incorporates a vast number of forecasting algorithms. It is also very helpful for decision making (which is something necessary in both sales analysis and biomedical research).
Availability: SPSS Modeler is licensed software and it is not the cheapest option out there. No demo versions are available.
Skill requirements: SPSS Modeler is more complicated to use than SPSS Statistics. It has a unique user interface that includes a protocol you need to follow in order to perform the analysis. However, it does not require any coding, even for the most advanced analysis. It really leaves you in awe that with only a few tutorials you can make Modeler do miracles with your data.
Personal recommendation: It is good if you have only basic statistical knowledge but would like to try to perform complex testing on your data. The Modeler can decide for you which test would pull out the best conclusions from your data. It is like the R for the non-coding users.
In case you are, like me, a fan of colorful summaries, here is a table that includes the main points of the article:
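Excel | Availability: Comes with Microsoft Office | Skill requirements: Low, takes a few minutes to learn | Recommended use: Fast parametric analysis
IBM SPSS Statistics | Tests: Descriptive, parametric and nonparametric, multivariate | Availability: Under IBM license | Skill requirements: Medium, takes a few hours to learn | Recommended use: Extensive statistical analysis, simpler modeling
IBM SPSS Modeler | Tests: Descriptive, parametric and nonparametric, data mining and forecasting | Availability: Under IBM license | Skill requirements: High, takes a few tutorials to learn | Recommended use: Fast complex analysis
R | Tests: Everything known to mankind | Skill requirements: Very high, requires days or months of learning (depending on how fast you learn) | Recommended use: Extensive complex analysis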
Finally, when you’re searching for your ideal software, start by answering the question: what do you want your data to tell you? Then you’ll know what tests to use, which will bring you to the software that supports that kind of analysis.