Talk:Comparison of statistical packages
This article is rated List-class on Wikipedia's content assessment scale.
Statsmodels removal
Statsmodels is the largest and most popular statistics package in Python and is usually among the top 10 Python packages for data science. It also has close to 1500 citations in scientific articles and books per year, currently 7300 citations in total according to Google Scholar: https://scholar.google.com/citations?view_op=view_citation&hl=en&user=X7Zp5YMAAAAJ&citation_for_view=X7Zp5YMAAAAJ:IjCSPb-OGe4C
Statsmodels is for traditional statistics and econometric analysis and does not get the hype of machine learning packages like scikit-learn. However, it is still very widely used in industry, research and teaching.
Why has it been almost systematically removed from Wikipedia?
The revision that removed it from this page: https://en.wikipedia.org/w/index.php?title=Comparison_of_statistical_packages&diff=1256342533&oldid=1256286749
The statsmodels Wikipedia page was also removed some years ago.
Aside: statsmodels grew out of scipy and still works closely with scipy.stats. For example, scipy.stats has the basic statistical methods and points in the documentation to statsmodels for more advanced methods. — Preceding unsigned comment added by Josefpktd (talk • contribs) 13:51, 7 October 2025 (UTC)
LAD support
By definition, any package that supports quantile regression will support least absolute deviation regression, which is just quantile regression at q=0.5. --189.125.124.24 (talk) 16:44, 23 November 2016 (UTC)
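The equivalence can be checked numerically. Below is a minimal sketch in plain Python, assuming the usual check (pinball) loss as the quantile regression objective. The function names and sample residuals are illustrative only and are not taken from any package listed in the article.

```python
def pinball_loss(residual, q):
    """Check (pinball) loss used in quantile regression at quantile q."""
    return residual * (q - (1 if residual < 0 else 0))


def absolute_loss(residual):
    """Loss minimized by least absolute deviation (LAD) regression."""
    return abs(residual)


# Hypothetical residuals, chosen as exact binary fractions.
residuals = [-2.0, -0.5, 0.0, 1.5, 3.0]

# At q = 0.5 the pinball loss is exactly half the absolute loss, so both
# objectives are minimized by the same coefficients.
half_abs = [0.5 * absolute_loss(r) for r in residuals]
median_pinball = [pinball_loss(r, 0.5) for r in residuals]
same_objective = median_pinball == half_abs
```

Because the two objectives differ only by a constant factor of 1/2, any solver that minimizes one also minimizes the other. Whether a given package exposes q=0.5 through its interface is a separate question that has to be checked per package.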
RKWard windows support
RKWard is actually fully usable on Windows by installing the KDE package. Source: http://sourceforge.net/apps/mediawiki/rkward/index.php?title=RKWard_on_Windows
So I think there should be a star specifying such support. — Preceding unsigned comment added by 66.36.128.146 (talk) 14:25, 4 June 2011 (UTC)
Gretl and Stepwise
Can gretl do stepwise regression? I can find no mention of it inside the program or documentation, but I may be wrong - I'm quite new to it! tompagenet (talk) 22:02, 26 August 2008 (UTC)
- Yes, it can. In the output window, choose Tests> Omit variables. Free Software Knight (talk) 08:31, 22 October 2008 (UTC)
External URLs
Most other software lists and comparisons do not have external URLs for every package. I think they should be removed from this article as well. --Karnesky 15:57, 6 September 2006 (UTC)
GLM
Why is GLM listed in both ANOVA and regression? --Karnesky 15:57, 6 September 2006 (UTC)
- General linear model and Generalized linear model are both abbreviated GLM. Den fjättrade ankan 16:05, 6 September 2006 (UTC)
Both ANOVA and regression analyses are special cases of the GLM. MW (talk) 15:15, 24 December 2009 (UTC)
Procedure Comparisons
It seems pointless, and cumbersome, to list different capabilities for each package, like ANOVA, Regression, etc. First of all, there are just TONS of them: many, many more than could be listed in such a form. This makes the current listing utterly incomplete, and therefore very misleading. There are already links to each package's website, where such details are given. I'm for deleting these tables altogether. What are the thoughts out there? Ww.ellis 15:01, 29 November 2006 (UTC)
- I disagree--other software comparisons have complicated sets of features. What features do you think are missing? Why don't you just add them? --Karnesky 17:06, 29 November 2006 (UTC)
- I disagree too, I think the tables are useful. Den fjättrade ankan 21:58, 29 November 2006 (UTC)
- I do agree: these kinds of tables are misleading from the very beginning, out of WP scope, always outdated, and non-NPOV by nature. I'm for deleting the whole entry. Jean R. Lobry 01:59, 6 January 2007 (UTC)
- I disagree for now. The whole point of the table is to give an overview of the general capability and orientation of each statistics package, which it partially fails to do, as you point out. But this kind of table is the only way to start building a more balanced overview of the statistics packages. In future, the table should be split up according to the table headers' main sections, with the sub-headers provided in separate tables. It's unavoidable that comparing proprietary and open source software involves a kind of original research, so either move the table to this discussion page, or keep. Said: Rursus 10:34, 25 May 2007 (UTC)
I find the comparisons helpful, especially to narrow down the packages which may be appropriate to the kind of analysis I have in mind. I also refer students to this page when they are looking for statistical software. However, some of the descriptions are out of date (for example, neither the SalStat home page nor the program is being maintained). Other descriptions, as noted in other comments, are incomplete. This page needs to be better maintained. I would do it myself but I'm not sure how to insert a footnote to the author, Salmoni. MW (talk) 15:19, 24 December 2009 (UTC)
- The comparisons are useful, albeit limited. An alternative or additional approach would be to list the number of procedures provided by each application within a given category. For example, rather than checking whether each application allows plotting pie charts, bar charts, scatter plots, and so on ad nauseam, the table could summarise that 'application A' provides 7 types of chart, 'application B' provides 53 types of chart, and so on. I wouldn't suggest removing data already in the table and replacing it with this right away, but it is at least worth considering when contemplating the addition of new categories/columns to the tables! —DIV (137.111.13.17 (talk) 12:21, 30 January 2019 (UTC))
- Replacing such information with a simple count destroys valuable information. That this information is hard to get does not make it worthless; on the contrary, it makes it very valuable. I know of no comparable source on the internet that shows all this in one place. So we should try twice as hard to make these tables as complete as possible. And I believe the community can do it! Crowd source the crap out of it :D Tomtomme (talk) 17:22, 25 February 2025 (UTC)
Free
Let's change all the instances of "Free" in the Cost columns to "Gratis", lest anyone become confused with the licensing terms. Ogranut 03:15, 11 January 2007 (UTC)
I also disagree. This page has one important role - it shows which software is better. You pay more, you get more. "Free" is a simple term. Let's stick to it. Everybody understands "Free".
By the way, I haven't seen JMulti in the list. Unfortunately, I'm not competent enough to add it. ;) R.B.
- Nope, there's free, and there's gratis. "Gratis" means there's a zero-bucks version for download somewhere; gratis is aimed at low-cost customers. "Free" means the source code is open for everyone to share, reprogram and redistribute; free is aimed at finger-itching programmers who want to improve the program. Mostly, free code can be obtained in a gratis version. See the preachings of our most revered prophet R. M. Stallman of the Emacs Church in Free Software Definition. Said: Rursus 09:09, 25 May 2007 (UTC)
Gratis and Free mean the same thing. Check the dictionary. Graemec2 (talk) 14:28, 28 April 2008 (UTC)
Red/Green should be avoided
In the cost field, coloring open source cells green and non-open source cells red should be avoided because they correlate to good and bad endorsements (green means go, red means stop). —Preceding unsigned comment added by 199.253.16.1 (talk • contribs)
- Rather than "open source," which places a value judgment on a particular set of licenses, why don't we say "source available"? Having source code available is a feature & a differentiator (particularly for mathematical software). I don't see how you can argue that we shouldn't color-code this feature, but should color-code platform support or a particular ANOVA method. --Karnesky 19:31, 20 February 2007 (UTC)
- Red means stop, or missing, or broken by a general tabular convention found at open source tables. Maybe cyan or blue for non-gratis software. It indicates the coolness (coldness?) of the market. Said: Rursus 09:13, 25 May 2007 (UTC)
- This page does not exist in a vacuum. There are many other pages with these kinds of tables & they've all used red for "no" as opposed to anything else. Perhaps a new set of templates can be made for "free/open source" vs. "proprietary" (similar to the free (gratis)/nonfree templates). However, I think this page should follow conventions and consensus set forth by other pages. The current consensus is that green and red aren't being used to give value judgments. --Karnesky 14:07, 25 May 2007 (UTC)
- The same complaint was voiced at Talk:Comparison_of_computer_algebra_systems. JonMcLoone (talk) 16:35, 28 April 2008 (UTC)
- And, copying from that page: We had this discussion at Template talk:Yes. The consensus was that green means yes & red means no & that we aren't prescribing a value judgment. The 'but yes' and 'but no' templates were deleted for this very reason. --Karnesky (talk) 17:45, 28 April 2008 (UTC)
(backdent) as an economist, I just want to throw out there that paying less is good. It also totally changes the environment when requiring students to use a package. PDBailey (talk) 00:53, 9 January 2009 (UTC)
- It doesn't automatically follow that commercial implies any difference to student use. E.g., in my world (Mathematica), student home use is included in the price of a site license. From the student's perspective, Mathematica is free. Someone is paying, but not them. This is the same with free software: there is always someone paying (at least with their time), just not the end user. JonMcLoone (talk) 10:57, 9 January 2009 (UTC)
- Red means no, green means yes? OK, so change the question: change the heading to "Closed source". All answers will then flip. Everyone happy then? —DIV (120.17.146.10 (talk) 03:34, 25 January 2019 (UTC))
Asterisks
There are asterisks after some programs' prices, but there isn't an explanation for them anywhere.
Where is Octave ?
As Zarahemlite (talk) said, where is octave? rolandog (talk) 15:14, 25 September 2012 (UTC)
- Yeah, where is GNU Octave? If MATLAB (plus its Statistics Toolbox) is included, then GNU Octave (plus its Statistics Package, from Octave Forge) can be included too. —DIV (120.17.146.10 (talk) 02:30, 25 January 2019 (UTC))
Where is MATLAB ?
I am shocked that all Matlab information is gone from this page. Is Matlab suddenly not a powerful statistical analysis package any more? —Preceding unsigned comment added by 71.98.91.175 (talk) 11:09, 5 May 2009 (UTC)
- I added it back. I agree with you; whoever removed it was wrong to do so. I had a quick look for prices and could not find them. Clearly, like all the commercial packages, they try to lock students in with packages at vastly reduced rates. Toolboxes add extra complications too. I suspect UNIX versions might cost more than Windows. Hence I just put "depends on many things". Let someone else fill in the details if they want. But at least MATLAB is now back. It would be interesting to know who removed it and why. Perhaps it was an accident.
- In addition, where is octave? Zarahemlite (talk) 17:03, 30 January 2010 (UTC)
- Where is Design Expert? -134.84.166.40 (talk) 18:34, 26 February 2010 (UTC)
- Shouldn't Weka be here as well?
BSD on the OS list
Having BSD on the list of supported platforms makes this look out of touch. Might as well have DOS and OS/2. I suggest we cut the column. Wordsoup (talk) 18:27, 7 March 2008 (UTC)
- I don't understand this reference to DOS or OS/2. The BSD operating systems are architecturally modern (sometimes leading / sometimes lagging), actively maintained, with reliable release schedules, and excellent reputations for stability. While the BSD user base is not large enough to interest the commercial vendors (is that what you mean by "out of touch"?), the BSD progeny remain important within the open source community, even if they are not everyone's cup of tea. MaxEnt (talk) 01:32, 19 August 2009 (UTC)
Open source vs proprietary
Does anyone understand why Dataplot and SalStat are listed as "open source" while their license is listed as proprietary, and hence not "open source"? -ChristopherM (talk) 05:34, 3 April 2008 (UTC)
DAP
Is anybody familiar enough with GNU DAP (Free software with at least some compatibility with SAS) to add it to the comparison matrices?
http://www.gnu.org/software/dap/dap.html
--CristoperB (talk) 19:37, 2 December 2008 (UTC)
memory model would be nice
It would be nice to list the memory model, i.e., whether the package uses the hard drive or system memory for storage of matrices / data sets. Those that use system memory rely on swap to deal with large data sets but are generally faster for small data sets and require more thought to go into their algorithms. PDBailey (talk) 22:04, 8 January 2009 (UTC)
Image copyright problem with File:Mainshot5.png
The image File:Mainshot5.png is used in this article under a claim of fair use, but it does not have an adequate explanation for why it meets the requirements for such images when used here. In particular, for each page the image is used on, it must have an explanation linking to that page which explains why it needs to be used on that page. Please check
- That there is a non-free use rationale on the image's description page for the use in this article.
- That this article is linked to from the image description page.
The following images also have this problem:
- File:Stata10 big.jpg
- File:SAS 9 on Microsoft Windows.png
- File:Minitab.jpg
- File:Maple12 Screenshot.jpg
This is an automated notice by FairuseBot. For assistance on the image use policy, see Wikipedia:Media copyright questions. --23:27, 8 February 2009 (UTC)
Eviews
Why does this page claim that Eviews costs 40 dollars when the Eviews page says it costs 600 dollars? —Preceding unsigned comment added by 92.245.195.116 (talk) 23:28, 11 February 2009 (UTC)
Inclusion criteria
I think we need to define some criteria for including a package in these tables, as they are becoming too large to be easy to use and the page becomes a sort of Directory. At the moment the lead just says "a number" of statistical packages, which is a bit woolly. I'd suggest that the main criterion should be the first criterion of the common selection criteria for inclusion in a stand-alone list, i.e. "Every entry meets the notability criteria for their own non-redirect articles in English Wikipedia. Red-linked entries are acceptable if the entry is verifiably a member of the listed group, and it is reasonable to expect an article could be forthcoming in the future." In addition, we could specify that the packages must be currently available (so e.g. not GLIM). Thoughts? Qwfp (talk) 11:42, 17 July 2010 (UTC)
- Qwfp, I think that is a great idea. However, I think as long as the package is notable, it should be included. Obviously GLIM is hugely notable in the area of statistical computing since it changed the face of it. 018 (talk) 19:03, 13 October 2010 (UTC)
- Another question is how much statistics-stuff should a package include to count as a statistics package ... I presume we wouldn't want to have all of Category:Data analysis software. Melcombe (talk) 15:26, 24 June 2011 (UTC)
- This may not be such a philosophical question. Google Scholar searches for "review of econometric software" will bring up such articles published in the scholarly journals. I know "statistics" isn't just "econometrics", but you catch my drift. --189.125.124.24 (talk) 16:47, 23 November 2016 (UTC)
Why isn't Excel included? Or to put it another way, what are the inclusion criteria that exclude Excel? It would be interesting to see how it compares to the other statistical packages (and may be a useful tool to justify not using Excel). --NimbleThink (talk) 06:57, 13 February 2013 (UTC)
Sorting in "Latest Version" column of General Information table is broken
The sorting feature of the General Information table is broken for the "Latest Version" column because of a mix of alphabetical and numerical data. Since some entries contain MONTH information in addition to YEAR, the table sorts in alphabetical order; however, this breaks the functionality of the table, because it makes "August 2009" appear before "March 2010" ("A" obviously appearing in the alphabet before "M").
I think the best solution to this would be to change the format of data in the "Latest Version" column to numerical dates only, consistent across all entries, then sort by numerical order. —Preceding unsigned comment added by 216.15.58.24 (talk) 01:29, 15 November 2010 (UTC)
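The difference between alphabetical and numerical ordering can be illustrated outside the wiki table. Below is a minimal Python sketch, assuming every entry is either "Month YYYY" or just "YYYY". The sample values and the `sort_key` helper are hypothetical, and this is not a description of how the table's sort feature works internally.

```python
MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June",
         "July", "August", "September", "October", "November", "December"],
        start=1,
    )
}


def sort_key(text):
    """Map 'Month YYYY' or 'YYYY' to a numeric (year, month) tuple.

    Entries without a month get month 0, so they sort first within their year.
    """
    parts = text.split()
    year = int(parts[-1])
    month = MONTHS[parts[0]] if len(parts) == 2 else 0
    return (year, month)


# Hypothetical "Latest Version" entries mixing both formats.
entries = ["March 2009", "April 2010", "2008", "December 2009"]

alphabetical = sorted(entries)                 # plain string comparison
chronological = sorted(entries, key=sort_key)  # numeric date comparison
```

With these sample values, the alphabetical order places "April 2010" ahead of "March 2009", while the numeric key restores chronological order. Storing the column as numerical dates, as suggested above, has the same effect as sorting on such a key.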
Weibull++
I think Weibull++ should be added here... 188.194.248.102 (talk) 17:27, 23 November 2011 (UTC)
Microfit
I think Microfit, developed by Bahram Pesaran (Wadhwani Asset Management) and M. Hashem Pesaran (University of Cambridge), should also be in the list. — Preceding unsigned comment added by 175.156.156.100 (talk) 16:09, 15 December 2012 (UTC)
SIMCA P+
Umetrics SIMCA P+ should be included. — Preceding unsigned comment added by 136.159.224.201 (talk) 23:12, 29 January 2013 (UTC)
StatsDirect
I'd never come across this until I saw it cited in PLOS Medicine as the software used in this article. My sense from a very unscientific Google sniff is that it is meant for those who find SPSS too challenging - but a knowledgeable appreciation would be helpful. — Preceding unsigned comment added by Skeptic12 (talk • contribs) 21:45, 19 February 2013 (UTC)