The SAS versus R Debate

The question posed was simple enough. As innocuous as it was, I didn’t think such a firestorm would result. Here’s the original post from August 2011 with the subject “SAS versus R”:

“Did anyone have to justify to a prospect/customer why R is better than SAS? What arguments did you provide? Did your prospect/customer agree with them? Why do you think, despite being free and having a lot of packages, R is still not a favorite in Data Mining/Predictive Analytics in the corporate world?”

The inquiry was posted on the discussion forum for the LinkedIn group – Advanced Business Analytics, Data Mining, and Predictive Modeling. As of today I found a whopping 900+ comments/replies, many of which were posted in the last couple of months. By any standard, the question generated a lot of interest and I think I know why. Data science and big data are currently at an important inflection point – companies are trying to decide whether the traditional license fee model of analytics software is still viable or whether open source is destined to reign in the coming years.

To provide a bit of perspective for how I see this issue, in a previous life I ran a company that was a long-time Microsoft Certified Partner firm. I lived and breathed Microsoft. I was an evangelist and generated a lot of revenue for the company over the years based on my product recommendations. It took an unfortunate policy change by Microsoft, to shed themselves of all smaller partners, to open my eyes and make me see the light. I now see Microsoft’s license model, as well as that of SAS, as a dinosaur of an age gone by. This is why I eagerly left the Microsoft BI solution-set behind in favor of R and other open source technologies. I won’t be going back. And this is the essence of the LinkedIn discussion where both factions argue strenuously for their position. I’ll give you a small sample of the highlights from the discussion here:

  • R has some very good extensions for larger data sets. The fact that R is a programming language gives you a lot more freedom in analytics when using R, but for data management I do want to use a more database-like tool, which SAS certainly is.
  • As a long time/advanced SAS user who is in the process of transitioning into being more R oriented, the thing I miss most are my beloved Macros. I’m hoping I can use R functions the same way I used SAS Macros…fingers crossed.
  • I have avoided SAS except where professionally necessary. I find the macro oriented language extremely unpleasant and the generally prepackaged nature is also contrary to my needs. I don’t normally comment on that publicly since I figure different folks would like different strokes (posted by a software architect at MapR).
  • We can endlessly argue about when exactly R catches up with SAS (in 1 year or 3 years), but this won’t have much benefit for this discussion. Potential SAS or R users are more interested in trends, I believe.
  • Even if R was used by more people than SAS it would not mean that R is better than SAS. I believe more people drive Suzuki than Mercedes, but that does not mean that Suzuki is better than Mercedes.
  • The reason I think the R job growth trend will continue is: it’s free, it’s easy to use in conjunction with SAS and many other tools, and its growth in capability is very high. R is now adding more functions in one year than SAS has in total.
  • I work in analytics every day and own a small consulting firm. I do not observe the same increase in R requirements being stated here. In fact, I see an increase in demand for SAS and services for SAS. I also see Python and other products infringing on R’s territory.
  • I attended (and spoke at) the American Statistical Association’s Conference on Statistical Practice last week. I must say I was surprised to find the degree to which SAS dominated that particular group. R was there, of course – prominently. Yet, SAS appeared to be the tool of choice for this group.
  • What if SAS is cheaper in future, or even free, would people change their mind and switch from R? Probably, an interesting assumption it is when debating SAS vs. R, purely from functionality perspective.
  • I’ve just posted some findings on jobs in analytics, including SAS and R. Based just on job trends, R should catch SAS in between 1.87 and 3.35 years.

Based on the last highlight above, here is the poster’s original research article: “Forecast Update: Will 2014 be the Beginning of the End for SAS and SPSS?” and his recent update: “Job trends in the Analytics Market.” Here is a trend plot from the article that seems to show R slowly approaching SAS from a jobs perspective:

[Figure: job trend plot comparing SAS and R]
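
For readers curious how a figure like “between 1.87 and 3.35 years” might arise, here is a minimal sketch of a straight-line crossover estimate. This is not necessarily the method used in the linked article; the function name and every number below are hypothetical and chosen only for illustration.

```python
def years_until_crossover(leader_now, leader_slope, chaser_now, chaser_slope):
    """Years until two straight-line trends meet.

    Returns 0.0 if the chaser is already level or ahead, and None if the
    gap is not closing (the lines never cross in the future).
    """
    gap = leader_now - chaser_now
    closing_rate = chaser_slope - leader_slope
    if gap <= 0:
        return 0.0
    if closing_rate <= 0:
        return None
    return gap / closing_rate


# Hypothetical job-posting counts and yearly growth, NOT real data.
estimate = years_until_crossover(
    leader_now=10000,   # assumed current postings for the leading tool
    leader_slope=200,   # assumed postings added per year
    chaser_now=7000,    # assumed current postings for the trailing tool
    chaser_slope=1400,  # assumed postings added per year
)
print(estimate)
```

Varying the assumed growth rates shifts the answer, which is one way a range rather than a single number could come about.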

I enjoy monitoring this kind of emotion-charged debate on technology issues because I take it to mean that industry participants truly care about what they’re doing and how they do it. Since this particular LinkedIn discussion thread seems to have some longevity, I’ll continue checking it out in order to keep a pulse on this important issue.

Daniel — Managing Editor, insideBIGDATA

 

Comments

  1. I am a SAS certified programmer, but lately I use R at work because the cost of SAS is not justifiable. 99% of everything I need to do is doable in R, and I suspect the other 1% (mostly out-of-memory issues) is as well with other packages. In any case, I have always found a way to meet my needs.
    Graphics are much better in R, no doubt, especially ggplot and the ability, even in base graphics, to add almost any layer. The SAS SG procedures are very good, but not quite as flexible.
    I think for a large company SAS is likely better: cost is less of an issue and it has great support, since you can call them with any question. R support means searching Google, which is not quite the same (it is effective, though). I am better off for knowing both.

    • Hi Mario, thanks for adding your view of this issue. I think your perspective is very similar to others in this industry. SAS is still a viable option for many organizations, but I do think the tides are turning toward open source solutions like R. Time will tell! — Daniel

    • Jackie Wu says:

      I’m currently working in SAS, but I’ve started to learn R and other open-source technologies because I share some of Ritu Jain’s concerns.
      In my eyes, SAS is more of a brand that has been well honed by Dr Goodnight. I worry about what happens if he retires. What if SAS’s culture changes? What if employees have to be laid off? Time will tell. Let’s see.

  2. As a former SAS employee who has seen many debate this issue, here are my 2 cents:
    1) SAS is too expensive for almost all companies, irrespective of size. How do you justify a solution that in software alone can cost you hundreds of thousands of dollars, especially when it takes as much as 6 months or more to deploy? Can you really afford a 2-3 year payback period on your investment? Not everyone needs a Mercedes or can afford one. Why pay for bells and whistles you will never use?
    2) SAS is too complex to use. You have multiple tools for data access, data quality, and data integration, most with different interfaces. By the time your analysts figure out how to move from tool to tool and just get your data in order for analysis, your competitor has already run with the opportunity. Also, do you have enough of the Ph.D. statisticians and SAS programmers you would need to really get value from your SAS tools?
    3) Pace of innovation is too slow. SAS is very vocal about reinvesting about 25% of its revenue back into R&D to support innovation, but do you know that SAS has over 200 different products? Most of these dollars actually go to maintaining some of those archaic products that were developed almost 30 years ago. Companies have way too much data today, yet SAS is way behind in working on any sort of data compression capabilities to help with data storage and processing issues.
    I left SAS to join Alteryx because I felt that the company was just not moving fast enough to keep up with market direction. And while I agree that there are shortcomings in “R”, Alteryx is doing a great job providing a strong alternative to SAS that costs only a fraction of what you will pay for SAS. “R” based macros take away the need for writing thousands of lines of code. A single, intuitive workflow with a drag-and-drop interface takes away the need to deploy and integrate multiple tools and train employees in multiple tools; your line-of-business and data analysts can now create analytic models themselves by drag and drop, without any coding or dependence on IT. And the partnership with Revolution Analytics makes scalability a non-issue.
    I don’t think my comments alone will change SAS advocates’ minds (after all, they have built their careers on SAS), but don’t you think it is at least worth keeping up with industry direction? And it is definitely not where SAS is going….

  3. I see and analyze many comments regarding this topic – whilst some of them are relevant, they certainly do not carry any weight when you look at actual value generated from using SAS or R within an organisation (and I have always wondered – why?). I use SAS and, more recently, R. Though I like the building blocks of R, I can see why the software does not get the commercial directors’ signature.

    As a data analysis professional carrying out statistical modelling tasks, I want to be able to CALL stat functions (in R or SAS) with complete trust, knowing that the syntax and procedure have been numerically validated and therefore not having to worry about the results. For R, CRAN does not offer a common test methodology or validation of statistical analysis output. I could spend a good amount of time validating source code, output and algorithm robustness, since anyone can contribute packages to CRAN. Many packages, I see, are developed by a single user, often a graduate student. 3 out of the top 5 R packages were developed by a single person, not a team. Anyone can submit an R package by dropping the relevant files on a CRAN FTP location. As a result we now have approximately 5300+ packages with overlapping functionality and varying documentation, so sometimes I find it difficult to decide which to use. With SAS I know the syntax and procs have rigorous software testing and a Quality Assurance program which undertakes functional testing, error handling, documentation completeness/updates and, most importantly, NUMERICAL VALIDATION.

    As for data management – I think R can do more than what is argued, so it gets a little unfair criticism in that regard. For example, using the sqldf package one can be somewhat flexible in terms of data manipulation – but I would say that most R users still use an accommodating tool for data management, which I would not do when I use SAS.

    Penultimately, the ‘memory’ issue – I have had a few of those messages when using R. From my research, most R functions are single threaded and bound by the memory of a single machine, which sometimes gives poor performance. Revolution Analytics attempted to address this issue with the ScaleR offering, which did not include many of the frequently used R packages.

    Finally, the ‘support’ argument. R relies on the open source community, and since many R packages are developed by single users it can be difficult to obtain answers within the right time frame. I used to pay for a single user license for SAS, costing about 2500 GBP (base/stat). Recently in R I spent a few days finding the best way to connect to Sybase IQ on a server. I got there eventually, but it cost my client my daily rate for 3 days. So is R really free? In SAS I would have got the information in, say, 15 minutes and 2 lines of code, but with SAS I would have paid for a license upfront. 2 years ago a company presented a data analysis solution coded in R to a telco firm. The company was charging a fee for their solution plus support. The initial telco reaction was negative upon hearing it was coded in R. Secondly, the analytics manager questioned the company about their procedure for fixing any problems with the code, asserting that the telco had access to the R online community and the developers just as the company did, so they could go direct. There was no unique selling point.

    R is a nice package – I like using it – but I completely understand why it falls short on the commercial agreements scale.
