Is Python better than R for data science?


I recommend Python for multiple reasons:

If data science is going to remain mainstream over the next five years, it needs to add value not only as proof of concept (as it does now) but also in production, where it currently fails in over 70% of cases according to a recent Gartner survey. While R is the clear winner for classical pattern recognition libraries and statistical methods, Python is better suited to writing production-ready code.

This raises another important point: software engineering best practices (e.g., UML architecture design, unit testing, code review, Scrum) are going to become hard requirements for data scientists in the near future, on top of the expected knowledge of machine learning and statistics. The reason is that production-ready software requires proper architecture design, with proper reviews and testing.
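To make the unit testing point concrete, here is a minimal sketch using Python's built-in unittest module. The normalize function and its test cases are hypothetical, made up purely for illustration; they are not from any particular project.

```python
import unittest


def normalize(values):
    """Scale a list of numbers linearly into the range [0, 1].

    Hypothetical helper, used here only as something to test.
    """
    if not values:
        raise ValueError("values must not be empty")
    lo, hi = min(values), max(values)
    if lo == hi:
        # All inputs are equal; return zeros instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


class NormalizeTest(unittest.TestCase):
    def test_scales_to_unit_interval(self):
        self.assertEqual(normalize([2, 4, 6]), [0.0, 0.5, 1.0])

    def test_constant_input(self):
        self.assertEqual(normalize([3, 3]), [0.0, 0.0])

    def test_empty_input_is_rejected(self):
        with self.assertRaises(ValueError):
            normalize([])
```

Saved to a file, the tests can be run with "python -m unittest" followed by the file name. The edge cases (constant input, empty input) are exactly the kind of thing that tends to go unnoticed in a proof of concept and break in production.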

The future of data science depends heavily on hardware availability, usually provided through clusters and clouds. Python integrates more easily than R with cloud services from Amazon (AWS) and other providers.

To stay in the market, you need to be ready for the future: what you study in data science today will already be dated by the time you finish. I suggest proper training in languages like Scala (a potential alternative to Python that is more software engineering friendly) and Java (as a foundational programming language), while still taking Python seriously.

Note: data science is adding a lot of value in the area of vision (through deep learning); however, vision is only a small portion of what is expected from data science. Most problems require simple but robust solutions that can run at a remote site for a long time in front of an operator.

One major downside of Python is that many libraries are written by ordinary contributors without proper code review, not as part of a research group and not based on a properly peer-reviewed publication. This leads to significant discrepancies between the outputs of libraries that are supposed to do the same thing.
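One well-known instance of such a discrepancy is the standard deviation: numpy.std divides by n by default (ddof=0), while pandas Series.std divides by n - 1 (ddof=1), so the same data gives two different answers. The sketch below reproduces both conventions with the standard library's statistics module as a stand-in, so it runs without either package installed; the sample data is made up for illustration.

```python
import statistics

# Made-up sample data, for illustration only.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Population convention: divide by n (the NumPy default, ddof=0).
population_sd = statistics.pstdev(data)

# Sample convention: divide by n - 1 (the pandas default, ddof=1).
sample_sd = statistics.stdev(data)

print("population:", population_sd)
print("sample:    ", sample_sd)
```

Neither result is wrong; the two functions simply answer slightly different questions under the same name. That is why it pays to check the defaults whenever two libraries are supposed to agree.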
