Improving your proficiency in data science is easy. You can join a PGP in Data Science course, read self-help books, or work through articles and online courses. It all depends on how you build conceptual knowledge while practicing regularly as part of your work routine.
Doing so will improve your chances of landing a promising career opportunity in data science. You can fast-track your growth by pursuing a PGP certification while practicing on and building open-source projects hosted on GitHub.
Image Courtesy: Shutterstock
GitHub is a platform that brings together developers from around the world to create and share open-source software projects. It is primarily a code hosting platform for version control and collaboration.
Experienced programmers have put their open-source code on GitHub, where anyone can access it to understand it and build their own models. Apart from building projects, you can also learn about recent breakthroughs here. Furthermore, you can augment your learning by enrolling in a PGP in data science course, which will give you the necessary platform to develop your portfolio.
To help build your data science portfolio, here are some GitHub projects to consider:
Natural Language Processing (NLP) Projects
NLP is booming with many breakthroughs. Once you start going through them, you will realize it’s hard to keep up with the pace of new frameworks. There are many projects for you to experiment with and gain experience.
- ALBERT
It is a light version of BERT. If you are not aware of the BERT framework, it was developed by Google and transformed NLP almost overnight. The original BERT model is massive and won’t run on local machines unless you have GPUs lying around, which led to the creation of ALBERT. It is used for building language models that handle the same tasks with far fewer parameters.
- StringSifter
It is a machine learning tool that automatically ranks strings by their relevance for malware analysis, making it one of the more fascinating data science projects. A malware program often contains strings that it uses to perform operations such as copying a file to a specific location or creating a registry key. StringSifter surfaces this crucial information, which can help build strong malware detection programs.
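To make the idea of ranking strings concrete, here is a minimal sketch in Python. It is not the tool's actual method: the tool described above uses a trained machine learning model, while this sketch assumes a hand-written keyword list (`SUSPICIOUS_HINTS`) and hypothetical helper names purely for illustration.

```python
import re
from typing import List

# Hypothetical keyword list, for illustration only. The real tool relies on a
# trained ranking model, not a hand-written list like this one.
SUSPICIOUS_HINTS = ("http", "cmd", "reg", ".exe", ".dll", "\\")


def extract_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of printable ASCII characters at least min_len long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


def score(candidate: str) -> int:
    """Count how many of the hypothetical hints appear in the string."""
    lowered = candidate.lower()
    return sum(hint in lowered for hint in SUSPICIOUS_HINTS)


def rank_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Extract strings and sort them from most to least suspicious."""
    return sorted(extract_strings(data, min_len), key=score, reverse=True)


if __name__ == "__main__":
    sample = b"\x00\x01abcd\x00C:\\Windows\\evil.exe\x00xy\x00http://example.test/a\x00"
    for item in rank_strings(sample):
        print(score(item), item)
```

Strings that look like file paths or URLs rise to the top, while short or generic fragments sink. A learned model replaces the keyword count with a score inferred from labeled examples.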
- PLMpapers
It is a collection of papers on Pretrained Language Models (PLMs), which let us take an existing model and experiment with it. The PLMpapers repository gathers over 60 papers, covering BERT, XLNet, ERNIE, and ULMFiT, among various others.
Computer Vision Projects
Have you ever worked with image or video data? Computer vision is an advanced field whose specialists are in high demand. If you have prior knowledge of computer vision, you can add a few GitHub projects to your portfolio.
- Tiler
The ability to work with image data is much sought after in the industry, which comes as no surprise. Images are being uploaded and published at an unprecedented rate, and the pace will only increase in the coming years. Tiler is a simple tool that builds an image out of many smaller images, or tiles. The tiles can come in all shapes and sizes, so the possibilities are endless.
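The idea of building an image out of tiles can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not Tiler's own algorithm: the image is a small grayscale grid of brightness values, each tile is represented only by a name and an average brightness, and the function names are hypothetical.

```python
from typing import Dict, List


def block_mean(image: List[List[int]], top: int, left: int, size: int) -> float:
    """Average brightness of the size x size block starting at (top, left)."""
    values = [
        image[r][c]
        for r in range(top, top + size)
        for c in range(left, left + size)
    ]
    return sum(values) / len(values)


def build_mosaic(
    image: List[List[int]], tiles: Dict[str, float], size: int
) -> List[List[str]]:
    """Replace each block of the image with the name of the closest tile.

    `tiles` maps a tile name to its average brightness (an assumption made
    for this sketch; a real tool would compare actual tile images).
    """
    rows = len(image) // size
    cols = len(image[0]) // size
    mosaic = []
    for block_row in range(rows):
        row = []
        for block_col in range(cols):
            mean = block_mean(image, block_row * size, block_col * size, size)
            # Pick the tile whose brightness is nearest to the block's mean.
            best = min(tiles, key=lambda name: abs(tiles[name] - mean))
            row.append(best)
        mosaic.append(row)
    return mosaic


if __name__ == "__main__":
    image = [
        [0, 0, 255, 255],
        [0, 0, 255, 255],
        [120, 130, 10, 20],
        [125, 125, 30, 20],
    ]
    tiles = {"dark": 20, "mid": 128, "light": 240}
    print(build_mosaic(image, tiles, size=2))
```

Each block of the picture is swapped for whichever tile matches it best. With real images, the same loop would compare colors or pixels instead of a single brightness value.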
- DeepPrivacy
In today’s digital world, privacy is in short supply: every form of online activity is recorded, stored, analysed, and used to serve customized ads and product suggestions. One of the major drawbacks of this lack of privacy is the manipulation of images. DeepPrivacy is a fully automatic anonymization technique for images.
These are some of the data science projects that are all the rage. While you learn about or build these projects, you should also be aware of other projects on GitHub, such as TubeMQ and DeepCTR.