Data science, and within it machine learning, has seen an explosive rise in both interest and application in recent years. As a result, the job market has expanded quickly.
With demand showing no real sign of slowing and only a limited pool of experienced candidates with computer science degrees, the market has opened up to a diverse set of prospective candidates.
Many people are moving into the industry from backgrounds in the sciences and engineering, or after completing massive open online courses (MOOCs). In fact, Andrew Ng himself recently emphasized taking on interns who had completed his Deep Learning MOOC on Coursera.
Many applications of these methods also benefit from domain knowledge. With candidates coming from such diverse backgrounds, assessing their ability is complex.
Someone with a strong research background may not have the programming experience needed to pass the short, timed tests that many recruiters use to assess ability. However, this does not mean they cannot write production-ready code in industry.
To address this, we use a mixture of machine learning methods to assess numerous aspects of a candidate's coding ability and determine how well they fit a company's coding culture.
Recently, there has been a huge drive to open source much of today's code development. Many large technology companies that have typically kept their code bases private are opening up an important cache of code from which we can generate invaluable data.
One interesting post suggested that Microsoft and Google have a combined ~2,200 employees pushing to almost 2,000 top repositories on GitHub.
Individual programmers can also now publicly display many of their own projects on services such as GitHub. This gives us a wide range of projects on which to analyze trends, from large code bases with thousands of lines to smaller projects with a few hundred.
With all this code freely available to the world, we have developed an infrastructure that allows us to generate large data sets of pertinent information on coding traits employed within leading corporations or large swathes of industry.
Once data from these sources has been generated, we can use it not only to train the models and algorithms that assess candidates, but also to take a more research-oriented approach and dive deep into large, popular repositories.
By looking closely at these repositories, we can investigate coding metrics that tell us which projects are likely to be easy for new contributors to pick up and which are likely to be difficult to maintain. We can also plot how repositories change over time and across releases.
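The post does not say which metrics are used, so the following is only an illustrative sketch of one widely used maintainability signal, McCabe cyclomatic complexity, approximated with Python's standard `ast` module. The helper names (`cyclomatic_complexity`, `function_complexities`) and the set of node types counted as decision points are assumptions made for this example; dedicated tools count more cases.

```python
import ast


def cyclomatic_complexity(func: ast.AST) -> int:
    """Approximate McCabe complexity: 1 + number of decision points.

    Simplified assumption: nested functions are also counted inside
    their enclosing function's total.
    """
    complexity = 1
    for node in ast.walk(func):
        if isinstance(node, (ast.If, ast.For, ast.AsyncFor, ast.While,
                             ast.IfExp, ast.ExceptHandler, ast.comprehension)):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds one extra path per additional operand
            complexity += len(node.values) - 1
    return complexity


def function_complexities(source: str) -> dict:
    """Map each function name in `source` to its approximate complexity."""
    tree = ast.parse(source)
    return {
        node.name: cyclomatic_complexity(node)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


SAMPLE = '''
def simple(x):
    return x + 1

def branchy(xs):
    total = 0
    for x in xs:
        if x > 0 and x % 2 == 0:
            total += x
        elif x < 0:
            total -= x
    return total
'''

print(function_complexities(SAMPLE))  # {'simple': 1, 'branchy': 5}
```

Running this per file and per release tag would give one crude way to chart how a repository's complexity drifts over time; higher scores generally point to code that is harder to read and maintain.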
Taking a big data approach, we use machine learning to score prospective candidates across multiple facets. We assess both individual coding traits and wider company and industry trends, which lets us match candidates to specific roles.
Using a combination of supervised and unsupervised machine learning, we look at specific traits that give a picture of how a candidate is likely to perform in a role.
This methodology lets us follow an individual recruiter's requirements precisely, or let the data speak for itself and indicate which profile a candidate falls into.
We can then tune the underlying algorithms to prioritize company culture or wider industry trends without imposing our own biases.
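As a hedged illustration of the unsupervised side, the sketch below clusters candidates into profiles with plain k-means and then assigns a new candidate to the nearest profile. It is not RecruitSumo's method. The three features (comment density, mean function complexity, test-to-code ratio), their values, and the per-feature `weights` are all hypothetical. The weights are one simple way to show "prioritizing" certain traits: raising a weight makes that trait dominate the distance.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def weighted_distance(a: Vector, b: Vector, weights: Vector) -> float:
    """Euclidean distance with per-feature weights (higher weight = more influence)."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def kmeans(points: List[Vector], k: int, weights: Vector,
           iterations: int = 20) -> List[List[float]]:
    """Plain k-means; centroids are initialised deterministically from the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: weighted_distance(p, centroids[i], weights))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster empties out
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def assign_profile(candidate: Vector, centroids: List[List[float]],
                   weights: Vector) -> int:
    """Return the index of the profile (cluster centroid) closest to the candidate."""
    return min(range(len(centroids)),
               key=lambda i: weighted_distance(candidate, centroids[i], weights))


# Hypothetical, pre-scaled features: [comment_density, mean_complexity, test_ratio]
existing = [
    [0.10, 0.80, 0.05],
    [0.30, 0.20, 0.60],
    [0.12, 0.75, 0.10],
    [0.28, 0.25, 0.55],
]
weights = [1.0, 1.0, 1.0]  # equal priority for every trait
centroids = kmeans(existing, k=2, weights=weights)
print(assign_profile([0.25, 0.30, 0.50], centroids, weights))  # 1
```

In practice the features would need consistent scaling, and the number of profiles would have to be chosen from the data rather than fixed at two as it is here.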
In short, we can use machine learning to assess machine learning engineers, and any other software developers, when they apply for a new job.
RecruitSumo Inc.: sharing our passion for machine learning and artificial intelligence. We specialize in predictive analytics for human capital, adding value by helping build the right organization, culture, team, and talent to succeed.