Here at RecruitSumo, we strive to enable an unbiased assessment of potential candidates, and our platform supports blind testing of candidates to that end. However, there are a number of considerations that must be made in order to assess the true impartiality of our models.
Domain Bias vs Model Generality
Part of our underlying engine to assess a candidate is reliant on the application of machine learning. When applying machine learning, it is important to consider the effect of training data on the outcome of models.
A recent article presented a concerning review of bias in machine learning arising from inherent bias in the data used to train models. As we use big data approaches in our application, it is important to give clear indications of where bias may enter our models and how this can impact our assessment of a candidate.
Bias can come from a number of sources, from our sampling methods to the inherent bias in the implementation choices made by the developers whose source code we analyze.
It is important that we consider this and are transparent about it, so as not to unduly diminish the assessed ability of potential candidates for a role. In practice, we generate a number of different models at varying levels of specificity, ranging from language-agnostic models to language- and domain-specific ones.
Sampling Opensource Repositories
In order to train these models, we need to gather data. To do this we must select a sample of repositories from which to scrape raw source code to analyze.
There are several ways to do this. We could focus on specific metrics, such as the number of stars or forks, but this would prioritize larger code bases that are most likely skewed towards highly regarded coding practices.
Because the models need to learn from both positive and negative examples, we instead sample randomly and ensure that source code with both good and bad attributes is analyzed.
This random sampling provides a generalized model: by averaging over domain and purpose, the metrics and relationships in the code remain relatively unbiased and reflect general engineering practices.
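The contrast between metric-driven selection and random sampling can be sketched as follows. This is a minimal illustration, not our production pipeline; the repository records, field names, and sample sizes are hypothetical.

```python
import random

# Hypothetical repository metadata gathered before scraping.
repos = [
    {"name": "repo-a", "stars": 12000, "forks": 800},
    {"name": "repo-b", "stars": 35, "forks": 2},
    {"name": "repo-c", "stars": 4100, "forks": 310},
    {"name": "repo-d", "stars": 7, "forks": 0},
    {"name": "repo-e", "stars": 900, "forks": 95},
]

def sample_by_stars(repos, k):
    """Metric-based selection: takes the k most-starred repositories,
    skewing the sample toward large, popular code bases."""
    return sorted(repos, key=lambda r: r["stars"], reverse=True)[:k]

def sample_randomly(repos, k, seed=None):
    """Uniform random selection: keeps both polished and rough code in
    the training set, averaging over domain and purpose."""
    rng = random.Random(seed)
    return rng.sample(repos, k)
```

A fixed seed makes the random sample reproducible, which matters when a model's training set needs to be audited later for bias.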
It is possible to build more domain-specific models using tags or keywords within a repository's metadata. This introduces a controlled bias towards a certain developer profile while maintaining the impartiality of the underlying assessment.
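Tag-based selection amounts to a simple intersection filter over repository metadata. The sketch below assumes each repository record carries a list of tags; the records and tag names are illustrative only.

```python
def filter_by_tags(repos, wanted_tags):
    """Keep repositories whose metadata tags intersect the wanted set,
    yielding a domain-specific training sample."""
    wanted = set(wanted_tags)
    return [r for r in repos if wanted & set(r.get("tags", []))]

# Hypothetical tagged repositories.
tagged_repos = [
    {"name": "repo-a", "tags": ["fintech", "python"]},
    {"name": "repo-b", "tags": ["games", "cpp"]},
    {"name": "repo-c", "tags": ["fintech", "java"]},
]

fintech_sample = filter_by_tags(tagged_repos, ["fintech"])
```

Training on the filtered sample biases the model toward one domain by construction, while the downstream scoring of any individual candidate's code is unchanged.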
We are able to generate a multitude of different models, from those that are broad and transferable to many different engineering roles, to models that have controlled bias towards engineering traits from different industries.
RecruitSumo Inc., sharing our passion for machine learning and artificial intelligence. We specialize in predictive analytics for human capital, adding value by helping build the right organization, culture, team, and talent to succeed.