Anonymous Coding Tests Don’t Remove Bias

Laura Tacho
9 min read · Oct 12, 2021

This is Part 2 in a five-part series on building diverse software engineering teams. Part 1 covers the importance of using demographic data during the interview stage.

I’ve had this same conversation with dozens of software engineering leaders.

Them: “I’m having a hard time hiring gender minorities and people of color. What can I do?”

Me: “Have you looked at biases in your interview process that might be putting these communities at a disadvantage?”

Them: “Oh no, our interview process definitely isn’t biased. We do an anonymous code test as the first step to remove bias.”

*1980s record scratch*

This is where the conversation can take a difficult turn.

Anonymous interview stages, and specifically a technical screening, don’t remove bias from your interview process.

There are a few reasons for this, but two big ones stand out. First, these types of technical screenings require the candidate to have spare time, and to already believe they will fit in on the team. Both of these conditions favor certain communities and disadvantage others. Second, teams who have anonymous technical screens often view them as a signal that their interview process is not biased. The screening can be used as a free pass that allows teams not to examine their processes for other sources of bias.

I do believe (and research shows) that most interview processes that include an anonymous coding test were created by people who really do care about fair hiring processes. These people just might not have the tools or data to understand why those practices aren’t leading to the outcomes they expect.


Some context and a historical example

Folks in and outside of tech love to point to a specific study about orchestras to explain why they have an anonymous technical screen. In the USA, orchestras used to offer positions to musicians selected by the conductor, which led to very poor representation of women. Sometime in the 70s, many orchestras updated their policies to have open auditions, but they hid the musicians’ identities from the judging panel. As a result, the percentage of women in high-ranking orchestras increased from 6% in 1970 to 21% in 1993. On the surface, this policy change was a success.

Like a lot of diversity in tech initiatives, the orchestra example is exclusively about women, and specifically white women. In 2014, the percentage of people of color in orchestras was still extremely low, with just 14% of members being non-white. Orchestras still remain some of the most racially homogenous institutions today.

The policies adopted by orchestras in the 70s have come under scrutiny in recent years, most notably from the New York Times, where a critic argued that these anonymous auditions hurt more than help diversity initiatives. They prevent orchestras from representing the communities they serve, specifically as it relates to race.

Since the policy change did have a positive effect on the representation of women, it’s often cited as proof that anonymous hiring stages bolster diversity. Folks in tech have taken hiring practices for orchestras and applied them to software teams, no doubt with the best of intentions in most cases.

Musicians and the arts have immense value in our society, and representation matters. Neither point is up for debate here. But an orchestra isn’t responsible for designing autonomous driving software that may not recognize people with dark skin tones.


Research shows mixed results

So we know that anonymous auditions worked for white women in orchestras, but what does research say about these methods in general?

In 2018, the IZA Institute of Labor Economics published a study reporting on the impact of anonymous hiring practices on diversity. They found that while anonymous application screening can reduce bias in organizations where discrimination is high, it also has significant downsides, most of which are rarely discussed:

  • Bias is moved to later stages in the process.
  • These practices can work against other diversity initiatives which require intentionally building a team with a specific composition.
  • The practice can still disadvantage candidates by presenting important data out of context.

There are two positives about anonymous applicant screenings, though. They are fairly easy to implement through applicant tracking system (ATS) tools like Greenhouse or Lever, and they are a strong signal that the hiring team does care about equitable hiring. But anonymous screenings alone can’t remove bias.

A first-stage anonymous coding test is inherently biased

Let’s bring this back to software engineering hiring. The anonymous technical screening has gained popularity as a way to reduce bias because of the success of policy changes like the one in the orchestra example above. But the optimistic belief that it works to increase representation for all minority groups isn’t necessarily backed by research.

Anonymous technical screenings are often implemented with the joint goals of increasing efficiency and reducing bias. In an effort to advance only candidates who are the best fit for the role, you ask them to complete a technical challenge as part of the application process, or before the first interview. These exercises can range from code riddles like reversing an array in place to full-on feature development that teeters on the line between an interview exercise and unpaid work. Some of these challenges have a time limit, some don’t. Either way, there’s really no way to tell how long the candidate spent on the exercise unless you pay for a premium tool to manage the exercises.
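For a sense of scale, the riddle-style end of that spectrum is often as small as the snippet below, a minimal Python sketch of the in-place array reversal mentioned above (the function name and the check are illustrative only, not from any particular screening tool):

```python
def reverse_in_place(items):
    """Reverse a list in place with two pointers, without allocating a new list."""
    left, right = 0, len(items) - 1
    while left < right:
        items[left], items[right] = items[right], items[left]
        left += 1
        right -= 1
    return items


# Illustrative check of the expected behavior
assert reverse_in_place([1, 2, 3, 4]) == [4, 3, 2, 1]
```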

What you gain in efficiency, you lose in a diverse candidate pool. The outcome of a system like this is often a group of candidates that looks an awful lot like the members who are already on the team, and not a group with the representation that the hiring team wants to see.

This is because the people who can afford to complete these code screenings as a first interview stage are generally people who are already employed, have completed their education, have ample free time, and already feel like they belong on the team. This excludes many minority groups, for example Black women or working parents.

Here are some questions to guide your reflection on why this is. If your interview process includes an early anonymous technical screening, or if you’re asking applicants to complete a take-home coding test that requires more than 2 hours of commitment, these questions can help you uncover why it may be working against your diversity efforts.

  • Who has time to complete this? People with caregiving responsibilities, people who are pursuing additional education, or people who work another job are at a disadvantage right out of the gate, as they have less free time than someone without those responsibilities. Many people without a lot of free time are also from underrepresented communities. This disparity is particularly visible in longer coding challenges, especially ones that are not managed by a tool which tracks the amount of time used. People who have more flexible schedules have the ability to spend more time on the challenge. Someone with less free time might not have the ability to spend extra time polishing their submission or adding nice-to-haves. In this situation, the person with more time will always come out on top, not because of talent or ability, but because of other factors. It’s not a fair comparison.
  • Who can afford to complete this? Looking at the points above, what will it cost your candidate to complete your exercise on their own time? A 4-hour coding challenge means 4 hours of child care, or 4 hours they could be using their skills in another way, whether for their current role or for learning something new. People also need to rest.
  • Who is in a position to invest that much time in a company where they don’t have any idea about the team they’d be joining? For underrepresented folks, joining an inclusive team where they won’t be tokenized is extremely important. But anonymous hiring stages are designed to be completely opaque. With no data about whether the team is the right fit, and without the confidence that they could see themselves on the team, folks from underrepresented communities are more likely to pass on your opportunity in favor of an employer who will invest the time to make a connection with them. Who is in a position not to care about team demographics, or who feels confident that they’re going to belong no matter what? Folks from the existing dominant culture.
  • Who designed the exercise, and what do they value? Your evaluators are going to add their own biases and preferences to the evaluation criteria; it’s human nature. Adding structure here can help bring those preferences to the forefront. Is there a standard rubric? Can someone pass the interview if their solution works, but they use an approach or library that the interviewer simply doesn’t prefer? Are the evaluation criteria shared with the candidate beforehand so they know what they are being evaluated against, or do you just expect them to guess which tradeoffs are the right ones? Do you really want or need to hire another engineer who thinks exactly like the other members of your team?
  • How would your current engineers do on this test? Ask your current engineers to take the challenge, following the same time limits and rules, and have those solutions evaluated anonymously by the evaluation team. I’ve had some previous teams go through this exercise only to realize that their own peers wouldn’t pass the interviews.
  • Who has the patience for this? Highly qualified inbound candidates and candidates coming from your outbound recruitment absolutely don’t have the patience for an interview stage that brings them no closer to answering the question “do I want to work here?” Especially at a time when the market is incredibly competitive, focus your efforts on getting candidates engaged with your team before plopping a technical screening on them.

A data blind spot

Anonymous stages are opaque by design. The reviewers don’t know about the demographic data of the applicants and vice versa. However, this often also means that demographic data is not collected whatsoever, even for analysis in aggregate, disassociated from any application or code challenge. This makes it nearly impossible to analyze your hiring funnel and address what is and isn’t helping you reach your hiring targets, or to see where bias may be present in your processes.

Not my team!

If you find yourself hate-reading this article and thinking “you’re wrong, this works for my team,” then I am happy for you. But where’s the data that backs that up? Can you prove that the anonymous coding interview does reduce bias by comparing the demographic data of applicants with that of last-round candidates? What about the data you’re not seeing: the candidates who wanted to apply, but didn’t?

Three other ways to reduce bias in interviewing

We’ve established that anonymous applicant screening and anonymous coding challenges as part of the application process are not as effective at reducing bias as hoped. But, there are some cases when anonymous stages can be quite successful in removing bias. There are also some other ways to reduce bias in your hiring processes without introducing anonymous stages.

  • Design your interview process with a work sample or assessment in a later stage, after the candidate has had time to meet some other team members and can imagine themselves joining the team. An anonymous review at this stage can help put the focus on the competencies of the candidate. But this only works if you’re evaluating multiple work samples concurrently, and if they can be generic enough not to signal which candidate created them, which is challenging. For work samples, share the evaluation criteria with the candidate. If you provide clear expectations for your current employees, extend that practice to your candidates.
  • Train your interviewers. In speaking with engineering leaders about challenges with hiring, I was regrettably unsurprised to hear that almost every company provided no training to interviewers. Forbes has some patterns of bias outlined in this article which are helpful to include in anti-bias training for interviewers.
  • Create structured interviews with a standard rubric. This reduces subjectivity in evaluation, because it holds interviewers accountable to explicit qualities and competencies that have been predefined. It also cuts down on informal conversation. Harvard Business Review has some guidance in this article.

Up next in this series:

Focusing too much on diversity without investing in equity and inclusion. What good is hiring talent from many different backgrounds if your company is set up to favor those already in the dominant culture?

Looking only for “senior” talent, or inflating job requirements in the job ad to cover every nice-to-have quality. Do you really need a senior engineer with 10+ years of Javascript experience to fix every UI bug?

Using the wrong outreach methods and messaging for the communities you want to see represented. How can you authentically let a candidate know that DEI is important to your team and company without it coming across as the only reason you’re approaching them?

🎉 I’ve launched a course called Measuring Development Team Performance. Stop measuring the wrong things and looking at team productivity dashboards that don’t mean anything to your team.


Laura Tacho

VP of Engineering turned engineering leadership coach. I moved off of Medium to lauratacho.com