Daxtra Blog

CV Parsing Fact or Fiction: You Decide

Written by M. Christine Watson | Feb 26, 2014 5:34:50 PM

Different parsers perform with different accuracy on different sets of data. So if accurate parsing is important to you, the only way to find out which parser is best is to test each candidate on a sample of your own data.

We always invite prospective customers to test the accuracy of our parser on their own CV/Resumes, and when they do, they usually find it to be the most accurate amongst all the parsing tools that they have previously tested.

Usual claims about CV/Resume parsers

There are many different claims out there about parsers, some of which are more true than others. Here are the two most common:

"We have the best/most accurate parser in the world".

This depends on the data the parser is supposed to process. Resumes vary greatly depending on who wrote them, the incoming document format, what language they are written in, and so on.

For example, a single parser is likely to show very different accuracy on resumes sourced from the UK or Ireland than on resumes from the US or Australia, and it will certainly show very different accuracy on resumes written in different languages. The only real way to find out whether a parser is accurate enough for you is to test it on your data, or to ask someone who processes very similar data.
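As a rough illustration of what testing on your own data can look like, the sketch below scores parser output against hand-checked values and reports accuracy per group (for example per country or language). Everything in it is hypothetical: the field names, the sample records and the exact-match scoring rule are assumptions made for the example, not a description of how any particular parser is evaluated.

```python
from collections import defaultdict


def field_accuracy(gold, predicted):
    """Fraction of hand-checked fields the parser got right.

    Assumed scoring rule: exact match after trimming whitespace
    and ignoring case. A real evaluation may need looser matching.
    """
    if not gold:
        return 0.0
    correct = sum(
        1
        for field, value in gold.items()
        if str(predicted.get(field, "")).strip().lower() == str(value).strip().lower()
    )
    return correct / len(gold)


def accuracy_by_group(samples):
    """Mean field accuracy per group.

    samples: iterable of (group, gold_fields, predicted_fields) tuples.
    """
    scores = defaultdict(list)
    for group, gold, predicted in samples:
        scores[group].append(field_accuracy(gold, predicted))
    return {group: sum(vals) / len(vals) for group, vals in scores.items()}


# Hypothetical sample: one UK resume and one US resume.
samples = [
    ("UK", {"name": "Ann Lee", "title": "Nurse"}, {"name": "Ann Lee", "title": "Nurse"}),
    ("US", {"name": "Bob Ray", "title": "Welder"}, {"name": "Bob Ray", "title": "Builder"}),
]
print(accuracy_by_group(samples))
```

Running the same scoring over each candidate parser's output, on the same sample of resumes, gives figures that can be compared directly.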

If accuracy is your concern (and it should be), we at Daxtra are always happy to help you evaluate our parser on your data, because we have found that when people evaluate us, we are nearly always measured to be the most accurate.

"You can train our parser automatically to accurately parse any resumes irrespective of location, language or specialism".

This half-truth, propagated by statistical parser companies, is often used to explain why they were evaluated as second- or third-best.

It is indeed always possible to increase accuracy somewhat by training a statistical system on new data, but reaching sufficient accuracy this way usually requires a large effort. The statistical algorithms used by these parsers will already have been trained on huge amounts of data, much of which will probably be similar to yours. The problem is what happens when the parser has to work "out of the box".

If you send the data that you are evaluating on for "training", you should not be surprised if the parser suddenly performs well on this particular data. That is what training is all about, after all. However, it is very important to re-evaluate the parser on different data to see how much the training has improved accuracy on previously unseen data (which is how it will be used in a commercial setting). Often you will find that the "training" has little impact on performance on unseen data.
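One simple way to guard against this is to split your sample before sending anything to a vendor: one part may be used for training, the other is kept back and only ever used for scoring. The sketch below shows such a split. The function name, the 50/50 default and the fixed random seed are assumptions made for illustration.

```python
import random


def split_for_evaluation(documents, held_out_fraction=0.5, seed=0):
    """Shuffle the sample and split it into two disjoint parts.

    Returns (training_set, held_out_set). Only training_set should be
    sent for "training"; held_out_set stays unseen and is used to
    measure accuracy afterwards.
    """
    docs = list(documents)
    random.Random(seed).shuffle(docs)  # fixed seed keeps the split repeatable
    n_held_out = int(len(docs) * held_out_fraction)
    return docs[n_held_out:], docs[:n_held_out]


# Hypothetical sample of ten resume file names.
resumes = [f"resume_{i}.pdf" for i in range(10)]
training_set, held_out_set = split_for_evaluation(resumes)
print(len(training_set), len(held_out_set))
```

Comparing accuracy on the held-out part before and after training shows how much of the improvement carries over to data the parser has not seen.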

At DaXtra we continually train and improve our parser to make sure that everyone gets the most accurate parser possible when they apply it within their business.

So, what are the key measurements of a good parser?