
April 20, 2022

The Challenges of In-House NLP

You have hired an in-house team of AI and NLP experts and are about to task them with developing a custom Natural Language Processing (NLP) application that matches your specific requirements. Do not think your problems are solved yet. Developing NLP projects in-house is a long journey that is fraught with high costs and risks.

One thing is for sure: in-house NLP capability is expensive

The costs of NLP development fall roughly into four categories: personnel, infrastructure, time, and data. A lean NLP development team likely requires at least three employees: a machine learning researcher, a machine learning engineer, and a more general software or data engineer. Even for such a bare-bones team, businesses are looking at yearly expenses in the hundreds of thousands of dollars, and that is only the minimum developer expenditure. Attracting the requisite AI talent to your organization will also prove much more difficult than you might imagine: good data scientists are a scarce and highly coveted resource.

In addition to personnel expenses, training and running machine learning models takes time and requires substantial computational infrastructure. Many modern deep learning models contain millions, or even billions, of parameters that must be learned during training. These models can take months to train and require fast machines with expensive GPU or TPU hardware.
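To give a rough sense of why parameter counts translate into hardware costs, here is a minimal Python sketch that estimates the memory needed just to hold a model's training state. It assumes plain 32-bit floats and the Adam optimizer, so each parameter carries four values (weight, gradient, and two moment estimates) at 4 bytes each. Activations, batch size, and framework overhead are ignored, and the function name `training_memory_gb` is purely illustrative.

```python
# Rough memory estimate for the training state of a dense model trained with Adam.
# Assumption: weights, gradients and Adam's two moment estimates are all stored
# as 32-bit floats (4 bytes each); activations and framework overhead are ignored.
BYTES_PER_FP32 = 4
TENSORS_PER_PARAM = 4  # weight, gradient, Adam first moment, Adam second moment


def training_memory_gb(num_params: int) -> float:
    """Return the approximate training-state memory in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_FP32 * TENSORS_PER_PARAM / 1e9


if __name__ == "__main__":
    model_sizes = [
        ("BERT-base-sized (110M parameters)", 110_000_000),
        ("1B parameters", 1_000_000_000),
        ("GPT-3-sized (175B parameters)", 175_000_000_000),
    ]
    for label, n in model_sizes:
        print(f"{label}: ~{training_memory_gb(n):,.1f} GB")
```

Under these assumptions, a BERT-base-sized model needs under 2 GB of training state, a 1-billion-parameter model about 16 GB, and a GPT-3-sized model about 2.8 TB, far more than a single accelerator can hold. Real training setups often use mixed precision and sharding, which change these numbers, so treat them as order-of-magnitude figures only.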

Data is another substantial expense of homegrown NLP projects. Especially now, in the age of deep learning, NLP models for Intelligent Document Processing (IDP) require large amounts of training data. Even if pre-trained versions make these models more accessible to business users, they still need somewhere in the range of 5,000 to 10,000 annotated documents. That raises two critical questions: Do you have enough sample data that is representative of your use case? Will your business users have enough time to annotate it?


Your chance of success largely depends on your use case

There are many resources available, some of them open source, that make training your own models easy. It is tempting to think that your in-house team can now solve any NLP challenge. This is not the case. These tools perform well for certain NLP tasks, but not for all.

If you want to develop your own chatbot or question-answering tool, chances are good that your in-house NLP team will get good results with widely available models like BERT or GPT-3. The same holds for other NLP tasks, such as summarization, machine translation, and text generation, that can be handled successfully by Transformer models.

But if your use case involves broader NLP tasks such as parsing, searching, and classifying unstructured documents, you are looking at a long, experimental journey with an uncertain outcome. In that case, you should not try to reinvent the wheel, but rather talk to NLP experts like Cortical.io that have already implemented similar solutions in a business context and have poured many years of focused effort into refining their systems to be state-of-the-art.

Read our white paper, "Build or Buy: What is the best solution to process unstructured text?", to get additional insights.
