Named Entity Recognition : Deep Learning with Automated Pipeline for Lead Processing
Nguyen, Phan Khanh (2020)
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-202005118219
Abstract
Over the past few years, the growing adoption of artificial neural networks has driven breakthroughs in many artificial intelligence tasks. Named Entity Recognition is a subtask of natural language processing that aims to detect and extract named entities from unstructured text. The goal of this thesis is to develop a functional Named Entity Recognition system for the company Vainu using an artificial neural network.
The final model combines several artificial neural network architectures, including a Recurrent Neural Network and a Convolutional Neural Network. Transfer learning methods such as word embeddings were also applied. The trained model was then deployed as a microservice using Python and Docker. A training pipeline for the Named Entity Recognition model, consisting of a continuous integration system with automated build and test processes, was also implemented.
Through extensive experimentation and testing, the objective of this thesis was accomplished. The final model performs the entity extraction task with high accuracy. The new Named Entity Recognition application gives Vainu an AI component that can be freely adapted to its requirements, improves the company's matching performance, and reduces operating expenses compared to using third-party software. The training pipeline was designed to be highly scalable, so that models for new languages can be added to the system with ease when necessary.
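To make the entity extraction task described above more concrete, the following is a minimal sketch of how per-token predictions from a sequence tagger might be grouped into named entities. It assumes the widely used BIO tagging scheme (B- begins an entity, I- continues it, O is outside any entity); the abstract does not state which scheme the thesis uses. The function name `decode_bio`, the example sentence, and the tags are all hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs.

    Hypothetical helper: the tagging scheme is an assumption, not taken
    from the thesis.
    """
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Store the entity collected so far (if any) and reset the buffer.
        nonlocal current_tokens, current_type
        if current_tokens and current_type is not None:
            entities.append((" ".join(current_tokens), current_type))
        current_tokens, current_type = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity.
            flush()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type extends the open entity.
            current_tokens.append(token)
        else:
            # "O", or an I- tag with no matching open entity, closes the
            # current entity; the stray token itself is discarded.
            flush()
    flush()
    return entities


# Made-up example sentence and tags, as a tagger might produce them.
tokens = ["Acme", "Corp", "opened", "an", "office", "in", "New", "York"]
tags = ["B-ORG", "I-ORG", "O", "O", "O", "O", "B-LOC", "I-LOC"]
entities = decode_bio(tokens, tags)
print(entities)
```

In a setup like the one the abstract outlines, a step of this kind would typically sit between the neural network's output and the microservice response, although the thesis may implement it differently.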