An analytical approach to analyze the popular word search from nineteen-year news dataset using Natural language processing technique
Hoque, Mohammad Mahmudul (2023)
All rights reserved. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2023062124248
Abstract
Natural language processing (NLP) saw significant growth in 2022 because of the growing availability of digital data. With the development of big data, it has become important to analyze massive databases and draw valuable conclusions from them. NLP is a subfield of computer science and artificial intelligence that deals with the interaction between human language and computers. It involves the development of algorithms and computational models that can analyze, understand, and generate human language. This study focuses on an algorithm for finding the most popular words in nineteen years of news data. Many different approaches can be used to implement word search games using NLP, and the specific approach depends on factors such as the size of the grid and the complexity of the clues. This thesis explores the use of NLP tools to identify the most popular words in the nineteen-year news dataset. The study employs a range of NLP techniques, such as vectorization, stemming, tokenization, and stop word removal, among others, to preprocess the data. After preprocessing, the study uses frequency analysis (Term Frequency-Inverse Document Frequency, TF-IDF) to identify the most commonly occurring words in the dataset. The words are then ranked by frequency to identify the most popular ones.
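The pipeline described in the abstract (tokenization, stop word removal, stemming, then TF-IDF ranking) can be illustrated with a minimal sketch. This is not the thesis implementation: the stop word list, the suffix-stripping `naive_stem` helper, the sample headlines, and the choice to sum TF-IDF scores across documents are all assumptions made here for illustration, and only the Python standard library is used.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop word list (assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "in", "on", "of", "to", "and", "is", "are", "for", "as"}


def naive_stem(token):
    """Strip a few common suffixes; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenize on letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def rank_terms(documents):
    """Return (term, score) pairs sorted by TF-IDF summed over all documents."""
    tokenized = [preprocess(d) for d in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = Counter()
    for tokens in tokenized:
        if not tokens:
            continue
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)
            # Smoothed IDF so that no term gets a zero or undefined weight.
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1
            scores[term] += tf * idf
    return scores.most_common()


# Hypothetical headlines standing in for the news dataset.
headlines = [
    "Police investigate fire in city centre",
    "Fire crews battle blaze as police close roads",
    "Council approves new city budget",
]
ranking = rank_terms(headlines)
for term, score in ranking[:5]:
    print(f"{term}: {score:.3f}")
```

Summing per-document TF-IDF scores is only one way to turn document-level weights into a corpus-level ranking; averaging the scores or ranking by raw term counts are equally plausible readings of "most popular", and the abstract does not say which one the study uses.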