ChatGPT’s code suggestion accuracy evaluation
Lai, Uyen (2024)
All rights reserved. This publication is copyrighted. You may download, display, and print it for your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-2024060621751
Abstract
Generative AI is becoming increasingly prominent in software development, and AI-powered programming tools are now an essential part of many developers' workflows. While many studies have examined generative AI tools, for example how people use them or how they influence developers, there has been limited research into the accuracy of the code these tools recommend. This thesis presents an empirical evaluation of generative AI's code-generation accuracy, using ChatGPT as the main tool.
The research combines quantitative and qualitative methods to gain a more comprehensive understanding of the subject and to address the research questions. The literature review employs a qualitative method to examine the research problem, while the practical part of the study applies a quantitative approach to evaluate the correctness of ChatGPT's responses by testing 30 randomly selected questions from the LeetCode platform.
The results show that ChatGPT's solutions are effective for simple coding problems, but their effectiveness decreases as the challenges become more complex. ChatGPT's solutions are often accurate, yet their high memory usage, often near the upper limit of submitted solutions, indicates that they may not be optimal in many cases. The findings also suggest that ChatGPT's below-average performance is likely due to the quality of its training data rather than semantic inaccuracies alone.
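For illustration, the evaluation procedure summarized above can be sketched in Python. The snippet below is a minimal, hypothetical outline of the workflow (randomly sampling problems and aggregating accepted/failed submissions per difficulty level); the problem pool, function names, and result fields are assumptions for illustration only and do not represent the thesis's actual tooling, which relies on submitting ChatGPT's answers to LeetCode and reading the platform's verdicts.

```python
import random

# Hypothetical problem pool; in the study, problems come from LeetCode and the
# outcomes (accepted/failed, runtime, memory) are read from the platform after
# submitting ChatGPT's answer.
problem_pool = [
    {"id": 1, "title": "Two Sum", "difficulty": "Easy"},
    {"id": 4, "title": "Median of Two Sorted Arrays", "difficulty": "Hard"},
    # ... further problems ...
]

def sample_problems(pool, n=30, seed=42):
    """Randomly select n problems for evaluation (seeded for reproducibility)."""
    rng = random.Random(seed)
    return rng.sample(pool, min(n, len(pool)))

def summarize(results):
    """Aggregate accepted vs. total submissions per difficulty level."""
    summary = {}
    for r in results:
        stats = summary.setdefault(r["difficulty"], {"accepted": 0, "total": 0})
        stats["total"] += 1
        if r["accepted"]:
            stats["accepted"] += 1
    return summary

# Example: manually recorded verdicts after submitting ChatGPT's solutions.
results = [
    {"difficulty": "Easy", "accepted": True},
    {"difficulty": "Hard", "accepted": False},
]
print(summarize(results))
```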