Comparative Analysis of Large Language Models' Performance in Identifying Different Types of Code Defects During Automated Code Review
Abstract
Integrating Large Language Models (LLMs) into automated code review shows promise for improving software quality assurance. This research presents a comparative analysis of several LLMs on identifying different categories of code defects during automated review workflows. Our experimental framework evaluates GPT-4, Claude-3, CodeBERT, and GraphCodeBERT across six defect types: security vulnerabilities, logic errors, performance issues, code smells, maintainability violations, and syntax inconsistencies. Through systematic testing on a curated dataset of 5,000 code samples from open-source repositories, we measured precision, recall, F1-score, and processing latency for each model-defect combination. Results indicate significant variation in detection capability: transformer-based models achieved 87.3% average accuracy on security vulnerabilities but only 62.1% on subtle logic errors. These findings expose LLM limitations and provide empirical evidence for model selection in production environments. The study contributes practical guidelines for implementing LLM-assisted code review systems and identifies specific areas that require human oversight to maintain code quality standards.
Keywords
Large Language Models, Automated Code Review, Defect Detection, Software Quality Assurance
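As an illustration of the per-combination metrics named in the abstract, the following minimal sketch computes precision, recall, and F1-score for each (model, defect type) pair from binary detection outcomes. It is not the study's evaluation code: the record layout, the function names `prf` and `score_by_combination`, and the toy labels are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Metrics(NamedTuple):
    precision: float
    recall: float
    f1: float


def prf(tp: int, fp: int, fn: int) -> Metrics:
    """Precision, recall, and F1 from confusion counts (0.0 when undefined)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return Metrics(precision, recall, f1)


# Hypothetical record layout: (model, defect_type, flagged_by_model, truly_defective)
Record = Tuple[str, str, bool, bool]


def score_by_combination(records: Iterable[Record]) -> Dict[Tuple[str, str], Metrics]:
    """Aggregate confusion counts per (model, defect_type) and score each pair."""
    counts = defaultdict(lambda: [0, 0, 0])  # [tp, fp, fn]
    for model, defect, predicted, actual in records:
        c = counts[(model, defect)]
        if predicted and actual:
            c[0] += 1  # true positive: defect present and flagged
        elif predicted and not actual:
            c[1] += 1  # false positive: flagged but no defect
        elif actual and not predicted:
            c[2] += 1  # false negative: defect missed
        # true negatives do not enter precision, recall, or F1
    return {key: prf(*c) for key, c in counts.items()}


# Toy data for illustration only; these are not results from the study.
toy_records = (
    [("model_a", "security", True, True)] * 3
    + [("model_a", "security", True, False)] * 1
    + [("model_a", "security", False, True)] * 2
    + [("model_a", "logic", False, False)] * 4
)
scores = score_by_combination(toy_records)
```

Keying the counts by (model, defect type) keeps each combination's score separate, so a model's weakness on one defect category is not hidden by averaging over the others.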