
A Deep Reinforcement Learning Approach to Dynamic E-commerce Pricing Under Supply Chain Disruption Risk

Abstract

This paper presents a novel deep reinforcement learning (DRL) approach to dynamic pricing in e-commerce environments subject to supply chain disruption risk. Traditional pricing strategies often fail to adapt to supply chain disruptions, resulting in suboptimal revenue, increased stockouts, and diminished market share. We formulate the dynamic pricing problem as a Markov Decision Process (MDP) whose state space incorporates both market conditions and supply chain status indicators. The proposed dual-stream neural network architecture processes pricing history and supply chain disruption signals in parallel, enabling context-appropriate pricing decisions that balance immediate revenue optimization with long-term resilience. Extensive experiments in a simulation environment with 237 SKUs across 6 product categories demonstrate that the DRL approach outperforms traditional pricing strategies by 4.9% in revenue and 5.1% in profit margin under normal market conditions. More significantly, during supply chain disruptions the DRL model maintains 83.4% of normal operational performance, compared with 61.7% to 72.3% for conventional approaches. Evaluation across multiple metrics shows that the proposed method mitigates the negative impacts of various disruption scenarios, including transportation failures, supplier bankruptcies, and pandemic-related restrictions, while remaining computationally efficient enough for real-time implementation. The research contributes both to the theoretical understanding of resilient pricing mechanisms and to practical applications for e-commerce businesses operating in volatile supply environments.
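The dual-stream idea described in the abstract can be illustrated with a small sketch. The abstract does not specify layer sizes, features, the action set, or the learning algorithm, so everything below is an assumption made for illustration: a value-based agent choosing among three hypothetical price multipliers, a state holding three recent prices and two disruption indicators, and randomly initialized (untrained) weights. It shows only the forward pass, in which each input stream is encoded separately and the two encodings are concatenated before a shared output layer.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class PricingState:
    """Hypothetical MDP state: market features plus supply chain indicators."""
    price_history: List[float]       # recent prices for one SKU (assumed feature)
    disruption_signals: List[float]  # e.g. supplier delay, stock risk (assumed)


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, weights, bias):
    """Affine map: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


class DualStreamQNetwork:
    """Illustrative two-stream value network (not the authors' architecture)."""

    def __init__(self, n_price, n_signal, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)
        self.price_stream = init_layer(n_price, n_hidden, rng)
        self.signal_stream = init_layer(n_signal, n_hidden, rng)
        self.head = init_layer(2 * n_hidden, n_actions, rng)

    def q_values(self, state):
        # Encode each stream separately, then concatenate the encodings.
        h_price = relu(linear(state.price_history, *self.price_stream))
        h_signal = relu(linear(state.disruption_signals, *self.signal_stream))
        merged = h_price + h_signal
        # Linear output layer: one estimated value per pricing action.
        return linear(merged, *self.head)

    def greedy_action(self, state):
        q = self.q_values(state)
        return max(range(len(q)), key=q.__getitem__)


# Assumed discrete action set: scale the current price down, keep it, or raise it.
PRICE_MULTIPLIERS = [0.9, 1.0, 1.1]

net = DualStreamQNetwork(n_price=3, n_signal=2, n_hidden=4,
                         n_actions=len(PRICE_MULTIPLIERS))
state = PricingState(price_history=[19.9, 20.5, 21.0],
                     disruption_signals=[0.7, 0.2])
action = net.greedy_action(state)
new_price = state.price_history[-1] * PRICE_MULTIPLIERS[action]
```

Keeping the two encoders separate lets the disruption signals be represented independently of price dynamics before the decision layer combines them. A trained agent would learn these weights from simulated or historical interaction rather than use random ones.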

Keywords

Deep Reinforcement Learning, E-commerce Pricing, Supply Chain Disruption, Resilience Optimization

