Phishing/Spam Email Detection with Natural Language Processing and Machine Learning

Authors

  • Sahil Gedam, PG Scholar, Dept. of Artificial Intelligence and Data Science, Wainganga College of Engineering and Management, Dongargaon, Maharashtra
  • Prof. Tara Shende, Associate Professor, Dept. of Artificial Intelligence and Data Science, Wainganga College of Engineering and Management, Dongargaon, Maharashtra

DOI:

https://doi.org/10.47392/IRJAEH.2026.0066

Keywords:

Phishing Detection, Spam Detection, NLP, Machine Learning, BERT, URL Reputation, Email Metadata, Flask API, Explainability

Abstract

This work presents a practical email threat detection framework that identifies both conventional spam and advanced phishing attacks by integrating linguistic analysis with contextual sender intelligence. Instead of relying solely on message content, the proposed system jointly models three feature sources: textual representations derived from statistical and transformer-based embeddings, domain and URL credibility indicators, and authentication-related email metadata. A range of learning algorithms is examined, from linear classifiers and ensemble methods to fine-tuned transformer architectures. In addition, a hybrid modeling strategy is introduced that fuses dense semantic representations with structured numerical features at the decision level. Experimental evaluation on a consolidated benchmark, constructed from multiple publicly available email and phishing corpora, shows that multimodal feature fusion consistently improves detection reliability compared to single-source detectors. The most effective configuration, which integrates transformer-based text embeddings with sender authentication and URL-derived attributes, achieves detection accuracy in the mid-to-high ninety percent range while maintaining a low false positive rate. To support real-world applicability, the system is implemented as a RESTful service and validated through client-side integrations for both enterprise email platforms and standard mail protocols. The study also addresses model interpretability, data privacy considerations, and operational constraints, and outlines future extensions focused on robustness against adversarial manipulation and support for multilingual email streams.
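As an illustration of the decision-level fusion idea described in the abstract, the following is a minimal sketch only, not the authors' implementation. It assumes a text classifier already produces a phishing probability, derives a few structured signals (SPF/DKIM results and URL attributes) using the Python standard library, and combines the two scores with a weighted average. The function names, the rule-based metadata score, and all weights are hypothetical choices made for this example; the paper's actual models and fusion rule are not specified on this page.

```python
import re
from email import message_from_string
from urllib.parse import urlparse

# Hypothetical patterns for this sketch: plain http(s) URLs and IPv4-literal hosts.
URL_RE = re.compile(r"https?://[^\s<>\"']+")
IP_HOST_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def metadata_features(raw_email: str) -> dict:
    """Extract a few structured signals from a raw, non-multipart email."""
    msg = message_from_string(raw_email)
    auth = (msg.get("Authentication-Results") or "").lower()
    body = msg.get_payload()
    if not isinstance(body, str):  # multipart bodies are ignored in this sketch
        body = ""
    urls = URL_RE.findall(body)
    hosts = [urlparse(u).hostname or "" for u in urls]
    return {
        "spf_pass": "spf=pass" in auth,
        "dkim_pass": "dkim=pass" in auth,
        "url_count": len(urls),
        "ip_host_url": any(IP_HOST_RE.match(h) for h in hosts),
    }


def metadata_score(features: dict) -> float:
    """Rule-based stand-in for a trained metadata model (weights are assumptions)."""
    score = 0.2
    if not features["spf_pass"]:
        score += 0.25
    if not features["dkim_pass"]:
        score += 0.25
    if features["ip_host_url"]:
        score += 0.3
    return min(score, 1.0)


def fuse(text_prob: float, meta_prob: float, text_weight: float = 0.6) -> float:
    """Decision-level fusion: weighted average of two per-source probabilities."""
    return text_weight * text_prob + (1.0 - text_weight) * meta_prob


SAMPLE = (
    "From: support@example.com\n"
    "Authentication-Results: mx.example.com; spf=fail; dkim=none\n"
    "Subject: Verify your account\n"
    "\n"
    "Please verify at http://192.168.0.1/login today.\n"
)

if __name__ == "__main__":
    feats = metadata_features(SAMPLE)
    # 0.9 stands in for the output of a text model (e.g. a fine-tuned transformer).
    print(feats, fuse(0.9, metadata_score(feats)))
```

In practice, the rule-based `metadata_score` would be replaced by a model trained on the structured features, and the fusion weight would be tuned on validation data rather than fixed by hand.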

Published

2026-02-14

How to Cite

Phishing/Spam Email Detection with Natural Language Processing and Machine Learning. (2026). International Research Journal on Advanced Engineering Hub (IRJAEH), 4(02), 488-495. https://doi.org/10.47392/IRJAEH.2026.0066
