Phishing/Spam Email Detection with Natural Language Processing and Machine Learning
DOI: https://doi.org/10.47392/IRJAEH.2026.0066
Keywords: Phishing Detection, Spam Detection, NLP, Machine Learning, BERT, URL Reputation, Email Metadata, Flask API, Explainability
Abstract
This work presents a practical email threat detection framework aimed at identifying both conventional spam and advanced phishing attacks by integrating linguistic analysis with contextual sender intelligence. Instead of relying solely on message content, the proposed system jointly models textual representations derived from statistical and transformer-based embeddings, domain and URL credibility indicators, and authentication-related email metadata. A range of learning algorithms is examined, from linear classifiers and ensemble methods to fine-tuned transformer architectures. In addition, a hybrid modeling strategy is introduced that fuses dense semantic representations with structured numerical features at the decision level. Experimental evaluation on a consolidated benchmark, constructed from multiple publicly available email and phishing corpora, demonstrates that multimodal feature fusion consistently improves detection reliability compared to single-source detectors. The most effective configuration, which integrates transformer-based text embeddings with sender authentication and URL-derived attributes, achieves detection accuracy in the mid-to-high ninety percent range while maintaining a low false positive rate. To support real-world applicability, the system is implemented as a RESTful service and validated through client-side integrations for both enterprise email platforms and standard mail protocols. The study also addresses model interpretability, data privacy considerations, and operational constraints, and outlines future extensions focused on robustness against adversarial manipulation and support for multilingual email streams.
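As a rough illustration of decision-level fusion, the sketch below scores the message text and the sender/URL metadata separately, then combines the two probabilities with a weighted average. It is not the authors' implementation: the keyword weights, the `EmailMeta` fields, the metadata coefficients, and the 0.6 fusion weight are all hypothetical stand-ins for models that would in practice be trained (for example, a fine-tuned transformer for the text branch).

```python
import math
import re
from dataclasses import dataclass


def sigmoid(z: float) -> float:
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical keyword weights standing in for a trained text model.
TEXT_WEIGHTS = {"verify": 1.2, "urgent": 1.0, "password": 1.4, "account": 0.6}
TEXT_BIAS = -2.0


@dataclass
class EmailMeta:
    """Hypothetical structured features: authentication results and URL traits."""
    spf_pass: bool
    dkim_pass: bool
    num_urls: int
    url_uses_ip: bool


def text_score(body: str) -> float:
    """Text branch: probability of phishing from message content alone."""
    tokens = re.findall(r"[a-z]+", body.lower())
    z = TEXT_BIAS + sum(TEXT_WEIGHTS.get(t, 0.0) for t in tokens)
    return sigmoid(z)


def meta_score(meta: EmailMeta) -> float:
    """Metadata branch: probability of phishing from sender and URL signals."""
    z = -1.5  # assumed bias, not a fitted value
    z += 0.0 if meta.spf_pass else 1.5
    z += 0.0 if meta.dkim_pass else 1.5
    z += 0.3 * min(meta.num_urls, 5)
    z += 2.0 if meta.url_uses_ip else 0.0
    return sigmoid(z)


def fused_score(body: str, meta: EmailMeta, text_weight: float = 0.6) -> float:
    """Decision-level fusion: weighted average of the two branch probabilities."""
    return text_weight * text_score(body) + (1.0 - text_weight) * meta_score(meta)


def classify(body: str, meta: EmailMeta, threshold: float = 0.5) -> str:
    """Label a message by thresholding the fused probability."""
    return "phishing" if fused_score(body, meta) >= threshold else "legitimate"


if __name__ == "__main__":
    suspicious = EmailMeta(spf_pass=False, dkim_pass=False, num_urls=2, url_uses_ip=True)
    benign = EmailMeta(spf_pass=True, dkim_pass=True, num_urls=0, url_uses_ip=False)
    print(classify("Urgent: verify your account password now", suspicious))
    print(classify("Meeting notes attached for tomorrow", benign))
```

Because each branch produces its own probability before the combination step, either branch can be swapped out or retrained independently, and the fusion weight and threshold can be tuned on validation data to trade detection rate against false positives.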
License
Copyright (c) 2026 International Research Journal on Advanced Engineering Hub (IRJAEH)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.