Enhancing Real-Time Data Warehousing Through Intelligent ETL Pipeline Orchestration: A Comparative Study of Talend and IBM Data Stage

Authors

  • Jagadeesh Thiruveedula, Jawaharlal Nehru Technological University, Kakinada, Andhra Pradesh, India
  • Hari Krishn Gupta, University of Southern California, Los Angeles, California, United States
  • Kumaresan Durvas Jayaraman, Bharathidasan University, Tiruchirappalli, Tamil Nadu, India

DOI:

https://doi.org/10.47392/IRJAEH.2025.0464

Keywords:

Real-Time Data Warehousing, ETL Pipeline Orchestration, IBM Data Stage, Edge Computing Agents, Adaptive Resource Scheduling

Abstract

Organizations now rely on real-time data warehousing to act on continuously collected data. Reliable orchestration of ETL pipelines allows analytics platforms to deliver results quickly and consistently while handling large data volumes. This study evaluates Talend and IBM DataStage on simulated real-world streaming workloads, measuring end-to-end latency, sustainable throughput, processing requirements, and fault-recovery behavior. Talend showed a large increase in data-ingestion throughput while requiring only moderate CPU and memory. DataStage's optimized engine executed workloads quickly and recovered more efficiently from simulated errors, but it consumed the most resources. Both platforms face similar problems: minimizing changes to source systems during rapid data capture, adjusting resources as demand varies, and ensuring that incoming data is uniform. The study therefore proposes deploying lightweight ETL agents at the network edge to handle basic processing of data at its origin; incorporating adaptive, machine-learning-based planners to improve scheduling and resource management; and developing reliable methods for testing and assessing performance under different conditions. Further opportunities lie in building strong security and governance checks into the orchestration stages and in adopting event-driven, serverless architectures that run ETL processes as soon as data arrives. Taken together, these changes would allow ETL orchestration to become autonomous, dependable, and highly responsive in real-time data warehousing.

Published

2025-07-22

How to Cite

Enhancing Real-Time Data Warehousing Through Intelligent ETL Pipeline Orchestration: A Comparative Study of Talend and IBM Data Stage. (2025). International Research Journal on Advanced Engineering Hub (IRJAEH), 3(07), 3149-3153. https://doi.org/10.47392/IRJAEH.2025.0464
