Insights on bridging data engineering, AI, and education at Dremio with Andrew Madson, Data Analytics, Data Science, and AI Evangelist.
Hi Andrew, we are delighted to have you with us. Could you please tell us about your expertise and experience in data analytics, AI, and education, walk us through your professional journey, and explain how you ended up as an evangelist at Dremio?
My journey in data and analytics has always been focused on bridging the gap between raw data and actionable insights. Before joining Dremio, I led multiple data analytics and machine learning teams, where I specialized in building scalable data pipelines, implementing advanced analytics solutions, and developing machine learning models for production environments at companies such as JPMorgan Chase, LPL Financial, and MassMutual.
My experience spans building real-time analytics platforms, developing predictive maintenance systems, and architecting enterprise-wide data solutions. The transition to becoming a technology evangelist at Dremio was driven by the opportunity to help organizations modernize their data architecture and leverage the power of open-source technologies in their AI and analytics initiatives.
You have an extensive educational background. Can you brief us on the relationship between academic research and practical applications in AI and data analytics, and on how we can address the gap between the two?
The relationship between academic research and practical applications in AI and data analytics is complex and evolving. Academic research drives innovation in algorithms, methodologies, and theoretical frameworks, while practical applications focus on scalability, maintainability, and business value. The key to bridging this gap lies in understanding both perspectives. We often need to balance theoretical optimization with practical constraints like computational resources, data quality, and integration requirements. For example, while academic research might focus on achieving marginal improvements in model accuracy, practical implementations often prioritize robustness, interpretability, and operational efficiency.
To address this gap, organizations should foster collaboration between research teams and practitioners, implement proof-of-concept projects to validate academic findings, and maintain strong feedback loops between theoretical development and practical implementation.
Your experience in leading diverse teams is outstanding. What strategies have you found most effective for leading such teams in ways that promote innovation and collaboration?
Leading high-performing technical teams requires a multifaceted approach that combines technical excellence with effective people management. I believe establishing clear technical standards and architecture principles while allowing teams flexibility in implementation is crucial. This approach needs to be balanced with dedicated time for innovation through activities like hackathons and research sprints. Regular knowledge-sharing sessions where team members can present new technologies or successful project outcomes have proven invaluable.
Using data-driven decision-making to evaluate and prioritize technical initiatives helps maintain objectivity and focus. Perhaps most importantly, fostering an environment of psychological safety where team members feel comfortable challenging assumptions and proposing new approaches has been key to driving innovation. The ultimate goal is to create an environment where diverse perspectives are valued and technical excellence is celebrated while maintaining a focus on delivering business value.
How do you think data engineering is evolving to meet the demands of AI technologies?
Data engineering is fundamentally transforming to support the increasing demands of AI technologies. We’re seeing a significant shift from batch-oriented to streaming architectures to support real-time ML inference and training, with an enhanced focus on automated data quality monitoring and metadata management. Developing feature stores and automated feature engineering pipelines has become crucial for scaling AI operations. Adopting declarative approaches to data pipeline development and closer alignment between data engineering and ML operations is reshaping how we build and maintain data infrastructure.
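To make the batch-to-streaming shift concrete, here is a minimal sketch of a declarative streaming pipeline with an inline quality gate and a windowed feature for near-real-time ML, assuming PySpark with a Kafka source is available; the topic, schema, and column names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("streaming-feature-pipeline").getOrCreate()

# Read events continuously rather than as a periodic batch extract.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "sensor_events")              # hypothetical topic
    .load()
)

# Declarative transformations: parse, apply an inline data quality filter,
# and compute a windowed feature an ML model could consume for inference.
features = (
    events.selectExpr("CAST(value AS STRING) AS raw")
    .select(F.from_json("raw", "device_id STRING, temp DOUBLE, ts TIMESTAMP").alias("e"))
    .select("e.*")
    .where(F.col("temp").isNotNull() & F.col("temp").between(-50, 150))  # quality gate
    .withWatermark("ts", "10 minutes")
    .groupBy(F.window("ts", "5 minutes"), "device_id")
    .agg(F.avg("temp").alias("avg_temp_5m"))  # rolling feature for downstream ML
)

query = features.writeStream.outputMode("append").format("console").start()
```

In production, the sink would typically be a feature store or a lakehouse table rather than the console, but the pipeline definition itself stays declarative either way.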
These changes are driving the need for data engineers to develop expertise in distributed systems, streaming architectures, and machine learning operations while maintaining their core data engineering skills. The evolution is both technical and organizational, as data engineering teams increasingly collaborate with ML teams to build integrated, scalable solutions.
As Dremio is a high-performance lakehouse platform, could you elaborate on the advantages of a data lakehouse for organizations looking to leverage AI, and on how they can make a smooth transition from traditional data warehousing to a lakehouse?
The data lakehouse architecture represents a significant advancement in data management, combining the best aspects of data lakes and data warehouses. It enables cost-effective storage of structured and unstructured data in open formats while providing ACID transaction support and versioning capabilities. This architecture is particularly valuable for AI initiatives because it allows direct access to raw data for ML training while maintaining high-performance query capabilities for BI and AI workloads.
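As an illustration of this dual-access pattern, here is a minimal sketch assuming a Spark session configured with a lakehouse catalog named lake (for example, backed by Apache Iceberg); the table and column names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lakehouse-dual-access").getOrCreate()

# BI workload: a warehouse-style aggregate query over the open-format table.
spark.sql("""
    SELECT region, SUM(amount) AS total_sales
    FROM lake.sales.transactions
    GROUP BY region
""").show()

# ML workload: the same governed table read directly as raw rows for
# feature extraction and model training, with no copy into a separate system.
train_df = (
    spark.table("lake.sales.transactions")
    .select("customer_id", "amount", "ts")
)
```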
For organizations transitioning from traditional data warehouses, I recommend a measured approach, starting with a pilot project focused on a specific use case. This should be accompanied by early implementation of robust data governance and security frameworks. Organizations should then gradually migrate workloads while maintaining business continuity, leveraging automated tools for schema migration and data validation. Investment in team training on open-source technologies and modern data architectures is crucial for long-term success.
Maintaining data quality is becoming more challenging with the rise of synthetic data. Could you tell us how Apache Iceberg ensures consistency and reliability when data is frequently being updated?
Apache Iceberg provides a robust foundation for maintaining data consistency and reliability in dynamic environments through several sophisticated mechanisms. At its core, Iceberg’s atomic transaction support ensures that all changes are either fully committed or rolled back, preventing partial updates that could corrupt data integrity. The table format’s schema evolution capabilities allow for safe schema changes without disrupting existing queries, which is crucial when dealing with evolving data requirements.
Iceberg’s hidden partitioning optimizes query performance while maintaining data organization. Its time travel capabilities also enable access to historical versions of data, which is invaluable for audit trails and historical analysis. The snapshot isolation feature guarantees consistent views of data during concurrent operations, making it particularly well suited for environments with frequent updates or synthetic data generation. Together, these capabilities maintain data integrity while supporting high concurrency and complex data operations.
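The capabilities described above can be sketched in a few statements, assuming Spark 3.3+ with the Apache Iceberg runtime and a catalog named lake; the table and column names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-consistency").getOrCreate()

# Schema evolution: adding a column is a metadata-only change committed
# atomically, so existing queries against the table keep working unchanged.
spark.sql("ALTER TABLE lake.db.events ADD COLUMN source STRING")

# Time travel: read the table exactly as it looked at an earlier point,
# useful for audit trails and for reproducing an ML training set.
spark.sql(
    "SELECT * FROM lake.db.events TIMESTAMP AS OF '2024-01-01 00:00:00'"
).show()

# Every atomic commit produces a snapshot; the history is queryable, which
# is what makes rollback and snapshot-isolated concurrent reads possible.
spark.sql(
    "SELECT snapshot_id, committed_at, operation FROM lake.db.events.snapshots"
).show()
```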
Given your experience in data privacy, how can data professionals ensure they are upholding ethical standards, and what kinds of ethical practices should organizations prioritize when working on AI solutions?
Data professionals must approach ethical considerations in AI development through a comprehensive framework that begins with strong data governance. This includes implementing regular privacy impact assessments and maintaining transparent documentation of AI model decisions and data lineage. Organizations should prioritize privacy-by-design principles in all data initiatives, conduct regular ethical reviews of AI systems, and ensure diverse representation in AI development teams.
It’s crucial to establish clear protocols for handling sensitive data and procedures for addressing ethical concerns as they arise. Implementing comprehensive monitoring systems for detecting and mitigating bias in both data and models is essential. Organizations must develop and maintain clear data collection, usage, and retention policies that align with regulatory requirements and ethical principles. Success in this area requires ongoing commitment and regular review of practices to ensure they remain current with evolving ethical standards and technological capabilities.
Finally, how do you expect data engineering to shape the future of AI, and what advice would you give our readers who are looking to meet the demands of AI-driven insights?
Data engineering will continue to be foundational to AI success, with an increasing focus on automated, intelligent data systems. The future will likely see greater integration between data engineering and AI operations, emphasizing automated pipeline optimization, self-healing data systems, intelligent data quality monitoring, and advanced metadata management.
As these systems become more sophisticated, the role of data engineers will evolve to focus more on architecture and strategy rather than implementation details. I advise professionals in this field to build strong foundations in data architecture principles while staying current with emerging AI technologies and tools. It’s crucial to develop both technical and business acumen, gain experience with open-source technologies, and understand the entire data lifecycle from ingestion to deployment. Success in AI-driven insights requires not only technical expertise but also a deep understanding of business contexts and ethical implications. The field will continue to evolve rapidly, making continuous learning and adaptation essential for long-term success in this dynamic environment.
Andrew Madson
Data Analytics, Data Science, and AI Evangelist at Dremio
Andrew Madson is a Data Analytics, Data Science, and AI Evangelist at Dremio where he leverages his extensive expertise in data analytics, machine learning, and artificial intelligence to drive innovation and educate the wider community. With a strong academic background, including multiple master’s degrees in data analytics and business management, Andrew deeply understands the technical intricacies involved in data-driven decision-making.