Anh Chu

Data Architect · Solutions Architect

Seattle, Washington

anhcodes.dev · github.com/anhhchu · in/anhhchu

Data Architect passionate about working with data and bringing insights closer to business users. Experience across data engineering, big data, data science, data warehousing, and back-end databases on GCP, Azure, and AWS.

Experience

Specialist Solutions Architect · Databricks
2024 — Present
Deliver technical leadership to enterprise clients on architecting and implementing data modernization solutions, specializing in Delta Lake, big data platforms, Apache Spark, SQL optimization, and advanced data engineering practices.
- Lead data modernization for strategic enterprise accounts, from architecture to production.
- Design scalable lakehouse solutions on Delta Lake, Spark, and Databricks.
- Tune Spark and SQL workloads to cut cost and accelerate pipelines.
- Partner cross-functionally to drive platform adoption and business outcomes.
Sr Specialist Solutions Engineer · Databricks
2023 — 2024
Provided technical guidance to strategic customers designing and implementing enterprise data modernization projects with Delta Lake, big data platforms, Spark and SQL optimization, and data engineering.
Software Engineer · Microsoft
2022 — 2023
Data Engineer building, configuring, and managing back-end infrastructure for a video-powered social-learning platform owned by Microsoft.
- Migrated the data warehouse from Amazon Redshift to a Synapse lakehouse, end to end.
- Cut query times 4–5× through data-loading and table-design optimization.
- Built batch and streaming pipelines from transactional and telemetry sources into the lakehouse.
- Streamed CDC with Debezium, Kafka, and Azure Event Hubs; transformed data in Synapse Spark.
- Shipped a reliable lakehouse→CRM sync via REST API, with validation and monitoring.
- Operated the Azure platform — storage, database, warehouse, Kubernetes, CI/CD — for high availability.
Software Engineer · Walmart Global Tech
2020 — 2022
Data Engineer building an end-to-end supply-chain analytics web application that tracks inventory and transportation from suppliers to stores across international markets.
- Led 4 engineers migrating an on-prem Teradata warehouse to GCP across 10 markets (BigQuery, Dataproc, PySpark, Airflow).
- Boosted application performance 70% via caching, indexing, and in-database aggregation.
- Cut codebase complexity 80% through refactoring, SQL cleanup, and CI/CD.
- Built reverse-ETL pipelines serving warehouse analytics metrics back into the app's MSSQL database for in-product supply-chain insights.
- Shipped new supply-chain metrics with SQL and Spark, validated for data quality.

Education

M.S. in Computer Science — Harrisburg University of Science and Technology
M.S. in Supply Chain Management — University of Texas at Dallas

Skills

Data Engineering: PySpark · Delta Lake · Iceberg · Spark Streaming · Kafka · CDC
Data Warehousing & Database: Data Warehousing · Semantic Modeling · Postgres · MSSQL Server · Dimensional Modeling · dbt · Power BI · Tableau
Data Science & ML: Machine Learning · Spark ML · MLflow · GenAI
Languages & Query: Python · SQL · Linux / Shell
Cloud & Platforms: Databricks · Snowflake · Azure · AWS · BigQuery
