Anh Chu

Data Architect · Solutions Architect

Seattle, Washington

Data Architect passionate about working with data and bringing insights closer to business users. Experience across data engineering, big data, data science, data warehousing, and back-end databases on GCP, Azure, and AWS.

Experience

  1. Specialist Solutions Architect · Databricks

    Deliver technical leadership to enterprise clients on architecting and implementing data modernization solutions, specializing in Delta Lake, big data platforms, Apache Spark, SQL optimization, and advanced data engineering practices.

    • Lead data modernization for strategic enterprise accounts, from architecture to production.
    • Design scalable lakehouse solutions on Delta Lake, Spark, and Databricks.
    • Tune Spark and SQL workloads to cut cost and accelerate pipelines.
    • Partner cross-functionally to drive platform adoption and business outcomes.
  2. Sr Specialist Solutions Engineer · Databricks

    Provide technical guidance to strategic customers in designing and implementing enterprise data modernization projects using Delta Lake, big data, Spark and SQL optimization, and data engineering.

  3. Software Engineer · Microsoft

    Software Engineer building, configuring, and managing back-end infrastructure for a video-powered social-learning platform owned by Microsoft.

    • Migrated the data warehouse from AWS Redshift to a Synapse lakehouse, end to end.
    • Cut query times 4–5× through data-loading and table-design optimization.
    • Built batch and streaming pipelines from transactional and telemetry sources into the lakehouse.
    • Streamed CDC with Debezium, Kafka, and Azure Event Hubs; transformed data in Synapse Spark.
    • Shipped a reliable lakehouse→CRM sync via REST API, with validation and monitoring.
    • Operated the Azure platform (storage, database, warehouse, Kubernetes, CI/CD) for high availability.
  4. Software Engineer II · Walmart Global Tech

    Software Engineer building an end-to-end supply-chain analytics web application to track inventory and transportation from suppliers to stores for international markets.

    • Led 4 engineers migrating an on-prem Teradata warehouse to GCP across 10 markets (BigQuery, Dataproc, PySpark, Airflow).
    • Boosted application performance 70% via caching, indexing, and in-database aggregation.
    • Cut codebase complexity 80% through refactoring, SQL cleanup, and CI/CD.
    • Built reverse-ETL pipelines serving warehouse analytics metrics back into the app's MSSQL database for in-product supply-chain insights.
    • Shipped new supply-chain metrics with SQL and Spark, validated for data quality.

Education

  • M.S. in Computer Science · Harrisburg University of Science & Technology
  • M.S. in Supply Chain Management · University of Texas at Dallas

Skills

Data Engineering
PySpark · Delta Lake · Spark Streaming · Kafka · Airflow · Databricks
Data Science & ML
Machine Learning · Spark ML · MLflow · GenAI · Tableau
Languages & Query
Python · SQL · Linux / Shell
Cloud & Platforms
Azure Synapse · AWS Redshift · BigQuery · Docker · Kubernetes