Toni Ventura

Healthcare Data Analyst | SQL • ETL • Data Visualization • Machine Learning

SQL & Data Analysis

Healthcare SQL Projects

Healthcare Claims Analysis

Analyzed large-scale claims data using complex SQL joins and window functions to identify cost trends and reduce variance across providers.

Tech stack: SQL, PostgreSQL, BigQuery, DBeaver

View on GitHub
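The window-function approach described above can be sketched as follows. This is a minimal illustration, not the project's actual query: the `claims` table, its columns, and the sample rows are hypothetical, and it runs on Python's built-in `sqlite3` module (window functions need SQLite 3.25 or later) instead of PostgreSQL or BigQuery, both of which also support `LAG` and `AVG(...) OVER (PARTITION BY ...)`.

```python
import sqlite3

# Hypothetical claims rows: (claim_id, provider_id, service_month, paid_amount)
CLAIMS = [
    (1, "P1", "2023-01", 120.0),
    (2, "P1", "2023-02", 180.0),
    (3, "P2", "2023-01", 300.0),
    (4, "P2", "2023-02", 500.0),
    (5, "P3", "2023-01", 90.0),
]

QUERY = """
WITH provider_month AS (
    SELECT provider_id, service_month, SUM(paid_amount) AS monthly_paid
    FROM claims
    GROUP BY provider_id, service_month
)
SELECT
    provider_id,
    service_month,
    monthly_paid,
    -- month-over-month change within each provider (NULL for its first month)
    monthly_paid - LAG(monthly_paid) OVER (
        PARTITION BY provider_id ORDER BY service_month
    ) AS mom_change,
    -- distance from the average across all providers in the same month
    monthly_paid - AVG(monthly_paid) OVER (PARTITION BY service_month) AS diff_from_peer_avg
FROM provider_month
ORDER BY provider_id, service_month
"""


def provider_cost_trends(rows):
    """Load rows into an in-memory table and run the window-function query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE claims (claim_id INTEGER, provider_id TEXT, "
            "service_month TEXT, paid_amount REAL)"
        )
        conn.executemany("INSERT INTO claims VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in provider_cost_trends(CLAIMS):
    print(row)
```

Aggregating to provider-month first keeps the window functions working on one row per provider per month, so `LAG` compares consecutive months rather than individual claims.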

Miscellaneous SQL Projects

Life Expectancy Analysis

Built SQL queries to clean and analyze life expectancy data for countries throughout the world.

Tech stack: MySQL, Power BI

View on GitHub
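One common cleaning step for a country-by-year dataset is removing duplicate country-year rows. The sketch below shows that step with `ROW_NUMBER()`; it is an illustration, not the project's actual script. The table name, columns, and sample rows are hypothetical, and it runs on Python's built-in `sqlite3` (SQLite 3.25 or later) rather than MySQL, although MySQL 8.0 and later also support `ROW_NUMBER()`.

```python
import sqlite3

# Hypothetical raw rows: (row_id, country, year, life_expectancy).
# Rows 1 and 2 are duplicate entries for the same country-year.
RAW_ROWS = [
    (1, "Kenya", 2010, 58.1),
    (2, "Kenya", 2010, 58.1),
    (3, "Kenya", 2011, 59.0),
    (4, "Peru", 2010, 74.2),
]

# Number rows within each (country, year) group and delete all but the first.
DEDUP_SQL = """
DELETE FROM life_expectancy
WHERE row_id IN (
    SELECT row_id FROM (
        SELECT row_id,
               ROW_NUMBER() OVER (
                   PARTITION BY country, year ORDER BY row_id
               ) AS rn
        FROM life_expectancy
    ) AS ranked
    WHERE rn > 1
)
"""


def dedupe_country_years(rows):
    """Return (row_id, country, year) for the rows that survive deduplication."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE life_expectancy (row_id INTEGER PRIMARY KEY, "
            "country TEXT, year INTEGER, life_expectancy REAL)"
        )
        conn.executemany("INSERT INTO life_expectancy VALUES (?, ?, ?, ?)", rows)
        conn.execute(DEDUP_SQL)
        return conn.execute(
            "SELECT row_id, country, year FROM life_expectancy ORDER BY row_id"
        ).fetchall()
    finally:
        conn.close()


print(dedupe_country_years(RAW_ROWS))
```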

Data Engineering

FHIR to BigQuery Pipeline

Built a GCP-based ETL pipeline using Python to transform and load FHIR JSON data into BigQuery for analysis.

View on GitHub
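The transform step of a pipeline like this might look roughly like the sketch below. It assumes FHIR R4 JSON delivered as a `Bundle`, keeps only `Patient` resources, and emits newline-delimited JSON, a format BigQuery load jobs accept. The sample bundle, the chosen output columns, and the function names are hypothetical, and the GCP load step itself is left out.

```python
import json

# A tiny hypothetical FHIR R4 Bundle; real exports carry many more
# resource types and fields.
BUNDLE_JSON = """
{
  "resourceType": "Bundle",
  "type": "collection",
  "entry": [
    {"resource": {"resourceType": "Patient", "id": "pat-1", "gender": "female",
                  "birthDate": "1980-04-02",
                  "name": [{"family": "Rivera", "given": ["Ana"]}]}},
    {"resource": {"resourceType": "Encounter", "id": "enc-1", "status": "finished",
                  "subject": {"reference": "Patient/pat-1"}}}
  ]
}
"""


def flatten_patients(bundle):
    """Turn each Patient resource in a Bundle into one flat dict (one table row)."""
    rows = []
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") != "Patient":
            continue  # other resource types would go to their own tables
        # Use the first HumanName if present; fall back to an empty dict.
        name = (resource.get("name") or [{}])[0]
        rows.append({
            "patient_id": resource.get("id"),
            "gender": resource.get("gender"),
            "birth_date": resource.get("birthDate"),
            "family_name": name.get("family"),
            "given_name": " ".join(name.get("given", [])),
        })
    return rows


def to_ndjson(rows):
    """Serialize rows as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)


print(to_ndjson(flatten_patients(json.loads(BUNDLE_JSON))))
```

Flattening nested FHIR structures into fixed columns before loading keeps the BigQuery schema simple; the trade-off is that repeated elements (extra names, identifiers) need their own tables or `REPEATED` fields.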

Data Science

CKD Prediction Pipeline — Clinical Text + Labs (NLP, TF-IDF)

A machine learning pipeline that combines TF-IDF features from clinical notes with numeric lab values (eGFR, creatinine) to predict chronic kidney disease (CKD).

  • Text preprocessing + TF-IDF vectorization for clinical notes.
  • Numeric feature scaling for lab values using StandardScaler.
  • ColumnTransformer + Pipeline architecture for preprocessing + modeling.
  • RandomForestClassifier with class_weight='balanced' and model persistence via joblib.

Tech stack: Python, Pandas, Scikit-learn, NLP, TF-IDF, Random Forest

View on GitHub
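The architecture in the bullets above (TF-IDF on notes, `StandardScaler` on labs, a `ColumnTransformer` inside a `Pipeline`, a balanced `RandomForestClassifier`, and `joblib` persistence) can be sketched as follows. This is a minimal illustration, not the project's code: the toy notes, lab values, labels, column names, and file name are all hypothetical.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data; column names are assumptions, not the project's schema.
df = pd.DataFrame({
    "note": [
        "stable kidney function no complaints",
        "reduced egfr proteinuria noted ckd stage 3",
        "routine visit normal labs",
        "elevated creatinine ckd follow up nephrology",
        "no renal concerns annual physical",
        "chronic kidney disease worsening edema",
    ],
    "egfr": [95, 45, 100, 38, 92, 30],
    "creatinine": [0.9, 1.8, 0.8, 2.1, 1.0, 2.6],
    "ckd": [0, 1, 0, 1, 0, 1],
})

preprocess = ColumnTransformer([
    # A single column name (a string, not a list) passes 1-D text to TfidfVectorizer.
    ("text", TfidfVectorizer(lowercase=True, stop_words="english"), "note"),
    # Lab values are standardized to zero mean and unit variance.
    ("labs", StandardScaler(), ["egfr", "creatinine"]),
])

model = Pipeline([
    ("prep", preprocess),
    ("clf", RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=0
    )),
])

X, y = df[["note", "egfr", "creatinine"]], df["ckd"]
model.fit(X, y)

# Persist the whole fitted pipeline (preprocessing + model), then reload it.
path = os.path.join(tempfile.mkdtemp(), "ckd_pipeline.joblib")
joblib.dump(model, path)
reloaded = joblib.load(path)
preds = reloaded.predict(X)
print(preds)
```

Saving the full `Pipeline` rather than only the classifier means the fitted vocabulary and scaler statistics travel with the model, so new notes and labs are transformed exactly as the training data was.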

Power BI & Visualization

Healthcare Claims Analysis Dashboard

Power BI Dashboard

  • An interactive Power BI dashboard connected to Google BigQuery that analyzes healthcare claim values and encounter types across multiple facilities. Developed using DAX measures for dynamic filtering and real-time insights.

View on GitHub
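In Power BI, a DAX measure is re-evaluated for whatever filter context the report's slicers and visuals apply. The Python function below is only an analogy for that behavior, not DAX and not the dashboard's actual logic: the claim rows, facility names, and the `facilities` parameter standing in for a slicer selection are all hypothetical.

```python
from collections import defaultdict

# Hypothetical claim rows: (facility, encounter_type, claim_value)
CLAIM_ROWS = [
    ("North Clinic", "outpatient", 200.0),
    ("North Clinic", "emergency", 1500.0),
    ("South Hospital", "inpatient", 8000.0),
    ("South Hospital", "outpatient", 250.0),
    ("South Hospital", "emergency", 1200.0),
]


def total_claim_value_by_type(rows, facilities=None):
    """Sum claim values per encounter type under an optional facility filter.

    facilities=None acts like an unfiltered report page; passing a set of
    facility names mimics a slicer narrowing the filter context.
    """
    totals = defaultdict(float)
    for facility, encounter_type, value in rows:
        if facilities is None or facility in facilities:
            totals[encounter_type] += value
    return dict(totals)


print(total_claim_value_by_type(CLAIM_ROWS))
print(total_claim_value_by_type(CLAIM_ROWS, {"North Clinic"}))
```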

World Life Expectancy Dashboard

Power BI Dashboard

  • An interactive Power BI dashboard connected to MySQL that analyzes life expectancy from 2007 to 2022. Developed using DAX measures for dynamic filtering and real-time insights.

Tech stack: MySQL, Power BI

View on GitHub
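A dashboard of this kind might also report how much life expectancy changed per country across the 2007 to 2022 window. The sketch below computes that difference in plain Python as an analogy for such a measure, not as the dashboard's DAX; the sample rows and the `life_expectancy_change` helper are hypothetical, and countries missing either endpoint year are skipped rather than imputed.

```python
# Hypothetical (country, year, life_expectancy) rows.
ROWS = [
    ("Kenya", 2007, 55.0), ("Kenya", 2015, 61.0), ("Kenya", 2022, 62.5),
    ("Peru", 2007, 73.0), ("Peru", 2022, 76.5),
    ("Chile", 2015, 80.0),  # no 2007 or 2022 value, so it is skipped
]


def life_expectancy_change(rows, start=2007, end=2022):
    """Return {country: value_at_end - value_at_start} for countries with both years."""
    by_country = {}
    for country, year, value in rows:
        by_country.setdefault(country, {})[year] = value
    return {
        country: round(years[end] - years[start], 2)
        for country, years in by_country.items()
        if start in years and end in years
    }


print(life_expectancy_change(ROWS))
print(life_expectancy_change(ROWS, start=2015, end=2022))
```

Making `start` and `end` parameters mirrors a year-range slicer: the same calculation answers a different question when the selected window changes.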