HireJohn

John

Mississauga, Ontario

John is a Senior Big Data Engineer with 4 years of experience designing and developing big data pipelines from data sources including CSV and JSON files, RESTful APIs, SQL and NoSQL databases, and big data lake environments using Hadoop, Databricks, and, most recently, AWS EMR with S3 storage. He has extensive skills in Python and in writing and optimizing SQL queries that support data ingestion processes feeding data analytics platforms. John has 3 years of experience writing data transformation processes in PySpark and is an AWS Certified Cloud Practitioner. He has 12 years of experience writing SQL and troubleshooting SQL performance in relational databases and holds Oracle's SQL Expert certification. Additionally, he has 10 years of experience writing Bash shell scripts to support and deploy data automation and maintenance tasks from development to production.

Cloud
Data
Databases

Skills

Years

10+

AWS

Pyspark

Hive

Python

Oracle

PowerBI

NLP

Podium

PL SQL

Azure

Apache

Pandas

CRM

MongoDB

Qlik

SDLC

Node

REST

ERP

Yarn

Hadoop

Dragon Oneshield

Zeppelin

Oozie

PostgreSQL

Sqoop

JSON

Skills

1+ Years

Developer Personality

Independent

Collaborative

Trailblazer

Conservative

Generalist

Specialist

Planner

Doer

Idealist

Pragmatist

Abstraction

Control

Feature Experience

SQL queries

Data Validation

Data Transformations

Big Data Pipelines

MODERATE

EXTENSIVE

EXPERT

Cultural Experience

Telecom

Construction

Marketing - Event Planning

Banking

MODERATE

EXTENSIVE

EXPERT

Portfolio

CIHI – Canadian Institute for Health Information

Sr. PySpark AWS Developer

Work Experience : 2021 - present

Current Project: Data suppression ETL processing for health reporting system

Designs and implements data pipelines in PySpark running on Amazon EMR, reading from CSV sources and writing Parquet files to S3 that are accessible through Hive external tables. Builds advanced ETL transformations that extract data from multiple tables and deliver datasets as CSV files to the reporting layer.
Designed and developed a data validation framework in PySpark that runs data quality checks on inbound and outbound datasets for consumers, driven by metadata schema definitions for the CSV sources.
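
To illustrate the idea of metadata-driven validation described above, here is a minimal sketch in plain Python with the standard library rather than PySpark. The schema layout, the column names, and the `validate_csv` function are assumptions made for this example and are not details of the actual framework.

```python
import csv
import io

# Hypothetical metadata schema: column name -> (type caster, nullable flag).
SCHEMA = {
    "record_id": (int, False),
    "region": (str, False),
    "count": (int, True),
}


def validate_csv(text, schema):
    """Return a list of (line_number, column, problem) tuples for a CSV string."""
    reader = csv.DictReader(io.StringIO(text))
    # Check the header against the schema before looking at any rows.
    missing = [col for col in schema if col not in (reader.fieldnames or [])]
    if missing:
        return [(1, col, "missing column") for col in missing]

    errors = []
    for line_number, row in enumerate(reader, start=2):
        for col, (caster, nullable) in schema.items():
            value = row.get(col) or ""  # short rows yield None; treat as empty
            if value == "":
                if not nullable:
                    errors.append((line_number, col, "null not allowed"))
                continue
            try:
                caster(value)
            except ValueError:
                errors.append((line_number, col, "bad type"))
    return errors


sample = "record_id,region,count\n1,ON,3\nx,QC,\n3,,2\n"
for problem in validate_csv(sample, SCHEMA):
    print(problem)
```

Keeping the rules in a schema dictionary means a new CSV source only needs a new metadata entry, not new validation code.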

Skills

Python
Spark
PySpark
AWS
S3
Hive

Features

Data Suppression
ETL Processing
Data Analysis

Big Data Consultant

Skills

REST
Hadoop
Azure
PySpark
Python
PowerBI
SQL DB

Features

Data Privacy
Data Pipelines
Full Lifecycle

Python/Spark Developer

Categories

Back End
Cloud
Databases

Work Experience : 2020

Implemented data ingestion pipeline jobs in PySpark using an in-house PySpark ingestion framework running on a Databricks data lake with AWS cloud services, including S3 storage. Wrote Python code for data manipulation supporting ETL transformations in PySpark.

Project: Sobeys data pipeline implementation supporting daily loads of around 1 GB; the pipeline updates multiple fact tables averaging 15 – 20 billion rows in size.

Wrote Spark SQL queries to validate data consistency and accuracy for the data ingestion process managing 22 data sources.
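
The kind of consistency checks mentioned above can be sketched with ordinary SQL. The example below uses Python's built-in `sqlite3` module as a stand-in for Spark SQL so that it runs anywhere; the table names, columns, sample rows, and the three checks are all hypothetical and only illustrate the pattern of comparing a staging table with its target fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staging_sales (sale_id INTEGER, store TEXT, amount REAL);
    CREATE TABLE fact_sales    (sale_id INTEGER, store TEXT, amount REAL);
    INSERT INTO staging_sales VALUES (1, 'A', 10.0), (2, 'B', 5.5), (3, 'A', 7.0);
    INSERT INTO fact_sales    VALUES (1, 'A', 10.0), (2, 'B', 5.5), (2, 'B', 5.5);
""")

# Each check is a query returning one number; zero means the check passed.
CHECKS = {
    "missing_in_target": """
        SELECT COUNT(*) FROM staging_sales s
        LEFT JOIN fact_sales f ON f.sale_id = s.sale_id
        WHERE f.sale_id IS NULL""",
    "duplicate_keys_in_target": """
        SELECT COUNT(*) FROM (
            SELECT sale_id FROM fact_sales
            GROUP BY sale_id HAVING COUNT(*) > 1)""",
    "amount_difference": """
        SELECT ROUND((SELECT SUM(amount) FROM staging_sales)
                   - (SELECT SUM(amount) FROM fact_sales), 2)""",
}


def run_checks(connection, checks):
    """Run every check query and return {check_name: value}."""
    return {name: connection.execute(sql).fetchone()[0]
            for name, sql in checks.items()}


results = run_checks(conn, CHECKS)
for name, value in results.items():
    print(name, value)
```

The same queries would need only minor dialect changes to run through Spark SQL against registered tables.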

Skills

Python
SQL DB
Spark
PySpark
AWS

Features

Data Pipeline Ingestion
Transformations

Big Data Engineer - Enterprise Data Platform

Categories

Data
Databases

Work Experience : 2017-2019

Designed and implemented Spark data transformation jobs that create target datasets from Hive tables for machine learning models throughout their production life cycle. The models supported were the product advisor and customer churn models.
Provided day-to-day Hadoop application support for data ingestion jobs in development and production environments, covering HDFS, Hive, Apache Spark, Zeppelin, and Oozie. Gained extensive experience troubleshooting big data jobs in the Hadoop environment using YARN logs.
Designed and developed batch jobs in Apache Oozie for the data ingestion process that loads data from Oracle databases.
Developed a spark-shell application that extracts data from Hive datasets and saves it to MongoDB in JSON format for access by a front-end application; the business application was driven by natural language processing input from mobile devices.
Developed data ingestion jobs in Spark that load Oracle tables with 90 million rows and 900 columns into Parquet and ORC file formats, leveraging Spark partitioning for parallel processing. These jobs were more efficient to develop and support than the NiFi ingestion framework.
Tuned SQL performance in an existing Spark machine learning model that calculates customer retention probability on a 10-node Hadoop cluster, improving response time by 40%. The gains came mostly from optimizing complex SQL queries, setting the right indexes on the Oracle tables used by the extraction process, and allowing Spark to push filter conditions down to the database.

Skills

Hive
Hadoop
Apache
MongoDB
Oracle
SQL
Spark
Oozie
Yarn
Zeppelin

Features

Transformations
Data Pipelines
Database Management

Big Data Analyst / Developer - Enterprise Data Platform

Categories

Back End
Data
Databases
PMO
UX/UI

Work Experience : 2015-2017

Developed a Python application that extracts and transforms data with Pandas, reading CSV and Excel files to load metadata into the data ingestion framework (Podium – Qlik) through RESTful API calls. The milestone was an automated load of 250 data sources for ingesting delimited files into Hadoop.
Provided day-to-day technical support troubleshooting HDFS, Oozie, and configuration issues for data ingestion jobs developed and deployed to production in the Hadoop Cloudera environment.
Served as SME for supporting and tuning the PostgreSQL databases behind the data ingestion framework.
Developed a Python application to extract, validate, and update metadata configuration in the data ingestion framework through its RESTful API.
Designed and developed a weekly ingestion job using Oozie and Sqoop to load Oracle tables (200,000+ rows) into Hadoop Cloudera using Hive partitioned tables. The job triggered a data ingestion job in the Podium ingestion framework through REST API calls in Python.
Developed a Python application that runs complex validations on metadata configuration in JSON files as part of the quality assurance process for data ingestion jobs. The application used Python list comprehensions extensively to model and execute validations in a functional programming style. The utility reduced the QA time needed to detect metadata configuration issues for data ingestion in the Podium ingestion framework by 40%.
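
As an illustration of validations expressed as list comprehensions over JSON metadata, the sketch below checks a small configuration for three kinds of problems. The configuration layout, the field names, the allowed type list, and the rules themselves are assumptions made for this example; the actual Podium metadata format is not described in this profile.

```python
import json

# Hypothetical metadata configuration for two ingestion sources.
CONFIG_JSON = """
[
  {"source": "customers", "delimiter": ",", "fields": [
      {"name": "id", "type": "INTEGER"}, {"name": "name", "type": "STRING"}]},
  {"source": "orders", "delimiter": "", "fields": [
      {"name": "id", "type": "INTEGER"}, {"name": "id", "type": "DECIMAL"},
      {"name": "total", "type": "MONEY"}]}
]
"""

ALLOWED_TYPES = {"INTEGER", "STRING", "DECIMAL", "DATE"}


def validate(sources):
    """Return human-readable issues; each rule is one list comprehension."""
    empty_delimiter = [f"{s['source']}: empty delimiter"
                       for s in sources if not s["delimiter"]]
    bad_types = [f"{s['source']}.{f['name']}: unknown type {f['type']}"
                 for s in sources for f in s["fields"]
                 if f["type"] not in ALLOWED_TYPES]
    duplicates = [f"{s['source']}: duplicate field {name}"
                  for s in sources
                  for name in sorted({f["name"] for f in s["fields"]})
                  if [f["name"] for f in s["fields"]].count(name) > 1]
    return empty_delimiter + bad_types + duplicates


issues = validate(json.loads(CONFIG_JSON))
for issue in issues:
    print(issue)
```

Because each rule is a pure expression over the parsed configuration, rules can be added or removed without touching the others.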

Skills

Python
Pandas
Podium
Qlik
REST
Hadoop
Cloudera
PostgreSQL
Sqoop
JSON
Hive

Features

Data Transformations
Troubleshooting
Quality Assurance / Validations
Metadata Configuration

Database Administrator

Skills

Oracle
SQL DB
Linux
Shell Scripting

Features

Data Modeling
Troubleshooting
Database Management

2008

Categories

Data
Databases

Work Experience : 2013

Served as SME for Oracle ETL transformations from the policy management system for reporting purposes, using Oracle PL/SQL.
Provided strong expertise in optimizing Oracle SQL across multiple applications running on an Oracle back end.
Provided Oracle database modeling support for insurance applications in the commercial business line and the broker portal (business management system), with strong client-facing experience gathering business data requirements.
Led the database implementation for the new policy management system (Dragon Oneshield) during the SDLC (system development life cycle) and supported the Dragon OLTP database (2+ terabytes) for two consecutive years in production, ensuring stability and steady response times by optimizing SQL queries when required.

Skills

Oracle
PL SQL
Dragon Oneshield

Features

Transformations
Code Optimization
Data Modeling
SDLC

Application Developer

Skills

Oracle
PL SQL

Features

Application Development
ERP
Data Reporting

Sr. Oracle Programmer

Skills

Oracle
SQL DB
SDLC
Oracle DB

Features

Database Architecture
Development Best Practices
Application Development

Sr. Oracle Software Engineer

Skills

Oracle
PL SQL
CRM
Python
AWS

Features

Application Development
Customizations

Portfolio

CIHI – Canadian Institute for Health Information
Sr. PySpark AWS Developer
Work Experience : 2021 - present

Accenture
Big Data Consultant
Work Experience : 2020-2021

Loyalty One (Air Miles Program)
Python/Spark Developer
Work Experience : 2020

Mercer Canada (Marsh & McLennan Company)
Big Data Engineer - Enterprise Data Platform
Work Experience : 2017-2019

TD Canada Trust (contract)
Big Data Analyst / Developer - Enterprise Data Platform
Work Experience : 2015-2017

Active Network
Database Administrator
Work Experience : 2013-2015

Travellers
2008
Work Experience : 2013

Computer Methods (CMiC)
Application Developer
Work Experience : 2006-2008