John | DevReady

John

Mississauga, Ontario

John is a Senior Big Data Engineer with 4 years of experience designing and developing big data pipelines from data sources including CSV and JSON files, RESTful APIs, SQL and non-SQL databases, and big data lake environments using Hadoop, Databricks, and most recently AWS EMR with S3 storage. He has extensive skills in Python and in writing and optimizing SQL queries to support the data ingestion processes that feed data analytics platforms. John has 3 years of experience writing data transformation processes in PySpark and is an AWS Certified Cloud Practitioner. He has 12 years of experience writing and troubleshooting SQL for performance in relational databases, including an Oracle SQL Expert certification from Oracle. Additionally, he has 10 years of experience writing Bash shell scripts to support and deploy data automation and maintenance tasks from development to production.

Skills (chart: years of experience per skill, scale 1-10+)

S3, AWS, PySpark, Hive, Python, Oracle, PowerBI, NLP, Podium, PL/SQL, Azure, Apache, Pandas, CRM, MongoDB, Qlik, SDLC, Node, REST, ERP, Yarn, Hadoop, Dragon Oneshield, Zeppelin, Oozie, PostgreSQL, Sqoop, JSON
Developer Personality (chart: trait sliders, 100-0-100 between each pair)

Independent ↔ Collaborative
Trailblazer ↔ Conservative
Generalist ↔ Specialist
Planner ↔ Doer
Idealist ↔ Pragmatist
Abstraction ↔ Control
Feature Experience (chart scale: Moderate / Extensive / Expert)

SQL queries
Data Validation
Data Transformations
Big Data Pipelines
Cultural Experience (chart scale: Moderate / Extensive / Expert)

Telecom
Construction
Marketing - Event Planning
Banking
Portfolio

CIHI – Canadian Institute for Health Information

Sr. PySpark AWS Developer

Work Experience : 2021 - present

Current Project: Data suppression ETL processing for health reporting system

  • Design and implement data pipelines in PySpark running on Amazon EMR, reading from CSV sources and writing Parquet files to S3 that are accessible from Hive external tables (see the sketch below). Advanced ETL transformations extract data from multiple tables and deliver datasets as CSV files to the reporting layer.
  • Designed and developed a data validation framework in PySpark that executes data quality checks on inbound and outbound datasets for consumers, driven by metadata schema definitions for the CSV sources.
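A minimal sketch of the CSV-to-Parquet pattern described above, assuming a hypothetical S3 bucket and an EMR-provisioned Spark environment; a production pipeline would declare an explicit schema rather than inferring one:

    from pyspark.sql import SparkSession

    # Hypothetical S3 locations for illustration only
    SOURCE_CSV = "s3://example-bucket/inbound/records/*.csv"
    TARGET_PARQUET = "s3://example-bucket/curated/records/"

    spark = SparkSession.builder.appName("csv-to-parquet").getOrCreate()

    # Read the CSV sources; inferSchema is convenient in a sketch,
    # but production jobs should pin the schema explicitly
    df = (spark.read
          .option("header", "true")
          .option("inferSchema", "true")
          .csv(SOURCE_CSV))

    # Write Parquet to S3; a Hive external table pointed at this
    # location makes the output queryable
    df.write.mode("overwrite").parquet(TARGET_PARQUET)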

Accenture

Big Data Consultant

Work Experience : 2020-2021

Project: Canadian government company data privacy enhancement reporting solution – full life cycle, end to end

  • Designed and developed seven data pipelines from RESTful API, Hive, JSON, and CSV data sources on the Hadoop Cloudera platform, using PySpark to build data transformation processes that store data in Hive-managed Parquet tables for the data security Power BI reporting layer.
  • Developed data extraction of Active Directory groups and users from an LDAP server via the LDAP API using PowerShell scripting.
  • Developed a parallel, multiprocessing data extraction framework in Python to download 25 million rows daily in 30 minutes from Hadoop Cloudera audit event logs through the REST API (see the sketch below).
  • Designed table partitioning in an Azure SQL database to store Cloudera logs as the backend serving the Power BI reporting layer.
  • Designed and wrote complex SQL queries for data extraction and transformation, supporting the PySpark pipelines that run daily and feed the reporting layer.
  • Developed PySpark code for parsing HDFS ACLs and displaying the data in tabular format.
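A minimal sketch of the parallel REST extraction idea, shown here with a thread pool rather than the original multiprocessing framework; the endpoint, paging scheme, and page size are hypothetical stand-ins, not the actual Cloudera audit API:

    import concurrent.futures

    import requests

    # Hypothetical endpoint and page size; the real audit API,
    # authentication, and paging scheme would differ
    BASE_URL = "https://cloudera.example.com/api/audits"
    PAGE_SIZE = 50_000

    def fetch_page(offset):
        # Each worker downloads one page of audit events
        resp = requests.get(
            BASE_URL,
            params={"offset": offset, "limit": PAGE_SIZE},
            timeout=60,
        )
        resp.raise_for_status()
        return resp.json()

    def download_all(total_rows):
        offsets = range(0, total_rows, PAGE_SIZE)
        rows = []
        # Fan the page downloads out across worker threads
        with concurrent.futures.ThreadPoolExecutor(max_workers=16) as pool:
            for page in pool.map(fetch_page, offsets):
                rows.extend(page)
        return rows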


Project: Bank global risk management ETL/ML migration from on-premises PySpark to Azure Databricks

  • Developed three data pipeline processes in Azure Data Factory integrated with Databricks; the bulk of the data transformation processes were developed in PySpark on Azure Databricks using Azure ADLS as the storage layer (see the sketch below).
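A minimal sketch of a Databricks transformation with ADLS as the storage layer; the storage account, containers, and column names are hypothetical, and spark is the session the Databricks runtime provides:

    from pyspark.sql import functions as F

    # Hypothetical ADLS Gen2 locations
    SRC = "abfss://raw@exampleaccount.dfs.core.windows.net/risk/positions/"
    DST = "abfss://curated@exampleaccount.dfs.core.windows.net/risk/exposures/"

    positions = spark.read.parquet(SRC)

    # Aggregate notional exposure per counterparty
    exposures = (positions
                 .groupBy("counterparty_id")
                 .agg(F.sum("notional").alias("total_exposure")))

    exposures.write.mode("overwrite").parquet(DST)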

Loyalty One (Air Miles Program)

Python/Spark Developer

Work Experience : 2020

Implemented data pipeline ingestion jobs in PySpark using an in-house PySpark data ingestion framework running on a Databricks data lake with AWS cloud services, including S3 storage. Performed Python programming for data manipulation supporting ETL transformations in PySpark.

Project: Sobeys data pipeline implementation supporting daily data loads of around 1 GB; the pipeline updates multiple fact tables averaging 15-20 billion rows.

  • Wrote Spark SQL queries to validate data consistency and accuracy for the data ingestion process managing 22 data sources (see the sketch below).
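A minimal sketch of the kind of consistency check described above, assuming an active SparkSession named spark; the schema, table, and column names are hypothetical:

    # Row-count reconciliation between a staged source and its target
    source_count = spark.sql(
        "SELECT COUNT(*) AS n FROM staging.sales_raw"
    ).first()["n"]

    target_count = spark.sql("""
        SELECT COUNT(*) AS n
        FROM warehouse.fact_sales
        WHERE load_date = current_date()
    """).first()["n"]

    # Fail the load loudly if the ingested rows do not reconcile
    if source_count != target_count:
        raise ValueError(
            f"Row count mismatch: staged={source_count}, loaded={target_count}"
        )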

Mercer Canada (Marsh & McLennan Companies)

Big Data Engineer - Enterprise Data Platform

Work Experience : 2017-2019
  • Designed and implemented Spark data transformation jobs to create target datasets from Hive tables for machine learning models throughout their production life cycle. The models supported were a product advisor and a customer churn model.
  • Provided day-to-day Hadoop application support for data ingestion jobs in development and production environments, covering HDFS, Hive, Apache Spark, Zeppelin, and Oozie, with extensive troubleshooting of big data jobs via the YARN logging system.
  • Designed and developed batch jobs using Apache Oozie for the data ingestion process loading data from Oracle databases.
  • Developed a spark-shell application to extract data from Hive datasets and save it to a MongoDB database in JSON format for front-end application access; the business application was driven by natural language processing on mobile devices.
  • Developed Spark data ingestion jobs loading Oracle tables of 90 million rows and 900 columns into Parquet and ORC file formats, leveraging Spark partitioning for parallel processing; this made ingestion more efficient to develop and support than the NiFi ingestion framework (see the JDBC sketch below).
  • Performed SQL performance tuning on an existing Spark machine learning model calculating customer retention probability on a 10-node Hadoop cluster, improving response time by 40%. The gains came mostly from optimizing complex SQL queries and setting the right indexes on the Oracle tables used for extraction, allowing Spark to push filter conditions down to the database.
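A minimal sketch of a partitioned Spark JDBC read from Oracle, the pattern behind the ingestion and pushdown tuning bullets above; the URL, credentials, and table and column names are hypothetical:

    # Assumes an active SparkSession named spark and the Oracle JDBC
    # driver on the classpath; all identifiers here are made up
    df = (spark.read.format("jdbc")
          .option("url", "jdbc:oracle:thin:@//dbhost.example.com:1521/ORCL")
          .option("dbtable", "SALES.CUSTOMER_FACTS")
          .option("user", "etl_user")
          .option("password", "***")
          # Split the read into parallel tasks across the cluster
          .option("partitionColumn", "CUSTOMER_ID")
          .option("lowerBound", "1")
          .option("upperBound", "90000000")
          .option("numPartitions", "32")
          .load())

    # Filters applied here are pushed down to Oracle as WHERE clauses,
    # which is where most of the tuning gains tend to come from
    active = df.filter(df.STATUS == "ACTIVE")
    active.write.mode("overwrite").orc("/data/curated/customer_facts_orc")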

TD Canada Trust (contract)

Big Data Analyst / Developer - Enterprise Data Platform

Work Experience : 2015-2017
  • Developed a Python application using the Pandas framework to extract and transform data from CSV and Excel files and load metadata into the data ingestion framework (Podium – Qlik) through RESTful API calls. The milestone was an automated load of 250 data sources for ingesting delimited files into Hadoop.
  • Provided day-to-day technical support troubleshooting HDFS, Oozie, and configuration issues for data ingestion jobs developed and deployed to production in the Hadoop Cloudera environment.
  • Served as SME supporting and tuning the PostgreSQL databases behind the data ingestion framework.
  • Developed a Python application to extract, validate, and update metadata configuration in the data ingestion framework through its RESTful API.
  • Designed and developed a weekly ingestion job using Oozie and Sqoop to load Oracle tables (200,000+ rows) into Hive partitioned tables on Hadoop Cloudera. The job triggered a data ingestion job in the Podium ingestion framework via REST API calls in Python.
  • Developed a Python application executing complex validations on metadata configuration from JSON files as part of the quality assurance process for data ingestion jobs. The application made extensive use of Python list comprehensions to model and execute validations in a functional style, reducing the QA time for detecting metadata configuration issues in the Podium ingestion framework by 40% (see the sketch below).
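A minimal sketch of list-comprehension-driven metadata validation in the spirit of the last bullet; the field names and rules are hypothetical, not Podium's actual metadata schema:

    import json

    # Hypothetical required fields for a source definition
    REQUIRED = ("source_name", "delimiter", "target_table")

    def validate(path):
        with open(path) as f:
            sources = json.load(f)  # expected: a list of source definitions

        # Each rule is a list comprehension that collects the offenders
        missing = [s.get("source_name", "<unnamed>")
                   for s in sources
                   if any(k not in s for k in REQUIRED)]
        bad_delims = [s["source_name"]
                      for s in sources
                      if "source_name" in s
                      and s.get("delimiter") not in (",", "|", "\t")]

        return {"missing_fields": missing, "bad_delimiters": bad_delims}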

Active Network

Database Administrator

Work Experience : 2013-2015
  • Provided Oracle database modeling services for enhancements to the outdoors and recreation application portfolio, running on databases in a 24x7 RAC cluster.
  • Provided Oracle SQL optimization for queries running on Oracle RAC against tables between 0.5 and 1 terabyte in size.
  • Developed maintenance routines in Oracle PL/SQL for security management.
  • Monitored Oracle databases using Oracle Enterprise Manager.
  • Linux shell scripting.
  • SQL and PL/SQL programming.
  • Database modelling: logical and physical design.
  • Performance tuning and troubleshooting.

Travellers

Work Experience : 2008-2013
  • Served as SME for Oracle ETL transformations from the policy management system for reporting purposes using Oracle PL/SQL.
  • Provided strong expertise in optimizing Oracle SQL across multiple applications running with Oracle in the backend.
  • Provided Oracle database modeling support for insurance applications in the commercial business line and the broker portal (Business Management System), with strong client-facing experience gathering data business requirements.
  • Led the database implementation for the new policy management system (Dragon Oneshield) through the SDLC (system development life cycle) and supported the Dragon OLTP database (2+ terabytes) for two consecutive years in production, ensuring stability and steady response times by optimizing SQL queries when required.

Computer Methods (CMiC)

Application Developer

Work Experience : 2006-2008
  • Provided application development support in Oracle Forms 6i/10g for the ERP product's financial modules (GL, AP, and AR). Product support targeted the biggest construction companies in the US and Canada.
  • Developed custom billable enhancements to the application modules requested by clients. Designed and wrote complex logic for processing large volumes of financial records for reporting purposes using analytic functions and PL/SQL arrays, achieving good response times.


Colombian Center Professional Studies

Sr. Oracle Programmer

Categories

Work Experience : 2003-2004
  • Provided database architecture, analysis, design, and business requirements gathering to build a new client-server tuition billing system on Oracle Database 9i and Oracle Forms 6i.
  • Led two Oracle developers in implementing the application modules and in development best practices.
  • Developed complex routines for the billing system in dynamic Oracle PL/SQL, achieving high flexibility in configuring the logic for billing charge calculations.
  • Developed a critical payment interface between the tuition billing system and bank accounts.

Open Systems

Sr. Oracle Software Engineer

Work Experience : 2000-2002
  • Provided application development support in Oracle Forms 6i/10g for the ERP product's financial modules (GL, AP, and AR). Product support targeted the biggest construction companies in the US and Canada.
  • Developed custom billable enhancements to the application modules requested by clients. Designed and wrote complex logic for processing large numbers of financial records for reporting purposes using analytic functions and PL/SQL arrays, achieving good response times.
