Senior Data Engineer experienced in developing pipelines and microservice-based solutions to collect, process, aggregate, and deliver batch and streaming data to users with GCP or AWS services. Previously spent 13 years designing Data Warehouse (DW) and Business Intelligence solutions across finance, customer service, telecom, public and private health, mining, insurance, retail, sales, marketing, and IT.
Stacks used on projects:
S3, EMR, Apache Kafka, Apache NiFi, Apache Airflow, Glue, Athena, QuickSight, IAM, AWS IoT, Hadoop, Hive, HBase, Spark, Python/PySpark, SQL Server, PostgreSQL, Oracle, Redis, DynamoDB, MongoDB, EC2, RDS, Redshift, Dataflow, Dataproc, BigQuery, Dataprep, Bigtable, Keras, Pandas, Apache Beam, GCP, TensorFlow, QlikView, Tableau, Qlik Sense, Pentaho, MicroStrategy, IBM DataStage, SAS Guide, Cognos, ODI, OBIEE, BO, Microsoft BI (SSIS, SSAS, SSRS), SAP BW, SAP HANA
Statistical data analysis and data mining packages: SAS, SPSS, MATLAB, R, Stata, Excel.
Responsible for engaging with stakeholders to understand their needs, then creating cloud solutions to problems involving large volumes of data. Also responsible for the design, implementation, and support of a platform providing secure access to large datasets, as well as analyzing logs, debugging Python scripts, fixing code, and customizing development.
Responsible for creating, maintaining, and optimizing scalable data pipelines on AWS; writing code to manipulate data using both Python and complex SQL queries; tuning cloud resources; handling structured and unstructured data in both batch and streaming flows; and optimizing persistence in the Data Lake.
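The combination of Python and SQL for batch data manipulation described above can be sketched as follows. This is a minimal illustration only, using an in-memory SQLite database as a stand-in for the actual warehouse; the table and column names are hypothetical, not taken from any real project.

```python
import sqlite3

# Illustrative batch step: load raw events, aggregate with SQL,
# then post-process in Python. Names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("u1", 10.0), ("u1", 5.0), ("u2", 7.5)],
)

# Heavy manipulation is pushed down to SQL where possible...
rows = conn.execute(
    "SELECT user_id, SUM(amount) AS total FROM events "
    "GROUP BY user_id ORDER BY user_id"
).fetchall()

# ...and the remainder is handled in Python before persisting
# the result to the data lake.
totals = {user_id: total for user_id, total in rows}
print(totals)  # {'u1': 15.0, 'u2': 7.5}
```

In practice the same split (SQL for set-based aggregation, Python for everything else) carries over to Spark SQL and PySpark on larger datasets.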
Developed a solution able to process huge data volumes using Google Cloud Dataflow. The solution copied 15 terabytes of data from SQL Server and wrote the new structure to Google Cloud. Extracted XML and JSON data stored in blob columns and saved it to Cloud Storage. Also audited all processes to ensure the content at the source matched the destination, saving the audit results in BigQuery tables.
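An audit step like the one described, verifying that source content matches the destination, can be sketched as a content-hash comparison. This is a simplified stand-in for the original Dataflow audit, not the actual code; the record shapes and field names below are hypothetical.

```python
import hashlib
import json


def record_fingerprint(record: dict) -> str:
    """Stable hash of one record, independent of key order."""
    canonical = json.dumps(record, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def audit(source_records, dest_records):
    """Compare source and destination by row count and content hashes.

    Returns a dict shaped like a row one might write to a BigQuery
    audit table (field names here are illustrative).
    """
    src_hashes = {record_fingerprint(r) for r in source_records}
    dst_hashes = {record_fingerprint(r) for r in dest_records}
    return {
        "source_count": len(source_records),
        "dest_count": len(dest_records),
        "missing_at_dest": len(src_hashes - dst_hashes),
        "unexpected_at_dest": len(dst_hashes - src_hashes),
        "match": src_hashes == dst_hashes,
    }


if __name__ == "__main__":
    src = [{"id": 1, "payload": "<xml/>"}, {"id": 2, "payload": "{}"}]
    dst = [{"id": 2, "payload": "{}"}, {"id": 1, "payload": "<xml/>"}]
    print(audit(src, dst))  # match is True: same content, different order
```

At terabyte scale the same idea runs as a distributed join of per-record hashes rather than in-memory sets, but the comparison logic is the same.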
Data solutions consultant and engineer responsible for maintaining databases. Monitored performance and alert data for all devices. Developed repeatable and customizable analytic processes for reporting on business impacts. Developed, tested, and deployed solutions using the Scrum methodology.
Prepared data for Business Intelligence solutions, including identifying missing information and reporting data-quality issues to the data owners. Transformed metadata for use. Evaluated technology options and applied standard methodologies. Determined the best-fit Business Intelligence technology, methodology, and visualizations given the data and business requirements. Developed scalable, efficient database queries. Led the evaluation and validation of quality and performance for QlikView business intelligence and data warehouse solutions. Implemented processes and other data movement requirements.