Essam is an innovative, results-driven Sr. Architect with 30+ years of experience, combining hands-on, architectural, and team leadership talents to design and develop high-volume data products and microservices architectures. Proven expertise in roles where the organization’s strategy is to create breakthrough solutions that make the best use of data through AI and analytics. Passionate about building superior data-driven products using structured and unstructured data integration, knowledge representation, Semantic Web, Linked Data, graph analytics, NLP, and machine learning. Leverages technical, business, and analytical skills to align enterprise technology strategies and processes with strategic business plans through business process re-engineering and innovative architectural solutions. Skilled at conveying complex technical solutions to individuals and groups, from professionals to C-level executives.
Architecting and implementing the MSK MIND computational oncology AI platform to integrate multidimensional datasets and enable advanced analytics, including machine learning and artificial intelligence, using cutting-edge big data and AI technologies, to predict patient outcomes and ultimately translate those predictions into improved patient care:
• Implemented a radiology and pathology data ingestion pipeline and integrated the data into a virtual data lake.
• Implemented an ML pipeline that mines clinical radiology reports to predict metastatic events using a CNN and word embeddings.
Technologies: Spark, Airflow, MLOps (guild), distributed computing (Ray/Dask), Delta Lake, CI/CD (CircleCI, Flux CD), ML/AI, HPC, Python, PyTorch, Data Virtualization (Dremio).
Elsevier is an information and analytics company, and one of the world’s major providers of scientific, technical, and medical information.
“Find all compounds that share multiple targets with drug compounds and”, “Study the most significant pathways that are associated with a given adverse drug reaction”: these are the kinds of natural-language questions, among others, that can be answered with Entellect, a patented Semantic Data Integration, Search & AI Platform for Life Sciences. The platform ingests structured and unstructured data streams, extracts knowledge from the content using NLP and machine learning, semantically integrates the data, and creates data products for consumption systems: a semantic search engine, an AI platform, and a graph analytics engine.
Role included:
Pearson is the world’s leading learning company, providing a range of education products and services to institutions, governments, and individual learners. My role was researching new data products, leading the development of PoCs/prototypes and, in some instances, beta releases, and developing the product roadmap and strategy.
• Integrated nine different Product and Customer Masters (PMI/CDI) by architecting a Data Hub based on Semantic Data Integration, Graph modeling, and ontology mapping approaches. Used OWL Ontology, Talend ETL, Oracle DB, Oracle ESB, and Java.
• Architected a data-driven approach for content development, publishing, and analytics to improve learning efficacy and outcomes by implementing a “Data Lake” with an underlying “Education Graph” using Big Data and Graph/Linked Data paradigms and technologies. The Data Lake enabled global users to discover, search, share, reuse, manage, and collaborate throughout the content life-cycle, with near real-time Data & Content Analytics. Used Apache Kafka for real-time distributed messaging and data streaming to build the data pipelines, Hadoop/Pig for graph ETL and building, and OntoText GraphDB as the RDF triple/graph store. Deployed the Data Lake in three global regions, and used Apache YARN and Twill for distributed application management.
• Developed a predictive model to identify risks and opportunities in the UK Edexcel exam business by predicting which schools were likely to switch providers. Ingested and aggregated 5 years of GCSE grade data sets into HBase using Pig. Fetched data into in-memory R datasets using rhbase. Trained a binary classifier (stayed vs. switched) using the caret package in R. Initial data analysis showed 67% precision and 32% recall.
• Increased content sharing and reuse by architecting the Chaski Enterprise Search & Retrieval platform. Used Elasticsearch, MongoDB, and Java to build content source connectors.
• Implemented a Recommendation Engine using Similarity Search and Content Classification for the Learning Object Discovery Engine's content recommendation and curriculum standards mapping. Used Silk for link discovery, Elasticsearch, Apache Jena, Python, and the Apache Jena TDB graph store.
• Increased content discoverability by 75% by architecting a text analytics and NLP pipeline based on Linked Data and Semantic Web principles to enable automatic semantic enrichment of content throughout the content development life-cycle. Used DBpedia Spotlight, UIMA, Apache Jena, and Java.
First Genetic Trust is a provider of secure electronic clinical data collection (EDC), pharmacogenomics clinical trial management, and secure genetic data banking (data warehouse) products and services. My primary responsibilities included gathering internal and external requirements, architecting the product solution, integrating with client and partner applications, leading onsite/offshore development teams, coordinating across functional teams, delivering development artifacts, and managing client expectations. Some of my accomplishments are:
Product Management:
• Worked with senior management to define product development strategy and roadmap.
• Managed the entire product life cycle to successfully deliver three enTRUST releases and four minor releases.
• Defined and documented functional specifications and software architecture for the company’s product line, and created reference implementations.
• Designed the pheno-genotypic mapping graph/ontology model.
• Evangelized the product by developing technical white papers and presentations.
• Enabled FDA 21 CFR Part 11 compliance across the product line.
Integration Solutions:
• Business Cases: (1) Enroll patients, capture clinical data, extract genetic data, and transform and load to the genetic bank. Integration Scenarios: Architected SOA implementation using process integration, data integration, EAI, and Role-Based Access portals (patient/physician/researcher). (2) Integrate patients’ phenotypic and genetic data into a Genetic Safe. (3) Integrate the Genetic Safe with the NCI Center for Bioinformatics Cancer Grid.
• Integration Scenarios: Data integration, data federation, and composite application.
• Technical Environment: WebLogic, Oracle, J2EE, LDAP, JAAS. Globus, DQP, Oracle, J2EE, data encryption technology, OWL Ontology, and Search.
Participated in pre-sales/post-sales presentations and prototyping, defined data warehouse and web-based architectural solutions for clients, created software architecture and design documents, worked with cross-functional teams, and led development teams towards successful implementation.
Selected list of projects:
• HealthMarket, Inc., CT (www.HealthMarket.com, B2C Online Health Plans and Services)
• ARMADA Hess, Woodbridge, NJ (www.Hesstoytruck.com, B2C Online Marketing)
• Jones of New York, PA (Sales Tracking Data Warehouse)
• Avon Products, NY (International Product Distribution Data Warehouse)
Worked with clients, and managed onsite and offshore project teams for Container Shipping application development. Enforced standards, code-reuse, code reviews, and unit/integration/system/acceptance testing.
Technical environment: Client/Server (PowerBuilder), IBM Mid-range AS/400, Oracle, PL/SQL.
Selected list of projects:
• Atlantic Container Line, S. Plainfield, NJ (Container Shipping Application)
• United Arab Shipping, Cranford, NJ/Kuwait (Container Shipping Application)
From 1987 to 1993, worked as a Software Engineer and Systems Engineer on consulting engagements, executing projects (MAPICS/ERP implementations, distributed DB/400 servers, Oracle Forms applications) at multinational firms in Egypt (General Motors Egypt), Germany (IBM), and the United States.