Dhia Hmila

Senior Data Scientist at AXA France

Colombes, Ile-de-France, France, 92700
+33610201606

About

I am a Senior Data Scientist specializing in fraud and anomaly detection and in document classification. My expertise covers machine learning, computer vision, and natural language processing. I also have a strong understanding of MLOps practices for integrating ML models into production environments.

Professional experience

  • Senior Data Scientist, Data Science Team, AXA France

    June 2021 - Present

    • Developed text extraction AI systems using computer vision to read documents (driver's license, national ID card, etc.). The models processed over 6M documents in batches using PySpark jobs.

    • Developed a range of reusable Python packages for document processing (OCR, object detection, etc.). The packages are designed around the strategy pattern to allow quick experimentation with different algorithms and models.

    • Automated the indexing of 70% of AXA France's incoming documents by developing a retrainable document classification AI System (exposed as an API).

    • Created a YAML-based templating system to streamline the creation of Azure ML pipelines.

    • Built reusable Azure ML Pipelines (Continuous Training) to automatically retrain AI Systems and publish to Model Registry.

    • Designed Azure DevOps CI/CD/CT pipeline templates to be used in the Team's projects.

    • Led the hiring process, onboarding new team members and facilitating efficient knowledge transfer.

    • Coordinated the annotation process with annotators and SMEs (subject matter experts) to ensure consistent datasets across various projects.

    • Led AXA's 'Python COP' initiative, organizing a series of monthly programming talks and events on topics such as testing, web scraping, packaging, code quality, etc.

    • Taught a series of training sessions on Python, data manipulation (pandas & PySpark), and software engineering best practices to over 30 AXA collaborators.

  • Graduate Data Scientist, Data, Fraud, Waste & Abuse Team, AXA France

    May 2020 - June 2021

    • Engineered an advanced AI-driven Fraud, Waste, and Abuse Detection Tool that inspects different scopes, including Claims, Health Providers, Beneficiaries, and more. This innovative tool played a pivotal role in achieving savings exceeding 150k euros in 2022.

    • Designed and trained several models to detect fraudulent patterns as part of the FWA detection tool.

  • Data Scientist, Shift Technology

    April 2019 - March 2020

    • Applied Natural Language Processing (NLP) techniques to cluster Claim managers' notes, leveraging advanced visualization methods to uncover and identify emerging fraud patterns.

    • Built new AI models for Shift's fraud detection solution and deployed the solution for a new customer.

  • Research Internship, U2IS - ENSTA Paris

    May 2018 - August 2018

    • Created a synthetic image dataset by building a simulation of an agent in an in-house environment and using ray tracing to render realistic images.

    • Studied catastrophic forgetting in incremental learning of principal components of generated images.

Projects

  • mlflower: Lightweight orchestration tool for mlflow projects

    November 2023 - Present

    Tool that extends mlflow projects with built-in multi-step workflow orchestration.

  • pysira: publish resumes in different formats

    February 2023 - Present

    CLI tool to export 'jsonresume' files to different formats (HTML, TeX, PDF, etc.) and languages.

    • Created a CI/CD pipeline (GitHub Actions) to publish resumes in different themes, formats, and languages.

    • This resume was created using pysira: hmiladhia.github.io, hmiladhia.github.io/cv.pdf, hmiladhia.github.io/fr/kendall or hmiladhia.github.io/resume.pdf

  • nbmanips: Split, merge and convert IPython Notebooks

    May 2021 - Present

    An open-source package containing a collection of tools to manipulate notebooks via a Python API or a CLI.

    • Developed a Python package/CLI to manipulate (split, merge, ...) Jupyter Notebooks.

    • Used CI/CD pipelines (GitHub Actions) to lint, test, and publish the package to PyPI.

  • Spot-Language: Programming Language Detection

    January 2020

    Spot-Language is a classification model to detect the language used in code snippets.

    • Assembled a training dataset using public repositories on GitHub.

    • Built an ML experiment to train an NLP model for language classification.

    • Improved model interpretability using LIME.

    • Deployed a demo web application of the ML model using Flask.

  • Dmail

    April 2020 - May 2020

    A Python package to write and send emails in Markdown format.

    • Developed a Python package to write and send emails in Markdown format.

    • Used CI/CD pipelines (GitHub Actions) to lint, test, and publish the package to PyPI.

  • DroNet

    April 2019 - May 2019

    DroNet for the forest environment

    • Built a dataset from different sensors and a camera of a terrestrial mobile robot placed in a forest environment.

    • Fine-tuned the DroNet neural network on the new dataset and tested the model on a Bebop drone.

  • ENIT Robots competition

    April 2017

    Participated in the ENIT Robots competition as team leader. Obtained 3rd place.

  • ENSIT Robots competition

    December 2017

    Developed, in Python, a trajectory recognition algorithm for a line-following robot equipped with a Raspberry Pi 3 and a camera.

Skills

  • Machine Learning

    mlflow

    scikit-learn

    Tensorflow

    pytorch

    trax

    pyod

    transformers

    huggingface

    pytorch-lightning

  • Computer Vision

    OpenCV

    torchvision

    Vision Transformers

    VGG16

    YOLO

    Feature Matching

    tesseract

    easy-ocr

    paddleOCR

  • Natural Language Processing

    nltk

    spaCy

    Gensim

    torchtext

    langchain

  • Data Wrangling

    pandas

    pyspark

    SQL

    ETL

  • Programming Languages

    Python

    Rust

    JavaScript

    C/C++

    C#

  • MLOps

    Github Actions

    Azure Pipelines

    Git

    Linux

    Docker

    Kubernetes

  • Cloud

    Azure ML

    Databricks

    Heroku

  • Web Scraping

    selenium

    requests

    BeautifulSoup4

  • Web

    HTML

    CSS

    JavaScript

    fastapi

    flask

    Dash

Education

  • Artificial Intelligence, Engineering double degree, ENSTA ParisTech / ENIT

    September 2016 - September 2019

  • Mathematics & Physics, Preparatory School, IPEIT / Esprit Prépa

    September 2014 - June 2015

Certificates