Miguel Araújo

Manager of Data Science @ Feedzai

Bio

Miguel Araújo is a Data Science Manager in Feedzai's engineering teams. He holds an MSc in Informatics from the University of Porto and a PhD in Computer Science from Carnegie Mellon University. Miguel has a strong background in algorithms, data mining and machine learning, having developed anomaly detection, pattern mining and prediction solutions in a variety of domains. He represented Portugal at the International Olympiad in Informatics in Mexico and Croatia, and the University of Porto in international competitions in Germany, Portugal and Spain. He has published multiple refereed articles, two of which won best paper awards at top venues. He has one patent granted and another pending.

Automating the full Data Science pipeline

Building a good machine learning model end-to-end requires creativity, but it also involves several tedious, repetitive, time-consuming, or error-prone steps. Several flavors of AutoML (frameworks that automate the building of machine learning models) have been proposed, but they have mostly focused on feature generation and have frequently ignored metadata and domain knowledge. In this talk, we present an end-to-end automated data science pipeline that covers feature generation, feature selection, sample building, model creation, and Hyperband-based hyperparameter tuning, while taking advantage of domain knowledge and metadata. We evaluate several algorithmic choices and show order-of-magnitude benefits over human-generated models.
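
For readers unfamiliar with Hyperband, the following is a minimal sketch of the general algorithm, not the pipeline presented in the talk. It runs several "brackets" of successive halving, each trading off the number of sampled configurations against the resource given to each one. The function names (`bracket_schedule`, `hyperband`, `sample_learning_rate`, `toy_loss`) and the toy objective are assumptions made purely for illustration.

```python
import random


def bracket_schedule(max_resource, eta):
    """Return (n_configs, initial_resource, n_rungs) for each Hyperband bracket."""
    # Largest s such that eta**s <= max_resource (integer arithmetic avoids log rounding).
    s_max = 0
    while eta ** (s_max + 1) <= max_resource:
        s_max += 1
    schedule = []
    for s in range(s_max, -1, -1):
        total = (s_max + 1) * eta ** s
        n = (total + s) // (s + 1)        # ceiling of total / (s + 1)
        r = max_resource / eta ** s       # resource per config at the first rung
        schedule.append((n, r, s + 1))
    return schedule


def hyperband(sample_config, evaluate, max_resource=81, eta=3, seed=0):
    """Return the (loss, config) pair with the lowest loss seen in any bracket."""
    rng = random.Random(seed)
    best = (float("inf"), None)
    for n, r, rungs in bracket_schedule(max_resource, eta):
        configs = [sample_config(rng) for _ in range(n)]
        for i in range(rungs):
            if not configs:
                break
            resource = r * eta ** i
            # Evaluate every surviving config at this rung's resource level.
            scored = sorted(
                ((evaluate(c, resource), c) for c in configs),
                key=lambda pair: pair[0],
            )
            best = min(best, scored[0], key=lambda pair: pair[0])
            # Successive halving: keep the best 1/eta fraction for the next rung.
            configs = [c for _, c in scored[: len(configs) // eta]]
    return best


# Hypothetical toy problem: a single hyperparameter in [0, 1] whose ideal value is 0.3.
def sample_learning_rate(rng):
    return rng.random()


def toy_loss(config, resource):
    # Loss shrinks as more resource (e.g. training epochs) is spent.
    return (config - 0.3) ** 2 + 1.0 / resource


if __name__ == "__main__":
    loss, config = hyperband(sample_learning_rate, toy_loss)
    print(f"best config: {config:.3f}, loss: {loss:.4f}")
```

In a real pipeline, `evaluate` would train a model with the given hyperparameters for the given budget (epochs, samples, or trees) and return a validation loss; the early rungs discard poor configurations cheaply so that most of the budget goes to promising ones.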