Theses and Internships

This page lists the internship opportunities currently available, many of which are also well suited to preparing a bachelor's or master's thesis.

Thesis Opportunities

Forecasting Financial Future:

Machine Learning Approaches to Profitability Prediction

Project

The thesis project is a collaboration between Mind Lab and the “Centro Studi Aziendali” within the Department of Business and Law at the University of Milan-Bicocca.
The goal is to explore Machine Learning and Deep Learning techniques for automatically analyzing companies’ financial statements. Since 2010, companies have been required to file their financial statements in XBRL, a machine-readable format. Furthermore, recent legislation requires the tagging and machine-readable publication of each part of the financial statement, including the additional notes. The plan is to train Machine Learning algorithms on historical financial statements to predict a company’s future profitability. We started with tabular data and aim to incorporate the information contained in the additional notes as tagged text.
The thesis aims to develop each part of the Machine Learning pipeline required to manage financial statement data and make predictions from it. The pipeline includes (but is not limited to):

data cleaning and preprocessing
feature selection and feature reduction
model selection and model training
model explanation
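
As a rough illustration of how the four stages above could fit together, here is a minimal sketch using scikit-learn. Everything in it is an assumption made for illustration: the data is synthetic (not real XBRL-derived statements), the target is a hypothetical linear function of two columns, and the specific choices (median imputation, `SelectKBest`, `Ridge`, a small grid search, permutation importance) are just one possible instantiation, not the project's actual pipeline.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.impute import SimpleImputer
from sklearn.inspection import permutation_importance
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for tabular financial-statement features (assumption).
rng = np.random.default_rng(0)
n_samples, n_features = 200, 6
X = rng.normal(size=(n_samples, n_features))
# Hypothetical target: only the first two columns carry signal.
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.1 * rng.normal(size=n_samples)
# Inject roughly 5% missing values so the cleaning step has work to do.
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),           # data cleaning
    ("scale", StandardScaler()),                            # preprocessing
    ("select", SelectKBest(score_func=f_regression, k=3)),  # feature selection
    ("model", Ridge()),                                     # model training
])

# Model selection: a small cross-validated search over the Ridge penalty.
search = GridSearchCV(pipeline, {"model__alpha": [0.1, 1.0, 10.0]}, cv=3)
search.fit(X_train, y_train)
best = search.best_estimator_

r2 = best.score(X_test, y_test)
selected = best.named_steps["select"].get_support()

# Model explanation: permutation importance measured on the held-out split.
explanation = permutation_importance(
    best, X_test, y_test, n_repeats=5, random_state=0
)
importances = explanation.importances_mean

print("held-out R^2:", round(r2, 3))
print("selected feature mask:", selected)
print("permutation importances:", np.round(importances, 3))
```

Keeping every stage inside one `Pipeline` object means the imputer, scaler and selector are re-fitted within each cross-validation fold, which avoids leaking information from the validation data into the preprocessing steps.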

An important focus will be placed on the interpretation and analysis of the results, involving stakeholders with a finance background to understand the practical capabilities of the trained models.

Team

Prof. Enza Messina (enza.messina@unimib.it)
Prof. Mauro Passacantando (mauro.passacatando@unimib.it)
Marco Piazza (m.piazza23@campus.unimib.it)

Requirements

Python
- data manipulation libraries (numpy, pandas, scipy)
- ML and DL libraries (sklearn, keras)

Other programming languages popular in the AI field are also welcome.

Hate-Rich Captions

Using LLMs to Enrich Image Captions

Project

Recent studies have highlighted shortcomings in generative models, which are unable to produce exhaustive captions. In particular, elements that are key to identifying hateful content (e.g. those useful for defining the minorities depicted, or those relating to the structure of the image) are not identified or represented in the captions, making it impossible for a predictive model to detect such content. This thesis involves defining specific prompts to extract this information in order to enrich the generated caption, and using the resulting text to implement a hateful-content detection model.

Team

Prof. Elisabetta Fersini (elisabetta.fersini@unimib.it)
Giulia Rizzi (g.rizzi10@campus.unimib.it)

Requirements

Basic knowledge of the following libraries is required. Further skills will be learned on the job.

Python
- ML and DL libraries (sklearn, Hugging Face – transformers, LLM)

MEMECap:

Automatic Caption Generation from Hateful Memes

Project

In the digital age, memes have gained incredible popularity as a viral and humorous form of expression. However, their spread on online platforms has often raised concerns about the presence of hateful or discriminatory content. This thesis proposes to examine the effectiveness of a generative model trained on MEMECap, a meme dataset, in identifying hate-related aspects, offering an in-depth analysis of the results and suggesting possible improvements.

Team

Prof. Elisabetta Fersini (elisabetta.fersini@unimib.it)
Giulia Rizzi (g.rizzi10@campus.unimib.it)

Requirements

Basic knowledge of the following libraries is required. Further skills will be learned on the job.

Python
- ML and DL libraries (sklearn, Hugging Face – transformers)

Early salivary diagnosis using Raman Spectroscopy and Machine Learning

Project

Raman Spectroscopy is a technique based on the inelastic scattering of a monochromatic light beam, used to observe low-frequency modes in a target molecular system. The scattering pattern can be viewed as a sort of “fingerprint” that encodes information about the chemical composition of a given target.
Recently, Raman spectra of biofluids, and in particular saliva, have been proposed for medical diagnosis, leveraging the power of Machine Learning for automatic sample classification.
These thesis projects will be developed within CORSAI, a European-funded project that aims to develop a new diagnostic tool based on a combination of Raman Spectroscopy and Machine Learning. Mind Laboratory is a partner in the project's multidisciplinary team, which also includes biologists and clinicians.
References:

Carlomagno et al (https://www.nature.com/articles/s41598-021-84565-3)
Bertazioli et al (https://drive.google.com/file/d/1P3rxuu0xb5Jdzy9BC5bp9JrYEgcf9meC/view?usp=drive_link)

Thesis projects are available to improve the model and the pipeline (some ideas are listed below, but you are free to propose the activity that best fits your interests).

Transfer Learning:
The objective of this project is to investigate the impact of Transfer Learning on the results. Our model can be used as a benchmark model for TL.

Attention mechanisms:
Numerous attention mechanisms have been developed and applied in both NLP and Computer Vision fields. An attempt to implement attention mechanisms for spectral data has already been proposed (in references); the objective of this thesis is to develop and adapt an attention mechanism for Raman spectroscopy data and to analyze the impact of attention mechanisms on classification performance.
References:

https://arxiv.org/abs/2111.07624 (+ repo github)
https://www.nature.com/articles/s41598-023-28730-w
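
To make the attention idea above more concrete, here is a minimal sketch of scaled dot-product self-attention applied to patches of a one-dimensional spectrum. It is only an illustration under stated assumptions: the spectrum is synthetic (two Gaussian peaks), the helper names `split_into_patches` and `self_attention` are hypothetical, and the query, key and value projections are left as the identity, whereas a trained model would learn them. It does not reproduce the methods in the references.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def split_into_patches(spectrum, patch_size):
    """Cut a 1-D spectrum into consecutive, non-overlapping patches."""
    last_start = len(spectrum) - patch_size
    return [spectrum[i:i + patch_size] for i in range(0, last_start + 1, patch_size)]


def self_attention(patches):
    """Scaled dot-product self-attention with identity projections.

    Queries, keys and values are the patches themselves (an assumption
    made to keep the sketch short).
    """
    dim = len(patches[0])
    all_weights, outputs = [], []
    for query in patches:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in patches
        ]
        weights = softmax(scores)
        all_weights.append(weights)
        # Each output patch is a weighted average of all value patches.
        outputs.append([
            sum(w * patch[c] for w, patch in zip(weights, patches))
            for c in range(dim)
        ])
    return outputs, all_weights


# Synthetic spectrum with two Gaussian peaks (not real Raman data).
spectrum = [
    math.exp(-((i - 20) ** 2) / 18.0) + 0.5 * math.exp(-((i - 44) ** 2) / 8.0)
    for i in range(64)
]
patches = split_into_patches(spectrum, patch_size=8)
outputs, weights = self_attention(patches)

for row in weights:
    print(" ".join(f"{w:.2f}" for w in row))
```

Each printed row shows how strongly one spectral region attends to every other region; inspecting such weights is one way to analyze what an attention layer focuses on.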

Explainability:
Deep Learning models achieve high performance on many different classification tasks; on the other hand, their high complexity makes it difficult to understand the rationale behind the proposed classifications. For this reason, several explainability techniques have been proposed in the literature in the last few years. We have already developed two different explainability techniques for our model: one based on gradient analysis and the other on Game Theory. The aim of this project is to enhance existing methods and explore new ones, focusing on overcoming the problems that arise when methods created for other data types are adapted to the Raman Spectroscopy field.

Comparison between Raman Aramis and BWS465-785:
For our project, the data are collected with two different spectrometers: Aramis and BWS465-785. The main difference is that the second one is portable. This project aims to investigate the differences in the shape and the classification of the spectra collected with the two instruments.
References:

https://drive.google.com/file/d/1uavamCN3zFkHkjKt9w9OqhTBMdWrbnnk/view?usp=share_link

Team

Prof. Enza Messina (enza.messina@unimib.it)
Prof. Mauro Passacantando (mauro.passacatando@unimib.it)
Marco Piazza (m.piazza23@campus.unimib.it)

Requirements

Concepts of Machine Learning and Deep Learning
Python
- Machine Learning and Deep Learning libraries (sklearn, keras, pytorch)
- Data manipulation library (numpy, pandas, …)

Beyond SHAP:

Game Theory for eXplainable Artificial Intelligence

Project

SHAP is one of the most widely used approaches for eXplainable Artificial Intelligence (XAI) for two main reasons: it offers an efficient and readily available Python implementation, and it is based on an interesting theoretical framework. The basic idea behind this approach is to treat the features of a Machine Learning problem as players in a cooperative game with transferable utility and to compute the Shapley Value as the solution of this game. The main idea of this thesis project is to explore the limitations of the currently available formalization of the problem and to investigate alternatives. Some possible examples are:

how to simulate the absence of a feature
how to approximate the Shapley Value
how to model the characteristic function
how to sample interesting coalitions of players
use solution concepts from Game Theory other than Shapley value
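
To ground the questions above, here is a minimal sketch of the Shapley Value for a single prediction, written in plain Python. It is not the SHAP library's implementation. The choices are assumptions made for illustration: an absent feature is simulated by replacing it with a baseline value, the characteristic function is the model output on the masked input, and the approximation samples random permutations of the players. `toy_model` and all function names are hypothetical.

```python
import itertools
import math
import random


def coalition_value(model, x, baseline, coalition):
    """Characteristic function v(S): model output when features in S keep
    their real value and all others are replaced by the baseline."""
    masked = [x[i] if i in coalition else baseline[i] for i in range(len(x))]
    return model(masked)


def exact_shapley(model, x, baseline):
    """Exact Shapley Values by enumerating every coalition (O(2^n))."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = (
                math.factorial(size) * math.factorial(n - size - 1)
                / math.factorial(n)
            )
            for subset in itertools.combinations(others, size):
                s = set(subset)
                gain = (
                    coalition_value(model, x, baseline, s | {i})
                    - coalition_value(model, x, baseline, s)
                )
                phi[i] += weight * gain
    return phi


def sampled_shapley(model, x, baseline, n_permutations=2000, seed=0):
    """Monte Carlo approximation: average marginal contributions over
    randomly sampled orderings of the players."""
    rng = random.Random(seed)
    n = len(x)
    phi = [0.0] * n
    for _ in range(n_permutations):
        order = list(range(n))
        rng.shuffle(order)
        coalition = set()
        previous = coalition_value(model, x, baseline, coalition)
        for i in order:
            coalition.add(i)
            current = coalition_value(model, x, baseline, coalition)
            phi[i] += current - previous
            previous = current
    return [p / n_permutations for p in phi]


def toy_model(features):
    # Hypothetical model: linear terms plus an interaction between 0 and 1.
    a, b, c = features
    return 2.0 * a + 1.0 * b + 3.0 * a * b - 0.5 * c


x = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
exact = exact_shapley(toy_model, x, baseline)
approx = sampled_shapley(toy_model, x, baseline)
print("exact: ", exact)
print("approx:", approx)
```

In this toy case the interaction term is split equally between features 0 and 1, and the values sum to the difference between the prediction and the baseline output. Swapping the baseline, the characteristic function, or the sampling scheme changes the resulting explanation, which is exactly the design space listed above.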

Note: this project is highly theoretical and requires an understanding of the basic concepts of Game Theory (which can be developed in the early stage of the project, in collaboration with the supervisors). However, it also includes an experimental and computational part to validate the hypotheses formulated.

Team

Prof. Enza Messina (enza.messina@unimib.it)
Prof. Mauro Passacantando (mauro.passacatando@unimib.it)
Marco Piazza (m.piazza23@campus.unimib.it)

Requirements

Concepts of Machine Learning and Deep Learning
Python
- Machine Learning and Deep Learning libraries (sklearn, keras, pytorch)
- Data manipulation library (numpy, pandas, …)

Assessing the effectiveness of XAI approaches with synthetic data

Project

Currently, one of the primary challenges in eXplainable Artificial Intelligence (XAI) research is the absence of a globally accepted gold-standard method for evaluating the quality of the explanations provided. This thesis project aims to survey the current state of the art in this specific aspect of XAI research and to implement an evaluation technique based on synthetic data generation. The result will include a comparison of the developed method with those available in the literature, as well as an evaluation of the best-known XAI techniques (SHAP, LIME, …) with the newly developed framework.
References:

https://arxiv.org/abs/2007.10532
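
One way to picture evaluation with synthetic data is sketched below: generate a dataset in which the truly relevant features are known by construction, run an attribution method, and score how many of its top-ranked features are actually relevant. This is only an illustrative sketch. The generator, the permutation-based attribution, the `precision_at_k` score and all names are hypothetical, and the ground-truth function stands in for a trained model. It is not the framework the thesis will develop.

```python
import random


def make_synthetic_dataset(n_samples, n_features, informative, seed=0):
    """Gaussian features; the target depends only on the informative columns."""
    rng = random.Random(seed)
    weights = {i: rng.uniform(1.0, 3.0) for i in informative}
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [sum(weights[i] * row[i] for i in informative) for row in X]
    return X, y, weights


def permutation_attribution(predict, X, y, seed=0):
    """Attribution score per feature: increase in mean squared error
    when that feature's column is shuffled."""
    rng = random.Random(seed)

    def mse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

    base_error = mse(X)
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)
        shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        scores.append(mse(shuffled) - base_error)
    return scores


def precision_at_k(scores, informative):
    """Fraction of the k top-ranked features that are truly informative,
    with k equal to the number of informative features."""
    k = len(informative)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return len(set(ranked[:k]) & set(informative)) / k


informative = [0, 3]
X, y, weights = make_synthetic_dataset(
    n_samples=200, n_features=6, informative=informative
)


def predict(row):
    # Stand-in for a trained model: the known generating function (assumption).
    return sum(weights[i] * row[i] for i in informative)


scores = permutation_attribution(predict, X, y)
quality = precision_at_k(scores, informative)
print("attribution scores:", [round(s, 3) for s in scores])
print("precision@k:", quality)
```

Because the relevant features are fixed when the data is generated, the same score can be computed for any explanation method, which is what makes synthetic data attractive as a common yardstick.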

Team

Prof. Enza Messina (enza.messina@unimib.it)
Prof. Mauro Passacantando (mauro.passacatando@unimib.it)
Marco Piazza (m.piazza23@campus.unimib.it)

Requirements

Machine Learning concepts
Python
- Machine Learning implementation libraries (sklearn, …)
- Data manipulation library (numpy, …)

Internship Opportunities

Forecasting Financial Future:

Machine Learning Approaches to Profitability Prediction

Project

The thesis project is a collaboration between Mind Lab and the “Centro Studi Aziendali” within the Department of Business and Law at the University of Milan-Bicocca.
The goal is to explore Machine Learning and Deep Learning techniques for automatically analyzing companies’ financial statements. Since 2010, companies have been required to file their financial statements in XBRL, a machine-readable format. Furthermore, recent legislation requires the tagging and machine-readable publication of each part of the financial statement, including the additional notes. The plan is to train Machine Learning algorithms on historical financial statements to predict a company’s future profitability. We started with tabular data and aim to incorporate the information contained in the additional notes as tagged text.
The thesis aims to develop or validate some parts of the Machine Learning pipeline required to manage financial statement data and make predictions from it. The pipeline includes (but is not limited to):

data cleaning and preprocessing
feature selection and feature reduction
model selection and model training
model explanation

Team

Prof. Enza Messina (enza.messina@unimib.it)
Prof. Mauro Passacantando (mauro.passacatando@unimib.it)
Marco Piazza (m.piazza23@campus.unimib.it)

Requirements

Python

Data manipulation libraries (numpy, pandas, scipy)
ML and DL libraries (sklearn, keras)

Early salivary diagnosis using Raman Spectroscopy and Machine Learning

Project

Carlomagno et al (https://www.nature.com/articles/s41598-021-84565-3)
Bertazioli et al (https://drive.google.com/file/d/1P3rxuu0xb5Jdzy9BC5bp9JrYEgcf9meC/view?usp=drive_link)

Thesis projects are available to improve the model and the pipeline (some ideas are listed below, but you are free to propose the activity that best fits your interests).

Transfer Learning:
The objective of this project is to investigate the impact of Transfer Learning on the results. Our model can be used as a benchmark model for TL.
References:

https://drive.google.com/file/d/1uavamCN3zFkHkjKt9w9OqhTBMdWrbnnk/view?usp=share_link


Scikit-raman library:
Over time, we have developed a Python library for manipulating spectral data. Specifically, this library makes it possible to apply the entire data processing pipeline. The objective of this project is to integrate new functionality and improve the existing source code. Another potential development could involve porting the library (or part of it) to the Mojo programming language, a new language proposed for AI applications whose syntax closely resembles Python.

Team

Prof. Enza Messina (enza.messina@unimib.it)
Prof. Mauro Passacantando (mauro.passacatando@unimib.it)
Marco Piazza (m.piazza23@campus.unimib.it)

Requirements

Concepts of Machine Learning and Deep Learning
Python
- Machine Learning and Deep Learning libraries (sklearn, keras, pytorch)
- Data manipulation library (numpy, pandas, …)

Contact Us

Below, you’ll find various ways through which you can reach out to us for more information or collaboration opportunities

Email

enza.messina@unimib.it

elisabetta.fersini@unimib.it

Phone

34534565467 336 20126

34534534636

Location

Viale Sarca 336 – Milan

Building U14 – 2nd Floor – Room 2048

Fast Maps

These maps illustrate the suggested routes to reach the MIND research laboratory at U14, starting from the most commonly used and well-known public transportation options

From

Viale Sarca 336, Milan – Building U14, 2nd Floor, Room 2048

From Central Station Area: Bus line 87 (550 m on foot) – 16 stops – 230 m on foot

From Garibaldi Station: “Garibaldi FS” M5 Metro station – 8 stops – 650 m on foot

From Garibaldi Station: Bus line 87 (180 m on foot) – 4 stops – 230 m on foot

From Metro Line M5: M5 “Bignami” stop – 650 m on foot

From Linate Airport: “San Babila” M5 Metro station (750 m on foot) – 7 stops – “Sesto 1 Maggio FS” M1 Metro station – 11 stops – 700 m on foot

Models in Decision Making and Data Analysis
