Last update on May 15th, 2024 at 11:37 am
Theses and Internships
This page lists the internship opportunities currently available, many of which are also well suited to preparing a bachelor’s or master’s thesis.
Thesis Opportunities
Forecasting Financial Future:
Machine Learning Approaches to Profitability Prediction
Project
The thesis project is a collaboration between Mind Lab and the “Centro Studi Aziendali” within the Department of Business and Law at the University of Milan-Bicocca.
The idea is to explore Machine Learning and Deep Learning techniques for automatically analyzing companies’ financial statements. Since 2010, companies have been required to compile their financial statements in XBRL, a machine-readable format. Furthermore, recent legislation requires the tagging and machine-readable publication of each part of the financial statement, including the additional notes. The goal is to train Machine Learning algorithms on financial statements to predict a company’s future profitability from historical data. We started with tabular data and aim to incorporate the information contained in the additional notes as tagged text.
The thesis aims to develop each part of the Machine Learning pipeline required to manage financial statement data and make predictions from it. The pipeline includes (but is not limited to):
- data cleaning and preprocessing
- feature selection and feature reduction
- model selection and model training
- model explanation
An important focus will be placed on the interpretation and analysis of the results, involving stakeholders with a finance background to understand the practical capabilities of the trained models.
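As a rough illustration of how the four pipeline stages fit together, here is a minimal sketch that uses only the Python standard library. The column meanings, the data, and the nearest-centroid classifier are all assumptions made for the example; they are not the models or features used in the project.

```python
import statistics

def impute_median(rows):
    """Data cleaning: replace missing values (None) with the column median."""
    columns = list(zip(*rows))
    medians = [statistics.median([v for v in col if v is not None]) for col in columns]
    return [[m if v is None else v for v, m in zip(row, medians)] for row in rows]

def select_by_variance(rows, threshold=1e-9):
    """Feature selection: keep the indices of columns that are not constant."""
    columns = list(zip(*rows))
    return [i for i, col in enumerate(columns) if statistics.pvariance(col) > threshold]

def fit_centroids(rows, labels):
    """Model training: a nearest-centroid classifier (one mean vector per class)."""
    centroids = {}
    for label in set(labels):
        members = [r for r, y in zip(rows, labels) if y == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*members)]
    return centroids

def predict(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(row, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

def explain(centroids, kept):
    """Model explanation (two classes): absolute gap between centroids per kept feature."""
    a, b = (centroids[k] for k in sorted(centroids))
    return {feature: abs(u - v) for feature, u, v in zip(kept, a, b)}

# Hypothetical columns: [return on assets, debt ratio, constant flag]
X = [[0.10, 0.30, 1.0], [0.12, None, 1.0], [-0.05, 0.80, 1.0], [-0.02, 0.75, 1.0]]
y = [1, 1, 0, 0]  # 1 = profitable next year, 0 = not (made-up labels)

X_clean = impute_median(X)
kept = select_by_variance(X_clean)            # the constant column is dropped
X_sel = [[row[i] for i in kept] for row in X_clean]
model = fit_centroids(X_sel, y)
prediction = predict(model, [0.08, 0.35])
importance = explain(model, kept)
```

The centroid gap is an unscaled, deliberately naive explanation; in a real pipeline each stage would be replaced by a more suitable method and validated with the finance stakeholders.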
Team
- Prof. Enza Messina (enza.messina@unimib.it)
- Prof. Mauro Passacantando (mauro.passacatando@unimib.it)
- Marco Piazza (m.piazza23@campus.unimib.it)
Requirements
- Python
- data manipulation libraries (numpy, pandas, scipy)
- ML and DL libraries (sklearn, keras)
We are also open to other programming languages popular in the AI field.
Hate-Rich Captions
Using LLMs to Enrich Image Captions
Project
Recent studies have highlighted shortcomings in generative models, which are unable to generate exhaustive captions. In particular, elements that are key to identifying hateful content (e.g. those useful for identifying the minorities depicted, or those relating to the structure of the image) are not identified and represented in the captions, which makes it impossible for a predictive model to detect such content. This thesis involves defining specific prompts to extract this information in order to enrich the generated caption, and using the resulting text to implement a hateful-content detection model.
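To make the idea of enrichment prompts concrete, the sketch below builds one prompt per missing detail and merges the answers back into the caption. The question wording, the dictionary keys, and the example answers are hypothetical; in practice the answers would come from a vision-language model chosen for the project.

```python
# Hypothetical questions targeting details that generic captions tend to omit.
ENRICHMENT_QUESTIONS = {
    "people": "Which groups of people, if any, are depicted in the image?",
    "layout": "How is the image structured (e.g. single panel, side-by-side panels)?",
    "text": "What text, if any, is overlaid on the image?",
}

def build_prompts(caption):
    """Return one prompt per question, each grounded in the base caption."""
    return {
        key: f'The image has this caption: "{caption}". {question} Answer in one sentence.'
        for key, question in ENRICHMENT_QUESTIONS.items()
    }

def enrich_caption(caption, answers):
    """Append the non-empty answers to the base caption, in a fixed question order."""
    extras = [
        answers[key].strip()
        for key in ENRICHMENT_QUESTIONS
        if answers.get(key, "").strip()
    ]
    return " ".join([caption.strip()] + extras)

base_caption = "A man standing next to a car."
prompts = build_prompts(base_caption)

# Made-up answers standing in for model output.
fake_answers = {"layout": "The image is a single panel.", "text": ""}
enriched = enrich_caption(base_caption, fake_answers)
```

The enriched caption can then be passed to a text classifier in place of the original one, which is the comparison the thesis is meant to study.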
Team
- Prof. Elisabetta Fersini (elisabetta.fersini@unimib.it)
- Giulia Rizzi (g.rizzi10@campus.unimib.it)
Requirements
Basic knowledge of the following libraries is required. Further skills will be learned on the job.
- Python
- ML and DL libraries (sklearn, Hugging Face – transformers, LLM)
MEMECap:
Automatic Caption Generation from Hateful Memes
Project
In the digital age, memes have become extremely popular as a viral and humorous form of expression. However, their spread on online platforms has often raised concerns about the presence of hateful or discriminatory content. This thesis proposes to examine how effectively a generative model trained on MEMECap, a dataset of memes, identifies hate-related aspects, providing an in-depth analysis of the results and suggesting possible improvements.
Team
- Prof. Elisabetta Fersini (elisabetta.fersini@unimib.it)
- Giulia Rizzi (g.rizzi10@campus.unimib.it)
Requirements
Basic knowledge of the following libraries is required. Further skills will be learned on the job.
- Python
- ML and DL libraries (sklearn, Hugging Face – transformers)
Early salivary diagnosis using Raman Spectroscopy and Machine Learning
Project
Raman Spectroscopy is a technique based on the inelastic scattering of a monochromatic light beam, used to observe low-frequency modes in a target molecular system. The scattering pattern can be viewed as a sort of “fingerprint” that encodes information about the chemical composition of a given target.
Recently, Raman spectra of biofluids, and in particular saliva, have been proposed for medical diagnosis, leveraging the power of Machine Learning for automatic sample classification.
These thesis projects will be developed in the context of CORSAI, a funded European project whose final aim is to develop a new diagnostic tool based on a combination of Raman Spectroscopy and Machine Learning. Mind Laboratory is a partner in the project’s multidisciplinary team, which also includes biologists and clinicians.
References:
- Carlomagno et al (https://www.nature.com/articles/s41598-021-84565-3)
- Bertazioli et al (https://drive.google.com/file/d/1P3rxuu0xb5Jdzy9BC5bp9JrYEgcf9meC/view?usp=drive_link)
Thesis projects are available to improve the model and the pipeline (some ideas are listed below, but it is up to you to find the activity best suited to your interests).
Transfer Learning:
The objective of this project is to investigate the impact of Transfer Learning on the results. Our model can be used as a benchmark model for TL.
References:
- https://www.nature.com/articles/s41598-021-84565-3
- https://www.nature.com/articles/s41467-019-12898-9
- https://drive.google.com/file/d/1KEcqITPhegFZa_dJoQG8mIpe1xo7J0nD/view?usp=share_link
Attention mechanisms:
Numerous attention mechanisms have been developed and applied in both NLP and Computer Vision fields. An attempt to implement attention mechanisms for spectral data has already been proposed (in references); the objective of this thesis is to develop and adapt an attention mechanism for Raman spectroscopy data and to analyze the impact of attention mechanisms on classification performance.
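For readers new to the topic, the following is a minimal sketch of scaled dot-product self-attention applied to a spectrum split into patches. It assumes identity projections (queries, keys and values are the patches themselves) and made-up intensity values; a real model would use learned projections and is not described by this example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Scaled dot-product self-attention with identity projections (Q = K = V = tokens)."""
    d = len(tokens[0])
    outputs, weights = [], []
    for query in tokens:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in tokens]
        row = softmax(scores)
        weights.append(row)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([sum(w * value[j] for w, value in zip(row, tokens)) for j in range(d)])
    return outputs, weights

# Toy spectrum: 6 intensity values split into 3 patches of 2 bins each (made-up numbers).
spectrum = [0.1, 0.2, 0.9, 1.0, 0.1, 0.0]
patches = [spectrum[i:i + 2] for i in range(0, len(spectrum), 2)]
attended, attention = self_attention(patches)
```

Each row of `attention` sums to 1 and shows how strongly one patch attends to the others, which is the quantity one would inspect when analyzing the effect of attention on classification.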
References:
Explainability:
Deep Learning models achieve high performance on many different classification tasks; on the other hand, their high complexity makes it difficult to understand the rationale behind the proposed classifications. For this reason, several explainability techniques have been proposed in the literature in the last few years. We have already developed two different explainability techniques for our model: one based on gradient analysis and the other on Game Theory. The aim of this project is to enhance the existing methods and explore new ones, focusing on overcoming the problems that arise when methods created for other data types are adapted to Raman Spectroscopy.
References:
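As a generic illustration of gradient-based explanation (not the technique developed for our model), the sketch below approximates the gradient of a toy classifier score with respect to each spectral bin using central finite differences. The weights and the input sample are made up.

```python
import math

def model_score(spectrum):
    """Toy stand-in for a trained classifier: logistic score with fixed, made-up weights."""
    weights = [0.0, 2.0, -1.0, 0.0]
    z = sum(w * x for w, x in zip(weights, spectrum))
    return 1.0 / (1.0 + math.exp(-z))

def saliency(f, x, eps=1e-5):
    """Approximate |df/dx_i| for every input bin with central finite differences."""
    grads = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + eps] + x[i + 1:]
        down = x[:i] + [x[i] - eps] + x[i + 1:]
        grads.append(abs(f(up) - f(down)) / (2 * eps))
    return grads

sample = [0.5, 0.2, 0.4, 0.9]
importance = saliency(model_score, sample)
most_relevant_bin = max(range(len(sample)), key=importance.__getitem__)
```

With a neural network one would obtain the same quantity through automatic differentiation rather than finite differences; the ranking of bins by gradient magnitude is what a saliency-style explanation reports.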
- https://arxiv.org/abs/1610.02391
- https://drive.google.com/file/d/14WYvg8Chw9YDA0MJdtfpvV2idavCbCfU/view?usp=drive_link
- https://drive.google.com/file/d/1Jzb5ehbYKTU2nyLAbKn7Xfrr9yYktRmk/view?usp=drive_link
- https://arxiv.org/abs/1705.07874
Comparison between Raman Aramis and BWS465-785:
For our project, the data are collected with two different spectrometers: Aramis and BWS465-785. The main difference is that the second one is portable. This project aims to investigate the differences in the shape and the classification of the spectra collected with the two instruments.
References:
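A first step in comparing spectra from two instruments is usually to bring them onto a common Raman-shift axis. The sketch below does this with linear interpolation and then measures shape similarity with the cosine similarity. The axes and intensities are invented, and cosine similarity is only one possible choice of metric.

```python
import math

def interpolate(x_src, y_src, x_new):
    """Linearly interpolate (x_src, y_src) at the points of x_new.

    Both x_src and x_new must be increasing, and x_new must lie within x_src.
    """
    result = []
    j = 0
    for x in x_new:
        # Advance to the source segment [x_src[j], x_src[j + 1]] that contains x.
        while j < len(x_src) - 2 and x_src[j + 1] < x:
            j += 1
        x0, x1 = x_src[j], x_src[j + 1]
        t = (x - x0) / (x1 - x0)
        result.append(y_src[j] + t * (y_src[j + 1] - y_src[j]))
    return result

def cosine_similarity(a, b):
    """Cosine of the angle between two intensity vectors of equal length."""
    dot = sum(p * q for p, q in zip(a, b))
    return dot / (math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b)))

# Made-up spectra sampled on different grids (Raman shift in cm^-1).
shift_a = [400.0, 500.0, 600.0, 700.0, 800.0]
intensity_a = [1.0, 3.0, 2.0, 5.0, 1.0]
shift_b = [400.0, 600.0, 800.0]
intensity_b = [1.1, 2.1, 0.9]

common_axis = [400.0, 600.0, 800.0]
a_on_common = interpolate(shift_a, intensity_a, common_axis)
b_on_common = interpolate(shift_b, intensity_b, common_axis)
similarity = cosine_similarity(a_on_common, b_on_common)
```

Once the spectra share an axis, the same classifier can be evaluated on both sets to check whether its performance transfers between the two spectrometers.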
Team
- Prof. Enza Messina (enza.messina@unimib.it)
- Prof. Mauro Passacantando (mauro.passacatando@unimib.it)
- Marco Piazza (m.piazza23@campus.unimib.it)
Requirements
- Concepts of Machine Learning and Deep Learning
- Python
- Machine Learning and Deep Learning libraries (sklearn, keras, pytorch)
- Data manipulation library (numpy, pandas, …)
Beyond SHAP:
Game Theory for eXplainable Artificial Intelligence
Project
SHAP is one of the most widely used approaches to eXplainable Artificial Intelligence (XAI) for two main reasons: it offers an efficient and readily available Python implementation, and it rests on an interesting theoretical framework. The basic idea behind this approach is to treat the features of a Machine Learning problem as players in a cooperative game with transferable utility and to compute the Shapley value as the solution of this game. The main idea of this thesis project is to explore the limitations of the currently available formalization of the problem and to investigate alternatives. Some possible directions are:
- how to simulate the absence of a feature
- how to approximate the Shapley Value
- how to model the characteristic function
- how to sample interesting coalitions of players
- use solution concepts from Game Theory other than Shapley value
Note: this project is highly theoretical and requires an understanding of the basic concepts of Game Theory (which can be developed in the early stage of the project, in collaboration with the supervisors). However, it also includes an experimental and computational part to validate the formulated hypotheses.
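The game-theoretic setup described above can be illustrated with an exact Shapley value computation on a tiny example. In this sketch the model, the instance, and the choice of simulating an absent feature with a baseline of 0 are all assumptions; the baseline choice in particular is one of the open questions listed above. Exact enumeration is exponential in the number of features, which is why approximations are needed in practice.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values for a characteristic function `value` defined on frozensets."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Probability that p joins right after a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of p to coalition s.
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi

# Toy model f(x) = 2*x1 + x2 + x1*x3, explained at one made-up instance.
x = {"x1": 1.0, "x2": 3.0, "x3": 2.0}

def value(coalition):
    """Model output when features outside the coalition are set to the baseline 0."""
    z = {name: (x[name] if name in coalition else 0.0) for name in x}
    return 2 * z["x1"] + z["x2"] + z["x1"] * z["x3"]

phi = shapley_values(list(x), value)
```

Here the interaction term `x1*x3` is split equally between `x1` and `x3`, and the values sum to the difference between the full prediction and the baseline prediction (the efficiency property).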
References:
Team
- Prof. Enza Messina (enza.messina@unimib.it)
- Prof. Mauro Passacantando (mauro.passacatando@unimib.it)
- Marco Piazza (m.piazza23@campus.unimib.it)
Requirements
- Concepts of Machine Learning and Deep Learning
- Python
- Machine Learning and Deep Learning libraries (sklearn, keras, pytorch)
- Data manipulation library (numpy, pandas, …)
Assessing the effectiveness of XAI approaches with synthetic data
Project
Currently, one of the primary challenges in eXplainable Artificial Intelligence (XAI) research is the absence of a widely accepted gold-standard method for evaluating the quality of the explanations provided. This thesis project aims to survey the current state of the art in this specific aspect of XAI research and to implement an evaluation technique based on synthetic data generation. The result will include a comparison of the developed method with those available in the literature, as well as an evaluation of the most popular XAI techniques (SHAP, LIME, …) within the newly developed framework.
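The appeal of synthetic data is that the truly relevant features are known by construction, so an explanation can be scored against ground truth. The sketch below shows the principle under simple assumptions: a linear data generator, absolute Pearson correlation as a stand-in attribution method, and precision at k as the quality score. None of these choices is prescribed by the project.

```python
import random

def make_synthetic(n_samples, n_features, relevant, seed=0):
    """Draw Gaussian features; the target depends only on the `relevant` ones."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
    y = [sum(coef * row[i] for i, coef in relevant.items()) + rng.gauss(0, 0.1) for row in X]
    return X, y

def abs_correlation(X, y):
    """Stand-in attribution method: |Pearson correlation| between each feature and y."""
    n = len(y)
    mean_y = sum(y) / n
    var_y = sum((v - mean_y) ** 2 for v in y)
    scores = []
    for col in zip(*X):
        mean_c = sum(col) / n
        cov = sum((c - mean_c) * (v - mean_y) for c, v in zip(col, y))
        var_c = sum((c - mean_c) ** 2 for c in col)
        scores.append(abs(cov) / (var_c * var_y) ** 0.5)
    return scores

def precision_at_k(scores, truth):
    """Fraction of the top-k attributed features that are truly relevant (k = len(truth))."""
    k = len(truth)
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return len(set(top) & set(truth)) / k

relevant = {0: 3.0, 2: -2.0}  # ground truth: only features 0 and 2 matter
X, y = make_synthetic(n_samples=500, n_features=5, relevant=relevant)
scores = abs_correlation(X, y)
quality = precision_at_k(scores, relevant)
```

Swapping `abs_correlation` for the output of an XAI technique applied to a trained model would turn this into a small benchmark of that technique.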
References:
Team
- Prof. Enza Messina (enza.messina@unimib.it)
- Prof. Mauro Passacantando (mauro.passacatando@unimib.it)
- Marco Piazza (m.piazza23@campus.unimib.it)
Requirements
- Machine Learning concepts
- Python
- Machine Learning implementation libraries (sklearn, …)
- Data manipulation library (numpy, …)
Internship Opportunities
Forecasting Financial Future:
Machine Learning Approaches to Profitability Prediction
Project
The thesis project is a collaboration between Mind Lab and the “Centro Studi Aziendali” within the Department of Business and Law at the University of Milan-Bicocca.
The idea is to explore Machine Learning and Deep Learning techniques for automatically analyzing companies’ financial statements. Since 2010, companies have been required to compile their financial statements in XBRL, a machine-readable format. Furthermore, recent legislation requires the tagging and machine-readable publication of each part of the financial statement, including the additional notes. The goal is to train Machine Learning algorithms on financial statements to predict a company’s future profitability from historical data. We started with tabular data and aim to incorporate the information contained in the additional notes as tagged text.
The thesis aims to develop or validate some parts of the Machine Learning pipeline required to manage financial statement data and make predictions from it. The pipeline includes (but is not limited to):
- data cleaning and preprocessing
- feature selection and feature reduction
- model selection and model training
- model explanation
Team
- Prof. Enza Messina (enza.messina@unimib.it)
- Prof. Mauro Passacantando (mauro.passacatando@unimib.it)
- Marco Piazza (m.piazza23@campus.unimib.it)
Requirements
- Python
- Data manipulation libraries (numpy, pandas, scipy)
- ML and DL libraries (sklearn, keras)
Early salivary diagnosis using Raman Spectroscopy and Machine Learning
Project
Raman Spectroscopy is a technique based on the inelastic scattering of a monochromatic light beam, used to observe low-frequency modes in a target molecular system. The scattering pattern can be viewed as a sort of “fingerprint” that encodes information about the chemical composition of a given target.
Recently, Raman spectra of biofluids, and in particular saliva, have been proposed for medical diagnosis, leveraging the power of Machine Learning for automatic sample classification.
These thesis projects will be developed in the context of CORSAI, a funded European project whose final aim is to develop a new diagnostic tool based on a combination of Raman Spectroscopy and Machine Learning. Mind Laboratory is a partner in the project’s multidisciplinary team, which also includes biologists and clinicians.
For further information, please refer to the following published articles:
- Carlomagno et al (https://www.nature.com/articles/s41598-021-84565-3)
- Bertazioli et al (https://drive.google.com/file/d/1P3rxuu0xb5Jdzy9BC5bp9JrYEgcf9meC/view?usp=drive_link)
Thesis projects are available to improve the model and the pipeline (some ideas are listed below, but it is up to you to find the activity best suited to your interests).
Transfer Learning:
The objective of this project is to investigate the impact of Transfer Learning on the results. Our model can be used as a benchmark model for TL.
References:
- https://www.nature.com/articles/s41598-021-84565-3
- https://www.nature.com/articles/s41467-019-12898-9
- https://drive.google.com/file/d/1KEcqITPhegFZa_dJoQG8mIpe1xo7J0nD/view?usp=share_lin
Comparison between Raman Aramis and BWS465-785:
For our project, the data are collected with two different spectrometers: Aramis and BWS465-785. The main difference is that the second one is portable. This project aims to investigate the differences in the shape and the classification of the spectra collected with the two instruments.
References:
Scikit-raman library:
Over time, we have developed a Python library for manipulating spectral data. Specifically, this library makes it possible to apply the entire data processing pipeline. The objective of this project is to integrate new functionality and improve the existing source code. Another potential development could involve porting the library (or part of it) to Mojo, a new programming language proposed for AI applications whose syntax closely resembles Python’s.
Team
- Prof. Enza Messina (enza.messina@unimib.it)
- Prof. Mauro Passacantando (mauro.passacatando@unimib.it)
- Marco Piazza (m.piazza23@campus.unimib.it)
Requirements
- Concepts of Machine Learning and Deep Learning
- Python
- Machine Learning and Deep Learning libraries (sklearn, keras, pytorch)
- Data manipulation library (numpy, pandas, …)
Contact Us
Below you’ll find the ways you can reach us for more information or collaboration opportunities.
enza.messina@unimib.it
elisabetta.fersini@unimib.it
Phone
34534565467 336 20126
34534534636
Location
Viale Sarca 336 – Milan
Building U14 – 2nd Floor – Room 2048