June 3, 2025
Day 6 – Literature Review and Paper Briefing
What I Learned
We began our day by resuming our literature review from yesterday. We briefed Mr. Pelumi on our findings one by one. I was dissatisfied with the first paper I read. It didn’t use computer vision libraries for eye and mouth tracking; instead, it sent each video frame to an RNN (LSTM). While this might work, I don’t think it is ideal, as it would require much more pre-processing, like detecting eye and mouth landmarks, before even beginning to train the model on the drowsiness state. Another issue I had with this paper was that the experiment was conducted in a laboratory setting, so the data might not truly reflect real-world driving conditions.
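To fix the idea in my head, I sketched what that kind of frame-sequence-to-LSTM pipeline could look like in PyTorch. The 64x64 grayscale frames, the flatten-each-frame encoding, and the two-class output are my own illustrative assumptions, not details from the paper.

import torch
import torch.nn as nn

class FrameSequenceLSTM(nn.Module):
    """Classify a short clip by feeding per-frame feature vectors to an LSTM.

    For simplicity each frame is just flattened to a vector; the actual paper
    may use a different per-frame encoding.
    """
    def __init__(self, frame_height=64, frame_width=64, hidden_size=128, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size=frame_height * frame_width,
                            hidden_size=hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, clips):
        # clips: (batch, num_frames, height, width) grayscale frames
        batch, num_frames, h, w = clips.shape
        features = clips.reshape(batch, num_frames, h * w)
        _, (last_hidden, _) = self.lstm(features)
        return self.classifier(last_hidden[-1])  # classify from the final hidden state

# Illustrative usage: a batch of 2 clips, 30 frames each, 64x64 grayscale.
model = FrameSequenceLSTM()
clips = torch.randn(2, 30, 64, 64)
print(model(clips).shape)  # torch.Size([2, 2])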
I was impressed by the second paper I read. The authors used sophisticated libraries like dlib and OpenCV to first track the eyes and mouth/lips, and then fed those cropped frames to a CNN for training and testing. They employed the Eye Aspect Ratio (EAR) to determine whether the eyes were closed (either partially or fully), helping the model assess driver fatigue. However, the paper lacked any discussion of potential bias against drivers wearing sunglasses or spectacles, which is a common real-world scenario.
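Out of curiosity, I worked through the EAR calculation myself. The sketch below assumes the six eye landmarks come in the order dlib’s 68-point predictor returns them (the two corners plus the upper and lower eyelid pairs), and the ~0.2 “closed” threshold mentioned in the comment is a commonly quoted illustrative value, not one taken from the paper.

import numpy as np

def eye_aspect_ratio(eye):
    """Compute the Eye Aspect Ratio (EAR) from six (x, y) eye landmarks.

    Landmark order assumed: p1 and p4 are the horizontal eye corners,
    (p2, p6) and (p3, p5) are the vertical eyelid pairs.
    """
    eye = np.asarray(eye, dtype=float)
    # Vertical distances between the upper and lower eyelid landmarks.
    a = np.linalg.norm(eye[1] - eye[5])
    b = np.linalg.norm(eye[2] - eye[4])
    # Horizontal distance between the eye corners.
    c = np.linalg.norm(eye[0] - eye[3])
    return (a + b) / (2.0 * c)

# Illustrative usage: an open eye gives a noticeably higher EAR than a closed one.
open_eye = [(0, 3), (2, 5), (4, 5), (6, 3), (4, 1), (2, 1)]
closed_eye = [(0, 3), (2, 3.5), (4, 3.5), (6, 3), (4, 2.5), (2, 2.5)]
print(eye_aspect_ratio(open_eye))    # ~0.67
print(eye_aspect_ratio(closed_eye))  # ~0.17, below a typical ~0.2 "closed" threshold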
The third paper I read was the most interesting. The authors used model ensembling, stacking multiple models to harness the strengths of each. They employed MTCNN (Multi-task Cascaded Convolutional Networks), which, according to a Medium article I read, outperforms plain CNN detectors, Haar Cascade classifiers, and other commonly used detection algorithms. The authors created two separate CNNs, one focusing on the eyes and the other on the mouth, rather than combining everything into a single CNN. They then merged these two CNNs into one model that determines the drowsiness state. Their model turned out to be lightweight and more accurate than most previous research.
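To check that I understood the idea, I put together a minimal PyTorch sketch of a two-branch model like that: one small CNN for an eye crop, one for a mouth crop, with their features concatenated and passed to a shared classification head. The crop sizes, channel counts, and fusion-by-concatenation design are my own assumptions for illustration; the paper’s actual architecture (and whether the authors fused features or stacked predictions) would need a closer read.

import torch
import torch.nn as nn

class EyeMouthEnsemble(nn.Module):
    """Two small CNN branches (eyes, mouth) fused into one drowsiness classifier.

    Input sizes and channel counts are illustrative assumptions, not the
    paper's actual architecture.
    """
    def __init__(self):
        super().__init__()
        # Branch 1: grayscale eye crop, assumed 24x24.
        self.eye_branch = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 24x24 -> 12x12
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 12x12 -> 6x6
            nn.Flatten(),
        )
        # Branch 2: grayscale mouth crop, assumed 32x32.
        self.mouth_branch = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 16x16 -> 8x8
            nn.Flatten(),
        )
        # Fusion head: concatenate both branches' features, classify drowsy vs. alert.
        self.head = nn.Sequential(
            nn.Linear(32 * 6 * 6 + 32 * 8 * 8, 64), nn.ReLU(),
            nn.Linear(64, 2),
        )

    def forward(self, eye_crop, mouth_crop):
        features = torch.cat([self.eye_branch(eye_crop), self.mouth_branch(mouth_crop)], dim=1)
        return self.head(features)

# Illustrative usage with random tensors standing in for detected crops.
model = EyeMouthEnsemble()
eyes = torch.randn(4, 1, 24, 24)    # batch of 4 eye crops
mouths = torch.randn(4, 1, 32, 32)  # batch of 4 mouth crops
print(model(eyes, mouths).shape)    # torch.Size([4, 2])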
Besides the literature review, I also completed my “Intro to Deep Learning” course on Kaggle and received my certificate.
Blockers
No issues faced.
Reflection
I learned a lot about computer vision today from reading all those papers, and I concluded that ensembling is the way to go for our research project! Since the Kaggle deep learning course uses TensorFlow, which I haven’t used much (I mostly work with PyTorch), it also taught me a good bit about TensorFlow. I still love PyTorch, though; I think it is much more versatile and simpler to use than TensorFlow.