Hugginface Image-Model Inference Bug-Fixing

Estado: Closed
Premio: $25
Propuestas recibidas: 6
Ganador: hrshammo

Winner

Resumen del concurso

I tried to implement https://huggingface.co/blog/fine-tune-vit

Training runs fine. Inference though produces something weird. I need the logits of the model (array of three classes-scores). But I get something different. Does it even load my trained model? Do I get the last layer (output for three classes) or is this the feature-vector one layer below?

IMPORTANT:
I want to run this in a docker. So first step for you is to build the docker in order to replicate what I did.

Please find all files in https://gitlab.com/boprinnit/tarkapp

STEPS:
- Make a docker with Dockerfile_kivy
- Tag it tarkov_docker:latest
- Start a docker session with
xhost + \
&& xhost +local:docker \
&& docker run --net=host -e DISPLAY=unix$DISPLAY --privileged -v ~/dev/git:/root/dev/git -v ~/dev/pv_data:/root/dev/pv_data -v /var/www/vhosts:/var/www/vhosts -it tarkov_docker:latest /bin/bash
- train a model by
- cd ~/dev/git/tarkov/hugging_image_classifier_vit
- python3 train.py

BUT THEN when I run python3 inference.py I get some weird messages and a vector of not 3 but hundreds of floats. I would expect three (one for each class). Here are the details:

Some weights of the model checkpoint at vit-base-beans-demo-v5/checkpoint-60 were not used when initializing ViTModel: ['classifier.bias', 'classifier.weight']
- This IS expected if you are initializing ViTModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing ViTModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of ViTModel were not initialized from the model checkpoint at vit-base-beans-demo-v5/checkpoint-60 and are newly initialized: ['vit.pooler.dense.bias', 'vit.pooler.dense.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
outputs
(tensor([[[-0.0922, 0.0321, 0.0705, ..., -0.4506, -0.3042, 0.2119],
[ 0.0106, 0.1491, 0.0273, ..., -0.3192, -0.2063, -0.0749],
...,
[-0.0709, 0.0378, 0.0878, ..., -0.5177, -0.1632, 0.1065]]],
grad_fn=<NativeLayerNormBackward0>), tensor([[-1.0117e-01, 1.3335e-01, -5.3186e-02, -6.9971e-02, 1.2102e-01,
-1.2692e-01, 6.0312e-02, -2.8494e-02, 7.3352e-02, -1.5986e-01,
-1.1982e-01, 2.8124e-02, -1.5338e-0... shortened ...
-8.6240e-03, 9.4536e-02, 7.7640e-02, -1.1717e-02, 3.4637e-02,
4.8355e-03, 1.0956e-01, 6.5691e-02, 1.9251e-01, 1.2720e-01,
-1.3891e-01, 1.7495e-02, -5.4980e-02, -1.8399e-01, 1.2765e-01,
-9.1845e-02, -1.4221e-01, 4.6340e-02]], grad_fn=<TanhBackward0>))
Traceback (most recent call last):
File "inference.py", line 29, in <module>
logits = outputs.logits
AttributeError: 'tuple' object has no attribute 'logits'

TO CONCLUDE:
Your job is to fix the given files so that the inference runs and gives the classification result between the three (!!!) trained classes of the trained model.

IMPORTANT: Must be fixed in the next 48 hours