
The model used for transfer learning and domain adaptation is DeepLabv3FineTuning.

We will then use the tutorial to run inference on an image with the newly trained model.

Then to save it as an ONNX file and use it, check this tutorial:

It may also be worth taking a look at (NVidia) DeepStream.

References:

* https://expoundai.wordpress.com/2019/08/30/transfer-learning-for-segmentation-using-deeplabv3-in-pytorch/
* https://github.com/msminhas93/DeepLabv3FineTuning

## Preparation

* Download the project from github as a zip
    wget https://github.com/msminhas93/DeepLabv3FineTuning/archive/master.zip
    mv /home/lefv2603/Downloads/master.zip /home/lefv2603/Downloads/DeepLabv3FineTuning-master.zip
    
* Copy the zipped project to the Compute Canada Beluga server
    scp /home/lefv2603/Downloads/DeepLabv3FineTuning-master.zip vincelf@beluga.computecanada.ca:~
    
* Download the resnet101 and deeplabv3_resnet101_coco models in advance from a machine other than Compute Canada, because access to these URLs is blocked from the Compute Canada servers.

    Error when the model is not in the cache and has to be downloaded from Compute Canada:

    (env) [vincelf@blg4101 DeepLabv3FineTuning-master-test]$ python $TRAINDIR/main.py $TRAINDIR/CrackForest $TRAINDIR/VLFExp
    Downloading: "https://download.pytorch.org/models/resnet101-5d3b4d8f.pth" to /home/vincelf/.cache/torch/checkpoints/resnet101-5d3b4d8f.pth
    Traceback (most recent call last):
    File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/python/3.7.4/lib/python3.7/urllib/request.py", line 1317, in do_open
      encode_chunked=req.has_header('Transfer-encoding'))
    File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/python/3.7.4/lib/python3.7/http/client.py", line 1244, in request
      self._send_request(method, url, body, headers, encode_chunked)
    File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/python/3.7.4/lib/python3.7/http/client.py", line 1290, in _send_request
      self.endheaders(body, encode_chunked=encode_chunked)
    File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/python/3.7.4/lib/python3.7/http/client.py", line 1239, in endheaders
      self._send_output(message_body, encode_chunked=encode_chunked)
    File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/python/3.7.4/lib/python3.7/http/client.py", line 1026, in _send_output
      self.send(msg)
    File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/python/3.7.4/lib/python3.7/http/client.py", line 966, in send
      self.connect()
    File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/python/3.7.4/lib/python3.7/http/client.py", line 1406, in connect
      super().connect()
    File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/python/3.7.4/lib/python3.7/http/client.py", line 938, in connect
      (self.host,self.port), self.timeout, self.source_address)
    File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/python/3.7.4/lib/python3.7/socket.py", line 727, in create_connection
      raise err
    File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/python/3.7.4/lib/python3.7/socket.py", line 716, in create_connection
      sock.connect(sa)
    OSError: [Errno 101] Network is unreachable
    ...
    
    wget https://download.pytorch.org/models/resnet101-5d3b4d8f.pth
    wget https://download.pytorch.org/models/deeplabv3_resnet101_coco-586e9e4e.pth
    
* Copy the models to the Compute Canada server
    scp /home/lefv2603/Downloads/resnet101-5d3b4d8f.pth vincelf@beluga.computecanada.ca:~
    scp /home/lefv2603/Downloads/deeplabv3_resnet101_coco-586e9e4e.pth vincelf@beluga.computecanada.ca:~
    
* Move the models into the torch cache (a quick offline load check is sketched below)
    mv ~/resnet101-5d3b4d8f.pth /home/vincelf/.cache/torch/checkpoints
    mv ~/deeplabv3_resnet101_coco-586e9e4e.pth /home/vincelf/.cache/torch/checkpoints
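
    A quick way to confirm the weights will be picked up offline is to load the pretrained model on a compute node and check that no download is attempted. A minimal sketch, assuming the torch version in use looks in ~/.cache/torch/checkpoints as in the error above (newer releases use ~/.cache/torch/hub/checkpoints):

    ```
    # check_cache.py - verify the pretrained weights load from the local cache, with no network access
    import torch
    from torchvision import models

    # Should find the .pth files placed in the cache above and not attempt any download
    model = models.segmentation.deeplabv3_resnet101(pretrained=True)
    model.eval()
    print("loaded OK:", type(model).__name__)
    ```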
    
* Make sure requirements.txt lists the right libraries.

    Note: the libraries can be installed by hand in the Python virtual environment with the --no-index option; otherwise the following error occurs:

    (env) [vincelf@blg5602 DeepLabv3FineTuning-master-test]$ pip3 install torchvision
    Ignoring pip: markers 'python_version < "3"' don't match your environment
    Looking in links: /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/avx512, /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/avx2, /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/generic
    Collecting torchvision
    Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'NewConnectionError('<pip._vendor.urllib3.connection.VerifiedHTTPSConnection object at 0x2b343e329450>: Failed to establish a new connection: [Errno 101] Network is unreachable')': /simple/torchvision/
    
    tqdm
    scikit-learn
    opencv-python
    torch
    torchvision
    

    Use pip3 install --no-index torchvision

* Run the training in an interactive session (salloc)

```
salloc --account=def-germ2201-ab --gres=gpu:1 --cpus-per-task=10 --mem=48000M --time=0-3:00

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

#module load python/3.6 cuda cudnn
module load python/3.7.4 cuda cudnn

SOURCEDIR=~/gae724/src/DeepLabv3FineTuning-master-test
TRAINDIR=~/DeepLabv3FineTuning-master

# Prepare virtualenv
# References:
#   https://expoundai.wordpress.com/2019/08/30/transfer-learning-for-segmentation-using-deeplabv3-in-pytorch/
#   https://github.com/msminhas93/DeepLabv3FineTuning
#virtualenv --no-download $SLURM_TMPDIR/env
python3 -m venv $SLURM_TMPDIR/env
source $SLURM_TMPDIR/env/bin/activate
pip3 install --no-index -r $SOURCEDIR/requirements.txt

python $TRAINDIR/main.py $TRAINDIR/CrackForest $TRAINDIR/VLFExp
```


* The model trains:

```
(env) [vincelf@blg4101 DeepLabv3FineTuning-master-test]$ python $TRAINDIR/main.py $TRAINDIR/CrackForest $TRAINDIR/VLFExp
Epoch 1/25
----------
100%|██████████████████████████████████| 24/24 [00:22<00:00,  1.06it/s]
Train Loss: 0.0251
100%|██████████████████████████████████| 6/6 [00:03<00:00,  1.76it/s]
Test Loss: 0.0556
{'epoch': 1, 'Train_loss': 0.025081725791096687, 'Test_loss': 0.05564142018556595, 'Train_f1_score': 0.050840583267973714, 'Train_auroc': 0.5767874359848608, 'Test_f1_score': 0.010331824851978231, 'Test_auroc': 0.5188828199161867}
Epoch 2/25
----------
100%|██████████████████████████████████| 24/24 [00:22<00:00,  1.07it/s]
Train Loss: 0.0155
100%|██████████████████████████████████| 6/6 [00:03<00:00,  1.91it/s]
Test Loss: 0.0385
{'epoch': 2, 'Train_loss': 0.015544182620942593, 'Test_loss': 0.03848109021782875, 'Train_f1_score': 0.1191172723341669, 'Train_auroc': 0.7760033667334525, 'Test_f1_score': 0.08732444335126234, 'Test_auroc': 0.7224661313356657}
...
Epoch 24/25
----------
100%|██████████████████████████████████| 24/24 [00:22<00:00,  1.08it/s]
Train Loss: 0.0106
100%|██████████████████████████████████| 6/6 [00:03<00:00,  1.91it/s]
Test Loss: 0.0083
{'epoch': 24, 'Train_loss': 0.010582794435322285, 'Test_loss': 0.00830918364226818, 'Train_f1_score': 0.4236991146887413, 'Train_auroc': 0.9393122372904904, 'Test_f1_score': 0.3157004818295278, 'Test_auroc': 0.8214732776085168}
Epoch 25/25
----------
100%|██████████████████████████████████| 24/24 [00:22<00:00,  1.08it/s]
Train Loss: 0.0087
100%|██████████████████████████████████| 6/6 [00:03<00:00,  1.90it/s]
Test Loss: 0.0072
{'epoch': 25, 'Train_loss': 0.008725304156541824, 'Test_loss': 0.007179579231888056, 'Train_f1_score': 0.3882697467070124, 'Train_auroc': 0.9296205158172175, 'Test_f1_score': 0.263064162183016, 'Test_auroc': 0.802269088719596}
Training complete in 10m 57s
Lowest Loss: 0.005794
```


* Save the model

```
torch.save(model, os.path.join(bpath, 'weights.pt'))
```

* Load the trained model

```
model = torch.load('./CFExp/weights.pt')
```

* Set the model to evaluate mode

```
model.eval()
```


* For inference, the model must first be put in evaluation mode with .eval() (a minimal sketch follows after this list).

* How can this be extended to a single image having multiple masks (4 masks per image, where each mask represents a class)?
Just use those masks as input and use cross-entropy as the loss function.
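
As an illustration (not code from the repo), a minimal inference sketch with the fine-tuned model; the image path, the sigmoid, and the 0.5 threshold are assumptions, and the single-channel output matches the CrackForest setup only:

```
# inference_sketch.py - hypothetical example, not taken from DeepLabv3FineTuning
import cv2
import numpy as np
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = torch.load('./CFExp/weights.pt', map_location=device)  # model saved with torch.save above
model = model.to(device)
model.eval()

img = cv2.imread('test_image.jpg')                                   # HxWx3 BGR image
inp = torch.from_numpy(img).permute(2, 0, 1).float().unsqueeze(0) / 255.0  # 1x3xHxW in [0,1]

with torch.no_grad():
    out = model(inp.to(device))['out']     # torchvision segmentation models return a dict

# Assumes raw logits and a single output channel; drop the sigmoid if the model already applies one
mask = (torch.sigmoid(out)[0, 0] > 0.5).cpu().numpy()
cv2.imwrite('mask.png', mask.astype(np.uint8) * 255)
```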

## Image segmentation test with resnet101
This script works fine on the Compute Canada server.

* Log in to the beluga server
* Prepare a directory for the scripts
* Start an interactive session
```salloc --account=def-germ2201-ab --gres=gpu:1 --cpus-per-task=10 --mem=48000M --time=0-3:00```
* Load the python 3.7, cuda and cudnn modules. Note that on the Jetson Nano (L4T, Linux For Tegra), cuda and cudnn already exist; they are installed with the JetPack. However, I do not know how to "load" them so that the python libraries pick them up instead of returning an error.

```
#SBATCH --gres=gpu:1         # Request GPU "generic resources"; Number of GPU(s) per node
#SBATCH --cpus-per-task=10   # Cores proportional to GPUs: 6 on Cedar, 16 on Graham, 10 on Beluga; CPU cores/threads
#SBATCH --mem=48000M         # Memory proportional to GPUs: 32000 Cedar, 64000 Graham, 48000M on Beluga; requesting --mem=0 is interpreted by Slurm to mean "reserve all the available memory on each node assigned to the job"
#SBATCH --time=0-03:00
#SBATCH --output=%N-%j.out

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

#module load python/3.6 cuda cudnn
module load python/3.7.4 cuda cudnn

SOURCEDIR=~/gae724/src/DeepLabv3FineTuning-master-test
TRAINDIR=~/DeepLabv3FineTuning-master
```


* Create the python 3 virtual environment

```
#virtualenv --no-download $SLURM_TMPDIR/env
python3 -m venv $SLURM_TMPDIR/env
source $SLURM_TMPDIR/env/bin/activate
```

* Install the required python libraries (requirements.txt):

```
tqdm
scikit-learn
opencv-python
torch
torchvision
matplotlib
```

```
pip3 install --no-index -r $SOURCEDIR/requirements.txt
```


* Run the script

(env) [vincelf@blg4101 DeepLabv3FineTuning-master-test]$ python3 test_segmentation_resnet101.py

It will create a plot.png image.

(env) [vincelf@blg4101 DeepLabv3FineTuning-master-test]$ cat test_segmentation_resnet101.py

```
from torchvision import models
from PIL import Image
import matplotlib.pyplot as plt
import torch
import torchvision.transforms as T
import numpy as np

def segment(net, path):
    img = Image.open(path)
    plt.imshow(img)
    plt.axis('off')
    plt.show()
    # Comment the Resize and CenterCrop for better inference results
    trf = T.Compose([T.Resize(256),
                     T.CenterCrop(224),
                     T.ToTensor(),
                     T.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])])
    inp = trf(img).unsqueeze(0)
    out = net(inp)['out']
    om = torch.argmax(out.squeeze(), dim=0).detach().cpu().numpy()
    rgb = decode_segmap(om)
    plt.imshow(rgb)
    plt.axis('off')
    plt.show()
    plt.savefig('plot.png')

# Define the helper function
def decode_segmap(image, nc=21):
    label_colors = np.array([(0, 0, 0),  # 0=background
                             # 1=aeroplane, 2=bicycle, 3=bird, 4=boat, 5=bottle
                             (128, 0, 0), (0, 128, 0), (128, 128, 0), (0, 0, 128), (128, 0, 128),
                             # 6=bus, 7=car, 8=cat, 9=chair, 10=cow
                             (0, 128, 128), (128, 128, 128), (64, 0, 0), (192, 0, 0), (64, 128, 0),
                             # 11=dining table, 12=dog, 13=horse, 14=motorbike, 15=person
                             (192, 128, 0), (64, 0, 128), (192, 0, 128), (64, 128, 128), (192, 128, 128),
                             # 16=potted plant, 17=sheep, 18=sofa, 19=train, 20=tv/monitor
                             (0, 64, 0), (128, 64, 0), (0, 192, 0), (128, 192, 0), (0, 64, 128)])

    r = np.zeros_like(image).astype(np.uint8)
    g = np.zeros_like(image).astype(np.uint8)
    b = np.zeros_like(image).astype(np.uint8)

    for l in range(0, nc):
        idx = image == l
        r[idx] = label_colors[l, 0]
        g[idx] = label_colors[l, 1]
        b[idx] = label_colors[l, 2]

    rgb = np.stack([r, g, b], axis=2)
    return rgb

fcn = models.segmentation.fcn_resnet101(pretrained=True).eval()
segment(fcn, './test.jpeg')
```


## Export to ONNX
The export to an onnx model works with the following script.

The tests with onnx and onnxruntime do not work because those libraries are not available in the Compute Canada repository (email in progress).

The model used is the one from DeepLabv3FineTuning-master-test, which was saved with
torch.save(model, os.path.join(bpath, 'weights.pt')) (see /home/vincelf/DeepLabv3FineTuning-master/main.py).

(env) [vincelf@blg4101 DeepLabv3FineTuning-master-test]$ cat export-to-onnx.py

```
# Some standard imports
import io
import numpy as np

from torch import nn
import torch.utils.model_zoo as model_zoo
import torch.onnx

import torch.nn as nn
import torch.nn.init as init

#from model import createDeepLabv3

# Load pretrained model weights
#model_url = 'fcn-weights.pt'
batch_size = 1    # just a random number

# Initialize model with the pretrained weights
#torch_model = createDeepLabv3()
#torch_model = models.segmentation.deeplabv3_resnet101()
torch_model = torch.load('weights.pt')
#torch_model = torch.load_state_dict('weights')

# Set the model to evaluate mode
torch_model.eval()

#torch_model.cuda()
torch_model.to('cuda')

# Input to the model
# Reference: https://discuss.pytorch.org/t/random-number-on-gpu/9649
# torch.cuda.FloatTensor(batch_size, 3, 224, 224).normal_()
x = torch.randn(batch_size, 3, 224, 224, requires_grad=True).cuda()
#x.cuda()
#x.to('cuda')
torch_out = torch_model(x)

# Export the model
#torch.onnx.export(torch_model,              # model being run
#                  x,                        # model input (or a tuple for multiple inputs)
#                  "test-onnx.onnx",         # where to save the model (can be a file or file-like object)
#                  export_params=True,       # store the trained parameter weights inside the model file
#                  opset_version=10,         # the ONNX version to export the model to
#                  do_constant_folding=True, # whether to execute constant folding for optimization
#                  input_names=['input'],    # the model's input names
#                  output_names=['output'],  # the model's output names
#                  dynamic_axes={'input' : {2048 : 'batch_size'},   # variable length axes
#                                'output' : {1 : 'batch_size'}})
torch.onnx.export(torch_model,               # model being run
                  x,                         # model input (or a tuple for multiple inputs)
                  "test-onnx.onnx",          # where to save the model (can be a file or file-like object)
                  opset_version=11,          # the ONNX version to export the model to
                  input_names=['input'],     # the model's input names
                  output_names=['output']    # the model's output names
                  )

import onnx

onnx_model = onnx.load("test-onnx.onnx")
onnx.checker.check_model(onnx_model)

import onnxruntime

ort_session = onnxruntime.InferenceSession("test-onnx.onnx")

def to_numpy(tensor):
    return tensor.detach().cpu().numpy() if tensor.requires_grad else tensor.cpu().numpy()

# compute ONNX Runtime output prediction
ort_inputs = {ort_session.get_inputs()[0].name: to_numpy(x)}
ort_outs = ort_session.run(None, ort_inputs)

# compare ONNX Runtime and PyTorch results
np.testing.assert_allclose(to_numpy(torch_out), ort_outs[0], rtol=1e-03, atol=1e-05)

print("Exported model has been tested with ONNXRuntime, and the result looks good!")
```
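
Note on the commented-out dynamic_axes argument above: the keys of each per-tensor dict are axis indices, so for a variable batch size the key would normally be 0 (the batch dimension), not 2048. A hedged sketch of what a dynamic-batch export could look like, reusing torch_model and x from the script above:

```
# Sketch only: dynamic batch dimension on both input and output (axis 0)
torch.onnx.export(torch_model, x, "test-onnx-dynamic.onnx",
                  export_params=True,
                  opset_version=11,
                  input_names=['input'],
                  output_names=['output'],
                  dynamic_axes={'input':  {0: 'batch_size'},    # axis 0 = batch
                                'output': {0: 'batch_size'}})
```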

* Copy the generated onnx file to the jetson nano, from a terminal on the jetson nano

scp vincelf@beluga.computecanada.ca:/home/vincelf/gae724/src/DeepLabv3FineTuning-master-test/test-onnx.onnx /home/lefv2603/projects/msminhas93/DeepLabv3FineTuning-master/


* test the onnx with the segnet-camera script (python calling cpp)
There is an error `ERROR: onnx2trt_utils.hpp:347 In function convert_axis:
[8] Assertion failed: axis >= 0 && axis < nbDims` (a quick way to inspect the exported graph for the offending nodes is sketched after the reference list below).
  * References: 
    * <https://forums.developer.nvidia.com/t/running-a-pytorch-network-converted-to-onnx-with-tensorrt-on-the-tx2/70685/21>
    * <https://docs.nvidia.com/deeplearning/tensorrt/support-matrix/index.html>
    * <https://github.com/onnx/onnx-tensorrt/issues/192>
    * <https://github.com/onnx/onnx-tensorrt/issues/125>
    * <https://github.com/pytorch/pytorch/issues/16908>
    * <https://github.com/onnx/onnx-tensorrt/issues/283#issuecomment-545703178>
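
  To see which Gather/Shape nodes end up in the exported graph (the ones the TensorRT parser trips on), the onnx Python API can be used; a small sketch, assuming the onnx package is available on the machine doing the inspection:

  ```
  # inspect_gather.py - list Gather/Shape nodes and their axis attribute in the exported model
  import onnx

  model = onnx.load("test-onnx.onnx")
  for node in model.graph.node:
      if node.op_type in ("Gather", "Shape"):
          axes = [a.i for a in node.attribute if a.name == "axis"]
          print(node.name, node.op_type, "axis:", axes, "inputs:", list(node.input))
  ```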
   

```
cd projects/dusty-nv/jetson-inference/build/aarch64/bin
lefv2603@lefv2603-jetsonnano:~/projects/dusty-nv/jetson-inference/build/aarch64/bin$ ./segnet-camera --network '/home/lefv2603/projects/msminhas93/DeepLabv3FineTuning-master/test-onnx.onnx' --labels /home/lefv2603/projects/msminhas93/DeepLabv3FineTuning-master/labels.txt --colors /home/lefv2603/projects/msminhas93/DeepLabv3FineTuning-master/colors.txt --input_blob input --output_blob ouput
[gstreamer] initialized gstreamer, version 1.14.5.0
[gstreamer] gstCamera attempting to initialize with GST_SOURCE_NVARGUS, camera 0
[gstreamer] gstCamera pipeline string:
nvarguscamerasrc sensor-id=0 ! video/x-raw(memory:NVMM), width=(int)1280, height=(int)720, framerate=30/1, format=(string)NV12 ! nvvidconv flip-method=0 ! video/x-raw ! appsink name=mysink
[gstreamer] gstCamera successfully initialized with GST_SOURCE_NVARGUS, camera 0

segnet-camera:  successfully initialized camera device
    width:  1280
   height:  720
    depth:  12 (bpp)

segNet -- loading segmentation network model from:
       -- prototxt:    (null)
       -- model:       /home/lefv2603/projects/msminhas93/DeepLabv3FineTuning-master/test-onnx.onnx
       -- labels:      /home/lefv2603/projects/msminhas93/DeepLabv3FineTuning-master/labels.txt
       -- colors:      /home/lefv2603/projects/msminhas93/DeepLabv3FineTuning-master/colors.txt
       -- input_blob   'input'
       -- output_blob  'ouput'
       -- batch_size   1

[TRT] TensorRT version 6.0.1
[TRT] loading NVIDIA plugins...
[TRT] Plugin Creator registration succeeded - GridAnchor_TRT
[TRT] Plugin Creator registration succeeded - GridAnchorRect_TRT
[TRT] Plugin Creator registration succeeded - NMS_TRT
[TRT] Plugin Creator registration succeeded - Reorg_TRT
[TRT] Plugin Creator registration succeeded - Region_TRT
[TRT] Plugin Creator registration succeeded - Clip_TRT
[TRT] Plugin Creator registration succeeded - LReLU_TRT
[TRT] Plugin Creator registration succeeded - PriorBox_TRT
[TRT] Plugin Creator registration succeeded - Normalize_TRT
[TRT] Plugin Creator registration succeeded - RPROI_TRT
[TRT] Plugin Creator registration succeeded - BatchedNMS_TRT
[TRT] Could not register plugin creator: FlattenConcat_TRT in namespace:
[TRT] completed loading NVIDIA plugins.
[TRT] detected model format - ONNX (extension '.onnx')
[TRT] desired precision specified for GPU: FASTEST
[TRT] requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT] native precisions detected for GPU: FP32, FP16
[TRT] selecting fastest native precision for GPU: FP16
[TRT] attempting to open engine cache file /home/lefv2603/projects/msminhas93/DeepLabv3FineTuning-master/test-onnx.onnx.1.1.GPU.FP16.engine
[TRT] cache file not found, profiling network model on device GPU
[TRT] device GPU, loading /home/lefv2603/projects/dusty-nv/jetson-inference/build/aarch64/bin/ /home/lefv2603/projects/msminhas93/DeepLabv3FineTuning-master/test-onnx.onnx
----------------------------------------------------------------
Input filename:   /home/lefv2603/projects/msminhas93/DeepLabv3FineTuning-master/test-onnx.onnx
ONNX IR version:  0.0.6
Opset version:    11
Producer name:    pytorch
Producer version: 1.5
Domain:
Model version:    0
Doc string:
----------------------------------------------------------------
WARNING: ONNX model has a newer ir_version (0.0.6) than this parser was built against (0.0.3).
[TRT] 677:Shape -> (4)
[TRT] 678:Constant ->
WARNING: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
Successfully casted down to INT32.
While parsing node number 2 [Gather -> "679"]:
--- Begin node ---
input: "677"
input: "678"
output: "679"
name: "Gather_2"
op_type: "Gather"
attribute {
  name: "axis"
  i: 0
  type: INT
}
--- End node ---
ERROR: onnx2trt_utils.hpp:347 In function convert_axis:
[8] Assertion failed: axis >= 0 && axis < nbDims
[TRT] failed to parse ONNX model '/home/lefv2603/projects/msminhas93/DeepLabv3FineTuning-master/test-onnx.onnx'
[TRT] device GPU, failed to load /home/lefv2603/projects/msminhas93/DeepLabv3FineTuning-master/test-onnx.onnx
segNet -- failed to initialize.
segnet-camera:   failed to initialize imageNet
```


* Files that are useful to understand
  * /home/lefv2603/projects/dusty-nv/jetson-inference/build/aarch64/bin/segnet-camera.py
  * /home/lefv2603/projects/dusty-nv/jetson-inference/c/segNet.cpp
  * /home/lefv2603/projects/dusty-nv/jetson-inference/c/segNet.cu
  * /home/lefv2603/projects/dusty-nv/jetson-inference/c/segNet.h
  * /home/lefv2603/projects/dusty-nv/jetson-inference/c/tensorNet.cpp
  * /home/lefv2603/projects/dusty-nv/jetson-inference/c/tensorNet.h
  * /home/lefv2603/projects/dusty-nv/jetson-inference/c/imageNet.cpp
  * /home/lefv2603/projects/dusty-nv/jetson-inference/c/imageNet.cu
  * /home/lefv2603/projects/dusty-nv/jetson-inference/c/imageNet.cuh
  * /home/lefv2603/projects/dusty-nv/jetson-inference/c/imageNet.h
  * /home/lefv2603/projects/dusty-nv/jetson-inference/python/training/segmentation/onnx_export.py
  * /home/lefv2603/projects/dusty-nv/jetson-inference/python/training/segmentation/onnx_validate.py
  * /home/lefv2603/projects/dusty-nv/jetson-inference/python/training/segmentation/train.py
  * /home/lefv2603/projects/dusty-nv/jetson-inference/data/networks/FCN-ResNet18-DeepScene-576x320/classes.txt
  * /home/lefv2603/projects/dusty-nv/jetson-inference/data/networks/FCN-ResNet18-DeepScene-576x320/colors.txt
  * /home/lefv2603/projects/dusty-nv/jetson-inference/python/bindings/PySegNet.cpp
  * /home/lefv2603/projects/dusty-nv/jetson-inference/python/bindings/PySegNet.h
  * /home/lefv2603/projects/dusty-nv/jetson-inference/examples/segnet-camera/segnet-camera.cpp
  * /home/lefv2603/projects/msminhas93/DeepLabv3FineTuning-master/models/segmentation/deeplabv3.py
  * /home/lefv2603/projects/msminhas93/DeepLabv3FineTuning-master/models/segmentation/fcn.py

* Example of a very simple model converted to onnx in a few lines
Reference: <https://github.com/onnx/onnx-tensorrt/issues/125>

```
#!/usr/bin/env python3

import torch
from torch import onnx

input = torch.rand(2, 64, 4)

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, hw, c = x.shape
        # Split the hw dimension into 8*8
        h = 8
        w = hw // h
        x = x.view(n, h, w, c)
        # NHWC -> NCHW is pretty common.
        return x.permute(0, 3, 1, 2)

# Opset version 9 is required by Jetson Xavier, which has TRT6.
torch.onnx.export(Model(), input, "/tmp/barf.onnx", opset_version=9)
```

* Work-in-progress notes
`x = torch.flatten(x, 3)` to be put somewhere in the model

File: ~/DeepLabv3FineTuning-master/models/segmentation/deeplabv3.py

```
class ASPPPooling(nn.Sequential):
    def __init__(self, in_channels, out_channels):
        super(ASPPPooling, self).__init__(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_channels, out_channels, 1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU())

    def forward(self, x):
        size = x.shape[-2:]
        x = torch.flatten(x, 3)
        x = super(ASPPPooling, self).forward(x)
        return F.interpolate(x, size=size, mode='bilinear', align_corners=False)
```

* Retrain in a virtual env on Beluga Compute Canada

```
(env) [vincelf@blg5803 DeepLabv3FineTuning-master-test]$ python $TRAINDIR/main.py $TRAINDIR/CrackForest $TRAINDIR/VLFExp
```

* Export to onnx

```
(env) [vincelf@blg5803 DeepLabv3FineTuning-master-test]$ python export-to-onnx.py
```

* Copy to jetson nano

```
lefv2603@lefv2603-jetsonnano:~/projects/dusty-nv/jetson-inference/build/aarch64/bin$ scp vincelf@beluga.computecanada.ca:/home/vincelf/gae724/src/DeepLabv3FineTuning-master-test/test-onnx.onnx /home/lefv2603/projects/msminhas93/DeepLabv3FineTuning-master/
```

* Run inference using onnx model on jetson nano

```
lefv2603@lefv2603-jetsonnano:~/projects/dusty-nv/jetson-inference/build/aarch64/bin$ ./segnet-camera --network '/home/lefv2603/projects/msminhas93/DeepLabv3FineTuning-master/test-onnx.onnx' --labels /home/lefv2603/projects/msminhas93/DeepLabv3FineTuning-master/labels.txt --colors /home/lefv2603/projects/msminhas93/DeepLabv3FineTuning-master/colors.txt --input_blob input --output_blob ouput
```
* Fails with the same error

```
[TRT] device GPU, loading /home/lefv2603/projects/dusty-nv/jetson-inference/build/aarch64/bin/ /home/lefv2603/projects/msminhas93/DeepLabv3FineTuning-master/test-onnx.onnx
----------------------------------------------------------------
Input filename:   /home/lefv2603/projects/msminhas93/DeepLabv3FineTuning-master/test-onnx.onnx
ONNX IR version:  0.0.6
Opset version:    11
Producer name:    pytorch
Producer version: 1.5
Domain:
Model version:    0
Doc string:
----------------------------------------------------------------
WARNING: ONNX model has a newer ir_version (0.0.6) than this parser was built against (0.0.3).
[TRT] 677:Shape -> (4)
[TRT] 678:Constant ->
WARNING: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
Successfully casted down to INT32.
While parsing node number 2 [Gather -> "679"]:
--- Begin node ---
input: "677"
input: "678"
output: "679"
name: "Gather_2"
op_type: "Gather"
attribute {
  name: "axis"
  i: 0
  type: INT
}
--- End node ---
ERROR: onnx2trt_utils.hpp:347 In function convert_axis:
[8] Assertion failed: axis >= 0 && axis < nbDims
[TRT] failed to parse ONNX model '/home/lefv2603/projects/msminhas93/DeepLabv3FineTuning-master/test-onnx.onnx'
[TRT] device GPU, failed to load /home/lefv2603/projects/msminhas93/DeepLabv3FineTuning-master/test-onnx.onnx
segNet -- failed to initialize.
segnet-camera:   failed to initialize imageNet
lefv2603@lefv2603-jetsonnano:~/projects/dusty-nv/jetson-inference/build/aarch64/bin$ ./segnet-camera --network '/home/lefv2603/projects/msminhas93/DeepLabv3FineTuni
```


## Simplify the onnx model with onnx-simplifier
An "easy" and "quick" way to deal with the error above, `[8] Assertion failed: axis >= 0 && axis < nbDims`, and the one recommended by most people who have hit it, is to use the onnx-simplifier package.
Reference: <https://github.com/daquexian/onnx-simplifier>
The problem is that it depends on onnxruntime, and that library is not available in the Compute Canada environment.
The goal is therefore to install onnxruntime on the Jetson nano (the intended usage of onnx-simplifier is sketched below).
But the prerequisites are cuda and cudnn (which come with the Jetpack but must be added to the environment), and cmake version 3.13 or higher (`CMake 3.13 or higher is required.  You are running version 3.10.2`), whereas the Jetson's L4T ships with version 3.10.2.
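
For reference, once onnxruntime and onnx-simplifier are installed, the intended usage is roughly the following (a sketch based on the onnx-simplifier README, not yet tested here since onnxruntime is unavailable on Compute Canada):

```
# simplify_onnx.py - sketch of onnx-simplifier usage (untested in this environment)
import onnx
from onnxsim import simplify

model = onnx.load("test-onnx.onnx")
model_simp, check = simplify(model)
assert check, "Simplified ONNX model could not be validated"
onnx.save(model_simp, "test-onnx-simplified.onnx")
```

The command-line form, `python3 -m onnxsim test-onnx.onnx test-onnx-simplified.onnx`, should do the same thing.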

References:
* <https://github.com/onnx/onnx/blob/master/docs/PythonAPIOverview.md#optimizing-an-onnx-model>
* <https://github.com/daquexian/onnx-simplifier>
* <https://github.com/onnx/models#onnx-model-zoo>
* <https://github.com/microsoft/onnxruntime>
* <https://microsoft.github.io/onnxruntime/>
* <https://github.com/microsoft/onnxruntime/blob/master/BUILD.md#TensorRT>
* <https://github.com/microsoft/onnxruntime#official-builds>
* <https://developer.nvidia.com/cudnn>
* <https://smist08.wordpress.com/2019/04/03/playing-with-cuda-on-my-nvidia-jetson-nano/>
* <https://cmake.org/download/>

CUDA (NVIDIA's GPU programming libraries) and its neural-network library, cuDNN.

cuDNN is tied to a specific CUDA version. Install CUDA first, then find out which cuDNN version to use.

Also add the CUDA environment variables.

It is also important to know that some libraries have GPU-specific packages for this. Examples: tensorflow-gpu, onnxruntime-gpu.

The cuda installation directory must be added to the environment variables.
* Check that the CUDA compiler can be invoked

nvcc --help

* Find where cuda is installed and whether the cuDNN libraries are present

lefv2603@lefv2603-jetsonnano:~$ whereis cuda
cuda: /usr/local/cuda
lefv2603@lefv2603-jetsonnano:~$ whereis cudnn
cudnn: /usr/include/cudnn.h
ls -al /usr/local/cuda/bin
ls -al /usr/local/cuda/lib64

* Add the cuda path to PATH and LD_LIBRARY_PATH

export PATH=${PATH}:/usr/local/cuda/bin
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda/lib64

* Check that the cuda compiler can be executed

nvcc --help

* Update the environment variables in the .bashrc file (see the snippet below)

vi .bashrc
. .bashrc
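
The lines to append to ~/.bashrc are simply the two exports from above (paths assume the default JetPack layout under /usr/local/cuda):

```
# CUDA environment, appended to ~/.bashrc
export PATH=${PATH}:/usr/local/cuda/bin
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda/lib64
```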

* Check again from a new console

nvcc --help


* <https://towardsdatascience.com/installing-tensorflow-with-cuda-cudnn-and-gpu-support-on-windows-10-60693e46e781>
* <https://developer.nvidia.com/cudnn>
* <https://devblogs.nvidia.com/tensor-ops-made-easier-in-cudnn/>

### Installing Cmake-3.17.2
* Download the CMake-3.17.2 tar.gz
* Unzip it in the Downloads directory
* See the RELEASE.rst file in the tar.gz for how to install cmake
* cd into the cmake-3.17.2 directory
* Run bootstrap, make, and make install

lefv2603@lefv2603-jetsonnano:~/Downloads/cmake-3.17.2$ ./bootstrap && make && sudo make install

* If there is an OPENSSL error when installing cmake, the OPENSSL development files must be installed.

```
-- Could NOT find OpenSSL, try to set the path to OpenSSL root folder in the system variable OPENSSL_ROOT_DIR (missing: OPENSSL_CRYPTO_LIBRARY OPENSSL_INCLUDE_DIR)
CMake Error at Utilities/cmcurl/CMakeLists.txt:454 (message):
  Could not find OpenSSL.  Install an OpenSSL development package or
  configure CMake with -DCMAKE_USE_OPENSSL=OFF to build without OpenSSL.
```

sudo apt-get install libssl-dev


* Start again with ./bootstrap
* then make
  * this takes a very long time; go to bed and come back the next day to continue with sudo make install
* then sudo make install
  * this step is very fast

### Installing onnxruntime
* cmake-3.17.2 must be installed first
* the CUDA path must have been added to PATH and LD_LIBRARY_PATH
* get the code from github

git clone --recursive https://github.com/Microsoft/onnxruntime

* Run the build command

cd projects/onnxruntime
./build.sh --config RelWithDebInfo --build_shared_lib --parallel

* This is also very long, maybe even longer than cmake (a TensorRT-enabled build variant is sketched below).
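
The BUILD.md reference above also documents a TensorRT-enabled build. A hedged sketch of what that command could look like on the nano; the flag names and the cudnn/TensorRT paths should be double-checked against the current BUILD.md, and the aarch64 library path is an assumption based on the JetPack layout:

```
# Sketch: onnxruntime build with the TensorRT execution provider (to be verified against BUILD.md)
./build.sh --config RelWithDebInfo --update --build --build_wheel --parallel \
    --use_tensorrt \
    --cuda_home /usr/local/cuda \
    --cudnn_home /usr/lib/aarch64-linux-gnu \
    --tensorrt_home /usr/lib/aarch64-linux-gnu
```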


## Install Pytorch and torchvision on the nano
References:
* <https://elinux.org/Jetson_Zoo>
* <https://gist.github.com/heavyinfo/8d0ef5a3768c5d8d22e164863670330a>
* <https://forums.developer.nvidia.com/t/pytorch-for-jetson-nano-version-1-5-0-now-available/72048#5324123>

From https://gist.github.com/heavyinfo/8d0ef5a3768c5d8d22e164863670330a

```
#!/bin/bash
# This script will install pytorch, torchvision on nano
# If you have any of these installed already on your machine, you can skip those.

sudo apt-get -y update
sudo apt-get -y upgrade   # takes a really long time to complete

# Dependencies
sudo apt-get install python3-setuptools

# Installing PyTorch
# For latest PyTorch refer original Nvidia Jetson Nano thread - https://devtalk.nvidia.com/default/topic/1049071/jetson-nano/pytorch-for-jetson-nano/
wget https://nvidia.box.com/shared/static/ncgzus5o23uck9i5oth2n8n06k340l6k.whl -O torch-1.4.0-cp36-cp36m-linux_aarch64.whl
sudo apt-get install python3-pip libopenblas-base
sudo pip3 install Cython
sudo pip3 install numpy torch-1.4.0-cp36-cp36m-linux_aarch64.whl

# Installing torchvision
# For latest torchvision refer original Nvidia Jetson Nano thread - https://devtalk.nvidia.com/default/topic/1049071/jetson-nano/pytorch-for-jetson-nano/
sudo apt-get install libjpeg-dev zlib1g-dev

# from train.py on https://github.com/dusty-nv/pytorch-segmentation/blob/master/train.py, it is used against
# this fork of torchvision: https://github.com/dusty-nv/vision/tree/v0.3.0
#git clone --branch v0.5.0 https://github.com/pytorch/vision torchvision  # see below for version of torchvision to download
git clone --branch v0.3.0 https://github.com/dusty-nv/vision torchvision  # see below for version of torchvision to download
cd torchvision
sudo python setup.py install
cd ../   # attempting to load torchvision from build dir will result in import error
```

* checkout code from https://github.com/dusty-nv/pytorch-segmentation (link from https://github.com/dusty-nv/jetson-inference/tree/master/python/training)
* TODO run a test to make sure all is fine (a minimal smoke test is sketched below)
  * find a cheap baseline model to train to test the env
  * also the onnx export to test
  * and run the onnx
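
For the TODO above, a minimal smoke test of the install (a sketch; it only checks that the wheels import and see the GPU, not that training works):

```
# smoke_test.py - quick check that pytorch/torchvision are usable on the nano
import torch
import torchvision

print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())

# tiny GPU round-trip
if torch.cuda.is_available():
    x = torch.rand(2, 3, 8, 8).cuda()
    print("sum on GPU:", float(x.sum()))
```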
