
vietocr's Introduction

Hey! 👋

I'm a data scientist interested in AI and machine learning. I like to build things; you can find everything I build here on my GitHub account.

The best way to contact me is usually through Facebook or Email.

vietocr's People

Contributors

pbcquoc


vietocr's Issues

Using large images

Dear Quốc,

At the moment VietOCR handles small images well. What should I do to read an A3-sized image?

How to convert the transformer OCR model to ONNX?

Hi,
I retrained the transformer OCR model on my own data, and now I want to deploy it with better performance, so I plan to convert the model to ONNX.
The problem is that going from the VietOCR model's raw output to the final text involves several processing steps, and those steps do not inherit from nn.Module.
So I would like to ask whether the model or the library supports this, or whether you have any suggestions for converting the model to ONNX.
Thank you.
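Not an official export path, just a minimal sketch under the assumption that the predictor exposes the CNN backbone as detector.model.cnn (as in the current vietocr source): export the convolutional part to ONNX and keep the autoregressive decoding loop, which is not a single nn.Module, in Python.

import torch
from vietocr.tool.config import Cfg
from vietocr.tool.predictor import Predictor

config = Cfg.load_config_from_name('vgg_transformer')
config['device'] = 'cpu'
detector = Predictor(config)

# Dummy input: N x C x H x W, with batch and width marked as dynamic axes.
dummy = torch.randn(1, 3, 32, 475)
torch.onnx.export(
    detector.model.cnn, dummy, 'cnn.onnx',
    input_names=['image'], output_names=['features'],
    dynamic_axes={'image': {0: 'batch', 3: 'width'}},
    opset_version=12,
)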

About the pretrained model

@pbcquoc Hello,
May I ask how many steps you trained for on your 10M-sample dataset to obtain that pretrained model?
Thank you!

Performance when deploying to production

Hello @pbcquoc,
When deploying to production, specifically a website, and using the Seq2Seq architecture for prediction, I find that prediction takes quite long, about 3-4 s. Is there any way to improve this?
Thank you.
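Two low-risk settings that usually help latency, both grounded in the snippets that appear in other issues below (greedy decoding instead of beam search, and making sure inference actually runs on the GPU); treat this as a starting point rather than a guaranteed fix:

config['predictor']['beamsearch'] = False   # greedy decoding is much faster than beam search
config['device'] = 'cuda:0'                 # run on GPU; use 'cpu' only when no GPU is available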

Predicted accuracy is 0; some keys of the config file were wrong

Hello, I ran the notebook and hit a few errors:
[screenshot]
I worked around them by switching to different config keys:

config["optimizer"]["max_lr"] = 0.1
config["optimizer"]["total_steps"] = 4000

del config["optimizer"]["init_lr"]
del config["optimizer"]["n_warmup_steps"]

Apart from that, I also see mismatching shapes when loading the model.
When I run training, the predicted accuracy is 0%. Is this simply because the model has not been trained long enough?
[screenshot]
Thank you!

Pretrained file for the seq2seq backbone

Could you share the pretrained file for seq2seq? I could only find the transformer pretrained weights. Thank you.

Training data

Hello, first of all thank you for this OCR library; it is very useful for me and for the community. I have a question I hope you can answer. When generating data to train the attention OCR model, should the text lines be meaningful sentences? If I also need to OCR digits and special characters, how should I create the data? I am currently generating random words with digits and symbols mixed in, for example "Đồng bào con gà 12389 * &^^", and training gives results that are not as expected. Thank you.

Some questions about the CNN

May I ask a few theoretical questions?

  • The VGG-19 network you use produces, before flattening, an output of 1xCx2x32, i.e. BxCxHxW (for a 128x32 input image). This differs from the original CRNN-CTC code, whose output is 1xCx1x33, i.e. 33 timesteps. The idea of CRNN-CTC is that each timestep corresponds to a width x 32 rectangle of the original image; in your network the feature-map height is not 1, so you flatten to 1xCx64, i.e. 64 timesteps. As far as I can tell, each of those 64 timesteps then only covers a width x 16 region of the original image. Why did you change it this way?
  • The second question: do you think using AvgPool2d instead of MaxPool2d would bring any benefit?
    Thank you.
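For reference, a small illustration (not the library's exact code) of the flattening described above: a B x C x H x W feature map with H = 2 and W = 32 becomes a 64-step sequence once the height and width axes are merged.

import torch

feat = torch.randn(1, 256, 2, 32)                    # B x C x H x W from the CNN
b, c, h, w = feat.shape
seq = feat.permute(0, 3, 2, 1).reshape(b, h * w, c)  # B x (H*W) x C, i.e. 64 timesteps
print(seq.shape)                                     # torch.Size([1, 64, 256])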

Training with grayscale images

Hello, how do I need to configure things to train on grayscale images? I tried convert('L') inside process_image() in translate.py, but I am now hitting an error in the image augmentor.
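A hedged workaround sketch, assuming the augmentor and the pretrained CNN both expect three channels: drop the colour information but keep the image 3-channel so the rest of the pipeline is unchanged.

from PIL import Image

def to_gray_rgb(img: Image.Image) -> Image.Image:
    # Collapse to grayscale, then replicate the single channel back to RGB
    # so the augmentor and the VGG backbone still receive 3-channel input.
    return img.convert('L').convert('RGB')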

Error with the '/' character

Hello, I am using vietocr to retrain on my own dataset. I notice the model runs very well on most kinds of text, but it often fails on the '/' character, for example dates with '/' between the fields. Is there any way to improve this? Thank you!

Attention OCR model

Hi, I cannot find the attention OCR model in the code or among the published models, only transformer OCR. Could you give me a download link?
Thank you!

RuntimeError: Error(s) in loading state_dict for VietOCR

Hi,
I downloaded the vgg-transformer weights and then got this error from load_state_dict:

size mismatch for transformer.embed_tgt.weight: copying a param with shape torch.Size([232, 256]) from checkpoint, the shape in current model is torch.Size([233, 256]).
size mismatch for transformer.fc.weight: copying a param with shape torch.Size([232, 256]) from checkpoint, the shape in current model is torch.Size([233, 256]).
size mismatch for transformer.fc.bias: copying a param with shape torch.Size([232]) from checkpoint, the shape in current model is torch.Size([233]).

Here is my config file:

vocab: 'aAàÀảẢãÃáÁạẠăĂằẰẳẲẵẴắẮặẶâÂầẦẩẨẫẪấẤậẬbBcCdDđĐeEèÈẻẺẽẼéÉẹẸêÊềỀểỂễỄếẾệỆfFgGhHiIìÌỉỈĩĨíÍịỊjJkKlLmMnNoOòÒỏỎõÕóÓọỌôÔồỒổỔỗỖốỐộỘơƠờỜởỞỡỠớỚợỢpPqQrRsStTuUùÙủỦũŨúÚụỤưƯừỪửỬữỮứỨựỰvVwWxXyYỳỲỷỶỹỸýÝỵỴzZ0123456789!"#$%&''()*+,-./:;<=>?@[\]^_`{|}~ '
device: cuda
weights: weights/transformerocr.pth
backbone: vgg19_bn
cnn:
    # pooling stride size
    ss:
        - [2, 2]
        - [2, 2]
        - [2, 1]
        - [2, 1]
        - [1, 1]         
    # pooling kernel size 
    ks:
        - [2, 2]
        - [2, 2]
        - [2, 1]
        - [2, 1]
        - [1, 1]
transformer:  
    d_model: 256
    nhead: 8
    num_encoder_layers: 6
    num_decoder_layers: 6
    dim_feedforward: 2048
    max_seq_length: 1024
    pos_dropout: 0.1
    trans_dropout: 0.1
seq_modeling: 'transformer'
beamsearch: False
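The 232-vs-233 mismatch suggests the vocab string used to build the current model is one character longer than the one the checkpoint was trained with (the classifier size tracks the vocab length plus a few special tokens). A quick, hedged sanity check using only the public Cfg API:

from vietocr.tool.config import Cfg

cfg = Cfg.load_config_from_name('vgg_transformer')
print(len(cfg['vocab']))   # vocab length expected by the released checkpoint
# Compare this with len() of the vocab string in the YAML above; a difference
# of exactly one character (for example a doubled quote introduced by YAML
# escaping) would explain 232 vs 233.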

Error: Size mismatch for resnet_transformer

I am trying the resnet_transformer model and I am also getting a size-mismatch error (my version is 0.1.9):

Here is the code I ran:

config = Cfg.load_config_from_name('resnet_transformer')
config['device'] = 'cuda:0'
config['predictor']['beamsearch'] = False
detector = Predictor(config)

Here is the error message:


File "/home/duycuong/PycharmProjects/research_py3/text_recognition/classifier/vietocr/vietocr/eval.py", line 24, in <module>
    detector = Predictor(config)
  File "/home/duycuong/PycharmProjects/research_py3/text_recognition/classifier/vietocr/vietocr/tool/predictor.py", line 19, in __init__
    model.load_state_dict(torch.load(weights, map_location=torch.device(device)))
  File "/home/duycuong/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 839, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for VietOCR:
	Missing key(s) in state_dict: "cnn.model.conv0_1.weight", "cnn.model.bn0_1.weight", "cnn.model.bn0_1.bias", "cnn.model.bn0_1.running_mean", "cnn.model.bn0_1.running_var", "cnn.model.conv0_2.weight", "cnn.model.bn0_2.weight", "cnn.model.bn0_2.bias", "cnn.model.bn0_2.running_mean", "cnn.model.bn0_2.running_var", "cnn.model.layer1.0.conv1.weight", "cnn.model.layer1.0.bn1.weight", "cnn.model.layer1.0.bn1.bias", "cnn.model.layer1.0.bn1.running_mean", "cnn.model.layer1.0.bn1.running_var", "cnn.model.layer1.0.conv2.weight", "cnn.model.layer1.0.bn2.weight", "cnn.model.layer1.0.bn2.bias", "cnn.model.layer1.0.bn2.running_mean", "cnn.model.layer1.0.bn2.running_var", "cnn.model.layer1.0.downsample.0.weight", "cnn.model.layer1.0.downsample.1.weight", "cnn.model.layer1.0.downsample.1.bias", "cnn.model.layer1.0.downsample.1.running_mean", "cnn.model.layer1.0.downsample.1.running_var", "cnn.model.conv1.weight", "cnn.model.bn1.weight", "cnn.model.bn1.bias", "cnn.model.bn1.running_mean", "cnn.model.bn1.running_var", "cnn.model.layer2.0.conv1.weight", "cnn.model.layer2.0.bn1.weight", "cnn.model.layer2.0.bn1.bias", "cnn.model.layer2.0.bn1.running_mean", "cnn.model.layer2.0.bn1.running_var", "cnn.model.layer2.0.conv2.weight", "cnn.model.layer2.0.bn2.weight", "cnn.model.layer2.0.bn2.bias", "cnn.model.layer2.0.bn2.running_mean", "cnn.model.layer2.0.bn2.running_var", "cnn.model.layer2.0.downsample.0.weight", "cnn.model.layer2.0.downsample.1.weight", "cnn.model.layer2.0.downsample.1.bias", "cnn.model.layer2.0.downsample.1.running_mean", "cnn.model.layer2.0.downsample.1.running_var", "cnn.model.layer2.1.conv1.weight", "cnn.model.layer2.1.bn1.weight", "cnn.model.layer2.1.bn1.bias", "cnn.model.layer2.1.bn1.running_mean", "cnn.model.layer2.1.bn1.running_var", "cnn.model.layer2.1.conv2.weight", "cnn.model.layer2.1.bn2.weight", "cnn.model.layer2.1.bn2.bias", "cnn.model.layer2.1.bn2.running_mean", "cnn.model.layer2.1.bn2.running_var", "cnn.model.conv2.weight", "cnn.model.bn2.weight", "cnn.model.bn2.bias", "cnn.model.bn2.running_mean", "cnn.model.bn2.running_var", "cnn.model.layer3.0.conv1.weight", "cnn.model.layer3.0.bn1.weight", "cnn.model.layer3.0.bn1.bias", "cnn.model.layer3.0.bn1.running_mean", "cnn.model.layer3.0.bn1.running_var", "cnn.model.layer3.0.conv2.weight", "cnn.model.layer3.0.bn2.weight", "cnn.model.layer3.0.bn2.bias", "cnn.model.layer3.0.bn2.running_mean", "cnn.model.layer3.0.bn2.running_var", "cnn.model.layer3.0.downsample.0.weight", "cnn.model.layer3.0.downsample.1.weight", "cnn.model.layer3.0.downsample.1.bias", "cnn.model.layer3.0.downsample.1.running_mean", "cnn.model.layer3.0.downsample.1.running_var", "cnn.model.layer3.1.conv1.weight", "cnn.model.layer3.1.bn1.weight", "cnn.model.layer3.1.bn1.bias", "cnn.model.layer3.1.bn1.running_mean", "cnn.model.layer3.1.bn1.running_var", "cnn.model.layer3.1.conv2.weight", "cnn.model.layer3.1.bn2.weight", "cnn.model.layer3.1.bn2.bias", "cnn.model.layer3.1.bn2.running_mean", "cnn.model.layer3.1.bn2.running_var", "cnn.model.layer3.2.conv1.weight", "cnn.model.layer3.2.bn1.weight", "cnn.model.layer3.2.bn1.bias", "cnn.model.layer3.2.bn1.running_mean", "cnn.model.layer3.2.bn1.running_var", "cnn.model.layer3.2.conv2.weight", "cnn.model.layer3.2.bn2.weight", "cnn.model.layer3.2.bn2.bias", "cnn.model.layer3.2.bn2.running_mean", "cnn.model.layer3.2.bn2.running_var", "cnn.model.layer3.3.conv1.weight", "cnn.model.layer3.3.bn1.weight", "cnn.model.layer3.3.bn1.bias", "cnn.model.layer3.3.bn1.running_mean", "cnn.model.layer3.3.bn1.running_var", "cnn.model.layer3.3.conv2.weight", 
"cnn.model.layer3.3.bn2.weight", "cnn.model.layer3.3.bn2.bias", "cnn.model.layer3.3.bn2.running_mean", "cnn.model.layer3.3.bn2.running_var", "cnn.model.layer3.4.conv1.weight", "cnn.model.layer3.4.bn1.weight", "cnn.model.layer3.4.bn1.bias", "cnn.model.layer3.4.bn1.running_mean", "cnn.model.layer3.4.bn1.running_var", "cnn.model.layer3.4.conv2.weight", "cnn.model.layer3.4.bn2.weight", "cnn.model.layer3.4.bn2.bias", "cnn.model.layer3.4.bn2.running_mean", "cnn.model.layer3.4.bn2.running_var", "cnn.model.conv3.weight", "cnn.model.bn3.weight", "cnn.model.bn3.bias", "cnn.model.bn3.running_mean", "cnn.model.bn3.running_var", "cnn.model.layer4.0.conv1.weight", "cnn.model.layer4.0.bn1.weight", "cnn.model.layer4.0.bn1.bias", "cnn.model.layer4.0.bn1.running_mean", "cnn.model.layer4.0.bn1.running_var", "cnn.model.layer4.0.conv2.weight", "cnn.model.layer4.0.bn2.weight", "cnn.model.layer4.0.bn2.bias", "cnn.model.layer4.0.bn2.running_mean", "cnn.model.layer4.0.bn2.running_var", "cnn.model.layer4.1.conv1.weight", "cnn.model.layer4.1.bn1.weight", "cnn.model.layer4.1.bn1.bias", "cnn.model.layer4.1.bn1.running_mean", "cnn.model.layer4.1.bn1.running_var", "cnn.model.layer4.1.conv2.weight", "cnn.model.layer4.1.bn2.weight", "cnn.model.layer4.1.bn2.bias", "cnn.model.layer4.1.bn2.running_mean", "cnn.model.layer4.1.bn2.running_var", "cnn.model.layer4.2.conv1.weight", "cnn.model.layer4.2.bn1.weight", "cnn.model.layer4.2.bn1.bias", "cnn.model.layer4.2.bn1.running_mean", "cnn.model.layer4.2.bn1.running_var", "cnn.model.layer4.2.conv2.weight", "cnn.model.layer4.2.bn2.weight", "cnn.model.layer4.2.bn2.bias", "cnn.model.layer4.2.bn2.running_mean", "cnn.model.layer4.2.bn2.running_var", "cnn.model.conv4_1.weight", "cnn.model.bn4_1.weight", "cnn.model.bn4_1.bias", "cnn.model.bn4_1.running_mean", "cnn.model.bn4_1.running_var", "cnn.model.conv4_2.weight", "cnn.model.bn4_2.weight", "cnn.model.bn4_2.bias", "cnn.model.bn4_2.running_mean", "cnn.model.bn4_2.running_var". 
	Unexpected key(s) in state_dict: "cnn.cnn.features.0.weight", "cnn.cnn.features.0.bias", "cnn.cnn.features.1.weight", "cnn.cnn.features.1.bias", "cnn.cnn.features.1.running_mean", "cnn.cnn.features.1.running_var", "cnn.cnn.features.1.num_batches_tracked", "cnn.cnn.features.3.weight", "cnn.cnn.features.3.bias", "cnn.cnn.features.4.weight", "cnn.cnn.features.4.bias", "cnn.cnn.features.4.running_mean", "cnn.cnn.features.4.running_var", "cnn.cnn.features.4.num_batches_tracked", "cnn.cnn.features.7.weight", "cnn.cnn.features.7.bias", "cnn.cnn.features.8.weight", "cnn.cnn.features.8.bias", "cnn.cnn.features.8.running_mean", "cnn.cnn.features.8.running_var", "cnn.cnn.features.8.num_batches_tracked", "cnn.cnn.features.10.weight", "cnn.cnn.features.10.bias", "cnn.cnn.features.11.weight", "cnn.cnn.features.11.bias", "cnn.cnn.features.11.running_mean", "cnn.cnn.features.11.running_var", "cnn.cnn.features.11.num_batches_tracked", "cnn.cnn.features.14.weight", "cnn.cnn.features.14.bias", "cnn.cnn.features.15.weight", "cnn.cnn.features.15.bias", "cnn.cnn.features.15.running_mean", "cnn.cnn.features.15.running_var", "cnn.cnn.features.15.num_batches_tracked", "cnn.cnn.features.17.weight", "cnn.cnn.features.17.bias", "cnn.cnn.features.18.weight", "cnn.cnn.features.18.bias", "cnn.cnn.features.18.running_mean", "cnn.cnn.features.18.running_var", "cnn.cnn.features.18.num_batches_tracked", "cnn.cnn.features.20.weight", "cnn.cnn.features.20.bias", "cnn.cnn.features.21.weight", "cnn.cnn.features.21.bias", "cnn.cnn.features.21.running_mean", "cnn.cnn.features.21.running_var", "cnn.cnn.features.21.num_batches_tracked", "cnn.cnn.features.23.weight", "cnn.cnn.features.23.bias", "cnn.cnn.features.24.weight", "cnn.cnn.features.24.bias", "cnn.cnn.features.24.running_mean", "cnn.cnn.features.24.running_var", "cnn.cnn.features.24.num_batches_tracked", "cnn.cnn.features.27.weight", "cnn.cnn.features.27.bias", "cnn.cnn.features.28.weight", "cnn.cnn.features.28.bias", "cnn.cnn.features.28.running_mean", "cnn.cnn.features.28.running_var", "cnn.cnn.features.28.num_batches_tracked", "cnn.cnn.features.30.weight", "cnn.cnn.features.30.bias", "cnn.cnn.features.31.weight", "cnn.cnn.features.31.bias", "cnn.cnn.features.31.running_mean", "cnn.cnn.features.31.running_var", "cnn.cnn.features.31.num_batches_tracked", "cnn.cnn.features.33.weight", "cnn.cnn.features.33.bias", "cnn.cnn.features.34.weight", "cnn.cnn.features.34.bias", "cnn.cnn.features.34.running_mean", "cnn.cnn.features.34.running_var", "cnn.cnn.features.34.num_batches_tracked", "cnn.cnn.features.36.weight", "cnn.cnn.features.36.bias", "cnn.cnn.features.37.weight", "cnn.cnn.features.37.bias", "cnn.cnn.features.37.running_mean", "cnn.cnn.features.37.running_var", "cnn.cnn.features.37.num_batches_tracked", "cnn.cnn.features.40.weight", "cnn.cnn.features.40.bias", "cnn.cnn.features.41.weight", "cnn.cnn.features.41.bias", "cnn.cnn.features.41.running_mean", "cnn.cnn.features.41.running_var", "cnn.cnn.features.41.num_batches_tracked", "cnn.cnn.features.43.weight", "cnn.cnn.features.43.bias", "cnn.cnn.features.44.weight", "cnn.cnn.features.44.bias", "cnn.cnn.features.44.running_mean", "cnn.cnn.features.44.running_var", "cnn.cnn.features.44.num_batches_tracked", "cnn.cnn.features.46.weight", "cnn.cnn.features.46.bias", "cnn.cnn.features.47.weight", "cnn.cnn.features.47.bias", "cnn.cnn.features.47.running_mean", "cnn.cnn.features.47.running_var", "cnn.cnn.features.47.num_batches_tracked", "cnn.cnn.features.49.weight", "cnn.cnn.features.49.bias", "cnn.cnn.features.50.weight", 
"cnn.cnn.features.50.bias", "cnn.cnn.features.50.running_mean", "cnn.cnn.features.50.running_var", "cnn.cnn.features.50.num_batches_tracked", "cnn.cnn.classifier.0.weight", "cnn.cnn.classifier.0.bias", "cnn.cnn.classifier.3.weight", "cnn.cnn.classifier.3.bias", "cnn.cnn.classifier.6.weight", "cnn.cnn.classifier.6.bias". 
	size mismatch for transformer.embed_tgt.weight: copying a param with shape torch.Size([232, 512]) from checkpoint, the shape in current model is torch.Size([233, 256]).
	size mismatch for transformer.pos_enc.pe: copying a param with shape torch.Size([10000, 1, 512]) from checkpoint, the shape in current model is torch.Size([1024, 1, 256]).
	size mismatch for transformer.transformer.encoder.layers.0.self_attn.in_proj_weight: copying a param with shape torch.Size([1536, 512]) from checkpoint, the shape in current model is torch.Size([768, 256]).
	size mismatch for transformer.transformer.encoder.layers.0.self_attn.in_proj_bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([768]).
	size mismatch for transformer.transformer.encoder.layers.0.self_attn.out_proj.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([256, 256]).
	size mismatch for transformer.transformer.encoder.layers.0.self_attn.out_proj.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.layers.0.linear1.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([2048, 256]).
	size mismatch for transformer.transformer.encoder.layers.0.linear2.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([256, 2048]).
	size mismatch for transformer.transformer.encoder.layers.0.linear2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.layers.0.norm1.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.layers.0.norm1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.layers.0.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.layers.0.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.layers.1.self_attn.in_proj_weight: copying a param with shape torch.Size([1536, 512]) from checkpoint, the shape in current model is torch.Size([768, 256]).
	size mismatch for transformer.transformer.encoder.layers.1.self_attn.in_proj_bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([768]).
	size mismatch for transformer.transformer.encoder.layers.1.self_attn.out_proj.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([256, 256]).
	size mismatch for transformer.transformer.encoder.layers.1.self_attn.out_proj.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.layers.1.linear1.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([2048, 256]).
	size mismatch for transformer.transformer.encoder.layers.1.linear2.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([256, 2048]).
	size mismatch for transformer.transformer.encoder.layers.1.linear2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.layers.1.norm1.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.layers.1.norm1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.layers.1.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.layers.1.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.layers.2.self_attn.in_proj_weight: copying a param with shape torch.Size([1536, 512]) from checkpoint, the shape in current model is torch.Size([768, 256]).
	size mismatch for transformer.transformer.encoder.layers.2.self_attn.in_proj_bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([768]).
	size mismatch for transformer.transformer.encoder.layers.2.self_attn.out_proj.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([256, 256]).
	size mismatch for transformer.transformer.encoder.layers.2.self_attn.out_proj.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.layers.2.linear1.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([2048, 256]).
	size mismatch for transformer.transformer.encoder.layers.2.linear2.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([256, 2048]).
	size mismatch for transformer.transformer.encoder.layers.2.linear2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.layers.2.norm1.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.layers.2.norm1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.layers.2.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.layers.2.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.layers.3.self_attn.in_proj_weight: copying a param with shape torch.Size([1536, 512]) from checkpoint, the shape in current model is torch.Size([768, 256]).
	size mismatch for transformer.transformer.encoder.layers.3.self_attn.in_proj_bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([768]).
	size mismatch for transformer.transformer.encoder.layers.3.self_attn.out_proj.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([256, 256]).
	size mismatch for transformer.transformer.encoder.layers.3.self_attn.out_proj.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.layers.3.linear1.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([2048, 256]).
	size mismatch for transformer.transformer.encoder.layers.3.linear2.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([256, 2048]).
	size mismatch for transformer.transformer.encoder.layers.3.linear2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.layers.3.norm1.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.layers.3.norm1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.layers.3.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.layers.3.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.layers.4.self_attn.in_proj_weight: copying a param with shape torch.Size([1536, 512]) from checkpoint, the shape in current model is torch.Size([768, 256]).
	size mismatch for transformer.transformer.encoder.layers.4.self_attn.in_proj_bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([768]).
	size mismatch for transformer.transformer.encoder.layers.4.self_attn.out_proj.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([256, 256]).
	size mismatch for transformer.transformer.encoder.layers.4.self_attn.out_proj.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.layers.4.linear1.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([2048, 256]).
	size mismatch for transformer.transformer.encoder.layers.4.linear2.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([256, 2048]).
	size mismatch for transformer.transformer.encoder.layers.4.linear2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.layers.4.norm1.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.layers.4.norm1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.layers.4.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.layers.4.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.layers.5.self_attn.in_proj_weight: copying a param with shape torch.Size([1536, 512]) from checkpoint, the shape in current model is torch.Size([768, 256]).
	size mismatch for transformer.transformer.encoder.layers.5.self_attn.in_proj_bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([768]).
	size mismatch for transformer.transformer.encoder.layers.5.self_attn.out_proj.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([256, 256]).
	size mismatch for transformer.transformer.encoder.layers.5.self_attn.out_proj.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.layers.5.linear1.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([2048, 256]).
	size mismatch for transformer.transformer.encoder.layers.5.linear2.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([256, 2048]).
	size mismatch for transformer.transformer.encoder.layers.5.linear2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.layers.5.norm1.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.layers.5.norm1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.layers.5.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.layers.5.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.encoder.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.0.self_attn.in_proj_weight: copying a param with shape torch.Size([1536, 512]) from checkpoint, the shape in current model is torch.Size([768, 256]).
	size mismatch for transformer.transformer.decoder.layers.0.self_attn.in_proj_bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([768]).
	size mismatch for transformer.transformer.decoder.layers.0.self_attn.out_proj.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([256, 256]).
	size mismatch for transformer.transformer.decoder.layers.0.self_attn.out_proj.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.0.multihead_attn.in_proj_weight: copying a param with shape torch.Size([1536, 512]) from checkpoint, the shape in current model is torch.Size([768, 256]).
	size mismatch for transformer.transformer.decoder.layers.0.multihead_attn.in_proj_bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([768]).
	size mismatch for transformer.transformer.decoder.layers.0.multihead_attn.out_proj.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([256, 256]).
	size mismatch for transformer.transformer.decoder.layers.0.multihead_attn.out_proj.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.0.linear1.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([2048, 256]).
	size mismatch for transformer.transformer.decoder.layers.0.linear2.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([256, 2048]).
	size mismatch for transformer.transformer.decoder.layers.0.linear2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.0.norm1.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.0.norm1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.0.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.0.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.0.norm3.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.0.norm3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.1.self_attn.in_proj_weight: copying a param with shape torch.Size([1536, 512]) from checkpoint, the shape in current model is torch.Size([768, 256]).
	size mismatch for transformer.transformer.decoder.layers.1.self_attn.in_proj_bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([768]).
	size mismatch for transformer.transformer.decoder.layers.1.self_attn.out_proj.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([256, 256]).
	size mismatch for transformer.transformer.decoder.layers.1.self_attn.out_proj.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.1.multihead_attn.in_proj_weight: copying a param with shape torch.Size([1536, 512]) from checkpoint, the shape in current model is torch.Size([768, 256]).
	size mismatch for transformer.transformer.decoder.layers.1.multihead_attn.in_proj_bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([768]).
	size mismatch for transformer.transformer.decoder.layers.1.multihead_attn.out_proj.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([256, 256]).
	size mismatch for transformer.transformer.decoder.layers.1.multihead_attn.out_proj.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.1.linear1.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([2048, 256]).
	size mismatch for transformer.transformer.decoder.layers.1.linear2.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([256, 2048]).
	size mismatch for transformer.transformer.decoder.layers.1.linear2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.1.norm1.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.1.norm1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.1.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.1.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.1.norm3.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.1.norm3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.2.self_attn.in_proj_weight: copying a param with shape torch.Size([1536, 512]) from checkpoint, the shape in current model is torch.Size([768, 256]).
	size mismatch for transformer.transformer.decoder.layers.2.self_attn.in_proj_bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([768]).
	size mismatch for transformer.transformer.decoder.layers.2.self_attn.out_proj.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([256, 256]).
	size mismatch for transformer.transformer.decoder.layers.2.self_attn.out_proj.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.2.multihead_attn.in_proj_weight: copying a param with shape torch.Size([1536, 512]) from checkpoint, the shape in current model is torch.Size([768, 256]).
	size mismatch for transformer.transformer.decoder.layers.2.multihead_attn.in_proj_bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([768]).
	size mismatch for transformer.transformer.decoder.layers.2.multihead_attn.out_proj.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([256, 256]).
	size mismatch for transformer.transformer.decoder.layers.2.multihead_attn.out_proj.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.2.linear1.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([2048, 256]).
	size mismatch for transformer.transformer.decoder.layers.2.linear2.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([256, 2048]).
	size mismatch for transformer.transformer.decoder.layers.2.linear2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.2.norm1.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.2.norm1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.2.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.2.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.2.norm3.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.2.norm3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.3.self_attn.in_proj_weight: copying a param with shape torch.Size([1536, 512]) from checkpoint, the shape in current model is torch.Size([768, 256]).
	size mismatch for transformer.transformer.decoder.layers.3.self_attn.in_proj_bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([768]).
	size mismatch for transformer.transformer.decoder.layers.3.self_attn.out_proj.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([256, 256]).
	size mismatch for transformer.transformer.decoder.layers.3.self_attn.out_proj.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.3.multihead_attn.in_proj_weight: copying a param with shape torch.Size([1536, 512]) from checkpoint, the shape in current model is torch.Size([768, 256]).
	size mismatch for transformer.transformer.decoder.layers.3.multihead_attn.in_proj_bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([768]).
	size mismatch for transformer.transformer.decoder.layers.3.multihead_attn.out_proj.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([256, 256]).
	size mismatch for transformer.transformer.decoder.layers.3.multihead_attn.out_proj.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.3.linear1.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([2048, 256]).
	size mismatch for transformer.transformer.decoder.layers.3.linear2.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([256, 2048]).
	size mismatch for transformer.transformer.decoder.layers.3.linear2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.3.norm1.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.3.norm1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.3.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.3.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.3.norm3.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.3.norm3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.4.self_attn.in_proj_weight: copying a param with shape torch.Size([1536, 512]) from checkpoint, the shape in current model is torch.Size([768, 256]).
	size mismatch for transformer.transformer.decoder.layers.4.self_attn.in_proj_bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([768]).
	size mismatch for transformer.transformer.decoder.layers.4.self_attn.out_proj.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([256, 256]).
	size mismatch for transformer.transformer.decoder.layers.4.self_attn.out_proj.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.4.multihead_attn.in_proj_weight: copying a param with shape torch.Size([1536, 512]) from checkpoint, the shape in current model is torch.Size([768, 256]).
	size mismatch for transformer.transformer.decoder.layers.4.multihead_attn.in_proj_bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([768]).
	size mismatch for transformer.transformer.decoder.layers.4.multihead_attn.out_proj.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([256, 256]).
	size mismatch for transformer.transformer.decoder.layers.4.multihead_attn.out_proj.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.4.linear1.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([2048, 256]).
	size mismatch for transformer.transformer.decoder.layers.4.linear2.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([256, 2048]).
	size mismatch for transformer.transformer.decoder.layers.4.linear2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.4.norm1.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.4.norm1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.4.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.4.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.4.norm3.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.4.norm3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.5.self_attn.in_proj_weight: copying a param with shape torch.Size([1536, 512]) from checkpoint, the shape in current model is torch.Size([768, 256]).
	size mismatch for transformer.transformer.decoder.layers.5.self_attn.in_proj_bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([768]).
	size mismatch for transformer.transformer.decoder.layers.5.self_attn.out_proj.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([256, 256]).
	size mismatch for transformer.transformer.decoder.layers.5.self_attn.out_proj.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.5.multihead_attn.in_proj_weight: copying a param with shape torch.Size([1536, 512]) from checkpoint, the shape in current model is torch.Size([768, 256]).
	size mismatch for transformer.transformer.decoder.layers.5.multihead_attn.in_proj_bias: copying a param with shape torch.Size([1536]) from checkpoint, the shape in current model is torch.Size([768]).
	size mismatch for transformer.transformer.decoder.layers.5.multihead_attn.out_proj.weight: copying a param with shape torch.Size([512, 512]) from checkpoint, the shape in current model is torch.Size([256, 256]).
	size mismatch for transformer.transformer.decoder.layers.5.multihead_attn.out_proj.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.5.linear1.weight: copying a param with shape torch.Size([2048, 512]) from checkpoint, the shape in current model is torch.Size([2048, 256]).
	size mismatch for transformer.transformer.decoder.layers.5.linear2.weight: copying a param with shape torch.Size([512, 2048]) from checkpoint, the shape in current model is torch.Size([256, 2048]).
	size mismatch for transformer.transformer.decoder.layers.5.linear2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.5.norm1.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.5.norm1.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.5.norm2.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.5.norm2.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.5.norm3.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.layers.5.norm3.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.norm.weight: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.transformer.decoder.norm.bias: copying a param with shape torch.Size([512]) from checkpoint, the shape in current model is torch.Size([256]).
	size mismatch for transformer.fc.weight: copying a param with shape torch.Size([232, 512]) from checkpoint, the shape in current model is torch.Size([233, 256]).
	size mismatch for transformer.fc.bias: copying a param with shape torch.Size([232]) from checkpoint, the shape in current model is torch.Size([233]).

TypeError: __init__() got an unexpected keyword argument 'n_warmup_steps'

I got this message while training. Could you please show me how to fix it?

Computing MD5: /tmp/tranformerorc.pth
MD5 matches: /tmp/tranformerorc.pth
transformer.embed_tgt.weight missmatching shape
transformer.fc.weight missmatching shape
transformer.fc.bias missmatching shape
Traceback (most recent call last):
  File "main.py", line 26, in <module>
    trainer = Trainer(config, pretrained=True)
  File "/Volumes/DATA/DRAF/Deep Learning/DetectHand2/vietocr/model/trainer.py", line 62, in __init__
    self.scheduler = OneCycleLR(self.optimizer, **config['optimizer'])
TypeError: __init__() got an unexpected keyword argument 'n_warmup_steps'

Here is my code:

from vietocr.tool.config import Cfg
from vietocr.model.trainer import Trainer


if __name__ == '__main__':
    config = Cfg.load_config_from_name('vgg_transformer')
    dataset_params = {
        'name': 'hw',
        'data_root': './data/',
        'train_annotation': 'train_annotation.txt',
        'valid_annotation': 'test_annotation.txt'
    }

    params = {
        'print_every': 200,
        'valid_every': 15 * 200,
        'iters': 20000,
        'checkpoint': './checkpoint/transformerocr_checkpoint.pth',
        'export': './weights/transformerocr.pth',
        'metrics': 10000
    }

    config['trainer'].update(params)
    config['dataset'].update(dataset_params)

    trainer = Trainer(config, pretrained=True)
    trainer.visualize_dataset()
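The traceback above shows the scheduler being built as OneCycleLR(self.optimizer, **config['optimizer']), so leftover keys such as n_warmup_steps are rejected. A hedged sketch (the values are examples, not recommendations) that would replace config['optimizer'] before the Trainer(config, pretrained=True) call:

config['optimizer'] = {
    'max_lr': 0.0003,      # peak learning rate for OneCycleLR
    'total_steps': 20000,  # keep in sync with config['trainer']['iters']
    'pct_start': 0.1,      # fraction of the run spent warming up
}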

Error when running the demo

[screenshot]
I followed the tutorial file and get errors on two different versions. How can I fix this?

Error when loading the model

Hello,
I am using version 0.3.5 and following your guide. At the "detector = Predictor(config)" step I get this error:
[screenshot]
I tried torch.jit.load() instead, but that did not work either. Could you help me?

How should input images be processed so training can run in batches?

Hello.

First of all, thank you: VietOCR is genuinely useful for the machine-learning community in general and for me in particular. While studying the library, I have one small question: I could not find how text-line images of different widths are handled so that training can run batch by batch. Could you share this, if possible?

Thank you. I wish you continued success and many more contributions to the Vietnamese AI community.
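One common approach, sketched here as an assumption rather than a description of vietocr's internals: resize every crop to a fixed height, then right-pad each image in a batch to the widest sample so the batch can be stacked into a single tensor (often after bucketing samples of similar width together).

import torch

def collate_variable_width(images):
    # images: list of C x H x W float tensors with equal H but different W
    c, h = images[0].shape[0], images[0].shape[1]
    max_w = max(img.shape[-1] for img in images)
    batch = torch.zeros(len(images), c, h, max_w)
    for i, img in enumerate(images):
        batch[i, :, :, : img.shape[-1]] = img  # right-pad the narrower images with zeros
    return batch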

OCR on ID cards (CMND)

Hello Quốc,
Thank you for building an OCR library for Vietnamese. I have three questions:
+ To extract information from an ID card with the Transformer model, can I scan the whole card in one pass, or do I have to cut it into individual boxes?
+ If I train on ID cards, roughly how many samples do I need before it becomes usable?
+ I work in .NET, so I tried downloading the weights file as in the screenshot, but the results are wrong; did I miss a step somewhere? :D
[screenshot]

  • Input
    [input image]
  • Result
    [result screenshot]
    => The result is not what I expected. Could you suggest how to improve the accuracy?

Best wishes.

Different bounding boxes give different results

Hi Quốc, I am testing the model on a few kinds of text and ran into a problem when the crop region changes. For example:

vietnam1_3
vgg19_bn-seq2seq result: UASORU7250KXXY

vietnam1_4
vgg19_bn-seq2seq result: UASCRU7250KXY

vietnam_1_crop2
vgg19_bn-seq2seq result: UASORU7250KXY

The transformer model shows the same problem. Is there any way to address this, or any guidance on cropping so that the model reads well?
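One mitigation that is often tried (an assumption on my part, not something the library prescribes): pad the detection box by a small fixed margin before cropping, so the recognizer always sees a similar amount of context around the text.

def pad_box(x1, y1, x2, y2, img_w, img_h, ratio=0.05):
    # Expand the box by `ratio` of its size, clamped to the image borders.
    pad_x = int((x2 - x1) * ratio)
    pad_y = int((y2 - y1) * ratio)
    return (max(0, x1 - pad_x), max(0, y1 - pad_y),
            min(img_w, x2 + pad_x), min(img_h, y2 + pad_y))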

Amount of training data

Hello, roughly how many images are needed to train from scratch and still get good results?
Thank you.

Error when starting a new training run

I want to continue training with the dataset you provided, but training fails with the error below.
I kept the vi_00, vi_01, ... folders as they are and created new train_annotation.txt and test_annotation.txt files, so those files look like:
vi_00 abc
vi_01 aksk
....
I don't know how to fix it.
[screenshot]
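For reference, a hedged sketch of the annotation format used in the getting-started notebook (assumption: one sample per line, with the relative image path and its label separated by a tab rather than a space); the paths and labels below are made-up examples:

samples = [
    ("vi_00/00000.jpg", "example label one"),
    ("vi_01/00001.jpg", "example label two"),
]
with open("train_annotation.txt", "w", encoding="utf-8") as f:
    for path, label in samples:
        f.write(f"{path}\t{label}\n")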

Missing keys when predicting with attention seq2seq

Hello, I tried predicting with attention seq2seq but got the error below. I think it is because your weight file was trained with the transformer, which is why it fails. If I want to use attention seq2seq, how should I configure things?
Please advise, thank you.
[screenshots]
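If the goal is simply to run the seq2seq variant with matching published weights, a hedged sketch using the named configs that appear elsewhere in these issues:

from vietocr.tool.config import Cfg
from vietocr.tool.predictor import Predictor

# 'vgg_seq2seq' pairs the seq2seq architecture with its own weights, so the
# transformer checkpoint is never loaded into a mismatched model.
config = Cfg.load_config_from_name('vgg_seq2seq')
config['device'] = 'cpu'
detector = Predictor(config)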

Error while training for new dataset?

When I tried to use your notebook and your dataset to train a new model, I got the error "This DataLoader will create 3 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create". When I then set num_workers = 2 in the config file, a new error occurred: "StopIteration".
Can you help me fix this? Thank you!
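For the first warning, a hedged sketch assuming the loader settings live under config['dataloader'] as in the base config shipped with the library (the later StopIteration is likely a separate problem; compare the small-dataset report at the end of this list):

config['dataloader'].update({
    'num_workers': 2,    # match the machine's suggested maximum
    'pin_memory': True,
})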

Running a quick test on CPU

Hi, my machine currently has no GPU and I get this error:

[screenshot]

Where do I need to change the config so I can run a quick test on CPU? Thank you!
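A minimal sketch, mirroring the snippets used elsewhere in these issues but pointing the device at the CPU:

from vietocr.tool.config import Cfg
from vietocr.tool.predictor import Predictor

config = Cfg.load_config_from_name('vgg_transformer')
config['device'] = 'cpu'   # instead of 'cuda:0'
detector = Predictor(config)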

vocab in vietocr

I have a few questions about how targets are handled in vietocr:

  1. vocab.py has a variable self.mask_token; what is it for? Can I treat it as the vocab's unknown token?
  2. Are the targets fed into the transformer padded to a fixed length?
  3. If they are fixed-length, shouldn't the value returned by the len function be max_length + 2 rather than + 4?

The prob score in vietocr

Hi,
I have been using and fine-tuning vietocr. While using it I see that the returned prob is almost always 0.93; after fine-tuning it is mostly 0.91. How is the prob computed, and why are the returned values so similar? Thanks!

Continuing to train a model

Hi!
How do I set things up to continue training a half-finished model? I tried passing the model into the checkpoint setting but it did not work.
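A hedged sketch, assuming the Trainer exposes load_checkpoint() as in the repository's trainer.py and that the earlier run was saving to config['trainer']['checkpoint']:

trainer = Trainer(config, pretrained=False)
trainer.load_checkpoint('./checkpoint/transformerocr_checkpoint.pth')  # the path from your config
trainer.train()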

Error: size mismatch for transformer.embed_tgt.weight: copying a param with shape torch.Size([232, 256]) from checkpoint, the shape in current model is torch.Size([233, 256])

Hi Quốc, I am testing your model with the code below and get the following error:

Traceback (most recent call last):
  File "/home/duycuong/PycharmProjects/research_py3/text_recognition/classifier/vietocr/vietocr_gettingstart.py", line 13, in <module>
    detector = Predictor(config)
  File "/home/duycuong/PycharmProjects/research_py3/text_recognition/classifier/vietocr/vietocr/tool/predictor.py", line 19, in __init__
    model.load_state_dict(torch.load(weights, map_location=torch.device(device)))
  File "/home/duycuong/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 839, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for VietOCR:
	size mismatch for transformer.embed_tgt.weight: copying a param with shape torch.Size([232, 256]) from checkpoint, the shape in current model is torch.Size([233, 256]).
	size mismatch for transformer.fc.weight: copying a param with shape torch.Size([232, 256]) from checkpoint, the shape in current model is torch.Size([233, 256]).
	size mismatch for transformer.fc.bias: copying a param with shape torch.Size([232]) from checkpoint, the shape in current model is torch.Size([233]).

Here is the code I ran:

from PIL import Image
from vietocr.tool.predictor import Predictor
from vietocr.tool.config import Cfg

config = Cfg.load_config_from_name('vgg_transformer')

config['device'] = 'cuda:0'
config['predictor']['beamsearch']=False
detector = Predictor(config)
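
This 232-vs-233 mismatch is a vocab-length difference: the output layer has one unit per vocab character plus a handful of special tokens, so a checkpoint trained with a slightly different character set will not load. Upgrading the library, or pinning weights that match the installed version's default vocab, usually resolves it. A quick hedged check, assuming the model is built from the vocab string in the loaded config:

from vietocr.tool.config import Cfg

config = Cfg.load_config_from_name('vgg_transformer')
# Rough check: the output size should line up with the checkpoint's embedding size
# (vocab length plus the special tokens).
print(len(config['vocab']))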

Training with dual GPUs

Quốc, I'm training on two RTX 3060 cards; what should the "device" parameter be set to?

Thanks.
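
As far as the config goes, 'device' takes a single torch device string, so the simplest setup is to train on one of the two cards; multi-GPU training is not something the stock Trainer advertises, so the DataParallel wrapping below is a hypothetical, untested workaround rather than a supported path:

import torch
from vietocr.model.trainer import Trainer
from vietocr.tool.config import Cfg

config = Cfg.load_config_from_name('vgg_transformer')
config['device'] = 'cuda:0'        # pick one card; 'cuda:1' for the other

trainer = Trainer(config, pretrained=True)
# Hypothetical workaround for using both cards:
# trainer.model = torch.nn.DataParallel(trainer.model, device_ids=[0, 1])
trainer.train()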

Training hangs on an RTX 2080

Hello, I ran the sample training code on my machine with an RTX 2080 GPU and CUDA 10.1, but it hangs at the Trainer.train() step.

Selection_009

However, when running the getting-started notebook on Colab, it works fine.

Selection_010

I tried debugging, and the problem seems to be in fetching each batch, at line 100 of trainer.py in the model directory.
Selection_011

I haven't found a way to fix this error. Have you ever run into this issue?

Reproducing the results of vietocr_gettingstart.ipynb

Hi Quốc,

Thanks for vietocr.
I'm currently re-running the training part of vietocr_gettingstart.ipynb with version 0.3.2, but after 20000 iterations the results are low.
I have now run about 180k iterations, and the accuracy is around 0.30 for full sequences and 0.70 for characters.
For comparison, with the dataset in the notebook, what full-sequence accuracy did you reach, and how many iterations did you train for?

Thanks in advance!

ModuleNotFoundError: No module named 'vietocr.loader.DataLoader'

Could you check this error for me? I get it when running version 0.1.9. Looking at the code, it fails on this line: "from vietocr.loader.DataLoader import DataGen"

Traceback (most recent call last):
  File "main.py", line 2, in <module>
    from vietocr.model.trainer import Trainer
  File "/Volumes/DATA/STUDY/AI/Buoi05/DetectHand2/lib/python3.7/site-packages/vietocr/model/trainer.py", line 11, in <module>
    from vietocr.loader.DataLoader import DataGen

Error in the decode step during training

Hello, when I fine-tune the seq2seq model I get the error below. Could you explain what is going on?

image

Error when loading the vgg_seq2seq model

Hello,

When using your code, I tried loading the vgg_seq2seq model and got the error below (I have updated vietocr to the latest version).

Capture

I'm not sure whether I set any of this model's parameters incorrectly, since the vgg_transformer model still works fine for me.

StopIteration: During handling of the above exception, another exception occurred:

Hello, I tried training your code with a tiny dataset (only 4 images) as an experiment. Creating the dataset succeeded, but visualizing it shows nothing, and training throws the error in the screenshot. Do I need to adjust the valid or iter parameters to match the number of images in the dataset? Thank you.

Using a custom vocab at prediction time

Hi, I want to restrict my vocab to just the digits 0-9 when predicting. Where can I change this?

When I set config['vocab'] = '0123456789' in init_detector, I get a size mismatch error.
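
The mismatch happens because the vocab determines the size of the embedding and output layers, so the released weights only fit the full default character set. Two sketched options follow, with hypothetical paths: fine-tune with the digit-only vocab and load those weights, or keep the released weights and filter the prediction afterwards.

from PIL import Image
from vietocr.tool.config import Cfg
from vietocr.tool.predictor import Predictor

# Option 1: digit-only vocab, which requires weights fine-tuned with that vocab.
digit_config = Cfg.load_config_from_name('vgg_transformer')
digit_config['vocab'] = '0123456789'
digit_config['weights'] = './weights/digits_only.pth'    # hypothetical fine-tuned weights
digit_config['device'] = 'cpu'

# Option 2: keep the default vocab/weights and post-filter the output.
config = Cfg.load_config_from_name('vgg_transformer')
config['device'] = 'cpu'
detector = Predictor(config)
text = detector.predict(Image.open('number_crop.png'))   # hypothetical crop
digits = ''.join(ch for ch in text if ch.isdigit())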

Size mismatch error

Hello,
I'm training an OCR model with the vietocr library. I've trained several times before, but today I ran into a strange error. Could you help me figure out what it is?
Thank you.

Computing MD5: /tmp/tranformerorc.pth
MD5 matches: /tmp/tranformerorc.pth
transformer.decoder.embedding.weight missmatching shape, required torch.Size([256, 256]) but found torch.Size([233, 256])
transformer.decoder.fc_out.weight missmatching shape, required torch.Size([256, 1024]) but found torch.Size([233, 1024])
transformer.decoder.fc_out.bias missmatching shape, required torch.Size([256]) but found torch.Size([233])

Transformer input size

Hello everyone, I have a question I hope someone can answer.
From what I've read about applying transformers to text, the encoder inputs need padding so that all sequences have the same length.
But here the input images have different widths, and after the CNN layers the input has shape W x N x C, where W differs between images.
So could you tell me whether padding is needed here to make the inputs the same length? Thank you.

RuntimeError: CUDA error: device-side assert triggered CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Hello Quốc,
I'm retraining the model, but training fails with this error, and after researching it I still don't understand what's wrong. Hoping you can help.
I'm using Colab, torch 1.9, CUDA 10.2, and vietocr 0.3.5.
The error is as follows:

----> 1 trainer = Trainer(config, pretrained=True)

8 frames
/usr/local/lib/python3.7/dist-packages/vietocr/model/trainer.py in __init__(self, config, pretrained, augmentor)
     30 
     31         self.config = config
---> 32         self.model, self.vocab = build_model(config)
     33 
     34         self.device = config['device']

/usr/local/lib/python3.7/dist-packages/vietocr/tool/translate.py in build_model(config)
    128             config['seq_modeling'])
    129 
--> 130     model = model.to(device)
    131 
    132     return model, vocab

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in to(self, *args, **kwargs)
    850             return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
    851 
--> 852         return self._apply(convert)
    853 
    854     def register_backward_hook(

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _apply(self, fn)
    528     def _apply(self, fn):
    529         for module in self.children():
--> 530             module._apply(fn)
    531 
    532         def compute_should_use_set_data(tensor, tensor_applied):

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _apply(self, fn)
    528     def _apply(self, fn):
    529         for module in self.children():
--> 530             module._apply(fn)
    531 
    532         def compute_should_use_set_data(tensor, tensor_applied):

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _apply(self, fn)
    528     def _apply(self, fn):
    529         for module in self.children():
--> 530             module._apply(fn)
    531 
    532         def compute_should_use_set_data(tensor, tensor_applied):

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _apply(self, fn)
    528     def _apply(self, fn):
    529         for module in self.children():
--> 530             module._apply(fn)
    531 
    532         def compute_should_use_set_data(tensor, tensor_applied):

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _apply(self, fn)
    550                 # `with torch.no_grad():`
    551                 with torch.no_grad():
--> 552                     param_applied = fn(param)
    553                 should_use_set_data = compute_should_use_set_data(param, param_applied)
    554                 if should_use_set_data:

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in convert(t)
    848                 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
    849                             non_blocking, memory_format=convert_to_format)
--> 850             return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
    851 
    852         return self._apply(convert)
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Thank you.

predict text with batch images

I'm using Tesseract to detect text and extract the text regions. Those images have different sizes.

How can I use transformerOCR as a batch predictor for those images?
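
A straightforward approach is to run the predictor over each crop in a loop; a minimal sketch with hypothetical crop paths is below. Some releases also expose a batch helper (predict_batch) which, if present in your version, handles the variable widths internally.

from PIL import Image
from vietocr.tool.config import Cfg
from vietocr.tool.predictor import Predictor

config = Cfg.load_config_from_name('vgg_transformer')
config['device'] = 'cuda:0'
detector = Predictor(config)

crops = [Image.open(p) for p in ['box_0.png', 'box_1.png']]   # hypothetical Tesseract crops
texts = [detector.predict(img) for img in crops]

# If the installed version provides it:
# texts = detector.predict_batch(crops)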
