Comments (6)
Hi Zhigang, where are the test images from? Since the pretrained model was trained on LaTeX rendered in a vanilla setting, it likely won't work on anything out of domain. To get a model that can recognize formulas in arbitrary images, we need to add distortions and artifacts to the training data (via data augmentation) or include handwritten data (as Mathpix did); then the trained model can work under a variety of conditions.
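A minimal sketch of what "adding distortions and artifacts via data augmentation" could look like, written in Python for illustration (im2markup itself is Lua/Torch). It assumes each rendered formula is a grayscale image held as a list of pixel rows (255 = white, 0 = ink); the function names, probabilities, and parameter ranges are hypothetical choices, not part of im2markup. A real pipeline would more likely use an image library such as Pillow or OpenCV and tune the distortions to match the screenshots it needs to handle.

```python
import random
from typing import List

# Hypothetical representation (not im2markup's): a grayscale image as a list
# of pixel rows, values in [0, 255], 255 = white background, 0 = black ink.
Image = List[List[int]]


def add_gaussian_noise(img: Image, sigma: float, rng: random.Random) -> Image:
    """Add zero-mean Gaussian noise to every pixel, clipped to [0, 255]."""
    return [[min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
            for row in img]


def salt_and_pepper(img: Image, amount: float, rng: random.Random) -> Image:
    """Flip roughly a fraction `amount` of pixels to pure black or pure white."""
    out = [row[:] for row in img]
    for row in out:
        for x in range(len(row)):
            if rng.random() < amount:
                row[x] = 0 if rng.random() < 0.5 else 255
    return out


def random_shift(img: Image, max_shift: int, rng: random.Random) -> Image:
    """Translate by up to `max_shift` pixels on each axis, filling with white."""
    h, w = len(img), len(img[0])
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)
    out = [[255] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = img[sy][sx]
    return out


def box_blur(img: Image) -> Image:
    """3x3 mean filter: a crude stand-in for screenshot or scan blur."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(round(sum(vals) / len(vals)))
        out.append(row)
    return out


def augment(img: Image, rng: random.Random) -> Image:
    """Apply a random combination of distortions to one clean rendering."""
    if rng.random() < 0.5:
        img = random_shift(img, max_shift=2, rng=rng)
    if rng.random() < 0.5:
        img = box_blur(img)
    img = add_gaussian_noise(img, sigma=rng.uniform(0.0, 20.0), rng=rng)
    if rng.random() < 0.3:
        img = salt_and_pepper(img, amount=0.01, rng=rng)
    return img


if __name__ == "__main__":
    clean = [[255] * 8 for _ in range(6)]
    for x in range(2, 6):
        clean[3][x] = 0  # a short horizontal stroke, e.g. a minus sign
    for seed in range(3):
        noisy = augment(clean, random.Random(seed))
        print(seed, noisy[3])
```

Each clean rendering can be passed through `augment` several times with different seeds to produce multiple noisy variants, all sharing the original LaTeX label.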
Thanks for replying! The test images are screenshots from arbitrary sources such as papers, books, or Google image results. They contain little noise. May I ask whether you have tested the model on sources like these, and how it performed?
Thanks for the reminder about data augmentation; I'll try that approach.
Oh, that's why. I tried screenshots before and the model didn't handle them well. However, I'm fairly sure it would work if you included such variants in the training set, as Mathpix has shown.
I get a high generalization error when predicting on real-world LaTeX formula images. For example, below is the prediction for one formula image:
\begin{array} { c c } { { { { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } & { } &
And this is my training result:
EM 14.03 - BLEU-4 74.61 - perplexity -1.42 - Edit 78.67
Has anyone else been stuck in the same situation?
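The EM and Edit numbers above can be made concrete with a small sketch. It assumes EM is the percentage of predictions whose token sequence matches the reference exactly, and Edit is the mean of 1 - (token edit distance / length of the longer sequence); im2markup's own evaluation scripts may tokenize or normalize differently, so treat this as an illustration only. BLEU-4 and perplexity are omitted, and the function names are hypothetical.

```python
from typing import List, Sequence

Tokens = Sequence[str]


def levenshtein(a: Tokens, b: Tokens) -> int:
    """Token-level edit distance; insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ta in enumerate(a, start=1):
        cur = [i]
        for j, tb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ta
                           cur[j - 1] + 1,              # insert tb
                           prev[j - 1] + (ta != tb)))   # substitute or match
        prev = cur
    return prev[-1]


def exact_match(preds: List[Tokens], golds: List[Tokens]) -> float:
    """Percentage of predictions identical to their reference."""
    return 100.0 * sum(list(p) == list(g) for p, g in zip(preds, golds)) / len(golds)


def edit_score(preds: List[Tokens], golds: List[Tokens]) -> float:
    """Mean of 1 - distance / max(len) over all pairs, as a percentage.

    The normalization by the longer sequence is an assumption for illustration.
    """
    total = 0.0
    for p, g in zip(preds, golds):
        denom = max(len(p), len(g), 1)
        total += 1.0 - levenshtein(p, g) / denom
    return 100.0 * total / len(golds)


if __name__ == "__main__":
    golds = ["x ^ { 2 }".split(), "\\frac { a } { b }".split()]
    preds = ["x ^ { 2 }".split(), "\\frac { a } { c }".split()]
    print(exact_match(preds, golds))            # 50.0
    print(round(edit_score(preds, golds), 2))   # 92.86
```

Because Edit gives partial credit per token, it can stay fairly high even when only a small fraction of formulas is reproduced exactly.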
Hello, have you solved this problem? I have the same problem.
Hi @hengyeliu, this is normal behavior for neural-network-based approaches. The released model is pretrained only on one particular rendering of LaTeX symbols, so it is not robust to noise at all. To make it work on real formulas, you need to add noise during training as well.
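Adding noise "during training" can also be done on the fly, so that every epoch sees a freshly corrupted copy of each rendered formula while the LaTeX target stays the same. The sketch below is a hypothetical Python wrapper (im2markup's training code is Lua/Torch and has no such class); it mimics the `__len__`/`__getitem__` shape of a PyTorch-style dataset without depending on any framework. The noise model (contrast fading plus Gaussian noise), the `p_noise` default, and all names are assumptions.

```python
import random
from typing import Iterator, List, Sequence, Tuple

# Hypothetical representation: grayscale rows, 255 = white, 0 = ink.
Image = List[List[int]]


def noisy_copy(img: Image, rng: random.Random, max_sigma: float = 25.0) -> Image:
    """Return a new image with faded ink and Gaussian noise, clipped to [0, 255]."""
    sigma = rng.uniform(0.0, max_sigma)
    contrast = rng.uniform(0.6, 1.0)  # < 1 pulls ink toward the white background
    out = []
    for row in img:
        new_row = []
        for p in row:
            v = 255 - contrast * (255 - p) + rng.gauss(0.0, sigma)
            new_row.append(min(255, max(0, round(v))))
        out.append(new_row)
    return out


class OnTheFlyNoise:
    """Hypothetical wrapper: re-samples noise each time a sample is drawn,
    so every epoch sees a different corrupted version of the same formula."""

    def __init__(self, samples: Sequence[Tuple[Image, str]],
                 p_noise: float = 0.7, seed: int = 0):
        self.samples = samples
        self.p_noise = p_noise
        self.rng = random.Random(seed)

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, i: int) -> Tuple[Image, str]:
        img, latex = self.samples[i]
        if self.rng.random() < self.p_noise:
            img = noisy_copy(img, self.rng)
        return img, latex  # the LaTeX target is never modified

    def epoch(self) -> Iterator[Tuple[Image, str]]:
        """Yield every sample once, in a freshly shuffled order."""
        order = list(range(len(self)))
        self.rng.shuffle(order)
        for i in order:
            yield self[i]


if __name__ == "__main__":
    clean = [[255] * 5 for _ in range(4)]
    clean[2][1] = 0
    ds = OnTheFlyNoise([(clean, "- 1")], p_noise=1.0, seed=0)
    for _ in range(2):
        for img, label in ds.epoch():
            print(label, img[2])
```

Keeping `p_noise` below 1 is intended to let the model still see some clean renderings, so it is not trained exclusively on corrupted inputs.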
Thanks for your reply. I will try your suggestion.
Related Issues
- Not working for the type of images below (other than the ones you provided). I think images need to be in a particular format
- Can anyone share a trained model file that generalizes to any type of image, like Mathpix?
- [Please Respond] Can you help me train the model to recognize images outside the given dataset?
- How to remove KaTeX parser error
- Target vocab size
- There is a bug in preprocess_latex.js
- Error importing cudnn
- [regarding real dataset] Please respond
- I am getting None with intermediate weights
- UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe7 in position 2270: invalid continuation byte
- How to make the code show the predicted mathematical expression in LaTeX format
- Can you explain the value 'Accuracy'?
- Why downsample by 2 in preprocess?
- Why use Lua instead of Python?
- Can you explain src\model\cnn.lua?
- Getting low accuracy using customized images for test
- 'perl' and 'cat' are not recognized
- Can you provide a vocab dictionary?
- The Python version of the dataset resource is not working