Comments (3)
Sure, go ahead and give it a try (and maybe in a way that also covers things like "May 20th, 2023"). The risk is that this breaks other productions. Maybe you can just extend ruleDDMMYYYY to accept a named month; ruleDDMM already does that.
But actually, looking at your example, it is mainly a scoring problem. Here I ordered the resolutions by score, and there isn't much missing:
2023-08-17 20:20 (X/X) s=-357.082 p=(108, 103, 130, 'ruleHHMMmilitary', 'ruleDOM1', 'ruleNamedMonth', 'ruleDOMMonth', 'ruleLatentDOY', 'ruleDateTOD')
2023-08-17 20:20 (X/X) s=-367.858 p=(108, 103, 130, 'ruleNamedMonth', 'ruleHHMMmilitary', 'ruleDOM1', 'ruleDOMMonth', 'ruleLatentDOY', 'ruleDateTOD')
2020-08-17 X:X (X/X) s=-372.308 p=(108, 103, 111, 'ruleYear', 'ruleDOM1', 'ruleNamedMonth', 'ruleDOMMonth', 'ruleDOYYear')
2020-08-17 X:X (X/X) s=-379.009 p=(108, 103, 111, 'ruleNamedMonth', 'ruleDOM1', 'ruleDOMMonth', 'ruleYear', 'ruleDOYYear')
2020-08-17 X:X (X/X) s=-379.974 p=(126, 111, 'ruleDDMM', 'ruleYear', 'ruleDOYYear')
2020-08-X X:X (X/X) s=-695.337 p=(108, 103, 111, 'ruleNamedMonth', 'ruleDOM1', 'ruleYear', 'ruleMonthYear')
2023-08-17 X:X (X/X) s=-773.528 p=(108, 103, 130, 'ruleDOM1', 'ruleNamedMonth', 'ruleDOMMonth', 'ruleLatentDOY')
2023-08-17 X:X (X/X) s=-781.179 p=(108, 103, 130, 'ruleNamedMonth', 'ruleDOM1', 'ruleDOMMonth', 'ruleLatentDOY')
X-08-17 X:X (X/X) s=-790.539 p=(126, 111, 'ruleYear', 'ruleDDMM')
X-08-X X:X (X/X) s=-1291.254 p=(108, 103, 130, 'ruleHHMMmilitary', 'ruleDOM1', 'ruleNamedMonth', 'ruleLatentDOM')
X-08-X X:X (X/X) s=-1296.821 p=(108, 103, 130, 'ruleNamedMonth', 'ruleHHMMmilitary', 'ruleDOM1', 'ruleLatentDOM')
2020-X-X X:X (X/X) s=-1694.826 p=(108, 103, 111, 'ruleYear', 'ruleDOM1', 'ruleNamedMonth', 'ruleDOMMonth', 'ruleLatentDOY')
2023-06-01 20:20 (X/X) s=-1696.719 p=(108, 103, 130, 'ruleHHMMmilitary', 'ruleDOM1', 'ruleNamedMonth', 'ruleLatentDOM')
2020-X-X X:X (X/X) s=-1701.773 p=(126, 111, 'ruleDDMM', 'ruleLatentDOY', 'ruleYear')
2023-06-01 20:20 (X/X) s=-1702.286 p=(108, 103, 130, 'ruleNamedMonth', 'ruleHHMMmilitary', 'ruleDOM1', 'ruleLatentDOM')
2023-06-17 X:X (X/X) s=-1984.401 p=(108, 103, 130, 'ruleHHMMmilitary', 'ruleDOM1', 'ruleNamedMonth', 'ruleLatentDOM')
2023-06-17 X:X (X/X) s=-1989.968 p=(108, 103, 130, 'ruleNamedMonth', 'ruleHHMMmilitary', 'ruleDOM1', 'ruleLatentDOM')
2020-X-X X:X (X/X) s=-1701.432 p=(108, 103, 111, 'ruleNamedMonth', 'ruleDOM1', 'ruleDOMMonth', 'ruleLatentDOY', 'ruleYear')
X-X-17 X:X (X/X) s=-1994.620 p=(108, 103, 111, 'ruleNamedMonth', 'ruleDOM1', 'ruleYear', 'ruleMonthYear')
So maybe adding more examples of this kind to the data set used to train the scorer is the easier way to solve this problem.
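To illustrate why more examples could shift the ranking, here is a toy multinomial naive Bayes scorer over rule names. It assumes, as a simplification, that the scorer is a count-based model over the productions of a parse; `train`, `score` and the tiny corpus are made up for illustration and are not ctparse's code or training data. The two candidates are the 2023-08-17 20:20 and 2020-08-17 resolutions from the debug output above, with the numeric ids dropped.

```python
import math
from collections import Counter


def train(examples):
    """Count rule names per class; examples are (productions, is_correct) pairs."""
    counts = {True: Counter(), False: Counter()}
    docs = Counter()
    for productions, is_correct in examples:
        counts[is_correct].update(productions)
        docs[is_correct] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, docs, vocab


def score(model, productions):
    """Log-odds that a production sequence is correct (multinomial naive Bayes, add-one smoothing)."""
    counts, docs, vocab = model
    v = len(vocab)
    totals = {label: sum(counts[label].values()) for label in (True, False)}
    s = math.log((docs[True] + 1) / (docs[False] + 1))  # smoothed class-prior ratio
    for p in productions:
        s += math.log((counts[True][p] + 1) / (totals[True] + v))
        s -= math.log((counts[False][p] + 1) / (totals[False] + v))
    return s


# Rule names of two candidates from the debug output (numeric ids dropped).
WRONG = ["ruleHHMMmilitary", "ruleDOM1", "ruleNamedMonth", "ruleDOMMonth",
         "ruleLatentDOY", "ruleDateTOD"]  # 2023-08-17 20:20
RIGHT = ["ruleYear", "ruleDOM1", "ruleNamedMonth", "ruleDOMMonth", "ruleDOYYear"]  # 2020-08-17

# Made-up corpus in which the time-of-day reading was usually the right one ...
base = [(WRONG, True), (WRONG, True), (RIGHT, False)]
# ... and the same corpus with a few examples "of this kind" added.
more = base + [(RIGHT, True)] * 2 + [(WRONG, False)] * 2

before, after = train(base), train(more)
print(score(before, WRONG) > score(before, RIGHT))  # True
print(score(after, RIGHT) > score(after, WRONG))    # True
```

With add-one smoothing, the rules that only occur in the wrong reading (ruleHHMMmilitary, ruleLatentDOY, ruleDateTOD) lose weight once labelled negative examples are added, while ruleYear and ruleDOYYear gain it, which is enough to flip the ranking in this toy setting.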
from ctparse.
Hi, the problem you are facing is that ctparse has a built-in (strong) bias towards parsing current or future dates, relative to the reference time. Hence, parsing this date in 2023 will strongly favour anything in 2023. Off the top of my head I see no easy way to solve that (it was a design decision matching the application this was built for). Of course, if you can adjust your reference time to something in the past, it will work. However, this comes at the price of other features (e.g. "tomorrow") no longer working as you would expect.
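A deliberately crude sketch of this trade-off, assuming a hard-coded "earliest candidate at or after the reference time" rule as a stand-in for ctparse's built-in preference (ctparse does not literally work this way). The function names and reference times are hypothetical; the two candidate datetimes are taken from the debug output above.

```python
from datetime import date, datetime, timedelta
from typing import List


def pick_with_future_bias(candidates: List[datetime], ref: datetime) -> datetime:
    """Toy bias: earliest candidate at or after ref; if none, fall back to the latest one."""
    upcoming = [c for c in candidates if c >= ref]
    return min(upcoming) if upcoming else max(candidates)


def tomorrow(ref: datetime) -> date:
    """'tomorrow' can only be resolved relative to the reference time."""
    return (ref + timedelta(days=1)).date()


# Two candidate resolutions from the debug output.
readings = [datetime(2023, 8, 17, 20, 20), datetime(2020, 8, 17)]

now = datetime(2023, 6, 1, 12, 0)   # hypothetical reference time in 2023
past = datetime(2020, 1, 1, 12, 0)  # reference time moved into the past

print(pick_with_future_bias(readings, now))   # 2023-08-17 20:20:00
print(pick_with_future_bias(readings, past))  # 2020-08-17 00:00:00
print(tomorrow(past))                         # 2020-01-02, not the real "tomorrow"
```

Moving the reference time into the past makes the 2020 reading win, but every relative expression is then anchored to that past date as well.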
I see, thanks for clarifying! Would it make sense to have a rule like ruleDDNamedMonthYYYY, since that is a common way of writing dates? There already is ruleDDMMYYYY.
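A rule like ruleDDNamedMonthYYYY mainly needs to recognise a day, a month and a year, ideally also in the month-first order of "May 20th, 2023". The sketch below illustrates only those surface patterns with Python's `re` module; it does not use ctparse's rule API, and `DAY_FIRST`, `MONTH_FIRST`, `month_number` and `parse_day_month_year` are hypothetical names. It handles English month names only, so German forms such as "März" would need their own table.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical helper, not part of ctparse; English month names only.
MONTH_NAMES = ["january", "february", "march", "april", "may", "june", "july",
               "august", "september", "october", "november", "december"]

DAY = r"\b(?P<day>\d{1,2})(?:st|nd|rd|th)?"
YEAR = r"(?P<year>\d{4})\b"
# Day first, month numeric or named: "17.08.2020", "17 August 2020", "17th Aug 2020".
DAY_FIRST = re.compile(DAY + r"[.\s/-]+(?P<month>\d{1,2}|[A-Za-z]{3,9})[.,\s/-]+" + YEAR)
# Named month first: "May 20th, 2023".
MONTH_FIRST = re.compile(r"\b(?P<month>[A-Za-z]{3,9})\s+" + DAY + r",?\s+" + YEAR)


def month_number(token: str) -> Optional[int]:
    """Map "08", "Aug", "Sept" or "August" to a month number, or None if unknown."""
    if token.isdigit():
        return int(token)
    token = token.lower()
    for number, name in enumerate(MONTH_NAMES, start=1):
        if len(token) >= 3 and name.startswith(token):
            return number
    return None


def parse_day_month_year(text: str) -> Optional[date]:
    """Return the first day/month/year date in text, trying day-first, then month-first."""
    for pattern in (DAY_FIRST, MONTH_FIRST):
        match = pattern.search(text)
        if match is None:
            continue
        month = month_number(match.group("month"))
        if month is None:
            continue
        try:
            return date(int(match.group("year")), month, int(match.group("day")))
        except ValueError:  # impossible dates such as 31 February
            continue
    return None


print(parse_day_month_year("17 August 2020"))  # 2020-08-17
print(parse_day_month_year("May 20th, 2023"))  # 2023-05-20
```

A standalone regex cannot show how such a pattern interacts with the rest of the grammar, which is where breaking other productions becomes the real concern.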
Related Issues
- The duration of "1 hour and 42 minutes" returns just 42 minutes, debug returns the values correctly
- strings ctparse can't parse
- parsing time intervals incorrectly
- obsolete documentation
- [Question] adding new attribute to Time object
- Parse simple US dates
- `latent_time=False` works only half-way
- Time interval parsing issues
- Get TimeDelta from Duration
- What's the difference between time/corpus.py and time/auto_corpus.py?
- Parse string of the type "7.6 evening/night"
- Time expression of the kind am 21.10.2015 früh 06:55 parse incorrectly
- Time expression hangs ctparse when timeout=0 and max_stack_depth=0
- from Tuesday to Friday
- Month and Year not joined
- Remove dependencies on sklearn
- Time expression "Montag 9. März bis Mittwoch 11. März" doesn't parse correctly
- Time expression "02 Mär 2020 - 03 Mär 2020" fails to parse correctly
- Intervals of the kind "ab dem xx.xx.xx-xx.xx.xx" parsed incorrectly