karlicoss / orgparse
Python module for reading Emacs org-mode files
Home Page: https://orgparse.readthedocs.org
License: BSD 2-Clause "Simplified" License
I'm trying to retrieve the repeat part of a scheduled date, but it is not clear whether it is saved during parsing. It seems the date regex takes care of that (the cookie part, from what I've understood), but then I get lost. Here is a simple example of what I would like to do:
import orgparse
node = orgparse.loads('''
* Pay the rent
SCHEDULED: <2020-01-01 Wed +1m>
''').children[0]
time = node.scheduled
# repeat = node.???
Sorry for asking in an issue, but I couldn't find another "channel" for that.
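If the repeater is not kept on the parsed date object, one workaround is to pull the repeater cookie out of the raw timestamp text yourself. The sketch below is a hypothetical stdlib-only helper (`RE_REPEATER` and `parse_repeater` are not part of orgparse); it assumes the usual Org repeater marks `+`, `++` and `.+`, followed by a count and one of the units h/d/w/m/y.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: an active timestamp containing a repeater cookie,
# e.g. "<2020-01-01 Wed +1m>" or "<2020-01-01 Wed 10:00 ++1w -2d>".
RE_REPEATER = re.compile(r"<\d{4}-\d{2}-\d{2}[^>]*?\s(\.\+|\+\+|\+)(\d+)([hdwmy])[^>]*>")


def parse_repeater(line: str) -> Optional[Tuple[str, int, str]]:
    """Return (mark, count, unit) for the first repeater found, or None."""
    m = RE_REPEATER.search(line)
    if m is None:
        return None
    mark, count, unit = m.groups()
    return mark, int(count), unit


if __name__ == "__main__":
    print(parse_repeater("SCHEDULED: <2020-01-01 Wed +1m>"))  # ('+', 1, 'm')
```

Applied to the raw `SCHEDULED:` line of the example above, this yields the mark, the interval and the unit separately, which is usually what is needed to compute the next occurrence.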
I am new to orgparse and org syntax. I would like to know why the TITLE entry in _special_comments is of type list. In my case there is only one entry, not more. In what cases would there be more than one element in that list? Maybe you can give an example org file to illustrate that.
>>> orgobj._special_comments['TITLE']
['My Title']
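One plausible reason for a list is that keyword lines may repeat (Org, for instance, accepts several `#+TITLE:` lines for a long title). The hypothetical helper below is not orgparse's implementation; it only illustrates why a dict of lists is the natural shape when the same keyword can occur more than once.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Hypothetical pattern for "#+KEY: value" lines.
RE_KEYWORD = re.compile(r"^#\+(\w+):\s*(.*)$")


def collect_keywords(text: str) -> Dict[str, List[str]]:
    """Collect every "#+KEY: value" line, keeping all values for repeated keys."""
    keywords: Dict[str, List[str]] = defaultdict(list)
    for line in text.splitlines():
        m = RE_KEYWORD.match(line.strip())
        if m:
            # Keys are upper-cased because Org keywords are case-insensitive.
            keywords[m.group(1).upper()].append(m.group(2))
    return dict(keywords)


if __name__ == "__main__":
    doc = "#+TITLE: My Title\n#+TITLE: continued\n* Heading"
    print(collect_keywords(doc))  # {'TITLE': ['My Title', 'continued']}
```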
Your current folder structure looks like this:
orgparse
├── doc
│   ├── ...
│   └── source
│       └── ...
├── ...
├── orgparse
│   ├── date.py
│   ├── extra.py
│   ├── __init__.py
│   ├── inline.py
│   ├── node.py
│   ├── py.typed
│   ├── tests
│   │   ├── data
│   │   │   └── ...
│   │   ├── __init__.py
│   │   ├── test_data.py
│   │   ├── test_date.py
│   │   ├── test_hugedata.py
│   │   ├── test_misc.py
│   │   └── test_rich.py
│   └── utils
│       ├── __init__.py
│       ├── _py3compat.py
│       └── py3compat.py
├── pytest.ini
├── README.rst
├── setup.py
└── tox.ini
The problem in short
Your current folder structure looks like this:
orgparse
├── orgparse
│   ├── __init__.py
│   ├── ...
│   ├── tests
│   │   └── *.py
Because of that, your package (via PyPI or distros) always ships with your unit tests. There is no need for this: the package is bloated, resources (data transfer) are wasted and CO2 is produced.
Besides the resources, this layout is unusual today. There are also some other packaging- and unittest-related problems.
The recommended structure looks like this:
orgparse
├── orgparse
│   ├── __init__.py
│   ├── ...
├── tests
│   └── *.py
Today there are several variants of project folder layouts (e.g. the so-called src-layout), but all of them have in common that the package folder (in your case orgparse/orgparse) is separated from the tests folder (orgparse/tests).
Because you are using tox, I am not able to provide you with a PR. I am sure tox can handle this, but I assume tox needs to be reconfigured after the folder structure is modified.
There are also some problems with your unittest invocation which I report in a separate Issue (#56).
Hi,
I'm using orgparse to get a list of recent clock entries. However, I don't see how to access the properties and comments inside the clock entries.
Here's an example:
* AHU-Tickets
** TODO [#C] a sample ticket with priority, in my AHU project :AHU_39:
:PROPERTIES:
:assignee: Matthew Carter
:filename: this-years-work
:reporter: Matthew Carter
:type: Story
:priority: Medium
:status: To Do
:created: 2019-01-24T23:24:54.321-0500
:updated: 2021-07-19T18:40:30.722-0400
:ID: AHU-39
:CUSTOM_ID: AHU-39
:type-id: 10100
:END:
:LOGBOOK:
CLOCK: [2022-02-24 Thu 20:30]--[2022-02-24 Thu 20:35] => 0:05
:id: 10359
Sample time clock entry
:END:
*** description: [[https://example.atlassian.net/browse/AHU-39][AHU-39]]
The summary is here
*** Comment: Matthew Carter
:PROPERTIES:
:ID: 10680
:created: 2019-01-24T23:25:19.455-0500
:updated: 2019-01-24T23:27:36.125-0500
:END:
From: org-jira.
We can see the clock entry:
CLOCK: [2022-02-24 Thu 20:30]--[2022-02-24 Thu 20:35] => 0:05
And I would like to access the properties:
:id: 10359
and the comments:
Sample time clock entry
Is that possible?
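If orgparse has no accessor for these lines, one workaround is to post-process the LOGBOOK lines yourself. The sketch below is hypothetical (`ClockEntry`, `parse_logbook` and both regexes are made up for illustration); it assumes that every `:key: value` or free-text line following a `CLOCK:` line belongs to that clock entry, as in the org-jira sample above.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

RE_CLOCK = re.compile(r"^\s*CLOCK:\s*(\[[^\]]+\](?:--\[[^\]]+\])?)")
RE_PROP = re.compile(r"^\s*:([^:\s]+):\s*(.*)$")


@dataclass
class ClockEntry:  # hypothetical container, not an orgparse class
    timestamp: str
    properties: Dict[str, str] = field(default_factory=dict)
    notes: List[str] = field(default_factory=list)


def parse_logbook(lines: List[str]) -> List[ClockEntry]:
    """Attach the lines that follow each CLOCK: line to that clock entry."""
    entries: List[ClockEntry] = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped in (":LOGBOOK:", ":END:"):
            continue  # drawer delimiters and blank lines carry no data
        m = RE_CLOCK.match(line)
        if m:
            entries.append(ClockEntry(m.group(1)))
        elif entries:
            prop = RE_PROP.match(line)
            if prop:
                entries[-1].properties[prop.group(1)] = prop.group(2)
            else:
                entries[-1].notes.append(stripped)
    return entries
```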
Currently, orgparse is unable to parse multi-line properties, i.e. properties where additional values for the same key are given on new lines, as in:
:PROPERTIES:
:item: value 1
:item+: same item value 2
:another_item: another value
orgparse treats them as different items because it only handles single-line properties.
Is it possible to add such a feature?
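In Org itself, a `:key+:` line appends its value to the existing `:key:` value, separated by a space. A minimal sketch of that merge rule as a hypothetical standalone function (not a patch to orgparse's `parse_property`):

```python
import re
from typing import Dict, List

# ":key:" or ":key+:"; the lazy key group lets the optional "+" be captured separately.
RE_PROP = re.compile(r"^\s*:([^:\s]+?)(\+)?:\s*(.*)$")


def parse_properties(lines: List[str]) -> Dict[str, str]:
    """Parse drawer lines, joining ":key+:" values onto ":key:" with a space."""
    props: Dict[str, str] = {}
    for line in lines:
        m = RE_PROP.match(line)
        if not m or m.group(1) in ("PROPERTIES", "END"):
            continue  # skip drawer delimiters and non-property lines
        key, plus, value = m.groups()
        if plus and key in props:
            props[key] = f"{props[key]} {value}"
        else:
            props[key] = value
    return props
```

With the drawer from the example above, `item` becomes `"value 1 same item value 2"` and `another_item` stays a separate key.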
Line 135 in 36b31d8
I am preparing a little bugfix but stumbled across this line, where you use codecs.open() instead of the built-in open() or pathlib.Path.open().
From your current point of view:
Is there a good and strict reason for this?
Are there any reasons against using the usual built-in open() or pathlib.Path.open()?
If there is no need for codecs, I will show you in a PR what I have in mind. ;)
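One concrete behavioural difference worth checking before swapping: `codecs.open()` opens the underlying file in binary mode, so it performs no newline translation, while the built-in `open()` in text mode does. The two helpers below are hypothetical and only illustrate that difference; they are not orgparse code.

```python
import codecs
import os
import tempfile


def read_with_codecs(path: str) -> str:
    # codecs.open() opens the underlying file in binary mode,
    # so "\r\n" line endings are returned untranslated.
    with codecs.open(path, encoding="utf-8") as f:
        return f.read()


def read_with_builtin(path: str) -> str:
    # Built-in open() in text mode uses universal newlines: "\r\n" becomes "\n".
    with open(path, encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".org")
    with os.fdopen(fd, "wb") as f:
        f.write(b"* Heading\r\nbody\r\n")
    print(repr(read_with_codecs(path)))   # '* Heading\r\nbody\r\n'
    print(repr(read_with_builtin(path)))  # '* Heading\nbody\n'
    os.remove(path)
```

`pathlib.Path.open(encoding="utf-8")` behaves like the built-in `open()` here.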
Line 40 in 702faa6
Hi,
There may be a mistake in this regexp. An Org heading requires one or more stars at the beginning of the line, followed by one or more spaces. But this regexp appears to make the spaces after the stars optional. This is important, because a star at the beginning of a line, without spaces afterward, may be the beginning of bold text.
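A minimal sketch of the stricter rule described above, assuming the requirement is "one or more stars at the start of the line, then at least one space". This is an illustrative pattern, not the one in orgparse's source.

```python
import re
from typing import Optional

# Stars at column 0, then at least one space; the title may be empty.
RE_HEADING = re.compile(r"^(\*+) +(.*?)\s*$")


def heading_level(line: str) -> Optional[int]:
    """Return the heading level, or None if the line is not a heading."""
    m = RE_HEADING.match(line)
    return len(m.group(1)) if m else None


if __name__ == "__main__":
    print(heading_level("** Heading"))        # 2
    print(heading_level("*bold* at start"))   # None
```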
Thanks for your recent work on orgparse, it looks promising.
It would be great if orgparse supported lists like this:
- one
- two
- three
I was able to port my code over from org-export-json to orgparse by parsing out the items from node.body myself, but I probably did a bad job of it, and that's exactly the sort of core org syntax thing that I'd love to be able to rely on orgparse for.
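Until lists are parsed natively, the kind of post-processing of `node.body` described above can be sketched as follows. `list_items` is hypothetical: it only recognises single-line items with `-`, `+`, `1.` or `1)` bullets (`*` is left out because at column 0 it starts a heading), reports each item's indentation so nesting can be rebuilt, and ignores continuation lines, checkboxes and description lists.

```python
import re
from typing import List, Tuple

# Optional indentation, a bullet, at least one space, then the item text.
RE_ITEM = re.compile(r"^(\s*)(?:[-+]|\d+[.)])\s+(.*)$")


def list_items(body: str) -> List[Tuple[int, str]]:
    """Return (indentation, text) for every plain-list item line in body."""
    items = []
    for line in body.splitlines():
        m = RE_ITEM.match(line)
        if m:
            items.append((len(m.group(1)), m.group(2)))
    return items


if __name__ == "__main__":
    print(list_items("- one\n- two\n- three"))  # [(0, 'one'), (0, 'two'), (0, 'three')]
```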
My CLOCK entries are generated like this:
CLOCK: [2020-09-09 Wed 09:04 CEST]--[2020-09-09 Wed 09:04 CEST] => 0:00
Notice the timezone information. I need to be able to parse this timezone information to have non-naive datetimes.
Indeed, I travel while I work, and I use these clock entries, generated in local time, all over the world.
They then need to be converted to my customers' time zones.
In any case, handling only non-naive datetimes seems quite desirable.
Note also that I don't think I've done anything special to change the time format of emacs.
First, orgparse doesn't recognize this because of the timezone information, which it does not expect.
I'm able to fix that, but then, what should be done with the timezone information?
Python's datetime is a real mess here and cannot solve this on its own:
datetime.datetime.strptime
Otherwise you must create a tzinfo object yourself...
So, the only way to do this correctly is:
OrgDateClock
so that we can do some proper parsing of this info (but then, how do you check the duration? I also didn't check whether other code expects OrgDateClock to hold actual datetime objects). How do you view this problem, and are you interested in supporting full non-naive time parsing?
I'm able to send you a PR, but won't do it without your approval, as it may imply some decisions about adding a dependency to orgparse.
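A stdlib-only sketch of the idea, assuming a small hypothetical table that maps the abbreviations you actually use to fixed UTC offsets (abbreviations are not unique worldwide, which is why such a mapping has to be supplied rather than guessed). `parse_aware` and `TZ_OFFSETS` are illustrations, not orgparse API. Timezone-aware datetimes subtract normally, so clock durations remain computable.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Dict

# Hypothetical, user-supplied table of abbreviation -> fixed UTC offset.
TZ_OFFSETS: Dict[str, timedelta] = {
    "UTC": timedelta(0),
    "CET": timedelta(hours=1),
    "CEST": timedelta(hours=2),
}

# "[2020-09-09 Wed 09:04 CEST]": date, day name, time, optional abbreviation.
RE_STAMP = re.compile(
    r"\[(\d{4})-(\d{2})-(\d{2}) \w+ (\d{1,2}):(\d{2})(?: ([A-Z]{2,5}))?\]"
)


def parse_aware(stamp: str, default: timezone = timezone.utc) -> datetime:
    """Parse an inactive Org timestamp into a timezone-aware datetime."""
    m = RE_STAMP.fullmatch(stamp)
    if m is None:
        raise ValueError(f"unrecognised timestamp: {stamp!r}")
    year, month, day, hour, minute = (int(g) for g in m.groups()[:5])
    abbrev = m.group(6)
    if abbrev is None:
        tz = default  # no zone written: fall back to an explicit default
    elif abbrev in TZ_OFFSETS:
        tz = timezone(TZ_OFFSETS[abbrev], abbrev)
    else:
        raise ValueError(f"unknown timezone abbreviation: {abbrev!r}")
    return datetime(year, month, day, hour, minute, tzinfo=tz)


if __name__ == "__main__":
    start = parse_aware("[2020-09-09 Wed 09:04 CEST]")
    end = parse_aware("[2020-09-09 Wed 10:04 UTC]")
    print(end - start)  # 3:00:00
```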
Is it possible to extract the line number from a node? I need this functionality because I want to signal to the user that the document is invalid.
This issue was also mentioned in #38, but it hasn't been resolved in the latest version (36b31d8).
Test case (repeated task), modified from doctest:
>>> from orgparse import loads
>>> node = loads('''
... * TODO Pay the rent
... DEADLINE: <2005-10-01 Sat +1m>
... :LOGBOOK:
... - State "DONE" from "TODO" [2005-09-01 Thu 16:10]
... - State "DONE" from "TODO" [2005-08-01 Mon 19:44]
... - State "DONE" from "TODO" [2005-07-01 Fri 17:27]
... :END:
... ''').children[0]
>>> print(node.body)
:LOGBOOK:
:END:
Test case (clock):
>>> from orgparse import loads
>>> node = loads('''\
... * TODO Clock
... :LOGBOOK:
... CLOCK: [2022-01-01 Sat 00:00]--[2022-01-01 Sat 01:11] => 1:11
... :END:
... ''').children[0]
>>> print(node.body)
:LOGBOOK:
:END:
Hi,
I was wondering how to correctly parse an org file that contains custom TODO keys (e.g. ['TODO', 'WAITING', 'DONE', 'CANCELLED']
). It seems to be supported, as indicated by the add_todo_keys()
method, but looking into the file-loading methods, those keys have to be set before actually loading the content, and the API does not support that.
Am I missing something?
Thanks!
In the file node.py, the constant RE_HEADING_TAGS should probably be something like '(.*?)\s*:([\w@:]+):\s*$'
to allow for characters like "ä", "č" and so on in tags, which org-mode accepts without problems.
Otherwise very helpful little library. Thank you for sharing it!
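A quick way to check the proposed pattern: in Python 3, `\w` in a `str` pattern is Unicode-aware by default, so non-ASCII letters match. The `split_tags` helper is hypothetical and only exercises the regex.

```python
import re
from typing import Set, Tuple

# The pattern proposed above.
RE_HEADING_TAGS = re.compile(r"(.*?)\s*:([\w@:]+):\s*$")


def split_tags(heading: str) -> Tuple[str, Set[str]]:
    """Return (title, tags) for a heading text with trailing :tag1:tag2: tags."""
    m = RE_HEADING_TAGS.search(heading)
    if m is None:
        return heading, set()
    return m.group(1), set(m.group(2).split(":"))


if __name__ == "__main__":
    print(split_tags("Nákup :čaj:"))  # ('Nákup', {'čaj'})
```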
import orgparse
from pathlib import Path
fp = Path('mypath')
env = orgparse.OrgEnv(filename=fp)
root = orgparse.load(fp, env)
=>
File "/usr/lib/python3.10/site-packages/orgparse/__init__.py", line 142, in load
return loadi(lines, filename=filename, env=env)
File "/usr/lib/python3.10/site-packages/orgparse/__init__.py", line 162, in loadi
return parse_lines(lines, filename=filename, env=env)
File "/usr/lib/python3.10/site-packages/orgparse/node.py", line 1447, in parse_lines
raise ValueError('If env is specified, filename must match')
Converting the path to str when creating the env works around this:
import orgparse
from pathlib import Path
fp = Path('mypath')
env = orgparse.OrgEnv(filename=str(fp))
root = orgparse.load(fp, env)
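The traceback suggests the check compares the filename stored in the env with the one passed to `load`; a `Path` and a `str` never compare equal, even when they name the same file. A minimal sketch of normalising both sides with `os.fspath()` before comparing (`same_file_name` is hypothetical, not orgparse code):

```python
import os
from pathlib import Path
from typing import Union

StrPath = Union[str, os.PathLike]


def same_file_name(a: StrPath, b: StrPath) -> bool:
    """Compare two file names given as str or os.PathLike (e.g. pathlib.Path)."""
    # os.fspath() turns a Path into its str form and leaves a str unchanged.
    return os.fspath(a) == os.fspath(b)


if __name__ == "__main__":
    print(Path("mypath") == "mypath")                # False
    print(same_file_name(Path("mypath"), "mypath"))  # True
```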
Is orgparse.date public API? I'd like to use it in my code, but I'm not sure if it's intended as public API, and I wouldn't want to depend on it as library code if it's not guaranteed to stay around as a stable API (by "stable" I don't mean "not growing", of course: just not mutating existing interfaces, except for carefully considered things like bugs, undefined behaviour...).
If it isn't public & stable, I'll just copy it and it will still be useful :-) -- just not as much as maintained library code of course.
Is there any plan to support writing to org files, i.e. creating new org nodes?
In an Org file like this, org-element gives the table the name tabname. It would be helpful to have this information as an attribute (or similar) in orgparse.extra.Table.
#+caption: Some caption for a table
#+name: tabname
| x | y |
|---|---|
| 1 | 2 |
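In Org, `#+name:` and `#+caption:` are affiliated keywords that attach to the element directly below them. A hypothetical stdlib sketch of capturing them for tables (`find_tables` is not part of orgparse.extra; it treats any run of lines starting with `|` as one table and ignores other affiliated keywords):

```python
import re
from typing import Dict, List, Optional

# Affiliated keywords attached to the element right below them (case-insensitive).
RE_AFFILIATED = re.compile(r"^\s*#\+(name|caption):\s*(.*)$", re.IGNORECASE)


def find_tables(lines: List[str]) -> List[dict]:
    """Group consecutive "|" lines into tables, attaching #+name/#+caption above them."""
    tables: List[dict] = []
    pending: Dict[str, str] = {}
    current: Optional[dict] = None
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("|"):
            if current is None:  # first row of a new table
                current = {"name": pending.get("name"),
                           "caption": pending.get("caption"),
                           "rows": []}
                tables.append(current)
                pending = {}
            current["rows"].append(stripped)
            continue
        current = None  # any non-table line ends the current table
        m = RE_AFFILIATED.match(line)
        if m:
            pending[m.group(1).lower()] = m.group(2)
        else:
            pending = {}  # keywords only apply to the element directly below
    return tables
```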
I use
from orgparse import loads
root = loads("""
* H1
** H2
*** H3
* H4
** H5
""")
for node in root[1]:
    print(str(node))
and get
* H1
** H2
*** H3
** H5
instead of
* H1
** H2
*** H3
H5 is not a child of H1, but it shows up when iterating over root[1].
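The expected output follows the usual rule for Org outlines: a node's subtree is the node plus all following headings with a greater level, up to the first heading whose level is less than or equal to its own. A sketch of that rule over a hypothetical flat list of (level, heading) pairs, independent of orgparse's internals:

```python
from typing import List, Tuple


def subtree(nodes: List[Tuple[int, str]], start: int) -> List[Tuple[int, str]]:
    """Return nodes[start] plus every following node that is deeper than it."""
    level = nodes[start][0]
    result = [nodes[start]]
    for node in nodes[start + 1:]:
        if node[0] <= level:
            break  # a sibling (or a shallower heading) ends the subtree
        result.append(node)
    return result


if __name__ == "__main__":
    flat = [(1, "H1"), (2, "H2"), (3, "H3"), (1, "H4"), (2, "H5")]
    print([h for _, h in subtree(flat, 0)])  # ['H1', 'H2', 'H3']
```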
It seems like orgparse puts the logbook drawer into the body.
If I set both a property drawer and a logbook drawer, the logbook drawer ends up in the body text.
Test this sample:
** DONE subheader 1 :template:blog:
CLOSED: [2021-05-25 Tue 00:51]
:PROPERTIES:
:CREATED: [2021-05-25 Tue 00:33]
:TEMPLATE: test
:END:
:LOGBOOK:
*** DONE subsubheader :article:
CLOSED: [2021-05-25 Tue 00:51]
:PROPERTIES:
:CREATED: [2021-05-25 Tue 00:33]
:TEMPLATE: test
:END:
:LOGBOOK:
** subheader 2 :blog:
:PROPERTIES:
:CREATED: [2021-05-25 Tue 00:38]
:END:
Fish are an important resource for humans worldwide, especially as food.
Then run node.body.
It breaks on the double :END: and gets the logbook into the body.
The following code produces error in orgparse 0.2.3:
from orgparse import loads
root = loads('''
* Node
:PROPERTIES:
:Effort: 1:23:45
:END:
''')
root.children[0].properties['Effort']
Error message:
./venv/lib/python3.6/site-packages/orgparse/node.py in parse_property(line)
136 match = RE_PROP.search(line)
137 if match:
--> 138 prop_key = match.group(1)
139 prop_val = match.group(2)
140 if prop_key == 'Effort':
ValueError: too many values to unpack (expected 2)
Other formats listed below result in similar errors:
The available effort formats are described in org-duration.el. I can parse these formats by adding some tests and modifying lines 140-143 of orgparse/node.py. If supporting such formats is desirable, I can work on this in my free time and open a PR.
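A sketch of handling the two colon-based formats from the example above, assuming `H:MM` and `H:MM:SS`; other org-duration forms (e.g. unit strings such as `3h`) are deliberately rejected. `effort_minutes` is hypothetical, and where orgparse would call it is left open.

```python
import re

# "H:MM" with an optional ":SS" part.
RE_HMM = re.compile(r"^(\d+):(\d{2})(?::(\d{2}))?$")


def effort_minutes(value: str) -> float:
    """Convert an "H:MM" or "H:MM:SS" effort string to minutes."""
    m = RE_HMM.match(value.strip())
    if m is None:
        raise ValueError(f"unsupported effort format: {value!r}")
    hours, minutes, seconds = m.groups()
    return int(hours) * 60 + int(minutes) + int(seconds or 0) / 60


if __name__ == "__main__":
    print(effort_minutes("1:23:45"))  # 83.75
```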
I just installed your library as documented, using 'pip install orgparse',
and then tried to import 'load' and 'loads' from orgparse in a Python shell.
It fails with the following syntax error message:
In [3]: from orgparse import load, loads
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/IPython/core/interactiveshell.py", line 3296, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 1, in
from orgparse import load, loads
File "/usr/local/lib/python3.5/dist-packages/orgparse/__init__.py", line 112, in
from .node import parse_lines, OrgNode # todo basenode??
File "/usr/local/lib/python3.5/dist-packages/orgparse/node.py", line 16
chunk: List[str] = []
^
SyntaxError: invalid syntax
Not that it's slow, but making it even faster wouldn't hurt. Or at least setting up some proper benchmarks.
https://github.com/org-roam/test-org-files is a good source of test files
py-spy output from parsing a bunch of files:
Note that iterative parsing (using generators) makes it a bit misleading: _iparse_timestamps appears as a child call of _iparse_repeated_tasks.
I tried replacing re with regex (https://pypi.org/project/regex), but it didn't have any effect.
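For the "proper benchmarks" part, a minimal harness sketch: it times any parse function over in-memory texts, so file I/O is excluded, and reports the minimum and median of several runs to reduce noise. `benchmark` is hypothetical; with orgparse installed, the function passed in would presumably be `orgparse.loads` and the texts the .org files from a checkout of test-org-files.

```python
import statistics
import time
from typing import Callable, Dict, Iterable


def benchmark(parse: Callable[[str], object], texts: Iterable[str],
              repeat: int = 5) -> Dict[str, float]:
    """Time parse() over all texts, repeat times; report min and median in seconds."""
    corpus = list(texts)  # read everything up front so I/O is not timed
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        for text in corpus:
            parse(text)
        timings.append(time.perf_counter() - start)
    return {"min": min(timings), "median": statistics.median(timings)}
```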
Not sure but it seems that there is no version information available.
>>> import orgparse
>>> orgparse.__version__
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'orgparse' has no attribute '__version__'
I am using 0.4.20231004, installed from PyPI via pip.
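Until a `__version__` attribute exists, the installed version can be read from the distribution's packaging metadata with the standard library (Python 3.8+). This queries pip's metadata rather than the module, so it reports what was installed, not necessarily what is imported from a source checkout. `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist: str = "orgparse") -> Optional[str]:
    """Return the installed distribution's version, or None if it is not installed."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print(installed_version())
```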
Hi!
Author of this primitive Org mode to Python3 parser speaking. I'd love to replace my stupid parser with a decent one in the future - if possible.
So: what is the goal of orgparse? What is the scope? What is the non-scope? What is the vision? I'd love to read such a small section in your readme file.
For example: will orgparse be able to parse all important Org mode syntax elements such as lists, tables, internal and external links, footnotes, text formatting (italic, underline, bold, ...), and so forth?
Currently, orgparse seems to store the content of a heading without analyzing it further, except for various time and date stamps.
While writing a long issue text about why unittest isn't running, I found out that you use pytest.
Please offer information like this in your README or in a separate CONTRIBUTE.md: how to create a PR, and against which branch? Naming conventions for new branches? Code guidelines? etc.
2011-04-31 is not a valid date; April only has 30 days.
Test case:
** test
<2011-04-31 Sat>
leads to:
Traceback (most recent call last):
File "/home/hrehfeld/projects/2023/topics/orgmode.py", line 31, in <module>
doc = orgparse.load(filepath, make_env(filepath))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hrehfeld/projects/2023/topics/.venv/lib/python3.11/site-packages/orgparse/__init__.py", line 140, in load
return load(orgfile, env)
^^^^^^^^^^^^^^^^^^
File "/home/hrehfeld/projects/2023/topics/.venv/lib/python3.11/site-packages/orgparse/__init__.py", line 148, in load
return loadi(all_lines, filename=filename, env=env)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hrehfeld/projects/2023/topics/.venv/lib/python3.11/site-packages/orgparse/__init__.py", line 168, in loadi
return parse_lines(lines, filename=filename, env=env)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hrehfeld/projects/2023/topics/.venv/lib/python3.11/site-packages/orgparse/node.py", line 1464, in parse_lines
node._parse_pre()
File "/home/hrehfeld/projects/2023/topics/.venv/lib/python3.11/site-packages/orgparse/node.py", line 1151, in _parse_pre
self._body_lines = list(ilines)
^^^^^^^^^^^^
File "/home/hrehfeld/projects/2023/topics/.venv/lib/python3.11/site-packages/orgparse/node.py", line 1202, in _iparse_timestamps
self._timestamps.extend(OrgDate.list_from_str(l))
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/hrehfeld/projects/2023/topics/.venv/lib/python3.11/site-packages/orgparse/date.py", line 471, in list_from_str
odate = cls(
^^^^
File "/home/hrehfeld/projects/2023/topics/.venv/lib/python3.11/site-packages/orgparse/date.py", line 227, in __init__
self._start = self._to_date(start)
^^^^^^^^^^^^^^^^^^^^
File "/home/hrehfeld/projects/2023/topics/.venv/lib/python3.11/site-packages/orgparse/date.py", line 238, in _to_date
return datetime.date(*date)
^^^^^^^^^^^^^^^^^^^^
ValueError: day is out of range for month
I'd expect orgparse either to parse the date somehow or to provide context about where the error happens. This is probably as easy as augmenting the ValueError with location info.
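Augmenting the error could look roughly like this: a subclass of ValueError (so existing `except ValueError` handlers keep working) whose message names the file and line. `OrgDateError` and `to_date` are hypothetical; how orgparse would pass the line number down to `_to_date` is not shown.

```python
import datetime


class OrgDateError(ValueError):
    """Hypothetical ValueError subclass carrying the offending location."""


def to_date(year: int, month: int, day: int, filename: str, lineno: int) -> datetime.date:
    """Build a date, re-raising invalid dates with file and line context."""
    try:
        return datetime.date(year, month, day)
    except ValueError as e:
        raise OrgDateError(
            f"{filename}:{lineno}: invalid date {year:04d}-{month:02d}-{day:02d}: {e}"
        ) from e
```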
I would like to extract properties from the OrgRootNode. One use case for this would be finding the ID of an Org Roam article.
I found a difference between date object types returned when parsing nodes with only one line or multiple lines.
E.g.:
from orgparse import loads
r0 = loads("""* Heading 1
* Heading 2
body""")
print([c.scheduled for c in r0.children]) # prints [OrgDate(None), OrgDateScheduled(None)]
It's pretty minor and mostly cosmetic. I'm creating a PR to harmonize these a bit.
Hi,
The docstring says that the argument can be "str or file-like" and I've been using it with the latter. But that got broken by a recent venv update.
Looks like commit 3067189 is the culprit. It added a line
path = str(path) # in case of pathlib.Path
which means that it's now trying to open a file named <_io.TextIOWrapper name='tasks.org' mode='r' encoding='UTF-8'>, which obviously fails.
That line of code looks a bit out of place, or at least unrelated to the commit message. Was it intended to be committed?
-Ben
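A minimal sketch of accepting both kinds of argument, where only genuine paths (str or os.PathLike) are opened and anything else is read as a file-like object instead of being passed through `str()`. `read_lines` is hypothetical, not the code from commit 3067189.

```python
import io
import os
from typing import IO, List, Union

Source = Union[str, os.PathLike, IO[str]]


def read_lines(source: Source) -> List[str]:
    """Read Org text from a path (str or pathlib.Path) or an open text file object."""
    if isinstance(source, (str, os.PathLike)):
        # Only genuine paths are opened; open() accepts PathLike directly.
        with open(source, encoding="utf-8") as f:
            return f.read().splitlines()
    # Anything else is assumed to be file-like and is read as-is, never str()-ed.
    return source.read().splitlines()


if __name__ == "__main__":
    print(read_lines(io.StringIO("* TODO task\nbody")))  # ['* TODO task', 'body']
```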