Merlijn Wajer / deepublisher-datasets
Commit 5638ebe9, authored Sep 15, 2021 by Merlijn Wajer
Add parse_csv: rudimentary script to parse csv result files created by operators
parent ff082fe8
parse_csv.py (new file, mode 0 → 100644)
import csv
import sys
import re

reader = csv.DictReader(open(sys.argv[1]))

LEAF_ROW_1 = 'bad crop leaf and note'
LEAF_ROW_2 = 'Ok but could use improvement'

LEAF_RE = '\d+'

reslist = []

for row in reader:
    leafs = re.findall(LEAF_RE, row[LEAF_ROW_1])
    leafs += re.findall(LEAF_RE, row[LEAF_ROW_2])
    leafs = list(map(int, leafs))
    leafs = list(map(str, leafs))

    identifier = row['identifier']
    type_ = 'train'
    pages = ','.join(leafs)
    reason = row[LEAF_ROW_1] + row[LEAF_ROW_2]

    reslist.append({'identifier': identifier,
                    'type': type_,
                    'pages': pages,
                    'reason': reason})

import yaml
r = yaml.dump(reslist)
print(r)
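
For review purposes, here is a minimal sketch of how the same logic could be restructured so it can be exercised without a file on disk. It is not part of the commit. The helper name `rows_to_entries`, the `main` wrapper, and the sample CSV are hypothetical; the column names and output keys are taken from the script above. It also makes three assumptions the commit does not: the regex is written as a raw string, empty or missing cells are treated as empty strings, and the input file is closed after reading.

```python
import csv
import io
import re

# Column names copied from the commit; the CSV layout itself is an assumption.
LEAF_ROW_1 = 'bad crop leaf and note'
LEAF_ROW_2 = 'Ok but could use improvement'
LEAF_RE = re.compile(r'\d+')  # raw string avoids the invalid-escape warning


def rows_to_entries(rows):
    """Hypothetical helper: turn CSV rows into the dicts the script dumps."""
    entries = []
    for row in rows:
        # Assumption: a missing or empty cell counts as an empty note.
        notes = [row.get(LEAF_ROW_1) or '', row.get(LEAF_ROW_2) or '']
        # int() then str() normalises numbers, e.g. '007' becomes '7'.
        leafs = [str(int(n)) for note in notes for n in LEAF_RE.findall(note)]
        entries.append({
            'identifier': row['identifier'],
            'type': 'train',
            'pages': ','.join(leafs),
            'reason': notes[0] + notes[1],
        })
    return entries


def main(path):
    # PyYAML is third-party; imported here so the helper above has no
    # dependency outside the standard library.
    import yaml
    with open(path, newline='') as f:
        print(yaml.dump(rows_to_entries(csv.DictReader(f))))


# Made-up sample data, only to show the shape of the result.
SAMPLE = (
    'identifier,bad crop leaf and note,Ok but could use improvement\n'
    'item_a,"leaf 007, 12 cropped",leaf 3\n'
)
sample_entries = rows_to_entries(csv.DictReader(io.StringIO(SAMPLE)))
```

Calling `main(sys.argv[1])` (after `import sys`) would reproduce the command-line behaviour of the committed script. Note that, as in the original, `reason` concatenates the two columns without a separator, and every number in either column is treated as a page, including numbers that are not leaf numbers.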