Aram Verstegen / archive-hocr-tools
Commit 83504d6d
Authored Sep 28, 2021 by Aram Verstegen
Trying to improve dehyphenation
Parent: 24bad335
Changes: 1
bin/hocr-to-epub
...
...
@@ -466,10 +466,15 @@ class EpubGenerator(object):
                 if len(line_content) and len(line_content[-1]) and line_content[-1][-1] in hyphens:
                     # Remove the last character if it is a hyphen
                     line_content[-1] = line_content[-1][:-1]
+                    # Add placeholder value
+                    line_content.append('\x7f')
                 page_content += line_content
-            # Create HTML/epub page
+            # Flatten list into string and add spaces
             page_text = ' '.join(page_content)
+            # Remove placeholder and spaces in the positions that previously had a line break hyphen
+            page_text = page_text.replace('\x7f', '')
+            # Create HTML/epub page
             page_html = u"<p>%s</p>" % page_text
             # Add a warning if the confidence in the text is below the given threshold
...
...
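The placeholder approach in this hunk can be tried in isolation. The following is a minimal sketch, not the code from `bin/hocr-to-epub`: the function name `join_lines`, the constants `HYPHENS` and `PLACEHOLDER`, and the contents of the hyphen set are hypothetical (the real `hyphens` value is defined outside this hunk). It also assumes the replacement removes the placeholder together with the spaces that `' '.join` puts around it, as the comment in the diff describes.

```python
# Hypothetical hyphen set; the real `hyphens` is defined outside the hunk shown.
HYPHENS = ('-', '\u00ad', '\u2010')
# DEL control character, used as a marker that should not occur in OCR text.
PLACEHOLDER = '\x7f'


def join_lines(lines):
    """Join lines (lists of words) into one string, merging words that
    were split across a line break by a trailing hyphen."""
    page_content = []
    for line in lines:
        line_content = list(line)  # copy so the caller's list is not mutated
        if line_content and line_content[-1] and line_content[-1][-1] in HYPHENS:
            # Remove the trailing hyphen from the last word of the line
            line_content[-1] = line_content[-1][:-1]
            # Mark the position where the next word must be glued on
            line_content.append(PLACEHOLDER)
        page_content += line_content

    # Flatten the word list; the placeholder ends up surrounded by spaces
    page_text = ' '.join(page_content)
    # Drop the placeholder and both neighbouring spaces (assumption, see above)
    page_text = page_text.replace(' ' + PLACEHOLDER + ' ', '')
    # Not part of the commit: clean up a placeholder left at the very end
    # of the page, where there is no following word to join with.
    return page_text.replace(PLACEHOLDER, '').rstrip()


print(join_lines([['An', 'exam-'], ['ple', 'text']]))
```

Note that this kind of joining cannot tell a line-break hyphen from a real one, so a compound such as "well-known" that happens to break after the hyphen would come out as "wellknown".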