Commit 38f8c0b

Check that IDs (node.ord) form a valid sequence in *.conll and *.conllup
Some files have gaps in the IDs, e.g. 1, 2, 3, 5. Failing to detect this results in wrong parsing, and even `fix_cycles=1` may end in an infinite loop. When a node is created using `root.create_child()`, it already has `node.ord` set correctly as the last node in the tree, so the reader only needs to check that the ID in the file matches it. For *.conllu we could do the same, but we don't, because Read.Conllu is speed-critical and invalid files can be detected using `validate.py`.
1 parent 0d1d6fd commit 38f8c0b
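The gap problem described above can be illustrated with a standalone check on raw lines of one sentence. This is a minimal sketch, not udapi code: `check_ord_sequence` is a hypothetical helper, and it assumes the ID sits in the first tab-separated column (Read.Conll allows other column layouts). It skips multiword-token ranges such as `1-2` and empty nodes such as `2.1`, since those do not create regular nodes.

```python
def check_ord_sequence(lines):
    """Hypothetical helper: raise ValueError unless node IDs are 1, 2, ..., n.

    Assumes the ID is in the first tab-separated column. Comment lines,
    blank lines, multiword-token ranges ("1-2") and empty nodes ("2.1")
    are skipped. Returns the number of regular nodes seen.
    """
    expected = 1
    for line in lines:
        line = line.rstrip('\n')
        if not line or line.startswith('#'):
            continue  # blank line or sentence-level comment
        node_id = line.split('\t', 1)[0]
        if '-' in node_id or '.' in node_id:
            continue  # multiword token range or empty node
        if int(node_id) != expected:
            raise ValueError(f"ord mismatch: {node_id}, but expecting {expected} at:\n{line}")
        expected += 1
    return expected - 1


sentence = ["# text = a b c d", "1\ta", "2\tb", "3\tc", "5\td"]
try:
    check_ord_sequence(sentence)
except ValueError as err:
    print(err)  # reports that 5 was found where 4 was expected
```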

2 files changed

Lines changed: 4 additions & 2 deletions

udapi/block/read/conll.py

Lines changed: 2 additions & 1 deletion
@@ -91,7 +91,8 @@ def parse_node_line(self, line, root, nodes, parents, mwts):
                     else:
                         raise exception
             elif attribute_name == 'ord':
-                setattr(node, 'ord', int(value))
+                if int(value) != node._ord:
+                    raise ValueError(f"Node {node} ord mismatch: {value}, but expecting {node._ord} at:\n{line}")
             elif attribute_name == 'deps':
                 setattr(node, 'raw_deps', value)
             elif attribute_name != '_' and value != '_':

udapi/block/read/conllup.py

Lines changed: 2 additions & 1 deletion
@@ -79,7 +79,8 @@ def parse_node_line(self, line, root, nodes, parents, mwts):
                     else:
                         raise exception
             elif attribute_name == 'ord':
-                setattr(node, 'ord', int(value))
+                if int(value) != node._ord:
+                    raise ValueError(f"Node {node} ord mismatch: {value}, but expecting {node._ord} at:\n{line}")
             elif attribute_name == 'deps':
                 setattr(node, 'raw_deps', value)
             elif value == '_' and attribute_name != 'form':
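Both hunks rely on the same idea: `create_child()` has already numbered the new node as the last one, so the reader compares instead of overwriting (before the patch, `setattr(node, 'ord', int(value))` simply copied the file's value, gaps included). The toy model below sketches that mechanism. `ToyRoot`, `ToyNode` and `read_ids` are hypothetical stand-ins, not udapi's real classes, and they only track node positions.

```python
class ToyNode:
    """Hypothetical stand-in for a udapi node; stores only its position."""

    def __init__(self, ord_):
        self._ord = ord_


class ToyRoot:
    """Hypothetical stand-in for a udapi root: create_child() appends a new
    node and numbers it as the last node in the tree."""

    def __init__(self):
        self.descendants = []

    def create_child(self):
        node = ToyNode(len(self.descendants) + 1)
        self.descendants.append(node)
        return node


def read_ids(ids):
    """Create one node per ID and check it against the assigned position."""
    root = ToyRoot()
    for value in ids:
        node = root.create_child()
        # Same comparison as in the patch: the ID from the file must equal
        # the position that create_child() already assigned.
        if int(value) != node._ord:
            raise ValueError(f"ord mismatch: {value}, but expecting {node._ord}")
    return root
```

With IDs `1, 2, 3, 5`, the fourth node gets position 4, so the comparison fails on the `5` instead of silently storing a gap.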
