Skip to content

Commit 50a5fc1

Browse files
authored
fix bug for vocab size
1 parent 3f6e2d2 commit 50a5fc1

1 file changed

Lines changed: 1 addition & 1 deletion

File tree

datastore/get_datastore_chat.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -29,7 +29,7 @@
2929
writer = draftretriever.Writer(
3030
index_file_path=datastore_path,
3131
max_chunk_len=512*1024*1024,
32-
vocab_size=tokenizer.vocab_size,
32+
vocab_size=tokenizer.vocab_size + len(tokenizer.get_added_vocab()),
3333
)
3434
if args.large_datastore:
3535
dataset = load_dataset('stingning/ultrachat', split='train')

0 commit comments

Comments
 (0)