Performance/caching issue: tokenizer fails to reset `has_special` flag after encountering special span, effectively disabling caching #13950

## How to reproduce the behaviour


```python
from spacy.lang.en import English

nlp = English()
doc = nlp("I can't believe you have done this")
```

`"can't"` is a tokenizer exception because of the funky contraction (yay English):

https://github.com/explosion/spaCy/blob/0069cf99b60023363ba5f47404819f59c8a39b3e/spacy/lang/en/tokenizer_exceptions.py#L233

Spans that contain these exceptions are marked as `has_special`:

- declared here: https://github.com/explosion/spaCy/blob/0069cf99b60023363ba5f47404819f59c8a39b3e/spacy/tokenizer.pyx#L179
- set here: https://github.com/explosion/spaCy/blob/0069cf99b60023363ba5f47404819f59c8a39b3e/spacy/tokenizer.pyx#L375

And `has_special` spans are not cached:

https://github.com/explosion/spaCy/blob/0069cf99b60023363ba5f47404819f59c8a39b3e/spacy/tokenizer.pyx#L523

The problem is that `has_special`, once set to a nonzero value, is never reset. This means that once the tokenizer encounters a special case, *every subsequent span is also marked as special, and none of them get cached, even if they should be*.
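To make the effect concrete, here's a simplified toy model in plain Python. It is *not* spaCy's Cython code: `ToyTokenizer`, `SPECIAL_CASES` and `slow_path_calls` are hypothetical names, and whitespace splitting stands in for the real prefix/suffix/infix handling. The flag is initialised once per call, and `reset_per_span` switches between the sticky behaviour described above and clearing the flag for each span.

```python
from dataclasses import dataclass, field

# Hypothetical special-case table for the toy model (not spaCy's real data).
SPECIAL_CASES = {"can't": ["ca", "n't"]}


@dataclass
class ToyTokenizer:
    """Simplified model of a caching tokenizer. NOT spaCy's implementation."""

    reset_per_span: bool
    cache: dict = field(default_factory=dict)
    slow_path_calls: int = 0

    def _slow_tokenize(self, span):
        # Stand-in for the expensive affix/infix splitting.
        # Returns the tokens and whether a special case was hit.
        self.slow_path_calls += 1
        if span in SPECIAL_CASES:
            return list(SPECIAL_CASES[span]), True
        return [span], False

    def __call__(self, text):
        tokens = []
        has_special = False  # initialised once per call
        for span in text.split():
            if self.reset_per_span:
                has_special = False  # reset: clear the flag for each span
            if span in self.cache:
                tokens.extend(self.cache[span])
                continue
            span_tokens, found_special = self._slow_tokenize(span)
            # Without the reset, this stays True for the rest of the text.
            has_special = has_special or found_special
            if not has_special:
                self.cache[span] = span_tokens
            tokens.extend(span_tokens)
        return tokens


text = "I can't believe you have done this"
buggy = ToyTokenizer(reset_per_span=False)
fixed = ToyTokenizer(reset_per_span=True)
for tokenizer in (buggy, fixed):
    tokenizer(text)
    tokenizer(text)  # tokenize the same text a second time

print(sorted(buggy.cache))  # ['I']
print(sorted(fixed.cache))  # ['I', 'believe', 'done', 'have', 'this', 'you']
print(buggy.slow_path_calls, fixed.slow_path_calls)  # 13 8
```

Both variants produce the same tokens; the only difference is caching. With the sticky flag, only `I` (the span before `can't`) ever lands in the cache, so every later span goes through the slow path again on each call. Clearing the flag per span caches everything except the special case itself.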

This has fairly significant performance implications: the tokenizer should be much faster than it currently is. I'll include some benchmarks in my PR.

## Your Environment

## Info about spaCy

- **spaCy version:** 3.8.14
- **Platform:** macOS-26.3.1-arm64-arm-64bit
- **Python version:** 3.12.4
- **Pipelines:** en_core_web_md (3.8.0), en_core_web_sm (3.8.0)
