Chunking is a shallow form of parsing that identifies continuous spans of tokens that form syntactic units such as noun phrases or verb phrases.
Example:
| Vinken | , | 61 | years | old |
|---|---|---|---|---|
| B-NLP | I-NP | I-NP | I-NP | I-NP |
The Penn Treebank is typically used for evaluating chunking. Sections 15-18 are used for training, section 19 for development, and and section 20 for testing. Models are evaluated based on F1.
| Model | F1 score | Paper / Source |
|---|---|---|
| Low supervision (Søgaard and Goldberg, 2016) | 95.57 | Deep multi-task learning with low level tasks supervised at lower layers |
| Suzuki and Isozaki (2008) | 95.15 | Semi-Supervised Sequential Labeling and Segmentation using Giga-word Scale Unlabeled Data |