Code-Savvy Linguistics: Terms Every Computational Linguist Must Master
The Evolving Nexus of Language and Technology: Where Code Meets Cognition
Computational Linguistics: Terms for the Modern Linguist
1. Finite-State Transducer (FST)
A finite-state machine that maps input strings to output strings, used to model morphological analysis and phonological rules, especially in two-level morphology.
E.g., mapping “run + PAST” → “ran”
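As a rough illustration, the mapping above can be sketched as a tiny deterministic transducer. The state names, the "+PAST"/"+PRES" tags, and the transition table are all hypothetical and cover only this one verb; real morphological analyzers are compiled from lexicons and rule sets.

```python
# Toy transducer. Transitions: (state, input_symbol) -> (next_state, output_string).
# All states and symbols are made up for illustration.
TRANSITIONS = {
    ("q0", "r"): ("q1", "r"),
    ("q1", "u"): ("q2", ""),       # hold the vowel until the tense tag is read
    ("q2", "n"): ("q3", ""),
    ("q3", "+PAST"): ("qf", "an"),  # run + PAST -> r + an
    ("q3", "+PRES"): ("qf", "un"),  # run + PRES -> r + un
}
FINAL_STATES = {"qf"}


def transduce(symbols):
    """Return the output string, or None if the input is not accepted."""
    state, output = "q0", []
    for sym in symbols:
        key = (state, sym)
        if key not in TRANSITIONS:
            return None
        state, out = TRANSITIONS[key]
        output.append(out)
    return "".join(output) if state in FINAL_STATES else None


print(transduce(["r", "u", "n", "+PAST"]))  # ran
```

The output for the stem vowel is delayed until the tense tag arrives, which is how a single left-to-right pass can still pick between "run" and "ran".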
2. Distributional Semantics
A method of representing word meaning through statistical patterns of co-occurrence.
“You shall know a word by the company it keeps.” — J.R. Firth
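A minimal sketch of the idea, assuming whitespace tokenization and a symmetric context window (the function name and the two example sentences are invented for illustration): count which words appear near each other, and treat each word's count profile as a rough picture of its meaning.

```python
from collections import Counter, defaultdict


def cooccurrence(sentences, window=2):
    """Count, for each word, the words seen within `window` positions of it."""
    counts = defaultdict(Counter)
    for sent in sentences:
        tokens = sent.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts


corpus = ["the cat drinks milk", "the dog drinks water"]
counts = cooccurrence(corpus, window=1)
# "cat" and "dog" share the neighbours "the" and "drinks"
print(dict(counts["cat"]), dict(counts["dog"]))
```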
3. Word Embeddings
Dense, real-valued vector representations of words, learned so that words used in similar contexts end up with similar vectors.
E.g., Word2Vec, GloVe
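Similarity between embeddings is commonly measured with cosine similarity. The sketch below uses hand-made three-dimensional vectors purely for illustration; they are not taken from Word2Vec, GloVe, or any trained model, and real embeddings typically have hundreds of dimensions.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical toy vectors, invented for this example.
vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

print(cosine(vectors["cat"], vectors["dog"]))  # close to 1
print(cosine(vectors["cat"], vectors["car"]))  # noticeably lower
```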
4. Dependency Parsing
Analyzing sentence structure by identifying head-dependent relationships rather than phrase constituents.
E.g., in “She loves cats,” “loves” is the head of “She” and “cats.”
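One simple way to store such a parse is as a list of (dependent, relation, head) arcs. The sketch below assumes Universal Dependencies style labels (nsubj, obj, root); the helper name is hypothetical.

```python
# Each arc is (dependent, relation, head); "ROOT" is an artificial root node.
ARCS = [
    ("She", "nsubj", "loves"),
    ("cats", "obj", "loves"),
    ("loves", "root", "ROOT"),
]


def dependents_of(head, arcs):
    """Return (dependent, relation) pairs attached to the given head."""
    return [(dep, rel) for dep, rel, h in arcs if h == head]


print(dependents_of("loves", ARCS))  # [('She', 'nsubj'), ('cats', 'obj')]
```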
5. Named Entity Recognition (NER)
Automatically detecting spans of text that name entities and classifying them into categories such as Person, Organization, or Location.
E.g., “Karachi” → Location
6. Treebank Grammar
A grammar derived from annotated corpora, especially useful in training statistical parsers.
7. Token Normalization
The process of standardizing raw text, including lowercasing, stemming, and removing punctuation.
It reduces surface variation (e.g., “Cats” and “cats” are counted as the same token), which many NLP pipelines rely on.
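The three steps named above can be sketched with the standard library alone. The suffix-stripping stemmer here is a deliberately crude stand-in, not the Porter stemmer or any library's algorithm, and both function names are hypothetical.

```python
import string


def crude_stem(token):
    """Strip one common English suffix; a toy stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def normalize(text):
    """Lowercase, remove ASCII punctuation, split on whitespace, then stem."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [crude_stem(tok) for tok in text.split()]


print(normalize("The cats jumped!"))  # ['the', 'cat', 'jump']
```

Whether each step helps depends on the task: lowercasing, for instance, erases the distinction between “Apple” and “apple”.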
8. POS Tagging Ambiguity Resolution
Handling cases where multiple tags could apply — a major challenge in real-world text.
“Lead” → noun or verb?
9. Noisy Channel Model
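A foundational probabilistic model for correcting noisy text (e.g., in spelling correction). It treats the observed text as an intended message that was distorted on its way through a “channel,” and asks: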
What was the intended message before the “noise” (error) occurred?
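In probabilistic terms, the model picks the candidate w that maximizes P(w) × P(observed | w): a prior over intended words times the probability that the channel turned w into what was observed. The numbers below are invented for illustration only; in practice both tables would be estimated from corpora and error logs.

```python
# Hypothetical prior P(w): how likely each word is to be intended at all.
PRIOR = {"the": 0.05, "ten": 0.001, "tea": 0.0005}

# Hypothetical channel model P(observed | w): how likely w is to be mistyped
# as the observed string.
CHANNEL = {
    ("teh", "the"): 0.01,
    ("teh", "ten"): 0.002,
    ("teh", "tea"): 0.002,
}


def correct(observed, candidates):
    """Return the candidate maximizing P(w) * P(observed | w)."""
    return max(
        candidates,
        key=lambda w: PRIOR.get(w, 0.0) * CHANNEL.get((observed, w), 0.0),
    )


print(correct("teh", ["the", "ten", "tea"]))  # the
```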
10. BLEU Score (Bilingual Evaluation Understudy)
A metric that evaluates machine translation quality by measuring n-gram overlap between system output and human-generated reference translations.
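A simplified, sentence-level sketch of the computation follows, assuming a single reference and n-grams up to length 2. Standard BLEU is computed over a whole corpus, usually with n-grams up to length 4 and possibly several references, so this is not a drop-in replacement for an established implementation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: each n-gram is credited at most as often
    as it occurs in the reference."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def bleu(candidate, reference, max_n=2):
    """Geometric mean of clipped precisions times a brevity penalty."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_avg)


reference = "the cat sat on the mat".split()
print(bleu(reference, reference))                   # 1.0
print(bleu("the cat sat".split(), reference) < 1)   # True: penalized for brevity
```

Clipping is what stops a degenerate output such as “the the the” from scoring a perfect unigram precision.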
11. Language Modeling
Building models that predict the likelihood of word sequences, e.g., for predictive typing or speech recognition.
E.g., n-gram models and neural language models such as GPT
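The simplest case is a bigram model estimated by counting. This sketch uses unsmoothed maximum-likelihood estimates on a two-sentence toy corpus; the function names and the "<s>" and "</s>" boundary markers are assumptions made for the example.

```python
from collections import Counter


def train_bigram(sentences):
    """Count contexts and bigrams, with sentence boundary markers."""
    contexts, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.lower().split() + ["</s>"]
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams


def prob(word, prev, contexts, bigrams):
    """Maximum-likelihood estimate of P(word | prev); 0.0 for unseen contexts."""
    if contexts[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / contexts[prev]


contexts, bigrams = train_bigram(["i like tea", "i like coffee"])
print(prob("like", "i", contexts, bigrams))    # 1.0
print(prob("tea", "like", contexts, bigrams))  # 0.5
```

Without smoothing, any unseen bigram gets probability zero, which is one reason practical n-gram models add smoothing or back-off.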
12. Semantic Role Labeling (SRL)
Identifying the roles played by sentence elements — who did what to whom, when, and how.
E.g., Agent, Theme, Instrument
13. Zero-Shot Learning in NLP
The ability of a model to perform a task for which it has seen no labeled training examples, typically by generalizing from pretraining or from a description of the task.
14. Knowledge Graphs
Structured networks of entities and relationships that help NLP systems with reasoning and disambiguation.
E.g., Wikidata, ConceptNet
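At its simplest, a knowledge graph can be stored as (subject, relation, object) triples. The relation names and facts below are written by hand for illustration and do not follow the actual Wikidata or ConceptNet schemas.

```python
# Hypothetical triples: (subject, relation, object).
TRIPLES = [
    ("Karachi", "instance_of", "city"),
    ("Karachi", "located_in", "Pakistan"),
    ("Pakistan", "instance_of", "country"),
]


def objects(subject, relation, triples=TRIPLES):
    """Return every object linked to `subject` by `relation`."""
    return [o for s, r, o in triples if s == subject and r == relation]


print(objects("Karachi", "located_in"))   # ['Pakistan']
print(objects("Karachi", "instance_of"))  # ['city']
```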
15. Transformer Architecture
A deep learning architecture (used in models such as BERT and GPT) that relies on self-attention to process all tokens of a sequence in parallel while drawing on context from the whole sequence.
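The core operation, scaled dot-product self-attention, can be sketched in plain Python. This version leaves out the learned query, key, and value projections, multiple heads, masking, and positional information, so it shows only the weighting step and is not a full Transformer layer.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(Q, K, V):
    """For each query row, return a weighted average of the rows of V,
    with weights softmax(q . k / sqrt(d)) over all keys k."""
    d = len(K[0])
    output = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        output.append([
            sum(w * v[j] for w, v in zip(weights, V))
            for j in range(len(V[0]))
        ])
    return output


# Two toy token vectors; queries, keys, and values are all the same here.
X = [[1.0, 0.0], [0.0, 1.0]]
for row in self_attention(X, X, X):
    print(row)
```

Each output row is computed independently of the others, which is what makes the operation easy to parallelize.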
16. Coreference Resolution
Identifying when different expressions refer to the same entity.
“Sara loves her dog.” — Who is “her”?
17. Word Sense Disambiguation (WSD)
The task of determining which sense of a word is used in a given context.
“Bank” = riverbank or financial institution?
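One classic heuristic is to pick the sense whose dictionary gloss shares the most words with the context, in the spirit of the Lesk algorithm. The glosses and sense labels below are written by hand for this example rather than taken from WordNet, and no stopword filtering is applied.

```python
# Hypothetical glosses for two senses of "bank".
SENSES = {
    "financial": "an institution that accepts deposits and lends money",
    "river": "sloping land beside a river or other body of water",
}


def simplified_lesk(context, senses=SENSES):
    """Return the sense label whose gloss overlaps most with the context."""
    context_words = set(context.lower().split())
    return max(
        senses,
        key=lambda label: len(context_words & set(senses[label].split())),
    )


print(simplified_lesk("she sat on the bank of the river and watched the water"))  # river
print(simplified_lesk("he went to the bank to deposit money"))                    # financial
```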
18. Transfer Learning in NLP
Leveraging knowledge learned in one task to enhance performance in another.
E.g., fine-tuning a pre-trained BERT model
19. Multimodal NLP
Integrating text with other data types (image, audio, video) for holistic language understanding.
E.g., image captioning (generating a text description of an image)
20. Dialogue Systems
Computational systems designed to interact with users through conversation — includes chatbots, virtual assistants, and spoken dialogue agents.