Blog Archive

Other

May 2026 - Dispelling NLP Myths: How Finetuning Compares to Prompting (Part 1)
April 2026 - WIP: Towards Better Research Management Tools
February 2026 - Homelab for NLP
January 2026 - Bye 2025, Hi 2026
December 2025 - WIP: LLMs for Text Normalization (i.e., Domain Adaptation)
November 2025 - Deep dive: On the Theoretical Limitations of Embedding-Based Retrieval
October 2025 - Temperature, Tokens, and Long Tales/Tails
August 2025 - WIP: Using Landmarks to Extract Spans with Prompting
June 2025 - Practical Tidbits: Taking a Magnifying Glass to (Text) Classifier Performance
February 2025 - Work In Progress: LLMs for ETL
January 2025 - Improving the NLP Tool Kit: Characterization
December 2024 - Fun with Words: A Foray into Solving NYT Connections via Decomposition
October 2024 - Fun With Words: NYT Connections
October 2024 - Quick and Dirty Metric to Imperial Conversions (How to Entertain Yourself as an American Driving in a Metric Country)
September 2024 - Negative Result: Improving Fixed Vocab Text Representations
August 2024 - Practical Tidbits: To Pickle or Not to Pickle
June 2024 - Practical Tidbits: Selecting MinHash Hyperparameters for Deduplication
May 2024 - Practical Tidbits: ElasticSearch with custom Embeddings (Vectors) for Versions Greater than 7.6
April 2024 - Original Work: “Nudging” Active Learning to Learn Minority Classes
March 2024 - An In-depth Discussion of Textual Similarity: Taking a look at the toolkit
March 2024 - A Dream: An Easy Way to Work with Documents and (implicitly) Structured Text
December 2023 - An In-depth Discussion of Textual Similarity: Characteristics and When They Matter
November 2023 - An In-depth Discussion of Textual Similarity: Starting the Conversation