Blog Archive
- February 2026 - Homelab for NLP
- January 2026 - Bye 2025, Hi 2026
- December 2025 - WIP: LLMs for Text Normalization (i.e., Domain Adaptation)
- November 2025 - Deep dive: On the Theoretical Limitations of Embedding-Based Retrieval
- October 2025 - Temperature, Tokens, and Long Tales/Tails
- August 2025 - WIP: Using Landmarks to Extract Spans with Prompting
- June 2025 - Practical Tidbits: Taking a Magnifying Glass to (Text) Classifier Performance
- February 2025 - Work In Progress: LLMs for ETL
- January 2025 - Improving the NLP Tool Kit: Characterization
- December 2024 - Fun with Words: A Foray into Solving NYT Connections via Decomposition
- October 2024 - Fun With Words: NYT Connections
- October 2024 - Quick and Dirty Metric to Imperial Conversions (How to Entertain Yourself as an American Driving in a Metric Country)
- September 2024 - Negative Result: Improving Fixed Vocab Text Representations
- August 2024 - Practical Tidbits: To Pickle or Not to Pickle
- June 2024 - Practical Tidbits: Selecting MinHash Hyperparameters for Deduplication
- May 2024 - Practical Tidbits: ElasticSearch with custom Embeddings (Vectors) for Versions Greater than 7.6
- April 2024 - Original Work: “Nudging” Active Learning to Learn Minority Classes
- March 2024 - An In-depth Discussion of Textual Similarity: Taking a look at the toolkit
- March 2024 - A Dream: An Easy Way to Work with Documents and (implicitly) Structured Text
- December 2023 - An In-depth Discussion of Textual Similarity: Characteristics and When They Matter
- November 2023 - An In-depth Discussion of Textual Similarity: Starting the Conversation