About Searchability

Searchability is a hosted search platform for organizations with large, multilingual record collections. Upload your data, configure your search engine, and deploy it on your own domain or embed it on your existing site. See it in action on CRARG's publicly searchable Holocaust records.

Phonetic search across languages

  • Search by sound with a custom-tuned Daitch-Mokotoff Soundex, refined over two decades of real-world use, so that Yakubovitch matches Jakubowicz.
  • Reduce false positives with configurable phonetic prefix constraints, internal n-gram matching, and Slavic surname stemming.
  • Search in Cyrillic or Hebrew to find equivalent records in Latin letters. ICU-based accent normalization handles transliteration automatically.
  • Town synonym expansion. Searching for "Czestochowa" automatically searches "Chenstochov" too. Synonym groups are configurable per search engine.
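
As a rough illustration of accent folding and town synonym expansion, here is a minimal Python sketch. It is not Searchability's implementation: it uses the standard-library unicodedata module as a stand-in for ICU, and the names fold_accents, SYNONYM_GROUPS, build_synonym_index, and expand_town are hypothetical.

```python
import unicodedata


def fold_accents(text):
    """Decompose characters (NFKD), drop combining marks, and casefold.

    Stand-in for ICU folding. Letters with no Unicode decomposition
    (for example the Polish l with stroke) pass through unchanged.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


# Hypothetical synonym group; real groups would be configured per search engine.
SYNONYM_GROUPS = [
    {"Częstochowa", "Czestochowa", "Chenstochov"},
]


def build_synonym_index(groups):
    """Map every folded spelling to the full set of folded spellings in its group."""
    index = {}
    for group in groups:
        folded = {fold_accents(name) for name in group}
        for name in folded:
            index[name] = folded
    return index


def expand_town(term, index):
    """Return all folded spellings to search for; unknown towns expand to themselves."""
    key = fold_accents(term)
    return sorted(index.get(key, {key}))


index = build_synonym_index(SYNONYM_GROUPS)
print(expand_town("Czestochowa", index))  # ['chenstochov', 'czestochowa']
```

Folding both the configured synonyms and the incoming term means an accented and an unaccented spelling land on the same key, so one lookup covers both.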

Full-text search with Boolean logic

  • Search millions of cells in single-digit milliseconds with English stemming and full-text indexing.
  • Use Boolean operators to build precise queries: goldberg AND (butcher OR farmer)
  • Try wildcards and regular expressions when you're unsure of a spelling: gold* or /g[ou](l+)db(.*)rg/
  • Use exact-match braces to bypass phonetic expansion: {Goldberg}
  • Year ranges are auto-detected: 1891-1895 searches the year field.
  • Misspelling suggestions. Fuzzy matching detects likely typos and suggests corrections, with frequency-based scoring to avoid spurious suggestions on common names.
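
To make the query syntax above concrete, here is a minimal Python sketch of how a single query token might be classified. It is an assumption-laden illustration, not Searchability's parser: the function name classify_token, the returned dictionary shape, and the rule order are all hypothetical, and Boolean operators and parentheses are left out.

```python
import re

# Four digits, a hyphen, four digits: treated as a year range.
YEAR_RANGE = re.compile(r"(\d{4})-(\d{4})")
# A term wrapped in braces: treated as an exact match.
EXACT = re.compile(r"\{(.+)\}")


def classify_token(token):
    """Classify one query token; the categories are illustrative only."""
    year_match = YEAR_RANGE.fullmatch(token)
    if year_match:
        start, end = int(year_match.group(1)), int(year_match.group(2))
        if start <= end:
            return {"type": "year_range", "gte": start, "lte": end}
    exact_match = EXACT.fullmatch(token)
    if exact_match:
        return {"type": "exact", "value": exact_match.group(1)}
    if len(token) > 2 and token.startswith("/") and token.endswith("/"):
        # Strip the surrounding slashes and keep the pattern itself.
        return {"type": "regex", "value": token[1:-1]}
    if "*" in token:
        return {"type": "wildcard", "value": token}
    # Anything else would go through phonetic expansion.
    return {"type": "phonetic", "value": token}


print(classify_token("1891-1895"))   # {'type': 'year_range', 'gte': 1891, 'lte': 1895}
print(classify_token("{Goldberg}"))  # {'type': 'exact', 'value': 'Goldberg'}
```

In this sketch a reversed range such as 1895-1891 is not treated as a year range and falls through to the default case.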

Built for scale

  • Built on Elasticsearch with custom analyzers for phonetic matching, case-insensitive lookup, edge n-grams, and full-text search. Backed by PostgreSQL and S3.
  • Bulk import pipeline ingests datasets from S3, chunks them into batches, extracts names and years, groups multi-person records, and indexes via the bulk API.
  • Background workers (Sidekiq, Redis) handle dataset sync, reindexing, and cleanup. Content-hash deduplication skips unchanged files.
  • Automated data quality checks flag encoding errors, non-standard Unicode, and column mismatches during import.
  • Hosted on Railway with auto-deploys from GitHub, SSL, health checks, error tracking (Sentry), and request throttling. Puma runs multiple worker processes with concurrent threads per process. Railway supports both horizontal scaling (adding replicas) and vertical scaling (more CPU and memory), so the platform can scale from a small organization's traffic to thousands of concurrent users.
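
The content-hash deduplication and batching steps in the import pipeline can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the page does not say which hash is used, so SHA-256 is a guess, and content_hash, files_to_sync, and batches are hypothetical names rather than part of the platform.

```python
import hashlib
from itertools import islice


def content_hash(data):
    """Hex digest of the file contents (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def files_to_sync(files, seen_hashes):
    """Return names of files whose contents changed since the last sync.

    files maps file name to bytes; seen_hashes maps file name to the
    digest recorded on a previous run and is updated in place.
    """
    changed = []
    for name, data in files.items():
        digest = content_hash(data)
        if seen_hashes.get(name) != digest:
            changed.append(name)
            seen_hashes[name] = digest
    return changed


def batches(rows, size):
    """Yield successive lists of at most `size` rows for bulk indexing."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


seen = {}
print(files_to_sync({"a.csv": b"x", "b.csv": b"y"}, seen))  # ['a.csv', 'b.csv']
print(files_to_sync({"a.csv": b"x", "b.csv": b"z"}, seen))  # ['b.csv']
print(list(batches(range(5), 2)))                           # [[0, 1], [2, 3], [4]]
```

Comparing digests instead of timestamps means a file that is re-uploaded with identical bytes is skipped, while any change to its contents triggers a resync.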

Multi-tenant and embeddable

  • Custom domains. Point your own domain at Searchability and your users see your branding, not ours.
  • Embeddable search. Drop a single <iframe> tag on any page to add a search form to your existing site.
  • Configurable search engines. Each engine has its own field labels, query constraints, town synonyms, phonetic tuning, and access controls.
  • Access control. Search engines can be public, restricted to specific users, or fully private.

Use Searchability for your organization

If your organization has a large collection of records that people need to search, Searchability can run on your own domain with your own branding. Contact the developer to discuss your use case.