Extension for locality-sensitive hashing (LSH)
Maintainer(s): yoonspark, ericmanning
Installing and Loading
INSTALL lsh FROM community;
LOAD lsh;
Example
-- Create toy data
CREATE TEMPORARY TABLE temp_names AS
SELECT * FROM (
VALUES
('Alice Johnson'),
('Robert Smith'),
(NULL),
('Charlotte Brown')
) AS t(name);
-- Apply MinHash
SELECT lsh_min(name, 2, 3, 2, 123) AS hash FROM temp_names;
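To make the idea behind the query above concrete, here is a minimal Python sketch of MinHash banding. It is not the extension's implementation. It assumes the four numeric arguments are read as n-gram width, number of bands, rows per band, and seed, which this page does not confirm. The helper names (`shingles`, `hash64`, `minhash_bands`), the use of BLAKE2b as the hash family, and returning `None` for a NULL input are all illustrative choices, so the values will not match what `lsh_min` returns.

```python
import hashlib


def shingles(text, n):
    """Return the set of character n-grams of `text`."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def hash64(value, salt):
    """Hypothetical 64-bit hash; the salt selects one member of a hash family."""
    payload = f"{salt}:{value}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(payload, digest_size=8).digest(), "big")


def minhash_bands(text, ngram_width, band_count, band_size, seed):
    """Return one 64-bit hash per band of the MinHash signature of `text`."""
    if text is None:
        return None  # assumption: NULL in, NULL out
    grams = shingles(text, ngram_width)
    if not grams:
        return None  # input shorter than one n-gram
    # Signature: for each hash function, the minimum hash over all shingles.
    signature = [
        min(hash64(g, seed + i) for g in grams)
        for i in range(band_count * band_size)
    ]
    # Banding: hash each group of `band_size` consecutive signature rows.
    bands = []
    for b in range(band_count):
        rows = signature[b * band_size:(b + 1) * band_size]
        bands.append(hash64(",".join(map(str, rows)), b))
    return bands


for name in ["Alice Johnson", "Robert Smith", None, "Charlotte Brown"]:
    print(name, minhash_bands(name, 2, 3, 2, 123))
```

Two strings become candidate matches when they share a hash in at least one band position. More bands make a match more likely, while more rows per band make it less likely.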
About lsh
For more information regarding usage, see the documentation.
Added Functions
| function_name | function_type | description | comment | examples |
|---|---|---|---|---|
| lsh_min | scalar | Computes band hashes for each input string (or list of existing shingles) based on its MinHash signature | Produces list of 64-bit band hashes | NULL |
| lsh_min32 | scalar | Computes band hashes for each input string (or list of existing shingles) based on its MinHash signature | Reduces each band hash to 32 bits | NULL |
| lsh_euclidean | scalar | Computes band hashes for each input point based on its Euclidean LSH signature | Produces list of 64-bit band hashes | NULL |
| lsh_euclidean32 | scalar | Computes band hashes for each input point based on its Euclidean LSH signature | Reduces each band hash to 32 bits | NULL |
| lsh_jaccard | scalar | Computes Jaccard similarity for each input string pair | Accepts ngram argument, unlike core Jaccard function | NULL |
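As a rough illustration of what an n-gram based Jaccard similarity measures, the following Python sketch compares the character n-gram sets of two strings. It is only a conceptual aid: the function names (`ngrams`, `ngram_jaccard`), the argument order, and the handling of strings shorter than one n-gram are assumptions, not the behavior or signature of `lsh_jaccard`.

```python
def ngrams(text, n):
    """Return the set of character n-grams of `text`."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_jaccard(a, b, n):
    """Jaccard similarity |A & B| / |A | B| over character n-gram sets."""
    set_a, set_b = ngrams(a, n), ngrams(b, n)
    union = set_a | set_b
    if not union:
        return 0.0  # assumption: no n-grams on either side counts as no overlap
    return len(set_a & set_b) / len(union)


# "night" -> {ni, ig, gh, ht}, "nacht" -> {na, ac, ch, ht}
# One shared bigram out of seven distinct bigrams.
print(ngram_jaccard("night", "nacht", 2))
```

Larger n-gram widths make the comparison stricter, because longer substrings must match exactly to count toward the intersection.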