PhD Proposal by Jingfan Meng

Wednesday

November 20, 2024

10:00AM - 12:00PM

Location

Klaus 2108

Title: Efficient Locality Sensitive Hashing Approaches, Primitives, and Applications

Jingfan Meng
Ph.D. Student

School of Computer Science, Colleague of Computing

Georgia Tech

Date: Wednesday, November 20th, 2024

Time: 10:00 AM -- 11:00 AM EST

Location: Klaus 2108

Microsoft Teams Meeting: Link

Committee:

Dr. Jun (Jim) Xu (Ph.D. advisor), School of Computer Science, Georgia Tech

Dr. Kexin Rong, School of Computer Science, Georgia Tech

Dr. Joy Arulraj, School of Computer Science, Georgia Tech

Dr. Santosh Vempala, School of Computer Science, Georgia Tech

Dr. Mitsunori Ogihara, Department of Computer Science, University of Miami

Abstract:

Approximate nearest neighbor search (ANNS) is a fundamental algorithmic problem with numerous applications in many areas of computer science, especially databases and machine learning. It is an intriguing question how to build index data structures in support for efficient ANNS under various useful distance metrics. Locality sensitive hashing (LSH) is a classical solution approach to ANNS that offers a small index size and reduced costs for index construction and data changes. With numerous LSH approaches in the literature, however, most research has been limited to the collision probability of LSH functions, and many important aspects, such as the fundamental time complexities of evaluating these randomized LSH functions, have been overlooked for long.

My research spans both the design of new LSH approaches for ANNS under hard-to-query distance metrics and the design of new algorithms that speed up the evaluation of these LSH functions. In this proposal, I will cover novel LSH-based ANNS solutions to two such distance metrics -- point-to-subspace metric in L1 (P2SL1), and generalized point-to-subspace metric in L2 (Mahalanobis distance) -- both of which are useful but currently lack efficient solutions. I will also cover two algorithmic techniques -- efficient range summation (ERS) and fast Gaussian orthogonal ensemble quadratic form (FGoeQF) -- that help build up LSH primitives that are faster than existing versions by orders of magnitudes. Finally, I will cover candidate-based density estimation (CanDE), which is an LSH-based application for getting important data analytics in the neighborhood of the query, using our novel estimators that approximate the actual on-table collision rate for the first time ever.

Event Category

Other/Miscellaneous

Invited Audience

Public

Dev - Graduate Education

Office of Graduate and Postdoctoral Education

Search

Wednesday

November 20, 2024

Accessibility Information

Office of Graduate Education

Georgia Institute of Technology