[Pushdown]
Speed Up Query Execution By Pushdown in AsterixDB

Area: Big Data Management, Query Optimization

Advisor: Prof. Michael Carey (UCI)

AsterixDB is a scalable big data management system. My work addresses a limitation of its current implementation: operator pushdown is incomplete. Because passing data between operators incurs expensive memory copies, my work aims to push as much of the query execution logic as possible down to the data-scan stage.

  • Explore three query patterns on which pushdown can affect performance
  • Design comprehensive benchmarks to measure the performance impact
[poster] [technical report]

[cache-research]
Efficient Cache Management for Replicated Web Search Engine

Area: Information Retrieval, Theory

Advisor: Prof. Bo Tang

The posting list cache plays an important role in reducing search time in modern search engines. However, most search engines adopt a simple scheme called uniform caching, which caches the same content on every server. This scheme does not exploit the variation among queries, so it wastes memory by storing the same content redundantly on multiple servers. To address this limitation, we aim to develop a new caching scheme that diversifies the cache contents across servers by training on past query logs.

  • Provide a theoretical inapproximability proof for the diversified caching problem
  • Implement a framework with a suite of techniques and heuristics for diversified caching
  • [In progress] Enhance the cache admission process with semantic information about words, such as word collocations: words that are likely to appear together should be cached on the same server