UPDATE: In June 2006 I began an applied researcher position in Microsoft's Technology Care and Safety Group in Redmond, WA. I have been working extensively on techniques for protecting people from unwanted communications (e.g., spam) and detecting compromised machines (e.g., bot infections). This is a challenging and extremely rewarding area to work in, with great colleagues and immediate large-scale impact. Interestingly, I have noticed a surprising degree of similarity to many of the technical challenges encountered in my graduate work. I am also continuing my graduate research and plan to obtain my Ph.D. in 2007. Some day I will update my homepage, but for now I am posting content from my UIUC site below. The contact information is no longer valid, but you can reach me via email at the same address.
Currently I am a Ph.D. Candidate in the Department of Computer Science at the University of Illinois at Urbana-Champaign. I am researching data management techniques in the Database and Information Systems group, advised by Professor AnHai Doan. Previously I studied at the University of Washington, where I received a B.S. in Computer Science and another in Applied and Computational Mathematical Science. I expect to complete my graduate studies in 2007 and hope to continue researching in the beautiful Northwest.

Research Interests
I am interested in applying Database, Artificial Intelligence, and Web technologies to manage and query large volumes of both structured and unstructured data. In particular: data integration, machine learning, information extraction, knowledge-based systems, user interaction, and collaborative systems. To date my work has followed several (overlapping) themes:

Answering Structured Queries over Unstructured Data: Developed a system for answering structured queries over unstructured Web data. The idea is to immediately provide a best-effort answer with minimal start-up cost, then allow the user to answer simple questions to improve their results. One novel aspect is the incorporation of an extraction operator into relational algebra and the expansion of existing operators to accommodate the resulting uncertainty. The system draws on techniques from information extraction, database management systems, similarity joins, and data uncertainty. This work is under submission and I am currently pursuing the related topic of managing uncertainty in data integration systems.

Maintaining Semantic Mappings over Autonomous Sources: Developed the MAVERIC system to automatically detect when mappings break due to changes at underlying sources. Evaluated over 114 Web sources across 6 domains, significantly outperforming previous systems. Leveraged anomaly/outlier detection techniques, ensemble learning, Winnow, and training set expansion. This work was presented at VLDB 2005 and I am currently pursuing additional topics for information system maintenance (e.g., robust design).

Collaboratively Building Information Systems: Developed the MOBS approach to building information systems. Rather than relying solely on a system administrator, leveraged users to alleviate costs of several key integration tasks (source discovery, information extraction, one-to-one and complex schema matching, and ontology matching) and deployed two integration systems (over 10 online bookstores and over 348 database publications). Tools included Bayesian learning, statistical inference, matching tools, Web crawlers, and IR-style text classification. This work was presented at ICDE 2005, KCVC 2005, DIVO 2004, IIWeb 2003, and WebDB 2003. In the future I plan to refine the techniques employed here to both improve the efficiency of collaboration (e.g., learn quickly by observing user actions without asking explicit questions) and solve additional key bottlenecks for information systems (e.g., maintenance).

Presentations, Database & Information Systems @ UIUC
Selected Publications
Professional Activities and Services