The Qatar Computing Research Institute (QCRI) - MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) research collaboration is a medium for joint knowledge creation, transfer, and exchange of expertise between QCRI and MIT CSAIL scientists. Scientists from both organizations are undertaking a variety of core computer science research projects, including database management, Arabic language technology, new paradigms for social computing, and data visualization, with the goal of developing innovative solutions that can have a broad and meaningful impact. The agreement also offers CSAIL researchers and students exposure to the unique challenges of the Gulf region. Scientists at QCRI benefit from the expertise of MIT's eminent faculty and researchers through joint research projects that will enable QCRI to realize its vision of becoming a premier center of computing research regionally and internationally.
The goal of this project was to conduct applied and core computer science research and to build innovative technologies that can be used by decision-makers, NGOs, affected communities, and scholars to improve the effectiveness of strategies such as preparedness, mitigation, and response during humanitarian crises and emergencies. The core of this project focused on developing a multimodal data processing system for understanding disaster scenes and situations from social media.
Muhammad Imran, QCRI
Ferda Ofli, QCRI
Antonio Torralba, MIT CSAIL
This project focused on the development of a lifestyle recommendation system ultimately intended to reduce the risk of obesity and type 2 diabetes. The project team explored the use of reinforcement learning with a new representation of healthy lifestyle and behavioral change, initially focused on recommending activity patterns that maximize the user's quality of sleep. These recommendations were then to be used to create both a new model for behavioral change (to be incorporated into a health coaching system providing just-in-time recommendations to improve the user's sleep quality) and a new analytics system to support coaching by healthcare professionals.
The main focus of this project was to discover causal relationships in (multivariate) sequences of states (e.g., in health data) and to uncover complex dependency structures in high-dimensional time series encoded as sequences of states. Using optimal transport methodology, the project team addressed, in particular, the following challenges: machine learning techniques to extract sequences of states from time-series data, causality analysis of state-sequence data, explanatory models for state-sequence data, supervised learning methods to predict categorical or continuous outputs from state-sequence inputs, unsupervised learning methods for state-sequence data, and factor analysis for state-sequence data.
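To make the notion of a "sequence of states" concrete, the following is a minimal sketch, not the project's actual pipeline: it discretizes a time series into a state sequence by quantile binning and then estimates the empirical state-transition matrix, a first step toward analyzing dependency structure in such sequences. All function names and the example data are illustrative.

```python
# Minimal sketch: encode a time series as a sequence of discrete states,
# then estimate the empirical state-transition matrix.

def to_states(series, n_states=3):
    """Assign each sample to a state based on quantile thresholds."""
    ranked = sorted(series)
    # Quantile cut points splitting the sorted values into n_states bins.
    cuts = [ranked[int(len(ranked) * k / n_states)] for k in range(1, n_states)]
    return [sum(x >= c for c in cuts) for x in series]

def transition_matrix(states, n_states=3):
    """Row-normalized counts of state i -> state j transitions."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    return [[c / max(sum(row), 1) for c in row] for row in counts]

series = [0.1, 0.2, 0.9, 0.8, 0.1, 0.5, 0.6, 0.9, 0.2, 0.4]
states = to_states(series)          # e.g., low/medium/high activity states
P = transition_matrix(states)       # each row sums to 1
```

The resulting transition matrix captures only first-order dependencies; the project's methods go further, toward causal and factor analyses of such sequences.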
This project focused on developing accurate map-making techniques using crowd-sourced methods to overcome the challenges of creating and maintaining street maps, especially in a rapidly developing environment such as Doha, Qatar. The work leveraged data primarily from mobile phones and investigated limitations due to sensor noise, outages, and data sparsity.
Information technologies at the start of this project could inform each of us of the shortest paths from origins to destinations, but they offered no incentives or alternatives to manage that information for collective benefit. To obtain such benefits, one needs not only good estimates of how traffic forms but also targeted strategies to divert enough vehicles from the most heavily used roads in a feasible way.
Moreover, reaching the target vehicle reduction is not trivial: it requires individual sacrifices, such as some drivers taking alternative routes, shifting departure times, or even changing modes of transportation. The opportunity is that during large events (carnivals, festivals, sports events, etc.) the traffic inconveniences in large cities are unusually high yet temporary, and the entire population may be more willing to adopt collective recommendations for the social good.
This project fell into three categories: 1) the use of machine learning and other advanced analytical techniques to discover new information related to on-field performance, 2) the development and application of novel techniques that provide new ways of viewing sporting events, and 3) the provision of a system for content-adaptive video retargeting.
This project focused on how data management can be used to facilitate social computing. The Humanitarian Technologies research thrust sought to establish key technologies required to facilitate disaster management and humanitarian relief activities based on social media. These technologies leveraged current social networks and primarily focused on data consumption, generation, and integration.
Lalana Kagal, CSAIL
Carlos Castillo, QCRI
Patrick Meier, QCRI
The research challenge addressed was that of securing computing infrastructure against a broad class of cyberattacks. The project's objective was to develop new techniques that can remove many of the vulnerabilities that attackers exploit and that can predict and intercept new (zero-day) attacks that exploit previously unknown vulnerabilities. These objectives were pursued through a number of sub-projects falling into three categories: systems that are much more difficult to penetrate, systems that can work through penetrations, and systems that can recover quickly.
Srini Devadas, CSAIL
Adam Chlipala, CSAIL
Frans Kaashoek, CSAIL
Shafi Goldwasser, CSAIL
Howard Shrobe, CSAIL
Martin Rinard, CSAIL
Armando Solar-Lezama, CSAIL
Vinod Vaikuntanathan, CSAIL
Nickolai Zeldovich, CSAIL
Dimitrios Serpanos, QCRI
This project focused on a new study type for understanding the basis of complex genetic traits: the functional genome-wide association study (fGWAS). Most experimental designs, relying solely on linear models and genetic information to predict phenotypes, fail to recover the full predictability of a trait. By combining extensive, well-controlled cellular data with novel integrative computational models, this team sought to recover a substantial portion of the missing heritability of multiple complex traits. With these contributions, the team then worked to capture the broad-sense heritability that is missed by linear models relying solely on genotype and markers acting individually.
This new study type focused on making advances along two fronts by measuring and integrating fine-grained cellular measurements into genotype-phenotype models:
(1) Integrative models that use cellular measurements to prioritize particular genetic variants and interactions, leading to more effective multiple hypothesis controls and better predictions
(2) Cellular measurements, interpreted as biomarkers, used directly to improve prediction of phenotypes
How is memory implemented in the human brain?
This project focused on the development of machine learning classification algorithms for human neuroscience data with the goal of gaining knowledge of the computations and brain regions associated with visual long-term memory.
Aude Oliva, CSAIL
Polina Golland, CSAIL
Halima Bensmail, QCRI
Othmane Bouhali, QCRI
The goal of the project was to design a high-throughput and low-power FPGA implementation of the newly proposed sparse FFT algorithm. To guide the implementation effort, the team chose an input data size of one million (2^20) points, with a maximum of 500 nonzero frequency coefficients. The team completed an initial implementation of the SFFT Core, which includes a 4096-point dense-FFT module, a top-511 element selector module, a Voting module, and a Value-compute module. The team improved the design's performance and resource usage by modifying its pipeline. The researchers also completed extensive debugging of the design using customized test benches. Given the filtered input data slices, the project's FPGA implementation now produces the value-index pairs of the 500 most significant frequency components.
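As a software reference for what the hardware computes, the sketch below finds the value-index pairs of the k most significant frequency components of a signal. It deliberately uses a naive O(n^2) DFT on a small input for clarity; the FPGA design instead runs a sparse-FFT pipeline built around a 4096-point dense-FFT stage. The signal and function names are illustrative, not taken from the project.

```python
# Dense reference sketch of the SFFT core's output: the (index, value)
# pairs of the k largest-magnitude frequency bins of a signal.
import cmath

def dft(x):
    """Naive O(n^2) discrete Fourier transform (for small n only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def top_k_components(x, k):
    """Return (index, value) pairs of the k largest-magnitude bins."""
    spectrum = dft(x)
    order = sorted(range(len(spectrum)), key=lambda i: -abs(spectrum[i]))
    return [(i, spectrum[i]) for i in order[:k]]

# A 16-sample signal with exactly two active frequencies (bins 3 and 5),
# i.e., a 2-sparse spectrum -- the regime where a sparse FFT pays off.
n = 16
x = [cmath.exp(2j * cmath.pi * 3 * t / n)
     + 0.5 * cmath.exp(2j * cmath.pi * 5 * t / n) for t in range(n)]
pairs = top_k_components(x, 2)
```

A sparse FFT reaches the same value-index pairs without computing the full spectrum, which is what makes the 2^20-point, 500-coefficient target tractable in hardware.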
Raymond Filippi, QCRI
MAQSA is a system for social analytics on news. MAQSA provides an interactive topic-centric dashboard that summarizes news articles and social activity (e.g., comments and tweets) around them. MAQSA helps editors and publishers in newsrooms understand user engagement and audience sentiment evolution on various topics of interest. It also helps news consumers explore public reaction on articles relevant to a topic and refine their exploration via related entities, topics, articles and tweets. Given a topic, e.g., “Gulf Oil Spill,” or “The Arab Spring”, MAQSA combines three key dimensions: time, geographic location, and topic to generate a detailed activity dashboard around relevant articles. The dashboard contains an annotated comment timeline and a social graph of comments. It utilizes commenters’ locations to build maps of comment sentiment and topics by region of the world. Finally, to facilitate exploration, MAQSA provides listings of related entities, articles, and tweets. It algorithmically processes large collections of articles and tweets, and enables the dynamic specification of topics and dates for exploration.
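The regional sentiment maps rest on a simple roll-up of per-comment sentiment by commenter location. The following is an illustrative sketch of that aggregation step, not MAQSA's implementation; the record field names are hypothetical.

```python
# Illustrative sketch: average per-comment sentiment scores by commenter
# region, the kind of roll-up behind a map of comment sentiment by region.
from collections import defaultdict

def sentiment_by_region(comments):
    """Mean sentiment score per region from {'region', 'sentiment'} records."""
    totals = defaultdict(lambda: [0.0, 0])  # region -> [score sum, count]
    for c in comments:
        acc = totals[c["region"]]
        acc[0] += c["sentiment"]
        acc[1] += 1
    return {region: s / n for region, (s, n) in totals.items()}

comments = [
    {"region": "Gulf", "sentiment": 0.8},
    {"region": "Gulf", "sentiment": 0.4},
    {"region": "Europe", "sentiment": -0.2},
]
result = sentiment_by_region(comments)
```

The same grouping pattern extends to topics and time windows, yielding the per-topic, per-region timelines shown on the dashboard.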
The major goal of the project was to understand food habits from social media images. This included: training machine learning models for image auto-tagging and content extraction from noisy hashtags; predicting population-level health statistics in the US and Qatar; monitoring temporal and regional trends in food consumption and their implications; and learning models that enable in-depth analysis of food images using large-scale cooking recipe data collected from the web.
This project focused on exploiting big data for image and video manipulation. The work addressed fundamental and challenging computer graphics problems with applications to various impactful domains, including computational photography, multimedia, and video content post-production.
This project's objective was to answer the question: how can users get the full benefits of multi-user software even when their friends and colleagues use different software vendors, platforms, and service providers? More technically, it aimed to design, and aid in the standardization of, protocols that allow for the decentralization of social software, thus giving users and vendors a free market for innovation. It also aimed to develop software infrastructure supporting this vision, such as servers for data storage and retrieval, libraries and development tools for application developers, and web applications for end users. The approach was iterative, building up from small working systems and improving scaling, security, and user experience while testing and demonstrating new solutions.
We aimed to assess the current tactics used by Qataris and other GCC nationals to express identity through the use of virtual identity technologies (e.g., social media profiles and avatars), which are not necessarily designed with their values in mind. This investigation sought multiple results: (1) articulation of base principles and best practices for developing technologies that empower Qataris to enact traditional values and cultural norms, (2) new computational techniques for understanding user values and practices in virtual identity systems, and (3) a novel application illustrating the efficacy of our discovered design principles.
This project dealt with database management. Specifically, the project team focused on investigating a system to support data scientists, called Data Civilizer, which addressed a number of problems in discovering, integrating, and cleaning data sets. A particular focus was on methods that combine machine learning, program synthesis, and human-in-the-loop techniques to advance the state of the art in these important areas.
This project aimed to develop key speech and language processing technology enabling users to search for verified facts and claims, in both written and video repositories of English and Arabic, using questions posed in natural and spoken language. The research addressed four essential cross-cutting topic areas to achieve this objective. First, we investigated methods that enable rich annotation of Arabic multimedia content. Second, we investigated language processing methods to analyze open-ended user-generated content, e.g., dialogs, and perform veracity assessment and inference. Third, we explored speech and language methods for processing low-resource Arabic dialects. Finally, we explored interpretation and debugging techniques to improve machine translation between English and Arabic.
At the initiation of this project, shared computing platforms, from small clusters to large datacenters, suffered from low utilization, wasting billions of dollars in energy and infrastructure every year. This low utilization stems from a disconnect between layers of the hardware and software stack. The goal of this project was to investigate and develop integrated intra- and inter-node resource management techniques that provide both near-peak utilization and guaranteed high performance in shared environments.
To this end, this project consisted of three main thrusts: