The Qatar Computing Research Institute (QCRI) - MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) research collaboration is a medium for the joint creation and transfer of knowledge and the exchange of expertise between QCRI and MIT CSAIL scientists. Scientists from both organizations are undertaking a variety of core computer science research projects, including database management, Arabic language technology, new paradigms for social computing, and data visualization, with the goal of developing innovative solutions that can have a broad and meaningful impact. The agreement also offers CSAIL researchers and students exposure to the unique challenges in the Gulf region. Scientists at QCRI are benefiting from the expertise of MIT’s eminent faculty and researchers through joint research projects that will enable QCRI to realize its vision of becoming a premier center of computing research regionally and internationally.
Current Projects
Project activity in the program concluded on June 30, 2020. Information regarding all completed projects can be found by selecting Past Projects.
Past Projects
The goal of this project was to conduct applied and core computer science research and to build innovative technologies that can be used by decision-makers, NGOs, affected communities, and scholars to improve the effectiveness of humanitarian strategies such as preparedness, mitigation, and response during humanitarian crises and emergencies. The core of this project focused on developing a multimodal data processing system for understanding disaster scenes and situations from social media.
Principal Investigators
Muhammad Imran, QCRI
Ferda Ofli, QCRI
Antonio Torralba, MIT CSAIL
This project focused on the development of a lifestyle recommendation system ultimately intended to reduce the risk of obesity and type 2 diabetes. The project team explored the use of reinforcement learning with a new representation of healthy lifestyle and behavioral change, initially focused on recommending activity patterns that maximize the user's quality of sleep. These recommendations were then to be used to create both a new model for behavioral change (to be incorporated into a health coaching system that provides just-in-time recommendations to increase the user's quality of sleep) and a new analytics system to support coaching by healthcare professionals.
Principal Investigators
Peter Szolovits, MIT CSAIL
Luis Fernandez-Luque, QCRI
Raghvendra Mall, QCRI
The main focus of this project was to discover causal relationships in (multivariate) sequences of states (e.g., in health data) and to uncover the complex dependency structures in high-dimensional time series encoded as sequences of states. Using optimal transport methodology, the project team focused in particular on the following challenges: machine learning techniques to extract sequences of states from time-series data, causality analysis from sequence-of-states data, explanatory models for sequence-of-states data, supervised learning methods to predict categorical or continuous output from sequence-of-states input data, unsupervised learning methods for sequence-of-states data, and factor analysis for sequence-of-states data.
Principal Investigators
Abdelkader Baggag, QCRI
Justin Solomon, MIT CSAIL
This project focused on generating facial expressions in video from a given audio signal.
Principal Investigators
Stephan Vogel, QCRI
Wojciech Matusik, CSAIL
This project focused on developing accurate map-making techniques using crowd-sourced methods to overcome challenges related to creating and maintaining street maps, especially in a rapidly developing environment such as Doha, Qatar, leveraging data primarily from mobile phones and investigating current limitations due to sensor noise, outages and data sparsity.
Principal Investigators
Mohammad Alizadeh, MIT CSAIL
Hari Balakrishnan, MIT CSAIL
Sanjay Chawla, QCRI
Sam Madden, MIT CSAIL
At the start of this project, information technologies could tell each of us the best alternatives for the shortest path from origin to destination, but they did not offer incentives or alternatives that manage this information efficiently for collective benefit. Obtaining such benefits requires not only good estimates of how traffic forms but also targeted strategies to remove enough vehicles from the best possible roads in a feasible way.
Moreover, reaching the target vehicle reduction is not trivial: it requires individual sacrifices, such as some drivers taking alternative routes, shifting departure times, or even changing modes of transportation. The opportunity is that during large events (carnivals, festivals, sports events, etc.) traffic inconvenience in large cities is unusually high yet temporary, and the population may be more willing to adopt collective recommendations for social good.
This project focused on understanding the impact of large-scale events and city growth on city traffic and people’s commuting, and subsequently on proposing reasonable and feasible travel demand management strategies to mitigate future traffic congestion. The project took a fast-growing city, Doha, Qatar, as its testbed. Traffic in Doha is notoriously bad and the population is growing very fast. At the time of the project, Doha was preparing to host the FIFA World Cup in 2022, which was expected to attract a great number of tourists and increase the pressure on the road network. To meet these challenges, the project used big data resources to understand the impact of the World Cup and to assist policy makers with more reasonable planning strategies.
The project estimated the travel demand of the local population using Bluetooth data and census data. The demand was then assigned to the road network, and the travel time of each trip was estimated.
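As a rough illustration of what assigning demand to a road network can look like, the sketch below routes each origin-destination flow along its fastest path (a simplification usually called all-or-nothing assignment) and accumulates the trips on each road link. The network, the demand figures, and every name in the code are hypothetical; the project description does not say which assignment method or data format the team used.

```python
import heapq

# Hypothetical road network: node -> {neighbor: travel time in minutes}.
roads = {
    "A": {"B": 4, "C": 2},
    "B": {"D": 5},
    "C": {"B": 1, "D": 8},
    "D": {},
}

# Hypothetical origin-destination demand, in trips.
demand = {("A", "D"): 120, ("C", "D"): 30}


def shortest_path(graph, src, dst):
    """Dijkstra's algorithm: return (travel_time, nodes on the fastest path)."""
    queue = [(0, src, [src])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, minutes in graph[node].items():
            if nxt not in seen:
                heapq.heappush(queue, (cost + minutes, nxt, path + [nxt]))
    return float("inf"), []


link_flow = {}   # (from, to) -> trips loaded onto that road link
trip_time = {}   # (origin, destination) -> estimated travel time

for (origin, dest), trips in demand.items():
    minutes, path = shortest_path(roads, origin, dest)
    trip_time[(origin, dest)] = minutes
    # Load all trips of this pair onto every link of its fastest path.
    for u, v in zip(path, path[1:]):
        link_flow[(u, v)] = link_flow.get((u, v), 0) + trips

print(trip_time)
print(link_flow)
```

In this toy network both flows share the link from B to D, which is the kind of concentration a demand management strategy would try to relieve. A realistic model would also make link travel times depend on the assigned flow, which this sketch omits.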
Principal Investigators
Sofiane Abbar (Social Computing, QCRI)
Marta González (HumNet Lab, MIT)
This project's work fell into three categories: 1) using machine learning and other advanced analytical techniques to discover new information related to on-field performance, 2) developing and applying novel techniques that provide new ways of viewing sporting events, and 3) providing a system for content-adaptive video retargeting.
Principal Investigators
John Guttag, CSAIL
Fredo Durand, CSAIL
Wojciech Matusik, CSAIL
Mohamed Hefeeda, QCRI
This project focused on how data management can be used to facilitate social computing. The Humanitarian Technologies research thrust sought to establish key technologies required to facilitate disaster management and humanitarian relief activities based on social media. These technologies leveraged current social networks and primarily focused on data consumption, generation, and integration.
Principal Investigators
Lalana Kagal, CSAIL
Carlos Castillo, QCRI
Patrick Meier, QCRI
The research challenge addressed was that of securing computing infrastructure against a broad class of cyberattacks. The project's objective was to develop new techniques that can remove many of the vulnerabilities that attackers exploit and that can predict and intercept new (zero-day) attacks that exploit previously unknown vulnerabilities. These objectives were pursued through a number of sub-projects that fall into three categories: Systems that are much more difficult to penetrate; Systems that can work through penetrations; and Systems that can recover quickly.
Principal Investigators
Srini Devadas, CSAIL
Adam Chlipala, CSAIL
Frans Kaashoek, CSAIL
Shafi Goldwasser, CSAIL
Howard Shrobe, CSAIL
Martin Rinard, CSAIL
Armando Solar-Lezama, CSAIL
Vinod Vaikuntanathan, CSAIL
Nickolai Zeldovich, CSAIL
Dimitrios Serpanos, QCRI
This project focused on a new study type for understanding the basis of complex genetic traits: the functional genome-wide association study (fGWAS). Most experimental designs, relying solely on linear models and genetic information to predict phenotypes, fail to recover the full range of predictability of a trait. By combining extensive, well-controlled cellular data with novel integrative computational models, this team sought to find a large portion of the missing heritability of multiple complex traits. With these contributions, the team then worked to capture the broad-sense heritability that is missed by linear models that rely solely on genotype and on markers acting individually.
This new study type focused on making advances along two fronts by measuring and integrating fine-grained cellular measurements into genotype-phenotype models:
(1) Integrative models that use cellular measurements to prioritize particular genetic variants and interactions, leading to more effective multiple hypothesis controls and better predictions.
(2) Cellular measurements, interpreted as biomarkers, used directly to improve prediction of phenotypes.
Key milestones: (1) developing novel computational methods that link genotype to phenotype using functional information in functional genome-wide association studies, and (2) characterizing natural human genetic variation using new computational methods.
Principal Investigators
David Gifford, CSAIL
Tommi Jaakkola, CSAIL
Halima Bensmail, QCRI
Reda Rawi, QCRI
How is memory implemented in the human brain?
This project focused on the development of machine learning classification algorithms for human neuroscience data with the goal of gaining knowledge of the computations and brain regions associated with visual long-term memory.
Principal Investigators
Aude Oliva, CSAIL
Polina Golland, CSAIL
Halima Bensmail, QCRI
Othmane Bouhali, QCRI
The goal of the project was to design a high-throughput, low-power FPGA implementation of the newly proposed sparse FFT (SFFT) algorithm. To guide the implementation effort, the team chose an input size of about a million (2^20) points, with a maximum of 500 nonzero frequency coefficients. The team completed an initial implementation of the SFFT core, which includes a 4096-point dense FFT module, a top-511 element selector module, a voting module, and a value-compute module. The team improved the design's performance and resource usage by modifying its pipeline. The researchers also extensively debugged the design using customized test benches. Given the filtered input data slices, the project's FPGA implementation produces the value-index pairs of the 500 most significant frequency components.
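To give a feel for why a 4096-point dense FFT can be useful on a 2^20-point input, the sketch below shows one idea commonly used in sparse FFT algorithms: keeping every 256th time sample and running a small FFT folds frequency k into bucket k mod 4096. This is only an illustration under stated assumptions, not the team's FPGA design. The stride, the example spectrum, and all function names are hypothetical, and the filtering, voting, and value-compute stages mentioned above are not modeled.

```python
import cmath

N = 2 ** 20        # input size from the text: about a million points
B = 4096           # size of the dense FFT module named in the text
STRIDE = N // B    # 256; subsampling stride assumed for this sketch


def fft(a):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2])
    odd = fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


# Hypothetical sparse spectrum: frequency index -> coefficient.
spectrum = {5: 3.0, 70000: 2.0, 900001: 1.5}


def sample(n):
    """Time-domain sample x[n] of the length-N signal whose DFT is `spectrum`."""
    return sum(c * cmath.exp(2j * cmath.pi * ((k * n) % N) / N)
               for k, c in spectrum.items()) / N


# Keep every STRIDE-th sample, then run one small dense FFT.
buckets = fft([sample(m * STRIDE) for m in range(B)])

# Each frequency k lands in bucket k mod B, with its value scaled by 1/STRIDE.
occupied = sorted(j for j, v in enumerate(buckets) if abs(v) > 1e-9)
print(occupied)  # prints [5, 368, 2977]
```

Only 4096 of the 2^20 samples are ever touched, which is where the savings come from. The cost is ambiguity: many frequencies share each bucket, so additional steps are needed to recover the exact index and value of each significant coefficient.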
Principal Investigators
Arvind, CSAIL
Raymond Filippi, QCRI
MAQSA is a system for social analytics on news. MAQSA provides an interactive topic-centric dashboard that summarizes news articles and social activity (e.g., comments and tweets) around them. MAQSA helps editors and publishers in newsrooms understand user engagement and the evolution of audience sentiment on various topics of interest. It also helps news consumers explore public reaction to articles relevant to a topic and refine their exploration via related entities, topics, articles, and tweets. Given a topic, e.g., “Gulf Oil Spill” or “The Arab Spring”, MAQSA combines three key dimensions (time, geographic location, and topic) to generate a detailed activity dashboard around relevant articles. The dashboard contains an annotated comment timeline and a social graph of comments. It uses commenters’ locations to build maps of comment sentiment and topics by region of the world. Finally, to facilitate exploration, MAQSA provides listings of related entities, articles, and tweets. It algorithmically processes large collections of articles and tweets, and enables the dynamic specification of topics and dates for exploration. The MAQSA project was completed in spring 2012 and resulted in a patent and a conference paper.
Principal Investigators
Sam Madden, CSAIL
Jorge Quiane Ruiz, QCRI
Sihem Amer-Yahia, QCRI
The major goal of the project was to understand food habits from social media images. This included training machine learning models for image auto-tagging and content extraction from noisy hashtags; predicting population-level health statistics in the US and Qatar; monitoring temporal and regional trends in food consumption and their implications; and learning models that can achieve in-depth analysis of food images through the use of large-scale cooking recipe data collected from the web.
Principal Investigators
Ferda Ofli, QCRI
Antonio Torralba, MIT CSAIL
Ingmar Weber, QCRI
This research focused on developing motion magnification and comparison techniques for sports applications, as well as motion magnification techniques for laparoscopic surgery.
Principal Investigators
Fredo Durand, CSAIL
John Guttag, CSAIL
Mohamed Hefeeda, QCRI
This project focused on exploiting big data for image and video manipulation. Our work solves fundamental and challenging computer graphics problems with applications to various impactful domains, including computational photography, multimedia and video content post-production.
Principal Investigators
Mohamed Elgharib, QCRI
Wojciech Matusik, CSAIL
This project's objective was to answer the question: How can users get the full benefits of multi-user software even when their friends and colleagues use different software vendors, platforms, and service providers? More technically, it aimed to design and aid in the standardization of protocols that allow for the decentralization of social software, thus giving users and vendors a free market for innovation. It also aimed to develop software infrastructure that supports this vision, such as servers to support data storage and retrieval, libraries and development tools to support application developers, and web applications for use by end users. The approach was iterative, building up from small working systems and improving scaling, security, and user experience as new solutions were tested and demonstrated.
Principal Investigators
Tim Berners-Lee, CSAIL
We aimed to assess the current tactics used by Qataris and other GCC nationals to express identity through the use of virtual identity technologies (e.g., social media profiles and avatars), which are not necessarily designed with their values in mind. This investigation sought multiple results: (1) articulation of base principles and best practices for developing technologies that empower Qataris to enact traditional values and cultural norms, (2) new computational techniques for understanding user values and practices in virtual identity systems, and (3) a novel application illustrating the efficacy of our discovered design principles.
Principal Investigators
D. Fox Harrell, CSAIL
Haewoon Kwak, QCRI
This project dealt with database management. Specifically, the project team focused on investigating a system to support data scientists, called Data Civilizer, which helped with a number of problems around discovering, integrating, and cleaning data sets. A particular focus was on methods that combined machine learning, program synthesis, and human-in-the-loop techniques to advance the state of the art in these important areas.
Principal Investigators
Sam Madden, MIT CSAIL
Mourad Ouzzani, QCRI
Michael Stonebraker, MIT CSAIL
This project aimed to develop key speech and language processing technology enabling users to search for verified facts and claims, in both written and video repositories of English and Arabic, using questions posed in natural and spoken language. The research addressed four essential cross-cutting topic areas to achieve this objective. First, we investigated methods that enable rich annotation of Arabic multimedia content. Second, we investigated language processing methods to analyze open-ended user-generated content, e.g., dialogs, and perform veracity assessment and inference. Third, we explored speech and language methods for processing low-resource Arabic dialects. Finally, we explored interpretation and debugging techniques to improve machine translation between English and Arabic.
Principal Investigators
Ahmed Ali, QCRI
James Glass, MIT CSAIL
Preslav Nakov, QCRI
Stephan Vogel, QCRI
At the initiation of this project, shared computing platforms, from small clusters to large datacenters, suffered from low utilization, wasting billions of dollars in energy and infrastructure every year. Low utilization stemmed from a disconnect between layers of the hardware and software stack. The goal of this project was to investigate and develop integrated intra- and inter-node resource management techniques that provide both near-peak utilization and guaranteed high performance in shared environments.
To this end, this project consisted of three main thrusts:
- Elastic multicore systems, which combine recent hardware support for fast resource management with a novel software runtime to make hardware adaptation work for, not against, performance guarantees. Elastic multicores are focused on using different hardware resources (such as cores, caches, and power) to achieve a given performance target as efficiently as possible, and safely share resources among guaranteed-performance and best-effort applications.
- Novel solutions to enable collaborative multi-tenancy, where resource-intensive workloads are co-scheduled and placed using fine-grained, automatically-collected resource usage profiles, considering aspects such as cache and memory bandwidth sharing.
- A shared system prototype that enables QF computing users to aggressively colocate applications on shared many-core nodes. The system was designed to guarantee the latency requirements of performance-critical tasks (such as Al Jazeera video processing) while achieving high system utilization through intelligent placement of batch tasks such as HPC and data analytics.
Principal Investigators
Xiaosong Ma, QCRI
Daniel Sanchez, CSAIL