The REMND project (Robust Extraction of Metaphors in Novel Data) is developing and validating an automated system for recognizing and understanding metaphors in cross-cultural communication. Our objective is to identify and analyze metaphors in naturally occurring text and to apply the resulting capability in practical, analytic case studies. If successful, the REMND project will deliver substantially more advanced automated language processing capabilities suitable for applications in cross-cultural contexts.
For more information, please contact:
Prof. Tomek Strzalkowski, Principal Investigator
University at Albany, SUNY
ILS Room 262B, Social Science Building
Albany, NY 12222.
Email: tomek [at] albany [dot] edu
Phone: 518-442-2608; Fax: 518-442-2606
The REMND project will (1) automatically find metaphorical expressions and systematically interpret their semantics in four languages: American English, Mexican Spanish, Iranian Farsi and Russian; and (2) apply these interpretations to discover sources of possible underlying agreement and disagreement in specific cases of intercultural interaction. Thus we view the REMND project as a concerted attack on the problem of identifying and interpreting the obscure, unspoken meaning in text: the next logical and far-reaching step in the continuing progress toward more capable language processing in an increasingly multicultural world.
REMND is a multi-step process. It starts with the efficient capture of relevant text data from licensed news sites and through web harvesting techniques, yielding a high-quality data set with sufficient instances of the target concepts for analysis. From this data, the Metaphor Classification Module (MCM) identifies passages that employ linguistic metaphors based on their association with multiple subject domains, the proto-Source Domains. Conceptual metaphors are then discovered from among these by the Source Domain Identifier (SDI), while the Source-to-Target Mapping (STM) module establishes the semantic analogy between the Target and Source domains. Finally, the affect and force of the metaphor, as it is used in each context, are calculated. Force is a concept we add to the affect calculation in order to measure the strength of the impact the metaphor can be expected to have on the reader.
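The first two stages of this process can be illustrated with a toy sketch. It is not the REMND implementation: the domain lexicon, the class and function names, and the rule "flag a passage whose words touch two or more domains" are all assumptions made for illustration only. The STM mapping and the affect and force calculations are omitted, because the text above does not describe how they are computed.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Hypothetical toy lexicon mapping subject domains to words; not REMND data.
DOMAIN_WORDS = {
    "disease": {"cancer", "epidemic", "cure", "infect"},
    "war": {"battle", "attack", "defend", "fight"},
    "government": {"bureaucracy", "tax", "senate", "corruption"},
}


@dataclass
class MetaphorCandidate:
    passage: str
    domains: Set[str]             # proto-source domains found in the passage
    target: Optional[str] = None
    source: Optional[str] = None


def tokenize(passage: str) -> List[str]:
    """Lowercase the passage and strip surrounding punctuation from each word."""
    return [w.strip(".,;:!?\"'").lower() for w in passage.split()]


def classify(passage: str) -> Optional[MetaphorCandidate]:
    """Stand-in for the MCM: keep passages associated with 2+ domains."""
    words = set(tokenize(passage))
    domains = {d for d, vocab in DOMAIN_WORDS.items() if words & vocab}
    if len(domains) >= 2:
        return MetaphorCandidate(passage, domains)
    return None


def identify_source(cand: MetaphorCandidate, target: str) -> Optional[MetaphorCandidate]:
    """Stand-in for the SDI: pick a non-target domain as the source domain."""
    if target not in cand.domains:
        return None
    others = sorted(cand.domains - {target})
    cand.target = target
    cand.source = others[0]  # assumption: first domain alphabetically wins
    return cand


def run_pipeline(passages: List[str], target: str) -> List[MetaphorCandidate]:
    """Chain the two stand-in stages over a list of passages."""
    results = []
    for passage in passages:
        cand = classify(passage)
        if cand is None:
            continue
        cand = identify_source(cand, target)
        if cand is not None:
            results.append(cand)
    return results


if __name__ == "__main__":
    sample = [
        "Corruption is a cancer on the senate.",
        "The senate passed a tax bill.",
    ]
    for c in run_pipeline(sample, target="government"):
        print(c.target, "<-", c.source, "|", c.passage)
```

In this sketch, only the first sample passage is kept, because it mixes words from the hypothetical "government" and "disease" domains; the second touches a single domain and is treated as literal.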
This project is funded by the Intelligence Advanced Research Projects Activity (IARPA) as part of the Metaphor program.
As part of our research on the Metaphor project, we developed extensive lexicons in various languages.
One resource is the expanded MRC Psycholinguistic Database. Using a custom expansion method, we extended the existing MRC Psycholinguistic Database to cover over 120K English words. Our expanded MRC (MRC+) contains imageability and concreteness ratings for each of these words. In addition, we developed corresponding MRC+ lexicons in Spanish, Russian and Farsi, which also contain imageability and concreteness ratings for hundreds of thousands of words.
Another resource is the ANEW lexicon, which we used for word valence ratings. Our version of this lexicon (ANEW+) has likewise been expanded from the original human-rated lexicon. Corresponding expanded lexicons were also developed for Spanish, Russian and Farsi.
We followed a strict validation protocol to evaluate the validity of our expansions. The details of our validation method and the resulting findings are in this paper.
If you would like access to these resources for your research, please contact Prof. Tomek Strzalkowski.
Contact details are at the top of this page.