in form at ion

 

I'm currently a postdoctoral fellow in the Natural Language Processing group of the Department of Computer Science
at Johns Hopkins University. I'm also a member of the Center for Language and Speech Processing.
When I was a PhD student here at JHU, my advisor was Professor David Yarowsky.



 
  --> research
 

My dissertation was titled "Translation Discovery Using Diverse Similarity Measures."
It addressed learning reasonably high-quality translation lexicons from a variety of
foreign languages into English. The distinguishing aspect of the research was that
the learning required very few training resources in the foreign language
of interest. A few hundred thousand words of monolingual news text was shown to yield very useful
results: for example, a ranked list of possible English translations for each word appearing
in an Uzbek corpus, in which at least 34% of Uzbek words have a correct English
translation in the top 10.
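
The "correct translation in the top 10" figure above is a top-k accuracy over ranked candidate lists.
A minimal sketch of that computation follows. It is illustrative only and is not the dissertation's
evaluation code: the function name, the data layout, and the toy word lists are all assumptions.

```python
def top_k_accuracy(ranked, gold, k=10):
    """Fraction of source words whose top-k candidates contain a correct translation.

    ranked: dict mapping a source word to its candidate translations, best first.
    gold:   dict mapping a source word to the set of translations accepted as correct.
    Both structures are hypothetical stand-ins for a real lexicon and test set.
    """
    if not ranked:
        return 0.0
    hits = sum(
        1
        for word, candidates in ranked.items()
        if any(c in gold.get(word, set()) for c in candidates[:k])
    )
    return hits / len(ranked)


# Toy, made-up data for illustration; not taken from any real experiment.
ranked = {
    "kitob": ["book", "paper", "letter"],
    "suv": ["river", "water", "lake"],
    "non": ["meat", "table", "salt"],
}
gold = {"kitob": {"book"}, "suv": {"water"}, "non": {"bread"}}

print(top_k_accuracy(ranked, gold, k=10))
```

A word counts as a hit if any accepted translation appears anywhere in its first k candidates,
so the score can only stay the same or rise as k grows.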


My research interests lie mainly in the areas of machine translation and,
more generally, multilingual natural language processing.
My goal this summer (2006) is to extend and apply existing techniques, many of which
were developed here at JHU, to build morphological analyzers, basic syntactic analyzers,
named entity recognizers, and cross-language name matching tools for a large set of
world languages. The set includes many European, North Indian, Dravidian, and Turkic languages,
and hopefully several Austronesian and Southeast Asian languages as well.



 
  --> other projects
 

Setting up, maintaining and processing the results of ongoing,
high-volume data collection of Internet news in many languages
from sites around the world.

Being the sysadmin (with David Smith) for the NLP Linux cluster.



 
  --> publications
 

 ---- 2006 --------------------------------------------- 

Charles Schafer. 2006.
"Novel Probabilistic Finite-State Transducers for Cognate and Transliteration Modeling."
7th biennial conference of the Association for Machine Translation in the Americas (AMTA).

Charles Schafer. 2006.
"Translation Discovery Using Diverse Similarity Measures."
Doctoral dissertation, Johns Hopkins University.

 ---- 2005 --------------------------------------------- 

Charles Schafer and Elliott Drabek. 2005. "Models for Inuktitut-English Word Alignment."
ACL 2005 Workshop on Building and Using Parallel Corpora.

 ---- 2004 --------------------------------------------- 

[ Charles Schafer and David Yarowsky. "Exploiting Aggregate Properties of Bilingual Dictionaries
For Distinguishing Senses of English Words and Inducing English
Sense Clusters." ACL short paper. Barcelona, July 2004. ]

 ---- 2003 --------------------------------------------- 

[ Charles Schafer and David Yarowsky.
"A Two-Level Syntax-Based Approach to Arabic-English Statistical Machine Translation."
In Workshop on Machine Translation for Semitic Languages, New Orleans, Louisiana, 2003. ]

[ Charles Schafer and David Yarowsky.
"Statistical Machine Translation Using Coercive Two-Level Syntactic Transduction."
EMNLP 2003, Sapporo, Japan.
]

 ---- 2002 --------------------------------------------- 

[ Charles Schafer and David Yarowsky.
"Inducing Translation Lexicons via Diverse Similarity Measures and Bridge Languages."
In Proceedings of CoNLL 2002.
]

[ Ellen Riloff, Charles Schafer, and David Yarowsky.
"Inducing Information Extraction Systems for New Languages via Cross-Language Projection."
In Proceedings of COLING 2002.
]

[ R. Florian, S. Cucerzan, C. Schafer and D. Yarowsky.
"Combining Classifiers for Word Sense Disambiguation."
In Journal of Natural Language Engineering.
]

 ---- 2001 --------------------------------------------- 

[ D. Yarowsky, S. Cucerzan, R. Florian, C. Schafer, and R. Wicentowski.
"The Johns Hopkins SENSEVAL2 System Descriptions."
In Proceedings of SENSEVAL2, pp. 163--166.
]

 ------------------------------------------------------- 



 
  --> some professional activities
 

  • Program Committee ## SemaNet'02: Building and Using Semantic Networks
  • Program Committee ## SENSEVAL-3: Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text
  • Program Committee ## 2004 Conference on Empirical Methods in Natural Language Processing
  • Reviewer ## Journal of Natural Language Engineering special issue on Parallel Texts
  • Program Committee ## 2006 Conference on Empirical Methods in Natural Language Processing



     
      --> software
     

     _______________________________________________________________________ 

    [ JHU Devanagari Font Conversion Tools ]
    JHU has developed a software package for converting Devanagari text in various encodings
    (currently Unicode, Naidunia, Jagran, Bhaskar, JC_Hindi, Yamuna, and Amarujala)
    into the ITRANS format and subsequently into UTF8.
    The package also provides facilities for rapidly
    adding support for new encodings.

    This tool set currently includes:

    A. One-line commands to convert data in the above encodings into ITRANS.

    B. One-line commands to process web archives which may include
    embedded SGML tags and lines of English text; these are
    automatically detected, wrapped in language-escape tags and left
    unconverted, while the Hindi text is converted to ITRANS.

    C. An architecture that makes successive font conversion efforts
    relatively easy by reducing most of the development work to a
    simple process of character classification and editing of a couple
    of mapping tables. Fairly extensive documentation on carrying out
    this process for a new font/encoding is included.

    D. Also included in the download are HTML archiving and cleanup
    scripts, which can be used to clean up web downloads as a
    preprocessing step to font conversion.

    E. Finally, we include instructions for downloading and installing
    the free, high-quality ITRANS -> UTF8 converter available
    from www.aczone.com.

    created by Sanjeev Khudanpur and Charles Schafer. 
    June 2003. 
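
    The table-driven idea in item C, together with the pass-through of English lines in item B,
    can be sketched roughly as below. This is not the code in the download: the mapping entries,
    the <en> escape tag, and the function name are hypothetical, and the pure-ASCII test for
    English lines is a simplifying assumption (legacy Hindi fonts often reuse ASCII code points,
    so real detection has to be more careful).

```python
import re

# Toy mapping table from font glyph codes to ITRANS strings.
# These entries are made up; they are NOT the real table for any font listed above.
GLYPH_TO_ITRANS = {
    "\u00a1": "ka",
    "\u00a2": "ma",
    "\u00a1\u00a3": "kaa",  # a two-glyph sequence mapped as a single unit
}
MAX_KEY_LEN = max(len(key) for key in GLYPH_TO_ITRANS)

# Assumption: a line made only of ASCII characters is English text or markup.
ASCII_ONLY = re.compile(r"^[\x00-\x7f]*$")


def convert_line(line):
    """Convert one line to ITRANS, or wrap it in a (hypothetical) escape tag."""
    if line.strip() and ASCII_ONLY.match(line):
        return "<en>" + line + "</en>"
    out = []
    i = 0
    while i < len(line):
        # Greedy longest match, so multi-glyph sequences win over single glyphs.
        for n in range(min(MAX_KEY_LEN, len(line) - i), 0, -1):
            chunk = line[i:i + n]
            if chunk in GLYPH_TO_ITRANS:
                out.append(GLYPH_TO_ITRANS[chunk])
                i += n
                break
        else:
            # Unmapped characters (spaces, punctuation) pass through unchanged.
            out.append(line[i])
            i += 1
    return "".join(out)


for sample in ["hello <p>", "\u00a1\u00a3\u00a2", "\u00a1 \u00a2"]:
    print(convert_line(sample))
```

    With this layout, supporting a new font would mostly mean supplying a new table,
    which is the property item C describes.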
    
     _______________________________________________________________________ 



     
      --> multilingual stuff
     


    [ some internet multilingual resources. this is really old. ]



     
      --> thought for the day
     


    [ do you agree? ]

    [ The old Lie ]



     
      --> interesting spots
     

    [ fifty eggs ]

    [ Sorry Everybody ]



     
      --> some pictures
     

    -- taipei, taiwan, summer 2002 -- ---------------------------
    [ photo booth in taipei ]

    -- minnesota, fall 2002 -- ---------------------------
    [ me and dad in the boundary waters, northern minnesota, september 2002 ]

    -- sapporo, japan, summer 2003 -- ---------------------------
    [ a tasty snack in sapporo ]
    [ pocari sweat machine ]
    [ distraught squid ]
    [ for odori, shin sapporo, with love ]
    [ in kyoto ]
    [ construction with extreme politesse ]
    [ a long way from alabama ]
    [ making the 9:09 ]

    -- family pictures, late 2003 -- ---------------------------
    [ grandmother spiers, mom, john, jenny ]
    [ dad and john discussing apples at scott's apple orchard ]
    [ mom in jackson square (new orleans) ]
    [ grandmother schafer opening a present ]
    [ my brother john in funny hat ]

    -- milwaukee trip, around the new year, 2004 -- ---------------------------
    [ jan 1st 2004, 1AM, milwaukee, wisconsin. dylan had just asked me to be his best man. ]
    [ dylan deserves this ]
    [ mike, jane, elizabeth and dylan looking happy ]
    [ dylan holds forth ]
    [ mike and dylan play an effete sport ]
    [ mike and jane. ]
    [ after all these years, back at the exact spot where mike and jane got married. ]

    -- minneapolis, dylan's wedding, june 2004 -- ---------------------------
    (--1--)   (--2--)   (--3--)   (--4--)   (--5--)   (--6--)  

    -- barcelona, july 2004 -- ---------------------------
    [ elliott and david ]
    [ david and an arch of triumph ]
    [ self and grace (at banquet) ]
    [ noah and karen (tango in a square) ]
    [ chin and the expiatory temple of the sacred family ]
    [ hand and disclaimer of elven responsibility ]
    [ self and apt monicker ]
    [ posse ]
    [ john hale at the barcelona forum ]
    [ capricious salad ]
    [ downward spiral ]
    [ downward spiral (with self) ]



     
      --> sundries
     
    our friends, the cicadas:

    
    
    for cicada-kind The Tower of Babel