Minyue Dai's CPT Project

From CSclasswiki

OCR System of Historical Documents

This summer I interned in the Engineering Practicum program at Google. My project was an Optical Character Recognition (OCR) system for historical documents, supervised by the GoogleOCR team under Google Research. My podmate Carrie and I worked on building a character recognition system optimized for historical documents in various languages.

Overall Experience

Engineering Practicum is a 12-week internship program designed for first-year and sophomore students. Interns work in pairs and get help from two hosts. My podmate Carrie and I have similar backgrounds and related experience with OCR of historical manuscripts, so our collaboration turned out to be great. An interesting but stressful fact is that our GoogleOCR team is under Google Research, which means almost all of the Googlers there hold PhDs. My host told me they had never had undergraduate interns before, and he held us to the same expectations as graduate students. Another challenge was that their whole system is built in C++, which neither Carrie nor I had any experience with. There are also many internal tools at Google, and to be honest we spent most of our first two weeks learning new tools and frameworks.

What I've Done