Ha Cao's CPT Project 2019

Software Engineering Internship at Two Sigma Investments LP

Two Sigma Investments LP is an international quantitative hedge fund headquartered in SoHo, NYC. It uses machine learning, artificial intelligence, and distributed computing in its trading strategies. Its mission is to discover value in the world's data.

I worked as a software engineering intern in Trading Engineering, an area of Engineering at Two Sigma.

Application Process

The application usually starts with a coding challenge, followed by a phone interview and an onsite round in NYC consisting of 4 to 5 interviews, each 45 to 60 minutes long. I did all of my interviews at Grace Hopper. The questions cover algorithms and data structures, operating systems, networks, etc.

Internship Structure

The internship was 10 weeks long. There was a thorough team-matching process in which I stated my technical interests and skills and was matched with a team in Trading Engineering, which builds, runs, and maintains the trading platforms that execute buy and sell orders for stocks and other financial instruments at Two Sigma. I had an individual project with a manager. Each manager managed only one intern, so I received a lot of help, guidance, and mentorship. I was also matched with an intern buddy, who took me out to company-paid lunches and offered advice on anything I was concerned about, technical or non-technical.

Project Details

My project was to build a real-time outlier detection framework for critical trading data. To do so, I came up with metrics and statistics that would help measure the performance of the trading system and spot any anomalous behavior. I used historical trading data as training data and used my framework to detect outliers in live trading data. Because the trading data are not labeled as outliers or non-outliers, I applied unsupervised machine learning for outlier detection; specifically, I used Isolation Forest [1]. Finally, I built a streaming dashboard to display the real-time performance of the trading system together with the detected outliers.
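To illustrate the idea behind Isolation Forest, here is a minimal sketch in plain Python. It is not the framework I built at Two Sigma: the data is synthetic, the two features (a latency and an order size) are hypothetical, and all function names are made up for this example. The sketch fits random trees on "historical" points and then scores "live" points; a point that gets separated from the rest after only a few random splits receives a score closer to 1.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """Expected depth of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Split the points on a random feature at a random threshold, recursively."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    feature = rng.randrange(len(points[0]))
    values = [p[feature] for p in points]
    low, high = min(values), max(values)
    if low == high:  # nothing left to split on for this feature
        return {"size": len(points)}
    threshold = rng.uniform(low, high)
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build_tree([p for p in points if p[feature] < threshold],
                           depth + 1, max_depth, rng),
        "right": build_tree([p for p in points if p[feature] >= threshold],
                            depth + 1, max_depth, rng),
    }


def path_length(node, point, depth=0):
    """Depth at which the point lands, plus a correction for unsplit leaves."""
    if "feature" not in node:
        return depth + average_path_length(node["size"])
    side = "left" if point[node["feature"]] < node["threshold"] else "right"
    return path_length(node[side], point, depth + 1)


def fit_forest(points, n_trees=100, sample_size=256, seed=0):
    """Train each tree on a random subsample of the historical points."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(max(sample_size, 2)))
    trees = [build_tree(rng.sample(points, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, sample_size


def anomaly_score(forest, point):
    """Score in (0, 1]; shorter average paths give scores closer to 1."""
    trees, sample_size = forest
    mean_depth = sum(path_length(tree, point) for tree in trees) / len(trees)
    return 2.0 ** (-mean_depth / average_path_length(sample_size))


# Synthetic "historical" data: (latency, order size), both hypothetical features.
data_rng = random.Random(42)
historical = [(data_rng.gauss(10, 1), data_rng.gauss(100, 5)) for _ in range(500)]
forest = fit_forest(historical)

typical = (10.2, 101.0)  # resembles the historical data
outlier = (25.0, 40.0)   # far from anything seen in training
print("typical score:", anomaly_score(forest, typical))
print("outlier score:", anomaly_score(forest, outlier))
```

In a streaming setting, the forest would be fit once on historical data and `anomaly_score` would be applied to each incoming record, with a threshold on the score deciding what gets flagged. The threshold is a tuning choice that depends on the data.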

Programming Languages

I used Java, Python, and Scala. Scala was new to me.


I used distributed computing technologies such as Spark, Spark Streaming, and Kafka to read, write, and process large volumes of data. I wrote the information I wanted to display to an Elasticsearch server, and used Grafana to read from Elasticsearch and create a streaming dashboard showing the results.

Lessons Learned

I learned how to do full-stack development by carrying out a project on my own from beginning to end. I also got to apply the object-oriented programming I learned in class to a real-life project. Moreover, I picked up a new programming language, new technologies, unsupervised machine learning, and some trading knowledge along the way.

Fun Experiences

The internship was a very educational and fun experience. I got to live and make friends with other interns in the same NYU dorm. We had weekly board game nights. I also learned how to play poker and got to play in a poker tournament for interns at Two Sigma. The company also took us out to see The Lion King in the theater and Phantom of the Opera on Broadway.


I would highly recommend this internship as a CPT internship for any international student who majors in Computer Science!

Presentation Slides

Here are my presentation slides about this internship. File:Ha cpt presentation.pdf