The University will hold its first DataFest, sponsored and organized by the Quantitative Analysis Center (QAC), from April 8 to 10, starting at 6 p.m. Assistant Professor of the Practice Valarie Nazzaro, Director of Centers for Advanced Computing Manolis Kaparakis, and Visiting Assistant Professor of QAC Pavel Oleinikov held an information session on Monday, April 22, to give students an idea of what DataFest will entail.
Nazzaro outlined the details of the event, explaining that students will be given a very large, complex data set, unlike anything they would have seen before. The competition is set up so that teams are asked only one general question about the data, and then they will have from Friday night of DataFest until noon on Sunday to extract meaning from it.
She gave an example of something that could be similar to this year’s surprise data set.
“One of the examples…was police arrests in Los Angeles, and the very first data set had 10 million arrest records from Los Angeles in a 5-year span,” Nazzaro said. “The only thing [students] were asked was: ‘What policy changes would you recommend?’ So from that, they were asked to extract any sort of insight that they would give as advice to the police department.”
Teams at DataFest will be recognized in three categories: Best Visualization, Best Insight, and Best Use of External Data.
Nazzaro commented on what she considers the best approach for thinking about success in terms of those three areas.
“You’re not trying to accomplish all three things at once,” Nazzaro said. “Most people are going to pick one and…just go at it. Hopefully, the best visualization is also an interesting insight…but one thing the judges are told is that it’s DataFest not StatFest, which means that the most interesting insight might not come from a model or from a statistical test; it might come from something else.”
Kaparakis added to this sentiment, encouraging students to think broadly and to approach the problem from all possible angles.
“The kind of data that may be coming your way is…not generated for academic research, [rather] it’s data that is generated as part of the ordinary business of life, if you will,” Kaparakis said. “You have to be prepared to look at these data with fresh eyes…. It’s not necessarily data that would be best suited for the model that you learned in [a given] course…. You’re going to have to get down and dirty in terms of how you approach it.”
Another aspect of the event is that academic and industry professionals will be in attendance to work with students over the course of the weekend.
Continuing to discuss the event’s schedule, Nazzaro also named some of the companies that will be sending representatives to participate.
“There are going to be consultants from industry,” Nazzaro said. “We currently have people from Pfizer, Boehringer-Ingelheim, Aetna, and MassMutual, who are going to come and assist individuals as best they can. So [students] can use these professionals as soundboards, or to help with a technical issue. At the very end of it, [teams] are just going to be asked to present for no more than five minutes on what they found.”
Kaparakis spoke about the process of forming teams, advising students to work with people whose skill sets differ from their own so that each team can cover all of its bases during the competition.
“We hope it will be not just a work event, but also a nice social event, for people to work together and start building networks,” Kaparakis said.
Nazzaro also encouraged those in the room to share their thoughts on how to make students feel prepared and willing to participate in the event. Workshops will be held before DataFest to help students prepare for the competition. The QAC plans to add more workshops, as well as other events, covering a wider range of topics based on student suggestions.
Joli Holmes ’17 offered her suggestions for future events in the QAC. She cited past guest speakers such as R. Luke DuBois, an artist who has many notable works related to data, and Matt Daniels, who gave a presentation on data journalism and media art, and argued that their visits seemed to have been well received by the University community.
“Bring more of the artists,” Holmes said. “Those [events] are super interdisciplinary…. You get such a different crowd of people that are not as involved with statistics or computing, but maybe are much more design-oriented.”
Nazzaro similarly emphasized that DataFest is open to students of all academic backgrounds.
“We’re expecting, from Wesleyan, [students] not necessarily from the Computer Science [Department], but anyone who’s used to working with data, and even someone from any sort of major who is comfortable with data,” Nazzaro said. “We’re expecting people who have experience with R, Python, or other computing languages, but there’s really no limit.”
Teams from Connecticut College, Yale, the University of Connecticut, and Trinity have also been invited to participate. Kaparakis explained that the QAC is currently budgeting for about 60 students from the University to participate.
Yiren Chai ’17 described his motivations for participating in these kinds of events.
“I think that it’s a good idea to use the things you learn in class outside of that class,” Chai said. “That’s why I’m interested in the event, so that I can practice my data analysis skills [while] getting more experience with the real world.”
Kaparakis concluded with a final statement of encouragement for University students planning to participate in DataFest.
“Let’s have fun with it,” Kaparakis said. “Not everything you try is going to work, but…this is the time to experiment and fail…or [to] succeed, and I know we will succeed.”