Data Hackathons are the Latest Hacking-Good Competitions

Unless you’ve been living in a cave (and one not outfitted with WiFi), you have probably heard about the weekend-long coding hackathons that seem to be all the rage, both inside and outside Silicon Valley. Bringing together super-hot programmers and “wanna be” coders, these good-vibe events usually involve a heads-down, gut-busting race to code up a catchy app that will win praise, rewards, and eternal bragging rights.

But move over, coding hackathons, because the newer craze is data hackathons (aka data hacks, aka datathons).

I recently attended a mobile data science hackathon organized by the University of California, Irvine’s School of Information & Computer Science (UCI ICS), along with its Blackstone LaunchPad incubator and the high-tech firm M2Catalyst, in conjunction with the OCTANe Technology Investor Forum.

This particular event was called a “mobile data” hackathon because its data set was a large-scale collection of information from and about mobile devices (more on this in a moment).

The grand prize was $2,000, with smaller prizes for the other top-5 teams, and the winners also earned an opportunity to showcase their prowess at the annual OCTANe Technology Investor Forum, where they might attract money and mentoring from potential investors.

HACKATHONS ARE IN

The distinguishing aspect of a data hackathon is that it focuses on doing something useful with data, rather than on programming per se. A traditional programming-oriented hackathon usually requires participants to produce an app, sometimes of a particular kind, such as one aimed at healthcare or social media.

In contrast, a data hackathon asks the participants to delve into a potential treasure trove of data.

Hackathons are usually tightly timed events that pressure participants to produce something while the clock runs down. A 24-hour hackathon typically starts on Saturday and runs until Sunday, followed on Sunday evening by presentations and awards. A 48-hour hackathon often starts late Friday afternoon and runs until Sunday. The weekend format avoids conflicts with weekday schedules and makes for an exciting weekend (more so than, say, parachute jumping!).

It also carries on the age-old college tradition of pulling all-nighters and, with lots of coffee, striving to finish a big project before its due date.

Some hackathons are open only to students, usually college students, but that has gradually been changing, and now there are hackathons that welcome a wide mix of participants, including students, faculty, professionals, and anybody who thinks they have what it takes to compete.

Most of the time, a hackathon gathers the participants in a specific location, which might be anything from an old warehouse to a cool incubator. Not all hackathons involve a physical gathering; some take place virtually, making it easier for people from around the globe to participate.

The carrots at the end of the stick for participants are plentiful: usually a cash prize; perhaps swag such as free hardware and software; often free food, such as all the pizza you can eat; the excitement of being part of something special; notoriety you can put on your resume or brag about; handy networking contacts for future work or employment; a possible reputation and fame for your proficiency; and the chance of being “discovered” and maybe, in one weekend, starting the next Facebook or Twitter.

Lots of reasons to participate, for sure.

By and large, most hackathons involve working in teams. You can team up before the competition and enter as a team, or you can look for fellow birds of a feather when you first show up at the event. Most events require you to bring your own equipment, such as souped-up laptops, and usually provide high-speed WiFi for the attendees.

Sleep is often considered optional at a hackathon.

You and your team work through the night on your creation. Whether any of you get much shut-eye is up to you and your team to decide. Some teams take turns grabbing quick catnaps, while others power through the entire night on caffeinated sodas.

DATA HACKATHONS

The data hackathon is a variant of the traditional coding hackathon: participants are provided with a sizable body of data and asked to analyze it and, hopefully, find insights in it.

There are usually predetermined “categories” of competition at a data hackathon.

The most common categories are Data Visualization, Statistical Insights, Machine Learning and AI, and App Development. Teams are usually encouraged to enter just one of the available categories, but can sometimes enter more than one if they think they have a solid chance at doing something that cuts across the stated categories.

Here’s what the categories are about:

Data Visualization. This category involves taking raw data and finding interesting, useful, and engaging ways to portray it visually. Some data hackathons tell you which visualization tools you may use, while others let you use whatever data visualization tool you are familiar with. The key is that, in the end, you must take a morass of data and produce visualizations that reveal something meaningful about it.

Statistical Insights. This category involves using statistical techniques, from simple to highly complex, to spot trends and patterns in the data. Sometimes you can use your own preferred statistical tools; in other cases you are told which ones are allowed. The overall key is to find trends and patterns that are insightful, ideally more so than what the competing teams find.

Machine Learning and AI. This category, offered at some events, involves using state-of-the-art machine learning and AI techniques to find something interesting, useful, and engaging about the data. Either you are told which machine learning and AI techniques and tools to use, or you are allowed to make your own choices. Once again, the key is to probe the data deeply and find something useful in it.

App Development. This category is less common at data hackathons; it involves developing an app that exploits the data provided. In any of the categories you could potentially write your own app, but in this one you must. The app has to be shaped around the nature of the data, and cannot be so far removed from what the data is about that it seems like you just coded some randomly inspired app.

THE DATA

The nature of the data that is provided at a data hackathon varies widely from event to event.

The mobile data event’s data set covered 50,000 mobile devices, with real-world data collected from and about those devices over an extended period of time. About 150 metrics were included, such as which apps were running on a device at any point in time, device characteristics such as model and memory, signal strength and quality when the device was on a network, GPS data, CPU consumption, data usage, battery charge, and so on.

Some teams looked at how apps consumed energy over time, trying to show that the data could make device makers and consumers more aware of when and how battery power gets used up. This might inspire mobile device makers to change their batteries or operating systems to extend battery life for mobile users, inspire battery makers, help consumers gauge when their batteries will be most drained, and so on.

Some teams took a different perspective. They looked at how apps and mobile devices were used on special occasions, such as Black Friday or Thanksgiving (one interesting factoid was the surprising number of golfers who went out to the golf course on Thanksgiving, as measured by GPS location and time spent). This kind of analysis by special occasion could help marketing companies decide which ads to display on mobile devices during those occasions, and when.
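To make the special-occasion idea concrete, here is a minimal Python sketch that compares how often a category of activity shows up on one date against its average on the other days in the data. The `(date, category)` event shape, the `"golf"` label, and the dates are all hypothetical; the actual event data was organized around its roughly 150 metrics, whose exact layout isn’t described here.

```python
from collections import Counter
from datetime import date

def occasion_lift(events, occasion, category):
    """Return how many times more often `category` appears on `occasion`
    than on an average other day present in `events`.

    `events` is a list of (date, category) pairs, a hypothetical shape."""
    per_day = Counter(day for day, cat in events if cat == category)
    other_days = {day for day, _ in events} - {occasion}
    if not other_days:
        raise ValueError("need at least one non-occasion day as a baseline")
    baseline = sum(per_day[day] for day in other_days) / len(other_days)
    if baseline == 0:
        raise ValueError(f"no {category!r} events on baseline days")
    return per_day[occasion] / baseline

# Hypothetical data: each tuple is one GPS-detected golf course visit (or other activity).
holiday = date(2014, 11, 27)  # hypothetical holiday date
events = [
    (date(2014, 11, 26), "golf"),
    (holiday, "golf"), (holiday, "golf"), (holiday, "golf"),
    (date(2014, 11, 28), "golf"),
    (date(2014, 11, 28), "shopping"),
]

print(occasion_lift(events, holiday, "golf"))  # 3.0
```

A lift well above 1 flags an occasion worth a closer look; a fuller analysis would also normalize by how many devices were active on each day.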

Others looked at app usage (one team reported 70,000 distinct apps in use across the entire data set), and another team looked at when and where photos were being taken.

SOME TIPS ABOUT DATA HACKATHONS

For those who attend or are thinking about attending a data hackathon, and for those thinking about organizing one, I offer some sage advice in the following six tips:

1. Think creatively about the data
Some participants in a data hackathon do not look beyond the obvious aspects of the data that they are given. They just focus on the fields of data as given, and fail to think creatively.

What do I mean by being creative?

I mean looking for ways to combine multiple fields to create something essentially new out of the data. You need to look at how the fields of data are related and unrelated, and decide which fields make sense to examine separately versus collectively.
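As one minimal illustration of combining fields, the Python sketch below derives a new metric, battery drain per hour attributed to the app in use, from three separate fields: a timestamp, a battery level, and an app name. The field names (`device`, `hour`, `battery_pct`, `app`) and the sample readings are hypothetical, and charging each interval’s drain to the app running at its start is a simplifying assumption, not a method used at the event.

```python
from collections import defaultdict

# Hypothetical readings; field names are illustrative, not the event's actual schema.
readings = [
    {"device": "d1", "hour": 0, "battery_pct": 100, "app": "maps"},
    {"device": "d1", "hour": 1, "battery_pct": 88, "app": "maps"},
    {"device": "d1", "hour": 2, "battery_pct": 85, "app": "mail"},
    {"device": "d2", "hour": 0, "battery_pct": 90, "app": "game"},
    {"device": "d2", "hour": 2, "battery_pct": 70, "app": "game"},
]

def drain_per_hour_by_app(rows):
    """Combine the time, battery, and app fields into a new metric: average
    battery percentage points lost per hour, charged to the app that was
    running at the start of each interval (a simplifying assumption)."""
    by_device = defaultdict(list)
    for r in rows:
        by_device[r["device"]].append(r)
    totals = defaultdict(lambda: [0.0, 0.0])  # app -> [points lost, hours]
    for device_rows in by_device.values():
        device_rows.sort(key=lambda r: r["hour"])
        for prev, cur in zip(device_rows, device_rows[1:]):
            hours = cur["hour"] - prev["hour"]
            drop = prev["battery_pct"] - cur["battery_pct"]
            if hours > 0 and drop >= 0:  # skip charging intervals (negative drop)
                totals[prev["app"]][0] += drop
                totals[prev["app"]][1] += hours
    return {app: lost / hrs for app, (lost, hrs) in totals.items()}

print(drain_per_hour_by_app(readings))  # {'maps': 7.5, 'game': 10.0}
```

Neither field alone says anything about drain rate; only the combination of consecutive timestamps, battery levels, and app names does.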

At the start of a data hackathon, there is sometimes a tendency to panic and jump immediately into doing something with the data so that your team can “get underway.” But in the end, you likely won’t be much of a winner if you have crunched the data the same way the other teams did.

It is worthwhile at the start to brainstorm and try to find an angle that nobody else might be thinking of.

Another perspective is to go beyond the data provided and combine it with third-party data. For example, you might take a free demographics data set from the federal government and combine it with the provided mobile device usage data. Few other participants would think of this creative approach, and you can usually prepare for it in advance by doing your homework before the event and getting those other data sources ready for use.
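Here is a minimal Python sketch of that kind of combination, assuming the provided usage data and the third-party demographic table share a region code. The region codes, field names, and values are hypothetical; with real data, finding or building a clean shared key is usually most of the work.

```python
# Hypothetical provided data: average daily data usage (MB) per device, with a region code.
usage = [
    {"device": "d1", "region": "R01", "daily_mb": 420.0},
    {"device": "d2", "region": "R01", "daily_mb": 380.0},
    {"device": "d3", "region": "R02", "daily_mb": 510.0},
    {"device": "d4", "region": "R99", "daily_mb": 200.0},  # no demographic match
]

# Hypothetical third-party demographics table keyed by the same region code.
demographics = {
    "R01": {"median_age": 38.0},
    "R02": {"median_age": 36.0},
}

def join_usage_with_demographics(usage_rows, demo_by_region):
    """Inner-join usage rows to demographic attributes on the shared region key.
    Rows without a match are returned separately so they are not silently lost."""
    joined, unmatched = [], []
    for row in usage_rows:
        demo = demo_by_region.get(row["region"])
        if demo is None:
            unmatched.append(row)
        else:
            joined.append({**row, **demo})
    return joined, unmatched

joined, unmatched = join_usage_with_demographics(usage, demographics)
print(len(joined), [r["device"] for r in unmatched])  # 3 ['d4']
```

Keeping the unmatched rows visible matters: if a large share of devices fail to match, any demographic conclusion drawn from the joined rows may be skewed.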

Some data hackathons do not allow third-party data, or insist that you prove the data was openly available and not proprietary, so make sure to read the rules of the data hackathon carefully.

2. Don’t become overly enamored of the tools

A data hackathon often attracts techies and tool nerds, who tend to be very tool-oriented. As a result, they end up showing, let’s say, tons of intricate curves and pie charts that showcase the tools being used rather than insights about the data.

In a data hackathon, the tools normally play second fiddle to the insights. I am not saying the tools aren’t crucial; they are, since they are what helps you find the insights. But in the end, what the data reveals is most important, and the tool, though vital for probing the data, is secondary.

3. Use as much of the data as you can

Sometimes the amount of data is huge, occupying hundreds of gigabytes or maybe even dozens of terabytes. Analyzing it requires a Big Data mindset and capabilities.

Some teams carve out a smaller chunk of the data. That’s handy if the chunk yields good insights, but if you don’t exploit the whole data set, you might miss other very important and interesting insights.

Do not prematurely judge the data and cut it down in size.
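One way to avoid cutting the data down is to stream it rather than load it all at once. The Python sketch below reads a CSV row by row with the standard `csv` module and keeps only running tallies in memory, so it covers every row regardless of file size. The column names `device` and `app` and the file name in the comment are hypothetical, and this is not how any team at the event necessarily did it; for hundreds of gigabytes spread across many files you would likely reach for a distributed engine, but the principle of aggregating as you stream, instead of sampling, is the same.

```python
import csv
import io
from collections import Counter

def tally_apps(csv_file):
    """Stream every row of a CSV that has `device` and `app` columns and count
    rows per app. Memory grows with the number of distinct apps, not rows."""
    counts = Counter()
    total_rows = 0
    for row in csv.DictReader(csv_file):
        counts[row["app"]] += 1
        total_rows += 1
    return total_rows, counts

# In practice (hypothetical file name):
#   with open("mobile_usage.csv", newline="") as f:
#       total_rows, counts = tally_apps(f)
# An in-memory sample stands in here.
sample = io.StringIO("device,app\nd1,maps\nd1,mail\nd2,maps\n")
total_rows, counts = tally_apps(sample)
print(total_rows, len(counts), counts.most_common(1))  # 3 2 [('maps', 2)]
```

`len(counts)` gives the number of distinct apps across all rows read, the kind of whole-data-set figure that a sample would understate.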

4. Have your insights clearly stated and ready to present

Some techies who participate in a data hackathon might be really good at heads-down work, but lack the ability to state what they found and present it so that the judges and the audience know what they actually accomplished.

This is unfortunate, because sometimes a team has discovered a tremendous eureka in the data, yet its final presentation is so muddled that nobody has any clue what the team found.

It’s a darn shame when a team has discovered some gold nuggets but cannot get the credit because they are unable to convey what they found.

Often, teams make sure to include someone who can articulate the findings and is able to speak in front of a crowd and give a presentation.

Most data hackathon participants don’t think about this beforehand, and many realize the value of a good presentation, to their regret, only when it comes time to present at the end of the event.

Imagine that you have slaved away painstakingly on the data for 24 or maybe 48 hours, and then, given a one- or two-minute opportunity to present your results, it all falls apart because no one on your team can make those minutes count.

5. Building something versus data mining

I mentioned earlier that some data hackathons have a category involving building an app that somehow relates to the data. If you are in that category, you are reasonably safe spending the entire hackathon building something; that won’t hurt your chances of winning (in that category).

If you are competing in the other categories, which aren’t about building something, you had best spend your time using existing tools to do data mining. I’ve seen teams focus on building their own data mining tool, which is usually a misuse of time because they are reinventing the wheel. Data mining tools are plentiful, so unless your team has a new insight about how such tools should work, you will easily be bested by teams that know the existing tools and can use them well.

Resist the urge to code.

Embrace the urge to do data mining.

6. Make sure the data has significance

Sometimes the event organizers provide data that is not very inspiring, which makes it hard for participants to find useful insights.

Participants in that situation will find it very hard to turn lemons into lemonade. You can try my earlier suggestions about thinking creatively, and maybe they will help.

It is up to the organizers, though, to make sure the event is interesting and exciting and provides ample opportunity for engaging analyses.

Just because an organizer happens to have a large set of data handy does not necessarily mean it is worthy of use in a data hackathon.

I ask the organizers of data hackathons to think carefully beforehand about the data sets being provided.

I would even suggest that the data set should have some compelling business or social need behind it. In other words, if a bunch of sharp data hackers are going to spend an entire weekend trying to find insights in the data, at least direct their intense scrutiny toward insights that might actually make a difference to business or to society.

GET INTO A DATA HACKATHON

If you’ve not attended a data hackathon, either as a participant or as an observer or a judge, you should definitely consider doing so.

For participants, it can be quite a thrill, a source of new friends, and the experience of a lifetime.

For observers, admittedly, it is a bit like watching grass grow, since most of the time the teams are huddled together with their noses in their laptop screens. But if you attend at the start and especially at the end, I think you’ll see what all the excitement is really about.

Finally, it is certainly refreshing to see people coming together for a positive purpose and trying to do some good in the world. Whether or not you find data hackathons interesting, they promote a healthy spirit of advancing our ability to use data and of turning data into something informative and actionable for us all.

Seems like a laudable goal to me.

Good luck on your data hacks!

Enough said.