Astound recognized as most promising technology by KDD with Startup Research Award
For more than two decades, the annual conference of the Association for Computing Machinery (ACM) Special Interest Group on Knowledge Discovery and Data Mining (KDD) has been the place for big ideas in big data and data science. The conference is considered so fundamental to the field that Forbes and Dataconomy both listed its inaugural meeting in 1995 among the top milestones in the history of data science.
Research presented at KDD has uncovered ways to fight bias in algorithms, it has helped urban planners better understand where crime could occur in their cities, and it has used historical hospital data to predict when patients will get sick again. A paper presented at KDD 2017 even proposed an algorithm that uses computer vision to learn about fashion and style from images and develop similar items on its own. Over the past few years, as terms like “big data,” “data mining” and “artificial intelligence” have made their way from the ivory tower to the dinner table (and the runway), the KDD conference has become increasingly influential.
KDD 2018 is no exception. The conference will be held next week, August 19 through 23 in London, gathering the world's top data scientists from academia and industry to share their knowledge and discuss new ideas that move the field forward. This year, 497 papers were submitted to the Applied Data Science Track, and fewer than 15 percent were accepted. Astound is honored to be among them.
Astound data scientist Karan Samel and Chief Data Scientist Xu Miao co-authored a paper titled "Active Deep Learning to Tune Down the Noise in Labels," which Samel will present at a poster session at KDD in London on August 21. What's more, Astound is also the winner of a Startup Research Award. KDD launched the Startup Research Awards in 2017 to encourage small startups to participate in data science research, and Astound is one of only a few startups that will have the opportunity to present at the conference. Samel's poster details a new approach that pairs deep neural networks (DNNs) with a form of supervised learning called active learning to address a common enterprise data issue: data that is messy and inconsistent.
Writing about KDD in 2011, MIT Technology Review documented the emergence of “The New Big Data.” Suddenly the explosion of the internet and readily available consumer data meant that big data was something much bigger, but truly grasping this potential would require new techniques for analyzing unstructured data. The article goes on to say that “the intense business interest in data has changed the field of data mining” but “most of the data that businesses are producing is an unstructured mess.”
Fast forward seven years, and the same problems persist. Samel and Miao's paper addresses one piece of this problem. Simply put, people find different ways to solve the same problem, which generates inconsistent data and makes it difficult for AI to predict the right answer. For example, in an IT department, support agents receive incidents submitted by employees. An agent reads through each incident (e.g., "I don't have access to my email") and selects a category that routes it to the right person, so the issue is resolved quickly and accurately. Sounds simple enough, right? It's not. In very large companies, there could be hundreds of categories to choose from, and 10 or 20 of them could apply broadly to a particular incident. Some categories may have been created years ago and are no longer used. The list goes on.
Today, as Samel and Miao point out in their paper, real-world applications of AI depend less on building a high-quality software system and more on building a high-quality dataset. Their active deep denoising (ADD) approach outlines a process for building one. It uses DNNs to surface the noise, or inconsistencies, in a dataset. It then loops in a domain expert (in the incident management case described earlier, this might be an IT Incident Management process owner) to weigh in on those inconsistencies and guide the model toward the right answer. The approach is designed to become increasingly automated over time, so the human expert is looped in only when a critical learning choice has to be made. The result? Samel and Miao show that ADD, applied to one enterprise example, reduced prediction errors by one third.
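The post describes ADD only at a high level. To make the general pattern concrete (a model flags labels it disagrees with, and a human reviews only the most suspicious ones), here is a minimal Python sketch. It is not the method from the paper: a tiny nearest-centroid classifier stands in for the DNN, the suspicion score (one minus the model's probability for the recorded label) is an assumed heuristic, and `active_denoise`, `expert`, `threshold`, and `batch_size` are hypothetical names chosen for illustration.

```python
import math
from collections import defaultdict


def train_centroids(X, y):
    """Fit a nearest-centroid classifier (a stand-in for the DNN; assumption)."""
    sums, counts = {}, defaultdict(int)
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        for i, v in enumerate(x):
            sums[label][i] += v
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def predict_proba(centroids, x):
    """Softmax over negative squared distances to each class centroid."""
    scores = {label: -sum((a - b) ** 2 for a, b in zip(x, c))
              for label, c in centroids.items()}
    top = max(scores.values())
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def active_denoise(X, y, expert, rounds=3, batch_size=2, threshold=0.5):
    """Hypothetical loop: retrain, flag suspicious labels, ask the expert about
    only the top `batch_size` of them, and repeat until nothing is flagged."""
    y = list(y)          # work on a copy; the caller's labels stay untouched
    reviewed = set()     # indices the expert has already ruled on
    for _ in range(rounds):
        centroids = train_centroids(X, y)
        suspects = []
        for i, (x, label) in enumerate(zip(X, y)):
            if i in reviewed:
                continue
            suspicion = 1.0 - predict_proba(centroids, x).get(label, 0.0)
            if suspicion > threshold:
                suspects.append((suspicion, i))
        if not suspects:
            break        # the model agrees with every remaining label
        suspects.sort(reverse=True)
        for _, i in suspects[:batch_size]:
            y[i] = expert(i)   # the human is consulted only for these cases
            reviewed.add(i)
    return y, reviewed


X = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (11, 10), (10, 11), (11, 11)]
true_labels = ["email"] * 4 + ["network"] * 4
noisy = list(true_labels)
noisy[1], noisy[6] = "network", "email"  # two mislabeled tickets
cleaned, reviewed = active_denoise(X, noisy, expert=lambda i: true_labels[i])
print(cleaned == true_labels, sorted(reviewed))  # True [1, 6]
```

In this toy run the expert is asked about only the two mislabeled items, not all eight; capping each round at `batch_size` is what keeps the human effort bounded as the model improves.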
Imagine if the employee service you received at work made a third fewer errors. There'd be less time for frustration and more time to work on what matters. That's the future of work we're building. That's what gets us up in the morning and keeps us up at night. Want to help us make the future of work a reality? Join our team: http://bit.ly/AstoundCareers
Attending KDD and want to meet up? We’ll be presenting Tuesday, August 21 during the following times:
- Applied Data Science Poster Blitz Sessions: Tuesday, August 21, 4:00-5:30pm
- Poster Reception: Tuesday, August 21, 7:00-9:30pm