If, as science fiction writer Arthur C. Clarke posited, “any sufficiently advanced technology is indistinguishable from magic,” then it sometimes feels like data science is viewed as pulling rabbits out of hats. This narrative is, of course, misguided. It’s the perception of what I call “data magic,” where people believe data can be pumped into one end of the “data-science machine” and the perfect widget (the solution to everyone’s problems) will emerge on the other side.
To a considerable degree, this is because the vast majority of people don’t understand the workings of data science, and when you get into the more advanced areas, such as deep learning, even many data scientists will acknowledge they don’t understand the many levels of complexity. But if you’re a data scientist, you (mostly) know enough to know when you’re out of your depth. Outside the data science community, however, that’s not always the case. And that’s understandable.
In part, it’s our own fault. It’s a tough club, historically difficult for “outsiders” to penetrate, much less understand. The languages of data science have been held close to the chest. Just as the Roman Catholic Church selected Ecclesiastical Latin as the core communication language to control messaging, particularly through the Middle Ages and into the early-Modern period, we data scientists could be accused of similar actions, though obviously not on the same scale nor with such a direct impact on entire populations. But, just as the Reformations of the 16th century led to unshackling language, making the tenets of the various Christian churches more available to the masses, data science must now further extend its vernacular.
A shared language would allow us to move beyond believing that data scientists have mystical capabilities to solve any problem, by running data through an AI environment to produce the desired results, as if by magic. It would help people understand that data science isn’t a magical panacea.
In fact, if you truly want advanced data science, one of the worst things to do is assign a data scientist to solve isolated or ad hoc problems, as this will silo communication by keeping data science in the back room.
Rather, the best way to proliferate data science is to expose enterprise-level problems, understanding that, if done right, data science is a team sport. Having multi-disciplinary teams dedicated to products or customers yields superior business results and develops cross-functional understanding. A cohort including a commercial associate, a product manager, an engineer, a data scientist and representatives from other key functional organizations should be locked in a room, free of interruptions, for a meeting of the minds focused on the biggest needs and opportunities. This is where the true magic happens.
Still, a traveler on this journey should be cognizant of the warning signs. If, in working with clients or other third parties in the spirit of collaboration, you begin searching for solutions explainable to absolutely everyone, take pause. Just as a magician does not limit their performance to elementary tricks the audience understands, data scientists should not default to easily explainable solutions. Deliberately watering down the process may have the collateral effect of delivering a less-than-optimal solution for the problem. It’s a balance.
That balance relies on companies trusting in the capabilities of their data scientists. It means trusting that we will share our language as much as we can, that we will not dilute solutions when things turn too technical, and that we will always remain true to our discipline. This is the kind of trust and balance that enables technologically advanced companies to get at their respective truth sets.
This article was originally published on Medium.