Data Ethics and a Late NDA
Data ethics matters, and the general public cares about it; strangely, though, most data scientists and engineers either do not actively think about it or are not interested in it at all. The so-called “engineering culture” focuses on building things that solve problems, and engineers LOVE this culture. They not only like building things that solve problems, but they also want to ignore anything that is not directly related to their goal, such as social networking, thinking about their potential users, or checking whether their product makes sense ethically. I am not blaming the engineering culture: I like that culture and would want to work in it. I am saying that engineers and data scientists should not use it as an excuse to avoid necessary considerations outside engineering. At Columbia, most of the courses are theoretical, and you get only limited exposure in some classes. Data Ethics was the last course, taken alongside a capstone project, and it does well at giving students the sense that data ethics is an indispensable part of data science. There are many examples where data ethics is missing or given too little weight in research, in a data product, or even across a whole company. Sometimes considering data ethics can be hard for engineers, and you may face confrontation with your boss or co-workers. However, it should be part of your daily work, and it is your responsibility to consider it. I hope that in the future more engineers and data scientists will be educated on data ethics in a formal way, not by news headlines.
After the class, our advisor notified me that the project stakeholder who had promised us an important dataset months ago had finally completed their internal approval and sent the dataset. However, there was one thing they forgot to do: have us sign an NDA first. It is quite common for decision-making in such a big company to take a while, but it is also quite ironic that, after all the complicated internal processes they went through, they forgot a simple thing at the last step, which could have been a disaster. Luckily, we are all professionals, and we agreed to sign the late NDA without any concerns. It is a reminder that being successful is hard: you need to do almost everything right. Failing, however, is quite simple: you can be done for even if you were nearly perfect and made only one small mistake. My lesson here is: don’t ignore ANY single thing, especially when you are 99% finished.
There is a funny piece of programmer humor: when you prepare everything correctly and work hard, the product you made will have some fatal error right after it goes live. On the other hand, when you do everything in a rush and don’t have time for debugging, unit testing, or even writing proper documentation, your product will magically work like a charm. That is not always the case, since things usually go worse when you come under-prepared, but it is true that even if you think you did everything correctly, you may still have made some stupid errors. One way to address this is to find an experienced mentor who can guide you through every step of development. If there is no mentor, you need to come up with a rigorous and robust strategy for examining your outcome, such as unit testing. For data scientists, there is no one-size-fits-all solution, so you need to gain some experience. In this case, even some Kaggle experience could be beneficial.