se-stackoverflow January 28, 2022

Using synthetic data to power machine learning while protecting user privacy

On this episode, we talk to John Myers, CTO and cofounder of Gretel, a company that provides synthetic data for training machine learning models without exposing any of their customers' personally identifiable information.

On this episode, we talk to John Myers, CTO and cofounder of Gretel. The company provides users with synthetic data that can be used in machine learning models, generating results comparable to the real data, but without exposing personally identifiable information (PII). We talk about how data outliers can identify individuals, demo data that feels real but isn’t, and skewing patterns by skewing dates.

Episode notes:

Gretel uses machine learning to create statistically similar data that contains no personally identifiable information (PII). 

Think your commits are anonymous? Think again: DefCon researchers figured out how to de-anonymize code creators by their style

We published an article about the importance of including privacy in your SDLC: Privacy is an afterthought in the software lifecycle. That needs to change.

Our Lifeboat badge shoutout goes to 1983 (the year Ben was born) for their answer to Why can I not use `new` with an arrow function in JavaScript/ES6?




community December 30, 2021

How often do people actually copy and paste from Stack Overflow? Now we know.

April Fools' may be over, but once we set up a system to react every time someone typed Command+C, we realized there was also an opportunity to learn about how people use our site. Here’s what we found.

code-for-a-living March 3, 2022

Stop aggregating away the signal in your data

By aggregating our data in an effort to simplify it, we lose the signal and the context we need to make sense of what we’re seeing.