Using synthetic data to power machine learning while protecting user privacy

On this episode, we talk to John Myers, CTO and cofounder of Gretel, a company that provides synthetic data for training machine learning models without exposing any of its customers' personally identifiable information.

On this episode, we talk to John Myers, CTO and cofounder of Gretel. The company provides users with synthetic data for training machine learning models. Models trained on it produce results comparable to those trained on the real data, but without exposing personally identifiable information (PII). We talk about how data outliers can identify individuals, demo data that feels real but isn’t, and skewing patterns by skewing dates.

Episode notes:

Gretel uses machine learning to create statistically similar data that contains no personally identifiable information (PII).

Think your commits are anonymous? Think again: DefCon researchers figured out how to de-anonymize code creators by their style.

We published an article about the importance of including privacy in your SDLC: Privacy is an afterthought in the software lifecycle. That needs to change.

Our Lifeboat badge shoutout goes to 1983 (the year Ben was born) for their answer to Why can I not use `new` with an arrow function in JavaScript/ES6?

