Worried About Sending Your Data to a Chatbot? 'PrivateGPT' Is Here

Privacy has become a major concern with online AI models that are hooked up to corporate servers.

Companies like Samsung, JPMorgan, Apple, and Amazon have banned employees from using ChatGPT out of fear that their confidential information will be leaked. ChatGPT, developed by OpenAI, is continuously trained on the prompts and messages that users input. But now there is an alternative for anybody creeped out by the idea of divulging personal information to an online chatbot: PrivateGPT, an open-source model that lets users ask questions about their own documents without an internet connection.


Created by a developer named Iván Martínez Toro, PrivateGPT runs locally on your home device and requires you to first download an open-source large language model (LLM) called GPT4All. Users then place all of their relevant files into a directory for the model to ingest. Once the documents are ingested, users can ask the model questions, and it will answer using those documents as context. PrivateGPT can ingest over 58,000 words and currently needs significant local computing resources, specifically a good CPU, to set up.
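To make the ingest-then-ask pattern concrete, here is a toy sketch in Python. It is not PrivateGPT's actual code: the function names are invented for illustration, and simple word overlap stands in for the local LLM that PrivateGPT would use to pick relevant passages and generate an answer.

```python
# Toy sketch of the ingest-then-ask pattern: split documents into chunks,
# then answer a question by returning the chunk that shares the most words
# with it. A real system like PrivateGPT would use a local LLM over the
# retrieved chunks; plain word overlap stands in for that here.

def ingest(documents, chunk_size=30):
    """Split each document into fixed-size word chunks (the 'ingest' step)."""
    chunks = []
    for text in documents:
        words = text.split()
        for i in range(0, len(words), chunk_size):
            chunks.append(" ".join(words[i:i + chunk_size]))
    return chunks

def ask(question, chunks):
    """Return the chunk with the greatest word overlap with the question."""
    q_words = set(question.lower().split())
    return max(chunks, key=lambda c: len(q_words & set(c.lower().split())))

docs = [
    "The quarterly report shows revenue grew 12 percent in Q3.",
    "The office closes at 6 pm on Fridays during summer.",
]
index = ingest(docs)
print(ask("When does the office close on Fridays?", index))
```

Nothing in this sketch touches the network, which is the core idea: both the documents and the questions stay on the local machine.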

“PrivateGPT at its current state is a proof-of-concept (POC), a demo that proves the feasibility of creating a fully local version of a ChatGPT-like assistant that can ingest documents and answer questions about them without any data leaving the computer (it can even run offline),” Toro told Motherboard. “It is easy to imagine the potential of turning this POC into an actual product that makes it possible for companies to get the productivity boost of having access to their own personalized, secure, and private ChatGPT.”

Toro said that he created the app after seeing how valuable ChatGPT is in the workplace. “People and Legal departments at my current company had had access to ChatGPT for a couple of weeks and we eventually ran out of credits; they both contacted me that very moment asking to get their access back because they didn't want to go back to doing the work without it,” he told Motherboard. Toro added that his colleagues’ work was made even more difficult when the legal department wanted to summarize a private legal document using ChatGPT but couldn’t, due to the privacy risks.

One major data leak through LLM chats occurred in April, when three Samsung employees in Korea accidentally shared sensitive information with ChatGPT. One employee submitted confidential source code to check for errors, another asked ChatGPT to optimize their code, and a third shared a recording of a meeting and asked the chatbot to convert it into notes. OpenAI’s data policy says that it uses non-API consumer data to improve its models, though users can switch this off in ChatGPT’s settings. Bloomberg reported that, following the incident, Samsung banned the use of generative AI tools and is trying to create its own proprietary model to prevent this from happening again.

Aside from company-wide risks, individuals have been hesitant to use the chatbot for fear of leaking personal information. Italy temporarily banned ChatGPT for around a month, citing concerns that the service’s use of people’s personal information violated the EU’s General Data Protection Regulation (GDPR), a data privacy law. The ban was lifted after OpenAI fulfilled the conditions requested by the Italian data protection authority, which included presenting users with transparent information about how their data is used and giving them the option to correct misinformation about their personal data or delete it altogether.