GitHub Copilot: Public Release and Privacy Concerns
So what is GitHub Copilot?
GitHub Copilot is an AI pair programmer that helps you write code faster and with less work. It draws context from comments and code to suggest individual lines and whole functions instantly. GitHub Copilot is powered by Codex, a generative pretrained language model created by OpenAI. It is available as an extension for Visual Studio Code, Visual Studio, Neovim, and the JetBrains suite of integrated development environments (IDEs).
Basically, the GOAT of code autocomplete
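To make "draws context from comments and code" concrete, here is a minimal sketch of the workflow: you type a comment and a function signature, and the tool proposes a body. The function name, the regex, and the body are all written by hand for illustration. They are assumptions, not actual Copilot output, and real suggestions will differ.

```python
import re

# You type this comment and the signature below it:
# Check whether a string looks like a valid email address.
def is_valid_email(address: str) -> bool:
    # A body like this is the kind of completion an assistant might propose.
    # The pattern is a deliberately simple assumption: some text, an "@",
    # some text, a dot, and more text, with no spaces or extra "@" signs.
    pattern = r"^[^@\s]+@[^@\s]+\.[^@\s]+$"
    return re.match(pattern, address) is not None


if __name__ == "__main__":
    print(is_valid_email("dev@example.com"))
    print(is_valid_email("not-an-email"))
```

Whatever gets suggested, you still have to read it and decide whether it is correct for your use case.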
How do I get access to GitHub Copilot?
It's currently free for students who have the GitHub Student Developer Pack. It's also free for maintainers of popular open-source projects.
Lastly, you can get it through the 60-day free trial, which then transitions to either $10/month or $100/year.
Try it here
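For a quick comparison of the two paid plans mentioned above, here is a small sketch of the arithmetic. The prices come from this post; the constant and function names are just made up for the example.

```python
MONTHLY_PRICE_USD = 10   # price per month, as quoted in this post
YEARLY_PRICE_USD = 100   # price per year, as quoted in this post


def yearly_savings(monthly: int = MONTHLY_PRICE_USD,
                   yearly: int = YEARLY_PRICE_USD) -> int:
    """Return how much cheaper the yearly plan is than 12 monthly payments."""
    return monthly * 12 - yearly


if __name__ == "__main__":
    print(yearly_savings())  # 20
```

So if you already know you will use it all year, the yearly plan works out to $20 less than paying month by month.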
But is it good?
This depends on many factors, such as the language you code in and the complexity of your projects. Copilot was trained on open-source projects, where programming languages aren't evenly represented, so it's likely to be really good at specific languages and at certain common tasks (reversing an array, form validation code, etc.).
From a language perspective, expect it to perform really well in popular languages like Python, JavaScript, TypeScript, Ruby, Java, and Go. You should definitely give it a go (using the trial) to fully assess whether it's worth it, as this may be very subjective and depend on your use case.
Will Copilot steal my code?
This was a huge concern for me during the technical preview, as their privacy policy was a bit vague.
There are two concerns here:
1. Will Copilot be trained on my snippets and surrounding code?
Here, "surrounding code" means the model prompt: the information Copilot needs to give a suggestion, which might include other files in your IDE.
From my understanding, no, though it's not completely clear.
Here is a quote from GitHub on that:
Will my private code be shared with other users?
No. We use data, including information about which suggestions users accept or reject, to improve the model. We follow responsible practices in accordance with our Privacy Statement to ensure that your code snippets will not be used as suggested code for other users of GitHub Copilot.
The language here isn't very clear. They may mean one of two things:
- We use your code snippets so that humans can see which suggestions are wrong, and then improve the model by training it in that specific area.
Example: Copilot's suggestion acceptance rate for the Keras framework is low, so let's train Copilot on publicly available Keras code, and not on the suggestions/snippets it just failed on.
OR
- We feed your snippets back to Copilot to train it, but we have "responsible practices" in place to prevent it from spitting your code back out. The problem here is obviously that you can never know how good those measures are.
2. Will engineers at GitHub/Copilot be able to see the content of my code snippets?
From the above statement, it seems so.
But fear not, there is a setting to disable this.
If you disable it, I believe the only data collected for product improvement should be basic telemetry, such as usage statistics.