- The change expands the list of data sources the company uses to include platforms outside Google’s own services.
The new update from Google says that in addition to the collection of data that is publicly available online for AI data scraping, the company will also collect business information on websites for display on Google Services. This is a marked departure from conventional practices where companies would extract data from only their own platforms and services.
The update from Google came soon after OpenAI was subjected to a class-action lawsuit in California regarding scraping private data from the internet. OpenAI allegedly used data from social media, blogs, and other public websites to train ChatGPT without the consent of the users.
Web scraping has drawn growing attention, with platforms such as Reddit and Twitter being particularly vocal about their concerns. Twitter has already limited the number of tweets an account can view each day. Both Reddit and Twitter have eliminated free access to their APIs, even though such moves have proven controversial.
With Google and OpenAI setting new precedents for the use of data available online, internet users have to consider not only who can see their data but also how it can be used. In addition, the unregulated use of publicly available data raises concerns about the use of copyrighted materials and other forms of intellectual property.
With Google’s business primarily focused on collecting user data and selling it to advertisers, data scraping could arguably be considered a core aspect of its business practices.
Is your organization taking any measures to protect website content from AI chatbots? Let us know on LinkedIn, Twitter, or Facebook. We’d love to hear from you!
Image source: Shutterstock