OpenAI's GPT-4 using YouTube transcripts to improve its language model
By MYBRANDBOOK
OpenAI has used more than a million hours of YouTube video transcriptions to improve GPT-4, its sophisticated language model. Although it was aware of the possible legal repercussions, OpenAI defended its conduct as fair use that helps broaden its model's understanding of the world. OpenAI's President, Greg Brockman, was directly involved in selecting the training videos.
OpenAI says it uses "numerous sources including publicly available data and partnerships for non-public data." The company is also considering generating its own synthetic data. It had previously trained its models on data such as computer code from GitHub, chess move databases, and educational content from Quizlet. After those resources were depleted, it considered using transcriptions of YouTube videos, podcasts, and audiobooks.
The report also mentions that OpenAI had exhausted its supply of useful data by 2021. The company continues to seek out new data to improve its AI models.
Google spokesperson Matt Bryant said that the company has "seen unconfirmed reports" about OpenAI's use of YouTube transcripts. He added that Google's guidelines prohibit unauthorized scraping or downloading of YouTube content.
YouTube CEO Neal Mohan made similar comments this week regarding OpenAI's potential use of YouTube data to train its Sora video-generating model. Bryant also said that Google takes "technical and legal measures" to prevent unauthorized use when there is a clear legal or technical justification.