ChatGPT (Chat Generative Pre-trained Transformer)[1] is a chatbot launched by OpenAI in November 2022. It is built on top of OpenAI's GPT-3 family of large language models and is fine-tuned (an approach to transfer learning[2]) with both supervised and reinforcement learning techniques.

ChatGPT was launched as a prototype on November 30, 2022, and quickly garnered attention for its detailed responses and articulate answers across many domains of knowledge. Its uneven factual accuracy was identified as a significant drawback.[3] Following the release of ChatGPT, OpenAI was valued at $29 billion.[4]

Training

[Image: Pioneer Building, San Francisco, home of OpenAI HQ]
[Image: OpenAI CEO Sam Altman]

ChatGPT was fine-tuned on top of GPT-3.5 using supervised learning as well as reinforcement learning.[5] Both approaches used human trainers to improve the model's performance. In the supervised learning step, the model was provided with conversations in which the trainers played both sides: the user and the AI assistant. In the reinforcement learning step, human trainers first ranked responses that the model had generated in a previous conversation. These rankings were used to create "reward models", and the model was then further fine-tuned against them using several iterations of Proximal Policy Optimization (PPO).[6][7] PPO algorithms are a cost-effective alternative to trust region policy optimization (TRPO) algorithms: they avoid many of TRPO's computationally expensive operations and run faster.[8][9] The models were trained in collaboration with Microsoft on its Azure supercomputing infrastructure.
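The two reinforcement-learning ingredients described above, a reward model trained from human rankings and PPO fine-tuning, can be illustrated with a minimal Python sketch. It assumes a common pairwise formulation for reward-model training, -log(sigmoid(r_preferred - r_rejected)), and the standard PPO clipped surrogate objective. The function names and the clip range `epsilon=0.2` are hypothetical choices for illustration; this is not OpenAI's actual training code, which the sources above do not describe at this level of detail.

```python
import math
from typing import Sequence


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def pairwise_ranking_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Reward-model loss for one ranked pair: -log(sigmoid(r_preferred - r_rejected)).

    The loss shrinks as the reward model scores the human-preferred
    response higher than the rejected one.
    """
    # -log(sigmoid(d)) == log(1 + exp(-d)) == softplus(-d)
    return softplus(-(reward_preferred - reward_rejected))


def ppo_clipped_objective(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    epsilon: float = 0.2,  # hypothetical clip range, chosen for illustration
) -> float:
    """Mean PPO-Clip surrogate objective (to be maximized) over a batch of actions."""
    if not (len(logp_new) == len(logp_old) == len(advantages)) or not advantages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for new, old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the updated policy and the policy that sampled the action.
        ratio = math.exp(new - old)
        # Clipping keeps each update close to the old policy without the
        # constrained optimization step that TRPO performs.
        clipped_ratio = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # Ranking the higher-scored response as preferred gives the lower loss.
    print(pairwise_ranking_loss(2.0, 0.0), pairwise_ranking_loss(0.0, 2.0))
    # A doubled action probability with a positive advantage is capped at 1 + epsilon.
    print(ppo_clipped_objective([math.log(2.0)], [0.0], [1.0]))
```

Taking the minimum of the clipped and unclipped terms makes the objective a pessimistic bound: the policy gains nothing from pushing the probability ratio beyond the clip range, which is how PPO approximates TRPO's trust region with only first-order updates.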
In addition, OpenAI continues to gather data from ChatGPT users that could be used to further train and fine-tune ChatGPT. Users can upvote or downvote the responses they receive from ChatGPT; upon upvoting or downvoting, they can also fill out a text field with additional feedback.[10][11][12]

Features and limitations

While the core function of a chatbot is to mimic a human conversationalist, ChatGPT is versatile. Its abilities include writing and debugging computer programs; composing music, teleplays, fairy tales, and student essays; answering test questions (sometimes, depending on the test, at a level above the average human test-taker);[13] writing poetry and song lyrics;[14] emulating a Linux system; simulating an entire chat room; playing games like tic-tac-toe; and simulating an ATM.[15] ChatGPT's training data includes man pages and information about Internet phenomena and programming languages, such as bulletin board systems and the Python programming language.[15]

In comparison to its predecessor, InstructGPT, ChatGPT attempts to reduce harmful and deceitful responses.[16] In one example, while InstructGPT accepts the premise of the prompt "Tell me about when Christopher Columbus came to the US in 2015" as truthful, ChatGPT acknowledges the counterfactual nature of the question and frames its answer as a hypothetical consideration of what might happen if Columbus came to the U.S. in 2015, using information about Columbus' voyages and facts about the modern world, including modern perceptions of Columbus' actions.