This preview shows pages 730–733 of 747.

INDEX

See Neural language models
Language models vs. computational complexity reduction, 344–346
Late fusion in multimodal learning, 461–462
Layers
    AlexNet, 172
    alignment vectors, 401–403
    autoencoders, 453–454
    backpropagation, 81
    CBOW model, 346–347
    cheat sheet, 659
    combining, 181–185, 245–246
    continuous skip-gram model, 348–349
    convolutional neural networks, 175–179
    digit classification, 102, 121
    ELMo, 573
    Fast R-CNN, 544–545
    GloVe, 356–361
    GoogLeNet, 211, 213
    house prices example, 162–163
    image captioning, 424, 433–436
    LSTM, 278–280
    Mask R-CNN, 560–561
    multiple perceptrons, 19
    neural language models, 344–346
    neural machine translation, 379–384
    optimized continuous skip-gram model, 350
    output units, 154–155
    regularization, 167–168
    ResNet, 215–222
    RNNs, 242–245
    self-attention, 408
    semantic segmentation, 549–550, 554
    sequence-to-sequence learning, 366–368
    transfer learning, 228
    Transformer, 411
    unrolling, 246–247
    vanishing gradients, 136–141
    VGGNet, 206–209
    word embeddings, 316–319
    word2vec, 352–355
LDA (linear discriminant analysis), 533
leaky ReLU function, 139–140
Learning algorithm
    analytic motivation, 49–50
    geometric description, 51
    initialization statements, 8–9
    intuitive explanation, 37–41
    linear regression as, 519–523
    multiclass classification, 101
    perceptrons, 7–15
    ResNet, 216–217
    training loops, 10
    weight decay, 166
Learning curve plots, 481–482
Learning parameter tweaking, 143–146
Learning problem solutions with gradient descent, 44–48
Learning process with saturated neurons, 125
Learning rate
    gradient descent, 46
    learning algorithm, 8
Leibniz notation, 68
LeNet, 171, 201
__len__() method, 431
Linear algebra
    cheat sheet, 660
    perceptron implementation, 20–21
Linear classification
    plots, 53
    XOR, 528–530
Linear discriminant analysis (LDA), 533
Linear output units, 154–155, 159–160
Linear regression
    coefficients, 523–525
    curvature modeling, 522–523
    as machine learning algorithm, 519–523
    multivariate, 521–522
    R-CNN, 543
    univariate, 520–521
Linear separability, 15–16, 32, 56
load_data function, 455
load_img function, 224
Loading
    CIFAR-10 dataset, 191–192, 488
    digit classification datasets, 119–120
    GloVe embeddings, 356–357
    MNIST dataset, 94, 119, 465
Logistic function
    backpropagation, 269
    gradient computation, 70
    gradient descent, 61–67
Logistic output units, 154–155
Logistic regression
    classification with, 525–527
    support vector machines, 532–533
    XOR classification, 528–530
Logistic sigmoid function
    activation function, 136
    backpropagation, 269–270
    binary classification problems, 155–156
    classification with, 526–527
    digit classification, 121
    LSTM, 273, 275
    saturated output neurons, 130–133
Logistic sigmoid neurons, 615
Logistic sigmoid units, 453
Logit function, 155
Long short-term memory (LSTM), 265–266
    activation functions, 277–278
    alternative view, 280–281
    cell networks, 278–280
    character-based embedding, 572
    concluding remarks, 282–283
    ELMo, 572–574
    gradient health, 267–272
    GRUs, 613–615
    highway networks, 282
    image captioning, 433–434
    introduction, 272–277
    neural language models, 322
    neural machine translation, 379–384
    programming example, 291–298
    PyTorch vs. TensorFlow, 635
    sequence-to-sequence learning, 366–368
    skip connections, 282
Longer-term text prediction, 287–289
Loss functions
    autoencoders, 451, 457
    backpropagation, 269
    convolutional layers, 200
    digit classification, 122–124
    GPT, 581
    gradient computation, 70–71
    logistic regression, 527
    multiclass classification, 103–104, 158
    multitask learning, 471
    neural machine translation, 383–384
    output units, 154–155
    PyTorch vs. TensorFlow, 635
    saturated neurons, 130–136
    tweaking, 144–145
    weight decay, 166
LSTM. See Long short-term memory (LSTM)

M
Machine learning algorithm, linear regression as, 519–523
MAE (mean absolute error)
    autoencoders, 457–458
    book sales forecasting problem, 259–260
Magnitude of vectors, 44
Many-to-many networks in text autocompletion, 301
Many-to-one networks in text autocompletion, 301
Mask R-CNN, 559–561
MASK tokens in BERT, 585
Masked language model task in BERT, 582–583
Masked self-attention mechanism in GPT, 578


Term: Spring
Professor: N/A
Tags: Machine Learning, Artificial neural network, neural network, language models, Neural Language Models
