Megatron-Turing NLG: a neural network with 530 billion parameters for natural-language text | Gadgets News

The day before yesterday, Nvidia unveiled the Megatron-Turing NLG artificial neural network (ANN), developed in collaboration with Microsoft. Its distinguishing feature is an enormous number of parameters: 530 billion. For comparison, as Gadgets News has previously reported, the residual ResNet network created in 2015 (which reduced the ImageNet image-recognition error rate to 3.57%) consisted of 60 million parameters. Google's record, however, still stands unbroken: its network has as many as 1.6 trillion parameters, though little is known about it.

Megatron-Turing NLG has a transformer-type architecture, which has been the dominant approach to natural-language text processing since 2017 (before that, recurrent neural networks were mainly used for these tasks). In its press release, Nvidia lists the following tasks for Megatron-Turing NLG:

  • Predicting text completion
  • Reading comprehension
  • Commonsense reasoning
  • Natural language inference
  • Word sense disambiguation
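At the core of the transformer architecture behind all of these tasks is self-attention: every token's representation is updated as a weighted mixture of all the other tokens' representations. A minimal pure-Python sketch of a single attention head follows; this is an illustration of the mechanism only (real models like Megatron-Turing NLG use many heads with large learned weight matrices, which are omitted here):

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(queries, keys, values):
    """Scaled dot-product attention: each query scores every key,
    the scores become weights via softmax, and the output is the
    weighted mixture of the value vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [dot(q, k) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        mixed = [sum(w * v[i] for w, v in zip(weights, values))
                 for i in range(len(values[0]))]
        out.append(mixed)
    return out

# Three toy 2-dimensional token embeddings (hypothetical values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
result = attention(tokens, tokens, tokens)  # self-attention: Q = K = V
```

Since each output row is a convex combination of the value vectors, every coordinate stays within the range of the input coordinates; stacking many such layers (plus feed-forward blocks) is what gives the transformer its parameter count.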

Practical applications include completing program code and writing short summaries of articles or even entire books. In the hands of cybercriminals, systems like Megatron-Turing NLG can become a tool for generating fake news and other disinformation, as has already been demonstrated with GPT-3. And even with conscientious use, such models are not immune to learning from training texts that contain errors, stereotypes, and biases.

The neural network was trained on the Selene supercomputer, estimated to cost over $85 million. It comprises 560 Nvidia DGX A100 servers, each with eight Nvidia A100 graphics accelerators; Selene's peak performance is 79.215 PFLOPS (FP64). The training material consisted of 15 English-language text datasets (including Wikipedia) with a total volume of 270 billion tokens (characters or character combinations).
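A "token" here is a unit somewhere between a character and a word. Production models learn a subword vocabulary from data (byte-pair encoding and similar schemes); as a rough illustration of the idea only, here is a greedy longest-match tokenizer over a tiny hand-made vocabulary (both the function and the vocabulary are hypothetical, not the model's real tokenizer):

```python
def tokenize(text, vocab):
    """Greedy longest-match subword tokenization: at each position,
    take the longest vocabulary entry that matches, falling back to
    a single character when nothing matches."""
    tokens = []
    i = 0
    while i < len(text):
        for length in range(min(len(text) - i, 8), 0, -1):
            piece = text[i:i + length]
            if piece in vocab or length == 1:
                tokens.append(piece)
                i += length
                break
    return tokens

# Hypothetical toy vocabulary of frequent subwords.
vocab = {"trans", "form", "er", "neural", " net"}
print(tokenize("transformer", vocab))  # → ['trans', 'form', 'er']
```

Counting a corpus in such tokens rather than raw characters is why 15 datasets can compress to "270 billion tokens": frequent character combinations collapse into single vocabulary entries.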

In light of the latest news about neuromorphic processors, it is tempting once again to compare the newest software and hardware neural networks with the human brain. Recall that the tiny Intel Loihi 2 chip packs 1 million artificial neurons into an area of 31 mm². In theory, then, the Cerebras WSE-2 superchip (46,225 mm²), if built on the same Intel 4 process (formerly known as 7 nm), could accommodate almost one and a half billion artificial neurons. A data center of ten such superchips would thus nominally match the human neocortex in neuron count. In reality, of course, this is far from the case: a biological neuron is a very complex device, whose closest artificial analogue may be an entire ANN of its own. If such an ANN consists of, say, 1,000 artificial neurons, then instead of 10 Cerebras WSE-2 superchips, 10 thousand would be required.
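The back-of-the-envelope arithmetic above can be written out explicitly. Both the neocortex figure (~16 billion neurons is a commonly cited estimate) and the 1,000-artificial-neurons-per-biological-neuron factor are illustrative assumptions, as in the article:

```python
loihi2_neurons = 1_000_000    # artificial neurons on Intel Loihi 2
loihi2_area_mm2 = 31          # Loihi 2 die area, mm^2
wse2_area_mm2 = 46_225        # Cerebras WSE-2 die area, mm^2
neocortex_neurons = 16e9      # human neocortex, rough estimate

# Scale Loihi 2's neuron density up to WSE-2's area.
wse2_neurons = loihi2_neurons * wse2_area_mm2 / loihi2_area_mm2
print(f"{wse2_neurons:.2e}")  # ~1.49e9, almost 1.5 billion

# Superchips needed to match the neocortex neuron for neuron.
chips_naive = neocortex_neurons / wse2_neurons
print(round(chips_naive))     # ~11, i.e. on the order of ten

# If one biological neuron ~ an ANN of 1,000 artificial neurons:
chips_realistic = chips_naive * 1000
print(round(chips_realistic)) # on the order of ten thousand
```

The exact counts shift with the neocortex estimate, but the orders of magnitude (ten chips nominally, ten thousand under the more realistic assumption) match the article's figures.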

As for software neural networks, their parameters are the weights on the inputs of neurons. Their analogue in the living brain is the synapse, whose count is estimated in the hundreds of trillions, roughly a thousand times more than Megatron-Turing NLG has parameters.
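A quick sanity check on that ratio (the synapse count is a rough neuroscience estimate, not a precise figure; 500 trillion is taken here as a midpoint of "hundreds of trillions"):

```python
mtnlg_params = 530e9      # Megatron-Turing NLG parameters (weights)
brain_synapses = 500e12   # hundreds of trillions of synapses (rough estimate)

ratio = brain_synapses / mtnlg_params
print(round(ratio))       # ~943, i.e. roughly a thousand times more
```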

Thus, artificially reproducing the human brain will require, in addition to an understanding of how it is structured and how it works, roughly a thousand times more resources than can be realized today. At the same time, one cannot fail to note the speed at which all these technologies are developing. If (admittedly without any particular justification) we apply Moore's law to neuromorphic processors and ANNs, and assume that the size of hardware and software neural networks will double every two years, then the prerequisites for artificial reconstruction of the human brain would arise in about twenty years.
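The twenty-year estimate follows directly from the doubling arithmetic, under the article's own (avowedly unjustified) assumption of a clean two-year doubling period:

```python
import math

gap = 1000                  # brain is ~1000x beyond today's resources
doubling_period_years = 2   # Moore's-law-style assumption

doublings = math.log2(gap)  # how many doublings close a 1000x gap (~10)
years = doublings * doubling_period_years
print(round(years))         # ~20 years
```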
