No large-scale pre-training required: Tsinghua proposes TLM, an efficient NLP learning framework that trains from scratch yet rivals pre-trained language models

Recently, researchers from Tsinghua University developed a new NLP learning framework. Unlike the currently dominant paradigm of large-scale pre-training followed by task-specific fine-tuning, this framework does not require large-scale training. It is more efficient than popular pre-training frameworks, and on several kinds of NLP tasks its accuracy even exceeds that of general pre-training approaches. The result raises questions about large-scale pre-trained models and methods: how much does large-scale pre-training actually contribute to downstream tasks, and do we really need it to achieve the best results? The researchers call the new method TLM.


TLM vs. PLM. Overall, a PLM learns as much task-independent knowledge as possible at a very high cost, while TLM learns the knowledge relevant to each task at a very low cost (a minimal sketch of this task-driven data selection follows the list below). Comparing TLM with PLM highlights the following characteristics.

1. Promoting fairness and the democratization of NLP research

Pre-training itself relies heavily on massive computing resources, which limits most NLP researchers to working on fine-tuning algorithms, even though the upper bound of fine-tuning performance is largely constrained by the pre-trained model. TLM lets most researchers freely explore model architectures, loss functions, algorithms, and other aspects on top of state-of-the-art solutions, at lower cost and higher efficiency.

2. Efficiency

TLM significantly outperforms PLM in terms of average FLOPs consumed per task. When there are only a few target tasks to solve (for example, a researcher studying a small number of datasets), TLM can be very efficient; however, when a large number of tasks must be solved at once (for example, an industrial NLP platform serving many parties), PLM still has the advantage.

3. Flexibility

TLM is task-driven, so it gives researchers greater freedom to customize strategies for tokenization, sequence length, data representation, hyperparameter tuning, and so on, thereby improving both performance and efficiency.

4. Generality

PLM learns task-independent general representations that can be used for few-shot and zero-shot learning, whereas TLM trades some of this generality for efficiency by learning task-relevant representations. In this sense, TLM still needs further improvement in terms of generality. In addition, PLM and TLM can be combined to strike a better balance between generality and efficiency.
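The core efficiency idea can be illustrated with a short sketch: the labelled task data act as queries, and only the most similar documents from a large general corpus are kept for training from scratch. The toy corpus, function names, and the simple word-overlap scoring below are illustrative assumptions, not the authors' implementation (which the article does not detail).

```python
# A minimal, self-contained sketch of task-driven data selection:
# task examples are used as queries to pick a small, relevant slice
# of a large general corpus for from-scratch training.
from collections import Counter

def tokenize(text):
    """Lowercase whitespace tokenization -- deliberately simplistic."""
    return text.lower().split()

def overlap_score(query_tokens, doc_tokens):
    """Count how many query tokens (with multiplicity) appear in the document."""
    doc_counts = Counter(doc_tokens)
    return sum(min(c, doc_counts[tok]) for tok, c in Counter(query_tokens).items())

def select_task_relevant(task_examples, general_corpus, top_k=2):
    """For each task example, keep the top_k most similar corpus documents."""
    selected = set()
    for example in task_examples:
        q = tokenize(example)
        ranked = sorted(
            range(len(general_corpus)),
            key=lambda i: overlap_score(q, tokenize(general_corpus[i])),
            reverse=True,
        )
        selected.update(ranked[:top_k])
    return [general_corpus[i] for i in sorted(selected)]

if __name__ == "__main__":
    task_examples = ["the movie was a delight to watch"]  # labelled task data
    general_corpus = [
        "stock prices fell sharply after the announcement",
        "the film is a delight from start to finish",
        "new battery technology promises longer range",
        "critics praised the movie for its pacing",
    ]
    # Only this task-relevant slice is used for training from scratch,
    # which is where the per-task FLOPs savings over full-corpus
    # pre-training would come from.
    print(select_task_relevant(task_examples, general_corpus))
```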

To gain a deeper understanding of how TLM works, the researchers visualized the attention scores output by each attention head in the model. The attention patterns of TLM contain more "diagonal" patterns (red boxes in Figure 3), meaning that most tokens assign high attention scores to their neighboring tokens.
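A generic recipe for this kind of per-head attention visualization is sketched below using an off-the-shelf BERT checkpoint from the transformers library; the model name and example sentence are placeholders, not the checkpoints or code used in the paper. A "diagonal" pattern shows up as a bright band along the main diagonal of the heatmap, i.e. tokens attending mostly to their neighbors.

```python
# Plot the attention matrix of one head in one layer as a heatmap.
import matplotlib.pyplot as plt
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

inputs = tokenizer("Task-driven training can rival large-scale pre-training.",
                   return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: one tensor per layer, shaped (batch, heads, seq_len, seq_len)
layer, head = 0, 0
attn = outputs.attentions[layer][0, head]  # (seq_len, seq_len) attention scores
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

plt.imshow(attn, cmap="viridis")
plt.xticks(range(len(tokens)), tokens, rotation=90)
plt.yticks(range(len(tokens)), tokens)
plt.title(f"Layer {layer}, head {head} attention")
plt.colorbar()
plt.tight_layout()
plt.show()
```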

