DataSet

Data Competitions

Machine Learning Datasets

  • NLP

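    | Task | Type | Train size | Test size | Full name | Remarks |
    |------|------|------------|-----------|-----------|---------|
    | MNLI | Natural language inference | 392k | - | Multi-Genre Natural Language Inference | Textual entailment: given a premise, decide whether the hypothesis holds |
    | QQP | Semantic textual similarity / paraphrase identification | 363k | - | Quora Question Pairs | Decide whether two sentences are semantically equivalent; binary classification |
    | QNLI | - | 108k | - | Question Natural Language Inference | Derived from the SQuAD 1.0 dataset; given a question, decide whether the given text contains the correct answer; binary classification |
    | SST-2 | Sentiment analysis | 67k | - | - | - |
    | CoLA | - | 8.5k | - | The Corpus of Linguistic Acceptability | Decide whether a given sentence is grammatically correct; binary classification |
    | STS-B | Semantic textual similarity | 5.7k | 1.4k | Semantic Textual Similarity Benchmark | Rate the semantic similarity of two sentences with a score from 1 to 5; regression or 5-class classification |
    | MRPC | Paraphrase identification (classify sentence pairs as paraphrases or not) | 4.1k | 1.7k | Microsoft Research Paraphrase Corpus | Decide whether two given sentences have the same meaning; binary classification |
    | RTE | Natural language inference | 2.5k | - | Recognizing Textual Entailment | Decide whether one sentence can be inferred from, or aligned with, the other |
    | WNLI | Natural language inference | - | - | Winograd Natural Language Inference | Textual entailment task |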
    • SentEval: an evaluation toolkit for sentence embeddings, bundled with several corpora
    • ChineseSTS: Chinese text similarity
    • DMQA: English reading comprehension
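
The remarks in the NLP table say which tasks are binary classification and that STS-B can be treated as regression. As a rough sketch of how that information might be used when choosing a model output head, the snippet below stores a few rows in a small lookup table. The names `TaskInfo`, `TASKS`, and `head_kind` are hypothetical and belong to no library. Only rows whose remarks state the label setup are filled in; the rest are left as `None` rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskInfo:
    """One row of the NLP table (hypothetical helper, not a library type)."""
    name: str
    train_size: Optional[str]  # as listed in the table, e.g. "392k"; None if "-"
    num_labels: Optional[int]  # 1 means regression; None means not stated in the table


# Label counts come from the "Remarks" column only.
# STS-B is stored as regression here; the table also allows a 5-class reading.
TASKS = {
    "MNLI": TaskInfo("MNLI", "392k", None),
    "QQP": TaskInfo("QQP", "363k", 2),
    "QNLI": TaskInfo("QNLI", "108k", 2),
    "SST-2": TaskInfo("SST-2", "67k", None),
    "CoLA": TaskInfo("CoLA", "8.5k", 2),
    "STS-B": TaskInfo("STS-B", "5.7k", 1),
    "MRPC": TaskInfo("MRPC", "4.1k", 2),
    "RTE": TaskInfo("RTE", "2.5k", None),
    "WNLI": TaskInfo("WNLI", None, None),
}


def head_kind(task_name: str) -> str:
    """Return 'regression' or 'classification' for a task with a known label setup."""
    info = TASKS[task_name]
    if info.num_labels is None:
        raise ValueError(f"label setup for {task_name} is not stated in the table")
    return "regression" if info.num_labels == 1 else "classification"


def binary_tasks() -> list:
    """Names of tasks the table explicitly marks as binary classification."""
    return sorted(name for name, info in TASKS.items() if info.num_labels == 2)


if __name__ == "__main__":
    print(head_kind("STS-B"))
    print(binary_tasks())
```

Keeping the unstated rows as `None` makes the gaps in the table visible in code, so a caller gets an explicit error instead of a silently assumed label count.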
  • CV

  • GNN

  • Chatbot