Cramming BERT, 2022-12-28
1 SCALING UP AND SCALING DOWN
Large-scale training of machine learning models with transformer architectures has led to groundbreaking improvements in many subfields of natural language processing, including language understanding and natural language generation (Vaswani et al., 2017; Dosovitskiy et al., 2021; Radford et al., 2019). The now-accepted (but historically surprising) key behavior of these systems is that they scale reliably: performance keeps improving as the number of model parameters and the amount of data grow. As studied by Kaplan et al. (2020), these gains are well described by various power laws. This sets up a dominant paradigm in which scaling is the key to performance improvement (Sutton, 2019). The power of scale has set off a race to produce extremely large models, which in turn has created an environment in which few researchers or practitioners feel capable of training a language model. The original BERT model (Devlin et al., 2019), which became the cornerstone transformer for many practical applications in natural language understanding, already required a significant amount of computation to train. Yet the reproduction and improvements of Liu et al. (2019) increased its performance further by raising the level of computation by orders of magnitude. As these pretrained checkpoints became popular for a range of downstream applications (Wolf et al., 2020), the competition for the largest language model became a focal point for industrial labs. This led to training runs that improved the performance of pretrained language models at the expense of computation at the zettaFLOP scale (Raffel et al., 2020; Yang et al., 2020; Zaheer et al., 2021) and later at the extremely large yottaFLOP scale (Brown et al., 2020; Black et al., 2022; Chowdhery et al., 2022; Rae et al., 2022).
Our goal is to turn this trend on its head and investigate how best to scale down language model training and what trade-offs emerge when doing so: what downstream performance can a modest researcher achieve when training from scratch on a single GPU for a single day? The ability to train a language model to the performance level of BERT with such modest resources has several interesting implications. For one, if scaled-down model pretraining is a viable analogue of large-compute pretraining, it opens up a host of further academic investigations that are currently hard to realize for large-scale models. Examples are research questions about the differences between existing and new pretraining tasks, tracing model predictions to data points (Ilyas et al., 2022), security questions such as membership inference (Carlini et al., 2022) and data poisoning (Geiping et al., 2021), and a wide range of empirical investigations into topics such as stability or generalization that arise during training (Nagarajan & Kolter, 2019; Jiang et al., 2019). At the same time, we can imagine situations in which legal requirements make it unclear whether models trained on public data of uncertain origin are permissible, and in which a practitioner is interested in retraining a language model on a specialized or trustworthy data source (Wilka et al., 2017; Gold & Latonero, 2017). In addition, we are motivated to benchmark the overall conceptual progress of research in this area over the last years, beyond simply turning the scaling knob. The goal of achieving BERT-like performance with modest training resources would have seemed unthinkable in 2018, yet with modern advances and transformer training techniques it may now be possible.
To answer these questions, we consider a challenge we call "Cramming": learning a whole language model the day before the test. Our study begins by investigating many facets of the training pipeline to see which modifications actually improve performance in the scaled-down scenario. We provide evidence that even in this constrained setting, performance closely follows the scaling laws observed in large-compute settings. An unsurprising consequence of these laws is that scaling down is hard: smaller model architectures speed up gradient computations, but the overall rate of model improvement over time remains nearly constant. Nonetheless, we can find changes to the training recipe that exploit the scaling laws, yielding improvements by raising the effective rate of gradient computations without compromising model size. In the end, we are able to train models on a shoestring budget that achieve respectable performance on GLUE tasks, often close to BERT and sometimes exceeding it.
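The paper argues for the value of making pretraining accessible to many researchers. In outline, it scales the model down and checks how changes to various parts of the pipeline affect the result. It then finds changes to the training recipe that exploit the scaling laws by raising the effective speed of gradient computation without sacrificing model size. Concretely, under the constraint of one GPU for one day, it pretrains a model that matches or sometimes exceeds BERT on GLUE tasks.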
2 TYING OUR HANDS BEHIND OUR BACK: A SETUP WITH LIMITED COMPUTE
Before we start this investigation, we want to outline the extent of the limitations we are interested in. The rules for cramming are as follows:
A transformer-based language model of arbitrary size is trained with masked language modeling, completely from scratch.
Existing pretrained models cannot be included in any part of the pipeline.
Any raw text (excluding downstream data) can be included for training. This means that speedups can be achieved by making judicious choices about how and when to sample data, provided the sampling mechanism does not require a pretrained model.
The downloading and preprocessing of raw data is exempt from the total compute budget. Preprocessing may include CPU-based tokenizer construction, tokenization, and filtering, but cannot include representation learning (for example, pretraining a word embedding is not allowed unless it is counted towards the final runtime).
Training proceeds on a single GPU for 24 hours.
Downstream performance is evaluated on GLUE (Wang et al., 2018).
Downstream finetuning on GLUE is limited to brief training with only the training data of the downstream task (we consider 5 epochs or less) and has to work with hyperparameters set globally for all GLUE tasks. Downstream finetuning is excluded from the total compute budget.
In our implementation, we analyze both a setup with a classical rtx2080ti GPU (released September 2018) and separate setups with a more modern rtxa4000 or rtxa6000 GPU (released October 2020). We pair each unit with 4 CPU cores and 32GB of RAM.
Why these limitations? We are principally interested in re-investigating the original BERT setup of Devlin et al. (2019) with limited compute. The optimal architecture of the transformer is not fixed, as the optimal size and shape depend on scaling laws (Kaplan et al., 2020). The limitation on the use of existing models rules out distillation from an existing model (Turc et al., 2019; Jiao et al., 2020; Sun et al., 2020; Wang et al., 2020b; Kaliamoorthi et al., 2021) and data filtering based on existing large models (Golchin et al., 2022), both of which ultimately answer questions about the compression and transfer of already processed information. Further, we do not want to limit data to the original dataset used to train BERT, so as to allow for possible improvements through better data curation and quality. The rtx2080ti GPU is a natural candidate for this experiment, given that it was released before Devlin et al. (2019), but the more recent rtxa4000 is also interesting as a more recent consumer-grade workstation variant. Finally, we also test the rtxa6000 as the upper limit of a single-user workstation. At the finetuning stage we want to mimic the original BERT finetuning and evaluation setup, but with additional limits that rule out gains from tuning only the downstream procedure, for example computationally expensive downstream training (Bahri et al., 2021a), the use of multiple downstream datasets (for example, continued pretraining on MNLI before finetuning on other tasks (Izsak et al., 2021)), and extended hyperparameter optimization for each GLUE task (Devlin et al., 2019; Liu et al., 2019; Lan et al., 2019).
Detailed specs are given. The GPUs are the RTX 2080Ti, RTX A4000 and RTX A6000; the realistically priced ones are probably the RTX 2080Ti and the RTX A4000. This is within what can be verified on a Colab GPU as well (although running it for 24 hours is tough). The CPU has 4 cores and the host memory is 32GB.
3 RELATED WORK ON EFFICIENT TRANSFORMERS
How long does it take to train BERT? In general, this question is hard to answer, because hardware and software setups vary wildly and measures of efficiency differ (Dehghani et al., 2021). An upper bound on the compute of a training run can be established by finding the total number of (low-precision) floating point operations available over the wallclock budget of the run. This peak is never reached in actual computation, even for highly optimized models (Chowdhery et al., 2022), but it represents the paid budget required to realize a training run. Table 1 summarizes the budgets of several selected training runs. After the initial training run for BERT on TPUs, early reactions estimated up to 11 days of compute for comparable results on GPUs (Dettmers, 2018). However, sustained improvements, especially in software, have reduced this upper bound significantly (You et al., 2019; Narasimhan, 2019). Yet recipes and implementations generally require entire server nodes (for GPUs) or TPU slices, and target larger BERT architectures.
Other work discussing improvements to BERT targets compute settings closer to the original BERT; for example, SqueezeBERT (Iandola et al., 2020) employs 8 Titan RTX cards for four days. Sellam et al. (2022) note that the original BERT training run is an outlier and that doubling its training time reproduces the original results more reliably. Our central point of comparison for BERT training with limited resources is the work of Izsak et al. (2021), who also attempt to train BERT within 24 hours under broadly similar limitations, but use a full server node with 8 V100 GPUs. Izsak et al. (2021) choose a BERTLARGE architecture variant and train with a sequence length of 128, including a range of tweaks such as modified learning rate schedules, large batch sizes, sparse prediction and packed sequences. We re-evaluate this setup as a baseline setting for our own compute budget (which is about 15x smaller).
Studies of efficient transformers
Recent years have seen a flurry of research working to improve and modify the transformer architecture proposed in Vaswani et al. (2017); see Treviso et al. (2022) for a recent categorization and review of research in this area. Several meta-studies have investigated proposed improvements and modifications. Narang et al. (2021) evaluate a wide range of architectural modifications applied to the T5 model pipeline of Raffel et al. (2020), on tasks in both language understanding and translation. The encoder-decoder structure of T5 is closer in spirit to the original transformer setup, but is understood to behave similarly to BERT when the encoder component is used (Liu et al., 2021a). Evaluating modifications with 1.75 days of compute on TPU slices, they find that most improvements do not reliably materialize as gains in final accuracy. Tay et al. (2021) work in the same setting and evaluate the optimal shape of T5-derived architectures and their relative effect on downstream performance as models are scaled. Further exploration of the scaling behavior of various architectural improvements in Tay et al. (2022a) finds that hardly any modification outperforms the original architecture of Vaswani et al. (2017) at all scales, especially when evaluating downstream accuracy. The meta-study of Scao et al. (2022), which investigates improvements in preparation for extreme-scale training, focuses on minor modifications to layout, positional embeddings and data sources for autoregressive models, and other extreme-scale training runs have so far been similarly conservative in their settings (Brown et al., 2020; Black et al., 2022; Rae et al., 2022). In general though, these evaluations target larger compute settings than we intend to use, and are concerned with whether improvements (often from academic sources and proposed with small-scale evaluations) translate to larger scales.
In this work, we set aside the question of (up)scaling and focus only on the limited compute.
Scaling laws
The difficulty of finding tangible improvements is echoed in the scaling laws of Kaplan et al. (2020). Over a wide range of transformer model shapes, Kaplan et al. (2020) find that only model size (as the number of parameters in non-embedding layers) strongly predicts performance. Further, for a fixed compute budget an optimal model size can be derived, but performance is only mildly connected to model size: larger models process less data per unit of compute, but improve faster by almost the same margin. While the precise coefficients and shape of these scaling laws continue to be iterated on (Hoffmann et al., 2022) and adapted to related settings (Bansal et al., 2022; Clark et al., 2022; Bahri et al., 2021b), their overall logic appears hard to escape, even if power laws fit observations less well at small scales.
Remarks on related work. I do not yet fully understand what this part is trying to say.
4 INVESTIGATIONS
For our experimental evaluation, we implement a considerable number of proposed modifications to the setup of Devlin et al. (2019) and test them for their merits in our limited compute setting, as described in Section 2. We first clarify the common implementation and initial data setup, and then investigate architectural, training and dataset improvements.
4.1 Implementation details
We implement everything in PyTorch (Paszke et al., 2017) and, to limit our gains from the "software lottery" (Hooker, 2021), we do not use specialized implementations, which would further bias results towards well-established components. We keep everything at the implementation level of the PyTorch framework, allowing only automated operator fusion (Sarofeen et al., 2022) that can be applied to all components. Only after choosing a final architecture variant do we re-enable the efficient attention kernel described in Dao et al. (2022). We run all experiments and ablation studies with the same setting of automated mixed precision (Micikevicius et al., 2018) for standard 16- and 32-bit floating point precision (over full 32-bit float, scaled 16-bit (Rasley et al., 2020) and pure bfloat16 (Wang & Kanwar, 2019)). We find no benefit from offloading (Ren et al., 2021; Rasley et al., 2020) in our setting.
Initial data setup
We start our investigation with a close analogue of the original raw text sources of Devlin et al. (2019), using a recent dump of the English Wikipedia (20220301.en) and the English bookcorpus, noting the commentary of Tan (2019); Bandy & Vincent (2021). We force all text into lower case, strip accents and non-ASCII characters, and create an English tokenizer from scratch based only on this data. We choose WordPiece with a vocabulary size of 2^15 = 32768 (Wu et al., 2016). We found no significant change in performance with BPE (Sennrich et al., 2016) or SentencePiece with Unigrams (Kudo, 2018; Kudo & Richardson, 2019). Smaller vocabulary sizes (2^12, 2^13, 2^14) resulted in worse performance, while a larger vocabulary size (2^16) was not reliably better. We pack tokenized data into randomized sequences of length 128 and separate unrelated fragments with <sep>. The performance impact of dropping this separator was minimal. We also observed no impact from including the <cls> token in pretraining. The shorter sequence length is sufficient for the downstream applications we are targeting and simplifies attention computations. Packing data into full sequences limits us to simpler sequence losses, but uses the available compute optimally (Liu et al., 2019; Izsak et al., 2021). For the targeted compute settings, this sequence length results in micro-batch sizes of 64 to 96 for most variations of the base BERT architecture on the rtx2080ti, which we accumulate into larger batch sizes. With our limited compute budget, this results in single-epoch training in which no data point is revisited (Komatsuzaki, 2019; Hernandez et al.
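Only automated operator fusion is allowed, but I do not know what that is. The efficient attention kernel described in Dao et al. (2022) is re-enabled. The numerical precision follows common practice in prior work. WordPiece with a vocabulary size of 2^15 is chosen; changing the vocabulary size did not give better results. Shortening the sequence length simplifies the computation, as long as it remains sufficient for the downstream processing.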
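The packing step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the separator id `SEP_ID`, the helper name `pack_sequences`, and the choice to randomize by shuffling the finished blocks are all assumptions made for the example.

```python
import random

SEQ_LEN = 128  # sequence length stated in the text
SEP_ID = 3     # hypothetical id of the separator token; depends on the tokenizer


def pack_sequences(documents, seq_len=SEQ_LEN, sep_id=SEP_ID, seed=0):
    """Concatenate tokenized documents, separated by sep_id, into full blocks.

    documents is a list of lists of token ids. The result is a shuffled list
    of blocks with exactly seq_len ids each; the incomplete tail is dropped.
    """
    stream = []
    for doc in documents:
        stream.extend(doc)
        stream.append(sep_id)  # mark the boundary between unrelated fragments
    n_blocks = len(stream) // seq_len
    blocks = [stream[i * seq_len:(i + 1) * seq_len] for i in range(n_blocks)]
    random.Random(seed).shuffle(blocks)  # assumed form of "randomized sequences"
    return blocks


# Three toy "documents" with 200, 100 and 100 token ids.
docs = [list(range(100, 300)), list(range(300, 400)), list(range(400, 500))]
blocks = pack_sequences(docs)
```

Because every block is full, no padding tokens are needed and each forward pass does useful work on all 128 positions, which is the point of packing under a fixed time budget.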
4.2 Modifying the architecture
The most obvious way to scale down training efficiently is to modify the model architecture; intuitively, it seems likely that smaller, lower-capacity models will be optimal in the cramming regime. In this section, we study the relationship between model type and training efficiency.
We see that scaling laws create a strong barrier to scaling down. Per-token training efficiency depends strongly on model size, but not on transformer type. Furthermore, smaller models learn less efficiently, which largely cancels any throughput gains. Fortunately, because training efficiency is nearly constant across models of the same size, we can boost performance by finding architecture modifications that speed up gradient computation while keeping the parameter count nearly constant. This makes architecture selection fairly straightforward, as design choices can be made primarily on the basis of how they affect the computation time of a single gradient step.
Scaling laws hold in the low-resource regime
A large body of research in recent years has developed architectural improvements to speed up the original transformer. Many of these methods have not been found to improve training for the large-scale T5 architecture (Narang et al., 2021; Tay et al., 2022a). But in the low-compute setting, where data throughput is of utmost importance, maybe they are the way forward? Scaling laws have been observed by Kaplan et al. (2020) in the high-resource regime, and seem to hold strongly in the limit as resources grow. Surprisingly, these laws also hold in the limit of extreme compute down-scaling, and they create a barrier to low-cost training. Figure 1 exemplifies the effect of scaling laws for many transformer variants from the literature, where each architecture variant is trained with optimized training hyperparameters as described in Section 4.3. We apply these architecture variants to a shared baseline model that incorporates pre-normalization and rotary embeddings. Figure 1 visualizes the progress of the MLM loss against the total number of tokens ingested, with all architectures run under the same time budget. We observe that varying the transformer type and size has only minimal impact on the final loss after 24 hours. Models with more parameters learn more efficiently, as their MLM loss decreases faster per gradient step. However, smaller architectures make up for their slower learning with higher throughput, and thus process more tokens within the limited budget. Figure 1 shows that the differences between architectures are unpredictable during an initial stage of training (the first 1B tokens), after which the per-token efficiencies differ only by a multiplicative constant (a horizontal shift on the log axis). This constant depends almost entirely on model size, not model type, so that all choices reach an MLM loss of around 1.9 at the end of training.
Exploiting the scaling law
Because per-token performance is tightly coupled to model size, the scaling laws seem to bar us from making large gains through major changes to transformer size and type. As a result, we find no improvements when using a funnel-transformer architecture (Dai et al., 2020; Nawrot et al., 2022), when dropping FFN layers (Sridhar et al., 2022), or when using recurrent layers (Lan et al., 2019), even when trained with BPTT as in Schwarzschild (2021). Rescaling architectures to be deep-narrow (Tay et al., 2021; Wies et al., 2021) provides no gains either. While this principle closes one door to scaling down efficiently, it opens another: because per-gradient efficiency is nearly constant for all models of the same size, we can exploit the scaling laws by quickly searching for architectural choices that speed up computation while keeping model size roughly constant. A number of obvious optimizations fall into this category; we describe them below, together with several other tweaks that provide marginal but worthwhile (or free) gains.
Attention block
We disable all QKV biases (Dayma et al., 2021). This exploits the scaling law by removing a layer of computation, making the forward and backward pass somewhat faster while keeping the model size nearly constant. We find that gradient costs can be decreased by reducing the number of attention heads (Merity, 2019; Araabi & Monz, 2020; Liu et al., 2021b; Javaheripi et al., 2022), as this parallelizes better on the GPU and provides a slight performance boost. Yet reducing the number of heads also decreases finetuning performance, so we ultimately keep all 12 heads. We find no benefit from replacements of the softmax operation (Richter & Wattenhofer, 2020). We further keep the original multi-head self-attention mechanism. A large amount of work has focused on efficient attention (Sukhbaatar et al., 2019; Beltagy et al., 2020; Wang et al., 2020a; Liu et al., 2021c) and on studies of efficient attention (Tay et al., 2020a;b). But because we set the maximal sequence length to 128, attention complexity is less of a concern in our setting. To verify this, we implement the recently proposed FLASH mechanism (Hua et al., 2022), but find no benefits. We further experiment with Fourier attention as proposed in Lee-Thorp et al. (2021), but find no improvements. Rotary embeddings (Su et al., 2021; Black et al., 2022) provide small benefits, but these are cancelled out by the drop in speed, so we ultimately decide against them.
Only here did I finally realize that the sequence length, normally 512, has been cut to 128. This looks like it has a large effect. Also, the biases of the QKV projections are disabled (what were they originally?).
Feedforward block
We find empirical gains from disabling all linear layer biases (Dayma et al., 2021). Just as for the attention layers, this leverages the scaling law by accelerating gradient computation without a noticeable impact on model size. As a result, we get higher throughput without compromising the rate at which the model improves. We keep the original feedforward block largely unchanged, finding no benefit from switching to an activation other than GELU. We do see small improvements from re-ordering the block into a gated linear unit (Dauphin et al., 2017). In contrast to other work, e.g. (Black et al., 2022), we do not increase the number of parameters in the FFN block to compensate for the reduction of the hidden dimension due to gating.
Embedding
We implement scaled sinusoidal positional embeddings as described in Hua et al. (2022), finding incremental benefits over learned or unscaled sinusoidal embeddings. We see no improvement from decoupling the input and output embeddings (Chung et al., 2020). The suggestion of Lan et al. (2019) to factorize the input embedding provides no gains in our setting. We include a layer normalization at the end of the embedding block.
Biases of the linear layers are disabled. Scaled sinusoidal positional embeddings are implemented and give incremental benefits over learned or unscaled sinusoidal embeddings. This may be a key point.
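A quick count makes the claim "removing biases leaves the model size nearly constant" concrete. The sketch below assumes the usual BERT-base dimensions (hidden size 768, feedforward size 3072, 12 layers), which are not stated in these notes, and counts only the linear layers of the attention and feedforward blocks; embeddings, layer norms and the gated variant of the FFN are ignored.

```python
HIDDEN = 768   # assumed BERT-base hidden size
FFN = 3072     # assumed BERT-base feedforward size
LAYERS = 12    # assumed number of transformer layers


def linear_params(d_in, d_out, bias):
    """Parameter count of one linear layer, with or without a bias vector."""
    return d_in * d_out + (d_out if bias else 0)


def encoder_params(bias):
    """Linear-layer parameters of the attention and feedforward blocks."""
    attention = 4 * linear_params(HIDDEN, HIDDEN, bias)  # Q, K, V and output projection
    ffn = linear_params(HIDDEN, FFN, bias) + linear_params(FFN, HIDDEN, bias)
    return LAYERS * (attention + ffn)


with_bias = encoder_params(bias=True)
without_bias = encoder_params(bias=False)
bias_fraction = (with_bias - without_bias) / with_bias
print(f"{with_bias - without_bias} bias parameters, {bias_fraction:.4%} of the total")
```

Under these assumptions the biases account for less than a tenth of a percent of the linear-layer parameters, so dropping them removes operations from every forward and backward pass while the size that the scaling law depends on stays practically unchanged.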
Layer structure
As observed in many studies, we find that pre-normalization with Layer Norms is beneficial over post Layer Norms (Baevski & Auli, 2018; Xiong et al., 2020). We see no additional benefit from other variants of this modification, such as (Liu et al., 2020b; Shleifer et al., 2021). Further, replacing Layer Normalization with RMS Normalization provides no gains (Zhang & Sennrich, 2019). We note that the key effect of pre-normalization is to stabilize training and thereby enable larger learning rates and reduced warmup, and that we see limited benefit from including it by itself. We also see no benefit from stochastically dropping entire layers as described in (Zhang & He, 2020).
"Pre-normalization with Layer Norms is beneficial over post Layer Norms" is another key point.
Head block
We find that the nonlinear head can be removed without ill effect. We can further drop the decoder bias (Radford et al., 2019) and gain memory by using sparse token prediction (Liu et al., 2019; Izsak et al., 2021). We add a final Layer Norm to stabilize training further.
There are several key points here.
4.3 Modifying the training setup
We study the impact of training hyperparameters on the BERT-base architecture. The original BERT training recipe understandably results in poor model performance in the cramming setting, so we revisit a number of standard choices.
Objective
We train with masked language modeling only, on fully packed blocks of tokens with a masking rate of 15% and the original setup of Devlin et al. (2019), in which 10% of all masks are filled with random words and 10% are left unchanged. We see no improvement from masking at larger rates, e.g. the 40% proposed in (Wettig et al., 2022); see the appendix. We also see no difference between enabling and disabling the 20% rule mentioned above. We evaluate other functions for the masked-language objective, such as mean-squared error (Hui & Belkin, 2021) or L1 loss, but find no benefits.
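The masking rule can be written down directly. The following is a small sketch under stated assumptions: the mask token id `MASK_ID`, the ignore label `-100` (a common convention for positions that do not enter the loss) and the function name `mask_tokens` are illustrative choices, not details taken from the paper.

```python
import random

MASK_ID = 4         # hypothetical id of the mask token
VOCAB_SIZE = 32768  # vocabulary size stated in the text
IGNORE = -100       # assumed label for positions that do not contribute to the loss


def mask_tokens(token_ids, rng, mask_rate=0.15, random_frac=0.10, keep_frac=0.10):
    """Select mask_rate of the positions as prediction targets.

    Of the selected positions, random_frac receive a random token, keep_frac
    keep the original token, and the rest are replaced by MASK_ID.
    """
    inputs = list(token_ids)
    labels = [IGNORE] * len(inputs)
    n_selected = round(mask_rate * len(inputs))
    for pos in rng.sample(range(len(inputs)), n_selected):
        labels[pos] = token_ids[pos]  # the model must predict the original token
        roll = rng.random()
        if roll < random_frac:
            inputs[pos] = rng.randrange(VOCAB_SIZE)  # random replacement
        elif roll < random_frac + keep_frac:
            pass                                     # leave the token unchanged
        else:
            inputs[pos] = MASK_ID                    # ordinary mask
    return inputs, labels


rng = random.Random(0)
tokens = [rng.randrange(5, VOCAB_SIZE) for _ in range(128)]
inputs, labels = mask_tokens(tokens, rng)
```

Calling `mask_tokens(tokens, rng, random_frac=0.0, keep_frac=0.0)` corresponds to disabling the 20% rule, which the text reports makes no measurable difference.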
Choice of optimizer
We keep Adam (Kingma & Ba, 2015) as the optimizer of choice, with a weight decay of 0.01 as described in (Loshchilov & Hutter, 2017), β1 = 0.9, β2 = 0.98 and ε = 10^-12. To stabilize training at no extra cost, we include gradient clipping at a clip value of 0.5. We find no noticeable change from varying these parameters by reasonable amounts. We tested other first-order adaptive optimizers (Shazeer & Stern, 2018; Liu et al., 2020a) but found no advantage in our setting. We further find no advantage from higher-order optimizers (Yadav, 2020; Anil et al., 2021), but note that implementations vary widely, especially for higher-order optimizers.
Learning rate schedule and peak
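Following the advice of Izsak et al. (2021), we re-scale the learning rate schedule so that it is tied to our budget and the learning rate decays as the budget runs down to zero. Interestingly, we observe in Figure 2 that while a large number of learning rate shapes lead to similar reductions in loss globally, some gains can be made through the choice of schedule. We find that a simple one-cycle learning rate (Smith & Topin, 2018) with a peak learning rate of 10^-3 leads to the lowest pretraining loss within our budget.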
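A budget-tied one-cycle schedule can be sketched as a function of elapsed wallclock time rather than step count. The triangular shape and the position of the peak (`peak_at`, here halfway through the budget) are assumptions for illustration; the text only fixes the peak value of 10^-3 and the decay to zero at the end of the budget.

```python
PEAK_LR = 1e-3  # peak learning rate stated in the text


def one_cycle_lr(elapsed_seconds, budget_seconds=24 * 3600, peak_lr=PEAK_LR, peak_at=0.5):
    """Triangular one-cycle schedule driven by the consumed share of the budget.

    peak_at is the fraction of the budget at which the peak is reached; its
    value here is an assumption, not taken from the paper.
    """
    progress = min(max(elapsed_seconds / budget_seconds, 0.0), 1.0)
    if progress < peak_at:
        return peak_lr * progress / peak_at               # linear warmup
    return peak_lr * (1.0 - progress) / (1.0 - peak_at)   # linear decay to zero


for hours in (0, 6, 12, 18, 24):
    print(hours, one_cycle_lr(hours * 3600))
```

Driving the schedule by time means a slower architecture simply takes fewer steps under the same curve, so the learning rate still reaches zero exactly when the budget ends.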
Batch size schedule
A particularity of our setting is that, because we are limited to a single GPU, the micro-batch size that fits onto this GPU (96 for most experiments) is several times smaller than the optimal batch size. We find that the optimal batch size in this setting is around 1536 for minimal pretraining loss, but 4032 for maximal downstream performance on the 2080ti. That is, on the 2080ti we accumulate gradients and perform an update only every 16 or 42 forward/backward passes, respectively. For the larger A4000 and A6000 cards, this corresponds to a micro-batch size of 128/256 and a final batch size of 4096, which we again accumulate. Fortunately, we can find small speedups by using an aggressive batch size schedule, in which the number of accumulated micro-batches grows linearly over the course of training. This results in more progress early in training and a small benefit in performance. We also experiment with automatic and adaptive batching rules (De et al., 2017; Bollapragada et al., 2018a;b), but find that the best results from these adaptive schedules resemble the fixed linear schedule. For simplicity, we stick to the simpler linear schedule.
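The accumulation counts quoted above follow from simple division, and a linear batch size schedule can be expressed as a ramp on the number of accumulated micro-batches. In this sketch the starting point of the ramp (one micro-batch) and the rounding are assumptions; the names `accumulation_steps` and `scheduled_accumulation` are made up for the example.

```python
MICRO_BATCH = 96  # micro-batch size on the 2080ti, as stated in the text


def accumulation_steps(target_batch, micro_batch=MICRO_BATCH):
    """Number of forward/backward passes accumulated per optimizer update."""
    return target_batch // micro_batch


def scheduled_accumulation(progress, final_steps):
    """Linear ramp of the accumulation count over training.

    progress is the consumed fraction of the budget in [0, 1]. Starting the
    ramp at a single micro-batch is an assumption for illustration.
    """
    progress = min(max(progress, 0.0), 1.0)
    return max(1, round(progress * final_steps))


print(accumulation_steps(1536))  # 16 passes per update
print(accumulation_steps(4032))  # 42 passes per update
print([scheduled_accumulation(p, 42) for p in (0.0, 0.25, 0.5, 1.0)])
```

Small effective batches early on mean more optimizer updates per hour while the loss is still dropping quickly, and the large final batch is reached only towards the end of the budget.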
Dropping dropout
The original BERT model of Devlin et al. (2019) includes dropout as in Vaswani et al. (2017), which prevents overfitting when the training data is small relative to the total compute budget. While useful as a regularizer, dropout effectively reduces the number of gradient updates seen by each parameter, since no update occurs when the associated feature is dropped. At the same time, the runtime of an update is not strongly affected by the presence of dropout, so dropout results in a net reduction in updates per second. In the cramming setting, the training data is large compared to the compute. Overfitting is not possible under the single-epoch schedule, so we disable dropout during pretraining (Brown et al., 2020) to maximize the number of parameter updates. We re-enable dropout during downstream finetuning with a dropout value of 0.1. Further, we experiment with length curricula (Li et al., 2022) (see the appendix) and token dropping (Hou et al., 2022), but find no gains in our setting.
4.4 Optimizing the dataset
We found above that scaling laws create a barrier to making major gains (beyond computational efficiency) through architectural modifications. However, scaling laws do not preclude us from training on better data. Once we have exhausted our ability to train on more tokens per second, we should seek to train on better tokens. We consider two data-based pathways to better down-scaling. First, we can filter, process, or sort the existing data in various ways. Second, we can swap the data source.
To this end, we experiment with several subsets of The Pile (Gao et al., 2020), containing raw text from Gutenberg, Books3 and Wikipedia (en) only. From these Pile datasets we tokenize the first 4 × 10^6 entries to generate enough tokens for our single pass. Another popular data source is C4, the colossal, cleaned version of Common Crawl (Raffel et al., 2020), from which we stream the first 20 × 10^6 entries. For each data source, we regenerate its own WordPiece tokenizer as described in Section 4.1. Of these four sources, we find the Pile to perform best in terms of downstream MNLI performance. However, it turns out that further improvement is possible through additional processing, especially for the C4 dataset.
We first evaluate exact substring deduplication as described in Lee et al. (2022), but find that it does not help downstream performance in our case. We then test filtering for uncompressible data. We use the tokenizer itself to remove all training sequences from the C4 set that cannot be compressed well: we simply set a threshold t, e.g. t = 0.3, and drop all entries from the dataset in which the number of tokens is larger than t times the number of raw characters. This removes, for example, sequences consisting of hard-to-compress HTML or markdown code. Surprisingly, this clearly improves results on C4, as summarized in Table 2.
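The compression filter amounts to a single comparison per entry. The sketch below uses a toy regex tokenizer as a stand-in for the trained WordPiece tokenizer, so the ratios it produces are only illustrative; `toy_tokenize` and `keep_entry` are hypothetical names.

```python
import re


def toy_tokenize(text):
    """Stand-in tokenizer: words and single punctuation marks.

    The text uses the trained WordPiece tokenizer here; this toy version
    only serves to make the example self-contained.
    """
    return re.findall(r"\w+|[^\w\s]", text)


def keep_entry(text, tokenize=toy_tokenize, t=0.3):
    """Keep an entry unless it has more than t tokens per raw character."""
    return len(tokenize(text)) <= t * len(text)


entries = [
    "the quick brown fox jumps over the lazy dog",  # 9 tokens for 43 characters
    "* * * | --- | --- |",                          # 12 tokens for 19 characters
]
filtered = [e for e in entries if keep_entry(e)]
```

Ordinary prose compresses into few tokens per character and passes the threshold, while markup-like text breaks into many short tokens and is dropped. No pretrained model is involved, so the filter stays within the cramming rules.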
次ã«ã2ã€ã®æ¹åããããã«ããã€ãã®æ¹åãèŠãããŸããã 1ã€ç®ã¯ãããŒã¯ã³åããããã¹ãŠã®ã·ãŒã±ã³ã¹ãäœããã®ææšã§ãœãŒãããããšã2ã€ç®ã¯ãæçµçãªããããµã€ãºã倧ããããããšã§ãã ãã£ã«ã¿ãªã³ã°ã®ããã«ãæã ã¯ãã¹ãŠã®ããŒã¯ã³åãããã·ãŒã±ã³ã¹ããã®å¹³åïŒ1ã°ã©ã ïŒããŒã¯ã³ã®æç çã§ãœãŒãããå¯èœæ§ã®é«ãã·ãŒã±ã³ã¹ãæåã«çŸããããã«ããã ããã¯ããã倧ããªã³ãŒãã¹ããæœåºããããšã§ãå¯èœæ§ã®äœãã·ãŒã±ã³ã¹ã«å°éããããšããªããããè¥å¹²åŒ·åããããšãã§ããå¹æããããŸãã æåŸã«ãïŒã»ã¯ã·ã§ã³4.3ã§è¿°ã¹ãããã«ïŒåŠç¿ã®æåŸã«ããããµã€ãºã4032/4096ã«å¢å ãããããšã¯ãC4ã§ã¯äžé£ãåãã«å¹æçã§ãããbookcorpus-wikipediaã§ã¯ããã»ã©ã§ããããŸããã ã©ã¡ãã®ä¿®æ£ãæçµçã«ã¯ããŒã¿ååžã®æºããã«ãã£ãŠåŠç¿ãé»å®³ãããå¯èœæ§ãæžããããšãã§ãããšèããŠããã
Vocabulary size
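We also check whether the original vocabulary size of 32768 described in (Devlin et al., 2019) is optimal in the cramming regime. A priori, this might not hold: the larger the vocabulary, the more unique tokens and relationships between unique tokens have to be learned during training. On the other hand, a larger vocabulary compresses the data further (albeit with gains that vanish after some point), which allows more information to be packed into the fixed number of tokens that can be ingested during the crammed training run. In Figure 3, we find that on bookcorpus-wikipedia data larger vocabulary sizes go along with a higher average GLUE score, although the effect plateaus for the MNLI task around the original vocabulary size of 32768. Moving forward, we hence keep this vocabulary size.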
5 FINETUNING PERFORMANCE ON GLUE
Finally, we systematically evaluate performance on the GLUE benchmark of Wang et al. (2018), minus WNLI as in Devlin et al. (2019). Note that we used only MNLI (m) in the previous sections and did not tune hyperparameters based on the full GLUE scores. We finetune both the pretrained BERT-base checkpoint and our models under the same constraints laid out in Section 2. For BERT-base, we finetune on all datasets for 5 epochs with a batch size of 32 and a learning rate of 2 × 10^-5. For the crammed models, we find that this is not optimal and that minor improvements can be gained from a batch size of 16 and a learning rate of 4 × 10^-5 with cosine decay (this setup does not improve the pretrained BERT checkpoint). Table 3 and Table 4 show the performance of this setup on the GLUE downstream tasks (as the median over 5 downstream trials). There we compare the original BERT-base checkpoint, a reproduction of the BERT pretraining setup stopped once our budget is reached, the setup described in (Izsak et al., 2021), and the modified recipe trained for a single day on each GPU setup.
Overall, performance is surprisingly decent, especially on the larger datasets MNLI, QQP, QNLI and SST-2, where downstream finetuning can smooth out the remaining differences between the full BERT model and the crammed variants. Further, we find substantial gains over both naive BERT training with a limited budget and the recipe described in (Izsak et al., 2021). The recipe of (Izsak et al., 2021) was originally designed for a full 8 GPU server blade, and squeezing its BERT-large model onto the smaller GPUs of this experiment is responsible for most of the performance degradation of this recipe in our scenario.
Overall, the crammed models mostly work, even on smaller datasets. The average is, however, brought down by a substantial drop on CoLA (corpus of linguistic acceptability) (Warstadt et al., 2019). This behavior is intriguing and we offer two hypotheses. First, it is conceivable that the global hyperparameters chosen for finetuning are a bad fit for CoLA in particular. CoLA performance can be brittle with respect to hyperparameters, with Jiao et al. (2020) training longer only on CoLA and Joshi et al. (2020) training less only on CoLA. Nevertheless, for BERT a set of global hyperparameters exists, which points to a deficiency in the crammed model. As a second hypothesis, it is conceivable that these models need to process more text before they memorize enough data to do well on CoLA. This would be in contrast to Liu et al. (2021d), who find that CoLA is learned relatively quickly compared to other downstream tasks when probing intermediate BERT checkpoints. On the other hand, deficiencies on CoLA in particular are also common in approaches that distill BERT into smaller architectures (Sun et al., 2019; Turc et al., 2019; Mukherjee et al., 2021), which may come with limited capacity for linguistic acceptability.
5.1 Ablation - which changes really mattered?
Table 5 summarizes an ablation study of all changes discussed in this work. As in the previous sections, we group the changes into three groups, architecture, training and data, and ablate each group by resetting all of its modifications to the original BERT recipe. Here we find that some minimal modifications have to be made in every case, because architectural modifications such as the PreNorm layer structure are also what allows the more aggressive learning rate schedule described in the training setup. Taking this into account, we ultimately find that architectural changes gain about two percentage points in average GLUE score, data changes about one percentage point, and training modifications about half a percentage point.
5.2 What happens when training longer?
We also verify what happens when the cramming recipe discussed so far is used with a larger budget. To this end, we train models for 48 hours on 8 A6000 GPUs, which amounts to 208 exaFLOP in total (cf. Table 1). We apply the setting described so far directly, simply scaling the learning rate schedule to cover the new budget of 48 hours. In Table 6, we find that the discussed recipe does immediately generalize to larger compute budgets. This is surprising, not least because the dataset (which was sorted in Section 4.4) is now too small and is repeated multiple times. The newly trained models perform strongly, especially on MNLI and SST-2, where they significantly outperform the original BERT checkpoint and fall into a similar range as the RoBERTa-base checkpoint of Liu et al. (2019), which was trained with much more compute. Yet on other tasks, such as (again) CoLA, the new models barely improve even in the larger compute regime.
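The budget figure can be unpacked with a little arithmetic. Assuming, as in the definition of the compute budget in Section 3, that the 208 exaFLOP is peak throughput multiplied by wallclock time, the sketch below derives the implied per-GPU peak and the corresponding budget for one GPU and one day. The function names are illustrative, and the derived numbers are implications of the quoted total, not values read from Table 1.

```python
EXA = 1e18  # floating point operations per exaFLOP


def implied_peak_per_gpu(total_flop, n_gpus, hours):
    """Peak throughput per GPU (FLOP/s) implied by a total budget."""
    return total_flop / (n_gpus * hours * 3600)


def budget_flop(peak_per_gpu, n_gpus, hours):
    """Total budget: peak throughput times wallclock time times GPU count."""
    return peak_per_gpu * n_gpus * hours * 3600


# 8 GPUs for 48 hours at 208 exaFLOP in total, as stated in the text.
a6000_peak = implied_peak_per_gpu(208 * EXA, n_gpus=8, hours=48)
one_day_one_gpu = budget_flop(a6000_peak, n_gpus=1, hours=24)

print(f"implied peak: {a6000_peak / 1e12:.1f} TFLOP/s per GPU")
print(f"one GPU, one day: {one_day_one_gpu / EXA:.1f} exaFLOP")
print(f"scale-up factor: {208 * EXA / one_day_one_gpu:.0f}x")
```

Under this reading the longer run has 16 times the budget of a single-GPU cramming day (8 GPUs times twice the duration), which gives a sense of how far the recipe was extrapolated.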
6 LIMITATIONS
In this work we limited our investigation to transformer-based architectures trained with MLM objectives. However, we do think that the general task of cramming posed in Section 2 is interesting even when these constraints are relaxed. A number of modifications have been proposed to the objective in particular (Joshi et al., 2020; Bao et al., 2020; Bajaj et al., 2022; Tay et al., 2022b). While Artetxe et al. (2022) and Wang et al. (2022) find that MLM still holds up well as a pretraining objective, other suggestions such as ELECTRA (Clark et al., 2019; 2020; He et al., 2021) could be employed and might be beneficial for crammed models. Also, the optimal architecture might not be transformer-based (Merity, 2019; Fusco et al., 2022; Peng, 2021).
7 CONCLUSIONS
We discuss how much performance a transformer-based language model can achieve when crammed into a setting with very limited compute, and find that several strands of modification lead to decent downstream performance on GLUE. Overall though, cramming language models appears hard: we empirically find that many implications of Kaplan et al. (2020) still hold in this regime, and, for example, improvements through larger models are cancelled out by their slower speed. We hope that this work can provide a baseline for explorations of the cramming question we formalize in Section 2, and that it casts additional light on a number of improvements and tricks proposed for transformer architectures in recent years.