"DQN" is an artificial intelligence developed by Google-owned DeepMind, and its name caused something of a stir online. As many readers probably know, "DQN" (pronounced "dokyun") is Japanese internet slang for "a troublesome person lacking in intelligence, such as a so-called yankee (delinquent)."
The impact of that deeply ironic coincidence of a name may have overshadowed the AI itself, and many people may never have looked into what Google's DQN actually is. So this time, drawing on a blog post by Google researchers Dharshan Kumaran and Demis Hassabis, I'll introduce what DQN is about.
Playing video games without being told how
The post opens with this line: "Even someone playing Breakout on the Atari 2600 (essentially a simple early brick-breaking game) for the first time would have quickly understood how to play it." The reason, it says, is that ordinary people know how a ball bounces in the real world.
But suppose a person had no such knowledge and was given only the screen image, a paddle controller, and the score. What would they do? And what would an artificial intelligence do?
DQN, we are told, can do this. "DQN" is short for an algorithm called "deep Q-network." It learns to play more than brick-breaking games: side-scrolling shooters, boxing games, 3D car-racing games and more, using only the screen images it receives, the set of possible controls, and the score. The research has been published in the scientific journal Nature.
In 43 of the 49 games, DQN outperformed the machine-learning methods that came before it. In more than half of the games, it also scored at least 75% of what a professional human player scored.
In some games it even devised a surprisingly sophisticated strategy to post a high score. In Breakout, for example, it learned to first dig a hole through the edge of the wall of bricks, then send the ball through it to the back of the wall and break the bricks from behind.
Replaying memories to review them
I'm not well versed in computer programming, so I can't put this in plain terms and will stay close to the original wording. DQN reportedly relies on several key features that, together, let a "Deep Neural Network" be combined with "Reinforcement Learning" in a way that can scale up.
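Here is a little more detail than the literal translation gives. Roughly speaking, the "Q" in deep Q-network is an action-value function Q(s, a): an estimate of the total future score the agent can expect if it takes action a while the screen shows state s. The "deep" part is a neural network with weights θ that approximates this function. The formulas below follow my reading of the Nature paper, not the blog post. Two symbols come from the paper: γ, a discount factor between 0 and 1 that makes near-term points count more than distant ones, and θ⁻, an older copy of the weights that is refreshed only periodically, which the paper calls a target network.

```latex
Q(s, a; \theta) \approx Q^{*}(s, a) = \max_{\pi} \mathbb{E}\left[ r_t + \gamma r_{t+1} + \gamma^{2} r_{t+2} + \cdots \mid s_t = s,\ a_t = a,\ \pi \right]

L(\theta) = \mathbb{E}_{(s, a, r, s')}\left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \right)^{2} \right]
```

Training adjusts θ to shrink L. In effect, the network's estimate for the move it just made is pulled toward "the points just earned plus the best score the network expects to earn afterward."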
"Reinforcement Learning" is a machine-learning framework for deciding what to do in a given situation so as to maximize the reward obtained in the future (in a game, the score).
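To make "maximize future reward" concrete, here is a minimal sketch of tabular Q-learning, the classic method that DQN builds on, with a plain table standing in for DQN's deep network. Everything specific here is a hypothetical choice for illustration: the five-cell corridor environment, the `step`, `greedy` and `q_learning` helpers, and the values of `alpha`, `gamma` and `epsilon`. None of it comes from DeepMind's code.

```python
import random

# Hypothetical toy environment (not from the article): a corridor of N_STATES cells.
# The agent starts in cell 0; reaching the last cell scores 1 point and ends the episode.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Return (next_state, reward, done) for the toy corridor."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def greedy(q_row, rng):
    """Pick the action with the highest estimated value, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning; alpha, gamma and epsilon are illustrative values."""
    rng = random.Random(seed)
    # q[s][a] estimates the discounted future score of taking action a in cell s.
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: usually exploit the current estimates, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q[state], rng)
            next_state, reward, done = step(state, action)
            # Target: points just earned plus the discounted best value from the next cell.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


if __name__ == "__main__":
    q = q_learning()
    # Greedy action in each non-terminal cell after training.
    policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
    print(policy)
```

Only the last cell pays a point, yet after training the greedy choice in every cell should be "move right" (1). The discount γ carries the value of the distant reward back to the earlier moves. DQN replaces the table `q` with a neural network so that the same idea can work from raw screen images.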
The most important of DQN's features is a mechanism called "Experience Replay," which was inspired by neurophysiology. In the human brain, the hippocampus is said to replay recent experiences during sleep, deepening what we take from them. In the same way, during DQN's learning phase the system pulls up memories of what happened in the past and trains on them again.
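A replay memory of this kind can be sketched in a few lines. This is a minimal illustration under my own assumptions. The `ReplayBuffer` class, the `Transition` record, and the tiny capacity are hypothetical. Sampling stored experiences uniformly at random matches my reading of the Nature paper, but the memory described there was far larger and held processed screen frames rather than integers.

```python
import random
from collections import deque, namedtuple

# One stored experience: what the agent saw, what it did, and what happened next.
Transition = namedtuple("Transition", ["state", "action", "reward", "next_state", "done"])


class ReplayBuffer:
    """Hypothetical replay memory: keeps the most recent `capacity` transitions
    and hands back uniformly random minibatches for training."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen drops the oldest transition once it is full.
        self.memory = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        """Record one step of play."""
        self.memory.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw `batch_size` distinct past transitions at random.

        Mixing older and newer experiences breaks up runs of consecutive,
        nearly identical frames before they are used for a learning update.
        """
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


if __name__ == "__main__":
    buffer = ReplayBuffer(capacity=5, seed=0)
    for t in range(8):
        # Placeholder transitions: integer "states" stand in for screen images.
        buffer.push(t, 0, 0.0, t + 1, False)
    print(len(buffer))  # 5: only the five most recent transitions are kept
    for transition in buffer.sample(3):
        print(transition)
```

Training on randomly drawn past moves, rather than only on the latest few frames, has two benefits. Each experience can be learned from more than once, and a stretch of nearly identical consecutive frames does not dominate any single update.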
When the "Experience Replay" feature is switched off, DQN's performance reportedly deteriorates sharply. In other words, this feature is essential to what DQN achieves.
Technology of this kind is expected to prove useful in many fields. For example, the Google app on your smartphone may eventually be able to carry out complex tasks you give it.
The authors of the blog post also write: "We hope this learning algorithm can also offer new insights to researchers who work with larger-scale, complex data, in fields such as meteorology, physics, pharmacology and genomics."
Progress in robotics and artificial intelligence in recent years has been remarkable, and every time I read about it I notice that the work "starts from understanding humans, or other living things." The end of this Google blog post touches on this too. To me, the advance of robots and AI seems just as valuable for the way it "deepens our understanding of humans and other living things" at the same time.
I've written about fairly serious material here, but every time the letters "DQN" appear in the text I still can't help chuckling. Will the day come when an AI can appreciate the joke too?