#title Apache Nutch MultiLingual Support

Japanese processing now works, following [http://nislab.human.waseda.ac.jp/blog/?page_id=7 How to patch NutchAnalysis.jj so that Sen can be used]. Automatic language detection does not work yet.[[BR]]
Japanese can be detected automatically by using [http://kazuomik.livejournal.com/55872.html Creating a Japanese N-Gram profile]. This has not been tested yet.[[BR]]

== How? ==
Follow the [http://wiki.apache.org/nutch/ official wiki] for nutch. The [http://wiki.apache.org/nutch/FAQ FAQ] also has material worth consulting. On java.net, [http://today.java.net/pub/a/today/2006/01/10/introduction-to-nutch-1.html Introduction to Nutch, Part 1: Crawling] and [http://today.java.net/pub/a/today/2006/02/16/introduction-to-nutch-2.html Introduction to Nutch, Part 2: Searching] are probably worth reading as well.

=== Getting started ===
After installing the nutch binary locally, create a directory containing a file that lists the seed URL:
{{{
dormael@dormael-desktop:~/nutch-test/nutch-0.8.1$ mkdir test
dormael@dormael-desktop:~/nutch-test/nutch-0.8.1$ vi test/nutch
dormael@dormael-desktop:~/nutch-test/nutch-0.8.1$ cat test/nutch
http://my.domain.name/
}}}
To keep the crawler from following links to external sites, edit the URL filter as follows.
{{{
dormael@dormael-desktop:~/nutch-test/nutch-0.8.1$ vi conf/crawl-urlfilter.txt
# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*my.domain.name/
}}}

=== Problems and solutions ===
The nutch crawler kept throwing NullPointerException without any detailed message. On investigation, settings that are required in addition to the default configuration turned out to be missing.
{{{
dormael@dormael-desktop:~/nutch-test/nutch-0.8.1$ vi conf/nutch-site.xml
}}}
After filling in the crawler information among the properties, the crawler ran without problems. The exception appears to have been caused by the default values being empty.
{{{
http.agent.name         My Nutch Test
http.agent.description  Test
http.agent.url          no
http.agent.email        no
}}}
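
The four http.agent properties are shown above only as name/value pairs. As a rough sketch, assuming the Hadoop-style layout used by Nutch 0.8 configuration files (a configuration root element containing property elements), the matching part of conf/nutch-site.xml might look like the following. The values are the ones given on this page; the element layout is an assumption and was not copied from the original file.
{{{
<?xml version="1.0"?>
<!-- Sketch only: layout assumed from the usual Nutch 0.8 configuration format. -->
<configuration>
  <!-- Crawler identity; empty defaults appeared to trigger the NullPointerException. -->
  <property>
    <name>http.agent.name</name>
    <value>My Nutch Test</value>
  </property>
  <property>
    <name>http.agent.description</name>
    <value>Test</value>
  </property>
  <property>
    <name>http.agent.url</name>
    <value>no</value>
  </property>
  <property>
    <name>http.agent.email</name>
    <value>no</value>
  </property>
</configuration>
}}}
Values set in nutch-site.xml override those in nutch-default.xml, so only the properties being changed need to appear in this file.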