Apache Nutch MultiLingual Support
Japanese text can now be processed, following the NutchAnalysis.jj patch that makes it possible to use Sen. Automatic language detection does not work yet.
Japanese should be detectable automatically by following "Creating a Japanese N-Gram Profile", but this has not been tested yet.

How?

java.net's "Introduction to Nutch, Part 1: Crawling" and "Introduction to Nutch, Part 2: Searching" are also worth reading.
Getting started

After installing the Nutch binary locally, create a seed URL list:
dormael@dormael-desktop:~/nutch-test/nutch-0.8.1$ mkdir test
dormael@dormael-desktop:~/nutch-test/nutch-0.8.1$ vi test/nutch
dormael@dormael-desktop:~/nutch-test/nutch-0.8.1$ cat test/nutch
http://my.domain.name/

To keep the crawler from following links to external sites, edit the URL filter as follows:
dormael@dormael-desktop:~/nutch-test/nutch-0.8.1$ vi conf/crawl-urlfilter.txt
# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*my.domain.name/

Problems and fixes

The Nutch crawler kept throwing a NullPointerException without any detailed message.
Investigation showed that the default configuration was missing some settings the crawler requires.
dormael@dormael-desktop:~/nutch-test/nutch-0.8.1$ vi conf/nutch-site.xml

After filling in the crawler's identity in the properties below, the crawl ran without problems.
The exception appears to have been caused by these properties' default values being empty.
<property>
  <name>http.agent.name</name>
  <value>My Nutch Test</value>
</property>
<property>
  <name>http.agent.description</name>
  <value>Test</value>
</property>
<property>
  <name>http.agent.url</name>
  <value>no</value>
</property>
<property>
  <name>http.agent.email</name>
  <value>no</value>
</property>
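Before rerunning the crawl, both configuration changes on this page (the crawl-urlfilter.txt accept rule and the http.agent.name property) can be sanity-checked from the shell. What follows is a minimal sketch under stated assumptions. The helpers url_accepted and agent_name are hypothetical and not part of Nutch. grep -E only approximates the regex matching Nutch does on the Java side. The parser assumes <name> and <value> sit on consecutive lines, as in the property block above. grep -A1 is a common GNU/BSD extension rather than strict POSIX.

```shell
#!/bin/sh
# Pre-flight checks for the crawl configuration (a sketch, not a Nutch tool).

# url_accepted PATTERN URL (hypothetical helper): succeeds if URL matches the
# accept rule. grep -E approximates Nutch's own regex matching; for this
# simple pattern the two should agree.
url_accepted() {
  printf '%s\n' "$2" | grep -Eq "$1"
}

# agent_name FILE (hypothetical helper): prints the http.agent.name value,
# assuming <name> and the matching <value> are on consecutive lines.
agent_name() {
  grep -A1 '<name>http.agent.name</name>' "$1" \
    | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}

# The accept rule from conf/crawl-urlfilter.txt, without the leading '+'.
PATTERN='^http://([a-z0-9]*\.)*my.domain.name/'
for url in http://my.domain.name/ http://www.my.domain.name/docs/ http://example.com/; do
  if url_accepted "$PATTERN" "$url"; then echo "accept $url"; else echo "reject $url"; fi
done

# A sample copy of the property is written to a temporary file so the sketch
# runs anywhere; point CONF at conf/nutch-site.xml to check the real file.
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
<property>
  <name>http.agent.name</name>
  <value>My Nutch Test</value>
</property>
EOF

agent=$(agent_name "$CONF")
if [ -z "$agent" ]; then
  # An empty agent name is what appeared to trigger the NullPointerException.
  echo "http.agent.name is empty; set it in nutch-site.xml before crawling" >&2
else
  echo "http.agent.name = $agent"
fi
rm -f "$CONF"
```

The dots in my.domain.name are unescaped in the rule, so each one matches any character. This is harmless here, but writing my\.domain\.name would make the rule stricter.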