{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Malay" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This tutorial is available as an IPython notebook at [Malaya/example/dictionary-malay](https://github.com/huseinzol05/Malaya/tree/master/example/dictionary-malay)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Requirements" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Make sure you have already installed the following packages:\n", "\n", "```bash\n", "pip3 install requests beautifulsoup4\n", "```" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "os.environ['CUDA_VISIBLE_DEVICES'] = ''" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/ubuntu/dev/malaya/malaya/tokenizer.py:202: FutureWarning: Possible nested set at position 3361\n", " self.tok = re.compile(r'({})'.format('|'.join(pipeline)))\n", "/home/ubuntu/dev/malaya/malaya/tokenizer.py:202: FutureWarning: Possible nested set at position 3879\n", " self.tok = re.compile(r'({})'.format('|'.join(pipeline)))\n" ] } ], "source": [ "import malaya" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### DBP\n", "\n", "Query https://prpm.dbp.gov.my/cari1?keyword= (Dewan Bahasa dan Pustaka) to check whether a word is a Malay word:\n", "\n", "```python\n", "def keyword_dbp(word, parse: bool = False):\n", " \"\"\"\n", " crawl https://prpm.dbp.gov.my/cari1?keyword= to check a word is a malay word.\n", "\n", " Parameters\n", " ----------\n", " word: str\n", " parse: bool, optional (default=False)\n", " if True, will parse using BeautifulSoup.\n", "\n", " Returns\n", " -------\n", " result: Dict\n", " \"\"\"\n", "```\n", "\n", "With the default `parse=False`, the function returns `True` or `False`. With `parse=True`, it returns a dictionary with `definisi` and `tesaurus` keys for a word that is found, or `False` if the word is not found." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "malaya.dictionary.keyword_dbp('ayam')" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "malaya.dictionary.keyword_dbp('ayamaaaaa')" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, 
"outputs": [ { "data": { "text/plain": [ "{'definisi': ['Definisi : sj ikan; ~ hutan a) Euxiphippops sextriatus; b) Pomacanthus annularis; ~ laut, Abalistes spp.\\xa0(Kamus Dewan Edisi Keempat)',\n", " 'Definisi : beberapa jenis binatang (yg bentuk tubuhnya seakan-akan burung tetapi tidak pandai terbang) yg biasanya dipelihara, Gallus gallus. ~ belanda sj ayam yg besar, Meleagris gallopavo. ~ beroga (denak, hutan) sj ayam liar, Gallus bankiva. ~ biring ayam jantan yg kuning kakinya. ~ bulu balik ayam yg bulunya terbalik. ~ dara ayam betina yg hampir bertelur. ~ katik ayam yg kecil. ~ percik ayam panggang yg disaluti sos atau kuah yg dibuat drpd santan dan rempah-ratus. ~ sabung ayam yg dipelihara utk disabung. ~ serama sj ayam peliharaan yg kecil, jinak, berbulu cantik dan berkaki pendek. ~ tambatan ki orang yg dianggap hebat dan diharapkan dpt membawa kemenangan dlm sesuatu perlawanan, mis bola sepak dan bola jaring.\\xa0(Kamus Pelajar Edisi Kedua)'],\n", " 'tesaurus': None}" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "malaya.dictionary.keyword_dbp('ayam', parse = True)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "malaya.dictionary.keyword_dbp('ayamaaaaa', parse = True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Wiktionary\n", "\n", "Query https://en.wiktionary.org/wiki/ to check whether a word is a Malay word. Entries are collected from the Wiktionary language sections listed in `acceptable_lang`:\n", "\n", "```python\n", "def keyword_wiktionary(\n", " word,\n", " acceptable_lang: List[str] = ['brunei malay', 'malay'],\n", "):\n", " \"\"\"\n", " crawl https://en.wiktionary.org/wiki/ to check a word is a malay word.\n", "\n", " Parameters\n", " ----------\n", " word: str\n", " acceptable_lang: List[str], optional (default=['brunei malay', 'malay'])\n", " acceptable languages in wiktionary section.\n", "\n", " Returns\n", " -------\n", " result: Dict\n", 
" \"\"\"\n", "```\n", "\n", "For a word that is not found, the function does not return `False`: each accepted language is still present in the result, with an empty `etymology` and empty `definitions` and `pronunciations` lists." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'brunei malay': [{'etymology': 'From Proto-Malayic *hayam, from Proto-Malayo-Polynesian *qayam.\\n',\n", " 'definitions': [{'partOfSpeech': 'noun',\n", " 'text': ['ayam', 'chicken (bird)', 'chicken (meat)'],\n", " 'relatedWords': [],\n", " 'examples': []}],\n", " 'pronunciations': {'text': ['IPA: /ajam/',\n", " '(Kedayan) IPA: /hajam/',\n", " 'Hyphenation: a‧yam'],\n", " 'audio': []}}],\n", " 'malay': [{'etymology': 'From hayam, from Proto-Malayic *hayam, from Proto-Malayo-Polynesian *qayam.\\n',\n", " 'definitions': [{'partOfSpeech': 'noun',\n", " 'text': ['ayam (Jawi spelling ايم\u200e, plural ayam-ayam, informal 1st possessive ayamku, 2nd possessive ayammu, 3rd possessive ayamnya)',\n", " 'chicken (bird)',\n", " 'chicken (meat)'],\n", " 'relatedWords': [{'relationshipType': 'synonyms',\n", " 'words': ['manuk / مانوق\u200e']}],\n", " 'examples': []}],\n", " 'pronunciations': {'text': ['IPA: /ajam/', 'Rhymes: -ajam, -jam, -am'],\n", " 'audio': []}}]}" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "malaya.dictionary.keyword_wiktionary('ayam')" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "scrolled": true }, "outputs": [ { "data": { "text/plain": [ "{'brunei malay': [{'etymology': '',\n", " 'definitions': [],\n", " 'pronunciations': {'text': [], 'audio': []}}],\n", " 'malay': [{'etymology': '',\n", " 'definitions': [],\n", " 'pronunciations': {'text': [], 'audio': []}}]}" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "malaya.dictionary.keyword_wiktionary('ayamaaaa')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Check whether a word is a Malay word\n", "\n", "```python\n", "def is_malay(word, stemmer=None):\n", " \"\"\"\n", " Check a word is a malay word.\n", "\n", " Parameters\n", " ----------\n", " word: str\n", " stemmer: Callable, optional (default=None)\n", " a Callable object, must have `stem_word` method.\n", "\n", " Returns\n", " -------\n", " result: bool\n", " \"\"\"\n", "```\n", "\n", "Affixed words may need a stemmer. In the examples below, `tersakitkan` returns `False` without a stemmer but `True` once `malaya.stem.sastrawi()` is passed as `stemmer`." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "malaya.dictionary.is_malay('ayam')" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "malaya.dictionary.is_malay('sakitkan')" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "malaya.dictionary.is_malay('tersakitkan')" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "stemmer = malaya.stem.sastrawi()" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "malaya.dictionary.is_malay('tersakitkan', stemmer = stemmer)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 4 }