Malaya Cache#

This tutorial is available as an IPython notebook at Malaya/example/caching.

This module is only useful if you use the default Malaya repository. For Hugging Face models, read more at https://huggingface.co/docs/datasets/v1.12.0/cache.html

Default Cache location#

You can check where your default Malaya cache folder is. The cache folder stores any models, vocabularies, and rules downloaded for specific modules.

[1]:
import malaya
[2]:
malaya.__home__
[2]:
'/Users/huseinzolkepli/Malaya'

Change default Cache location#

To change the default cache location, set the MALAYA_CACHE environment variable before importing Malaya:

export MALAYA_CACHE=/Users/huseinzolkepli/Documents/Malaya

You can also set it in your bash environment (for example in ~/.bashrc) to make it permanent. Alternatively, set it from Python before the import:

[1]:
import os

os.environ['MALAYA_CACHE'] = '/Users/huseinzolkepli/Documents/malaya-cache'
[2]:
import malaya

malaya.__home__
[2]:
'/Users/huseinzolkepli/Documents/malaya-cache'
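
Because the variable must be set before Malaya is imported, it can help to guard against setting it too late. The following is a minimal sketch using only the standard library; `set_cache_dir` is a hypothetical helper and not part of Malaya, and the `package` parameter exists only so the check can be exercised against other module names.

```python
import os
import sys


def set_cache_dir(path, package="malaya"):
    """Point MALAYA_CACHE at `path`, refusing if `package` is already imported.

    Hypothetical helper for illustration; it is not part of Malaya.
    """
    if package in sys.modules:
        # Setting the variable now would come too late to affect the import.
        raise RuntimeError(
            f"{package} is already imported; set MALAYA_CACHE before importing it"
        )
    # Expand '~' and relative paths ourselves, so the stored value is absolute.
    path = os.path.abspath(os.path.expanduser(path))
    os.environ["MALAYA_CACHE"] = path
    return path
```

Call `set_cache_dir('~/Documents/malaya-cache')` first, then `import malaya` and check `malaya.__home__` as shown above.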

Cache subdirectories#

Starting from version 1.0, Malaya stores models in subdirectories. You can print the directory tree with:

[3]:
malaya.utils.print_cache()
Malaya/
├── keyword-extraction/
│   ├── alxlnet/
│   │   ├── model.pb
│   │   ├── sp10m.cased.v9.model
│   │   ├── sp10m.cased.v9.vocab
│   │   └── version
│   └── tiny-bert/
│       ├── model.pb
│       ├── sp10m.cased.bert.model
│       ├── sp10m.cased.bert.vocab
│       └── version
├── qa-squad/
│   ├── albert/
│   │   ├── model.pb
│   │   ├── sp10m.cased.v10.model
│   │   ├── sp10m.cased.v10.vocab
│   │   └── version
│   ├── albert-quantized/
│   │   ├── model.pb
│   │   ├── sp10m.cased.v10.model
│   │   ├── sp10m.cased.v10.vocab
│   │   └── version
│   ├── alxlnet/
│   │   ├── model.pb
│   │   ├── sp10m.cased.v9.model
│   │   ├── sp10m.cased.v9.vocab
│   │   └── version
│   ├── bert/
│   ├── tiny-bert/
│   │   ├── model.pb
│   │   ├── sp10m.cased.bert.model
│   │   ├── sp10m.cased.bert.vocab
│   │   └── version
│   ├── xlnet/
│   │   ├── model.pb
│   │   ├── sp10m.cased.v9.model
│   │   ├── sp10m.cased.v9.vocab
│   │   └── version
│   └── xlnet-quantized/
│       ├── model.pb
│       ├── sp10m.cased.v9.model
│       ├── sp10m.cased.v9.vocab
│       └── version
├── sentiment/
│   ├── albert/
│   │   ├── model.pb
│   │   ├── sp10m.cased.v10.model
│   │   ├── sp10m.cased.v10.vocab
│   │   └── version
│   ├── alxlnet/
│   │   ├── model.pb
│   │   ├── sp10m.cased.v9.model
│   │   ├── sp10m.cased.v9.vocab
│   │   └── version
│   ├── bert/
│   │   ├── model.pb
│   │   ├── sp10m.cased.bert.model
│   │   ├── sp10m.cased.bert.vocab
│   │   └── version
│   ├── xlnet/
│   │   └── model.pb
│   └── xlnet-quantized/
│       ├── model.pb
│       ├── sp10m.cased.v9.model
│       ├── sp10m.cased.v9.vocab
│       └── version
├── similarity/
│   ├── albert/
│   │   ├── model.pb
│   │   ├── sp10m.cased.v10.model
│   │   ├── sp10m.cased.v10.vocab
│   │   └── version
│   ├── alxlnet/
│   │   ├── model.pb
│   │   ├── sp10m.cased.v9.model
│   │   ├── sp10m.cased.v9.vocab
│   │   └── version
│   ├── alxlnet-quantized/
│   │   ├── model.pb
│   │   ├── sp10m.cased.v9.model
│   │   ├── sp10m.cased.v9.vocab
│   │   └── version
│   └── tiny-bert/
│       ├── model.pb
│       ├── sp10m.cased.bert.model
│       ├── sp10m.cased.bert.vocab
│       └── version
├── stem/
│   └── lstm-bahdanau/
│       ├── model.pb
│       ├── stemmer.yttm
│       └── version
├── translation-en-ms/
├── version
└── wordvector/
    └── news/
        ├── version
        ├── wordvector.json
        └── wordvector.npy
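
The tree shows which models are cached but not how much space each one takes. As a rough sketch, assuming only the standard library, the hypothetical helper `cache_sizes` below (not part of Malaya) walks a cache folder and sums file sizes per model directory, using the same `task/model` paths that appear in the tree.

```python
import os


def cache_sizes(root):
    """Return {relative directory: total bytes} for every subdirectory holding files.

    Hypothetical helper for illustration; it is not part of Malaya.
    """
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        # Skip the cache root itself and directories that hold no files.
        if rel == "." or not filenames:
            continue
        total = sum(
            os.path.getsize(os.path.join(dirpath, name)) for name in filenames
        )
        # Normalise separators so keys look like 'wordvector/news' on any OS.
        sizes[rel.replace(os.sep, "/")] = total
    return sizes
```

Passing `malaya.__home__` as `root` and sorting the result by size gives a quick view of which cached models are the largest.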

Deleting specific model#

Let's say you want to free up some disk space. Starting from version 1.0, you can choose exactly which model to delete.

[4]:
malaya.utils.delete_cache('wordvector/news')
[4]:
True

What happens if the directory does not exist?

[7]:
malaya.utils.delete_cache('wordvector/news2')
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-7-1104f6734f26> in <module>
----> 1 malaya.utils.delete_cache('wordvector/news2')

~/Documents/tf-1.15/env/lib/python3.7/site-packages/malaya_boilerplate-0.0.1-py3.7.egg/malaya_boilerplate/utils.py in delete_cache(location)
    188     if not os.path.exists(location):
    189         raise Exception(
--> 190             f'folder not exist, please check path from `{__package__}.utils.print_cache()`'
    191         )
    192     if not os.path.isdir(location):

Exception: folder not exist, please check path from `malaya.utils.print_cache()`
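
In a script you may prefer a failed deletion to be reported rather than to stop the program. The sketch below is one way to do that; `try_delete` is a hypothetical wrapper, not part of Malaya, and it takes the delete function as an argument. It catches the broad `Exception` class only because the traceback above shows a plain `Exception` being raised.

```python
def try_delete(delete_fn, location):
    """Call delete_fn(location); return False instead of raising on failure.

    Hypothetical wrapper for illustration; it is not part of Malaya.
    """
    try:
        return bool(delete_fn(location))
    except Exception as exc:  # the library raises a plain Exception here
        print(f"could not delete {location!r}: {exc}")
        return False
```

For example, `try_delete(malaya.utils.delete_cache, 'wordvector/news2')` would print the error message and return `False` instead of raising.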

Purge cache#

You can also purge the cache completely, deleting all downloaded models, with:

[8]:
malaya.utils.delete_all_cache
[8]:
<function malaya_boilerplate.utils.delete_all_cache()>

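I am not going to run it here, because I prefer to keep my cached models for now. The cell above only displays the function; to actually purge the cache, call it as `malaya.utils.delete_all_cache()`.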