
# Enums and Constants

This page documents the enums and constants in the Melange Android SDK, including inference mode selectors, benchmark dataset types, and KV cache cleanup policies.

## Package

`com.zeticai.mlange`

## ModelMode

Controls inference mode selection for general (non-LLM) models. Used with `ZeticMLangeModel`.

```kotlin
enum class ModelMode {
    RUN_AUTO,
    RUN_SPEED,
    RUN_ACCURACY
}
```
| Value | Description |
| --- | --- |
| `RUN_AUTO` | Automatically balances speed and accuracy. Selects the fastest configuration while keeping SNR above 20 dB. Recommended for most use cases. |
| `RUN_SPEED` | Maximizes inference speed with minimum latency. Best for real-time applications where response time is the top priority. |
| `RUN_ACCURACY` | Delivers the highest precision based on maximum SNR scores. Best for applications where accuracy matters more than speed. |
Example: constructing a model with an explicit mode:

```kotlin
val model = ZeticMLangeModel(
    context,
    PERSONAL_KEY,
    MODEL_NAME,
    VERSION,
    ModelMode.RUN_AUTO
)
```
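As a minimal illustration of choosing a mode from an app-level latency budget, the sketch below is plain Kotlin with a local mirror of the enum so it compiles on its own; the `modeForLatencyBudget` helper and its thresholds are hypothetical, not part of the SDK:

```kotlin
// Illustrative only: a local mirror of ModelMode so this sketch is self-contained.
enum class ModelMode { RUN_AUTO, RUN_SPEED, RUN_ACCURACY }

// Hypothetical helper: pick a mode from a latency budget in milliseconds.
// The thresholds are invented for illustration; tune them for your workload.
fun modeForLatencyBudget(budgetMs: Long?): ModelMode = when {
    budgetMs == null -> ModelMode.RUN_ACCURACY // no latency constraint: favor precision
    budgetMs < 50 -> ModelMode.RUN_SPEED       // tight real-time budget
    else -> ModelMode.RUN_AUTO                 // let the engine balance speed vs. SNR
}

fun main() {
    println(modeForLatencyBudget(30))   // RUN_SPEED
    println(modeForLatencyBudget(200))  // RUN_AUTO
}
```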

## LLMModelMode

Controls inference mode selection for LLM models. Used with `ZeticMLangeLLMModel`.

```kotlin
enum class LLMModelMode {
    RUN_SPEED,
    RUN_AUTO,
    RUN_ACCURACY
}
```
| Value | Status | Description |
| --- | --- | --- |
| `RUN_SPEED` | Available | Maximizes inference speed by selecting the most aggressive quantization. Recommended for real-time applications where response latency is the top priority. |
| `RUN_AUTO` | Paused | Intelligently balances speed and accuracy. Evaluates quantized models across benchmark datasets and selects the fastest model whose accuracy stays within acceptable thresholds (at most a 15% accuracy drop). |
| `RUN_ACCURACY` | Paused | Delivers the highest precision. Without a dataset specified, prioritizes less quantized (higher bit-width) models; with a dataset specified, selects the quantization with the highest benchmark score. |

> **Note:** `RUN_AUTO` and `RUN_ACCURACY` are currently paused while the LLM accuracy metric system is being updated. New models are available with `RUN_SPEED` mode only.

Example:

```kotlin
val model = ZeticMLangeLLMModel(
    context,
    PERSONAL_KEY,
    MODEL_NAME,
    null,
    LLMModelMode.RUN_SPEED
)
```

For more details on mode selection behavior, see the LLM Inference Modes guide.


## LLMDataSetType

Specifies the benchmark dataset used for accuracy-based model selection in `RUN_ACCURACY` mode.

```kotlin
enum class LLMDataSetType {
    NONE,
    MMLU,
    TRUTHFULQA,
    CNN_DAILYMAIL,
    GSM8K
}
```
| Value | Description |
| --- | --- |
| `NONE` | No dataset specified. In `RUN_ACCURACY` mode, the engine prioritizes less quantized models by default. |
| `MMLU` | Massive Multitask Language Understanding. Evaluates broad knowledge and reasoning across 57 subjects. |
| `TRUTHFULQA` | TruthfulQA benchmark. Measures the model's ability to generate truthful and informative answers. |
| `CNN_DAILYMAIL` | CNN/DailyMail summarization benchmark. Evaluates text summarization quality. |
| `GSM8K` | Grade School Math 8K. Measures mathematical reasoning on grade-school word problems. |
Example:

```kotlin
val model = ZeticMLangeLLMModel(
    context,
    PERSONAL_KEY,
    MODEL_NAME,
    null,
    LLMModelMode.RUN_ACCURACY,
    LLMDataSetType.MMLU
)
```

> **Note:** Dataset-specific selection is only meaningful when `LLMModelMode` is set to `RUN_ACCURACY`. When using `RUN_SPEED`, the dataset type is ignored.
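One natural pattern is to map an app-level task to the benchmark closest to it. The sketch below is plain Kotlin with a local mirror of the enum so it is self-contained; the `datasetFor` helper and its task names are hypothetical, not part of the SDK:

```kotlin
// Illustrative only: a local mirror of LLMDataSetType so this sketch compiles on its own.
enum class LLMDataSetType { NONE, MMLU, TRUTHFULQA, CNN_DAILYMAIL, GSM8K }

// Hypothetical helper: choose the benchmark dataset closest to the app's task.
fun datasetFor(task: String): LLMDataSetType = when (task) {
    "qa" -> LLMDataSetType.TRUTHFULQA           // truthful question answering
    "summarization" -> LLMDataSetType.CNN_DAILYMAIL
    "math" -> LLMDataSetType.GSM8K              // grade-school word problems
    "general" -> LLMDataSetType.MMLU            // broad knowledge and reasoning
    else -> LLMDataSetType.NONE                 // fall back to default selection
}

fun main() {
    println(datasetFor("math"))  // GSM8K
}
```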


## LLMKVCacheCleanupPolicy

Controls how the KV (key-value) cache is managed when it reaches capacity during token generation.

```kotlin
enum class LLMKVCacheCleanupPolicy {
    CLEAN_UP_ON_FULL,
    DO_NOT_CLEAN_UP
}
```
| Value | Description |
| --- | --- |
| `CLEAN_UP_ON_FULL` | Clears the entire conversation context when the KV cache is full and continues generation. This is the default behavior. |
| `DO_NOT_CLEAN_UP` | Maintains the conversation context without cleanup when the KV cache is full. Requires manual cleanup via `cleanUp()` before starting a new conversation. |
```kotlin
// Default behavior: context is automatically cleared when the cache fills up
val model = ZeticMLangeLLMModel(
    context,
    PERSONAL_KEY,
    MODEL_NAME,
    kvCacheCleanupPolicy = LLMKVCacheCleanupPolicy.CLEAN_UP_ON_FULL
)
```
```kotlin
// Manual cleanup required between conversations
val model = ZeticMLangeLLMModel(
    context,
    PERSONAL_KEY,
    MODEL_NAME,
    kvCacheCleanupPolicy = LLMKVCacheCleanupPolicy.DO_NOT_CLEAN_UP
)

// After a conversation ends, clean up before starting a new one
model.cleanUp()
model.run("New prompt here")
```

> **Warning:** When using `DO_NOT_CLEAN_UP`, calling `run()` again without first calling `cleanUp()` may cause unexpected behavior. Always call `cleanUp()` between conversations.
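The behavioral difference between the two policies can be sketched with a toy cache. This is plain Kotlin, not the SDK implementation; the `ToyKVCache` class, its capacity, and its token handling are invented purely to illustrate the semantics described above:

```kotlin
// Illustrative mirror of the two policies (not the SDK's enum).
enum class LLMKVCacheCleanupPolicy { CLEAN_UP_ON_FULL, DO_NOT_CLEAN_UP }

// Toy cache showing what each policy does when capacity is reached.
class ToyKVCache(private val capacity: Int, private val policy: LLMKVCacheCleanupPolicy) {
    private val entries = mutableListOf<Int>()
    var overflowed = false
        private set

    // Returns false when the cache is full and the policy refuses to clean up.
    fun append(token: Int): Boolean {
        if (entries.size == capacity) {
            when (policy) {
                LLMKVCacheCleanupPolicy.CLEAN_UP_ON_FULL ->
                    entries.clear()                       // drop the whole context, keep generating
                LLMKVCacheCleanupPolicy.DO_NOT_CLEAN_UP -> {
                    overflowed = true                     // caller must call cleanUp()
                    return false
                }
            }
        }
        entries.add(token)
        return true
    }

    fun cleanUp() { entries.clear(); overflowed = false }
    fun size() = entries.size
}

fun main() {
    val auto = ToyKVCache(3, LLMKVCacheCleanupPolicy.CLEAN_UP_ON_FULL)
    repeat(5) { auto.append(it) }
    println(auto.size())       // 2: cleared once at capacity, then refilled

    val manual = ToyKVCache(3, LLMKVCacheCleanupPolicy.DO_NOT_CLEAN_UP)
    repeat(5) { manual.append(it) }
    println(manual.overflowed) // true: generation stalls until cleanUp() is called
    manual.cleanUp()
    println(manual.append(99)) // true: appending works again after manual cleanup
}
```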


## See Also