
Releases: LMCache/LMCache

v0.3.3

03 Aug 22:44
92e3837

v0.3.3 brings performance improvements over the previous release.

[Performance snapshot: screenshot dated 2025-08-03]

Full Changelog: v0.3.2...v0.3.3

v0.3.2

15 Jul 05:19
b9e8b81

LMCache v0.3.2 is a patch release. Users are encouraged to upgrade for the best experience.

Highlights

  • Support added for vLLM v0.9.1 (see the sketch after this list)
  • Addition of vLLM integration tests to catch compatibility issues early and improve robustness
  • Addition of dynamic connectors to support older versions of vLLM
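
To make the vLLM v0.9.1 support concrete, here is a minimal sketch of wiring LMCache into a vLLM engine through vLLM's KV-transfer interface. This is not taken from the release notes: the connector name, kv_role value, and model are assumptions that may differ across versions, so verify them against docs.lmcache.ai.

```python
# Minimal sketch (assumptions noted above): route vLLM's KV-cache saves and
# loads through LMCache's connector for the vLLM v1 engine.
from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed example model
    kv_transfer_config=KVTransferConfig(
        kv_connector="LMCacheConnectorV1",     # LMCache's v1 connector
        kv_role="kv_both",                     # this engine saves and loads KV
    ),
)

out = llm.generate(["Hello!"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```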

Full Changelog: v0.3.1...v0.3.2

v0.3.1.post1

26 Jun 19:00
887372e

LMCache v0.3.1.post1 is a patch release. Users are encouraged to upgrade for the best experience.

Full Changelog: v0.3.1...v0.3.1.post1

v0.3.1

25 Jun 18:43
261cf15

LMCache v0.3.1 is a patch release. Users are encouraged to upgrade for the best experience.

What's Changed

  • Introduce Weka Storage Backend by @sdimitro in #699
  • [Bugfix] revert the change made in #694 to the KV cache calculation for DeepSeek-V3 by @mengbingrock in #715
  • [Doc]: Small fixes to contributing guide by @hickeyma in #743
  • [style] import optimization by @chunxiaozheng in #738
  • [Security]: Enable dependabot for dependency updates by @hickeyma in #720
  • [Bugfix] Prevent vLLM from getting stuck when a Mooncake transmission stalls by @LLLL114 in #726
  • [Bugfix] fix the logic error for save_decode_cache in vllm v0 integration by @blossomin in #752
  • [Bugfix] Lazy import of cufile by @YaoJiayi in #753
  • [performance] reduce the number of calls to remote backend exists rpc by @chunxiaozheng in #718
  • [Misc] Adding online serving example for single shot testing by @4D0R in #721
  • [Examples][P/D] Examples for Xp1d using LMCache by @ApostaC in #759
  • [Doc][P/D] documentation pages for LMCache PD disaggregation by @ApostaC in #768
  • Bump pre-commit from 4.0.1 to 4.2.0 in the minor-update group by @dependabot in #750
  • Update setuptools requirement from <80.0.0,>=77.0.3 to >=77.0.3,<81.0.0 by @dependabot in #751
  • [feature] support setting extra config in LMCacheEngineConfig by @chunxiaozheng in #742 (see the config sketch after this list)
  • [Refactor][Bugfix] Add KV Cache format storage in local disk backend by @Shaoting-Feng in #735
  • [Bugfix] Fix observability threading lock by @Shaoting-Feng in #777
  • Add a generic GDS backend by @da-x in #773
  • fix runtime error: Invalid device for infinistore (#502) by @thesues in #517
  • [CI/Build]: Addition to Dockerfile for choice of vLLM and LMCache package versions by @hickeyma in #746
  • [CI/Build]: Add nightly build container image of latest code by @hickeyma in #756
  • [Core] Initial CacheBlend V1 Implementation by @YaoJiayi in #762
  • [CI/Build]: Fix linux CUDA wheel builds by @hickeyma in #775
  • [Fix] missing 'int' for reading config.cufile_buffer_size by @da-x in #793
  • [Misc] GDS backend: use the SafeTensors format for metadata by @da-x in #783
  • [FSConnector] improve performance with asyncio and direct read to memory object by @guymguym in #740
  • [bugfix] add VLLMBufferLayerwiseGPUConnector in union type by @chunxiaozheng in #797
  • [Enhancement] Support for ROCm on LMCache by @vllmellm in #702
  • [bugfix] set gds_path by defaults by @chunxiaozheng in #798
  • [Enhancement] Improve layerwise cache store/load by @YaoJiayi in #794
  • [optimize] support not saving unfull chunks by @chunxiaozheng in #804
  • [bugfix] set use_mla in lmcache metadata by @chunxiaozheng in #810
  • [#771]Support dynamic loading of external remote connector implementations by @maobaolong in #774
  • feat: enhance controller manager to support JSON messages from Mooncake by @xiaguan in #799
  • [CI]: Add static checker for GitHub actions workflows by @hickeyma in #808
  • [CI/Build]: Build container image when a new release is published by @hickeyma in #784
  • [Mooncake] Config: Use extra config in LMCacheEngine by @stmatengss in #806
  • [CI]: Stable NIXL checkout for Dockerfile by @sammshen in #824
  • [Enhancement] PD proxy optimization by @YaoJiayi in #809
  • [bugfix] delete duplicate header files by @LuyuZhang00 in #827
  • [Doc] Make the performance snapshot description in README.md readable by @shwgao in #816
  • [Docs] fix lmcache config file environment variable name in docs by @diabloneo in #826
  • [Fix] Fix race between test_gds/weka rmtree and asyncio thread loop by @sdimitro in #825
  • [CI]: Add disk cleanup as a GH action by @hickeyma in #830
  • [Refactor] Unify nixl and offloading code paths and add batch_put interface by @YaoJiayi in #831
  • [Doc] Add the introduction and install instructions for P/D disagg and NIXL by @ApostaC in #837
  • [Doc] Clarify enable_prefix_caching=False for cold-start benchmark by @amulil in #834
  • Reduce space cost to 1/TP while using MLA by @maobaolong in #803
  • add username and password support for the Redis connector by @calvin0327 in #737 (see the config sketch after this list)
  • [Bugfix] add health probe for kubernetes by @zerofishnoodles in #846
  • hotfix: revert mooncake setup function for compatibility by @xiaguan in #844
  • [optimize] simplify the implementation of MLA by @chunxiaozheng in #780
  • [optimize] support timeout when get blocking by @chunxiaozheng in #849
  • [Misc] Use max_num_batched_tokens from vLLM to Determine Batch Size in GPU Connector by @huaxuan250 in #870
  • [Misc] GdsBackend: add internal fallback to POSIX APIs by @da-x in #811
  • Fix old vllm compatible issue by update LMCacheConnectorV1Impl by @maobaolong in #853
  • [Bugfix]: unexpected argument ‘config’ in MooncakestoreConnectorAdapter by @popsiclexu in #874
  • [Doc] Completed example "share_kv_cache" in "quickstart" by @TeenSpirit1107 in #850
  • [hotfix] Add host to health probe to improve flexibility by @zerofishnoodles in #856
  • [FEAT] Add package support for runai and tensorize model loading by @zerofishnoodles in #868
  • [CI]: Correctness through MMLU by @sammshen in #769
  • [Doc] Update README to include newsletter sign-up option in connection section by @kobe0938 in #883
  • [Misc] Sort model names in kv_cache_calculator by @Unprincess17 in #876
  • Fix double invoke ref_count_down issue while using audit connector by @maobaolong in #877
  • [optimize] optimize pin/unpin log level by @chunxiaozheng in #887
  • [Feat] GdsBackend: allow to pass use_direct_io flag by @da-x in #862
  • [MLA] Fix default remote_serde by @Shaoting-Feng in #885
  • [Core] SGLang End to End Integration by @Oasis-Git in #869
  • [Bugfix] Only retrieve the LMCache "hit" chunk during the KV cache load by @ApostaC in #884
  • [#842] Maintain the vllm lmcache_connector for v1 in lmcache repository itself by @maobaolong in #843
  • FSConnector#get should not output error log for FileNotFoundError by @maobaolong in #878
  • [test]: Update test workflow to improve test coverage metrics by @hickeyma in #823
  • [Core] Support multimodal models that use mm_hashes in vLLM by @Shaoting-Feng in #882
  • Refactor remote connector to make it easy to extend by @maobaolong in #858
  • [Refactor] Unify layerwise and non-layerwise code paths by @YaoJiayi in #833
  • [Bugfix] fix batched insert in lookup server by @chunxiaozheng in #902
  • [Bugfix]: MooncakestoreConnector init error (#904) by @jeremyzhang866 in #905
  • [Bugfix] Fix layerwise buffer size by @YaoJiayi in #908
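
Two of the items above are configuration-facing: extra config in LMCacheEngineConfig (#742) and Redis credentials (#737). The sketch below shows how they might be combined, assuming LMCache's YAML-file configuration and the LMCACHE_CONFIG_FILE environment variable; the key names and the credentials-in-URL form are illustrative, so check docs.lmcache.ai for the authoritative schema.

```python
# Illustrative sketch only: write an LMCache YAML config that combines the
# extra_config field (#742) with Redis credentials in the remote URL (#737),
# then point LMCache at it. Key names and units are assumptions.
import os
import tempfile
import textwrap

config_yaml = textwrap.dedent("""\
    chunk_size: 256
    local_cpu: true
    max_local_cpu_size: 5.0
    remote_url: "redis://cache_user:cache_pass@redis-host:6379"
    extra_config:
      my_backend_option: "some-value"
""")

path = os.path.join(tempfile.mkdtemp(), "lmcache.yaml")
with open(path, "w") as f:
    f.write(config_yaml)

# LMCache reads its configuration from the file named by this variable.
os.environ["LMCACHE_CONFIG_FILE"] = path
```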

v0.3.0

28 May 20:25
df5c717

LMCache v0.3.0 is a feature release. Users are encouraged to upgrade for the best experience.

Highlights

  • Documentation updated and improved
  • CPU support added (see the sketch after this list)
  • Full support added for vLLM V1 integration
  • Support added for XpYd prefill/decode disaggregation
  • Bug fixes
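
For the new CPU support, here is a sketch of how it might be enabled through LMCache's LMCACHE_-prefixed environment variables, set before the engine starts. The exact variable names and the GiB unit are assumptions based on that naming convention; verify them at docs.lmcache.ai.

```python
# Sketch only: enable the CPU KV pool via LMCACHE_* environment variables
# before constructing the inference engine. Names and units are assumptions.
import os

os.environ["LMCACHE_LOCAL_CPU"] = "True"          # keep a KV pool in CPU RAM
os.environ["LMCACHE_MAX_LOCAL_CPU_SIZE"] = "5.0"  # pool size (GiB, assumed)
os.environ["LMCACHE_CHUNK_SIZE"] = "256"          # tokens per cached chunk
```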


Release v0.2.1

24 Apr 17:53
21b0dab

Release generated from tag v0.2.1.
Built with CUDA 12.4 wheels and uploaded to PyPI.

Release v0.2.0

24 Apr 01:54
6e17f2a

Release generated from tag v0.2.0.
Built with CUDA 12.4 wheels and uploaded to PyPI.

v0.1.4-alpha

10 Dec 20:07
7d34435
Pre-release

Full Changelog: v0.1.3-alpha...v0.1.4-alpha

v0.1.3-alpha

29 Oct 23:10
a15c89c
Pre-release

Key Features

  • Supporting chunked prefill in vLLM
  • Faster KV loading for multi-turn conversations by saving KV at decoding time (see the sketch after this list)
  • Experimental KV blending feature to enable reusing non-prefix KV caches
  • New model support: Llama-3.1 and Qwen-2
  • Adding documentation (now available at docs.lmcache.ai)
  • Better examples in the examples/ folder
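
To see why saving KV at decoding time speeds up multi-turn conversations: each turn's prompt extends the previous turns verbatim, so once decoded tokens are cached too, the entire prefix of the next request is a cache hit. Below is a hypothetical sketch of that access pattern; generate is a stand-in stub, not an LMCache or vLLM API.

```python
# Hypothetical sketch of the multi-turn pattern that benefits from
# decode-time KV saving: every prompt extends the previous context
# verbatim, so its whole prefix is already cached when it arrives.
def generate(prompt: str) -> str:
    """Stand-in for a real serving call; not an LMCache API."""
    return " (model reply)"

history = ""
for user_msg in ["Hi!", "Tell me more.", "Thanks!"]:
    prompt = history + f"User: {user_msg}\nAssistant:"
    reply = generate(prompt)         # KV for `history` is reused, not recomputed
    history = prompt + reply + "\n"  # the next turn extends this exact text
```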

Full Changelog: v0.1.2-alpha...v0.1.3-alpha

v0.1.2-alpha

20 Sep 23:10
0d15ee3
Pre-release

What's Changed

  • Integration with the latest vLLM (0.6.1.post2)
  • Support for installing via pip install lmcache (see the sketch below)
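
A quick post-install sanity check, as a sketch; it assumes only the package name from the bullet above.

```python
# Sanity-check sketch: after `pip install lmcache`, confirm the package
# imports and report the installed version.
import importlib.metadata

import lmcache  # noqa: F401  (succeeds only if the install worked)

print("lmcache version:", importlib.metadata.version("lmcache"))
```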