
Releases: LMCache/LMCache

v0.3.3

03 Aug 22:44
92e3837

v0.3.3 brings performance improvements over the previous release.

[Performance snapshot: screenshot dated 2025-08-03]

Full Changelog: v0.3.2...v0.3.3

v0.3.2

15 Jul 05:19
b9e8b81

LMCache v0.3.2 is a patch release. Users are encouraged to upgrade for the best experience.

Highlights

  • Support added for vLLM v0.9.1 (see the sketch after this list)
  • Addition of vLLM integration tests to catch compatibility issues early and improve robustness
  • Addition of dynamic connectors to support older versions of vLLM
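
To make the vLLM v0.9.1 support concrete, here is a minimal sketch of wiring LMCache into a vLLM engine through vLLM's KV-transfer interface. This is not taken from the release notes: the connector name, kv_role value, and model are assumptions that may differ across versions, so verify them against docs.lmcache.ai.

```python
# Minimal sketch (assumptions noted above): route vLLM's KV-cache saves and
# loads through LMCache's connector for the vLLM v1 engine.
from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed example model
    kv_transfer_config=KVTransferConfig(
        kv_connector="LMCacheConnectorV1",     # LMCache's v1 connector
        kv_role="kv_both",                     # this engine saves and loads KV
    ),
)

out = llm.generate(["Hello!"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```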

Full Changelog: v0.3.1...v0.3.2

v0.3.1.post1

26 Jun 19:00
887372e

LMCache v0.3.1.post1 is a patch release. Users are encouraged to upgrade for the best experience.

Full Changelog: v0.3.1...v0.3.1.post1

v0.3.1

25 Jun 18:43
261cf15

LMCache v0.3.1 is a patch release. Users are encouraged to upgrade for the best experience.

What's Changed

  • Introduce Weka Storage Backend by @sdimitro in #699
  • [Bugfix] revert the change made in #694 to the KV cache calculation for DeepSeek-V3 by @mengbingrock in #715
  • [Doc]: Small fixes to contributing guide by @hickeyma in #743
  • [style] import optimization by @chunxiaozheng in #738
  • [Security]: Enable dependabot for dependency updates by @hickeyma in #720
  • [Bugfix] Prevent vLLM from getting stuck when a Mooncake transmission stalls by @LLLL114 in #726
  • [Bugfix] fix the logic error for save_decode_cache in vllm v0 integration by @blossomin in #752
  • [Bugfix] Lazy import of cufile by @YaoJiayi in #753
  • [performance] reduce the number of calls to remote backend exists rpc by @chunxiaozheng in #718
  • [Misc] Adding online serving example for single shot testing by @4D0R in #721
  • [Examples][P/D] Examples for Xp1d using LMCache by @ApostaC in #759
  • [Doc][P/D] documentation pages for LMCache PD disaggregation by @ApostaC in #768
  • Bump pre-commit from 4.0.1 to 4.2.0 in the minor-update group by @dependabot in #750
  • Update setuptools requirement from <80.0.0,>=77.0.3 to >=77.0.3,<81.0.0 by @dependabot in #751
  • [feature] support setting extra config in LMCacheEngineConfig by @chunxiaozheng in #742 (see the config sketch after this list)
  • [Refactor][Bugfix] Add KV Cache format storage in local disk backend by @Shaoting-Feng in #735
  • [Bugfix] Fix observability threading lock by @Shaoting-Feng in #777
  • Add a generic GDS backend by @da-x in #773
  • fix runtime error: Invalid device for infinistore (#502) by @thesues in #517
  • [CI/Build]: Addition to Dockerfile for choice of vLLM and LMCache package versions by @hickeyma in #746
  • [CI/Build]: Add nightly build container image of latest code by @hickeyma in #756
  • [Core] Initial CacheBlend V1 Implementation by @YaoJiayi in #762
  • [CI/Build]: Fix linux CUDA wheel builds by @hickeyma in #775
  • [Fix] missing 'int' for reading config.cufile_buffer_size by @da-x in #793
  • [Misc] GDS backend: use the SafeTensors format for metadata by @da-x in #783
  • [FSConnector] improve performance with asyncio and direct read to memory object by @guymguym in #740
  • [bugfix] add VLLMBufferLayerwiseGPUConnector in union type by @chunxiaozheng in #797
  • [Enhancement] Support for ROCm on LMCache by @vllmellm in #702
  • [bugfix] set gds_path by defaults by @chunxiaozheng in #798
  • [Enhancement] Improve layerwise cache store/load by @YaoJiayi in #794
  • [optimize] support not saving unfull chunks by @chunxiaozheng in #804
  • [bugfix] set use_mla in lmcache metadata by @chunxiaozheng in #810
  • [#771]Support dynamic loading of external remote connector implementations by @maobaolong in #774
  • feat: enhance controller manager to support JSON messages from Mooncake by @xiaguan in #799
  • [CI]: Add static checker for GitHub actions workflows by @hickeyma in #808
  • [CI/Build]: Build container image when a new release is published by @hickeyma in #784
  • [Mooncake] Config: Use extra config in LMCacheEngine by @stmatengss in #806
  • [CI]: Stable NIXL checkout for Dockerfile by @sammshen in #824
  • [Enhancement] PD proxy optimization by @YaoJiayi in #809
  • [bugfix] delete duplicate header files by @LuyuZhang00 in #827
  • [Doc] Make the performance snapshot description in README.md readable by @shwgao in #816
  • [Docs] fix lmcache config file environment variable name in docs by @diabloneo in #826
  • [Fix] Fix race between test_gds/weka rmtree and asyncio thread loop by @sdimitro in #825
  • [CI]: Add disk cleanup as a GH action by @hickeyma in #830
  • [Refactor] Unify nixl and offloading code paths and add batch_put interface by @YaoJiayi in #831
  • [Doc] Add the introduction and install instructions for P/D disagg and NIXL by @ApostaC in #837
  • [Doc] Clarify enable_prefix_caching=False for cold-start benchmark by @amulil in #834
  • Reduce space cost to 1/TP while using MLA by @maobaolong in #803
  • add username and password support for the Redis connector by @calvin0327 in #737 (see the config sketch after this list)
  • [Bugfix] add health probe for kubernetes by @zerofishnoodles in #846
  • hotfix: revert mooncake setup function for compatibility by @xiaguan in #844
  • [optimize] simplify the implementation of MLA by @chunxiaozheng in #780
  • [optimize] support timeout when get blocking by @chunxiaozheng in #849
  • [Misc] Use max_num_batched_tokens from vLLM to Determine Batch Size in GPU Connector by @huaxuan250 in #870
  • [Misc] GdsBackend: add internal fallback to POSIX APIs by @da-x in #811
  • Fix old vllm compatible issue by update LMCacheConnectorV1Impl by @maobaolong in #853
  • [Bugfix]: unexpected argument ‘config’ in MooncakestoreConnectorAdapter by @popsiclexu in #874
  • [Doc] Completed example "share_kv_cache" in "quickstart" by @TeenSpirit1107 in #850
  • [hotfix] Add host to health probe to improve flexibility by @zerofishnoodles in #856
  • [FEAT] Add package support for runai and tensorize model loading by @zerofishnoodles in #868
  • [CI]: Correctness through MMLU by @sammshen in #769
  • [Doc] Update README to include newsletter sign-up option in connection section by @kobe0938 in #883
  • [Misc] Sort model names in kv_cache_calculator by @Unprincess17 in #876
  • Fix double invoke ref_count_down issue while using audit connector by @maobaolong in #877
  • [optimize] optimize pin/unpin log level by @chunxiaozheng in #887
  • [Feat] GdsBackend: allow to pass use_direct_io flag by @da-x in #862
  • [MLA] Fix default remote_serde by @Shaoting-Feng in #885
  • [Core] SGLang End to End Integration by @Oasis-Git in #869
  • [Bugfix] Only retrieve the LMCache "hit" chunk during the KV cache load by @ApostaC in #884
  • [#842] Maintain the vllm lmcache_connector for v1 in lmcache repository itself by @maobaolong in #843
  • FSConnector#get should not output error log for FileNotFoundError by @maobaolong in #878
  • [test]: Update test workflow to improve test coverage metrics by @hickeyma in #823
  • [Core] Support multimodal models that use mm_hashes in vLLM by @Shaoting-Feng in #882
  • Refactor remote connector to make it easy to extend by @maobaolong in #858
  • [Refactor] Unify layerwise and non-layerwise code paths by @YaoJiayi in #833
  • [Bugfix] fix batched insert in lookup server by @chunxiaozheng in #902
  • [Bugfix]: MooncakestoreConnector init error (#904) by @jeremyzhang866 in #905
  • [Bugfix] Fix layerwise buffer size by @YaoJiayi in #908
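
Two of the items above are configuration-facing: extra config in LMCacheEngineConfig (#742) and Redis credentials (#737). The sketch below shows how they might be combined, assuming LMCache's YAML-file configuration and the LMCACHE_CONFIG_FILE environment variable; the key names and the credentials-in-URL form are illustrative, so check docs.lmcache.ai for the authoritative schema.

```python
# Illustrative sketch only: write an LMCache YAML config that combines the
# extra_config field (#742) with Redis credentials in the remote URL (#737),
# then point LMCache at it. Key names and units are assumptions.
import os
import tempfile
import textwrap

config_yaml = textwrap.dedent("""\
    chunk_size: 256
    local_cpu: true
    max_local_cpu_size: 5.0
    remote_url: "redis://cache_user:cache_pass@redis-host:6379"
    extra_config:
      my_backend_option: "some-value"
""")

path = os.path.join(tempfile.mkdtemp(), "lmcache.yaml")
with open(path, "w") as f:
    f.write(config_yaml)

# LMCache reads its configuration from the file named by this variable.
os.environ["LMCACHE_CONFIG_FILE"] = path
```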

v0.3.0

28 May 20:25
df5c717

LMCache v0.3.0 is a feature release. Users are encouraged to upgrade for the best experience.

Highlights

  • Documentation updated and improved
  • CPU support added (see the sketch after this list)
  • Full support added for vLLM V1 integration
  • Support added for XpYd prefill/decode disaggregation
  • Bug fixes
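
For the new CPU support, here is a sketch of how it might be enabled through LMCache's LMCACHE_-prefixed environment variables, set before the engine starts. The exact variable names and the GiB unit are assumptions based on that naming convention; verify them at docs.lmcache.ai.

```python
# Sketch only: enable the CPU KV pool via LMCACHE_* environment variables
# before constructing the inference engine. Names and units are assumptions.
import os

os.environ["LMCACHE_LOCAL_CPU"] = "True"          # keep a KV pool in CPU RAM
os.environ["LMCACHE_MAX_LOCAL_CPU_SIZE"] = "5.0"  # pool size (GiB, assumed)
os.environ["LMCACHE_CHUNK_SIZE"] = "256"          # tokens per cached chunk
```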


Release v0.2.1

24 Apr 17:53
21b0dab

Release generated from tag v0.2.1.
Built with CUDA 12.4 wheels and uploaded to PyPI.

Release v0.2.0

24 Apr 01:54
6e17f2a

Release generated from tag v0.2.0.
Built with CUDA 12.4 wheels and uploaded to PyPI.

v0.1.4-alpha

10 Dec 20:07
7d34435
Pre-release

Full Changelog: v0.1.3-alpha...v0.1.4-alpha

v0.1.3-alpha

29 Oct 23:10
a15c89c
Pre-release

Key Features

  • Supporting chunked prefill in vLLM
  • Faster KV loading for multi-turn conversations by saving KV at decoding time (see the sketch after this list)
  • Experimental KV blending feature to enable reusing non-prefix KV caches
  • New model support: Llama-3.1 and Qwen-2
  • Adding documentation (now available at docs.lmcache.ai)
  • Better examples in the examples/ folder
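
To see why saving KV at decoding time speeds up multi-turn conversations: each turn's prompt extends the previous turns verbatim, so once decoded tokens are cached too, the entire prefix of the next request is a cache hit. Below is a hypothetical sketch of that access pattern; generate is a stand-in stub, not an LMCache or vLLM API.

```python
# Hypothetical sketch of the multi-turn pattern that benefits from
# decode-time KV saving: every prompt extends the previous context
# verbatim, so its whole prefix is already cached when it arrives.
def generate(prompt: str) -> str:
    """Stand-in for a real serving call; not an LMCache API."""
    return " (model reply)"

history = ""
for user_msg in ["Hi!", "Tell me more.", "Thanks!"]:
    prompt = history + f"User: {user_msg}\nAssistant:"
    reply = generate(prompt)         # KV for `history` is reused, not recomputed
    history = prompt + reply + "\n"  # the next turn extends this exact text
```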

Full Changelog: v0.1.2-alpha...v0.1.3-alpha

v0.1.2-alpha

20 Sep 23:10
0d15ee3
Pre-release

What's Changed

  • Integration with the latest vLLM (0.6.1.post2)
  • Support for installing via pip install lmcache (see the sketch below)
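
A quick post-install sanity check, as a sketch; it assumes only the package name from the bullet above.

```python
# Sanity-check sketch: after `pip install lmcache`, confirm the package
# imports and report the installed version.
import importlib.metadata

import lmcache  # noqa: F401  (succeeds only if the install worked)

print("lmcache version:", importlib.metadata.version("lmcache"))
```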