Releases: LMCache/LMCache
v0.3.3
v0.3.3 delivers performance improvements over the previous release.
What's Changed
- [DOC] [ROCm]: Update LMCache installation procedure for ROCm by @vllmellm in #1037
- [CD]: Clean up unnecessary manylinux base file by @sammshen in #1043
- [Refactor]: Remove unnecessary build package dependency by @hickeyma in #1059
- [CI] Speedup by canceling previous commit run by @Shaoting-Feng in #1067
- [Refactor] Open up interface for storing hashes by @YaoJiayi in #1040
- [Bugfix] wrong key check in kv_controller.py by @wxsms in #1053
- [CI] Fix step dependency on unit test by @Shaoting-Feng in #1069
- [Core] XPYD support by @YaoJiayi in #895
- Update modelconfig.json by @JasmondL in #1048
- [Bugfix] Make nixl import lazy by @YaoJiayi in #1080
- [CI/Build] Optimization for multiple buildkite agents by @Shaoting-Feng in #1074
- [bugfix] adapt old vllm version by @chunxiaozheng in #1072
- [Doc] fix config file name consistency in KV cache sharing example by @yankay in #1083
- [CI/Build][Doc] Remove deprecated pipelined_backend configuration option from testcase, documentation and examples by @yankay in #1085
- [CI/Build] Refactor integration test exit and cleanup by @Shaoting-Feng in #1090
- [bug]: unpin unretrieved by @sammshen in #1092
- extra indentation from 1092 by @sammshen in #1093
- [Bugfix] Lookup compatibility with tp by @Shaoting-Feng in #1101
- Bump ossf/scorecard-action from 2.4.1 to 2.4.2 by @dependabot[bot] in #995
- Bump step-security/harden-runner from 2.12.1 to 2.12.2 by @dependabot[bot] in #994
- [ci]: Refactor docs workflow by @hickeyma in #1098
- [security]: Tighten runner security by specifying endpoints by @hickeyma in #1097
- [fix]: Enable container access for score card and PyPI workflow by @hickeyma in #1103
- Introduce a remote monitor thread to monitor remote and support fallback to blackhole by @maobaolong in #764
- [Misc] Added unit tests for local cpu and disk backend by @weicaivi in #1041
- [Fix][P2P] Set retrieved token mask for P2P mode by @zejunchen-zejun in #1105
- [Example]: reversed conditional in 1p1d by @sammshen in #1076
- [Add] fix not implemented error in gds by @ApostaC in #1115
- [Bugfix] Metadata file path missing suffix in GdsBackend and WekaGdsB… by @YurianStormrage in #1026
- [Core] Make PD and offloading compatible by @YaoJiayi in #1112
- [Doc] Updating community meeting time by @YuhanLiu11 in #1124
- [Doc] Fix documentation error by @YuhanLiu11 in #1125
- [bugfix] fix AttributeError in LMCacheConnectorMetadata by @chunxiaozheng in #1120
- Revert "[ci]: Refactor docs workflow" by @hickeyma in #1129
- [bugfix] adapt old vllm version by @chunxiaozheng in #1119
- Add structure for pluggable hash libraries by @hickeyma in #1089
- [Bugfix] Assertion Error for CPU offloading in PD by @vladnosiv in #1132
- [CI/Build] Check end to end results by @Shaoting-Feng in #1126
- [Bugfix] Lazy init for kv caches pointer by @YaoJiayi in #1135
- [Bugfix] Fix unit test by @YaoJiayi in #1137
- [fix]: zeroing out separator token positions by @sammshen in #1108
- [Connector] Support config multiply base dir for fs remote connector by @maobaolong in #1058
- [Doc] Fix documentation error by @qaz-t in #1148
- [ADAPTER|vLLM] fix: Handle the prefix cache hit >= num_external_hit_tokens case avoid storing invoked by @maobaolong in #1096
- [CD]: nightly build patch by @sammshen in #1164
- [Bugfix] Fix torch mem alloc by @YaoJiayi in #1163
- [Bugfix] Cross-process cache sharing by @junl666 in #1140
- [Refactor] refactor disk and prefetch code paths by @YaoJiayi in #1172
- [Bugfix] Concurrent LMCache instances on the same machine conflict due to a hardcoded IPC socket path, causing a race condition. by @Foreverythin in #1166
- [chore]: Change code headers to SPDX License headers by @hickeyma in #1145
- [Bugfix] Decode in PD never makes a cache hit and causes OOM by @vladnosiv in #1133
- [CI]: isolate versioning fix by @sammshen in #1199
- [CI]: Refactor/Fix CI by @sammshen in #1198
- [CI]: follow up fix gpu contention by @sammshen in #1201
- [CI]: final fix to find-free-gpu.sh to avoid CI race conditions by @sammshen in #1202
- Introduce create_lookup_server_only_on_worker_0 extra config by @maobaolong in #1144
- [Enhancement] Add move and compress controller APIs by @YaoJiayi in #1206
- [opt]: evict suffixes before prefixes by @sammshen in #1154
- [Enhancement] Update more controller apis by @YaoJiayi in #1217
- [bugfix] fix save unfull chunk in vllm_v1_adapter by @chunxiaozheng in #1170
New Contributors
- @wxsms made their first contribution in #1053
- @JasmondL made their first contribution in #1048
- @yankay made their first contribution in #1083
- @weicaivi made their first contribution in #1041
- @zejunchen-zejun made their first contribution in #1105
- @YurianStormrage made their first contribution in #1026
- @YuhanLiu11 made their first contribution in #1124
- @vladnosiv made their first contribution in #1132
- @qaz-t made their first contribution in #1148
- @junl666 made their first contribution in #1140
- @Foreverythin made their first contribution in #1166
Full Changelog: v0.3.2...v0.3.3
v0.3.2
LMCache v0.3.2 is a patch release. Users are encouraged to upgrade for the best experience.
Highlights
- Support added for vLLM v0.9.1 (see the sketch after this list)
- Addition of vLLM integration tests to catch compatibility issues with vLLM early and improve robustness
- Addition of dynamic connectors to support older versions of vLLM
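Among the highlights above, the vLLM v0.9.1 support is the piece most users touch directly. The snippet below is a minimal sketch of wiring LMCache into vLLM's offline API via its KV-transfer connector; the model name and config path are placeholders, and the exact class and field names (`KVTransferConfig`, `LMCacheConnectorV1`, `LMCACHE_CONFIG_FILE`) may differ between vLLM and LMCache versions, so treat it as illustrative rather than canonical.

```python
# Hedged sketch: running vLLM with LMCache as the KV-cache connector.
# Assumes a vLLM 0.9.x-style KVTransferConfig and the LMCacheConnectorV1
# integration; names and defaults may vary across versions.
import os

from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

# Point LMCache at its configuration file (path is illustrative).
os.environ["LMCACHE_CONFIG_FILE"] = "lmcache_config.yaml"

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    kv_transfer_config=KVTransferConfig(
        kv_connector="LMCacheConnectorV1",  # LMCache's vLLM connector
        kv_role="kv_both",                  # both store and reuse KV cache
    ),
)

out = llm.generate(["Hello, LMCache!"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```

The equivalent online-serving setup typically passes the same connector settings as JSON through vLLM's `--kv-transfer-config` flag.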
What's Changed
- [Bugfix] Wrong token truncating by @YaoJiayi in #911
- [Docs] Dynamic Connector by @sammshen in #918
- [bugfix]: Fix nixl_peer_host and nixl_peer_port to nixl_receiver_host and nixl_receiver_port by @aztecher in #921
- Update README.md by @Hanchenli in #925
- [Core] Add batched get interface by @YaoJiayi in #924
- Docs: fix broken links in offload_kv_cache.rst by @csbo98 in #931
- support MLA format for sglang gpu connector by @llc-kc in #903
- [Core] Add Paged Memory Allocator by @YaoJiayi in #932
- [Doc] Consistent tagline with github repo by @Siddhant-Ray in #935
- Bump docker/setup-buildx-action from 3.10.0 to 3.11.1 by @dependabot[bot] in #893
- Bump step-security/harden-runner from 2.12.0 to 2.12.1 by @dependabot[bot] in #848
- [Test]: Make the unit test coverage report available by @hickeyma in #944
- [Bugfix] Adapt to latest vllm version by @YaoJiayi in #937
- [Chore]: Add badges to readme by @hickeyma in #946
- Bugfix: Pin Unpin notImplementedError by @4D0R in #959
- [Doc]Added Mooncake documentation by @xiaguan in #941
- Remove redundant and incorrect truncation of the last n tokens. by @yoo-kumaneko in #974
- [Security]: Add OpenSSF Scorecard Security Scanning by @hickeyma in #965
- [Doc] Remove unmatched a tag by @xleoken in #979
- [Doc] Fix broken link by @carlory in #986
- [Bug]: Double Free by @sammshen in #993
- [Refactor] Memory management 1/N (and testing) by @YaoJiayi in #992
- [Doc]: Update LMCache system requirements in documentation by @hickeyma in #991
- [ADAPT-VLLM] Support vllm 0.8.5 lmcache connector by @maobaolong in #976
- [Test]: Add integration testing for LMCache integration with vLLM by @hickeyma in #930
- Simple Adapt KV Connector by @sammshen in #997
- [Security]: Add CodeQL Security Scanning by @hickeyma in #963
- [MINOR] Improve the store log and metrics to get the real stored token exactly by @maobaolong in #972
- [Misc] Fix controller doc by @YaoJiayi in #1009
- Update the remote external connector example document by @maobaolong in #838
- perf: optimize get_num_new_matched_tokens latency in vLLM scheduler by @xiaguan in #936
- [Doc]: Improve the usability of the README file by @hickeyma in #1012
- [Misc]support infinistore_type config by @novahow in #980
- [doc] Update README by @Shaoting-Feng in #1021
- [doc] Update README by @Shaoting-Feng in #1022
- [Refactor]: Remove duplicate requirements loading in setup tools by @hickeyma in #1015
- [CI]: Add stale issue bot by @hickeyma in #1017
- [kv-cache-calculator] add Qwen3 models into calculator by @panpan0000 in #1024
- [Bugfix] stream use text/event-stream media_type by @Abirdcfly in #1011
- [Doc]: Small improvements to contributing guide by @hickeyma in #1005
- Fixed AttributeError caused by missing 'req_ids' in older vllm by @popsiclexu in #1001
- [Core] Use a faster hash function by @zhouwfang in #1020
New Contributors
- @aztecher made their first contribution in #921
- @csbo98 made their first contribution in #931
- @llc-kc made their first contribution in #903
- @yoo-kumaneko made their first contribution in #974
- @xleoken made their first contribution in #979
- @carlory made their first contribution in #986
- @novahow made their first contribution in #980
- @panpan0000 made their first contribution in #1024
- @Abirdcfly made their first contribution in #1011
- @zhouwfang made their first contribution in #1020
Full Changelog: v0.3.1...v0.3.2
v0.3.1.post1
LMCache v0.3.1.post1 is a patch release. Users are encouraged to upgrade for the best experience.
What's Changed
Full Changelog: v0.3.1...v0.3.1.post1
v0.3.1
LMCache v0.3.1 is a patch release. Users are encouraged to upgrade for the best experience.
What's Changed
- Introduce Weka Storage Backend by @sdimitro in #699
- [Bugfix] revert change made by #694 of kvcache calculation for deepseekV3 by @mengbingrock in #715
- [Doc]: Small fixes to contributing guide by @hickeyma in #743
- [style] import optimization by @chunxiaozheng in #738
- [Security]: Enable dependabot for dependency updates by @hickeyma in #720
- [Bugfix] Prevent vllm from getting stuck due to Mooncake transmission stuck by @LLLL114 in #726
- [Bugfix] fix the logic error for save_decode_cache in vllm v0 integration by @blossomin in #752
- [Bugfix] Lazy import of cufile by @YaoJiayi in #753
- [performance] reduce the number of calls to remote backend exists rpc by @chunxiaozheng in #718
- [Misc] Adding online serving example for single shot testing by @4D0R in #721
- [Examples][P/D] Examples for Xp1d using LMCache by @ApostaC in #759
- [Doc][P/D] documentation pages for LMCache PD disaggregation by @ApostaC in #768
- Bump pre-commit from 4.0.1 to 4.2.0 in the minor-update group by @dependabot in #750
- Update setuptools requirement from <80.0.0,>=77.0.3 to >=77.0.3,<81.0.0 by @dependabot in #751
- [feature] support set extra config in LMCacheEngineConfig by @chunxiaozheng in #742
- [Refactor][Bugfix] Add KV Cache format storage in local disk backend by @Shaoting-Feng in #735
- [Bugfix] Fix observability threading lock by @Shaoting-Feng in #777
- Add a generic GDS backend by @da-x in #773
- fix runtime error:Invalid device for infinistore(#502) by @thesues in #517
- [CI/Build]: Addition to Dockerfile for choice of vLLM and LMCache package versions by @hickeyma in #746
- [CI/Build]: Add nightly build container image of latest code by @hickeyma in #756
- [Core] Initial CacheBlend V1 Implementation by @YaoJiayi in #762
- [CI/Build]: Fix linux CUDA wheel builds by @hickeyma in #775
- [Fix] missing 'int' for reading `config.cufile_buffer_size` by @da-x in #793
- [Misc] GDS backend: use the SafeTensors format for metadata by @da-x in #783
- [FSConnector] improve performance with asyncio and direct read to memory object by @guymguym in #740
- [bugfix] add VLLMBufferLayerwiseGPUConnector in union type by @chunxiaozheng in #797
- [Enhancement] Support for ROCm on LMCache by @vllmellm in #702
- [bugfix] set gds_path by defaults by @chunxiaozheng in #798
- [Enhancement] Improve layerwise cache store/load by @YaoJiayi in #794
- [optimize] support don't save unfull chunk by @chunxiaozheng in #804
- [bugfix] set use_mla in lmcache metadata by @chunxiaozheng in #810
- [#771]Support dynamic loading of external remote connector implementations by @maobaolong in #774
- feat: enhance controller manager to support JSON messages from Mooncake by @xiaguan in #799
- [CI]: Add static checker for GitHub actions workflows by @hickeyma in #808
- [CI/Build]: Build container image when a new release is published by @hickeyma in #784
- [Mooncake] Config: Use extra config in LMCacheEngine by @stmatengss in #806
- [CI]: Stable NIXL checkout for Dockerfile by @sammshen in #824
- [Enhancement] PD proxy optimization by @YaoJiayi in #809
- [bugfix] delete duplicate header files by @LuyuZhang00 in #827
- [Doc] The description of performance snapshot in README.md is not readable. by @shwgao in #816
- [Docs] fix lmcache config file environment variable name in docs by @diabloneo in #826
- [Fix] Fix race between test_gds/weka rmtree and asyncio thread loop by @sdimitro in #825
- [CI]: Add disk cleanup as a GH action by @hickeyma in #830
- [Refactor] Unify nixl and offloading code paths and add batch_put interface by @YaoJiayi in #831
- [Doc] Add the introduction and install instructions for P/D disagg and NIXL by @ApostaC in #837
- [Doc] Clarify enable_prefix_caching=False for cold-start benchmark by @amulil in #834
- Reduce space cost to 1/TP while using MLA by @maobaolong in #803
- add username and password for redis connector by @calvin0327 in #737
- [Bugfix] add health probe for kubernetes by @zerofishnoodles in #846
- hotfix: revert mooncake setup function for compatibility by @xiaguan in #844
- [optimize] simplify the implementation of MLA by @chunxiaozheng in #780
- [optimize] support timeout when get blocking by @chunxiaozheng in #849
- [Misc] Use max_num_batched_tokens from Vllm to Determine Batch Size in GPU Connector by @huaxuan250 in #870
- [Misc] GdsBackend: add internal fallback to POSIX APIs by @da-x in #811
- Fix old vllm compatible issue by update LMCacheConnectorV1Impl by @maobaolong in #853
- [Bugfix]: unexpected argument ‘config’ in MooncakestoreConnectorAdapter by @popsiclexu in #874
- [Doc] Completed example "share_kv_cache" in "quickstart" by @TeenSpirit1107 in #850
- [hotfix] Add host to health probe to improve flexibility by @zerofishnoodles in #856
- [FEAT] Add package support for runai and tensorize model loading by @zerofishnoodles in #868
- [CI]: Correctness through MMLU by @sammshen in #769
- [Doc] Update README to include newsletter sign-up option in connection section by @kobe0938 in #883
- [Misc] Sort model names in kv_cache_calculator by @Unprincess17 in #876
- Fix double invoke ref_count_down issue while using audit connector by @maobaolong in #877
- [optimize] optimize pin/unpin log level by @chunxiaozheng in #887
- [Feat] GdsBackend: allow to pass use_direct_io flag by @da-x in #862
- [MLA] Fix default remote_serde by @Shaoting-Feng in #885
- [Core]SGLang End to End Integration by @Oasis-Git in #869
- [Bugfix] Only retrieve the LMCache "hit" chunk during the KV cache load by @ApostaC in #884
- [#842] Maintain the vllm lmcache_connector for v1 in lmcache repository itself by @maobaolong in #843
- FSConnector#get should not output error log for FileNotFoundError by @maobaolong in #878
- [test]: Update test workflow to improve test coverage metrics by @hickeyma in #823
- [Core] Support multimodal models that use `mm_hashes` in vLLM by @Shaoting-Feng in #882
- Refactor remote connector to make it easy to extend by @maobaolong in #858
- [Refactor] Unify layerwise and non-layerwise code paths by @YaoJiayi in #833
- [Bugfix] fix batched insert in lookup server by @chunxiaozheng in #902
- [Bugfix]: MooncakestoreConnector init error (#904) by @jeremyzhang866 in #905
- [Bugfix] Fix layerwise buffer size by @YaoJiayi in #908
New Contributors
- @sdimitro made their first contribution in #699
- @mengbingrock made their first contribution in #715
- @LLLL114 made their first contribution in #726
- @blossomin made their first contribution in #752
- @4D0R made their first contribution in #721
- @dependabot made their first contribution in https://gi...
v0.3.0
LMCache v0.3.0 is a feature release. Users are encouraged to upgrade for the best experience.
Highlights
- Documentation updated and improved
- CPU support added (see the config sketch after this list)
- Full support added for vLLM V1 integration
- Support added for XpYd
- Bug fixes
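Of the highlights above, CPU offloading is the one that is configured rather than simply enabled, so a brief sketch may help. The snippet below writes a minimal LMCache configuration file turning on the CPU (DRAM) backend; the key names (`chunk_size`, `local_cpu`, `max_local_cpu_size`) follow LMCache's documented config-file format, but defaults, units, and accepted values can change between releases, so treat the values as placeholders.

```python
# Hedged sketch: generating a minimal LMCache config that enables
# CPU (DRAM) offloading of the KV cache. Requires PyYAML.
import yaml

config = {
    "chunk_size": 256,          # tokens per KV-cache chunk
    "local_cpu": True,          # enable the CPU offloading backend
    "max_local_cpu_size": 5.0,  # CPU memory budget for cached KV, in GB
}

with open("lmcache_config.yaml", "w") as f:
    yaml.safe_dump(config, f)

# LMCache reads this file when LMCACHE_CONFIG_FILE points at it, e.g.
#   LMCACHE_CONFIG_FILE=lmcache_config.yaml vllm serve <model> ...
```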
What's Changed
- [Doc] Add doc pipeline by @Siddhant-Ray in #532
- [Integration] modify mooncake_connector to adapt new mooncake store APIs by @stmatengss in #525
- [Bugfix][CI] Update build_doc.yml by @Siddhant-Ray in #533
- [feat] Add configuration logging by @charmway in #529
- [Doc] Rewrite the LMCache documentation by @ApostaC in #541
- [Doc] minor fixes by @Siddhant-Ray in #548
- Local CPU Backend by @sammshen in #514
- [bugfix] fix counter type prometheus metrics error by @chunxiaozheng in #524
- [Misc] Allow lookup to skip last n tokens in vllm v1 by @YaoJiayi in #550
- [Bugfix][Doc] Add community docs and fix code overflow error by @Siddhant-Ray in #556
- [Fix] Update README.md with community meeting info by @Siddhant-Ray in #557
- [metrics] add remote backend metrics by @chunxiaozheng in #552
- [Doc] [Bugfix] Add developer guide and fix all Sphinx warnings by @Siddhant-Ray in #565
- [Bugfix] Fix zmq bind/connect in controller by @YaoJiayi in #569
- [Bugfix] Keep up with latest vllm version by @YaoJiayi in #572
- pypi tiny upgrade by @sammshen in #573
- [Doc][Fix]: Add improvements for usability by @hickeyma in #575
- [Misc, Controller] Support `get_instance_id` for production stack by @YaoJiayi in #576
- [Refactor] Remove `hot_cache` in `StorageManager` by @YaoJiayi in #567
- [Doc] Adding controller and compression related docs by @YaoJiayi in #571
- [Bugfix]: Add csrc files for sdist by @sammshen in #568
- Fix p2p in vllm v1 by @orozery in #543
- SO_REUSEADDR for time-wait problem by @yangxianpku in #564
- Refactor: rename RedisMetadata to RemoteMetadata by @maobaolong in #511
- [Core, CacheBlend] Add pos encoding kernel by @YaoJiayi in #540
- [feat] Add an audit remote connector to audit remote op and verify checksum by @maobaolong in #542
- [Docs]: cpu ram, local storage, redis/redis-sentinel/lmcache server by @sammshen in #558
- [CI] Fix buildkite LMCache installation issue by @ApostaC in #581
- [PD][serde] Use `msgpack` instead of `pickle` to serde `NixlRequest` by @KuntaiDu in #579
- [Doc] Update modelconfig.json to have more models in KV cache size calculator by @hey-kong in #584
- [Doc] add badge and fix typos by @HuaizhengZhang in #591
- [Bugfix] Update docker_file.rst by @Siddhant-Ray in #582
- fix: Pin precise torch version to avoid vLLM compatibility issues by @hickeyma in #583
- [Doc] add uv installation by @HuaizhengZhang in #596
- [Doc] doc for LMCache configuration file by @ApostaC in #580
- [Doc]: Fix the docs of mooncake.rst by @maobaolong in #615
- [Doc] Update README.md to add DeepWiki badge and clean up spacing by @HuaizhengZhang in #614
- [Fix] Link to docker in readme by @Siddhant-Ray in #600
- [Misc] Add copyright headers by @Siddhant-Ray in #597
- [Doc] correct documentation in evictor classes by @vivamilk in #535
- CI/Build Move server to GCP by @Shaoting-Feng in #613
- [Core, Controller] Define interfaces for controller by @YaoJiayi in #560
- [Enhancement] Add error handling for remote backend by @YaoJiayi in #628
- CI/Build: Add project metadata to pyproject toml file by @hickeyma in #592
- fix: the RuntimeError "Boolean value of Tensor..." of mooncakestore_connector.py by @maobaolong in #632
- [Bugfix, Refactor] Fix memory allocation failure in hierarchical storage by @YaoJiayi in #631
- [Enhancement] Async layerwise pipelining for KV cache offloading by @YaoJiayi in #625
- [feat] Add a filesystem remote connector by @maobaolong in #506
- [Misc] Improve async layerwise pipelining for cpu offloading by @YaoJiayi in #639
- [Bugfix] Fix incorrect path joining in local_disk backend by @ZhangShuaiyi in #609
- [Misc] Standardize local_disk path processing for env and file configs by @ZhangShuaiyi in #610
- CI/Build Add publishing to TestPyPi and refactor PyPi publish by @hickeyma in #611
- [Fix]: Torch version for build by @hickeyma in #644
- Docs/test pypi installation by @sammshen in #637
- [Docs] fix incorrect controller api server command by @kebe7jun in #641
- [Bugfix] Fix GPU buffer allocator by @YaoJiayi in #645
- [Doc]: Add comments on torch version and how its synchronized with vLLM by @hickeyma in #633
- [Core] [PD] Support for multiple NIXL pipes by @ApostaC in #528
- [Misc]Add unit tests for FSConnector by @zhongmingyuan in #646
- [Bugfix] Fix incorrect single-token saves in v1 by @orozery in #653
- support KV transfer of MLA backend by @chenqianfzh in #428
- [Bugfix] compatibility with vLLM 0.9.0 by @ApostaC in #655
- [Docs]: update instructions on test pypi installation by @sammshen in #647
- CI/Build Use latest branch of vLLM for Nightly E2E tests by @Shaoting-Feng in #668
- [Bugfix] Remove nixl dependency for now by @YaoJiayi in #672
- [Bugfix] Fix LMCacheConnector for latest vllm by @YaoJiayi in #677
- [Misc]support deepseek-v3 mla in kv cache size calculator by @zzhbrr in #671
- CI/Build: Update docker specification to be compatible with latest vLLM OpenAI server by @hickeyma in #665
- [Bugfix] tutorials broken by KVTransferConfig API changes in vLLM v0.9.0 by @leeeizhang in #664
- fix issue 675 by @chenqianfzh in #676
- add lookup api response by @calvin0327 in #674
- Improve the store log of vllm v1 adapter by @maobaolong in #673
- [Bugfix] support vllm v1 prometheus multiprocess exporter by @IRONICBo in #687
- [Bugfix] Fix layerwise KV cache transfer by @Shaoting-Feng in #670
- [Fix] Fix mismatches between received and expected values in nixl test by @ZhangShuaiyi in #663
- Micro benchmark for VLLMPagedMemGPUConnectorV2.toGpu and a performance fix by @yanok in #678
- [Bugfix]: Update KV cache calculator, fix calculation issues for Qwen3 series and DeepSeek-V3 by @hammersam in #694
- [Misc] Fix docker example script by @wwl2755 in #685
- CI/Build: Centralize requirement files by @hickeyma in #695
- [Misc] Update usage_context.py by @Siddhant-Ray in #700
- fix the memoryview: unsupported format <B error at redis connector by @sydnash in #662
- [Misc] Rename experimental to v1 and refactor examples by @YaoJiayi in #704
- [Doc]: Add DCO details to contributing guide by @hickeyma in #711
- [Doc] Add usage stats collection documentation by @zhuohangu in #713
- [Bugfix] Fix missed renaming by @YaoJiayi in #716
- Refactor: handle memory_obj.ref_count_down in the RemoteBackend by @maobaolong in #650
Release v0.2.1
Release generated from tag v0.2.1.
Built with CUDA 12.4 wheels and uploaded to PyPI.
Release v0.2.0
Release generated from tag v0.2.0.
Built with CUDA 12.4 wheels and uploaded to PyPI.
v0.1.4-alpha
What's Changed
- [Doc] Update docstrings to sphinx format, add KVBlend docs, KV blend examples and graceful exit by @Siddhant-Ray in #188
- [Core] Cachegen config refactor by @Oasis-Git in #156
- [Doc] Documentation for pulling LMCache Docker image by @Siddhant-Ray in #200
- [Misc] Remove old lmcache-server installation by @ApostaC in #202
- [Doc] Create MAINTAINERS.md by @Siddhant-Ray in #210
- [Doc] demo page doc source and installation page restructure by @Siddhant-Ray in #205
- [CI] Fix buildkite errors caused by port conflicts by @ApostaC in #216
- [Core] Fixed Hardcoded Dtype by @qyy2003 in #217
- [Core] Improve performance of `store` with memory pool by @YaoJiayi in #211
- [Core] Adding dst_device support for storage backends by @ApostaC in #214
- [Enhancement] Drop special tokens in kv blending by @XbzOnGit in #209
- [Core]lm_connector with bytearray by @Oasis-Git in #223
- [Doc] Update README.md by @Hanchenli in #229
- [Bugfix] fix disk contention bug by @YaoJiayi in #234
- [Doc] Adding KV cache size calculator to the repo by @ApostaC in #243
- [Misc] Correct URL and add contribution by @zhuohangu in #245
- [Bugfix] Fix cpu buffer memory leak by @YaoJiayi in #247
- [Bugfix] Adding LRUEvictor back by @YaoJiayi in #253
- [Misc] Show cache hit rate. by @Second222None in #255
- [Doc] Update docker and memory pool docs by @Siddhant-Ray in #260
- [Bugfix] Misaligned evictor and mempool by @YaoJiayi in #262
- [Core] refactor(connector): support connector storing tensor directly without serde by @DellCurry in #239
- [Benchmark] Add a multi-round QA/chat performance benchmark by @ApostaC in #258
- [Misc] suppress the warnings when the serving engine cannot keep up with the QPS by @ApostaC in #267
- Bump version number to 0.1.4 by @ApostaC in #268
New Contributors
- @Oasis-Git made their first contribution in #156
- @Hanchenli made their first contribution in #229
- @zhuohangu made their first contribution in #245
- @Second222None made their first contribution in #255
- @DellCurry made their first contribution in #239
Full Changelog: v0.1.3-alpha...v0.1.4-alpha
v0.1.3-alpha
Key Features
- Supporting chunked prefill in vLLM
- Faster KV loading for multi-turn conversation by saving KV at the decoding time
- Experimental KV blending feature to enable reusing non-prefix KV caches
- New model support: llama-3.1 and qwen-2
- Adding documentation (now available at docs.lmcache.ai)
- Better examples in the `examples/` folder
What's Changed
- Using `frombuffer` instead of `load` for deserialization by @Luke20000429 in #102
- initial documentation code by @Siddhant-Ray in #111
- Add python & CUDA requirement to README by @qyy2003 in #115
- Bug fix: avoid overwriting the right value by @KuntaiDu in #120
- Fix issue 104 -- `store` is slow on CPU by @ApostaC in #106
- Add format checker by @KuntaiDu in #123
- Add format checking github workflow file by @KuntaiDu in #126
- update mypy to exclude checking types in test by @KuntaiDu in #127
- Add and organize examples by @XbzOnGit in #128
- Initial user documentation by @Siddhant-Ray in #130
- Update examples and change launch methods by @XbzOnGit in #138
- Add support for llama-3.1-8b-instruct and qwen-2-7b by @YaoJiayi in #143
- Support saving decode KV cache to boost multi-turn conversation by @YaoJiayi in #149
- Update documentation by @Siddhant-Ray in #154
- Fixed bugs in Docs installation by @qyy2003 in #155
- Create CODE_OF_CONDUCT.md by @Siddhant-Ray in #157
- [Doc] : Create SECURITY.md by @Siddhant-Ray in #159
- [Doc] Add contributing guidelines and PR template by @Siddhant-Ray in #160
- Adding a mask for retrieve to avoid repetitive data loading by @YaoJiayi in #153
- [Refactor] Add support for "dtype" (KV cache storage data type) in LMCacheEngineMetadata by @Alex-q-z in #108
- Add Developer Documentation by @qyy2003 in #172
- CacheBlend integration by @ApostaC in #121
- [Core] Add LRU eviction policy by @YaoJiayi in #162
- Add lookup and allow passing a part of KV cache to store by @XbzOnGit in #164
- [Misc] Disabling eviction by using a dummy evictor by @ApostaC in #185
- [Fix] non-blocking store blocks the model inference by @ApostaC in #186
- Bump version number to 0.1.3 by @ApostaC in #187
New Contributors
- @Luke20000429 made their first contribution in #102
- @Siddhant-Ray made their first contribution in #111
- @qyy2003 made their first contribution in #115
- @KuntaiDu made their first contribution in #120
- @XbzOnGit made their first contribution in #128
- @Alex-q-z made their first contribution in #108
Full Changelog: v0.1.2-alpha...v0.1.3-alpha
v0.1.2-alpha
What's Changed
- Integrate with latest vLLM (0.6.1.post2)
- Supporting `pip install lmcache`