# MessagePack for Python

[![Build Status](https://github.com/msgpack/msgpack-python/actions/workflows/wheel.yml/badge.svg)](https://github.com/msgpack/msgpack-python/actions/workflows/wheel.yml)
[![Documentation Status](https://readthedocs.org/projects/msgpack-python/badge/?version=latest)](https://msgpack-python.readthedocs.io/en/latest/?badge=latest)

## What is this?

[MessagePack](https://msgpack.org/) is an efficient binary serialization format.
Like JSON, it lets you exchange data among multiple languages,
but it is faster and smaller.
This package provides CPython bindings for reading and writing MessagePack data.

## Install

```
$ pip install msgpack
```

### Pure Python implementation

The extension module in msgpack (`msgpack._cmsgpack`) does not support PyPy.
For PyPy, msgpack provides a pure Python implementation (`msgpack.fallback`) instead.

### Windows

If you can't use a binary distribution, you need to install Visual Studio
or the Windows SDK on Windows to build the extension module.
Without the extension, msgpack uses the pure Python implementation, which runs slowly on CPython.

## How to use

### One-shot pack & unpack

Use `packb` for packing and `unpackb` for unpacking.
msgpack provides `dumps` and `loads` as aliases for compatibility with
`json` and `pickle`.

`pack` and `dump` pack to a file-like object.
`unpack` and `load` unpack from a file-like object.

```pycon
>>> import msgpack
>>> msgpack.packb([1, 2, 3])
b'\x93\x01\x02\x03'
>>> msgpack.unpackb(_)
[1, 2, 3]
```

Read the docstrings of these functions for the available options.
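
The file-like variants mentioned above can be sketched as follows. This is a minimal illustration that uses an in-memory `io.BytesIO` buffer in place of a real file opened in binary mode; the `record` contents are made up for the example.

```python
import io

import msgpack

record = {"id": 1, "tags": ["a", "b"]}  # illustrative data

# An in-memory buffer stands in for a file opened with mode "wb" / "rb".
stream = io.BytesIO()
msgpack.pack(record, stream)  # serialize and write to the stream

stream.seek(0)  # rewind before reading the data back
restored = msgpack.unpack(stream)
print(restored)
```

`msgpack.dump` and `msgpack.load` can be substituted for `pack` and `unpack` here, since they are aliases.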

### Streaming unpacking

`Unpacker` is a "streaming unpacker". It unpacks multiple objects from one
stream (or from bytes provided through its `feed` method).

```py
import msgpack
from io import BytesIO

# Write 100 packed integers back to back into one buffer.
buf = BytesIO()
for i in range(100):
    buf.write(msgpack.packb(i))

buf.seek(0)

# Iterating over the Unpacker yields one object per packed message.
unpacker = msgpack.Unpacker(buf)
for unpacked in unpacker:
    print(unpacked)
```
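
When the bytes arrive in pieces rather than from a file-like object, the `feed` method can be used instead. The following is a minimal sketch: the two-byte chunk size and the sample messages are arbitrary choices for illustration, standing in for data read from a socket.

```python
import msgpack

# Three messages packed back to back, as they might arrive over a socket.
messages = [[1, 2, 3], {"k": "v"}, "done"]
payload = b"".join(msgpack.packb(obj) for obj in messages)

unpacker = msgpack.Unpacker()  # no file-like object: data comes from feed()
received = []

# Feed small chunks. Iteration stops whenever the buffered bytes end in the
# middle of a message and picks up again after the next feed().
for start in range(0, len(payload), 2):
    unpacker.feed(payload[start:start + 2])
    for obj in unpacker:
        received.append(obj)

print(received)
```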

### Packing/unpacking of custom data types

It is also possible to pack/unpack custom data types. Here is an example for
`datetime.datetime`.

```py
import datetime
import msgpack

useful_dict = {
    "id": 1,
    "created": datetime.datetime.now(),
}

def decode_datetime(obj):
    # Called for every unpacked map; convert the tagged ones back to datetime.
    if '__datetime__' in obj:
        obj = datetime.datetime.strptime(obj["as_str"], "%Y%m%dT%H:%M:%S.%f")
    return obj

def encode_datetime(obj):
    # Called for objects the packer cannot serialize by itself.
    if isinstance(obj, datetime.datetime):
        return {'__datetime__': True, 'as_str': obj.strftime("%Y%m%dT%H:%M:%S.%f")}
    return obj

packed_dict = msgpack.packb(useful_dict, default=encode_datetime)
this_dict_again = msgpack.unpackb(packed_dict, object_hook=decode_datetime)
```

`Unpacker`'s `object_hook` callback receives a dict; the
`object_pairs_hook` callback may instead be used to receive a list of
key-value pairs.
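
A small sketch of the difference between the default dict result and `object_pairs_hook`. The helper name `keep_pairs` is hypothetical; it calls `list()` on its argument so that it behaves the same whether the hook is handed a list or another iterable of pairs.

```python
import msgpack

def keep_pairs(pairs):
    # `pairs` holds (key, value) tuples in the order they appear in the data.
    return list(pairs)

packed = msgpack.packb({"b": 1, "a": 2})

as_dict = msgpack.unpackb(packed)  # default: a plain dict
as_pairs = msgpack.unpackb(packed, object_pairs_hook=keep_pairs)

print(as_dict)
print(as_pairs)
```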

NOTE: msgpack can now encode a datetime with tzinfo into the standard timestamp ext type.
See the `datetime` option in the `Packer` docstring.
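
As a hedged sketch of that option: the example below assumes msgpack 1.0 or later, where `packb` forwards `datetime=True` to `Packer` and where `unpackb` accepts `timestamp=3` to return timezone-aware `datetime` objects in UTC. The `timestamp` option is not described in this README, so check the `Unpacker` docstring of your installed version.

```python
import datetime

import msgpack

# A timezone-aware datetime; naive datetimes are not covered by this option.
created = datetime.datetime(2024, 1, 2, 3, 4, 5, tzinfo=datetime.timezone.utc)

# datetime=True: pack aware datetimes as the standard timestamp ext type.
packed = msgpack.packb({"created": created}, datetime=True)

# timestamp=3 (assumed per the Unpacker docstring): decode timestamps
# back into aware datetime objects in UTC.
restored = msgpack.unpackb(packed, timestamp=3)
print(restored["created"])
```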

### Extended types

It is also possible to pack/unpack custom data types using the **ext** type.

```pycon
>>> import msgpack
>>> import array
>>> def default(obj):
...     if isinstance(obj, array.array) and obj.typecode == 'd':
...         return msgpack.ExtType(42, obj.tobytes())
...     raise TypeError("Unknown type: %r" % (obj,))
...
>>> def ext_hook(code, data):
...     if code == 42:
...         a = array.array('d')
...         a.frombytes(data)
...         return a
...     return msgpack.ExtType(code, data)
...
>>> data = array.array('d', [1.2, 3.4])
>>> packed = msgpack.packb(data, default=default)
>>> unpacked = msgpack.unpackb(packed, ext_hook=ext_hook)
>>> data == unpacked
True
```

### Advanced unpacking control

As an alternative to iteration, `Unpacker` objects provide `unpack`,
`skip`, `read_array_header`, and `read_map_header` methods. The first two
each read an entire message from the stream: `unpack` deserializes and returns
the result, while `skip` ignores it. The latter two methods return the number of elements
in the upcoming container, so that each element in an array, or key-value pair
in a map, can be unpacked or skipped individually.
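
The methods described above can be combined as in the following sketch, which reads an array element by element and steps over one large element without building it. The message layout (`"header"`, a large map, `"footer"`) is invented for the example.

```python
import msgpack

# An array of three elements; the middle one is large and not needed.
packed = msgpack.packb(["header", {"big": list(range(1000))}, "footer"])

unpacker = msgpack.Unpacker()
unpacker.feed(packed)

length = unpacker.read_array_header()  # number of elements in the outer array
first = unpacker.unpack()              # deserialize the first element
unpacker.skip()                        # step over the large map
last = unpacker.unpack()               # deserialize the third element

print(length, first, last)
```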

## Notes

### String and binary types in the old MessagePack spec

Early versions of msgpack didn't distinguish string and binary types.
The type for representing both string and binary types was named **raw**.

You can pack into and unpack from this old spec using `use_bin_type=False`
and `raw=True` options.

```pycon
>>> import msgpack
>>> msgpack.unpackb(msgpack.packb([b'spam', 'eggs'], use_bin_type=False), raw=True)
[b'spam', b'eggs']
>>> msgpack.unpackb(msgpack.packb([b'spam', 'eggs'], use_bin_type=True), raw=False)
[b'spam', 'eggs']
```

### ext type

To use the **ext** type, pass a `msgpack.ExtType` object to the packer.

```pycon
>>> import msgpack
>>> packed = msgpack.packb(msgpack.ExtType(42, b'xyzzy'))
>>> msgpack.unpackb(packed)
ExtType(code=42, data=b'xyzzy')
```

You can use it with `default` and `ext_hook`. See the "Extended types" section above.

### Security

When unpacking data received from an untrusted source, msgpack provides
two security options.

`max_buffer_size` (default: `100*1024*1024`) limits the internal buffer size.
It is also used to limit preallocated list sizes.

`strict_map_key` (default: `True`) limits the type of map keys to bytes and str.
While the MessagePack spec doesn't limit map key types,
accepting arbitrary key types carries a risk of a hash DoS.
If you need to support other types for map keys, use `strict_map_key=False`.
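
A short sketch of how these two options show up in practice. It assumes msgpack 1.0 or later, where a disallowed map key raises `ValueError` and an overfull buffer raises `msgpack.exceptions.BufferFull`; the 16-byte limit and the variable names are chosen only for illustration.

```python
import msgpack
from msgpack.exceptions import BufferFull

packed = msgpack.packb({1: "one"})  # a map with an int key

# With the default strict_map_key=True, the int key is rejected.
try:
    msgpack.unpackb(packed)
    rejected = False
except ValueError:
    rejected = True

# Opting out accepts the key; only do this for data you trust.
relaxed = msgpack.unpackb(packed, strict_map_key=False)

# A deliberately tiny buffer limit: feeding more than it allows fails fast.
small = msgpack.Unpacker(max_buffer_size=16)
try:
    small.feed(b"\x00" * 64)
    overflowed = False
except BufferFull:
    overflowed = True

print(rejected, relaxed, overflowed)
```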

### Performance tips

CPython's GC starts when the number of allocated objects grows.
This means unpacking, which allocates many objects, may trigger unnecessary GC.
You can use `gc.disable()` when unpacking a large message.

A list is the default sequence type in Python.
However, a tuple is lighter than a list.
You can use `use_list=False` while unpacking when performance is important.
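
Both tips can be combined as in this sketch. The payload is synthetic, and restoring the previous GC state in a `finally` block is a design choice of the example rather than something msgpack requires.

```python
import gc

import msgpack

# A synthetic message with many small arrays.
packed = msgpack.packb([[i, str(i)] for i in range(10_000)])

gc_was_enabled = gc.isenabled()
gc.disable()  # pause the cyclic collector while many objects are created
try:
    # use_list=False unpacks MessagePack arrays as tuples instead of lists.
    rows = msgpack.unpackb(packed, use_list=False)
finally:
    if gc_was_enabled:
        gc.enable()  # restore the previous GC state

print(type(rows), len(rows))
```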

## Major breaking changes in the history

### msgpack 0.5

The package name on PyPI was changed from `msgpack-python` to `msgpack` in 0.5.

When upgrading from msgpack-0.4 or earlier, do `pip uninstall msgpack-python` before
`pip install -U msgpack`.

### msgpack 1.0

* Python 2 support

  * The extension module no longer supports Python 2.
    The pure Python implementation (`msgpack.fallback`) is used for Python 2.

  * msgpack 1.0.6 drops official support of Python 2.7, as pip and
    GitHub Action "setup-python" no longer support Python 2.7.

* Packer

  * Packer uses `use_bin_type=True` by default.
    Bytes are encoded in the bin type in MessagePack.
  * The `encoding` option is removed. UTF-8 is always used.

* Unpacker

  * Unpacker uses `raw=False` by default. It assumes str values are valid UTF-8 strings
    and decodes them to Python str (Unicode) objects.
  * The `encoding` option is removed. You can use `raw=True` to support the old format (e.g. unpack into bytes, not str).
  * The default value of `max_buffer_size` is changed from 0 to 100 MiB to avoid DoS attacks.
    You need to pass `max_buffer_size=0` if you have large but safe data.
  * The default value of `strict_map_key` is changed to True to avoid hash DoS.
    You need to pass `strict_map_key=False` if you have data that contain map keys
    whose type is neither bytes nor str.