# MessagePack for Python

[Build status](https://github.com/msgpack/msgpack-python/actions/workflows/wheel.yml)
[Documentation status](https://msgpack-python.readthedocs.io/en/latest/?badge=latest)
[pip trends](https://piptrends.com/package/msgpack)

## What's this

[MessagePack](https://msgpack.org/) is an efficient binary serialization format.
Like JSON, it lets you exchange data among multiple languages,
but it's faster and smaller.
This package provides CPython bindings for reading and writing MessagePack data.

## Install
```
$ pip install msgpack
```
### Pure Python implementation

The extension module in msgpack (`msgpack._cmsgpack`) does not support PyPy.
But msgpack provides a pure Python implementation (`msgpack.fallback`) for PyPy.
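
To see which implementation you ended up with, a minimal sketch can inspect `msgpack.Packer.__module__`. This assumes, as in current releases, that the top-level `Packer` is re-exported from whichever backend was imported, so the module name is expected to be `msgpack._cmsgpack` (C extension) or `msgpack.fallback` (pure Python). The fallback module can also be imported and used directly.

```py
import msgpack
from msgpack import fallback

# msgpack.Packer comes from the backend chosen at import time; its __module__
# is expected to be "msgpack._cmsgpack" or "msgpack.fallback".
backend = msgpack.Packer.__module__
print("backend:", backend)

# The pure Python implementation can also be used explicitly.
packed = fallback.Packer().pack([1, 2, 3])
print(fallback.unpackb(packed))  # [1, 2, 3]
```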
### Windows

When you can't use a binary distribution, you need to install Visual Studio
or the Windows SDK to build the extension module on Windows.
Without the extension, msgpack uses the pure Python implementation, which runs slowly on CPython.

## How to use
### One-shot pack & unpack

Use `packb` for packing and `unpackb` for unpacking.
msgpack provides `dumps` and `loads` as aliases for compatibility with
`json` and `pickle`.

`pack` and `dump` pack to a file-like object.
`unpack` and `load` unpack from a file-like object.

```pycon
>>> import msgpack
>>> msgpack.packb([1, 2, 3])
b'\x93\x01\x02\x03'
>>> msgpack.unpackb(_)
[1, 2, 3]
```
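
The file-like variants work the same way. A minimal sketch, using an in-memory `io.BytesIO` in place of a real file opened in binary mode:

```py
import io
import msgpack

# BytesIO stands in for a file opened with "wb" / "rb".
stream = io.BytesIO()
msgpack.pack({"key": "value"}, stream)  # msgpack.dump is an alias
stream.seek(0)
obj = msgpack.unpack(stream)            # msgpack.load is an alias
print(obj)  # {'key': 'value'}
```

`unpack` reads the whole stream and decodes a single object; to read several objects from one stream, use `Unpacker` instead.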

Read the docstrings for the available options.

### Streaming unpacking
`Unpacker` is a "streaming unpacker". It unpacks multiple objects from one
stream (or from bytes provided through its `feed` method).
```py
import msgpack
from io import BytesIO

buf = BytesIO()
for i in range(100):
    buf.write(msgpack.packb(i))

buf.seek(0)

unpacker = msgpack.Unpacker(buf)
for unpacked in unpacker:
    print(unpacked)
```
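
When bytes arrive piecemeal (for example from a socket), you can create an `Unpacker` without a stream and push data into it with `feed`. In this sketch the chunking is simulated by slicing one buffer into 2-byte pieces; iterating yields every object that is complete so far and stops, keeping any partial object buffered until more data is fed.

```py
import msgpack

unpacker = msgpack.Unpacker()  # no file-like object: data comes from feed()

packed = b"".join(msgpack.packb(i) for i in [1, 2, 300])
results = []
# Simulate data arriving in small, arbitrary chunks.
for start in range(0, len(packed), 2):
    unpacker.feed(packed[start:start + 2])
    # Yields only the objects that are fully available after this chunk.
    for obj in unpacker:
        results.append(obj)

print(results)  # [1, 2, 300]
```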
### Packing/unpacking of custom data type

It is also possible to pack/unpack custom data types. Here is an example for
`datetime.datetime`.

```py
import datetime
import msgpack

useful_dict = {
    "id": 1,
    "created": datetime.datetime.now(),
}

def decode_datetime(obj):
    if '__datetime__' in obj:
        obj = datetime.datetime.strptime(obj["as_str"], "%Y%m%dT%H:%M:%S.%f")
    return obj

def encode_datetime(obj):
    if isinstance(obj, datetime.datetime):
        return {'__datetime__': True, 'as_str': obj.strftime("%Y%m%dT%H:%M:%S.%f")}
    return obj

packed_dict = msgpack.packb(useful_dict, default=encode_datetime)
this_dict_again = msgpack.unpackb(packed_dict, object_hook=decode_datetime)
```

`Unpacker`'s `object_hook` callback receives a dict; the
`object_pairs_hook` callback may instead be used to receive a list of
key-value pairs.
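
A small sketch of the difference: `object_hook` gets each decoded map as a dict, while `object_pairs_hook` gets its key-value pairs in the order they appear in the data. Passing `list` as the pairs hook is a choice made here so the sketch does not depend on exactly which iterable the backend hands over.

```py
import msgpack

packed = msgpack.packb({"b": 1, "a": 2})

# object_hook: receives a dict; here we return its sorted keys to show what arrived.
keys = msgpack.unpackb(packed, object_hook=lambda d: sorted(d))
# object_pairs_hook: receives the (key, value) pairs in wire order.
pairs = msgpack.unpackb(packed, object_pairs_hook=list)
print(keys)   # ['a', 'b']
print(pairs)  # [('b', 1), ('a', 2)]
```

Because the pairs are not collapsed into a dict, `object_pairs_hook` is the place to preserve order or detect duplicate keys.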
NOTE: msgpack can also encode a `datetime` with `tzinfo` into the standard Timestamp ext type.
See the `datetime` option in the `Packer` docstring.
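
A sketch of that standard route, using the `datetime` option when packing and the `timestamp` option when unpacking (`timestamp=3` requests `datetime.datetime` objects in UTC; the default returns `msgpack.Timestamp`). Only timezone-aware datetimes are converted this way.

```py
import datetime
import msgpack

dt = datetime.datetime(2024, 1, 2, 3, 4, 5, tzinfo=datetime.timezone.utc)

# datetime=True: aware datetimes are packed as the Timestamp ext type (code -1).
packed = msgpack.packb(dt, datetime=True)
# timestamp=3: Timestamp ext values are unpacked into aware datetime objects.
restored = msgpack.unpackb(packed, timestamp=3)
print(restored == dt)  # True
```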
### Extended types
It is also possible to pack/unpack custom data types using the **ext** type.
```pycon
>>> import msgpack
>>> import array
>>> def default(obj):
...     if isinstance(obj, array.array) and obj.typecode == 'd':
...         return msgpack.ExtType(42, obj.tobytes())
...     raise TypeError("Unknown type: %r" % (obj,))
...
>>> def ext_hook(code, data):
...     if code == 42:
...         a = array.array('d')
...         a.frombytes(data)
...         return a
...     return msgpack.ExtType(code, data)
...
>>> data = array.array('d', [1.2, 3.4])
>>> packed = msgpack.packb(data, default=default)
>>> unpacked = msgpack.unpackb(packed, ext_hook=ext_hook)
>>> data == unpacked
True
```
### Advanced unpacking control

As an alternative to iteration, `Unpacker` objects provide `unpack`,
`skip`, `read_array_header` and `read_map_header` methods. The former two
read an entire message from the stream, respectively deserializing and returning
the result, or ignoring it. The latter two methods return the number of elements
in the upcoming container, so that each element in an array, or key-value pair
in a map, can be unpacked or skipped individually.
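
A minimal sketch of these methods on a fed buffer holding two messages: an array whose middle element is skipped, followed by a string.

```py
import msgpack

unpacker = msgpack.Unpacker()
unpacker.feed(msgpack.packb([1, {"skip": "me"}, 3]) + msgpack.packb("done"))

n = unpacker.read_array_header()  # number of elements in the upcoming array
first = unpacker.unpack()         # element 0, deserialized
unpacker.skip()                   # element 1 (a whole map) is ignored
third = unpacker.unpack()         # element 2
trailer = unpacker.unpack()       # the next top-level message
print(n, first, third, trailer)   # 3 1 3 done
```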
## Notes
### string and binary type in old msgpack spec
Early versions of msgpack didn't distinguish string and binary types.
The type for representing both string and binary types was named **raw**.

You can pack into and unpack from this old spec using the `use_bin_type=False`
and `raw=True` options.
```pycon
>>> import msgpack
>>> msgpack.unpackb(msgpack.packb([b'spam', 'eggs'], use_bin_type=False), raw=True)
[b'spam', b'eggs']
>>> msgpack.unpackb(msgpack.packb([b'spam', 'eggs'], use_bin_type=True), raw=False)
[b'spam', 'eggs']
```
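
The difference shows up in the header byte. For a short `bytes` value, the old spec writes a str-family fixstr header (`0xa0 | length`), while the bin type writes a `0xc4` marker followed by a length byte. Because old-style data carries a str header, unpacking it with the default `raw=False` yields a Python `str`.

```py
import msgpack

old = msgpack.packb(b"spam", use_bin_type=False)  # fixstr header 0xa4
new = msgpack.packb(b"spam", use_bin_type=True)   # bin 8 header 0xc4 + length
print(old)  # b'\xa4spam'
print(new)  # b'\xc4\x04spam'

# With the default raw=False, the str header is decoded as UTF-8 text.
print(repr(msgpack.unpackb(old)))  # 'spam'
print(repr(msgpack.unpackb(new)))  # b'spam'
```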
### ext type

To use the **ext** type, pass a `msgpack.ExtType` object to the packer.

```pycon
>>> import msgpack
>>> packed = msgpack.packb(msgpack.ExtType(42, b'xyzzy'))
>>> msgpack.unpackb(packed)
ExtType(code=42, data=b'xyzzy')
```

You can use it with `default` and `ext_hook`, as shown in the "Extended types" section above.

### Security

For unpacking data received from an unreliable source, msgpack provides
two security options.

`max_buffer_size` (default: `100*1024*1024`) limits the internal buffer size.
It is also used to limit the size of preallocated lists.

`strict_map_key` (default: `True`) limits the type of map keys to bytes and str.
While the msgpack spec doesn't limit the types of map keys,
allowing arbitrary key types creates a risk of hashdos (hash-flooding denial-of-service) attacks.
If you need to support other types for map keys, use `strict_map_key=False`.
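
A sketch of both options in action. The tiny `max_buffer_size=16` is chosen only to trigger the limit quickly; `BufferFull` is imported from `msgpack.exceptions`.

```py
import msgpack
from msgpack.exceptions import BufferFull

packed = msgpack.packb({1: "one"})  # a map with an int key

# Default strict_map_key=True: keys other than str/bytes raise ValueError.
try:
    msgpack.unpackb(packed)
    rejected = False
except ValueError:
    rejected = True
print("rejected:", rejected)  # rejected: True

# Opt out only when the data source is trusted.
relaxed = msgpack.unpackb(packed, strict_map_key=False)
print(relaxed)  # {1: 'one'}

# Feeding more data than max_buffer_size allows raises BufferFull.
unpacker = msgpack.Unpacker(max_buffer_size=16)
try:
    unpacker.feed(b"\x00" * 32)
    overflowed = False
except BufferFull:
    overflowed = True
print("buffer full:", overflowed)  # buffer full: True
```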
### Performance tips

CPython's GC is triggered as the number of allocated objects grows,
so unpacking a large message may cause needless GC runs.
You can use `gc.disable()` while unpacking a large message.

List is the default sequence type in Python,
but a tuple is lighter than a list.
You can use `use_list=False` while unpacking when performance is important.
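
Both tips can be combined in a small helper. `unpack_fast` is a hypothetical name used only for this sketch; it pauses the GC for the duration of one `unpackb` call and restores the previous GC state afterwards.

```py
import gc
import msgpack

def unpack_fast(data):
    """Hypothetical helper: unpack with GC paused and tuples instead of lists."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        return msgpack.unpackb(data, use_list=False)
    finally:
        # Restore the previous GC state even if unpacking fails.
        if was_enabled:
            gc.enable()

result = unpack_fast(msgpack.packb([1, [2, 3]]))
print(result)  # (1, (2, 3))
```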
## Major breaking changes in the history

### msgpack 0.5

The package name on PyPI was changed from `msgpack-python` to `msgpack` in 0.5.

When upgrading from msgpack-0.4 or earlier, do `pip uninstall msgpack-python` before
`pip install -U msgpack`.

### msgpack 1.0

* Python 2 support
  * The extension module does not support Python 2 anymore.
    The pure Python implementation (`msgpack.fallback`) is used for Python 2.
  * msgpack 1.0.6 drops official support of Python 2.7, as pip and
    GitHub Action (setup-python) no longer support Python 2.7.
* Packer
  * Packer uses `use_bin_type=True` by default.
    Bytes are encoded in bin type in msgpack.
  * The `encoding` option is removed. UTF-8 is always used.
* Unpacker
  * Unpacker uses `raw=False` by default. It assumes str types are valid UTF-8 strings
    and decodes them to Python str (unicode) objects.
  * The `encoding` option is removed. You can use `raw=True` to support the old format (e.g. unpack into bytes, not str).
  * The default value of `max_buffer_size` is changed from 0 to 100 MiB to avoid DoS attacks.
    You need to pass `max_buffer_size=0` if you have large but safe data.
  * The default value of `strict_map_key` is changed to True to avoid hashdos.
    You need to pass `strict_map_key=False` if you have data which contain map keys
    whose type is not bytes or str.
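
For trusted data written the pre-1.0 way, the three Unpacker changes can be reverted together. A sketch, assuming data packed with the old raw type and a non-str map key:

```py
import msgpack

# Data in the old style: str/bytes share the raw type, and the key is an int.
old_data = msgpack.packb({1: "eggs"}, use_bin_type=False)

# Approximate the pre-1.0 Unpacker behaviour (trusted input only):
# keep raw values as bytes, accept any map key, and lift the 100 MiB limit.
unpacker = msgpack.Unpacker(raw=True, strict_map_key=False, max_buffer_size=0)
unpacker.feed(old_data)
obj = unpacker.unpack()
print(obj)  # {1: b'eggs'}
```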