# bytes
# contents
* [related file](#related-file)
* [memory layout](#memory-layout)
* [example](#example)
    * [empty bytes](#empty-bytes)
    * [ascii characters](#ascii-characters)
    * [nonascii characters](#nonascii-characters)
* [summary](#summary)
    * [ob_shash](#ob_shash)
    * [ob_size](#ob_size)
    * [summary](#summary)
# related file
* cpython/Objects/bytesobject.c
* cpython/Include/bytesobject.h
* cpython/Objects/clinic/bytesobject.c.h
# memory layout

The memory layout of **PyBytesObject** resembles the [memory layout of tuple object](https://github.com/zpoint/CPython-Internals/blob/master/BasicObject/tuple/tuple.md#memory-layout) and the [memory layout of int object](https://github.com/zpoint/CPython-Internals/blob/master/BasicObject/long/long.md#memory-layout), but it is simpler than either of them.
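
As a rough sketch of that layout, the snippet below quotes the struct from CPython's `bytesobject.h` in a comment and uses `sys.getsizeof` to show that a bytes object costs a fixed header plus one byte per element. It assumes a CPython interpreter. The variable names are illustrative, and the absolute header size depends on the platform and build, so only relative sizes are compared.

```python
import sys

# Struct as declared in CPython's bytesobject.h:
#   typedef struct {
#       PyObject_VAR_HEAD      /* ob_refcnt, ob_type, ob_size */
#       Py_hash_t ob_shash;    /* cached hash, -1 until computed */
#       char ob_sval[1];       /* the bytes, plus a trailing '\0' */
#   } PyBytesObject;

header_size = sys.getsizeof(b"")   # fixed part, including the trailing '\0'
payload_cost = sys.getsizeof(b"abcdefg123") - header_size

print("fixed overhead:", header_size)
print("extra bytes for a 10-byte payload:", payload_cost)
print("tp_basicsize / tp_itemsize:", bytes.__basicsize__, bytes.__itemsize__)
```

The fixed overhead equals `bytes.__basicsize__`, and each additional payload byte adds exactly `bytes.__itemsize__` (one byte).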
# example
## empty bytes
A **bytes** object is immutable: whenever you need to modify one, you have to create a new object, which keeps the implementation simple.
```python3
s = b""
```
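
A small sketch of what immutability means in practice: in-place modification is rejected, and "changing" a bytes value always produces another object. The variable names here are illustrative only.

```python
s = b""

try:
    s[0:0] = b"x"        # bytes supports neither item nor slice assignment
except TypeError as exc:
    error = exc
else:
    error = None

t = s + b"x"             # concatenation yields a separate object; s is untouched

print(type(error).__name__)
print(s, t)
```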

## ascii characters
Let's initialize a bytes object with ASCII characters:
```python3
s = b"abcdefg123"
```
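
To see how these characters sit in memory, the hedged sketch below reads the raw buffer with `ctypes`. It relies on two CPython-specific assumptions: `id()` returns the object's address, and `ob_sval` starts at `bytes.__basicsize__ - 1` (the basic size counts the one-byte `ob_sval[1]` placeholder). It is for inspection only and should not be used in production code.

```python
import ctypes

s = b"abcdefg123"

# Offset of ob_sval inside PyBytesObject (CPython-specific assumption).
sval_offset = bytes.__basicsize__ - 1

# Read len(s) + 1 bytes so the trailing null terminator is included.
raw = ctypes.string_at(id(s) + sval_offset, len(s) + 1)

print(raw)
print(list(s[:3]))   # indexing or iterating bytes yields integers
```

The extra byte at the end of `raw` is the `'\0'` terminator that CPython appends after the payload.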

## nonascii characters
```python3
s = "我是帅哥".encode("utf8")
```
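
For non-ASCII text, the bytes object stores the encoded bytes rather than the characters. As a brief illustration (variable names are illustrative), each of the four characters above takes three bytes in UTF-8, so the resulting object holds 12 bytes:

```python
text = "我是帅哥"
s = text.encode("utf8")

char_count = len(text)                    # number of Unicode code points
byte_count = len(s)                       # number of stored bytes
all_high = all(b >= 0x80 for b in s)      # multi-byte UTF-8 sequences set the high bit

print(char_count, byte_count, all_high)
print(s.decode("utf8") == text)           # decoding restores the original text
```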

# summary
## ob_shash
The **ob_shash** field stores the hash value of the bytes object; the value **-1** means it has not been computed yet.
The first time the hash value is computed, it is cached in the **ob_shash** field.
The cached value avoids recalculation and speeds up dictionary lookups.
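
The caching can be observed with a hedged `ctypes` sketch. It assumes CPython, where `id()` is the object's address, and assumes that `ob_shash` sits immediately before `ob_sval`, as in the struct declaration. `peek_ob_shash` is a hypothetical helper written for this example, not a CPython API. The bytes object is built at runtime because a literal may already have been hashed while the code was compiled.

```python
import ctypes

HASH_WIDTH = ctypes.sizeof(ctypes.c_ssize_t)         # Py_hash_t is as wide as Py_ssize_t
SHASH_OFFSET = bytes.__basicsize__ - 1 - HASH_WIDTH  # ob_shash sits right before ob_sval

def peek_ob_shash(obj: bytes) -> int:
    """Read the raw ob_shash field (hypothetical helper, CPython only)."""
    return ctypes.c_ssize_t.from_address(id(obj) + SHASH_OFFSET).value

fresh = bytes(range(65, 75))   # built at runtime, so nothing has hashed it yet
before = peek_ob_shash(fresh)
computed = hash(fresh)         # the first call computes the hash and caches it
after = peek_ob_shash(fresh)

print(before, after == computed)
```

Under these assumptions, `before` holds the "not computed" marker and `after` matches the value returned by `hash()`.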
## ob_size
The **ob_size** field is part of every **PyVarObject**. **PyBytesObject** uses it to store the length of the data, which keeps **len()** an O(1) operation and records the true size of binary data that may contain embedded null bytes.
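
The sketch below illustrates why an explicit length is needed. It assumes CPython (where `id()` is the object's address) and the field order from the struct declaration, with `ob_size` directly before `ob_shash`. The offsets are derived from `bytes.__basicsize__` and are assumptions of this example, not a public API.

```python
import ctypes

WORD = ctypes.sizeof(ctypes.c_ssize_t)
SVAL_OFFSET = bytes.__basicsize__ - 1     # start of ob_sval
SIZE_OFFSET = SVAL_OFFSET - 2 * WORD      # ob_size sits before ob_shash

s = b"ab\x00cd"                           # payload with an embedded null byte

stored_size = ctypes.c_ssize_t.from_address(id(s) + SIZE_OFFSET).value
c_view = ctypes.string_at(id(s) + SVAL_OFFSET)   # stops at the first '\0', like strlen

print(len(s), stored_size, c_view)
```

A C-style scan stops at the embedded null and sees only `ab`, while `ob_size` and `len()` both report all five bytes.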
## summary
The **PyBytesObject** is a Python wrapper around a C-style null-terminated string, with **ob_shash** caching the hash value and **ob_size** storing the length of the data.
The implementation of **PyBytesObject** resembles the **embstr** encoding in Redis:
```shell script
redis-cli
127.0.0.1:6379> set a "hello"
OK
127.0.0.1:6379> object encoding a
"embstr"
```