# int
Thanks to @MambaWong for pointing out the errors in this article ([#22](https://github.com/zpoint/CPython-Internals/issues/22))
# contents
* [related file](#related-file)
* [memory layout](#memory-layout)
* [how elements are stored inside](#how-is-element-stored-inside)
* [integer 0](#ingeter-0)
* [integer 1](#ingeter-1)
* [integer -1](#ingeter--1)
* [integer 1023](#ingeter-1023)
* [integer 32767](#ingeter-32767)
* [integer 32768](#ingeter-32768)
* [little endian and big endian](#little-endian-and-big-endian)
* [reserved bit](#reserved-bit)
* [small ints](#small-ints)
# related file
* cpython/Objects/longobject.c
* cpython/Include/longobject.h
* cpython/Include/longintrepr.h
# memory layout

In Python 3 there is only one integer type, **int**: the **long** type of Python 2.x became the **int** type of Python 3.x.
The structure of a **long object** looks like that of a [tuple object](https://github.com/zpoint/CPython-Internals/blob/master/BasicObject/tuple/tuple.md#memory-layout). Besides the `PyObject_VAR_HEAD` header (which includes **ob_size**), there is only one field that stores the actual **int** value, **ob_digit**.
But how does CPython represent a variable-size **int** at the byte level? Let's see.
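One way to observe the variable size from Python is `sys.getsizeof`. The following is a minimal sketch, assuming a CPython build (other implementations may report sizes differently). It picks one value that needs one **digit**, one that needs two, and one that needs three, then checks that every extra digit adds exactly `sys.int_info.sizeof_digit` bytes:

```python
import sys

bits = sys.int_info.bits_per_digit   # PyLong_SHIFT of this build (15 or 30)
size = sys.int_info.sizeof_digit     # sizeof(digit) in bytes (2 or 4)

# 1 needs one digit, 1 << bits needs two, 1 << (2 * bits) needs three
values = (1, 1 << bits, 1 << (2 * bits))
sizes = [sys.getsizeof(v) for v in values]
print(bits, size, sizes)

# each additional digit grows the object by exactly sizeof(digit) bytes
print(sizes[1] - sizes[0] == size, sizes[2] - sizes[1] == size)
```

The absolute numbers differ between CPython versions because the header changed over time. The step between them is what reflects the **ob_digit** array.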
# how is element stored inside
## ingeter 0
Notice that when the value is 0, the **ob_digit** field doesn't store anything. The value 0 in **ob_size** indicates that the **long object** represents the integer 0.
```python3
i = 0
```
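The rule for 0, and the sign handling described below, can be modelled in a few lines of Python. `to_ob_digit` is a hypothetical helper written for illustration. It mimics the layout with 15-bit digits (least significant first) and does not read CPython's memory:

```python
from __future__ import annotations


def to_ob_digit(n: int, shift: int = 15) -> tuple[int, list[int]]:
    """Model (ob_size, ob_digit) for n; hypothetical helper, not a CPython API."""
    mask = (1 << shift) - 1              # plays the role of PyLong_MASK
    magnitude = abs(n)
    digits = []
    while magnitude:                     # 0 produces no digits at all
        digits.append(magnitude & mask)  # least significant digit first
        magnitude >>= shift
    ob_size = len(digits) if n >= 0 else -len(digits)  # the sign lives in ob_size
    return ob_size, digits


print(to_ob_digit(0))   # (0, [])
print(to_ob_digit(1))   # (1, [1])
print(to_ob_digit(-1))  # (-1, [1])
```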

## ingeter 1
There are two possible types for the elements of **ob_digit**, depending on how your CPython was configured (the value of **PYLONG_BITS_IN_DIGIT**).
```c
#if PYLONG_BITS_IN_DIGIT == 30
typedef uint32_t digit;
typedef int32_t sdigit;
typedef uint64_t twodigits;
typedef int64_t stwodigits; /* signed variant of twodigits */
#define PyLong_SHIFT 30
#define _PyLong_DECIMAL_SHIFT 9 /* max(e such that 10**e fits in a digit) */
#define _PyLong_DECIMAL_BASE ((digit)1000000000) /* 10 ** DECIMAL_SHIFT */
#elif PYLONG_BITS_IN_DIGIT == 15
typedef unsigned short digit;
typedef short sdigit; /* signed variant of digit */
typedef unsigned long twodigits;
typedef long stwodigits; /* signed variant of twodigits */
#define PyLong_SHIFT 15
#define _PyLong_DECIMAL_SHIFT 4 /* max(e such that 10**e fits in a digit) */
#define _PyLong_DECIMAL_BASE ((digit)10000) /* 10 ** DECIMAL_SHIFT */
#else
#error "PYLONG_BITS_IN_DIGIT should be 15 or 30"
#endif
```
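You don't need to read the headers to find out which branch of the `#if` above your interpreter was built with, because `sys.int_info` reports it. A stock 64-bit CPython build typically reports 30 bits per digit and a 4-byte digit:

```python
import sys

info = sys.int_info
# bits_per_digit corresponds to PYLONG_BITS_IN_DIGIT / PyLong_SHIFT
# sizeof_digit is sizeof(digit): 4 for uint32_t, 2 for unsigned short
print(info.bits_per_digit, info.sizeof_digit)
```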
I've modified the source code to set **PYLONG_BITS_IN_DIGIT** to 15, so every **digit** is an **unsigned short**.
To represent the integer 1, **ob_size** becomes 1 and the single element of **ob_digit** holds the value 1 as an **unsigned short**.
```python3
i = 1
```
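To see what that single **unsigned short** digit looks like in memory, `struct` can pack the value 1 into the same 2-byte format. This only mimics the digit and does not read the object itself:

```python
import struct
import sys

# A 15-bit build stores each digit as an unsigned short: struct format "H", 2 bytes.
# "=" means native byte order with standard sizes, i.e. how the digit sits in memory.
digit_bytes = struct.pack("=H", 1)
print(sys.byteorder, digit_bytes)

little = struct.pack("<H", 1)   # forced little endian: low byte first
big = struct.pack(">H", 1)      # forced big endian: high byte first
print(little, big)              # b'\x01\x00' b'\x00\x01'
```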

## ingeter -1
When i becomes -1, the only difference from the integer 1 is the value of the **ob_size** field, which becomes -1. CPython interprets **ob_size** as a signed value to distinguish positive from negative numbers, and its absolute value is the number of **digit**s in use.
```python3
i = -1
```
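Going the other way, the value can be rebuilt from the two fields: the sign comes from **ob_size** and the magnitude from the digits. `from_ob_digit` is a hypothetical helper for illustration, assuming 15-bit digits stored least significant first:

```python
def from_ob_digit(ob_size: int, digits: list, shift: int = 15) -> int:
    """Rebuild an int from a modelled (ob_size, ob_digit) pair; hypothetical helper."""
    if abs(ob_size) != len(digits):
        raise ValueError("abs(ob_size) must equal the number of digits")
    magnitude = 0
    for d in reversed(digits):           # the most significant digit is the last one
        magnitude = (magnitude << shift) | d
    return -magnitude if ob_size < 0 else magnitude


print(from_ob_digit(1, [1]))    # 1
print(from_ob_digit(-1, [1]))   # -1
print(from_ob_digit(0, []))     # 0
```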

## ingeter 1023
The basic unit is the type **digit**, which provides 2 bytes (16 bits) of storage. 1023 takes only the rightmost 10 bits,
so the value of the **ob_size** field is still 1.

## ingeter 32767
The integer 32767 (2^15 - 1) is represented in the same way: it fills all 15 usable bits of a single digit, so **ob_size** is still 1.
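Both 1023 and 32767 fit in one 15-bit digit. A small sketch, using a hypothetical helper `digit_count`, computes abs(**ob_size**) from the bit length under the same 15-bit assumption:

```python
def digit_count(n: int, shift: int = 15) -> int:
    """Number of digits, i.e. abs(ob_size), needed for n; hypothetical helper."""
    bits = abs(n).bit_length()
    return -(-bits // shift)            # ceiling division; 0 needs 0 digits


print((1023).bit_length(), digit_count(1023))     # 10 1
print((32767).bit_length(), digit_count(32767))   # 15 1
print((32768).bit_length(), digit_count(32768))   # 16 2
```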

## ingeter 32768
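CPython doesn't use all 16 bits of a **digit**: the most significant bit of every **digit** is reserved (we will see why later). So 32768 (2^15) no longer fits in one digit. It is stored as ob_digit[0] = 0 and ob_digit[1] = 1, and **ob_size** becomes 2.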
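The split can be reproduced with a mask and a shift. `SHIFT` and `MASK` below stand in for **PyLong_SHIFT** and **PyLong_MASK** of the modified 15-bit build. This is a sketch of the arithmetic, not CPython code:

```python
SHIFT = 15                    # PyLong_SHIFT in the modified 15-bit build
MASK = (1 << SHIFT) - 1       # PyLong_MASK == 0x7FFF, the reserved top bit is excluded

n = 32768                     # 0b1000_0000_0000_0000 needs 16 bits
low = n & MASK                # ob_digit[0]
high = n >> SHIFT             # ob_digit[1]
print(hex(MASK), low, high)   # 0x7fff 0 1
```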

## little endian and big endian
Notice that, because the **digit** is the smallest unit at CPython's level of abstraction, the order of the bytes inside a single **digit** is the same as your machine's byte order (mine is little endian).
The **digit**s in the **ob_digit** array are stored least significant first: ob_digit[0] holds the lowest 15 bits and the most significant digit is the rightmost one, which is little-endian order at the **digit** level.
The integer value -262143 makes this clearer.
The minus sign is stored in the **ob_size** field.
The integer 262143 (2^18 - 1, since 2^18 = 262144) is 00000011 11111111 11111111 in binary. Split into 15-bit digits, it becomes ob_digit[0] = 111111111111111 (32767) and ob_digit[1] = 111 (7), so **ob_size** is -2.
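Putting both orders together, the following sketch assumes 15-bit digits and a little-endian machine like the author's. It splits -262143 into digits and then lays out each digit's bytes in **ob_digit** order. The byte string only models the array and is not read from a real object:

```python
import struct

SHIFT = 15
MASK = (1 << SHIFT) - 1

n = -262143
magnitude = abs(n)                    # the sign goes to ob_size instead
digits = []
while magnitude:
    digits.append(magnitude & MASK)   # least significant digit first
    magnitude >>= SHIFT
ob_size = -len(digits)
print(ob_size, digits)                # -2 [32767, 7]

# each digit as a little-endian unsigned short, in ob_digit order
raw = b"".join(struct.pack("<H", d) for d in digits)
print(raw.hex(" "))                   # ff 7f 07 00
```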

## reserved bit
Why is the leftmost bit of each **digit** reserved? And why are the **digit**s in the **ob_digit** array stored in little-endian order?
Let's try adding two integers.
```python3
i = 1073741824 - 1 # 1 << 30 == 1073741824
j = 1
```

```python3
k = i + j
```
First, allocate a temporary **PyLongObject**, called temp, with size = max(size(i), size(j)) + 1 (CPython does this in `x_add` in longobject.c)

step1, add the first **digit** of each **ob_digit** array and store the sum in a variable named **carry**

step2, set temp[0] to (carry & PyLong_MASK)

step3, right shift **carry** by **PyLong_SHIFT** bits, so that only the overflow bit is left

step4, add the second **digit** of each **ob_digit** array to **carry**

step5, set temp[1] to (carry & PyLong_MASK)

step6, right shift **carry** by **PyLong_SHIFT** bits again

Go back to step4 and repeat until no **digit** is left, then store the final **carry** in the last element of temp
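The steps above can be sketched in Python. This is a hypothetical model of CPython's `x_add` that works on plain lists of 15-bit digits (least significant first) instead of a real PyLongObject. Like `x_add`, it only adds magnitudes and leaves the sign out:

```python
SHIFT = 15
MASK = (1 << SHIFT) - 1


def x_add(a: list, b: list) -> list:
    """Add two magnitudes stored as little-endian digit lists (model of x_add)."""
    if len(a) < len(b):
        a, b = b, a                           # make a the longer operand
    temp = [0] * (len(a) + 1)                 # size = max(size(i), size(j)) + 1
    carry = 0
    for k in range(len(a)):
        # carry never exceeds 2 * MASK + 1, which still fits in SHIFT + 1 bits
        carry += a[k] + (b[k] if k < len(b) else 0)   # step1 / step4
        temp[k] = carry & MASK                         # step2 / step5
        carry >>= SHIFT                                # step3 / step6
    temp[len(a)] = carry                      # the final carry goes to the last digit
    while temp and temp[-1] == 0:             # normalize: drop leading zero digits
        temp.pop()
    return temp


i_digits = [MASK, MASK]    # 1073741824 - 1 == 2**30 - 1
j_digits = [1]
print(x_add(i_digits, j_digits))   # [0, 0, 1], i.e. 1 << 30
```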

Now temp contains the sum. You can see that the reserved bit holds the **carry** or **borrow** when adding or subtracting integers: the sum of two digits plus a carry never exceeds 2 * PyLong_MASK + 1, which still fits in a **digit**. The **digit**s in the **ob_digit** array are stored in little-endian order so that addition and subtraction can process them from left to right (from index 0 upward), which is the direction in which the carry propagates.
Subtraction (`x_sub`) works in a similar way to addition, so you can read the source code directly.
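For comparison, here is a similar sketch of subtracting magnitudes, modelled after `x_sub`. It shows the reserved bit acting as a borrow. It assumes |a| >= |b|. CPython's `x_sub` also handles the opposite case by swapping the operands and flipping the sign, which this sketch leaves out:

```python
SHIFT = 15
MASK = (1 << SHIFT) - 1


def x_sub(a: list, b: list) -> list:
    """Compute |a| - |b| for little-endian digit lists, assuming |a| >= |b|."""
    result = []
    borrow = 0
    for k in range(len(a)):
        borrow = a[k] - (b[k] if k < len(b) else 0) - borrow
        result.append(borrow & MASK)      # keep the low SHIFT bits
        borrow = (borrow >> SHIFT) & 1    # 1 when this digit had to borrow from the next
    while result and result[-1] == 0:     # normalize: drop leading zero digits
        result.pop()
    return result


print(x_sub([0, 0, 1], [1]))   # [32767, 32767], i.e. 2**30 - 1
```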

## small ints
CPython also preallocates the frequently used small integers, -5 through 256, in a static array, so every occurrence of such a value refers to the same object
```c
#define NSMALLPOSINTS 257
#define NSMALLNEGINTS 5
static PyLongObject small_ints[NSMALLNEGINTS + NSMALLPOSINTS];
```
Let's see:
```python3
c = 0
d = 0
e = 0
print(id(c), id(d), id(e)) # 4480940400 4480940400 4480940400
a = -5
b = -5
print(id(a), id(b)) # 4480940240 4480940240
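# typed line by line in the interactive interpreter; in a script the
# compiler may merge equal constants, so f and g could share one object
f = 1313131313131313
g = 1313131313131313
print(id(f), id(g)) # 4484604176 4484604016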
```
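To probe the cache range without relying on constants that the compiler might merge, the following sketch builds each value twice at run time and checks whether both results are the same object. It assumes CPython. The result is an implementation detail, and newer versions could cache a wider range:

```python
def is_cached(n: int) -> bool:
    # n + 0 is computed at run time twice; values from the small-int
    # array come back as the same preallocated object, others as new objects
    return (n + 0) is (n + 0)


cached = [n for n in range(-20, 300) if is_cached(n)]
print(cached[0], cached[-1], len(cached))
```

With `NSMALLNEGINTS = 5` and `NSMALLPOSINTS = 257` as shown above, the cached values are -5 through 256.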
