zpoint / cpython-internals

Dive into CPython internals, trying to illustrate every detail of the CPython implementation

Languages: Python 25.30%, C 33.88%, Shell 9.67%, C++ 31.15%
Topics: python3, cpython, python, c, learning-material, cpython-internals, interpreter

cpython-internals's Introduction

Hi there 👋

  • 💼 I am a back-end engineer
  • 🤔 4+ years of development experience with Python/C/C++, 1+ year with Go
  • ⚡ Bachelor's degree in Chemistry; taught myself CS at university

cpython-internals's People

Contributors

antoniasymeonidou, birdi7, cherrymelon, evoxtorm, hjlarry, jenix21, kkxue, melodyyuuka, nayeonshin, samsepiol1, williamfzc, zhuozhuocrayon, zpoint

cpython-internals's Issues

Markdown code formatting

Thanks for your brilliant guide. However, I noticed that in your files the referenced source code is laid out like this:

...

the PyMethodObject delegates the real call to im_func with im_self as the first argument

static PyObject *
method_call(PyObject *method, PyObject *args, PyObject *kwargs)
{
    PyObject *self, *func;
    /* get im_self */
    self = PyMethod_GET_SELF(method);
    if (self == NULL) {
        PyErr_BadInternalCall();
        return NULL;
    }
    /* get im_func */
    func = PyMethod_GET_FUNCTION(method);
    /* call im_func with im_self as the first argument */
    return _PyObject_Call_Prepend(func, self, args, kwargs);
}

It would be clearer to use the ``` syntax like this (see the raw text):

...

the PyMethodObject delegates the real call to im_func with im_self as the first argument

    static PyObject *
    method_call(PyObject *method, PyObject *args, PyObject *kwargs)
    {
        PyObject *self, *func;
        /* get im_self */
        self = PyMethod_GET_SELF(method);
        if (self == NULL) {
            PyErr_BadInternalCall();
            return NULL;
        }
        /* get im_func */
        func = PyMethod_GET_FUNCTION(method);
        /* call im_func with im_self as the first argument */
        return _PyObject_Call_Prepend(func, self, args, kwargs);
    }

That way we get proper syntax highlighting for each language and avoid the confusion of reading plain text. This is standard in GitHub Markdown. What do you think?
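The delegation performed by method_call in the quoted snippet can be observed from Python: a bound method keeps the function (im_func, exposed as `__func__`) and the instance (im_self, exposed as `__self__`), and calling it is equivalent to calling the function with the instance prepended. A minimal sketch; the class `Greeter` is hypothetical.

```python
class Greeter:
    def greet(self, name):
        return f"hello {name}"

g = Greeter()
bound = g.greet                      # bound method object (PyMethodObject in C)

via_method = bound("alice")          # goes through method_call
# Equivalent manual call: im_func(im_self, *args)
via_function = bound.__func__(bound.__self__, "alice")

print(via_method == via_function, bound.__self__ is g, bound.__func__ is Greeter.greet)
```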

Interpreter/frame/frame_cn.md: question about the explanation of a figure


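The explanation in the original text is:

The field f_lasti is now 36, indicating that execution is just before 38 YIELD_FROM

But if we analyze it this way, after 36 LOAD_CONST 0 (None) has run but 38 YIELD_FROM has not, the stack top pointer stacktop should point to the slot below None, not to None itself.

I think there are two possible explanations:

  1. The figure is wrong: the stacktop pointer should point to the slot holding 2, below None.
  2. The explanation is wrong: the figure does not show the state after 36 has run but before 38; it shows the state after 38 has already run. Because YIELD_FROM has to run multiple times, the YIELD_FROM instruction in ceval.c ends with f->f_lasti -= sizeof(_Py_CODEUNIT);, which keeps f->f_lasti unchanged so that 38 YIELD_FROM is executed again on the next run. By then None has already been popped inside the YIELD_FROM instruction, which matches the figure.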
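A minimal sketch for checking the second explanation empirically: record `gi_frame.f_lasti` of a generator that delegates with `yield from` after each resumption. If f_lasti is rewound as described, the saved offset repeats instead of advancing. The generator names are hypothetical, and the concrete offsets differ between CPython versions (3.11+ implements `yield from` with SEND and YIELD_VALUE rather than YIELD_FROM), so only equality of the offsets is checked.

```python
def inner():
    yield 1
    yield 2
    yield 3

def outer():
    # Delegates to inner(); the outer frame is suspended at the same
    # instruction every time inner() yields a value.
    yield from inner()

g = outer()
values = []
lastis = []
for _ in range(3):
    values.append(next(g))
    lastis.append(g.gi_frame.f_lasti)  # saved offset of the outer frame

print(values, lastis)
```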

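gc_cn.md: why do a1 and a2 survive this round of GC?

In the finalizer section, the original text says: "in step 1, the __del__ of every object in unreachable that defines __del__ is called, and all objects in unreachable survive the current round of garbage collection"

a1 and a2 end up unreachable (final_unreachable). Shouldn't they be collected in this round? Why are they moved to old?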
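One reason a collector cannot simply free objects right after running their finalizers is that a `__del__` may resurrect them. A minimal sketch, assuming Python 3.4+ (PEP 442 finalization); the class `Node` and the list `resurrected` are hypothetical. Both finalizers run during `gc.collect()`, both objects become reachable again, and they therefore survive the collection.

```python
import gc

resurrected = []  # global list that keeps finalized objects alive

class Node:
    def __init__(self, name):
        self.name = name
        self.other = None

    def __del__(self):
        # Resurrection: the object becomes reachable again.
        resurrected.append(self)

a1 = Node("a1")
a2 = Node("a2")
a1.other, a2.other = a2, a1  # reference cycle, only the gc can break it
del a1, a2                   # now unreachable from any root

gc.collect()                 # runs both __del__ methods
names = sorted(n.name for n in resurrected)
print(names)
```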

How should free_list be understood?

How can I see directly that an object is being reused?

For example:
a = list()
id(a)
del a
b = list()
id(b)

Given this execution order, I expected id(a) == id(b), but that is not the case.
How does the object removed with del get reused?
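A minimal sketch for making reuse visible, assuming CPython (free lists and `id()` returning an address are implementation details, not language guarantees). Each temporary list is released as soon as `id()` returns, so the next `[]` can be served from the same memory, whether from the list free list or from the allocator. The function `reuse_demo` is hypothetical and mirrors the example above inside a function; its result is not guaranteed either way.

```python
# Each [] is freed right after id() returns, so its memory can be reused.
addresses = {id([]) for _ in range(1000)}
# In CPython this typically prints a very small number.
print(len(addresses))

def reuse_demo():
    a = list()
    addr_a = id(a)
    del a           # refcount drops to 0, the list is deallocated here
    b = list()      # may reuse the freed memory
    return addr_a == id(b)

print(reuse_demo())
```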

little endian and big endian

@zpoint
Hello, in long.md you have the following description:

notice, because the digit is the smallest unit in the CPython abstract level, order between bytes inside a single ob_digit are represent in most-important-bit-in-the-left-most order(big-endian order)
order between digit in the ob_digit array are represent in most-important-digit-in-the-right-most order(little endian order)

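Is the CPU you tested on big-endian or little-endian?
On my PC (little-endian), what I see in the debugger contradicts your first conclusion: the bytes are laid out in little-endian order. Also, here is the code in PyLong_FromLong(long ival):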

        digit *p = v->ob_digit;
        Py_SIZE(v) = ndigits*sign;
        t = abs_ival;
        while (t) {
            *p++ = Py_SAFE_DOWNCAST(
                t & PyLong_MASK, unsigned long, digit);
            t >>= PyLong_SHIFT;
        }

From the code above we can see that:

  1. The byte order within ob_digit[i] should follow the CPU's endianness, so the description "order between bytes inside a single ob_digit are represent in most-important-bit-in-the-left-most order(big-endian order)" should be changed to say it matches the CPU.
  2. The digits ob_digit[i] themselves are ordered little-endian, independent of the machine, i.e. "order between digit in the ob_digit array are represent in most-important-digit-in-the-right-most order(little endian order)".
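A minimal sketch of the two points above. It mirrors the `PyLong_FromLong` loop in Python (it does not read the real object's memory), using `sys.int_info` for the digit width: the digit array always starts with the least significant digit, while the bytes of a single digit, packed in native order with `struct`, follow the CPU. The helper names `to_digits` and `from_digits` are hypothetical.

```python
import struct
import sys

SHIFT = sys.int_info.bits_per_digit          # PyLong_SHIFT: 30 or 15
MASK = (1 << SHIFT) - 1                      # PyLong_MASK
CODE = "I" if sys.int_info.sizeof_digit == 4 else "H"

def to_digits(n):
    """Split a non-negative int like the PyLong_FromLong loop:
    ob_digit[0] receives the least significant SHIFT bits."""
    digits = []
    while n:
        digits.append(n & MASK)
        n >>= SHIFT
    return digits

def from_digits(digits):
    return sum(d << (SHIFT * i) for i, d in enumerate(digits))

value = (1 << 40) + 5
digits = to_digits(value)                    # digit order: platform independent
native = struct.pack("=" + CODE, digits[0])  # bytes of one digit, CPU order
little = struct.pack("<" + CODE, digits[0])
big = struct.pack(">" + CODE, digits[0])
print(digits, native.hex(), sys.byteorder)
```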

Problem with the analysis of tp_getattro

https://github.com/zpoint/CPython-Internals/blob/master/Interpreter/descr/descr_cn.md

The descr chapter says:

  • tp_getattro represents the __getattribute__ method in C

  • tp_getattr represents the __getattr__ method in C

This is actually incorrect; see the documentation at https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_getattr

An optional pointer to the get-attribute-string function.

This field is deprecated. When it is defined, it should point to a function that acts the same as the tp_getattro function, but taking a C string instead of a Python string object to give the attribute name. The signature is

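The two should point to functions that do the same thing; the difference is that one takes a PyUnicodeObject and the other takes a C string. As for __getattribute__ and __getattr__, type_new handles them by pointing tp_getattro at slot_tp_getattr_hook.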

PyObject(overview)

ENG

PyObject(overview)(eng)

As a developer interested in computer programs, I need to learn more before I dive deeper into the Python programming language

e.g., compilers, operating systems/the Linux kernel, databases (MySQL/MongoDB/etc.) and message queues, which I use every day without clearly understanding what's under the hood. I will take time to figure out how these things work internally, either by reading materials or by reading source code, which means I will update this repo less frequently than in the previous two months

I believe that only with a good understanding of the knowledge listed above can I write better code and share better articles

If you have any suggestions, please email me or leave a comment below. Thanks!

CN

PyObject(overview)(cn)

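As a developer interested in computer programs, I think that at this stage I need to master more knowledge before I can go deeper into the Python programming language and the implementation of its interpreter

For example, compiler principles, operating systems/the Linux kernel, databases (MySQL/MongoDB/etc.) and message queues, which I deal with regularly in my daily work
I use them often, but my understanding of the principles behind them is not deep
My plan is to spend more time understanding the principles behind these things, by reading related materials or source code
This also means that for a while this repo will be updated much less frequently than in the previous two months

I believe that only with a deeper understanding of the principles behind the knowledge/components listed above can I write better code and share better articles

If you have any comments or suggestions, feel free to email me or leave a comment below. Thanks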

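Does the gc eventually collect objects that implement the __del__ method?

Regarding your finalizer diagram for __del__: my understanding is that the list a3 appended b, so b's reference count increased by 1, b went from unreachable to reachable, and it therefore moved to the next generation.
While reading the source, I found that finalizers are not cleared; instead, at the end they are merged into the old generation and added to the gc.garbage list.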

static void
handle_legacy_finalizers(struct _gc_runtime_state *state,
                         PyGC_Head *finalizers, PyGC_Head *old)
{
    assert(!PyErr_Occurred());

    PyGC_Head *gc = GC_NEXT(finalizers);
    if (state->garbage == NULL) {
        state->garbage = PyList_New(0);
        if (state->garbage == NULL)
            Py_FatalError("gc couldn't create gc.garbage list");
    }
    for (; gc != finalizers; gc = GC_NEXT(gc)) {
        PyObject *op = FROM_GC(gc);

        if ((state->debug & DEBUG_SAVEALL) || has_legacy_finalizer(op)) {
            if (PyList_Append(state->garbage, op) < 0) {
                PyErr_Clear();
                break;
            }
        }
    }

    gc_list_merge(finalizers, old);
}

gc_list_merge(finalizers, old);

/* All objects in unreachable are trash, but objects reachable from
* legacy finalizers (e.g. tp_del) can't safely be deleted.
*/
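A minimal sketch for checking this from Python, assuming Python 3.4+ (PEP 442), where a Python-level `__del__` is a tp_finalize rather than a legacy tp_del finalizer. With default settings a cycle of such objects is finalized and freed; with `gc.DEBUG_SAVEALL` (the flag tested in `handle_legacy_finalizers` above) unreachable objects are appended to `gc.garbage` instead. The class `WithDel` and the helper `make_cycle` are hypothetical.

```python
import gc
import weakref

class WithDel:
    def __del__(self):          # Python-level finalizer (tp_finalize)
        pass

def make_cycle():
    a, b = WithDel(), WithDel()
    a.peer, b.peer = b, a       # reference cycle: only the gc can free it
    return weakref.ref(a)       # weak reference does not keep `a` alive

# 1) Default settings: the cycle is finalized and then freed.
ref = make_cycle()
gc.collect()
freed = ref() is None

# 2) DEBUG_SAVEALL: unreachable objects go to gc.garbage instead of being freed.
gc.set_debug(gc.DEBUG_SAVEALL)
make_cycle()
gc.collect()
saved = [o for o in gc.garbage if isinstance(o, WithDel)]
gc.set_debug(0)                 # restore normal behaviour
gc.garbage.clear()

print(freed, len(saved))
```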

hash("key2") % 8 is 2!

You have a mistake here...

Whenever you search for or insert an element, the hash value modulo the size of indices gives an index into the indices array, and the value stored there is the index into entries where the result lives. For example, the result of hash("key2") % 8 is 3, and the value in indices[3] is 1, so we can go to entries and find what we need in entries[1]
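A minimal toy model of the indices/entries layout described in the quoted paragraph; the class `CompactDict` is hypothetical and leaves out resizing, deletion and dummy slots. The first probed slot is `hash(key) % size`; on a collision it continues with a perturbation recurrence modelled on CPython's `dictobject.c`. Note that `str` hashes are randomized per process unless `PYTHONHASHSEED` is fixed, so the concrete value of `hash("key2") % 8` varies between runs.

```python
class CompactDict:
    """Toy compact dict: sparse `indices` pointing into dense `entries`.
    No resizing: keep len(entries) < len(indices)."""

    PERTURB_SHIFT = 5

    def __init__(self, size=8):
        assert size & (size - 1) == 0, "size must be a power of two"
        self.indices = [None] * size     # None marks an empty slot
        self.entries = []                # (hash, key, value), insertion order

    def _probe(self, h):
        mask = len(self.indices) - 1
        perturb = h & 0xFFFFFFFFFFFFFFFF  # treat the hash as unsigned
        i = perturb & mask                # same as hash(key) % size
        while True:
            yield i
            perturb >>= self.PERTURB_SHIFT
            i = (i * 5 + perturb + 1) & mask

    def __setitem__(self, key, value):
        h = hash(key)
        for slot in self._probe(h):
            ix = self.indices[slot]
            if ix is None:                # empty slot: append a new entry
                self.indices[slot] = len(self.entries)
                self.entries.append((h, key, value))
                return
            eh, ek, _ = self.entries[ix]
            if eh == h and ek == key:     # existing key: overwrite in place
                self.entries[ix] = (h, key, value)
                return

    def __getitem__(self, key):
        h = hash(key)
        for slot in self._probe(h):
            ix = self.indices[slot]
            if ix is None:
                raise KeyError(key)
            eh, ek, ev = self.entries[ix]
            if eh == h and ek == key:
                return ev

d = CompactDict()
for n, k in enumerate(["key1", "key2", "key3"]):
    d[k] = n
first_slot = hash("key2") % 8             # varies between runs
print(first_slot, d.indices, [e[1] for e in d.entries])
```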

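When does next trigger the `zombie` frame?

Question: after YIELD_VALUE -> POP_TOP pops the top of the stack and execution is suspended, how does the frame enter the zombie state, and how does calling next resume execution of the original frame?
My understanding is that after POP_TOP the PyFrameObject enters the zombie state, and then, if there are other functions, execution moves on to them. In your example, when next(gg) is called again, how is the zombie frame triggered?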
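A minimal sketch of what can be observed from Python: when a generator yields, its frame is not discarded but stays attached to the generator object as `gi_frame`, and each `next()`/`send()` resumes that same frame. The generator `gen` is hypothetical; `inspect.getgeneratorstate` reports the state at each step.

```python
import inspect

def gen():
    x = yield 1      # suspended here after the first next()
    yield x          # suspended here after send()

gg = gen()
states = [inspect.getgeneratorstate(gg)]
first = next(gg)
frame = gg.gi_frame                    # suspended frame kept by the generator
states.append(inspect.getgeneratorstate(gg))
second = gg.send("hi")                 # resumes the same frame
same_frame = gg.gi_frame is frame
states.append(inspect.getgeneratorstate(gg))
gg.close()                             # generator finishes; gi_frame becomes None
states.append(inspect.getgeneratorstate(gg))
print(states, first, second, same_frame, gg.gi_frame)
```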

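CPython GIL - check_interval & interval

I have recently been reading the Python 3.8.3 source code and have a question about your analysis.
There is also a check_interval parameter, which is set to 100 at initialization. According to its docstring, it sets the asynchronous event check interval to n instructions, and it also affects thread switching. I don't understand how this parameter relates to gil.interval. Is check_interval the "tick"? I'd appreciate an answer, thanks.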

Initialization:
Python3.8.3/Python/pystate.c line: 207 interp->check_interval = 100;

Python-3.8.3/Python/sysmodule.c

/*[clinic input]
sys.setcheckinterval

    n: int
    /

Set the async event check interval to n instructions.

This tells the Python interpreter to check for asynchronous events
every n instructions.

This also affects how often thread switches occur.
[clinic start generated code]*/

static PyObject *
sys_setcheckinterval_impl(PyObject *module, int n)
/*[clinic end generated code: output=3f686cef07e6e178 input=7a35b17bf22a6227]*/
{
    if (PyErr_WarnEx(PyExc_DeprecationWarning,
                     "sys.getcheckinterval() and sys.setcheckinterval() "
                     "are deprecated.  Use sys.setswitchinterval() "
                     "instead.", 1) < 0)
        return NULL;

    PyInterpreterState *interp = _PyInterpreterState_Get();
    interp->check_interval = n;
    Py_RETURN_NONE;
}
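The deprecation message above points to `sys.setswitchinterval()`. A minimal sketch of that replacement API: unlike check_interval, its value is a time interval in seconds rather than an instruction count. `sys.setcheckinterval()` and `sys.getcheckinterval()` were removed in Python 3.9, so the `hasattr` check prints `False` there.

```python
import sys

original = sys.getswitchinterval()   # seconds; CPython's default is 0.005
sys.setswitchinterval(0.001)         # request a switch interval of about 1 ms
changed = sys.getswitchinterval()
sys.setswitchinterval(original)      # restore the previous value

print(original, changed, hasattr(sys, "setcheckinterval"))
```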
