[GO]Some notes on Receivers 作者: rin 时间: August 17, 2023 分类: Go 评论 # Review of receivers When a value receiver is defined, Go automatically generates a corresponding pointer receiver. Auto-generating a pointer receiver causes differences in the method sets of pointer types and value types. For example, take the `Len()` method below, Go will automatically generate a corresponding pointer receiver version: ```go type List []int func (l List) Len() int { return len(l) } // func (l *List) Len() int { ... } func (l *List) Append(val int) { *l = append(*l, val) } ``` For type `List`, its method set includes: - `Len() int` For type `*List`, the method set includes: - `Len() int` - `Append(int)` This difference affects the use of interfaces: ```go type Container interface { Len() int Append(int) } var _ Container = List{} // no var _ Container = &List{} ``` However, for the type's own pointer and value, there appears to be no difference, both can call the two methods: ```go l := List{} pl := &List{} // List only implements `Len` l.Append(42) // how does Go find `Append`? // (&l).Append(42) l.Len() // gets 1 // *List implements `Append` explicitly, `Len` implicitly pl.Append(6) pl.Len() // gets 1 ``` In the `List` method set, there is no `Append(int)`. Go converts the corresponding call to `(&l).Append(42)` to implement this which is a syntactic sugar. However, if you cannot get the address of an object of a certain type, this feature does not work. Non-addressable types include `map` and `interface` and some other, as shown below: ```go m := map[string]List{"list1": List{}} p := &m["list1"] // cannot take the address of m["list1"] (map index expression of type List) var i interface{} = List{} p = &i.(List) // cannot take the address of i.(List) ``` Because the map stores `List`, you can directly call `List.Len()`. Since you cannot take the address of a map element, you cannot convert it to call `*List.Append()`. The same applies to `interface`: ```go fmt.Println(m["list1"].Len()) // ok m["list1"].Append(42) // bad, cannot call the pointer method Append on List i.(List).Len() // ok i.(List).Append() // bad, cannot call the pointer method Append on List ``` # What Do Auto-Generated Pointer Receivers Do If the auto-generated pointer receiver's function body is the same as the value receiver, can I modify the object's value through the pointer? No. In fact, the pointer receiver calls the value receiver and passes arguments by value. When using `dlv debug main.go` to debug `main.(*List).Len`, I found that it doesn't stop at this function breakpoint. Reviewing the previous code, `pl.Len()` may have been converted to `(*pl).Len()`, which caused it to use the value receiver. Therefore, it is necessary to forcefully use the pointer receiver: ```go l := List{} pl := &List{} // List only implements `Len` l.Append(42) // how does Go find `Append`? // (&l).Append(42) l.Len() // gets 1 // *List implements `Append` explicitly, `Len` implicitly pl.Append(6) pl.Len() // gets 1 // Forcefully call `func(l *List) Len()` (*List).Len(pl) // gets 1 ``` It is confirmed that `main.(*List).Len` internally calls `main.List.Len`. ![](http://static.lo-li.net/images/2023/10/2023-10-20_19-44.png) ![](http://static.lo-li.net/images/2023/10/2023-10-20_19-45.png)
[CMU15445] 锁与事务隔离级别 作者: rin 时间: August 16, 2023 分类: 未分类 评论 「基于锁的并发控制」与「多版本并发控制」是实现事务隔离级别的两种不同思路。在两阶段锁中,事务分为获取锁的**增长阶段**和释放锁的**收缩阶段**。在不同的阶段中通过设置不同的规则和限制可以实现不同的隔离级别。例如: 1. 读取数据时是否需要锁?需要什么类型的锁? 2. 什么时候释放锁? 不同的隔离级别下,事务修改数据时都需要X锁,并在事务结束时释放。这将保证两个写事务之间不会产生**脏写**的问题,这种问题在数据库中是不允许的。 因此在不同隔离级别下都可能会用到X锁,但S锁不一定。 --- **READ UNCOMMITTED** 1. 事务读数据时不需要S锁,只需要在写数据时获取X锁。这样当前事务的读操作就不会被其他事务的X锁阻塞; 2. 事务结束时释放X锁。释放X锁事务则进入收缩阶段。 由于读操作不会被阻塞,可以随时读到别的事务修改的数据,因此存在**脏读**问题。 --- **READ COMMITTED** 1. 当前事务读数据时需要S锁,以阻塞之后其他事务对数据的修改。同样,当其他事务先获取了X锁并修改数据时,当前事务对S锁的获取将被阻塞,直到其他事务的X锁释放才能读到数据。 2. 事务结束时释放X锁,释放X锁事务则进入收缩阶段。释放S锁不改变事务的阶段,因此可以随时释放(?) READ COMMITTED解决了脏读问题:S锁会被X锁阻塞+事务结束时才释放X锁=只能读到提交/回滚的数据。 但是没有解决不可重复读的问题:当前事务读完数据之后可以释放S锁,这时其他事务可以修改相应的数据并提交,当前事务再次读取时将得到不同的数据,即**不可重复读**问题。 --- **REPEATABLE READ** 1. 与READ COMMITTED相同 2. 事务结束时释放X锁与S锁。事务进入收缩阶段。 由于S锁也会被保留到事务结束,因此其他事务无法修改相应的数据,当前事务中多次读取得到的也是同样的数据。 | | S锁 | X锁 | | --- | ------ | --- | | RU | N/A | 结束时释放 | | RC | 读后释放 | 结束时释放 | | RR | 结束时释放 | 结束时释放 |
[CMU15445] bustub 2023 Project3 作者: rin 时间: May 20, 2023 分类: 未分类 评论 1. `ExecutorContext *exec_ctx` 是什么? 2. 如何根据plan中的table_name找到要seq_scan的table?(or table_heap?) 1. 查找到catalog.h中有关于table_name的信息 2. The Catalog is a non-persistent catalog that is designed for use by executors within the DBMS execution engine. It handles table creation, **table lookup**, index creation, and index lookup. 3. 查到`ExecutorContext`中包含了catalog,所以应该可以从`exec_ctx_`中获得table ## insert 3. 参考projection_executor写了insert_exector的构造函数及init、next 1. 使用InsertTuple插入时,需要TupleMeta和Tuple,后者可以从child_executor->Next中获取,前者呢?——只需手动设置TupleMeta的那三个字段 2. 需要通过\*tuple返回插入的数量,如何构造这个Tuple?构造tuple时也需要对应的schema,这个从何而来?——tuple需要一个vector\ ,这个Value可以通过ValueFactory::GetIntegerValue()构造,而schema已经由planner传过来了,即output_schema,代表这个sql输出的schema是什么样 3. 按照迭代器模型(volcano模型)的思想,我觉得在单次调用insert_executor->Next()的时候,在该executor内部应该也只应该调用一次child_executor->Next(),直到它返回false时我再返回false。但是在调用者(execution_engine.h:PollExecutor())的代码中,它每一次执行我的Next时,都会往最终的result_set中插入一次tuple,这会导致该语句输出的结果中有多个row ``` :8 insert into t1 values (0, 'Emoji', 10), (1, 'EmojiEmoji', 11), (2, 'EmojiEmojiEmoji', 12), (3, 'EmojiEmojiEmojiEmoji', 13), (4, 'EmojiEmojiEmojiEmojiEmoji', 14); --- YOUR RESULT --- 1 2 3 4 5 --- EXPECTED RESULT --- 5 ``` 因此我一次性在Next()中使用循环将child_executor中产生的所有tuples处理完 --- 在insert后,还需要插入index。从`catalog->GetTableIndexes()`可以获得该table所有的index。但是每个index覆盖的字段也不一样,无脑将新插入的这个tuple原样往index里插会导致segmentfault,要如何根据特定index schema去insert新的index? 可以根据`index_info->key_scheme_`获得这个index的scheme,然后又根据这个table的scheme进行对应,选择需要的column生成key tuple。在tuple.h中已经实现了`Tuple::KeyFromTuple()`可以直接使用。 ## delete 4. 在delete_executor中同样使用循环处理所有的tuples,在TupleMeta中修改删除标记。在seq_search_executor->Next()中对于已被标记删除的tuple,应该忽略它,而不是立刻返回false,因为可能还有别的tuples要输出 ## update 6. 在update_executor中,检查测试样例`explain update t1 set v3 = 445 where v1 >= 3;`的结果为 ``` === BINDER === BoundUpdate { table=BoundBaseTableRef { table=t1, oid=22 }, filter_expr=(t1.v1>=3), target_expr=[(t1.v3, 445)], } === PLANNER === Update { table_oid=22, target_exprs=[#0.0, #0.1, 445] } | (__bustub_internal.update_rows:INTEGER) Filter { predicate=(#0.0>=3) } | (t1.v1:INTEGER, t1.v2:VARCHAR, t1.v3:INTEGER) SeqScan { table=t1 } | (t1.v1:INTEGER, t1.v2:VARCHAR, t1.v3:INTEGER) === OPTIMIZER === Update { table_oid=22, target_exprs=[#0.0, #0.1, 445] } | (__bustub_internal.update_rows:INTEGER) SeqScan { table=t1, filter=(#0.0>=3) } | (t1.v1:INTEGER, t1.v2:VARCHAR, t1.v3:INTEGER) ``` 对应的,代码中的child_executor即Filter,向update_executor提供需要更新的tuples。然后新值存储在plan中的plan_->target_expressions_中。即`target_exprs=[#0.0, #0.1, 445]` 经过plan_->target_expressions_->Evaluate()后可以得到各列实际的值:`[3,EmojiEmojiEmojiEmoji,445]` ``` update t1 set v3 = 445 where v1 >= 3; 2023-05-19 18:47:41 [update_executor.cpp:40:Next] DEBUG - Update from (3, EmojiEmojiEmojiEmoji, 13) Tuple size is 41 2023-05-19 18:47:41 [update_executor.cpp:41:Next] DEBUG - To 2023-05-19 18:47:41 [update_executor.cpp:43:Next] DEBUG - Type: 4 Val: #0.0 Evaluated Val:3 2023-05-19 18:47:41 [update_executor.cpp:43:Next] DEBUG - Type: 7 Val: #0.1 Evaluated Val:EmojiEmojiEmojiEmoji 2023-05-19 18:47:41 [update_executor.cpp:43:Next] DEBUG - Type: 4 Val: 445 Evaluated Val:445 ``` ## index_scan 在初始化BPlusTreeIndexIteratorForTwoIntegerColumn(IndexIterator)时,出现链接错误,找不到IndexIterator,即使对特定的那几个类型进行了explicit instantiation也不行,原因仅仅是因为这里使用了无参的构造函数,但是在index_iterator.h中没有定义,补上定义即可。 ## agg 首先根据注释,给aggregation_executor.h添加成员`SimpleAggregationHashTable aht_;`和`SimpleAggregationHashTable::Iterator aht_iterator_;`,并在构造函数上初始化。 1. `SimpleAggregationHashTable`是什么?——是一个`AggregateKey->AggregateValue`的hashtable 2. `AggregateKey`代表“ a key in an aggregation operation”,包含一个`std::vector group_bys_;`。这个key应该与aggregate的columns有关。 3. `AggregateValue`里有一个`std::vector aggregates_;`,应该是aggregate后的值。 1. 但是aggregate 函数如`CountStarAggregate, CountAggregate, SumAggregate, MinAggregate, MaxAggregate`这些也只产生一个值,或许vector里存了多个group产生的值?——不是,vector里存的是不同聚集函数产生的值,第i个元素值代表`agg_types_[i]`这个类型的聚集函数 2. 在`aggregation_executor->Next()`中,像其他task一样,从child获取一个tuple然后进行处理。处理过程大概是构造AggregateKey和AggregateValue然后添加到某处 1. 查看`aggregation_executor.h`,其中有`MakeAggregateKey(),MakeAggregateValue(),aht_.InsertCombine`几个函数。构造Key的这个函数从plan_中获得**group_by_的expr**,使用evaluate求出keys;构造Value的函数从plan_中获得**aggregate的expr**,使用evalutae求出values。 2. 以第二个样例`insert into t1 values (-99999), (99999), (0), (1), (2), (3); select count(*), min(v1), max(v1), count(v1), sum(v1) from t1;`为例,第一次Next返回-99999这个tuple,MakeAggregateValue会用`count,min,max,count,sum`这五个函数产生`1,-99999,-99999,-99999,-99999`这五个值 3. 在`CombineAggregateValues()`中,对于这五个值,就需要根据对于`agg_expr`的类型(`agg_types_[i]`)进行聚合处理。 4. 注意区分`count(*), count(column)`,前者统计总行数,后者统计该列中非空的数量。对于空表,前者为0;对于空列或者该全为null的列,后者为null。对于其他聚集函数来说也是,input是null时跳过处理 5. executor的返回值是group_by columns + aggregation columns > The output schema consists of the group-by columns followed by the aggregation columns. 如果我还select了其他column,但是没有orderby它,会怎样?——这种情况可以是不允许的,参考MySQL的`only_full_group_by`。因此不用处理非聚集的值和非groupby的值 例如,`__mock_table_1.colA:INTEGER`是groupby columns,`agg#0:INTEGER, agg#1:INTEGER, agg#2:INTEGER`是aggregate column ``` EXPLAIN SELECT max(colA), colA, MIN(colB), max(colB) FROM __mock_table_1 GROUP BY colA; Agg { types=[max, min, max], aggregates=[#0.0, #0.1, #0.1], group_by=[#0.0] } | (__mock_table_1.colA:INTEGER, agg#0:INTEGER, agg#1:INTEGER, agg#2:INTEGER) ``` --- 注意当表为空的时候,child的next不会返回tuples,aht_.InsertCombine()就不会执行。 1. 如果查询中没有groupby,此时agg_executor->Next()仍要返回一些默认值,如函数`GenerateInitialAggregateValue()`中所指定的。为了复用`GenerateInitialAggregateValue(), InsertCombine()`这两个函数,我在处理空表时,对于`plan_->aggregates_`中使用到的各聚集函数,均构造一个null值作为agg_val,调用InsertCombine试图插入到hashtable中,此时`InsertCombine()`会对hashtable设置初值,而`CombineAggregateValues()`会忽略掉我构造的这些null值,不动初值。但这些只是aggregate函数们的默认值,如果查询语句有group by,那么还要构造这些列的空值作为默认值。 2. 查询中有groupby,不返回值 --- 在测试`select v4, min(v1) + sum(v2) + max(v3), count(*) from t1 group by v4;`的时候,aggregate_executor->Next()中的executed_无法等于true,导致这个Next()被无限调用。 原因是我在Next()中修改hashtable后做了多余的更新迭代器的操作`aht_iterator_ = aht_.Begin();`导致无法满足结束的条件。只需在首次更新hashtable之后更新begin迭代器即可,即`Init()`和`if(!aggregated_)`中。(在首次更新之前,hashtable为空,Begin返回的是end,这不是想要的)
[CMU15445] bustub 2023 Project2 作者: rin 时间: April 23, 2023 分类: 未分类 评论 Codes: [b_plus_tree.cpp](https://git.lo-li.net/rin/bustub/src/branch/master/src/storage/index/b_plus_tree.cpp "b_plus_tree.cpp") # Task1 B+Tree Pages ## Page types There are 4 types of headers (classes) named as `b_plus_*_page_.h` 1. `b_plus_tree_page.h` 2. `b_plus_tree_header_page.h` 3. `b_plus_tree_internal_page.h` 4. `b_plus_tree_leaf_page.h` Only `internal_page` and `leaf_page` are inherited from the base class defined in `b_plus_tree_page`. At first, I was confused by the words `RootPage` and `HeaderPage`. In fact, `header_page` does not belong to the inheritance tree. It stores the metadata of a `BPlusTree` object, which is `root_page_id` for now. `header_page` always exists in an initialized `BPlusTree`. But `root_page_id` may be `INVALID_PAGE_ID` at some time, meaning the tree is **empty** (tree exists but no data). --- ## Page structure Both 2 types of pages share the common member variables `page_type_`, `size_` and `max_size_` defined in `BPlusTreePage`. But the data area `MappingType array_[0]` is defined in `leaf_page.h` and `internal_page.h` separately. This is because the array is a *Flexible array* and it must be placed at the end of an object. But in `leaf_page`, there is an extra member `next_page_id_` after common members. The class structure is as below: ``` || page_type_ | size_ | max_size_ | next_page_id_ | array_....... || || page_type_ | size_ | max_size_ | array_....... || ``` In later codes, a pointer of `internal_page` will be `reinterpret_cast` to a point of `leaf_page`. And `page_type_` will be accessed to check the type of this page. Due to the common memory structure in both pages' headers, this operation is safe. ### Internal page ``` | KEY(0)+PAGE_ID(0) | KEY(1)+PAGE_ID(1) | KEY(2)+.... | ⬆[INVALID] ``` There are $$m+1$$ pairs of `` in an internal page. The 0th pair has an invalid key. So an Internal Page stores **m** ordered keys and **m+1** child pointers. `max_size_ == m+1` The assignment specification recommends: > split an internal node when number of values reaches `max_size` before insertion ### Leaf page Every key in a leaf page is valid. > split a leaf node when the number of values reaches `max_size` after insertion So in any stable situation, max_size_ < size_
C++ Primer Chapter 18 作者: rin 时间: October 15, 2021 分类: C++ 评论 # Ch18 ## 18.1 Exception handling Exceptions: seperate problem detection from problem resolution - Thrown expression & current call chain -> determine which handler - if a dtor wants to throw exceptions, it should handle them locally, or the program will be terminated. - When search for `catch`, the *first* matched one is selected, not the best one. - Thus catch clauses with class types must be orderd from the *most* derived one to *least* one. - Conversions (between exception and "a catch exception declaration") only allowed when: 1. from *non-const* to *const* 2. from *derived* to *base* 3. *array*/*function* to *pointer* ### function `try` block - `try` is before `: [init list]` - will catch exceptions both in *init list* and *ctor body* - An *implict* `throw` will be there, thus we HAVE to catch again in the outter scope - (thus it seems useless) ```c++ // function try block in Class struct A { A(int v) try : a(v) { throw exception(); } catch(exception &e) { cout << e.what() << endl; // implicit "throw;" here } int a; }; // in ordinary function void func() try { throw "ordinary func throws"; } catch(const char *s) { cout << s << endl; } // Notice: parameter here must be `const char*` to recieve the thrown c-str // or there will be a runtime error (not a compile-time error) int main() { func(); // must handle the rethrown exception try { A a(1); } catch(exception &e) { cout << e.what() << endl; } return 0; } ``` ## 18.2 Namespaces 1. `using` Declaration: `using std::cin` 2. `using` Directive: `using namespace std` - Inside a class can only `using` members of its (direct or not) *base class*: ```c++ using std::cout; struct A{ int v=0; }; struct B{ int v = 1; // using std::cin; // NO, not class // using A::v; // NO, not base // using namespace xxx; // NO, for `using` Directive either }; struct C: public B { using B::v; // OK `using` base class members C() { cout << v << '\n';} // 1 }; ``` ## 18.3 Multiple and Virtual Inheritance ### 18.3.2 Conversion ```c++ // D1 -> A,B // D2 -> C -> A struct A {}; struct B {}; struct D1 : A,B {}; struct C : A{}; struct D2 : C {}; void func(A &a); // func1 void func(B &b); // func2 void func(C &c); // func3 int main() { D1 d1; D2 d2; func(d1); // Ambiguous between 1 & 2 func(d2); // OK, 3, not 1 } ```