rin's

[GO]Some notes on Receivers

作者: rin
时间: August 17, 2023
分类: Go
评论

# Review of receivers
When a value receiver is defined, Go automatically generates a corresponding pointer receiver.

Auto-generating a pointer receiver causes differences in the method sets of pointer types and value types.

For example, take the `Len()` method below, Go will automatically generate a corresponding pointer receiver version:

```go
type List []int

func (l List) Len() int { return len(l) }
// func (l *List) Len() int { ... }
func (l *List) Append(val int) { *l = append(*l, val) }
```

For type `List`, its method set includes:
- `Len() int`

For type `*List`, the method set includes:
- `Len() int`
- `Append(int)`

This difference affects the use of interfaces:

```go
type Container interface {
    Len() int
    Append(int)
}
var _ Container = List{} // no
var _ Container = &List{}
```

However, for the type's own pointer and value, there appears to be no difference, both can call the two methods:

```go
l := List{}
pl := &List{}

// List only implements `Len`
l.Append(42) // how does Go find `Append`?
// (&l).Append(42)
l.Len() // gets 1

// *List implements `Append` explicitly, `Len` implicitly
pl.Append(6)
pl.Len() // gets 1
```

In the `List` method set, there is no `Append(int)`. Go converts the corresponding call to `(&l).Append(42)` to implement this which is a syntactic sugar.

However, if you cannot get the address of an object of a certain type, this feature does not work. Non-addressable types include `map` and `interface` and some other, as shown below:

```go
m := map[string]List{"list1": List{}}
p := &m["list1"] // cannot take the address of m["list1"] (map index expression of type List)

var i interface{} = List{}
p = &i.(List) // cannot take the address of i.(List)
```

Because the map stores `List`, you can directly call `List.Len()`. Since you cannot take the address of a map element, you cannot convert it to call `*List.Append()`. The same applies to `interface`:

```go
fmt.Println(m["list1"].Len()) // ok
m["list1"].Append(42) // bad, cannot call the pointer method Append on List

i.(List).Len() // ok
i.(List).Append() // bad, cannot call the pointer method Append on List
```

# What Do Auto-Generated Pointer Receivers Do

If the auto-generated pointer receiver's function body is the same as the value receiver, can I modify the object's value through the pointer? No. In fact, the pointer receiver calls the value receiver and passes arguments by value.

When using `dlv debug main.go` to debug `main.(*List).Len`, I found that it doesn't stop at this function breakpoint. Reviewing the previous code, `pl.Len()` may have been converted to `(*pl).Len()`, which caused it to use the value receiver. Therefore, it is necessary to forcefully use the pointer receiver:

```go
l := List{}
pl := &List{}

// List only implements `Len`
l.Append(42) // how does Go find `Append`?
// (&l).Append(42)
l.Len() // gets 1

// *List implements `Append` explicitly, `Len` implicitly
pl.Append(6)
pl.Len() // gets 1

// Forcefully call `func(l *List) Len()`
(*List).Len(pl) // gets 1
```

It is confirmed that `main.(*List).Len` internally calls `main.List.Len`.
![](http://static.lo-li.net/images/2023/10/2023-10-20_19-44.png)
![](http://static.lo-li.net/images/2023/10/2023-10-20_19-45.png)

[CMU15445] 锁与事务隔离级别

作者: rin
时间: August 16, 2023
分类: 未分类
评论

[CMU15445] bustub 2023 Project3

作者: rin
时间: May 20, 2023
分类: 未分类
评论

1. `ExecutorContext *exec_ctx` 是什么？
2. 如何根据plan中的table_name找到要seq_scan的table？（or table_heap?）
    1. 查找到catalog.h中有关于table_name的信息
    2. The Catalog is a non-persistent catalog that is designed for use by executors within the DBMS execution engine. It handles table creation, **table lookup**, index creation, and index lookup.
    3. 查到`ExecutorContext`中包含了catalog，所以应该可以从`exec_ctx_`中获得table

## insert
3. 参考projection_executor写了insert_exector的构造函数及init、next
    1. 使用InsertTuple插入时，需要TupleMeta和Tuple，后者可以从child_executor->Next中获取，前者呢？——只需手动设置TupleMeta的那三个字段
    2. 需要通过\*tuple返回插入的数量，如何构造这个Tuple？构造tuple时也需要对应的schema，这个从何而来？——tuple需要一个vector\<Value> ，这个Value可以通过ValueFactory::GetIntegerValue()构造，而schema已经由planner传过来了，即output_schema，代表这个sql输出的schema是什么样
    3. 按照迭代器模型（volcano模型）的思想，我觉得在单次调用insert_executor->Next()的时候，在该executor内部应该也只应该调用一次child_executor->Next()，直到它返回false时我再返回false。但是在调用者（execution_engine.h:PollExecutor()）的代码中，它每一次执行我的Next时，都会往最终的result_set中插入一次tuple，这会导致该语句输出的结果中有多个row
    ```
    <main>:8
    insert into t1 values (0, 'Emoji', 10), (1, 'EmojiEmoji', 11), (2, 'EmojiEmojiEmoji', 12), (3, 'EmojiEmojiEmojiEmoji', 13), (4, 'EmojiEmojiEmojiEmojiEmoji', 14);
    --- YOUR RESULT ---
    1 
    2 
    3 
    4 
    5 
    
    --- EXPECTED RESULT ---
    5
    ```
    因此我一次性在Next()中使用循环将child_executor中产生的所有tuples处理完

---
在insert后，还需要插入index。从`catalog->GetTableIndexes()`可以获得该table所有的index。但是每个index覆盖的字段也不一样，无脑将新插入的这个tuple原样往index里插会导致segmentfault，要如何根据特定index schema去insert新的index？
可以根据`index_info->key_scheme_`获得这个index的scheme，然后又根据这个table的scheme进行对应，选择需要的column生成key tuple。在tuple.h中已经实现了`Tuple::KeyFromTuple()`可以直接使用。

## delete
4. 在delete_executor中同样使用循环处理所有的tuples，在TupleMeta中修改删除标记。在seq_search_executor->Next()中对于已被标记删除的tuple，应该忽略它，而不是立刻返回false，因为可能还有别的tuples要输出

## update
6. 在update_executor中，检查测试样例`explain update t1 set v3 = 445 where v1 >= 3;`的结果为
  ```
  === BINDER ===
BoundUpdate {
  table=BoundBaseTableRef { table=t1, oid=22 },
  filter_expr=(t1.v1>=3),
  target_expr=[(t1.v3, 445)],
}
=== PLANNER ===
Update { table_oid=22, target_exprs=[#0.0, #0.1, 445] } | (__bustub_internal.update_rows:INTEGER)
  Filter { predicate=(#0.0>=3) } | (t1.v1:INTEGER, t1.v2:VARCHAR, t1.v3:INTEGER)
    SeqScan { table=t1 } | (t1.v1:INTEGER, t1.v2:VARCHAR, t1.v3:INTEGER)
=== OPTIMIZER ===
Update { table_oid=22, target_exprs=[#0.0, #0.1, 445] } | (__bustub_internal.update_rows:INTEGER)
  SeqScan { table=t1, filter=(#0.0>=3) } | (t1.v1:INTEGER, t1.v2:VARCHAR, t1.v3:INTEGER)
```

对应的，代码中的child_executor即Filter，向update_executor提供需要更新的tuples。然后新值存储在plan中的plan_->target_expressions_中。即`target_exprs=[#0.0, #0.1, 445]`
经过plan_->target_expressions_->Evaluate()后可以得到各列实际的值：`[3,EmojiEmojiEmojiEmoji,445]`
```
update t1 set v3 = 445 where v1 >= 3;
2023-05-19 18:47:41 [update_executor.cpp:40:Next] DEBUG - Update from (3, EmojiEmojiEmojiEmoji, 13) Tuple size is 41
2023-05-19 18:47:41 [update_executor.cpp:41:Next] DEBUG - To
2023-05-19 18:47:41 [update_executor.cpp:43:Next] DEBUG - Type: 4       Val: #0.0        Evaluated Val:3
2023-05-19 18:47:41 [update_executor.cpp:43:Next] DEBUG - Type: 7       Val: #0.1        Evaluated Val:EmojiEmojiEmojiEmoji
2023-05-19 18:47:41 [update_executor.cpp:43:Next] DEBUG - Type: 4       Val: 445         Evaluated Val:445
```

## index_scan
在初始化BPlusTreeIndexIteratorForTwoIntegerColumn（IndexIterator）时，出现链接错误，找不到IndexIterator，即使对特定的那几个类型进行了explicit instantiation也不行，原因仅仅是因为这里使用了无参的构造函数，但是在index_iterator.h中没有定义，补上定义即可。

## agg
首先根据注释，给aggregation_executor.h添加成员`SimpleAggregationHashTable aht_;`和`SimpleAggregationHashTable::Iterator aht_iterator_;`，并在构造函数上初始化。

1. `SimpleAggregationHashTable`是什么？——是一个`AggregateKey->AggregateValue`的hashtable
2. `AggregateKey`代表“ a key in an aggregation operation”，包含一个`std::vector<Value> group_bys_;`。这个key应该与aggregate的columns有关。
3. `AggregateValue`里有一个`std::vector<Value> aggregates_;`，应该是aggregate后的值。
    1. 但是aggregate 函数如`CountStarAggregate, CountAggregate, SumAggregate, MinAggregate, MaxAggregate`这些也只产生一个值，或许vector里存了多个group产生的值？——不是，vector里存的是不同聚集函数产生的值，第i个元素值代表`agg_types_[i]`这个类型的聚集函数
    2. 在`aggregation_executor->Next()`中，像其他task一样，从child获取一个tuple然后进行处理。处理过程大概是构造AggregateKey和AggregateValue然后添加到某处
        1. 查看`aggregation_executor.h`，其中有`MakeAggregateKey(),MakeAggregateValue(),aht_.InsertCombine`几个函数。构造Key的这个函数从plan_中获得**group_by_的expr**，使用evaluate求出keys；构造Value的函数从plan_中获得**aggregate的expr**，使用evalutae求出values。
        2. 以第二个样例`insert into t1 values (-99999), (99999), (0), (1), (2), (3); select count(*), min(v1), max(v1), count(v1), sum(v1) from t1;`为例，第一次Next返回-99999这个tuple，MakeAggregateValue会用`count,min,max,count,sum`这五个函数产生`1,-99999,-99999,-99999,-99999`这五个值
        3. 在`CombineAggregateValues()`中，对于这五个值，就需要根据对于`agg_expr`的类型(`agg_types_[i]`)进行聚合处理。
        4. 注意区分`count(*), count(column)`，前者统计总行数，后者统计该列中非空的数量。对于空表，前者为0；对于空列或者该全为null的列，后者为null。对于其他聚集函数来说也是，input是null时跳过处理
        5. executor的返回值是group_by columns + aggregation columns
        > The output schema consists of the group-by columns followed by the aggregation columns.
            如果我还select了其他column，但是没有orderby它，会怎样？——这种情况可以是不允许的，参考MySQL的`only_full_group_by`。因此不用处理非聚集的值和非groupby的值
            
例如，`__mock_table_1.colA:INTEGER`是groupby columns，`agg#0:INTEGER, agg#1:INTEGER, agg#2:INTEGER`是aggregate column
```
EXPLAIN SELECT max(colA), colA, MIN(colB), max(colB) FROM __mock_table_1 GROUP BY colA;
Agg { types=[max, min, max], aggregates=[#0.0, #0.1, #0.1], group_by=[#0.0] } | (__mock_table_1.colA:INTEGER, agg#0:INTEGER, agg#1:INTEGER, agg#2:INTEGER)
```
---
注意当表为空的时候，child的next不会返回tuples，aht_.InsertCombine()就不会执行。
1. 如果查询中没有groupby，此时agg_executor->Next()仍要返回一些默认值，如函数`GenerateInitialAggregateValue()`中所指定的。为了复用`GenerateInitialAggregateValue(), InsertCombine()`这两个函数，我在处理空表时，对于`plan_->aggregates_`中使用到的各聚集函数，均构造一个null值作为agg_val，调用InsertCombine试图插入到hashtable中，此时`InsertCombine()`会对hashtable设置初值，而`CombineAggregateValues()`会忽略掉我构造的这些null值，不动初值。但这些只是aggregate函数们的默认值，如果查询语句有group by，那么还要构造这些列的空值作为默认值。
2. 查询中有groupby，不返回值
---
在测试`select v4, min(v1) + sum(v2) + max(v3), count(*) from t1 group by v4;`的时候，aggregate_executor->Next()中的executed_无法等于true，导致这个Next()被无限调用。
原因是我在Next()中修改hashtable后做了多余的更新迭代器的操作`aht_iterator_ = aht_.Begin();`导致无法满足结束的条件。只需在首次更新hashtable之后更新begin迭代器即可，即`Init()`和`if(!aggregated_)`中。（在首次更新之前，hashtable为空，Begin返回的是end，这不是想要的）

[CMU15445] bustub 2023 Project2

作者: rin
时间: April 23, 2023
分类: 未分类
评论

Codes: [b_plus_tree.cpp](https://git.lo-li.net/rin/bustub/src/branch/master/src/storage/index/b_plus_tree.cpp "b_plus_tree.cpp")
# Task1 B+Tree Pages 
## Page types
There are 4 types of headers (classes) named as `b_plus_*_page_.h`
1. `b_plus_tree_page.h`
2. `b_plus_tree_header_page.h`
3. `b_plus_tree_internal_page.h`
4. `b_plus_tree_leaf_page.h`

Only `internal_page` and `leaf_page` are inherited from the base class defined in `b_plus_tree_page`.

At first, I was confused by the words `RootPage` and `HeaderPage`. 
In fact, `header_page` does not belong to the inheritance tree. It stores the metadata of a `BPlusTree`  object, which is `root_page_id` for now.

`header_page` always exists in an initialized `BPlusTree`. But `root_page_id` may be `INVALID_PAGE_ID` at some time, meaning the tree is **empty** (tree exists but no data).

---

## Page structure

Both 2 types of pages share the common member variables `page_type_`, `size_` and `max_size_` defined in `BPlusTreePage`.
But the data area `MappingType array_[0]` is defined in `leaf_page.h` and `internal_page.h` separately.

This is because the array is a *Flexible array* and it must be placed at the end of an object. But in `leaf_page`, there is an extra member `next_page_id_` after common members. The class structure is as below:
```
|| page_type_ | size_ | max_size_ | next_page_id_ | array_....... ||
|| page_type_ | size_ | max_size_ | array_....... ||
```
In later codes, a pointer of `internal_page` will be `reinterpret_cast` to a point of `leaf_page`. And `page_type_` will be accessed to check the type of this page.
Due to the common memory structure in both pages' headers, this operation is safe.

### Internal page
```
| KEY(0)+PAGE_ID(0) | KEY(1)+PAGE_ID(1) | KEY(2)+.... |
  ⬆[INVALID]
```

There are $$m+1$$ pairs of `<key, page_id_t>` in an internal page. The 0th pair has an invalid key. So an Internal Page stores **m** ordered keys and **m+1** child pointers. `max_size_ == m+1`

The assignment specification recommends:
> split an internal node when number of values reaches `max_size` before insertion

### Leaf page
Every key in a leaf page is valid.

> split a leaf node when the number of values reaches `max_size` after insertion

So in any stable situation,  max_size_ < size_

C++ Primer Chapter 18

作者: rin
时间: October 15, 2021
分类: C++
评论

# Ch18
## 18.1 Exception handling
Exceptions: seperate problem detection from problem resolution
- Thrown expression & current call chain -> determine which handler
- if a dtor wants to throw exceptions, it should handle them locally, or the program will be terminated.

- When search for `catch`, the *first* matched one is selected, not the best one.
- Thus catch clauses with class types must be orderd from the *most* derived one to *least* one.
- Conversions (between exception and "a catch exception declaration") only allowed when: 
    1. from *non-const* to *const*
    2. from *derived* to *base*
    3. *array*/*function* to *pointer*
### function `try` block
- `try` is before `: [init list]`
- will catch exceptions both in *init list* and *ctor body*
- An *implict* `throw` will be there, thus we HAVE to catch again in the outter scope
- (thus it seems useless)

```c++
// function try block in Class
struct A {
    A(int v) try : a(v) {
        throw exception();
    } catch(exception &e) {
        cout << e.what() << endl;
        // implicit "throw;" here
    }
    int a;
};

// in ordinary function
void func() try { throw "ordinary func throws"; }
catch(const char *s) { cout << s << endl; }
// Notice: parameter here must be `const char*` to recieve the thrown c-str
// or there will be a runtime error (not a compile-time error)

int main() {
   func();

// must handle the rethrown exception
   try {
        A a(1);
   } catch(exception &e) { cout << e.what() << endl; }
   return 0;
}
```

## 18.2 Namespaces
1. `using` Declaration: `using std::cin`
2. `using` Directive: `using namespace std`
- Inside a class can only `using` members of its (direct or not) *base class*:
```c++
using std::cout;
struct A{ int v=0; };
struct B{
    int v = 1;
    // using std::cin; // NO, not class
    // using A::v; // NO, not base
    // using namespace xxx; // NO, for `using` Directive either
};
struct C: public B {
    using B::v; // OK `using` base class members
    C() { cout << v << '\n';} // 1
};
```

## 18.3 Multiple and Virtual Inheritance 
### 18.3.2 Conversion
```c++
// D1 -> A,B
// D2 -> C -> A
struct A {};
struct B {};
struct D1 : A,B {};

struct C : A{};
struct D2 : C {};

void func(A &a); // func1
void func(B &b); // func2
void func(C &c); // func3

int main() {
    D1 d1; D2 d2;
    func(d1); // Ambiguous between 1 & 2
    func(d2); // OK, 3, not 1
}
```

[GO]Some notes on Receivers

[CMU15445] 锁与事务隔离级别

[CMU15445] bustub 2023 Project3

[CMU15445] bustub 2023 Project2

C++ Primer Chapter 18

最近回复

分类

归档

Links

其它