[ozone]-deleteBlock: Notes on Selected Jiras

Published: 2026/5/19 6:40:23

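The story starts with HDDS-3217, which was filed to fix slow loading at DN startup. Before it, reserving committedBytes required computing each container's usedBytes, and usedBytes was obtained by a full scan of the block table, summing the block sizes. Note that "Merge rocksdb in datanode" was proposed after this Jira, so at this point every container still had its own RocksDB instance. Counting the blocks pending deletion also required a prefix-filtered scan on DELETING_KEY_PREFIX, that is, #deleting#. In short, DN startup needed two full scans, which made startup slow when there were many keys.

The fix is easiest to see in the updateAndCommitDBCounters() method added to KeyValueContainerData. Three key-value pairs are written directly into RocksDB to hold blockCount, bytesUsed, and pendingDeleteBlockCount:

```java
public void updateAndCommitDBCounters(ReferenceCountedDB db,
    BatchOperation batchOperation, int deletedBlockCount) throws IOException {
  // Set Bytes used and block count key.
  batchOperation.put(DB_CONTAINER_BYTES_USED_KEY,
      Longs.toByteArray(getBytesUsed()));
  batchOperation.put(DB_BLOCK_COUNT_KEY,
      Longs.toByteArray(getKeyCount() - deletedBlockCount));
  batchOperation.put(DB_PENDING_DELETE_BLOCK_COUNT_KEY,
      Longs.toByteArray(getNumPendingDeletionBlocks() - deletedBlockCount));
  db.getStore().writeBatch(batchOperation);
}

// OzoneConsts.java
public static final String BLOCK_COUNT = "#BLOCKCOUNT";
public static final String CONTAINER_BYTES_USED = "#BYTESUSED";
public static final String PENDING_DELETE_BLOCK_COUNT = "#PENDINGDELETEBLOCKCOUNT";
```

Next came HDDS-4344, which improved the efficiency of delete key. The Jiras below belong to that effort.

HDDS-4297 addresses the following scenario:

    Key1: (b1, c1), (b2, c1), (b3, c1), (b4, c2), (b5, c2)
    Key2: (b6, c1), (b7, c1), (b8, c2), (b9, c2), (b10, c2)
    Key3: (b11, c1), (b12, c1), (b13, c2), (b14, c2), (b15, c2)

When the client sends a request to delete Key1 and Key3, SCM creates transactions from the block list sent by OM: "it collects all the blocks from a particular container and creates a transaction per container", that is:

    T1: (b1, c1), (b2, c1), (b3, c1), (b11, c1), (b12, c1)
    T2: (b4, c2), (b5, c2), (b13, c2), (b14, c2), (b15, c2)

When the client then sends another delete-key request (the blocks below are those of Key2), two more transactions are created:

    T3: (b6, c1), (b7, c1)
    T4: (b8, c2), (b9, c2), (b10, c2)

"The idea is to send one transaction per container. In the above example it would scan the DELETED_BLOCKS table to send T1, T2 only." SCM removed the duplicate container-ID check: as long as the total number of pending transactions for the DN has not exceeded the maximumAllowedTXNum limit, the new transaction is simply added.

HDDS-4369 addresses the following scenario. In the original code, marking blocks for deletion generated one DB entry per block, with DELETING_KEY_PREFIX plus the blockID as the key and the BlockData as the value. HDDS-4369 optimized this: the original logic was renamed markBlocksForDeletionSchemaV1, and markBlocksForDeletionSchemaV2 was added. In V2 the key is the txnId and the value is a DeletedBlocksTransaction object, which carries the complete set of blocks to delete for one container.

HDDS-4370 addresses the following scenario. Before this Jira, when the DN received a delete-block command it added the block metadata to the deleting table in the DB; once the deletion finished, it removed that metadata from the deleting table and added it to the deleted table. HDDS-4370 argues that there is no need to keep another copy of the metadata in the deleted table for blocks that have already been deleted successfully, and indeed there is not.

HDDS-4426 addresses the following scenario. Before this Jira, SCM created transactions per key, with one transaction per container. HDDS-4426 considers that unnecessary: transactions can be created for a whole batch of keys, still with one transaction per container.

HDDS-4481 addresses the following scenario. Before this Jira there was a bug in which the OM deletion service sent the same delete-block request to SCM repeatedly. The cause was OM HA. Inside OM, when the KeyDelete Service submits a KeyPurge request, the request is handed to the internal Ratis. Because its callID and clientID were always identical, Ratis treated it as a duplicate request, and OM did not process it until the request cache on the OM side expired. HDDS-4481 fixes this by setting the callID carried in the purge request to a random number.

HDDS-8139 is a major improvement, although its title is somewhat ambiguous: "Datanodes should not drop block delete transactions based on transaction ID". The change in logic is that the DN no longer applies a monotonic filter to the transactions sent by SCM. Before this Jira, the DN cached the largest delete transactionID it had received. If SCM sent a transactionID smaller than the cached value, the DN did nothing, yet still replied to SCM with a success ACK. In production, delete transactions do arrive out of order. The Jira gives these examples:

Case of mis-ordering of deletedTransactionId (SCM side):
- If container is open in SCM and parallel getting close, deleteTransactionId can be mis-ordered
- During retry of old failure deletedTransactionId, order for deleteTransactionId for the container can be different

Case of mis-ordering of deletedTransactionId (DN side):
- DN dispatcher add to queue, and then DeleteBlockCommandHandler thread all transactionIds assign to executorService (5), so based on thread schedule, there can be mis-ordering

As a result, delete commands were effectively piling up without being executed. The Jira removed this check.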
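To make the effect of the monotonic filter concrete, here is a minimal sketch of the behaviour described for HDDS-8139. It is not the actual datanode code: the class name `DeleteTxnFilterSketch` and both method names are hypothetical, and a transaction is reduced to its ID.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not taken from the Ozone code base. */
final class DeleteTxnFilterSketch {

  /**
   * Behaviour described for the DN before HDDS-8139: remember the largest
   * transaction ID seen and skip anything smaller.
   * Returns the IDs that are actually executed.
   */
  static List<Long> processWithMonotonicFilter(List<Long> arrivals) {
    long maxSeenTxId = -1L;
    List<Long> executed = new ArrayList<>();
    for (long txId : arrivals) {
      if (txId < maxSeenTxId) {
        // Skipped without doing any work. As described above, SCM would
        // still receive a success ACK for this transaction.
        continue;
      }
      maxSeenTxId = txId;
      executed.add(txId);
    }
    return executed;
  }

  /** Behaviour after the check is removed: every transaction is executed. */
  static List<Long> processWithoutFilter(List<Long> arrivals) {
    return new ArrayList<>(arrivals);
  }

  public static void main(String[] args) {
    // Transaction 3 arrives after transaction 5, e.g. because of a retry.
    List<Long> arrivals = List.of(5L, 3L, 7L);
    System.out.println(processWithMonotonicFilter(arrivals)); // prints [5, 7]
    System.out.println(processWithoutFilter(arrivals));       // prints [5, 3, 7]
  }
}
```

With the filter in place, transaction 3 is never executed even though it is acknowledged, which is the kind of lost deletion the Jira set out to eliminate.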
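The per-container grouping that HDDS-4297 and HDDS-4426 build on can also be sketched in a few lines. This is an illustration under simplifying assumptions, not SCM's implementation: blocks are written as hypothetical "block:container" strings, and the class and method names are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not taken from the Ozone code base. */
final class PerContainerTxnSketch {

  /**
   * Collects the blocks of every key in the batch and groups them by
   * container, yielding one block list (one "transaction") per container.
   * Each entry is written as "block:container" for brevity.
   */
  static Map<String, List<String>> groupByContainer(List<List<String>> keysInBatch) {
    // LinkedHashMap keeps containers in first-seen order for stable output.
    Map<String, List<String>> txnPerContainer = new LinkedHashMap<>();
    for (List<String> keyBlocks : keysInBatch) {
      for (String entry : keyBlocks) {
        String[] parts = entry.split(":"); // parts[0] = block, parts[1] = container
        txnPerContainer
            .computeIfAbsent(parts[1], c -> new ArrayList<>())
            .add(parts[0]);
      }
    }
    return txnPerContainer;
  }

  public static void main(String[] args) {
    List<String> key1 = Arrays.asList("b1:c1", "b2:c1", "b3:c1", "b4:c2", "b5:c2");
    List<String> key3 = Arrays.asList("b11:c1", "b12:c1", "b13:c2", "b14:c2", "b15:c2");
    // Deleting Key1 and Key3 together gives the T1 and T2 of the example.
    System.out.println(groupByContainer(Arrays.asList(key1, key3)));
    // prints {c1=[b1, b2, b3, b11, b12], c2=[b4, b5, b13, b14, b15]}
  }
}
```

Passing several keys in one call mirrors the batching argued for in HDDS-4426: the number of transactions depends on the number of containers touched, not on the number of keys.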
