Discussion of data recovery methods (bmybbs, OPEN)

IronBlood commented on June 26, 2024
Discussion of data recovery methods

from bmybbs.

Comments (4)

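IronBlood commented on June 26, 2024

Scripts for the first recovery round

The first round handles board posts only. Digest-area folders, personal mail, and some metadata (for example registration-success records and reminder files) are left unprocessed for now.

pglog, which records the log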

#!/usr/bin/env nodejs
// Usage: pglog <path> <status> <result>

var pg = require('pg');
// The password/host part of this string is obscured in the source and is kept verbatim.
var conString = "postgres://bmy:[email protected]/bmy-fix-log";
var client = new pg.Client(conString);

client.connect(function(err) {
        if(err) {
                return console.error('cannot connect', err);
        }

        // Parameterized query: a path containing a quote can no longer break the SQL statement.
        var values = [process.argv[2], process.argv[3], process.argv[4]];
        client.query('INSERT INTO log(path, status, result) VALUES ($1, $2, $3)', values, function(err, result) {
                if(err) {
                        console.error('insert error', err);
                }
                // Close the connection in both cases so the process can exit.
                client.end();
        });
});

pts, which converts the time string to a timestamp

#!/usr/bin/php
<?php
        echo(strtotime($argv[1]));
?>
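
For comparison, the same conversion can probably be done without PHP. The sketch below assumes GNU `date` is available (its `-d` option is not portable to BSD `date`), and the helper name `to_epoch` is made up for illustration. Like `pts`, it prints a Unix timestamp; unlike `pts`, it also returns a non-zero status when the string cannot be parsed, which a caller could use to avoid building a file name from an empty timestamp.

```bash
#!/bin/bash
# to_epoch: hypothetical stand-in for pts, assuming GNU date.
# Prints the Unix timestamp for the date string in $1.
# Returns non-zero (and prints nothing on stdout) if the string cannot be parsed.
to_epoch() {
    date -d "$1" +%s
}
```

A caller might write `TIMESTAMP=$(to_epoch "$TIME") || echo "bad time: $TIME"`. The date formats that GNU `date` accepts are not identical to those of PHP's `strtotime`, so the two tools may disagree on unusual inputs.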

bmyfix, which performs the task

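#!/bin/bash
# Usage: bmyfix <directory>
# Walks <directory> recursively, copies each board post to
# $BASE/<board>/M.<timestamp>.A and logs every decision through pglog.
# The status strings ("panding", "unknow") are kept exactly as they were logged.

BASE=/home/ironblood/boards

# A directory that contains count.person is logged as "announce" and skipped.
if [ -f "$1/count.person" ] ; then
        pglog "$1" panding announce
        exit 1
fi

for bmyfile in "$1"/*
do
        if [ -d "$bmyfile" ] ; then
                bmyfix "$bmyfile"
        else
                # Files are GBK-encoded; convert the first line before matching on it.
                FIRSTLINE=$(sed '1q;d' "$bmyfile" | iconv -f gbk -t utf8)

                if [[ $FIRSTLINE =~ "寄信人" ]] ; then
                        # handle as mail ("寄信人" is the sender field of a mail header)
                        pglog "$bmyfile" panding PersonalMail
                elif [[ $FIRSTLINE =~ "信区" ]] ; then
                        # handle as board post ("信区" is the board field of a post header)
                        BOARDNAME=$(awk 'BEGIN{FS="信区: "} {print $2}' <<< "$FIRSTLINE")
                        # The post time is the text between the first "(" and ")" on line 3.
                        TIME=$(sed '3q;d' "$bmyfile" | iconv -f gbk -t utf8 | cut -d"(" -f2 | cut -d")" -f1)
                        TIMESTAMP=$(pts "$TIME")

                        mkdir -p "$BASE/$BOARDNAME"

                        NEWFILENAME=$BASE/$BOARDNAME/M.$TIMESTAMP.A

                        cp "$bmyfile" "$NEWFILENAME"

                        pglog "$bmyfile" successful "$NEWFILENAME"
                else
                        # handle as normal file
                        pglog "$bmyfile" panding unknow
                fi
        fi
done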


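IronBlood commented on June 26, 2024

The first round of analysis is finished. Current state:

  • Total number of records processed in the database: 2745639
  • Sorted board posts recorded in the database: 1360769
  • Posts located under the target repair path: 2105840

Analysis:

  • Because concurrency in the first round was too high, some files may have been processed successfully without the result being written to the database
  • It is unclear whether resource pressure interrupted some script runs, meaning that not all 8766971 second-level files/directories were processed
  • Some board files may have been stored under an invalid path because of a problem with the board name or the time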
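
One way to keep invalid paths out of the target tree is to validate the board name and timestamp before building the file name. The sketch below is only an illustration: the helper name `target_path`, the rule that board names consist of ASCII letters, digits and underscores, and the idea of bumping the timestamp by one second when `M.<timestamp>.A` already exists are all assumptions, not something the first-round scripts do.

```bash
#!/bin/bash
# target_path: hypothetical helper. Prints an unused target path for a post,
# or returns 1 (printing nothing) if the board name or timestamp looks invalid.
# Assumptions: board names match [A-Za-z0-9_]+, timestamps are plain decimal
# numbers without a leading zero, and a name clash is resolved by adding
# one second until a free name is found.
target_path() {
    local base=$1 board=$2 ts=$3
    case $board in
        ''|*[!A-Za-z0-9_]*) return 1 ;;
    esac
    case $ts in
        ''|0*|*[!0-9]*) return 1 ;;
    esac
    while [ -e "$base/$board/M.$ts.A" ]; do
        ts=$((ts + 1))
    done
    printf '%s\n' "$base/$board/M.$ts.A"
}
```

A caller could use it as `NEWFILENAME=$(target_path "$BASE" "$BOARDNAME" "$TIMESTAMP") || pglog "$bmyfile" panding invalid`. The existence check followed by a copy is not atomic, so with many parallel workers two processes could still pick the same name; a no-clobber copy such as GNU `cp -n` would narrow that window.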

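The following supplementary scripts are planned:

  1. Scan the unknown folder completely; if a file has no processing log, process it
  2. Reprocess all files whose paths are invalid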
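
The first step could be sketched as a set difference between the files on disk and the paths already logged. This assumes the logged paths have been exported to a text file, one per line, in the same form that `find` prints; the helper name `unlogged_files` and the export file are hypothetical.

```bash
#!/bin/bash
# unlogged_files: hypothetical helper. Lists regular files under directory $1
# that do not appear in $2, a text file with one already-logged path per line.
# Assumption: no path contains a newline character.
unlogged_files() {
    local dir=$1 logged=$2 tmp
    tmp=$(mktemp) || return 1
    # comm needs both inputs sorted in the same collation order.
    LC_ALL=C sort -u "$logged" > "$tmp"
    # -23 keeps only the lines that are unique to the first input (the disk).
    find "$dir" -type f | LC_ALL=C sort | LC_ALL=C comm -23 - "$tmp"
    rm -f "$tmp"
}
```

Each path printed this way could then be handed to the processing script again. Comparing sorted lists in one pass avoids issuing one database query per file, which may matter with several million entries.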

In addition:

  • Consider a repair plan for the user directories ($BBSHOME/home)
  • Consider a repair plan for on-site mail


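IronBlood commented on June 26, 2024

Method for repairing user folder names

Rename each directory according to the userid field in its register/webregister file. The data still needs to be validated, however, to avoid cases where the length is correct but every character is actually '\0'.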

#!/bin/bash
# Renames recovered user home directories (names containing "obj") to the
# userid found on line 2 of their register or webregister file.

# Write one row to userhomelog; the values are interpolated into the SQL as-is.
log_home() {
    psql -U bmy -h 127.0.0.1 -p 5444 bmy-fix-log -c "INSERT INTO userhomelog(path, status, result) VALUES ('$1', '$2', '$3');" > /dev/null 2>&1
}

for i in {A..Z}; do
    for userhome in "$i"/*obj*
    do
        [ -e "$userhome" ] || continue   # the glob matched nothing

        # Prefer register; fall back to webregister.
        if [ -f "$userhome/register" ] ; then
            REGFILE=register
        elif [ -f "$userhome/webregister" ] ; then
            REGFILE=webregister
        else
            log_home "$userhome" failed 'No register files' &
            continue
        fi

        # The userid is the second space-separated field on line 2.
        USERNAME=$(sed '2q;d' "$userhome/$REGFILE" | cut -d" " -f2)
        if [ ${#USERNAME} -eq 0 ] ; then
            log_home "$userhome" failed "invalid $REGFILE file" &
        else
            mv "$userhome" "$i/$USERNAME"
            log_home "$userhome" successful "$i/$USERNAME"
        fi
    done
done
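
The validation mentioned before the script could look roughly like the sketch below. The helper names `read_userid` and `is_valid_userid` are made up, and the rule that a userid is 2 to 12 ASCII letters or digits starting with a letter is an assumption that should be checked against the real registration rules. NUL bytes are stripped explicitly so that a field consisting only of '\0' characters ends up empty and is rejected.

```bash
#!/bin/bash
# read_userid: hypothetical helper. Prints the second space-separated field
# of line 2 of the file $1, with all NUL bytes removed first.
read_userid() {
    sed '2q;d' "$1" | tr -d '\000' | cut -d' ' -f2
}

# is_valid_userid: hypothetical helper. Succeeds only if $1 looks like a userid.
# Assumed rule (not confirmed by the thread): 2 to 12 characters, ASCII letters
# or digits only, first character a letter.
is_valid_userid() {
    case $1 in
        ''|[!A-Za-z]*|*[!A-Za-z0-9]*) return 1 ;;
    esac
    [ "${#1}" -ge 2 ] && [ "${#1}" -le 12 ]
}
```

In the rename loop, a directory would then only be moved when `is_valid_userid "$USERNAME"` succeeds, and logged as failed otherwise.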

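For users that exist in the .PASSWDS file but whose user folder has been lost, create a personal folder as if they were newly registered users.

The data lost by doing this includes:

  • custom nju09 styles
  • signature files
  • friend list and blacklist
  • reminder files
  • favorites
  • registration information
  • the brc file that stores read markers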
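
Creating the missing folders might be sketched as follows. Extracting the userids from .PASSWDS is left out here, and the layout `<home>/<uppercase first letter>/<userid>` is an assumption inferred from the A to Z directories in the rename script; the helper name `ensure_home` is hypothetical.

```bash
#!/bin/bash
# ensure_home: hypothetical helper. Creates the home directory for userid $2
# under $1 if it is missing, and prints "created <path>" when it did so.
# Assumption: homes live in <home>/<uppercase first letter>/<userid>.
ensure_home() {
    local home=$1 userid=$2 letter
    letter=$(printf '%s' "$userid" | cut -c1 | tr 'a-z' 'A-Z')
    case $letter in
        [A-Z]) ;;
        *) return 1 ;;   # empty userid or one that does not start with a letter
    esac
    if [ -d "$home/$letter/$userid" ]; then
        return 0         # already present, nothing to do
    fi
    mkdir -p "$home/$letter/$userid" && printf 'created %s\n' "$home/$letter/$userid"
}
```

Given a file with one userid per line, the helper could be driven with `while read -r id; do ensure_home "$BBSHOME/home" "$id"; done < userids.txt`. A real new-user setup probably creates more than an empty directory, so this only covers the folder itself.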


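IronBlood commented on June 26, 2024

Results of processing the user home directories:

  • 58791 directories were renamed
  • 46691 directories have a register or webregister file that does not contain a userid field
  • another 4356 directories contain neither a register nor a webregister file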
