hivemind's Introduction

Hivemind [BETA]

Developer-friendly microservice powering social networks on the Steem blockchain.

Hive is a "consensus interpretation" layer for the Steem blockchain, maintaining the state of social features such as post feeds, follows, and communities. Written in Python, it synchronizes an SQL database with chain state, providing developers with a more flexible/extensible alternative to the raw steemd API.

Development Environment

  • Python 3.6 required
  • Postgres 10+ recommended

Dependencies:

  • OSX: $ brew install python3 postgresql
  • Ubuntu: $ sudo apt-get install python3 python3-pip

Installation:

$ createdb hive
$ export DATABASE_URL=postgresql://user:pass@localhost:5432/hive
$ git clone https://github.com/steemit/hivemind.git
$ cd hivemind
$ pip3 install -e .[test]

Start the indexer:

$ hive sync
$ hive status
{'db_head_block': 19930833, 'db_head_time': '2018-02-16 21:37:36', 'db_head_age': 10}

Start the server:

$ hive server
$ curl --data '{"jsonrpc":"2.0","id":0,"method":"hive.db_head_state","params":{}}' http://localhost:8080
{"jsonrpc": "2.0", "result": {"db_head_block": 19930795, "db_head_time": "2018-02-16 21:35:42", "db_head_age": 10}, "id": 0}

Run tests:

$ make test

Production Environment

Hivemind is deployed as a Docker container.

Here is an example command that will initialize the DB schema and start the syncing process:

docker run -d --name hivemind \
  --env DATABASE_URL=postgresql://user:pass@hostname:5432/databasename \
  --env STEEMD_URL=https://yoursteemnode \
  --env SYNC_SERVICE=1 \
  -p 8080:8080 steemit/hivemind:latest

Be sure to set DATABASE_URL to point to your postgres database and STEEMD_URL to point to your steemd node to sync from.

Once the database is synced, Hivemind will be available for serving requests.

To follow the logs:

docker logs -f hivemind

Configuration

Environment             CLI argument              Default
LOG_LEVEL               --log-level               INFO
HTTP_SERVER_PORT        --http-server-port        8080
DATABASE_URL            --database-url            postgresql://user:pass@localhost:5432/hive
STEEMD_URL              --steemd-url              https://api.steemit.com
REDIS_URL               --redis-url               redis://localhost:6379/
MAX_BATCH               --max-batch               50
MAX_WORKERS             --max-workers             4
TRAIL_BLOCKS            --trail-blocks            2
RECOMMEND_COMMUNITIES   --recommend-communities   hive-108451,hive-172186,hive-187187

Precedence: CLI over ENV over hive.conf. Check hive --help for details.

Requirements

Hardware

  • Focus on Postgres performance
  • 2.5GB of memory for hive sync process
  • 250GB storage for database

Steem config

Build flags

  • LOW_MEMORY_NODE=OFF - need post content
  • CLEAR_VOTES=OFF - need all vote data
  • SKIP_BY_TX=ON - tx lookup not used

Plugins

  • Required: reputation reputation_api database_api condenser_api block_api
  • Not required: follow*, tags*, market_history, account_history, witness

Postgres Performance

For a system with 16G of memory, here's a good start:

effective_cache_size = 12GB # 50-75% of avail memory
maintenance_work_mem = 2GB
random_page_cost = 1.0      # assuming SSD storage
shared_buffers = 4GB        # 25% of memory
work_mem = 512MB
synchronous_commit = off
checkpoint_completion_target = 0.9
checkpoint_timeout = 30min
max_wal_size = 4GB

JSON-RPC API

The minimum viable API set aims to remove the backend node's dependency on the follow and tags plugins (now rolled into condenser_api) while still powering condenser's non-wallet features. This is the core API set (an example call follows the list):

condenser_api.get_followers
condenser_api.get_following
condenser_api.get_followers_by_page
condenser_api.get_following_by_page
condenser_api.get_follow_count

condenser_api.get_content
condenser_api.get_content_replies

condenser_api.get_state

condenser_api.get_trending_tags

condenser_api.get_discussions_by_trending
condenser_api.get_discussions_by_hot
condenser_api.get_discussions_by_promoted
condenser_api.get_discussions_by_created

condenser_api.get_discussions_by_blog
condenser_api.get_discussions_by_feed
condenser_api.get_discussions_by_comments
condenser_api.get_replies_by_last_update

condenser_api.get_blog
condenser_api.get_blog_entries
condenser_api.get_discussions_by_author_before_date
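
For illustration, calling one of these methods through hive's JSON-RPC endpoint might look like the following minimal Python sketch (it assumes the requests package and the positional-parameter convention of steemd's condenser_api; verify the parameter shape against the deployed version):

import requests

# Ask the local hive server (default port 8080) for an account's
# follower/following counts via the condenser-compatible API.
payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "condenser_api.get_follow_count",
    "params": ["steemit"],  # positional params, per condenser_api convention
}
resp = requests.post("http://localhost:8080", json=payload)
print(resp.json()["result"])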

Overview

History

Initially, the steemit.com app was powered exclusively by steemd nodes. It was purely a client-side app without any backend other than a public and permissionless API node. As powerful as this model is, there are two issues: (a) maintaining UI-specific indices/APIs becomes expensive when tightly coupled to critical consensus nodes; and (b) frontend developers must be able to iterate quickly and access data in flexible and creative ways without writing C++.

To relieve backend and frontend pressure, non-consensus and frontend-oriented concerns can be decoupled from steemd itself. This (a) allows the consensus node to focus on scalability and reliability, and (b) allows the frontend to maintain its own state layer, allowing for flexibility not feasible otherwise.

Specifically, the goal is to completely remove the follow and tags plugins, as well as get_state from the backend node itself, and re-implement them in hive. In doing so, we form the foundational infrastructure on which to implement communities and more.

Purpose

Hive tracks posts, relationships, social actions, custom operations, and derived states.
  • discussions: by blog, trending, hot, created, etc
  • communities: mod roles/actions, members, feeds (in 1.5; spec)
  • accounts: normalized profile data, reputation
  • feeds: un/follows and un/reblogs
Hive does not track most blockchain operations.

For anything to do with wallets, orders, escrow, keys, recovery, or account history, query SBDS or steemd.

Hive can be extended or leveraged to create:
  • reactions, bookmarks
  • comment on reblogs
  • indexing custom profile data
  • reorganize old posts (categorize, filter, hide/show)
  • voting/polls (democratic or burn/send to vote)
  • modlists: (e.g. spammy, abuse, badtaste)
  • crowdsourced metadata
  • mentions indexing
  • full-text search
  • follow lists
  • bot tracking
  • mini-games
  • community bots

Core indexer

Ingests blocks sequentially, processing operations relevant to accounts, post creations/edits/deletes, and custom_json ops for follows, reblogs, and communities. From these we build account and post lookup tables, follow/reblog state, and communities/members data. Built exclusively from raw blocks, it becomes the ground truth for internal state. Hive does not reimplement the logic required to derive payout values, reputation, and other statistics, which are much more easily obtained from steemd itself; those belong to the cache layer.
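
As a rough sketch of that flow (the helpers here are illustrative stubs, not hive's actual internals):

import json

def upsert_post(op):
    print('post  ', op['author'], op['permlink'])   # create/edit

def delete_post(op):
    print('delete', op['author'], op['permlink'])

def process_follow(account, payload):
    print('follow-op by', account, payload)         # follows and reblogs

def process_block(block):
    """Dispatch only the operations hive tracks; ignore everything else."""
    for tx in block['transactions']:
        for op_type, op in tx['operations']:
            if op_type == 'comment':
                upsert_post(op)
            elif op_type == 'delete_comment':
                delete_post(op)
            elif op_type == 'custom_json' and op['id'] == 'follow':
                process_follow(op['required_posting_auths'][0],
                               json.loads(op['json']))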

Cache layer

Synchronizes the latest state of posts and users, allowing us to serve discussions and lists of posts with all expected information (title, preview, image, payout, votes, etc.) without needing steemd. This layer is first built once the initial core indexing is complete. Incoming blocks trigger cache updates (including recalculation of trending score) for any posts referenced in comment or vote operations. A sweep over paid-out posts ensures they are updated in full with their final state.
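
A schematic of that update cycle, with hypothetical names (hive's real cache code differs):

dirty = set()  # (author, permlink) pairs touched since the last flush

def referenced_posts(block):
    """Yield posts referenced by comment or vote operations."""
    for tx in block['transactions']:
        for op_type, op in tx['operations']:
            if op_type in ('comment', 'vote'):
                yield (op['author'], op['permlink'])

def on_block(block):
    dirty.update(referenced_posts(block))
    # after each block (or small batch), dirty posts are re-fetched from
    # steemd and their cache rows rewritten, recomputing trending scores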

API layer

Performs queries against the core and cache tables, merging them into a response in such a way that the frontend will not need to perform any additional calls to steemd itself. The initial API simply mimics steemd's condenser_api for backwards compatibility, but will be extended to leverage new opportunities and simplify application development.

Fork Resolution

Latency vs. consistency vs. complexity

The easiest way to avoid forks is to index only up to the last irreversible block, but the delay is too long where users expect quick feedback, e.g. votes and live discussions. Instead, we apply the following approach (sketched in code after the list):

  1. Follow the chain as closely to head_block as possible
  2. Indexer trails a few blocks behind, by no more than 6-9s
  3. If missed blocks detected, back off from head_block
  4. Database constraints on block linking to detect failure asap
  5. If a fork is encountered between hive_head and steem_head, trivial recovery
  6. Otherwise, pop blocks until in sync. Inconsistent state possible but rare for TRAIL_BLOCKS > 1.
  7. A separate service with a greater follow distance creates periodic snapshots
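
A stripped-down version of this trailing loop might look as follows (a sketch only: client and db stand for assumed steemd-client and database interfaces; batching, retries, and snapshots are omitted):

import time

TRAIL_BLOCKS = 2    # stay this many blocks behind head (see Configuration)
BLOCK_INTERVAL = 3  # Steem targets one block every 3 seconds

def sync_loop(client, db):
    """Trail head_block, verifying block linkage to detect forks early."""
    while True:
        target = client.head_block_num() - TRAIL_BLOCKS
        while db.head_block_num() < target:
            block = client.get_block(db.head_block_num() + 1)
            if block['previous'] != db.head_block_hash():
                db.pop_block()  # fork detected: unwind and retry the parent
                continue
            db.push_block(block)
        time.sleep(BLOCK_INTERVAL)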

Documentation

$ make docs && open docs/hive/index.html

License

MIT

hivemind's Issues

some deleted posts not updating

21 specific records on 2018-01-22 from 19:33:36 to 22:03:51 are refusing to update.

SELECT * FROM hive_posts_cache WHERE is_paidout = '0' AND payout_at < '2018-01-24 16:33:42'

blacklist spec

Spec out the classes of blacklists and how to read/write

stats tables

in the future we may need a summary table to assist certain queries.

potentially

  • avg rshares
  • payouts
  • posts
  • accounts (active)
  • comments
  • txs/ops?
  • posts_per_account
  • comments_per_account

tables

  • stat_hour
  • stat_day

Exception: found cache gap: 28009430 --> 28009433 (1)

Within a period of 29 blocks, post 28009432 was created, deleted, and re-created. Sync detected that it was about to skip indexing this post and aborted.

The solution is likely to sort the cache list by id before writing, as sketched below. Deleted posts may be breaking the assumption of sequential inserts when we process blocks in batch.
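
A sketch of that fix (write_cache_row is a hypothetical stand-in for the actual cache write):

def write_cache_batch(posts):
    # Deleted/re-created posts can arrive out of order within a batch;
    # sorting by id before writing restores the sequential-insert
    # assumption that the gap check relies on.
    for post in sorted(posts, key=lambda p: p['id']):
        write_cache_row(post)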

batching workers

For bulk requests, add the ability to retrieve batches in parallel, and determine optimal batch/worker sizes for jussi/steemd (sketched after the list below).

  • implement
  • balance parameters
  • test resync
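
A possible shape for the parallel retrieval, assuming a fetch_batch(block_nums) helper that issues one batched call through jussi (the batch/worker sizes below are placeholders to be tuned):

from concurrent.futures import ThreadPoolExecutor

MAX_BATCH = 50    # blocks per request (cf. MAX_BATCH in Configuration)
MAX_WORKERS = 4   # concurrent requests (cf. MAX_WORKERS)

def fetch_blocks(fetch_batch, start, count):
    """Fetch `count` blocks beginning at `start`, batched and in parallel."""
    batches = [list(range(n, min(n + MAX_BATCH, start + count)))
               for n in range(start, start + count, MAX_BATCH)]
    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        results = pool.map(fetch_batch, batches)  # map preserves order
    return [block for batch in results for block in batch]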

finalize db schema

todo:

  • bool fields

consider:

  • use INT ids instead of varchar(16) account names
  • use block_num instead of timestamp

hive_posts_cache:

  • few missing fields: depth, get_post_stats vals
  • bonus: distinguish simple vote updates (payout/ranking fields) from body/thread updates

fast block sync w/ jussi

Add new sync strategy: sync from jussi, if it's configured. It allows us to request blocks in large batches.

OPTIONS call support

The issue is triggered by a slight (and appropriate) tightening of the RPC request in steem-js. The following was added, which triggers Chrome to send an OPTIONS preflight call before the actual POST:

   headers: {
     Accept: 'application/json, text/plain, */*',
     'Content-Type': 'application/json',
   },

https://github.com/aio-libs/aiohttp-cors looks promising.

OPTIONS call support is part of the HTTP standard. If our services properly support HTTP, development going forward will be more predictable and stable.

It also eases development when using tools that assume full HTTP support, without needing to set up and run a local jussi instance.
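
If aiohttp-cors is adopted, the wiring might look roughly like this (a sketch based on that library's documented usage; the handler is illustrative, not hive's actual server code):

from aiohttp import web
import aiohttp_cors

async def jsonrpc_handler(request):
    return web.json_response({"jsonrpc": "2.0", "result": "ok", "id": 0})

app = web.Application()
app.router.add_post("/", jsonrpc_handler)

# Answer the OPTIONS preflight and allow cross-origin POSTs that carry
# the Accept/Content-Type headers steem-js now sends.
cors = aiohttp_cors.setup(app, defaults={
    "*": aiohttp_cors.ResourceOptions(
        allow_headers=("Content-Type", "Accept"),
    )
})
for route in list(app.router.routes()):
    cors.add(route)

# web.run_app(app, port=8080)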

fork recovery

in case of fork:

  • determine fork block
  • pop forked block
    • need to delete associated post/account records?
  • sync to head
    • filling in and/or overwriting fork data

temporary workarounds

revert this commit once steem-python is updated on pypi:
2159f6a

revert this commit when steem-python stops writing to disk (or if removed from server completely)
e2b48a9

re-evaluate this change to healthcheck (done to remove reliance on steem-python)
41d8ca6

update readme / docs

  • revamp readme
  • include community spec
  • include hive spec (see also #19 (comment))
  • update install instructions
  • determine docs framework (pdoc)
  • doc generator
  • fill in missing code docs
  • document mysql support (branch)
  • fix cli/click daemon

maybe:

  • update hive_api and associated schema (check w/ relativityboy); out of scope, cont. in #92

community implementation

  • creation op
  • custom ops, verify auths
  • API methods
    • get_community(name): admins, mods, descriptions, settings
    • list_communities(start, sort) (name/trending/subs)
    • list_tags(start, sort) (trending)
    • list_blogs(start, sort)
    • get_user_subscriptions(account, start, sort)
    • search: tag/user/community/title
    • (more)
  • internal methods
    • top authors in community (trending)
    • top curators in community
    • top blogs
    • next 24h reward pool
    • community rank
  • frontend
  • test/profile
  • steemit.com "global" acct (subs, follows, mutes?)

bonus

  • daily stats: subs, rank
    • top authors (d/w/m)

evaluate:

  • self as null-comm? allows sub to trending blogs
    • does this impact blog-follow?

reblog comments

Hive could accept a comment along with a reblog. How would we handle multi-reblogs?

db tunings: test, apply, document

Evaluate:

  • postgres config tuning (mem usage, autovac)
  • deferring of constraints (moved to #95: initial sync perf)
  • disabling indexes during initial sync (moved to #95 -- initial sync perf)
  • sync index tuning (done; read-side to be handled in #93)
  • table partitioning

Top-level dropdowns do not accept theming.

Condenser's current top-level dropdowns exist outside the theming context. They cannot respond to 'night mode' theme changes (or any others, for that matter).

Though theme-dark is set, the dropdown exists outside the regular tree (screenshot omitted).

Two possible solutions (of many): add the theme class to either the <body> tag or the root of the tree that holds the dropdown (screenshots of both approaches omitted).

performance profiling

This issue is a container for various test results.

  • increase content fetch speed or decrease per-block post updates
  • get live block processing time below 500ms

use is_valid flag for community posts

Provides more flexibility in how we interpret data, and leaves possibilities open for UI and reports.

It doesn't make sense to force-override the community field for comments, since they would not be accessible anyway. They should still be part of the tree, just flagged automatically.

For root posts, it's not yet clear which is the ideal approach.

missing post cache records

Encountered a case where some rows were missing from the posts cache table. This should not be possible because rows are always written in sequence, within a transaction. It's possible this was a side-effect of dev testing.

frontend implementation

In condenser we need to replace:

  • existing discussion APIs, including blog/feeds (get_discussions_by_X, get_blog_feed, get_user_feed)
  • follows APIs (get_followers/get_following)
  • discussion threads APIs (get_state)

block eta

Currently there is a loop which polls dgp (dynamic global properties) until the expected block is detected. This makes a lot of calls to jussi (and the returned data is imprecise because calls are cached for 3s). Each call can take 100-200ms, which is a significant waste. Since blocks arrive at regular 3s intervals, hive should just request blocks at their ETA (sketched below).

Also, once we have a known amount of idle time between blocks, it can be used for maintenance.
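
A sketch of the ETA-based approach (get_block stands for an assumed client call that returns None until the block exists):

import time

BLOCK_INTERVAL = 3  # Steem targets one block every 3 seconds

def wait_for_block(get_block, num, prev_block_time):
    """Sleep until the block is due, then poll briefly rather than spin."""
    delay = (prev_block_time + BLOCK_INTERVAL) - time.time()
    if delay > 0:
        time.sleep(delay)   # known idle window, usable for maintenance
    block = get_block(num)
    while block is None:    # not yet produced/propagated
        time.sleep(0.25)
        block = get_block(num)
    return block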

account cache

store:

  • rep (and remove from posts.votes? or preprocess?)
  • followers
  • following
  • proxy weight
  • voting weight
  • join date
  • profile fields (json?)

Evaluate replacing steem-python

Currently, the software relies extensively on 3 specific API calls: get_content, get_block, and get_dynamic_global_properties. It calls these through steem-python, with which there have been issues that may affect HA. It also doesn't support websockets. Evaluate whether it's worth running the whole stack vs. purpose-specific API routines (a sketch follows).
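
For comparison, a purpose-specific client for those three calls can be very small. A standard-library sketch (the condenser_api method names assume an appbase-era node and may need adjusting):

import json
import urllib.request
from itertools import count

_ids = count()

def steemd_call(url, method, params):
    """Issue a single JSON-RPC call to a steemd/jussi endpoint."""
    body = json.dumps({"jsonrpc": "2.0", "id": next(_ids),
                       "method": method, "params": params}).encode()
    req = urllib.request.Request(url, body,
                                 {"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode())["result"]

# The three calls hive depends on:
#   steemd_call(url, "condenser_api.get_block", [19930833])
#   steemd_call(url, "condenser_api.get_dynamic_global_properties", [])
#   steemd_call(url, "condenser_api.get_content", ["author", "permlink"])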

Add Migrations

I tried updating hive to the latest version, but the schema is out of date.

2017-09-28T10:24:00.298529363Z sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (1054, "Unknown column 'is_valid' in 'field list'")

Could you please add db migrations, or at least publish a simple text file with the SQL required to upgrade, w/ dates or commit hashes?

deploy hivemindsync to aws

We need a process which will periodically save db snapshots to be used for quickly launching new instances.

  • hivesync service
  • hivesync full sync
  • get init time < 30 mins #108
  • dev
  • stage
  • prod
  • hive listeners/API servers in dev/stage/prod #109

env vars

Where to use env vars vs. args?

  • appbase flag
  • consistent use of db_url
  • steemd vs jussi flag

steemd-compatible APIs

Responses should be as close as possible to steemd's, making it easy for devs to switch to hive APIs.

mysql tuning

table stats: (image omitted)

hive_feed_cache works well with the MEMORY engine. Had to set:

tmp_table_size=2G
max_heap_table_size=2G

pinned posts

As a user of sufficient privilege within a community, I want to be able to pin 1 or more posts to the top of that community's feed, in an order chosen by me.

encoding issue on a strange (invalid?) username in follow history

I guess the follow plugin doesn't necessarily enforce constraints on what is a valid username, and there is one containing the invisible character U+202C (POP DIRECTIONAL FORMATTING).

It breaks the hive indexer like this:

INFO:sqlalchemy.engine.base.Engine:
        INSERT IGNORE INTO hive_follows (follower, following, created_at, state)
        VALUES (%s, %s, %s, %s) ON DUPLICATE KEY UPDATE state = %s
        
INFO:sqlalchemy.engine.base.Engine:('tuakanamorgan', 'najem\u202c', '2017-06-27T20:42:03', 1, 1)
INFO:sqlalchemy.engine.base.Engine:ROLLBACK
Traceback (most recent call last):
  File "/usr/local/bin/hive", line 9, in <module>
    load_entry_point('hivemind', 'console_scripts', 'hive')()
  File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/app/hive/indexer/cli.py", line 22, in index_from_steemd
    run()
  File "/app/hive/indexer/core.py", line 421, in run
    sync_from_steemd(is_initial_sync)
  File "/app/hive/indexer/core.py", line 335, in sync_from_steemd
    dirty |= process_blocks(blocks, is_initial_sync)
  File "/app/hive/indexer/core.py", line 278, in process_blocks
    dirty |= process_block(block, is_initial_sync)
  File "/app/hive/indexer/core.py", line 264, in process_block
    process_json_follow_op(account, op_json, date)
  File "/app/hive/indexer/core.py", line 173, in process_json_follow_op
    query(sql, fr=follower, fg=following, at=block_date, state=state)
  File "/app/hive/db/methods.py", line 17, in query
    res = conn.execute(query, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/engine/base.py", line 945, in execute
    return meth(self, multiparams, params)
  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/sql/elements.py", line 263, in _execute_on_connection
    return connection._execute_clauseelement(self, multiparams, params)
  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/engine/base.py", line 1053, in _execute_clauseelement
    compiled_sql, distilled_params
  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/engine/base.py", line 1189, in _execute_context
    context)
  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/engine/base.py", line 1405, in _handle_dbapi_exception
    util.reraise(*exc_info)
  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/util/compat.py", line 187, in reraise
    raise value
  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
    context)
  File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/engine/default.py", line 470, in do_execute
    cursor.execute(statement, parameters)
  File "/usr/local/lib/python3.5/dist-packages/MySQLdb/cursors.py", line 234, in execute
    args = tuple(map(db.literal, args))
  File "/usr/local/lib/python3.5/dist-packages/MySQLdb/connections.py", line 318, in literal
    s = self.escape(o, self.encoders)
  File "/usr/local/lib/python3.5/dist-packages/MySQLdb/connections.py", line 225, in unicode_literal
    return db.string_literal(str(u).encode(db.encoding))
UnicodeEncodeError: 'latin-1' codec can't encode character '\u202c' in position 5: ordinal not in range(256)
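
One candidate mitigation (an assumption, not a verified fix) is to force a full-Unicode connection charset instead of MySQLdb's latin-1 default, e.g. via the SQLAlchemy URL:

from sqlalchemy import create_engine

# utf8mb4 covers the full Unicode range, including characters such as
# U+202C that a latin-1 connection encoding rejects.
engine = create_engine(
    "mysql+mysqldb://user:pass@localhost/hive?charset=utf8mb4")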

index state management

  • skip feed,post,account cache during initial sync
  • account id map
  • post id map (LRU cache)
  • dirty account queue
  • dirty post queue
  • awareness of sync status (initial, normal, listen)
  • move dirty/flush methods to respective classes (incl Block)
  • accounts: flush over an n-block period during listen
  • posts: flush inserts asap, edits over an n-block period during listen (half-implemented; see #83)
  • events which affect payout/votes vs. content of posts
  • events which affect rep(/+?) vs. stats/profile of accounts (ignoring this; solved w/ slow-flush)
  • in-memory payout queue (vs. using vops) (moved to #76)

evaluate:

  • use vops rather than SQL to detect payouts
    • vops could also be used to track top comm curators
    • call overhead probably prohibitive
  • how to quickly & durably mark cached post dirty
  • fetch post state on requests (not a good idea; reads > writes)
  • track steemd post_id if batch fetching is an option (store in raw_json for now)
    • makes root_comment useful

sbds jsonrpc lib: "Attempt to overwrite %r in LogRecord"

Saw several of these occurring in dev, not sure what the cause is:

Traceback (most recent call last):  
  File "/usr/local/lib/python3.5/dist-packages/bottle.py", line 862, in _handle  
    return route.call(**args)  
  File "/usr/local/lib/python3.5/dist-packages/bottle.py", line 1740, in wrapper  
    rv = callback(*a, **ka)  
  File "/app/hive/sbds/jsonrpc.py", line 64, in rpc  
    self.logger.error('Parse Error, Not JSON', extra=request.body.read())  
  File "/usr/lib/python3.5/logging/__init__.py", line 1308, in error  
    self._log(ERROR, msg, args, **kwargs)  
  File "/usr/lib/python3.5/logging/__init__.py", line 1414, in _log  
    exc_info, func, extra, sinfo)  
  File "/usr/lib/python3.5/logging/__init__.py", line 1388, in makeRecord  
    raise KeyError("Attempt to overwrite %r in LogRecord" % key)  
KeyError: 'Attempt to overwrite 109 in LogRecord' 
KeyError: 'Attempt to overwrite 51 in LogRecord'

etc

api must return resteem status

Currently in condenser, resteem state is stored client-side. This makes it minimally usable but unreliable if it is to support un-resteeming.

The naive solution would be to return all resteeming accounts for each requested post, just as votes are currently. A better solution would be to specify an account context and return user-specific state from hive.

thread fetching API

need an API that works similarly to get_state for fetching full discussions along with all relevant commenters' metadata

cursor-style pagination?

Currently with steemd, all "posts" queries take a start (author, permlink) pair as a cursor from which to load successive pages; hive just uses offset/limit. If we want to replicate existing APIs (nearly) 100%, we would need to add the cursor option. This involves a bit more complexity: we'll need to look up the value of the ordering column for the provided author/permlink to know at which value to start. Aside from the extra queries, the upsides are that seeking can be more efficient than a simple offset, it's more infinite-scroll friendly, and it may be more consistent when result rows are 'moving' around. A sketch follows the reference struct below.

for reference:

struct discussion_query {
   void validate()const{
      FC_ASSERT( filter_tags.find(tag) == filter_tags.end() );
      FC_ASSERT( limit <= 100 );
   }

   string           tag;
   uint32_t         limit = 0;
   set<string>      filter_tags;
   set<string>      select_authors; ///< list of authors to include, posts not by this author are filtered
   set<string>      select_tags; ///< list of tags to include, posts without these tags are filtered
   uint32_t         truncate_body = 0; ///< the number of bytes of the post body to return, 0 for all
   optional<string> start_author;
   optional<string> start_permlink;
   optional<string> parent_author;
   optional<string> parent_permlink;
};
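
A keyset-pagination sketch of the lookup described above (column and helper names here are hypothetical, and tie-breaking on equal sort values is omitted for brevity):

def trending_page(db, start_author=None, start_permlink=None, limit=20):
    """Seek from the cursor row's sort value instead of using OFFSET."""
    seek, params = "", {"limit": limit}
    if start_author:
        # extra lookup: find the ordering value of the cursor post
        seek = ("AND sc_trend <= (SELECT sc_trend FROM hive_posts_cache "
                "WHERE author = :author AND permlink = :permlink) ")
        params.update(author=start_author, permlink=start_permlink)
    sql = ("SELECT post_id FROM hive_posts_cache "
           "WHERE is_paidout = '0' " + seek +
           "ORDER BY sc_trend DESC LIMIT :limit")
    return db.query_all(sql, **params)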

dockerfile is divergent

Why is the Python base image being used in the Dockerfile instead of the same Ubuntu-based baseimage we're using in sbds? Reusing infrastructure wherever possible is preferred.

resync on wake

If the process falls behind on blocks by a certain threshold, it needs to go back to the sync routine.

structured logging

Currently, hive indexer logging is basic console output, though tuned to produce useful output at the INFO level at a reasonable volume. Hive server logging could use some work. Generally there's a mix of prints and logging.getLogger, with hacks for other packages' noisy loggers (sa & jsonrpcserver). Hive's logging would benefit from refactoring and polishing.

  • standard logging solution -- untangle logger config, or use other package?
  • tune hive.server logging -- each request is logged; need to track errors but control spam
  • json-based logging? (jg)

yo uses http://www.structlog.org/en/stable/

schema changes

  • cached_posts.user_agent
  • cached_posts.lang
  • cached_posts.canonical_url
  • url field instead of split (keep uniq constraint)
    • author -> author_id
    • author/permlink -> url
    • compressed to max length of 256 for mysql? (approx 191 in base-122 possible)
  • accounts.created_by
  • use block_num over created_at where possible
    • and consider posts.deleted_at_block, etc

Leftover items from #242 to evaluate:

Tier 1

  • hpc.author_id [smaller idxs, faster lookups]

Tier 2

  • legacy mutes migration (iredeemables, rep<0)
  • catchall cid for blogs or legacy
  • comm stat: # of unique authors (pending)
  • hpc.parent_id / parent_author / parent_permlink
  • hpc.parent_author_id (only needed for replies)
  • hpc.root_author_id (blog stats)
  • hpc.root_post_id [disc retrieval, stats]
  • hpc.tags (list)

jussi-awareness

Hivemind needs to know if jussi is configured so it can upgrade to batch requests for certain calls.

evaluate rep score replacement

One way to approximate the current rep score is to sum all of a user's posts' net_rshares. Where this differs from the follow plugin is that users with lower rep can bring down those with higher rep.

In the short term it may be ideal to keep it in steemd (see steemit/steem#1425). Long term, we may want to implement some other algorithm entirely.

docker out-of-data-space bug

This happens sometimes. Possibly due to db error log?

Oct 11 08:14:11 docker/cd60a907a168[7347]: [SYNC] Got block 16231286 (68.7/s, 1096rps 73wps) -- 0.0m remaining
Oct 11 08:14:11 docker/cd60a907a168[7347]: [INIT] *** Initial sync complete. Rebuilding cache. ***
Oct 11 08:14:11 docker/cd60a907a168[7347]: [INIT] Found 15034490 missing post cache entries
Oct 11 08:14:56 kernel: [1692646.139464] device-mapper: thin: 253:2: switching pool to out-of-data-space (error IO) mode
Oct 11 08:14:56 kernel: [1692646.144088] EXT4-fs warning (device dm-4): ext4_end_bio:314: I/O error -28 writing to inode 528049 (offset 1048576000 size 8388608 starting block 2519552)
Oct 11 08:14:56 kernel: [1692646.150472] Buffer I/O error on device dm-4, logical block 2519552
Oct 11 08:14:56 kernel: [1692646.153929] Buffer I/O error on device dm-4, logical block 2519553
Oct 11 08:14:56 kernel: [1692646.157063] Buffer I/O error on device dm-4, logical block 2519554
Oct 11 08:14:56 kernel: [1692646.160150] Buffer I/O error on device dm-4, logical block 2519555
Oct 11 08:14:56 kernel: [1692646.163072] Buffer I/O error on device dm-4, logical block 2519556
Oct 11 08:14:56 kernel: [1692646.165959] Buffer I/O error on device dm-4, logical block 2519557
Oct 11 08:14:56 kernel: [1692646.168900] Buffer I/O error on device dm-4, logical block 2519558
Oct 11 08:14:56 kernel: [1692646.171721] Buffer I/O error on device dm-4, logical block 2519559
Oct 11 08:14:56 kernel: [1692646.174681] Buffer I/O error on device dm-4, logical block 2519560
Oct 11 08:14:56 kernel: [1692646.177909] Buffer I/O error on device dm-4, logical block 2519561
Oct 11 08:14:56 kernel: [1692646.181019] EXT4-fs warning (device dm-4): ext4_end_bio:314: I/O error -28 writing to inode 528049 (offset 1048576000 size 8388608 starting block 2519808)
Oct 11 08:14:56 kernel: [1692646.187836] EXT4-fs warning (device dm-4): ext4_end_bio:314: I/O error -28 writing to inode 528049 (offset 1048576000 size 8388608 starting block 2520064)
Oct 11 08:14:56 kernel: [1692646.194356] EXT4-fs warning (device dm-4): ext4_end_bio:314: I/O error -28 writing to inode 528049 (offset 1048576000 size 8388608 starting block 2520320)
Oct 11 08:14:56 kernel: [1692646.200836] EXT4-fs warning (device dm-4): ext4_end_bio:314: I/O error -28 writing to inode 528049 (offset 1048576000 size 8388608 starting block 2520576)
Oct 11 08:14:56 kernel: [1692646.207192] EXT4-fs warning (device dm-4): ext4_end_bio:314: I/O error -28 writing to inode 528049 (offset 1048576000 size 8388608 starting block 2520832)
Oct 11 08:14:56 kernel: [1692646.214593] EXT4-fs warning (device dm-4): ext4_end_bio:314: I/O error -28 writing to inode 528049 (offset 1056964608 size 8388608 starting block 2521088)
Oct 11 08:14:56 kernel: [1692646.221018] EXT4-fs warning (device dm-4): ext4_end_bio:314: I/O error -28 writing to inode 528049 (offset 1056964608 size 8388608 starting block 2521344)
Oct 11 08:14:56 kernel: [1692646.227704] EXT4-fs warning (device dm-4): ext4_end_bio:314: I/O error -28 writing to inode 528049 (offset 1056964608 size 8388608 starting block 2521600)
Oct 11 08:14:56 kernel: [1692646.234261] EXT4-fs warning (device dm-4): ext4_end_bio:314: I/O error -28 writing to inode 528049 (offset 1056964608 size 8388608 starting block 2521856)
Oct 11 08:14:56 kernel: [1692646.271736] JBD2: Detected IO errors while flushing file data on dm-4-8
Oct 11 08:15:05 kernel: [1692655.338808] EXT4-fs warning: 287 callbacks suppressed
Oct 11 08:15:05 kernel: [1692655.342246] EXT4-fs warning (device dm-4): ext4_end_bio:314: I/O error -28 writing to inode 528049 (offset 0 size 0 starting block 2519552)
Oct 11 08:15:05 kernel: [1692655.351639] buffer_io_error: 75798 callbacks suppressed
Oct 11 08:15:05 kernel: [1692655.354860] Buffer I/O error on device dm-4, logical block 2519552
Oct 11 08:15:05 kernel: [1692655.358579] Buffer I/O error on device dm-4, logical block 2519553
Oct 11 08:15:05 kernel: [1692655.362446] Buffer I/O error on device dm-4, logical block 2519554
Oct 11 08:15:05 kernel: [1692655.366161] Buffer I/O error on device dm-4, logical block 2519555
Oct 11 08:15:05 kernel: [1692655.370238] Buffer I/O error on device dm-4, logical block 2519556
Oct 11 08:15:05 kernel: [1692655.373862] Buffer I/O error on device dm-4, logical block 2519557
Oct 11 08:15:05 kernel: [1692655.377500] Buffer I/O error on device dm-4, logical block 2519558
Oct 11 08:15:05 kernel: [1692655.381224] Buffer I/O error on device dm-4, logical block 2519559
Oct 11 08:15:05 kernel: [1692655.384238] Buffer I/O error on device dm-4, logical block 2519560
Oct 11 08:15:05 kernel: [1692655.387449] Buffer I/O error on device dm-4, logical block 2519561
Oct 11 08:15:05 kernel: [1692655.390647] EXT4-fs warning (device dm-4): ext4_end_bio:314: I/O error -28 writing to inode 528049 (offset 1361182720 size 6160384 starting block 2519808)
Oct 11 08:15:05 kernel: [1692655.402506] EXT4-fs warning (device dm-4): ext4_end_bio:314: I/O error -28 writing to inode 528049 (offset 1361182720 size 6160384 starting block 2520064)
Oct 11 08:15:05 kernel: [1692655.411143] EXT4-fs warning (device dm-4): ext4_end_bio:314: I/O error -28 writing to inode 528049 (offset 1361182720 size 6160384 starting block 2520320)
Oct 11 08:15:05 kernel: [1692655.420756] EXT4-fs warning (device dm-4): ext4_end_bio:314: I/O error -28 writing to inode 528049 (offset 1361182720 size 6160384 starting block 2520576)
Oct 11 08:15:05 kernel: [1692655.429972] EXT4-fs warning (device dm-4): ext4_end_bio:314: I/O error -28 writing to inode 528049 (offset 1361182720 size 6160384 starting block 2520832)
Oct 11 08:15:05 kernel: [1692655.439005] EXT4-fs warning (device dm-4): ext4_end_bio:314: I/O error -28 writing to inode 528049 (offset 1361182720 size 6160384 starting block 2521088)
Oct 11 08:15:05 kernel: [1692655.448575] EXT4-fs warning (device dm-4): ext4_end_bio:314: I/O error -28 writing to inode 528049 (offset 1361182720 size 6160384 starting block 2521344)
Oct 11 08:15:05 kernel: [1692655.457532] EXT4-fs warning (device dm-4): ext4_end_bio:314: I/O error -28 writing to inode 528049 (offset 1361182720 size 6160384 starting block 2521600)
Oct 11 08:15:05 kernel: [1692655.466855] EXT4-fs warning (device dm-4): ext4_end_bio:314: I/O error -28 writing to inode 528049 (offset 1361182720 size 6160384 starting block 2521856)
Oct 11 08:15:06 kernel: [1692655.516649] JBD2: Detected IO errors while flushing file data on dm-4-8
Oct 11 08:15:09 docker/cd60a907a168[7347]: #033[93m[SQL][58385ms] SELECT id, author, permlink FROM hive_posts WHERE is_deleted = 0 AND id > (SELECT IFNULL(MAX(post_id), 0) FROM hive_posts_cache) ORDER BY id LIMIT 1000000#033[0m
Oct 11 08:15:15 docker/cd60a907a168[7347]: Traceback (most recent call last):
Oct 11 08:15:15 docker/cd60a907a168[7347]:   File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/engine/base.py", line 1182, in _execute_context
Oct 11 08:15:15 docker/cd60a907a168[7347]:     context)
Oct 11 08:15:15 docker/cd60a907a168[7347]:   File "/usr/local/lib/python3.5/dist-packages/sqlalchemy/engine/default.py", line 470, in do_execute
Oct 11 08:15:15 docker/cd60a907a168[7347]:     cursor.execute(statement, parameters)
Oct 11 08:15:15 docker/cd60a907a168[7347]:   File "/usr/local/lib/python3.5/dist-packages/MySQLdb/cursors.py", line 250, in execute
Oct 11 08:15:15 docker/cd60a907a168[7347]:     self.errorhandler(self, exc, value)
Oct 11 08:15:15 docker/cd60a907a168[7347]:   File "/usr/local/lib/python3.5/dist-packages/MySQLdb/connections.py", line 50, in defaulterrorhandler
Oct 11 08:15:15 docker/cd60a907a168[7347]:     raise errorvalue
Oct 11 08:15:15 docker/cd60a907a168[7347]:   File "/usr/local/lib/python3.5/dist-packages/MySQLdb/cursors.py", line 247, in execute
Oct 11 08:15:15 docker/cd60a907a168[7347]:     res = self._query(query)
Oct 11 08:15:15 docker/cd60a907a168[7347]:   File "/usr/local/lib/python3.5/dist-packages/MySQLdb/cursors.py", line 411, in _query
Oct 11 08:15:15 docker/cd60a907a168[7347]:     rowcount = self._do_query(q)
Oct 11 08:15:15 docker/cd60a907a168[7347]:   File "/usr/local/lib/python3.5/dist-packages/MySQLdb/cursors.py", line 374, in _do_query
Oct 11 08:15:15 docker/cd60a907a168[7347]:     db.query(q)
Oct 11 08:15:15 docker/cd60a907a168[7347]:   File "/usr/local/lib/python3.5/dist-packages/MySQLdb/connections.py", line 277, in query
Oct 11 08:15:15 docker/cd60a907a168[7347]:     _mysql.connection.query(self, query)
Oct 11 08:15:15 docker/cd60a907a168[7347]: _mysql_exceptions.ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 1")
Oct 11 08:15:15 docker/cd60a907a168[7347]:
Oct 11 08:15:15 docker/cd60a907a168[7347]: The above exception was the direct cause of the following exception:
Oct 11 08:15:15 docker/cd60a907a168[7347]:
