
gangly / datafaker


Datafaker is a tool for generating large-scale test data and streaming test data. It fakes data and inserts it into a variety of data sources.

Languages: Python 98.97%, Batchfile 1.03%
Topics: mysql, bigdata, postgresql, kafka, datafaker, testing, hive, hbase, python, fakedata

datafaker's Introduction

Datafaker - Tool for faking data


1. Introduction

Datafaker is a tool for generating large-scale test data and streaming test data. It is compatible with Python 2.7 and Python 3.4+. You are welcome to download and use it. The GitHub repository is:

https://github.com/gangly/datafaker

Documentation updates are kept in sync on GitHub.

2. Background

Test data is often needed during software development and testing. Typical scenarios include:

  • Backend development. After creating a new table, you need test data in the database and interface data for the front end to use.
  • Database performance testing. A large volume of test data is needed to measure database performance.
  • Stream data testing. For Kafka streaming, test data must be generated continuously and written to Kafka.

Our research found no existing open-source tool that generates test data matching the structure of a MySQL table. The common approach is to insert a few rows into the database by hand. This approach has several drawbacks:

  • It wastes working hours. Different data has to be constructed for columns of different data types.
  • The data volume is small. Large amounts of data cannot be constructed by hand.
  • It is not accurate enough. Consider these examples:
    - An email address must follow a certain format.
    - A phone number has a fixed number of digits.
    - An IP address has a fixed format.
    - An age cannot be negative and must fall within a range.
    Hand-made data may violate these ranges or formats and cause errors in the backend program.
  • Multi-table association is hard. Manually created data is small in volume. As a result, primary keys in different tables may not match each other, or joins may return no data.
  • Dynamic random writes are cumbersome. For example, streaming data may need to be written to Kafka at random every few seconds, or inserted into MySQL dynamically at random. Doing this by hand is tedious, and it is hard to count how many records were written.

Datafaker was created to address these pain points. It is a test data construction tool that supports multiple data sources and can simulate most common data types, so it solves the problems above easily. Its features are:

  • Multiple data types. These include common database column types (integer, float, character) and custom types (IP address, email address, ID number, etc.).
  • Simulated multi-table association data. You can declare some fields as enumerated types, whose values are randomly selected from a specified list. When the data volume is large, this ensures that tables can be joined with each other and return matching data.
  • Batch and streaming data generation, with a configurable interval for streaming data
  • Multiple output methods: printing to the screen, writing to files, and writing to remote data sources
  • Multiple data sources. Relational databases, Hive, and Kafka are currently supported. Mongo, ES, and other data sources are planned.
  • Configurable output format. Text and JSON are currently supported.
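The multi-table association idea can be shown with a short Python sketch. The sketch does not call datafaker. The table layouts, the `CLASS_IDS` list, and the helpers `fake_classes`, `fake_students` and `inner_join` are all hypothetical. The point is that when the key column in both tables draws its values from the same enumerated list, every generated foreign key has a matching row.

```python
import random

# Hypothetical shared enumerated values: both tables draw their key column
# from this list, as an [:enum(...)] rule on each table's key column would.
CLASS_IDS = [101, 102, 103]

def fake_classes():
    """Parent table: one row per enumerated key."""
    return [{"class_id": c, "name": "class-%d" % c} for c in CLASS_IDS]

def fake_students(n, rng):
    """Child table: each row picks its foreign key from the same list."""
    return [{"id": i, "class_id": rng.choice(CLASS_IDS)} for i in range(1, n + 1)]

def inner_join(students, classes):
    """Return (student id, class name) for every student with a matching class."""
    by_id = {c["class_id"]: c for c in classes}
    return [(s["id"], by_id[s["class_id"]]["name"])
            for s in students if s["class_id"] in by_id]

rng = random.Random(42)
rows = inner_join(fake_students(1000, rng), fake_classes())
print(len(rows))  # 1000: every generated student joins to a class
```

If the child table's key were drawn from an unrelated random range instead, most rows would find no partner in the join.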

3. Architecture

Datafaker is written in Python and supports Python 2.7 and Python 3.4+. The current version has been released on PyPI.

[Figure: datafaker architecture diagram]

The architecture diagram shows the tool's complete execution flow. As the figure shows, data passes through five modules:

  • Parameter parser. Parses the command that the user enters on the terminal command line.
  • Metadata parser. Users can supply metadata from a local file or from a remote data source table. After the parser reads the text, it converts it into table column metadata and data construction rules.
  • Data construction engine. Using the construction rules produced by the metadata parser, the engine generates simulated data of different types.
  • Data routing. Depending on the output type, the data is generated as batch data or as stream data, and stream data can be generated at a specified frequency. The data is then converted into the user-specified format and sent to the target data source.
  • Data source adapter. Adapts to each kind of data source and imports the data into it.
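The metadata parser's job can be sketched in a few lines of Python. The line layout used here is `column||type||comment[:rule(arg1,arg2,...)]`. It is inferred from the meta files quoted in the issues on this page, not taken from datafaker's source. `parse_meta_line` is a hypothetical name, and the real parser may accept more forms, such as quoted or escaped commas.

```python
import re

# Assumed layout of one meta line, inferred from the meta files quoted in the
# issues on this page:  column||type||comment[:rule(arg1,arg2,...)]
RULE_RE = re.compile(r"^(?P<comment>.*?)\[:(?P<rule>\w+)\((?P<args>.*)\)\]\s*$")

def parse_meta_line(line):
    """Split one meta line into column name, SQL type, comment, rule and args."""
    name, col_type, rest = line.split("||", 2)
    match = RULE_RE.match(rest)
    if match is None:
        # No construction rule: the whole third field is a column comment.
        comment, rule, args = rest, None, []
    else:
        comment, rule = match.group("comment"), match.group("rule")
        raw_args = match.group("args")
        # Naive split on commas; enough for inc(id,1) or enum(1,2,3).
        args = [a.strip() for a in raw_args.split(",")] if raw_args else []
    return {"name": name.strip(), "type": col_type.strip(),
            "comment": comment.strip(), "rule": rule, "args": args}

print(parse_meta_line("id||int||自增id[:inc(id,1)]"))
```

A construction engine would then dispatch on `rule` (for example `inc`, `enum`, `decimal`) and pass it `args`.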

4. Installation

Method 1: install from source

Download the source code, unzip it, and install:

python setup.py install

Method 2: install with pip

pip install datafaker

Upgrade

pip install datafaker --upgrade

Uninstall

pip uninstall datafaker

Install the driver package for each data source you use:

| data source | package | note |
|---|---|---|
| mysql/tidb | mysql-python/mysqlclient | on Windows with Python 3, use mysqlclient |
| oracle | cx-Oracle | requires some Oracle libraries |
| postgresql/redshift | psycopg2 | |
| sqlserver | pyodbc | mssql+pyodbc://mssql-v |
| hbase | happybase, thrift | |
| es | elasticsearch | |
| hive | pyhive | |
| kafka | kafka-python | |
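Before running datafaker against a data source, it can help to check that the matching driver imports. The sketch below is Python 3 only because it uses `importlib.util.find_spec`. It maps each data source in the table to the import names those packages install. For example, mysqlclient and mysql-python install `MySQLdb`, cx-Oracle installs `cx_Oracle`, and kafka-python installs `kafka`. `check_drivers` is a hypothetical helper and is not part of datafaker.

```python
import importlib.util

# Import names installed by the driver packages in the table above
# (the pip package name and the module name differ for several of them).
DRIVER_MODULES = {
    "mysql/tidb": ["MySQLdb"],
    "oracle": ["cx_Oracle"],
    "postgresql/redshift": ["psycopg2"],
    "sqlserver": ["pyodbc"],
    "hbase": ["happybase", "thrift"],
    "es": ["elasticsearch"],
    "hive": ["pyhive"],
    "kafka": ["kafka"],
}

def check_drivers(mapping=DRIVER_MODULES):
    """Return {data source: bool}, True when every module it needs is importable."""
    return {
        source: all(importlib.util.find_spec(mod) is not None for mod in modules)
        for source, modules in mapping.items()
    }

if __name__ == "__main__":
    for source, ok in check_drivers().items():
        print("%-20s %s" % (source, "installed" if ok else "missing"))
```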

5. Examples

Usage examples

6. Command parameters

Command-line parameter details

7. Construction rules

Construction rules

8. Notes

Notes

9. Release notes

Release notes


Give the project a star or buy the author a coffee.

datafaker's People

Contributors

gangly, moody1117


datafaker's Issues

fake_date_between in fakedata.py is buggy: end_date is processed without adding the '+' sign, so the later call to fake_data fails

def fake_date_between(self, start_date=None, end_date=None, format='%Y-%m-%d'):
    # Strip the hours/minutes/seconds; otherwise the later day-difference calculation is wrong
    today = datetime.datetime.strftime(datetime.datetime.today(), "%Y-%m-%d")
    today = datetime.datetime.strptime(today, '%Y-%m-%d')

    if start_date is None:
        start_diff = 'today'
    else:
        start_date = datetime.datetime.strptime(start_date, '%Y-%m-%d')
        diff = (start_date - today).days
        start_diff = '%dd' % diff if diff != 0 else 'today'

    if end_date is None:
        end_diff = today
    else:
        end_date = datetime.datetime.strptime(end_date, '%Y-%m-%d')
        diff = (end_date - today).days
        end_diff = '%sd' % diff if diff != 0 else 'today'
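Here is a minimal sketch of the fix the issue asks for. It assumes the offset strings are later passed to Faker's `date_between`, which accepts relative strings such as `'+30d'` and `'today'`. `day_offset` is a hypothetical helper name. The `%+d` conversion always writes an explicit sign. Comparing `datetime.date` values also makes the strftime/strptime round trip used above to drop the time part unnecessary.

```python
import datetime

def day_offset(date_str, today=None):
    """Return a signed relative-day string such as '+5d', '-3d' or 'today'."""
    if today is None:
        today = datetime.date.today()
    target = datetime.datetime.strptime(date_str, "%Y-%m-%d").date()
    diff = (target - today).days
    # '%+d' always emits the sign, e.g. '+5' or '-3'; the trailing 'd' means days.
    return "%+dd" % diff if diff != 0 else "today"

ref = datetime.date(2020, 4, 5)
print(day_offset("2020-04-10", ref))  # +5d
```

The same helper would serve both start_date and end_date, so the two branches can no longer drift apart.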

Inserting datetime values into MySQL reports a syntax error

sqlalchemy.exc.ProgrammingError: (MySQLdb._exceptions.ProgrammingError) (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '11:28:15,0),(2,'dingjie',56.58,'玉华市',40383614455,'姜雷',2019-12-11 11:28' at line 1")

The problem appears to be that the dates are not quoted.
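The usual cure for unquoted values is a parameterized query. Building the VALUES list by string concatenation lets a bare datetime or string break the statement. This sketch uses the standard-library `sqlite3` module, whose placeholder is `?`, so it runs without a MySQL server. MySQLdb/mysqlclient and PyMySQL use `%s` placeholders instead. The table and rows are illustrative only. This is a sketch of the technique, not datafaker's own code path.

```python
import sqlite3

# Illustrative rows; the first mirrors the values in the error message above.
rows = [
    (1, "dingjie", 56.58, "2019-12-11 11:28:15"),
    (2, "sample", 12.5, "2019-12-11 11:28:16"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stu (id INTEGER, name TEXT, score REAL, created_at TEXT)")
# Placeholders let the driver quote every value; the datetime string is never
# spliced into the SQL text, so it cannot produce a syntax error.
conn.executemany(
    "INSERT INTO stu (id, name, score, created_at) VALUES (?, ?, ?, ?)", rows)
conn.commit()
print(conn.execute("SELECT created_at FROM stu WHERE id = 1").fetchone()[0])
```

The same approach also covers the UUID strings that fail with "Unknown column" further down this page.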

Running under Python 3.7 fails: No module named 'MySQLdb'

Python version: 3.7.6
Meta file contents:
person_no||int||auto increament person_no[:inc(person_no,20201000)]
school_code||varchar(20)||school_code[:enum(12440104455354382L)]
name||varchar(20)||name
sex||varchar(5)||sex[:enum(男,女)]
person_role||int||person_role[:enum(1,2)]
grade_no||int||grade_no size[:enum(1,2,3,4,5,6)]
class_no||int||class_no[:enum(1,2,3,4,5,6,7,8,9)]
face_id||varchar(20)||face_id [:inc(700044010402000027202005080000000002,1)]
Command: datafaker mysql mysql+mysqldb://root:root@localhost:3306/medical_door tb_person 10 --outprint --meta person_meta.txt --outspliter _

What is causing this error? The generated SQL has a syntax error

UUID||VARCHAR(32)||自增id[:inc(id,1)]
CREATE_USER||VARCHAR(32)||[:enum(file://account_uuid.txt)]
CREATE_TIME||VARCHAR(19)||[:enum(2020-04-05)]
CREATE_ORG||VARCHAR(32)||[:enum(组织1, 组织2)]
CREATE_DEP||VARCHAR(32)||[:enum(部门1, 部门2)]
CHANGE_USER||VARCHAR(32)||[:enum(file://account_uuid.txt)]
ENABLED||CHAR(1)||[:enum(Y)]
REMOVED||CHAR(1)||[:enum(N)]
PRIORITY||DECIMAL(10,0)||顺序号[:decimal(4,2,1)]
REMARK||VARCHAR(32)||
BUSINESS_STATUS||VARCHAR(8)||业务状态[:enum(状态1, 状态2, 状态3, 状态4)]
ID_BUSINESS||VARCHAR(32)||所属业务活动[:enum(春查, 秋查, 安评)]
ID_ITEM||VARCHAR(32)||所属指标(针对安评)
ID_WORK_TYPE||VARCHAR(32)||问题类型(即业务类型)
FIND_DATE||VARCHAR(19)||发现日期
ID_FINDER||VARCHAR(32)||发现人的ID[:enum(file://account_uuid.txt)]
ID_DEPT_FINDER||VARCHAR(32)||发现人所属部门
PHE_CONTENT||VARCHAR(500)||现象描述
ID_BUSINESS_OBJ||VARCHAR(32)||具体业务对象id
ID_BUSINESS_OBJTYPE||VARCHAR(32)||业务对象类型
ID_DEPT_RES||VARCHAR(32)||处理负责部门
ID_PERSON_RES||VARCHAR(32)||处理负责人(审核人)
ID_DEAL||VARCHAR(32)||处理单id
HAZARD_ANALYSIS||VARCHAR(500)||危害分析
PROBLEM_CODE||VARCHAR(32)||编码
ID_WORK_ORDER||VARCHAR(32)||工单ID
ASK_DATE||VARCHAR(19)||要求处理完毕日期[:enum(2020-04-05, 2020-08-08)]

No file association was found for the .py extension
Exception in thread Thread-2:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 1248, in _execute_context
cursor, statement, parameters, context
File "C:\ProgramData\Anaconda3\lib\site-packages\sqlalchemy\engine\default.py", line 590, in do_execute
cursor.execute(statement, parameters)
File "C:\ProgramData\Anaconda3\lib\site-packages\MySQLdb\cursors.py", line 206, in execute
res = self._query(query)
File "C:\ProgramData\Anaconda3\lib\site-packages\MySQLdb\cursors.py", line 319, in _query
db.query(q)
File "C:\ProgramData\Anaconda3\lib\site-packages\MySQLdb\connections.py", line 259, in query
_mysql.connection.query(self, query)
MySQLdb._exceptions.OperationalError: (1054, "Unknown column 'cf12c42e3c844c84b9d4900628893509' in 'field list'")

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\threading.py", line 926, in _bootstrap_inner
self.run()
File "C:\ProgramData\Anaconda3\lib\threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\datafaker-0.7.4-py3.7.egg\datafaker\dbs\basedb.py", line 122, in save
self.save_data(lines)
File "C:\ProgramData\Anaconda3\lib\site-packages\datafaker-0.7.4-py3.7.egg\datafaker\dbs\rdbdb.py", line 26, in save_data
self.save_other_rdb(lines, names_format, column_names)
File "C:\ProgramData\Anaconda3\lib\site-packages\datafaker-0.7.4-py3.7.egg\datafaker\dbs\rdbdb.py", line 42, in save_other_rdb
self.session.execute(sql)
File "C:\ProgramData\Anaconda3\lib\site-packages\sqlalchemy\orm\session.py", line 1278, in execute
clause, params or {}
File "C:\ProgramData\Anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 984, in execute
return meth(self, multiparams, params)
File "C:\ProgramData\Anaconda3\lib\site-packages\sqlalchemy\sql\elements.py", line 293, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "C:\ProgramData\Anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 1103, in _execute_clauseelement
distilled_params,
File "C:\ProgramData\Anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 1288, in _execute_context
e, statement, parameters, cursor, context
File "C:\ProgramData\Anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 1482, in _handle_dbapi_exception
sqlalchemy_exception, with_traceback=exc_info[2], from_=e
File "C:\ProgramData\Anaconda3\lib\site-packages\sqlalchemy\util\compat.py", line 178, in raise_
raise exception
File "C:\ProgramData\Anaconda3\lib\site-packages\sqlalchemy\engine\base.py", line 1248, in _execute_context
cursor, statement, parameters, context
File "C:\ProgramData\Anaconda3\lib\site-packages\sqlalchemy\engine\default.py", line 590, in do_execute
cursor.execute(statement, parameters)
File "C:\ProgramData\Anaconda3\lib\site-packages\MySQLdb\cursors.py", line 206, in execute
res = self._query(query)
File "C:\ProgramData\Anaconda3\lib\site-packages\MySQLdb\cursors.py", line 319, in _query
db.query(q)
File "C:\ProgramData\Anaconda3\lib\site-packages\MySQLdb\connections.py", line 259, in query
_mysql.connection.query(self, query)
sqlalchemy.exc.OperationalError: (MySQLdb._exceptions.OperationalError) (1054, "Unknown column 'cf12c42e3c844c84b9d4900628893509' in 'field list'")
[SQL: insert into wo_comm_problem (UUID,CREATE_USER,CREATE_TIME,CREATE_ORG,CREATE_DEP,CHANGE_USER,ENABLED,REMOVED,PRIORITY,REMARK,BUSINESS_STATUS,ID_BUSINESS,ID_ITEM,ID_WORK_TYPE,FIND_DATE,ID_FINDER,ID_DEPT_FINDER,PHE_CONTENT,ID_BUSINESS_OBJ,ID_BUSINESS_OBJTYPE,ID_DEPT_RES,ID_PERSON_RES,ID_DEAL,HAZARD_ANALYSIS,PROBLEM_CODE,ID_WORK_ORDER,ASK_DATE) values (1,cf12c42e3c844c84b9d4900628893509,2020-04-05,组织2,部门1,3ddf1219a6c44802bcb073ebd941d8ee,Y,N,49.28,None,状态4,安评,None,None,None,cf12c42e3c844c84b9d4900628893509,None,None,None,None,None,None,None,None,None,None,2020-08-08),(2,0ccfebc5962d46dd916257c0a33201ce,2020-04-05,组织1,部门2,39d9aa650f864e7d9c7e49abc1f521be,Y,N,41.81,None,状态1,安评,None,None,None,0ccfebc5962d46dd916257c0a33201ce,None,None,None,None,None,None,None,None,None,None,2020-08-08),(3,cf4f0c886c4e4fbc955a652487b0ae0e,2020-04-05,组织1,部门2,0ccfebc5962d46dd916257c0a33201ce,Y,N,17.12,None,状态4,秋查,None,None,None,cf12c42e3c844c84b9d4900628893509,None,None,None,None,None,None,None,None,None,None,2020-08-08),(4,0ccfebc5962d46dd916257c0a33201ce,2020-04-05,组织1,部门1,0a5f6438cd9f47ed8d8fa26cfe931672,Y,N,27.3,None,状态1,春查,None,None,None,0ccfebc5962d46dd916257c0a33201ce,None,None,None,None,None,None,None,None,None,None,2020-08-08),(5,cf4f0c886c4e4fbc955a652487b0ae0e,2020-04-05,组织2,部门2,354dac55ffc342c8a21665569754a928,Y,N,55.91,None,状态2,春查,None,None,None,cf4f0c886c4e4fbc955a652487b0ae0e,None,None,None,None,None,None,None,None,None,None,2020-04-05),(6,6cb7d9cfced04873b0827af619e9510b,2020-04-05,组织2,部门1,cf12c42e3c844c84b9d4900628893509,Y,N,75.18,None,状态1,秋查,None,None,None,39d9aa650f864e7d9c7e49abc1f521be,None,None,None,None,None,None,None,None,None,None,2020-04-05),(7,be047b8c3d6048c7820f8ee69ca1002e,2020-04-05,组织1,部门2,be047b8c3d6048c7820f8ee69ca1002e,Y,N,35.61,None,状态4,秋查,None,None,None,ae1eb1fb17e54334b2cb71b0d74ea702,None,None,None,None,None,None,None,None,None,None,2020-08-08),(8,be047b8c3d6048c7820f8ee69ca1002e,2020-04-05,组织1,部门2,6cb7d9cfced04873b0827af619e9510b,Y,N,50.52,None,状态1,安评,None,None,None,cf4f0c886c4e4fbc955a652487b0ae0e,None,None,None,None,None,None,None,None,None,None,2020-04-05),(9,354dac55ffc342c8a21665569754a928,2020-04-05,组织2,部门1,39d9aa650f864e7d9c7e49abc1f521be,Y,N,22.86,None,状态1,秋查,None,None,None,39d9aa650f864e7d9c7e49abc1f521be,None,None,None,None,None,None,None,None,None,None,2020-08-08),(10,cf4f0c886c4e4fbc955a652487b0ae0e,2020-04-05,组织2,部门2,39d9aa650f864e7d9c7e49abc1f521be,Y,N,78.3,None,状态3,秋查,None,None,None,ae1eb1fb17e54334b2cb71b0d74ea702,None,None,None,None,None,None,None,None,None,None,2020-04-05)]

Windows bug?

C:\Users\leile\Desktop>datafaker mysql mysql+mysqldb://root:Hd@123456@localhost:3306/test stu 10 --meta meta.txt
Traceback (most recent call last):
File "build\bdist.win-amd64\egg\datafaker\cli.py", line 89, in main
db = load_db_class(args.dbtype)(args)
File "build\bdist.win-amd64\egg\datafaker\cli.py", line 79, in load_db_class
module = __import__(pkgname, fromlist=(classname))
File "build\bdist.win-amd64\egg\datafaker\dbs\mysqldb.py", line 3, in <module>
File "build\bdist.win-amd64\egg\datafaker\dbs\basedb.py", line 7, in <module>
File "build\bdist.win-amd64\egg\datafaker\compat.py", line 41, in <module>
File "build\bdist.win-amd64\egg\datafaker\multithreading.py", line 6, in <module>
ImportError: No module named queue

No module named queue
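`ImportError: No module named queue` is what Python 2 raises, because its standard library names this module `Queue`. The usual compatibility import is shown below. Whether datafaker's multithreading.py should adopt it is an assumption on our part. Running datafaker under Python 3 also avoids the error.

```python
try:
    import queue  # Python 3 module name
except ImportError:  # Python 2 ships the same module as "Queue"
    import Queue as queue

work = queue.Queue()
work.put("row")
print(work.get())  # row
```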

Following the usage example to generate data raises an error

File "D:\Python\Python36\lib\codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd7 in position 9: invalid continuation byte

'utf-8' codec can't decode byte 0xd7 in position 9: invalid continuation byte
Exception ignored in: <bound method RdbDB.__del__ of <datafaker.dbs.rdbdb.RdbDB object at 0x0000018E8E5F0518>>
Traceback (most recent call last):
File "D:\Python\Python36\lib\site-packages\datafaker\dbs\rdbdb.py", line 14, in __del__
self.session.close()
AttributeError: 'RdbDB' object has no attribute 'session'
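This `UnicodeDecodeError` usually means the meta file is not UTF-8. Editors on Chinese-locale Windows often save files as GBK. Re-saving the meta file as UTF-8 is the simplest fix. The sketch below shows a reader that tries several encodings in turn. `read_lines_any_encoding` is a hypothetical helper, not datafaker's `read_file_lines`, and the encoding list is an assumption.

```python
import io

def read_lines_any_encoding(path, encodings=("utf-8", "gbk")):
    """Read a text file, trying each encoding in turn until one decodes."""
    last_error = None
    for enc in encodings:
        try:
            with io.open(path, encoding=enc) as fp:
                return fp.read().splitlines()
        except UnicodeDecodeError as exc:
            last_error = exc  # remember the failure and try the next encoding
    raise last_error
```

Trying UTF-8 first keeps correctly encoded files unaffected. The GBK fallback only runs when UTF-8 decoding fails.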

Writing data to a cluster with Kerberos and access control enabled reports "missing authentication credentials"

Writing data to a cluster with Kerberos and access control enabled fails with:
sh-4.2$ datafaker es 101.12.67.77:9200 mytest01/_doc 10 --meta meta.txt
Process Process-4:
Traceback (most recent call last):
File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python2.7/site-packages/datafaker/dbs/basedb.py", line 122, in save
self.save_data(lines)
File "/usr/lib/python2.7/site-packages/datafaker/dbs/esdb.py", line 38, in save_data
success, _ = bulk(self.es, actions, index=self.args.table, raise_on_error=True)
File "/usr/lib/python2.7/site-packages/elasticsearch/helpers/actions.py", line 310, in bulk
for ok, item in streaming_bulk(client, actions, *args, **kwargs):
File "/usr/lib/python2.7/site-packages/elasticsearch/helpers/actions.py", line 240, in streaming_bulk
**kwargs
File "/usr/lib/python2.7/site-packages/elasticsearch/helpers/actions.py", line 126, in _process_bulk_chunk
raise e
AuthenticationException: AuthenticationException(401, u'security_exception', u'missing authentication credentials for REST request [/mytest01%2F_doc/_bulk]')
time used: 0.301 s
Data rules:
sh-4.2$ cat meta.txt
id||int||自增id[:inc(id,1)]
name||varchar(20)||学生名字
sh-4.2$
Index:
curl -XPUT --negotiate -u : 'http://101.12.67.77:9200/mytest01/?pretty=true' -H 'Content-Type:application/json' -d '{"mappings":{"properties":{"id":{"type":"long"},"name":{"type":"text"}}}}'

Feature request: add number formatting

Number formatting

%d,3
1=001,10=010, 233=233

Something like this: pad with leading zeros automatically, up to 3 digits.
Or:

ABC-%d
ABC-1,ABC-2,ABC-3

ABC-%d,3
ABC-001,ABC-099,ABC-233

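This is similar to an operation like .format("ABC-{0}",[:inc(id)]).

After the random data is generated, there should be a formatting step: zero-padding, a fixed prefix or suffix, or truncation.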
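The requested post-processing step can be sketched in plain Python. This is not a datafaker feature; `format_value` and its parameters are hypothetical. `%0*d` takes the width from the argument tuple. Numbers shorter than `width` get leading zeros, and longer numbers are left as they are, which matches "1=001, 10=010, 233=233".

```python
def format_value(value, width=None, prefix="", suffix="", max_len=None):
    """Hypothetical formatter for a generated integer.

    width: zero-pad to at least this many digits (1 -> '001' for width 3)
    prefix, suffix: fixed text around the number ('ABC-' -> 'ABC-001')
    max_len: optionally truncate the final string to this length
    """
    text = "%0*d" % (width, value) if width else str(value)
    result = prefix + text + suffix
    return result[:max_len] if max_len is not None else result

print(format_value(99, width=3, prefix="ABC-"))  # ABC-099
```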

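How to specify a database password that contains @

As the title says, how should a database password containing @ be specified? For example: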
datafaker rdb mysql+mysqldb://user:pass@2020@localhost:3306/mydb?charset=utf8 stu 10 --meta meta.txt --batch 10
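SQLAlchemy's documentation recommends URL-encoding special characters in the password part of a database URL, so `@` becomes `%40`. The tracebacks on this page show that datafaker hands the URL to SQLAlchemy. Assuming datafaker passes the URL through unchanged, the encoded form below should work. One caveat: inside a Windows batch file, `%` itself has to be doubled.

```python
try:
    from urllib.parse import quote_plus  # Python 3
except ImportError:
    from urllib import quote_plus  # Python 2

password = "pass@2020"
url = "mysql+mysqldb://user:%s@localhost:3306/mydb?charset=utf8" % quote_plus(password)
print(url)  # mysql+mysqldb://user:pass%402020@localhost:3306/mydb?charset=utf8
```

The resulting command would be `datafaker rdb mysql+mysqldb://user:pass%402020@localhost:3306/mydb?charset=utf8 stu 10 --meta meta.txt --batch 10`.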

Hello, writing output to a file fails with an error

RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.

time used: 0.110 s
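This RuntimeError appears when a process is started while a child process is still importing its main module. That happens under the "spawn" start method used on Windows and, since Python 3.8, on macOS. A later traceback on this page shows datafaker's compat.py running `List = multiprocessing.Manager().list` at import time, which is a likely trigger. The sketch below assumes that is the cause. It defers the Manager to a function called from a guarded entry point. `make_shared_list` and `main` are hypothetical names.

```python
import multiprocessing

def make_shared_list():
    # Start the Manager process only when called, never at import time:
    # under "spawn" the child re-imports modules, and a module-level Manager()
    # would try to start yet another process during that bootstrap.
    manager = multiprocessing.Manager()
    return manager, manager.list()

def main():
    manager, rows = make_shared_list()
    try:
        rows.append("1,sample")
        return list(rows)
    finally:
        manager.shutdown()

if __name__ == "__main__":
    multiprocessing.freeze_support()  # no effect unless frozen into a Windows executable
    print(main())
```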

Generating Kafka data fails: TypeError: fake_column() takes exactly 2 arguments (1 given)

[root@node03 bin]# datafaker kafka node01:6667 test4 1 --meta meta.txt --outprint
Traceback (most recent call last):
File "/usr/local/lib/python2.7/site-packages/datafaker/cli.py", line 78, in main
db.do_fake()
File "/usr/local/lib/python2.7/site-packages/datafaker/utils.py", line 72, in wrapper
ret = func(*args, **kwargs)
File "/usr/local/lib/python2.7/site-packages/datafaker/dbs/kafkadb.py", line 26, in do_fake
lines = self.fake_column()
TypeError: fake_column() takes exactly 2 arguments (1 given)

fake_column() takes exactly 2 arguments (1 given)

Can an object be nested inside another object in ES? Thanks

    "uid" : "2",
      "kdmc" : "考点名称",
      "kdjc" : "考点简称",
      "kdbsm" : "考点标识码",
      "sfbzhkd" : true,
      "kdjcsj" : "2018-02-01",
      "csxx" : {
        "kwbgsdh" : "考务办公室电话",
        "sjbgsdh" : "试卷保管(保密)室电话",
        "spjksdh" : "视频监考室电话",
        "sjbgsdhsxjsl" : 1,
        "sjffssxjsl" : 1,
        "kwbgssxjsl" : 1,
        "spjkssxjsl" : 1,
        "yybfssxjsl" : 1,
        "sjlzhtdsxjsl" : 1
      },

Can the csxx field be written? Thank you for open-sourcing this project.
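Nothing on this page shows datafaker supporting nested objects; the meta format describes flat columns. One possible approach assumes generated columns are named with dotted paths such as `csxx.kwbgsdh`, a hypothetical convention. Those columns can be expanded into nested dicts before indexing. By default Elasticsearch maps a JSON object field like `csxx` as an `object` type. `nest` is a hypothetical helper.

```python
import json

def nest(flat, sep="."):
    """Turn {'csxx.kwbgsdh': v} into {'csxx': {'kwbgsdh': v}}."""
    doc = {}
    for key, value in flat.items():
        node = doc
        parts = key.split(sep)
        for part in parts[:-1]:
            node = node.setdefault(part, {})  # create intermediate objects as needed
        node[parts[-1]] = value
    return doc

row = {"uid": "2", "csxx.kwbgssxjsl": 1, "csxx.spjkssxjsl": 1}
print(json.dumps(nest(row), sort_keys=True))
# {"csxx": {"kwbgssxjsl": 1, "spjkssxjsl": 1}, "uid": "2"}
```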

The same command occasionally fails to insert into the database

As the title says, the following error occurs from time to time:
F:\Python\test_data>datafaker rdb mysql+mysqldb://root:@localhost:3306/datafaker?charset=utf8 stu 10 --meta meta.txt --batch 1 --workers 2
insert 1 records
insert 2 records
insert 3 records
insert 4 records
insert 5 records
insert 6 records
insert 7 records
insert 8 records
insert 9 records
insert 10 records
Exception in thread Thread-2:
Traceback (most recent call last):
File "F:\Python\Python-2.7.16\lib\threading.py", line 801, in __bootstrap_inner
self.run()
File "F:\Python\Python-2.7.16\lib\threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "build\bdist.win-amd64\egg\datafaker\dbs\basedb.py", line 122, in save
self.save_data(lines)
File "build\bdist.win-amd64\egg\datafaker\dbs\rdbdb.py", line 26, in save_data
self.save_other_rdb(lines, names_format, column_names)
File "build\bdist.win-amd64\egg\datafaker\dbs\rdbdb.py", line 42, in save_other_rdb
self.session.execute(sql)
File "build\bdist.win-amd64\egg\sqlalchemy\orm\session.py", line 1269, in execute
clause, params or {}
File "build\bdist.win-amd64\egg\sqlalchemy\engine\base.py", line 988, in execute
return meth(self, multiparams, params)
File "build\bdist.win-amd64\egg\sqlalchemy\sql\elements.py", line 287, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "build\bdist.win-amd64\egg\sqlalchemy\engine\base.py", line 1107, in _execute_clauseelement
distilled_params,
File "build\bdist.win-amd64\egg\sqlalchemy\engine\base.py", line 1253, in _execute_context
e, statement, parameters, cursor, context
File "build\bdist.win-amd64\egg\sqlalchemy\engine\base.py", line 1473, in _handle_dbapi_exception
util.raise_from_cause(sqlalchemy_exception, exc_info)
File "build\bdist.win-amd64\egg\sqlalchemy\util\compat.py", line 398, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb, cause=cause)
File "build\bdist.win-amd64\egg\sqlalchemy\engine\base.py", line 1249, in _execute_context
cursor, statement, parameters, context
File "build\bdist.win-amd64\egg\sqlalchemy\engine\default.py", line 552, in do_execute
cursor.execute(statement, parameters)
File "F:\Python\Python-2.7.16\lib\site-packages\MySQLdb\cursors.py", line 174, in execute
self.errorhandler(self, exc, value)
File "F:\Python\Python-2.7.16\lib\site-packages\MySQLdb\connections.py", line 36, in defaulterrorhandler
raise errorclass, errorvalue
ProgrammingError: (_mysql_exceptions.ProgrammingError) (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 1")
[SQL: insert into stu (name,school,nickname,age,class_num,score,phone,email,ip,address) values ]
(Background on this error at: http://sqlalche.me/e/f405)

time used: 4.324 s
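The failing statement ends in `values ]`, which means an INSERT was built for an empty batch. With `--workers 2`, one plausible cause is a worker that finds no rows left. That is an assumption; the log alone does not prove it. The sketch below uses hypothetical helper names. It never yields an empty batch, refuses to build SQL for one, and binds the values as `%s` parameters in MySQLdb style.

```python
def batches(rows, batch_size):
    """Yield consecutive non-empty slices of rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

def build_insert(table, columns, batch):
    """Return (sql, params) for one multi-row INSERT, or None for an empty batch."""
    if not batch:
        # An empty batch would produce "insert into t (...) values ", the invalid SQL above.
        return None
    placeholders = "(" + ", ".join(["%s"] * len(columns)) + ")"
    sql = "insert into %s (%s) values %s" % (
        table, ",".join(columns), ", ".join([placeholders] * len(batch)))
    params = [value for row in batch for value in row]
    return sql, params
```

The caller would skip the `execute` call whenever `build_insert` returns None.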

ValueError: empty range for randrange() (11,1, -10)

(venv366-64bit-mysql) D:\012_python3\datafaker-master>datafaker mysql mysql+mysqldb://api_test:[email protected]:3306/mailserver virtual_users 2
Exception in thread Thread-1:
Traceback (most recent call last):
File "c:\python366-64bit\Lib\threading.py", line 916, in _bootstrap_inner
self.run()
File "c:\python366-64bit\Lib\threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "D:\evn\venv366-64bit-mysql\lib\site-packages\datafaker\dbs\basedb.py", line 42, in fake_data
columns = self.fake_column()
File "D:\evn\venv366-64bit-mysql\lib\site-packages\datafaker\dbs\basedb.py", line 53, in fake_column
columns.append(self.fakedata.do_fake(item['cmd'], item['args']))
File "D:\evn\venv366-64bit-mysql\lib\site-packages\datafaker\fakedata.py", line 215, in do_fake
return method(*args)
File "D:\evn\venv366-64bit-mysql\lib\site-packages\datafaker\fakedata.py", line 40, in fake_int
return self.faker.random_int(min, max)
File "D:\evn\venv366-64bit-mysql\lib\site-packages\faker\providers\__init__.py", line 106, in random_int
return self.generator.random.randrange(min, max + 1, step)
File "D:\evn\venv366-64bit-mysql\lib\random.py", line 199, in randrange
raise ValueError("empty range for randrange() (%d,%d, %d)" % (istart, istop, width))
ValueError: empty range for randrange() (11,1, -10)
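The traceback shows that `random_int` calls `randrange(min, max + 1, step)`. A stop value of 1 with start 11 therefore means it received min=11 and max=0. Why those bounds were derived for this table is not visible here. A hedged sketch of a clearer check follows; `fake_int` is a hypothetical helper, not datafaker's method of the same name.

```python
import random

def fake_int(min_value, max_value, rng=random):
    """Random int in [min_value, max_value], with a readable error for a reversed range."""
    if min_value > max_value:
        raise ValueError("int rule has min %d > max %d; check the meta file or column type"
                         % (min_value, max_value))
    return rng.randint(min_value, max_value)  # randint includes both ends
```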

Printing to the screen works, but inserting into the database fails (macOS, latest datafaker)

[root@localhost ~]# datafaker mysql mysql+mysqldb://root:123456@localhost:3306/ant_3.0_qm t_account_detail 10 --meta tad.txt
Process Process-4:
Traceback (most recent call last):
File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python2.7/site-packages/datafaker/dbs/basedb.py", line 122, in save
self.save_data(lines)
File "/usr/lib/python2.7/site-packages/datafaker/dbs/rdbdb.py", line 26, in save_data
self.save_other_rdb(lines, names_format, column_names)
File "/usr/lib/python2.7/site-packages/datafaker/dbs/rdbdb.py", line 42, in save_other_rdb
self.session.execute(sql)
File "/usr/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 1269, in execute
clause, params or {}
File "/usr/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 988, in execute
return meth(self, multiparams, params)
File "/usr/lib/python2.7/site-packages/sqlalchemy/sql/elements.py", line 287, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1107, in _execute_clauseelement
distilled_params,
File "/usr/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1253, in _execute_context
e, statement, parameters, cursor, context
File "/usr/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1473, in _handle_dbapi_exception
util.raise_from_cause(sqlalchemy_exception, exc_info)
File "/usr/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 398, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb, cause=cause)
File "/usr/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1249, in _execute_context
cursor, statement, parameters, context
File "/usr/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 552, in do_execute
cursor.execute(statement, parameters)
File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 163, in execute
result = self._query(query)
File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 321, in _query
conn.query(q)
File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 505, in query
self._affected_rows = self._read_query_result(unbuffered=unbuffered)
File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 724, in _read_query_result
result.read()
File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1069, in read
first_packet = self.connection._read_packet()
File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 676, in _read_packet
packet.raise_for_error()
File "/usr/lib/python2.7/site-packages/pymysql/protocol.py", line 223, in raise_for_error
err.raise_mysql_exception(self._data)
File "/usr/lib/python2.7/site-packages/pymysql/err.py", line 107, in raise_mysql_exception
raise errorclass(errno, errval)
ProgrammingError: (pymysql.err.ProgrammingError) (1064, u"You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '19:53:46,1384292845,0,569193),(15,18,2,'\u5546\u54c1\u8d2d\u4e70','65628319757403516',655645' at line 1")
[SQL: insert into t_account_detail (account_detail_id,shop_id,detail_type,description,biz_no,detail_amount,created_at,deleted_at,flag,avialable_amount) values (14,17,1,'\u5546\u54c1\u51fa\u552e','34250527876847164',691106,2020-08-27 19:53:46,1384292845,0,569193),(15,18,2,'\u5546\u54c1\u8d2d\u4e70','65628319757403516',655645,2020-08-27 19:53:46,350582394,0,110706),(16,19,1,'\u5546\u54c1\u51fa\u552e','19636796119968741',808004,2020-08-27 19:53:46,1256861491,0,466446),(17,20,2,'\u5546\u54c1\u8d2d\u4e70','46234490332455651',659574,2020-08-27 19:53:46,486638023,0,178019),(18,21,1,'\u5546\u54c1\u51fa\u552e','49266513315780232',743029,2020-08-27 19:53:46,717553851,0,246508),(19,22,2,'\u5546\u54c1\u8d2d\u4e70','90561178200881606',197382,2020-08-27 19:53:46,809470599,0,821661),(20,23,1,'\u5546\u54c1\u51fa\u552e','47997281259358174',232968,2020-08-27 19:53:46,1356754672,0,883010),(21,24,2,'\u5546\u54c1\u8d2d\u4e70','77079434011691921',162385,2020-08-27 19:53:46,1260205987,0,787554),(22,25,1,'\u5546\u54c1\u51fa\u552e','32080318843531665',429457,2020-08-27 19:53:46,42826607,0,557156),(23,26,2,'\u5546\u54c1\u8d2d\u4e70','15619744594723433',865789,2020-08-27 19:53:46,334998537,0,119933)]
(Background on this error at: http://sqlalche.me/e/f405)

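Generated addresses are somewhat jumbled

Administrative divisions are mixed up. For example, 海南省银川市梁平广州路z座 places Yinchuan City in Hainan Province, but Yinchuan is not in Hainan.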
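The mismatch suggests that province and city are chosen independently of each other; Faker's zh_CN address provider appears to work this way. One fix is to pick the province first and then choose a city from that province only. The lookup table below is a tiny hypothetical sample, not a complete dataset. `fake_region` is likewise hypothetical.

```python
import random

# Tiny hypothetical sample: each city is listed only under its own province.
PROVINCE_CITIES = {
    "海南省": ["海口市", "三亚市"],
    "宁夏回族自治区": ["银川市"],
    "广东省": ["广州市", "深圳市"],
}

def fake_region(rng=random):
    """Pick a province first, then a city from that province only."""
    province = rng.choice(sorted(PROVINCE_CITIES))
    city = rng.choice(PROVINCE_CITIES[province])
    return province + city
```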

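Running datafaker on Windows fails

Python version: 3.6.8
datafaker version: 0.6.4
When generating data into MySQL as described in 【使用举例.md】 (usage examples), the following error is reported: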

D:\datafaker>datafaker rdb mysqlclient://root:root@localhost:3600/test?charset=utf8 stu 10 --outprint --meta meta.txt --outspliter ',,'
Traceback (most recent call last):
File "D:\python\lib\site-packages\datafaker\cli.py", line 77, in main
db = load_db_class(args.dbtype)(args)
File "D:\python\lib\site-packages\datafaker\dbs\basedb.py", line 18, in __init__
self.schema = self.parse_schema()
File "D:\python\lib\site-packages\datafaker\dbs\basedb.py", line 127, in parse_schema
schema = self.parse_meta_schema()
File "D:\python\lib\site-packages\datafaker\dbs\basedb.py", line 137, in parse_meta_schema
rows = self.construct_meta_rows()
File "D:\python\lib\site-packages\datafaker\dbs\basedb.py", line 201, in construct_meta_rows
lines = read_file_lines(filepath)
File "D:\python\lib\site-packages\datafaker\utils.py", line 84, in read_file_lines
lines = fp.read().splitlines()
File "D:\python\lib\codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd7 in position 9: invalid continuation byte

'utf-8' codec can't decode byte 0xd7 in position 9: invalid continuation byte
Exception ignored in: <bound method RdbDB.__del__ of <datafaker.dbs.rdbdb.RdbDB object at 0x000002171485B6D8>>
Traceback (most recent call last):
File "D:\python\lib\site-packages\datafaker\dbs\rdbdb.py", line 14, in __del__
self.session.close()
AttributeError: 'RdbDB' object has no attribute 'session'
@gangly Could you please take a look at what the problem is?
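The second traceback is a side effect of the first. `__init__` raised before `self.session` was assigned, so `__del__` then failed on the missing attribute. The sketch below shows a defensive `__del__` that reads the attribute with `getattr` and a default. `SessionOwner` is a hypothetical stand-in, not datafaker's `RdbDB`.

```python
class SessionOwner(object):
    """Hypothetical class that opens a session in __init__ and closes it in __del__."""

    def __init__(self, make_session, load_meta):
        load_meta()                    # may raise, e.g. UnicodeDecodeError
        self.session = make_session()  # never reached if load_meta() raised

    def __del__(self):
        # getattr with a default avoids the secondary
        # "AttributeError: ... object has no attribute 'session'".
        session = getattr(self, "session", None)
        if session is not None:
            session.close()
```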

Hello, writing to a file fails (macOS 10.15 / Python 3.8)

% datafaker file . out.txt 10 --meta mg_mock_meta.txt
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/datafaker-0.7.4-py3.8.egg/datafaker/cli.py", line 89, in main
db = load_db_class(args.dbtype)(args)
File "/usr/local/lib/python3.8/site-packages/datafaker-0.7.4-py3.8.egg/datafaker/cli.py", line 79, in load_db_class
module = __import__(pkgname, fromlist=(classname))
File "", line 991, in _find_and_load
File "", line 975, in _find_and_load_unlocked
File "", line 655, in _load_unlocked
File "", line 618, in _load_backward_compatible
File "", line 259, in load_module
File "/usr/local/lib/python3.8/site-packages/datafaker-0.7.4-py3.8.egg/datafaker/dbs/filedb.py", line 5, in <module>
from datafaker.compat import safe_encode
File "", line 991, in _find_and_load
File "", line 975, in _find_and_load_unlocked
File "", line 655, in _load_unlocked
File "", line 618, in _load_backward_compatible
File "", line 259, in load_module
File "/usr/local/lib/python3.8/site-packages/datafaker-0.7.4-py3.8.egg/datafaker/compat.py", line 46, in <module>
List = multiprocessing.Manager().list
File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 57, in Manager
m.start()
File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/managers.py", line 579, in start
self._process.start()
File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
super().__init__(process_obj)
File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
self._launch(process_obj)
File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 42, in _launch
prep_data = spawn.get_preparation_data(process_obj._name)
File "/usr/local/Cellar/python@3.8/3.8.5/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 183, in get_preparation_data
main_mod_name = getattr(main_module.__spec__, "name", None)
AttributeError: module '__main__' has no attribute '__spec__'

module '__main__' has no attribute '__spec__'

The --outfile option described in the documentation does not exist, so output cannot be written to a file.

Version 0.7.1
Help output:
optional arguments:
-h, --help show this help message and exit
--auth [AUTH] user and password
--meta [META] meta file path
--interval INTERVAL the interval to make stream data
--batch BATCH the interval to make stream data
--workers WORKERS the interval to make stream data
--version print the version number and exit
--outprint print fake date to screen
--outspliter OUTSPLITER
print data, to split columns
--locale LOCALE locale language
--format FORMAT outprint and outfile format: json, text (default:
text)
--withheader print data or write data to file with column header

Error when running the script

C:\Users\yzft1>datafaker mysql mysql+mysqldb://root:[email protected]:3307/population prop_knowledge_patent 20 --meta prop_knowledge_patent2.txt
Traceback (most recent call last):
File "C:\Users\yzft1\AppData\Local\Programs\Python\Python36\lib\site-packages\datafaker\cli.py", line 89, in main
db = load_db_class(args.dbtype)(args)
File "C:\Users\yzft1\AppData\Local\Programs\Python\Python36\lib\site-packages\datafaker\dbs\basedb.py", line 28, in __init__
self.queue = compat.Queue(maxsize=MAX_QUEUE_SIZE)
AttributeError: module 'datafaker.compat' has no attribute 'Queue'

module 'datafaker.compat' has no attribute 'Queue'
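basedb.py calls compat.Queue(maxsize=MAX_QUEUE_SIZE), but the installed compat module defines no Queue. This may mean the installed package files do not match each other, so reinstalling or upgrading datafaker is worth trying first (an assumption). For reference, a compatibility shim that exposes a thread-safe Queue under both Python 2 and Python 3 could look like the sketch below. Whether datafaker's compat module is meant to provide exactly this class is also an assumption.

```python
try:
    from queue import Queue  # Python 3 standard-library module name
except ImportError:  # Python 2 fallback
    from Queue import Queue  # noqa: F401

# Re-export under one name so callers can write compat.Queue(maxsize=...).
__all__ = ["Queue"]
```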

Python 3.7.3: generated data does not match the expected format

1. Environment

OS: Mac
Python 3.7.3
Faker 4.0.2

2. Problem description

from faker import Faker

class FackData(object):
    def __init__(self, locale):
        self.faker = Faker(locale)
        self.faker_funcs = dir(self.faker)

When FackData is initialized, dir(Faker(locale)) does not list the data-faking methods. Printing it returns:

['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattr__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_factories', '_factory_map', '_locales', '_map_provider_method', '_select_factory', '_weights', 'cache_pattern', 'factories', 'generator_attrs', 'items', 'locales', 'random', 'seed', 'seed_instance', 'seed_locale', 'weights']
  if keyword in self.faker_funcs:

Because the methods named in the meta file cannot be found, the faked data comes back as None, which is not what is expected.

3. Solution

Change the code as follows:

self.faker = Factory().create(locale)

Printing dir(Factory().create(locale)) now returns:

['_Generator__config', '_Generator__format_token', '_Generator__random', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'add_provider', 'address', 'am_pm', 'android_platform_token', 'ascii_company_email', 'ascii_email', 'ascii_free_email', 'ascii_safe_email', 'bank_country', 'bban', 'binary', 'boolean', 'bothify', 'bs', 'building_number', 'catch_phrase', 'century', 'chrome', 'city', 'city_name', 'city_suffix', 'color', 'color_name', 'company', 'company_email', 'company_prefix', 'company_suffix', 'coordinate', 'country', 'country_calling_code', 'country_code', 'credit_card_expire', 'credit_card_full', 'credit_card_number', 'credit_card_provider', 'credit_card_security_code', 'cryptocurrency', 'cryptocurrency_code', 'cryptocurrency_name', 'csv', 'currency', 'currency_code', 'currency_name', 'currency_symbol', 'date', 'date_between', 'date_between_dates', 'date_object', 'date_of_birth', 'date_this_century', 'date_this_decade', 'date_this_month', 'date_this_year', 'date_time', 'date_time_ad', 'date_time_between', 'date_time_between_dates', 'date_time_this_century', 'date_time_this_decade', 'date_time_this_month', 'date_time_this_year', 'day_of_month', 'day_of_week', 'district', 'domain_name', 'domain_word', 'dsv', 'ean', 'ean13', 'ean8', 'email', 'file_extension', 'file_name', 'file_path', 'firefox', 'first_name', 'first_name_female', 'first_name_male', 'first_romanized_name', 'format', 'free_email', 'free_email_domain', 'future_date', 'future_datetime', 'get_formatter', 'get_providers', 'hex_color', 'hexify', 'hostname', 'http_method', 'iban', 'image_url', 'internet_explorer', 'ios_platform_token', 'ipv4', 'ipv4_network_class', 'ipv4_private', 'ipv4_public', 'ipv6', 'isbn10', 
'isbn13', 'iso8601', 'job', 'language_code', 'last_name', 'last_name_female', 'last_name_male', 'last_romanized_name', 'latitude', 'latlng', 'lexify', 'license_plate', 'linux_platform_token', 'linux_processor', 'local_latlng', 'locale', 'location_on_land', 'longitude', 'mac_address', 'mac_platform_token', 'mac_processor', 'md5', 'mime_type', 'month', 'month_name', 'msisdn', 'name', 'name_female', 'name_male', 'null_boolean', 'numerify', 'opera', 'paragraph', 'paragraphs', 'parse', 'password', 'past_date', 'past_datetime', 'phone_number', 'phonenumber_prefix', 'port_number', 'postcode', 'prefix', 'prefix_female', 'prefix_male', 'profile', 'provider', 'providers', 'province', 'psv', 'pybool', 'pydecimal', 'pydict', 'pyfloat', 'pyint', 'pyiterable', 'pylist', 'pyset', 'pystr', 'pystr_format', 'pystruct', 'pytuple', 'random', 'random_choices', 'random_digit', 'random_digit_not_null', 'random_digit_not_null_or_empty', 'random_digit_or_empty', 'random_element', 'random_elements', 'random_int', 'random_letter', 'random_letters', 'random_lowercase_letter', 'random_number', 'random_sample', 'random_uppercase_letter', 'randomize_nb_elements', 'rgb_color', 'rgb_css_color', 'romanized_name', 'safari', 'safe_color_name', 'safe_email', 'safe_hex_color', 'seed', 'seed_instance', 'sentence', 'sentences', 'set_formatter', 'sha1', 'sha256', 'simple_profile', 'slug', 'ssn', 'street_address', 'street_name', 'street_suffix', 'suffix', 'suffix_female', 'suffix_male', 'tar', 'text', 'texts', 'time', 'time_delta', 'time_object', 'time_series', 'timezone', 'tld', 'tsv', 'unix_device', 'unix_partition', 'unix_time', 'upc_a', 'upc_e', 'uri', 'uri_extension', 'uri_page', 'uri_path', 'url', 'user_agent', 'user_name', 'uuid4', 'windows_platform_token', 'word', 'words', 'year', 'zip']
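The first dir() listing contains __getattr__ together with _factories and _factory_map. This suggests Faker 4.x returns a proxy that forwards provider calls through __getattr__, which dir() cannot see. A membership check based on getattr() instead of dir() avoids the problem. In the sketch below, ProxyLike is a hypothetical stand-in so the example runs without Faker installed, and find_faker_method is a hypothetical helper, not datafaker code.

```python
class ProxyLike(object):
    """Hypothetical stand-in for an object that exposes methods via __getattr__."""

    def __getattr__(self, attr):
        # Only called when normal attribute lookup fails, like a proxy.
        if attr == "name":
            return lambda: "Alice"
        raise AttributeError(attr)


def find_faker_method(faker, keyword):
    """Return the callable named `keyword` on `faker`, or None if missing.

    getattr() goes through __getattr__, so it also finds proxied methods
    that a `keyword in dir(faker)` check would miss.
    """
    method = getattr(faker, keyword, None)
    return method if callable(method) else None
```

With Faker installed, find_faker_method(Faker("zh_CN"), "name") should return the proxied name method. The reporter's Factory().create(locale) workaround also works, because it returns a Generator whose provider methods are ordinary attributes, as the second listing shows.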

Encoding problem when generating data for Oracle

Following the demo example, with every encoding set to utf8,
datafaker rdb oracle://:@ip:port/sid stu 10 --meta meta2.txt
the import fails with:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 115-118: ordinal not in range(128)

datafaker rdb oracle+cx_Oracle://:@ip:port/sid stu 10 --meta meta2.txt
the import fails with:
NoSuchModuleError: Can't load plugin: sqlalchemy.dialects:oracle.cx_Oracle

Can't load plugin: sqlalchemy.dialects:oracle.cx_Oracle
Exception AttributeError: "'RdbDB' object has no attribute 'session'" in <bound method RdbDB.__del__ of <datafaker.dbs.rdbdb.RdbDB object at 0x7f03636a9310>> ignored
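Two separate problems show up here. The plugin error is about letter case. SQLAlchemy resolves the driver part of the URL as a module name, and its Oracle driver module is cx_oracle in lowercase, so the URL needs oracle+cx_oracle. For the ascii UnicodeEncodeError, a commonly suggested remedy with older cx_Oracle releases is to set NLS_LANG to a UTF-8 character set before connecting. Whether that is the cause here is an assumption, since the full traceback is not shown. oracle_url is a hypothetical helper.

```python
import os

try:
    from urllib.parse import quote_plus  # Python 3
except ImportError:  # Python 2
    from urllib import quote_plus

# Assumption: older cx_Oracle releases take the client character set from
# NLS_LANG, which must be set before the first connection is opened.
os.environ.setdefault("NLS_LANG", "AMERICAN_AMERICA.AL32UTF8")


def oracle_url(user, password, host, port, sid):
    """Hypothetical helper: build an SQLAlchemy Oracle URL.

    The driver part must be lowercase cx_oracle; writing cx_Oracle makes
    SQLAlchemy look for a dialect module that does not exist.
    """
    return "oracle+cx_oracle://{}:{}@{}:{}/{}".format(
        quote_plus(user), quote_plus(password), host, port, sid
    )
```

From a shell, the equivalent would be exporting NLS_LANG=AMERICAN_AMERICA.AL32UTF8 before running datafaker rdb oracle+cx_oracle://:@ip:port/sid stu 10 --meta meta2.txt.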

Feature request for the enum type

Could enum also support taking its values from a database? For example:
[:enum(mysql://root:[email protected]:3306/test:{table_name:column_name})]
[:enum(mysql://root:[email protected]:3306/test:{table_name:column_name1+'|'+column_name2})]
[:enum(mysql://root:[email protected]:3306/test:table_name:column_name1+'|'+column_name2)]
It would also help to allow queries that join multiple tables, for example:
[:enum(mysql://root:[email protected]:3306/test:{table_name1:[column_name1,same_column],table_name2:[column_name2,same_column]})]
and perhaps even support filter conditions.

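Complex logic can already be handled by exporting the result of an SQL query to a file, but native support in the tool would be even better.
For simple database data, I hope the tool can do this directly.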
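Query-backed enum could work by running an arbitrary SELECT, which also covers joins, concatenated columns and WHERE filters, and treating the first column of the result as the candidate list. The sketch below assumes that design. sqlite3 stands in for MySQL so the example runs anywhere. fetch_enum_values, pick_enum and demo are hypothetical helpers, not datafaker features. MySQL concatenates with CONCAT() rather than the || operator used in the SQLite demo.

```python
import random
import sqlite3


def fetch_enum_values(conn, sql, params=()):
    """Run `sql` on a DB-API connection; return the first column of each row."""
    cursor = conn.cursor()
    try:
        cursor.execute(sql, params)
        return [row[0] for row in cursor.fetchall()]
    finally:
        cursor.close()


def pick_enum(values, rng=random):
    """Pick one candidate at random, as enum does for a fixed list."""
    if not values:
        raise ValueError("the enum query returned no rows")
    return rng.choice(values)


def demo():
    """Build a throwaway in-memory table and enumerate 'name|city' pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stu (name TEXT, city TEXT)")
    conn.executemany("INSERT INTO stu VALUES (?, ?)",
                     [("Li", "Beijing"), ("Wang", "Shanghai")])
    try:
        return fetch_enum_values(conn, "SELECT name || '|' || city FROM stu ORDER BY name")
    finally:
        conn.close()
```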

Importing generated Chinese data into a MySQL database fails

Environment: Windows 10, Python 3.6.6, mysqlclient

meta.txt
id||int||[:inc(id,1)]
name||varchar(20)||[:name]
age||int||[:age]

cmd command:
datafaker mysql mysql+mysqldb://root:root@localhost:3306/sqoop student 10 --meta D:\meta.txt

Error:
Exception in thread Thread-2:
Traceback (most recent call last):
File "D:\python\lib\threading.py", line 916, in _bootstrap_inner
self.run()
File "D:\python\lib\threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "D:\python\lib\site-packages\datafaker\dbs\basedb.py", line 122, in save
self.save_data(lines)
File "D:\python\lib\site-packages\datafaker\dbs\rdbdb.py", line 26, in save_data
self.save_other_rdb(lines, names_format, column_names)
File "D:\python\lib\site-packages\datafaker\dbs\rdbdb.py", line 42, in save_other_rdb
self.session.execute(sql)
File "D:\python\lib\site-packages\sqlalchemy\orm\session.py", line 1269, in execute
clause, params or {}
File "D:\python\lib\site-packages\sqlalchemy\engine\base.py", line 988, in execute
return meth(self, multiparams, params)
File "D:\python\lib\site-packages\sqlalchemy\sql\elements.py", line 287, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "D:\python\lib\site-packages\sqlalchemy\engine\base.py", line 1107, in _execute_clauseelement
distilled_params,
File "D:\python\lib\site-packages\sqlalchemy\engine\base.py", line 1253, in _execute_context
e, statement, parameters, cursor, context
File "D:\python\lib\site-packages\sqlalchemy\engine\base.py", line 1475, in _handle_dbapi_exception
util.reraise(*exc_info)
File "D:\python\lib\site-packages\sqlalchemy\util\compat.py", line 153, in reraise
raise value
File "D:\python\lib\site-packages\sqlalchemy\engine\base.py", line 1249, in _execute_context
cursor, statement, parameters, context
File "D:\python\lib\site-packages\sqlalchemy\engine\default.py", line 552, in do_execute
cursor.execute(statement, parameters)
File "D:\python\lib\site-packages\MySQLdb\cursors.py", line 191, in execute
query = query.encode(db.encoding)
File "D:\python\lib\encodings\cp1252.py", line 12, in encode
return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 45-47: character maps to <undefined>
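The last frames show MySQLdb encoding the SQL text with cp1252, the codec mysqlclient uses for a latin1 connection. cp1252 cannot represent Chinese characters, which suggests the connection was opened without an explicit character set. A common remedy is to request one in the SQLAlchemy URL with ?charset=utf8mb4. This assumes datafaker passes the URL to SQLAlchemy unchanged. The target table's columns also need a utf8mb4 (or utf8) character set. with_charset is a hypothetical helper that appends the option.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_charset(url, charset="utf8mb4"):
    """Return `url` with a charset query parameter, keeping any existing one."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    query.setdefault("charset", charset)  # do not override an explicit choice
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Applied to the command above, this would give datafaker mysql "mysql+mysqldb://root:root@localhost:3306/sqoop?charset=utf8mb4" student 10 --meta D:\meta.txt.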

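Could enum from a file avoid repeating values?

First of all, many thanks for this tool; it is really handy. Here is a small suggestion.

For example, when enumerating from the file a.txt, which contains: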
aaa
bbb
ccc
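when generating 3 or fewer records, aaa, bbb and ccc should each appear only once,
and values would repeat only beyond 3 records.
Alternatively, when non-repeating enum from a file is specified, the number of generated records should not be allowed to exceed the number of lines in the file.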
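Both behaviours requested here can be expressed as shuffling the file's lines and handing them out in order. The tool can then either stop with an error when more records are requested than there are lines, or start another shuffled pass once the lines run out. enum_without_repeats is a hypothetical helper, not an existing datafaker option.

```python
import random


def enum_without_repeats(values, count, allow_cycle=False, rng=random):
    """Return `count` items from `values`, never repeating within one pass.

    With allow_cycle=False, asking for more items than there are distinct
    values raises ValueError. With allow_cycle=True, a new shuffled pass
    starts whenever the previous one is exhausted.
    """
    values = list(values)
    if not values:
        raise ValueError("no values to enumerate")
    if count > len(values) and not allow_cycle:
        raise ValueError("requested %d records but only %d distinct values"
                         % (count, len(values)))
    result = []
    while len(result) < count:
        batch = values[:]
        rng.shuffle(batch)  # one shuffled pass over all values
        result.extend(batch[:count - len(result)])
    return result
```

The candidates could be read with something like [line.strip() for line in open("a.txt", encoding="utf-8") if line.strip()].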

How to import data into Hive

Hello, I have a question:
datafaker hive hive://yarn@localhost:10000/test stu 1000 --meta data/hive_meta.txt
What does yarn refer to here? I pulled a Hive setup with Docker (https://www.huangyunkun.com/2018/06/05/docker-compose-hive/).
My UI shows
hive2://localhost:10000
Can I replace yarn in your command with hive2 in my case?

Also, do I need to modify __init__.py in the datafaker directory? I changed that file to:
from datafaker.cli import main
import pymysql
pymysql.install_as_MySQLdb()
import pyhive

MySQL now works fine, but when running against Hive I keep getting the following error:
No module named sasl

Thanks :)
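In hive://yarn@localhost:10000/test, yarn sits in the user-name position of the URL. It is therefore the user that connects to HiveServer2, not a scheme. The hive2:// address shown in the UI is the form used by JDBC clients such as beeline, while PyHive's SQLAlchemy dialect is registered as hive, so the scheme should probably stay hive (worth verifying against the PyHive documentation). The "No module named sasl" error usually means PyHive's Hive extras are missing; pip install 'pyhive[hive]' pulls in sasl, thrift and thrift_sasl. The snippet below only illustrates how the URL splits into parts. describe_hive_url is a hypothetical helper.

```python
from urllib.parse import urlsplit


def describe_hive_url(url):
    """Split an SQLAlchemy-style Hive URL into its named parts."""
    parts = urlsplit(url)
    return {
        "dialect": parts.scheme,       # "hive" selects PyHive's dialect
        "username": parts.username,    # the user HiveServer2 authenticates
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }
```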

Printing error with the date_this_month type

C:\Users\87293\Desktop\datafaker
λ datafaker file ./ t_date 5 --meta t_date.txt --outprint

sequence item 0: expected str instance, datetime.date found
sequence item 0: expected str instance, datetime.date found
sequence item 0: expected str instance, datetime.date found
sequence item 0: expected str instance, datetime.date found
sequence item 0: expected str instance, datetime.date found
time used: 0.130 s

C:\Users\87293\Desktop\datafaker
λ cat t_date.txt
c_date || date || current month[:date_this_month]
C:\Users\87293\Desktop\datafaker
λ cat t_date
2020-06-03
2020-06-08
2020-06-13
2020-06-06
2020-06-05
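The error text is the message str.join() raises when the sequence contains a non-string value, and Faker's date_this_month returns datetime.date objects. The file itself was written correctly, so it looks as if only the --outprint path joins the raw values without converting them first. That is an assumption about datafaker's internals. A minimal sketch of the conversion follows; format_row is a hypothetical helper.

```python
import datetime


def format_row(values, separator=","):
    """Join one generated row into a line of text.

    str.join() only accepts strings, so values such as datetime.date are
    converted with str() first (dates become ISO format, e.g. 2020-06-03).
    """
    return separator.join(str(value) for value in values)
```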
