
circus-train-bigquery's Issues

OOM when replicating big tables

When replicating a BigQuery table whose exported Avro files are large, the replication process runs out of memory because it attempts to load the entire file into memory to extract the schema.
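For illustration, an Avro object container file keeps its schema in a small header at the start of the file, so the schema can be read from a stream without buffering the data blocks. The sketch below is a standalone, standard-library-only reading of the Avro container format and is not the plugin's code. The names `read_long`, `read_bytes` and `read_avro_schema` are hypothetical.

```python
import io
import json


def read_long(stream):
    """Decode one Avro long (zigzag-encoded varint) from a binary stream."""
    shift = 0
    result = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("unexpected end of Avro header")
        result |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            break
        shift += 7
    # Undo zigzag encoding: 0, 1, 2, 3 -> 0, -1, 1, -2
    return (result >> 1) ^ -(result & 1)


def read_bytes(stream):
    """Read a length-prefixed Avro bytes/string value."""
    length = read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("unexpected end of Avro header")
    return data


def read_avro_schema(stream):
    """Return the parsed schema, consuming only the file header."""
    if stream.read(4) != b"Obj\x01":
        raise ValueError("not an Avro object container file")
    metadata = {}
    while True:
        count = read_long(stream)
        if count == 0:  # a zero count terminates the metadata map
            break
        if count < 0:  # negative count: a block byte size follows
            read_long(stream)
            count = -count
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            metadata[key] = read_bytes(stream)
    if "avro.schema" not in metadata:
        raise ValueError("header has no avro.schema entry")
    return json.loads(metadata["avro.schema"])


if __name__ == "__main__":
    # Tiny hand-built header followed by 1 MB of stand-in data blocks.
    header = (b"Obj\x01" + bytes([2, 22]) + b"avro.schema"
              + bytes([16]) + b'"string"' + bytes([0]) + b"S" * 16)
    print(read_avro_schema(io.BytesIO(header + b"\x00" * 1_000_000)))
```

Because the function only calls `read` for the bytes it needs, it can be handed a file or network stream over a multi-gigabyte export and will stop before the first data block.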

Partition column gets duplicated while replicating a partitioned GCP-BQ table to AWS-hive.

Hi Team,

I have used the following copier options (code snippet below) in my YAML config.

copier-options:
  circustrain-bigquery-partition-by: partition_date
  circustrain-bigquery-partition-filter: partition_date BETWEEN DATE_SUB(current_date(), INTERVAL 2 MONTH) AND CURRENT_DATE()

The replication is generally successful but the resultant table has two issues:

  1. There is an extra regular column with the same name as the partition column.
  2. The data is inserted into the extra column instead of the partition column, and the partition column is populated with nulls.

Please let me know if you need any other details or logs to debug this, and I will send the details privately.
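To make the expected layout concrete: in a Hive table the partition key is declared separately from the regular columns, so a column named in `circustrain-bigquery-partition-by` would be expected to appear only among the partition keys. The following is a minimal sketch of that split. It does not claim to show the cause of the bug or the plugin's implementation, and `split_partition_column` is a hypothetical helper.

```python
def split_partition_column(columns, partition_by):
    """Split (name, type) pairs into regular columns and partition keys.

    Hive column names are case-insensitive, so names are compared in
    lower case.
    """
    wanted = partition_by.lower()
    regular = [(name, kind) for name, kind in columns if name.lower() != wanted]
    partition = [(name, kind) for name, kind in columns if name.lower() == wanted]
    if not partition:
        raise ValueError(f"partition column {partition_by!r} not found in schema")
    return regular, partition
```

With the configuration above, a source schema of `[("id", "bigint"), ("partition_date", "date")]` would yield `id` as the only regular column and `partition_date` as the only partition key.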

Table Export Failures

Circus Train BigQuery and the Google BigQuery API limit the size of a file that can be exported unless an export scheme that shards the data is specified.

Extend the Circus Train BigQuery plugin to support replication of these large tables.

Error extracting BigQuery table data to Google storage: Table gs://circus-train-bigquery-tmp-dbc416a1-3aa4-404f-8f9c-7d82ecedb870/49b9d045-f0ed-442d-84c7-6c6f42261520.csv too large to be exported to a single file. Specify a uri including a * to shard export. See 'Exporting data into one or more files' in https://cloud.google.com/bigquery/docs/exporting-data., reason=invalid, location=null
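As the error message says, BigQuery accepts a destination URI containing a single `*` and then writes several files, replacing the wildcard with a zero-padded 12-digit shard number starting at 000000000000. The sketch below only shows the URI handling, under the assumption that the plugin currently builds a single-file URI like the one in the error. `to_sharded_uri` and `shard_uri` are hypothetical names and no BigQuery client calls are made.

```python
import posixpath


def to_sharded_uri(single_file_uri):
    """Turn gs://bucket/name.ext into the wildcard form gs://bucket/name-*.ext."""
    if "*" in single_file_uri:
        return single_file_uri  # already a wildcard URI
    stem, extension = posixpath.splitext(single_file_uri)
    return f"{stem}-*{extension}"


def shard_uri(wildcard_uri, shard_index):
    """Name of the file BigQuery writes for a given shard of a wildcard URI."""
    if wildcard_uri.count("*") != 1:
        raise ValueError("expected exactly one * in the URI")
    return wildcard_uri.replace("*", f"{shard_index:012d}")
```

A replication that exports this way would then need to list and copy every object matching the prefix rather than a single file.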

Getting bean creation errors when running only with the housekeeping module

Circus Train version: 13.0.0
Circus Train BigQuery version: 5.0.2

When I run circus-train-13.0.0/bin/circus-train.sh --config=/home/hadoop/ctbq5.0.2/test-config.yml --modules=housekeeping I get the following error:

Using /home/hadoop/ctbq5.0.2/circus-train-13.0.0
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/ctbq5.0.2/circus-train-13.0.0/lib/slf4j-log4j12-1.7.21.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/12/04 15:40:28 INFO circustrain.CircusTrain: Maven-ArtifactId=circus-train-core; Built-By=s-jenkinsrbac; Build-DateTime=2018-11-15T17:57:20Z; Build-Jdk=1.7.0_80; Build-Version=13.0.0; Created-By=Apache Maven 3.3.9; Manifest-Version=1.0; Maven-GroupId=com.hotels;
         ee@@@@@@@@@@@@@@@
       e@@@@@@@@@@@@@@@
      @@@"     .-.----.      ___ _                   ___ _                   _
     @@"___   / o )    \    / __\ |__   ___   ___   / __\ |__   ___   ___   / \
    II__[w]   || ´ __  |'  / /  | '_ \ / _ \ / _ \ / /  | '_ \ / _ \ / _ \ /  /
   {======|_ '- |||  |||  / /___| | | | (_) | (_) / /___| | | | (_) | (_) /\_/
  /oO--000'"`--OO----OO-' \____/|_| |_|\___/ \___/\____/|_| |_|\___/ \___/\/
18/12/04 15:40:28 INFO util.Version: HV000001: Hibernate Validator 5.2.4.Final

18/12/04 15:40:29 INFO extension.ExtensionInitializer: Adding packageNames '[com.hotels.bdp.circustrain.bigquery]' to component scan.
18/12/04 15:40:29 INFO circustrain.CircusTrain: Starting CircusTrain on ip-10-29-168-200.us-west-2.compute.internal with PID 12875 (/home/hadoop/ctbq5.0.2/circus-train-13.0.0/lib/circus-train-all-latest.jar started by hadoop in /home/hadoop/ctbq5.0.2)
18/12/04 15:40:29 INFO circustrain.CircusTrain: The following profiles are active: housekeeping
18/12/04 15:40:29 INFO annotation.AnnotationConfigApplicationContext: Refreshing org.springframework.context.annotation.AnnotationConfigApplicationContext@1eeb5818: startup date [Tue Dec 04 15:40:29 UTC 2018]; root of context hierarchy
18/12/04 15:40:30 WARN annotation.ConfigurationClassEnhancer: @Bean method EnableEncryptablePropertySourcesConfiguration.enableEncryptablePropertySourcesPostProcessor is non-static and returns an object assignable to Spring's BeanFactoryPostProcessor interface. This will result in a failure to process annotations such as @Autowired, @Resource and @PostConstruct within the method's declaring @Configuration class. Add the 'static' modifier to this method to avoid these container lifecycle issues; see @Bean javadoc for complete details.
18/12/04 15:40:30 INFO jasyptspringboot.EnableEncryptablePropertySourcesPostProcessor: Post-processing PropertySource instances
18/12/04 15:40:30 INFO jasyptspringboot.EnableEncryptablePropertySourcesPostProcessor: Converting PropertySource commandLineArgs[org.springframework.core.env.SimpleCommandLinePropertySource] to EncryptableEnumerablePropertySourceWrapper
18/12/04 15:40:30 INFO jasyptspringboot.EnableEncryptablePropertySourcesPostProcessor: Converting PropertySource systemProperties[org.springframework.core.env.MapPropertySource] to EncryptableMapPropertySourceWrapper
18/12/04 15:40:30 INFO jasyptspringboot.EnableEncryptablePropertySourcesPostProcessor: Converting PropertySource systemEnvironment[org.springframework.core.env.SystemEnvironmentPropertySource] to EncryptableMapPropertySourceWrapper
18/12/04 15:40:30 INFO jasyptspringboot.EnableEncryptablePropertySourcesPostProcessor: Converting PropertySource random[org.springframework.boot.context.config.RandomValuePropertySource] to EncryptablePropertySourceWrapper
18/12/04 15:40:30 INFO jasyptspringboot.EnableEncryptablePropertySourcesPostProcessor: Converting PropertySource applicationConfig: [file:/home/hadoop/ctbq5.0.2/test-config.yml][org.springframework.core.env.MapPropertySource] to EncryptableMapPropertySourceWrapper
18/12/04 15:40:30 INFO jasyptspringboot.EnableEncryptablePropertySourcesPostProcessor: Converting PropertySource defaultProperties[org.springframework.core.env.MapPropertySource] to EncryptableMapPropertySourceWrapper
18/12/04 15:40:30 INFO annotation.AutowiredAnnotationBeanPostProcessor: JSR-330 'javax.inject.Inject' annotation found and supported for autowiring
18/12/04 15:40:30 INFO support.PostProcessorRegistrationDelegate$BeanPostProcessorChecker: Bean 'org.springframework.transaction.annotation.ProxyTransactionManagementConfiguration' of type [class org.springframework.transaction.annotation.ProxyTransactionManagementConfiguration$$EnhancerBySpringCGLIB$$f939a424] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
18/12/04 15:40:30 INFO support.PostProcessorRegistrationDelegate$BeanPostProcessorChecker: Bean 'bindS3AFileSystem' of type [class com.hotels.bdp.circustrain.aws.BindS3AFileSystem] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
18/12/04 15:40:30 INFO support.PostProcessorRegistrationDelegate$BeanPostProcessorChecker: Bean 's3CredentialsUtils' of type [class com.hotels.bdp.circustrain.aws.S3CredentialsUtils] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
18/12/04 15:40:30 INFO housekeeping.HousekeepingConfiguration: Loading default housekeepingEnvironment
18/12/04 15:40:30 INFO support.PostProcessorRegistrationDelegate$BeanPostProcessorChecker: Bean 'housekeepingEnvironment' of type [class com.hotels.shaded.com.google.common.collect.RegularImmutableMap] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
18/12/04 15:40:30 INFO support.PostProcessorRegistrationDelegate$BeanPostProcessorChecker: Bean 'housekeepingConfiguration' of type [class com.hotels.housekeeping.HousekeepingConfiguration$$EnhancerBySpringCGLIB$$70e6ddd3] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
18/12/04 15:40:30 INFO support.PostProcessorRegistrationDelegate$BeanPostProcessorChecker: Bean 'stringToDurationConverter' of type [class com.hotels.housekeeping.converter.StringToDurationConverter] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
18/12/04 15:40:30 INFO support.PostProcessorRegistrationDelegate$BeanPostProcessorChecker: Bean 'GCPSecurity' of type [class com.hotels.bdp.circustrain.gcp.context.GCPSecurity$$EnhancerBySpringCGLIB$$ae922d27] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
18/12/04 15:40:30 INFO support.PostProcessorRegistrationDelegate$BeanPostProcessorChecker: Bean 'GCPCredentialPathProvider' of type [class com.hotels.bdp.circustrain.gcp.GCPCredentialPathProvider] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
18/12/04 15:40:30 INFO support.PostProcessorRegistrationDelegate$BeanPostProcessorChecker: Bean 'randomStringFactory' of type [class com.hotels.bdp.circustrain.gcp.RandomStringFactory] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
18/12/04 15:40:30 INFO support.PostProcessorRegistrationDelegate$BeanPostProcessorChecker: Bean 'distributedFileSystemPathProvider' of type [class com.hotels.bdp.circustrain.gcp.DistributedFileSystemPathProvider] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
18/12/04 15:40:30 INFO support.PostProcessorRegistrationDelegate$BeanPostProcessorChecker: Bean 'bindGoogleHadoopFileSystem' of type [class com.hotels.bdp.circustrain.gcp.BindGoogleHadoopFileSystem] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
18/12/04 15:40:30 INFO support.PostProcessorRegistrationDelegate$BeanPostProcessorChecker: Bean 'fileSystemFactory' of type [class com.hotels.bdp.circustrain.gcp.FileSystemFactory] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
18/12/04 15:40:30 INFO support.PostProcessorRegistrationDelegate$BeanPostProcessorChecker: Bean 'GCPCredentialCopier' of type [class com.hotels.bdp.circustrain.gcp.GCPCredentialCopier] is not eligible for getting processed by all BeanPostProcessors (for example: not eligible for auto-proxying)
18/12/04 15:40:31 INFO hikari.HikariDataSource: HikariPool-1 - Starting...
18/12/04 15:40:31 INFO hikari.HikariDataSource: HikariPool-1 - Start completed.
18/12/04 15:40:31 INFO init.ScriptUtils: Executing SQL script from class path resource [schema.sql]
18/12/04 15:40:31 INFO init.ScriptUtils: Executed SQL script from class path resource [schema.sql] in 3 ms.
18/12/04 15:40:31 INFO jpa.LocalContainerEntityManagerFactoryBean: Building JPA container EntityManagerFactory for persistence unit 'default'
18/12/04 15:40:31 INFO util.LogHelper: HHH000204: Processing PersistenceUnitInfo [
	name: default
	...]
18/12/04 15:40:31 INFO hibernate.Version: HHH000412: Hibernate Core {4.3.11.Final}
18/12/04 15:40:31 INFO cfg.Environment: HHH000206: hibernate.properties not found
18/12/04 15:40:31 INFO cfg.Environment: HHH000021: Bytecode provider name : javassist
18/12/04 15:40:32 INFO common.Version: HCANN000001: Hibernate Commons Annotations {4.0.5.Final}
18/12/04 15:40:32 INFO dialect.Dialect: HHH000400: Using dialect: org.hibernate.dialect.H2Dialect
18/12/04 15:40:32 INFO ast.ASTQueryTranslatorFactory: HHH000397: Using ASTQueryTranslatorFactory
18/12/04 15:40:33 INFO hbm2ddl.SchemaUpdate: HHH000228: Running hbm2ddl schema update
18/12/04 15:40:33 INFO hbm2ddl.SchemaUpdate: HHH000102: Fetching database metadata
18/12/04 15:40:33 INFO hbm2ddl.SchemaUpdate: HHH000396: Updating schema
18/12/04 15:40:33 INFO hbm2ddl.TableMetadata: HHH000261: Table found: HOUSEKEEPING.CIRCUS_TRAIN.AUDIT_REVISION
18/12/04 15:40:33 INFO hbm2ddl.TableMetadata: HHH000037: Columns: [id, timestamp]
18/12/04 15:40:33 INFO hbm2ddl.TableMetadata: HHH000108: Foreign keys: []
18/12/04 15:40:33 INFO hbm2ddl.TableMetadata: HHH000126: Indexes: [primary_key_5]
18/12/04 15:40:33 INFO hbm2ddl.TableMetadata: HHH000261: Table found: HOUSEKEEPING.CIRCUS_TRAIN.LEGACY_REPLICA_PATH
18/12/04 15:40:33 INFO hbm2ddl.TableMetadata: HHH000037: Columns: [path, event_id, creation_timestamp, path_event_id, id]
18/12/04 15:40:33 INFO hbm2ddl.TableMetadata: HHH000108: Foreign keys: []
18/12/04 15:40:33 INFO hbm2ddl.TableMetadata: HHH000126: Indexes: [primary_key_1, uk_tlauv1txmi2vux3fqp1y1ueav_index_1]
18/12/04 15:40:33 INFO hbm2ddl.SchemaUpdate: HHH000232: Schema update complete
18/12/04 15:40:33 WARN annotation.AnnotationConfigApplicationContext: Exception encountered during context initialization - cancelling refresh attempt: org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'bigQueryMetastoreClientFactory' defined in URL [jar:file:/home/hadoop/ctbq5.0.2/circus-train-13.0.0/lib/circus-train-bigquery-5.0.3-SNAPSHOT.jar!/com/hotels/bdp/circustrain/bigquery/client/BigQueryMetastoreClientFactory.class]: Unsatisfied dependency expressed through constructor argument with index 0 of type [com.hotels.bdp.circustrain.bigquery.util.BigQueryMetastore]: Error creating bean with name 'bigQueryMetastore' defined in URL [jar:file:/home/hadoop/ctbq5.0.2/circus-train-13.0.0/lib/circus-train-bigquery-5.0.3-SNAPSHOT.jar!/com/hotels/bdp/circustrain/bigquery/util/BigQueryMetastore.class]: Unsatisfied dependency expressed through constructor argument with index 0 of type [com.google.cloud.bigquery.BigQuery]: No qualifying bean of type [com.google.cloud.bigquery.BigQuery] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {}; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No qualifying bean of type [com.google.cloud.bigquery.BigQuery] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. 
Dependency annotations: {}; nested exception is org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'bigQueryMetastore' defined in URL [jar:file:/home/hadoop/ctbq5.0.2/circus-train-13.0.0/lib/circus-train-bigquery-5.0.3-SNAPSHOT.jar!/com/hotels/bdp/circustrain/bigquery/util/BigQueryMetastore.class]: Unsatisfied dependency expressed through constructor argument with index 0 of type [com.google.cloud.bigquery.BigQuery]: No qualifying bean of type [com.google.cloud.bigquery.BigQuery] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {}; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No qualifying bean of type [com.google.cloud.bigquery.BigQuery] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {}
18/12/04 15:40:33 INFO jpa.LocalContainerEntityManagerFactoryBean: Closing JPA EntityManagerFactory for persistence unit 'default'
18/12/04 15:40:33 INFO hikari.HikariDataSource: HikariPool-1 - Shutdown initiated...
18/12/04 15:40:33 INFO pool.HikariPool: HikariPool-1 - Close initiated...
18/12/04 15:40:33 INFO pool.HikariPool: HikariPool-1 - Closed.
18/12/04 15:40:33 INFO hikari.HikariDataSource: HikariPool-1 - Shutdown completed.
18/12/04 15:40:33 ERROR boot.SpringApplication: Application startup failed
org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'bigQueryMetastoreClientFactory' defined in URL [jar:file:/home/hadoop/ctbq5.0.2/circus-train-13.0.0/lib/circus-train-bigquery-5.0.3-SNAPSHOT.jar!/com/hotels/bdp/circustrain/bigquery/client/BigQueryMetastoreClientFactory.class]: Unsatisfied dependency expressed through constructor argument with index 0 of type [com.hotels.bdp.circustrain.bigquery.util.BigQueryMetastore]: Error creating bean with name 'bigQueryMetastore' defined in URL [jar:file:/home/hadoop/ctbq5.0.2/circus-train-13.0.0/lib/circus-train-bigquery-5.0.3-SNAPSHOT.jar!/com/hotels/bdp/circustrain/bigquery/util/BigQueryMetastore.class]: Unsatisfied dependency expressed through constructor argument with index 0 of type [com.google.cloud.bigquery.BigQuery]: No qualifying bean of type [com.google.cloud.bigquery.BigQuery] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {}; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No qualifying bean of type [com.google.cloud.bigquery.BigQuery] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {}; nested exception is org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'bigQueryMetastore' defined in URL [jar:file:/home/hadoop/ctbq5.0.2/circus-train-13.0.0/lib/circus-train-bigquery-5.0.3-SNAPSHOT.jar!/com/hotels/bdp/circustrain/bigquery/util/BigQueryMetastore.class]: Unsatisfied dependency expressed through constructor argument with index 0 of type [com.google.cloud.bigquery.BigQuery]: No qualifying bean of type [com.google.cloud.bigquery.BigQuery] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. 
Dependency annotations: {}; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No qualifying bean of type [com.google.cloud.bigquery.BigQuery] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {}
	at org.springframework.beans.factory.support.ConstructorResolver.createArgumentArray(ConstructorResolver.java:749)
	at org.springframework.beans.factory.support.ConstructorResolver.autowireConstructor(ConstructorResolver.java:185)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.autowireConstructor(AbstractAutowireCapableBeanFactory.java:1147)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:1050)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:510)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:482)
	at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:306)
	at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230)
	at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:302)
	at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:197)
	at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:778)
	at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:839)
	at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:538)
	at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:760)
	at org.springframework.boot.SpringApplication.createAndRefreshContext(SpringApplication.java:360)
	at org.springframework.boot.SpringApplication.run(SpringApplication.java:306)
	at com.hotels.bdp.circustrain.CircusTrain.main(CircusTrain.java:101)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'bigQueryMetastore' defined in URL [jar:file:/home/hadoop/ctbq5.0.2/circus-train-13.0.0/lib/circus-train-bigquery-5.0.3-SNAPSHOT.jar!/com/hotels/bdp/circustrain/bigquery/util/BigQueryMetastore.class]: Unsatisfied dependency expressed through constructor argument with index 0 of type [com.google.cloud.bigquery.BigQuery]: No qualifying bean of type [com.google.cloud.bigquery.BigQuery] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {}; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No qualifying bean of type [com.google.cloud.bigquery.BigQuery] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {}
	at org.springframework.beans.factory.support.ConstructorResolver.createArgumentArray(ConstructorResolver.java:749)
	at org.springframework.beans.factory.support.ConstructorResolver.autowireConstructor(ConstructorResolver.java:185)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.autowireConstructor(AbstractAutowireCapableBeanFactory.java:1147)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:1050)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:510)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:482)
	at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:306)
	at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230)
	at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:302)
	at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:197)
	at org.springframework.beans.factory.support.DefaultListableBeanFactory.findAutowireCandidates(DefaultListableBeanFactory.java:1199)
	at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1123)
	at org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:1021)
	at org.springframework.beans.factory.support.ConstructorResolver.resolveAutowiredArgument(ConstructorResolver.java:814)
	at org.springframework.beans.factory.support.ConstructorResolver.createArgumentArray(ConstructorResolver.java:741)
	... 22 more
Caused by: org.springframework.beans.factory.NoSuchBeanDefinitionException: No qualifying bean of type [com.google.cloud.bigquery.BigQuery] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {}
	at org.springframework.beans.factory.support.DefaultListableBeanFactory.raiseNoSuchBeanDefinitionException(DefaultListableBeanFactory.java:1380)
	at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1126)
	at org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:1021)
	at org.springframework.beans.factory.support.ConstructorResolver.resolveAutowiredArgument(ConstructorResolver.java:814)
	at org.springframework.beans.factory.support.ConstructorResolver.createArgumentArray(ConstructorResolver.java:741)
	... 36 more
18/12/04 15:40:33 INFO logging.ClasspathLoggingApplicationListener: Application failed to start with classpath: [file:/tmp/hadoop-unjar1533479091642944733/, file:/home/hadoop/ctbq5.0.2/circus-train-13.0.0/lib/circus-train-all-latest.jar, file:/tmp/hadoop-unjar1533479091642944733/classes]
18/12/04 15:40:33 ERROR circustrain.CircusTrain: Error creating bean with name 'bigQueryMetastoreClientFactory' defined in URL [jar:file:/home/hadoop/ctbq5.0.2/circus-train-13.0.0/lib/circus-train-bigquery-5.0.3-SNAPSHOT.jar!/com/hotels/bdp/circustrain/bigquery/client/BigQueryMetastoreClientFactory.class]: Unsatisfied dependency expressed through constructor argument with index 0 of type [com.hotels.bdp.circustrain.bigquery.util.BigQueryMetastore]: Error creating bean with name 'bigQueryMetastore' defined in URL [jar:file:/home/hadoop/ctbq5.0.2/circus-train-13.0.0/lib/circus-train-bigquery-5.0.3-SNAPSHOT.jar!/com/hotels/bdp/circustrain/bigquery/util/BigQueryMetastore.class]: Unsatisfied dependency expressed through constructor argument with index 0 of type [com.google.cloud.bigquery.BigQuery]: No qualifying bean of type [com.google.cloud.bigquery.BigQuery] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {}; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No qualifying bean of type [com.google.cloud.bigquery.BigQuery] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {}; nested exception is org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'bigQueryMetastore' defined in URL [jar:file:/home/hadoop/ctbq5.0.2/circus-train-13.0.0/lib/circus-train-bigquery-5.0.3-SNAPSHOT.jar!/com/hotels/bdp/circustrain/bigquery/util/BigQueryMetastore.class]: Unsatisfied dependency expressed through constructor argument with index 0 of type [com.google.cloud.bigquery.BigQuery]: No qualifying bean of type [com.google.cloud.bigquery.BigQuery] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. 
Dependency annotations: {}; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No qualifying bean of type [com.google.cloud.bigquery.BigQuery] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {}
org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'bigQueryMetastoreClientFactory' defined in URL [jar:file:/home/hadoop/ctbq5.0.2/circus-train-13.0.0/lib/circus-train-bigquery-5.0.3-SNAPSHOT.jar!/com/hotels/bdp/circustrain/bigquery/client/BigQueryMetastoreClientFactory.class]: Unsatisfied dependency expressed through constructor argument with index 0 of type [com.hotels.bdp.circustrain.bigquery.util.BigQueryMetastore]: Error creating bean with name 'bigQueryMetastore' defined in URL [jar:file:/home/hadoop/ctbq5.0.2/circus-train-13.0.0/lib/circus-train-bigquery-5.0.3-SNAPSHOT.jar!/com/hotels/bdp/circustrain/bigquery/util/BigQueryMetastore.class]: Unsatisfied dependency expressed through constructor argument with index 0 of type [com.google.cloud.bigquery.BigQuery]: No qualifying bean of type [com.google.cloud.bigquery.BigQuery] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {}; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No qualifying bean of type [com.google.cloud.bigquery.BigQuery] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {}; nested exception is org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'bigQueryMetastore' defined in URL [jar:file:/home/hadoop/ctbq5.0.2/circus-train-13.0.0/lib/circus-train-bigquery-5.0.3-SNAPSHOT.jar!/com/hotels/bdp/circustrain/bigquery/util/BigQueryMetastore.class]: Unsatisfied dependency expressed through constructor argument with index 0 of type [com.google.cloud.bigquery.BigQuery]: No qualifying bean of type [com.google.cloud.bigquery.BigQuery] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. 
Dependency annotations: {}; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No qualifying bean of type [com.google.cloud.bigquery.BigQuery] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {}
	at org.springframework.beans.factory.support.ConstructorResolver.createArgumentArray(ConstructorResolver.java:749)
	at org.springframework.beans.factory.support.ConstructorResolver.autowireConstructor(ConstructorResolver.java:185)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.autowireConstructor(AbstractAutowireCapableBeanFactory.java:1147)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:1050)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:510)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:482)
	at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:306)
	at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230)
	at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:302)
	at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:197)
	at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:778)
	at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:839)
	at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:538)
	at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:760)
	at org.springframework.boot.SpringApplication.createAndRefreshContext(SpringApplication.java:360)
	at org.springframework.boot.SpringApplication.run(SpringApplication.java:306)
	at com.hotels.bdp.circustrain.CircusTrain.main(CircusTrain.java:101)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:234)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'bigQueryMetastore' defined in URL [jar:file:/home/hadoop/ctbq5.0.2/circus-train-13.0.0/lib/circus-train-bigquery-5.0.3-SNAPSHOT.jar!/com/hotels/bdp/circustrain/bigquery/util/BigQueryMetastore.class]: Unsatisfied dependency expressed through constructor argument with index 0 of type [com.google.cloud.bigquery.BigQuery]: No qualifying bean of type [com.google.cloud.bigquery.BigQuery] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {}; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No qualifying bean of type [com.google.cloud.bigquery.BigQuery] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {}
	at org.springframework.beans.factory.support.ConstructorResolver.createArgumentArray(ConstructorResolver.java:749)
	at org.springframework.beans.factory.support.ConstructorResolver.autowireConstructor(ConstructorResolver.java:185)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.autowireConstructor(AbstractAutowireCapableBeanFactory.java:1147)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:1050)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:510)
	at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:482)
	at org.springframework.beans.factory.support.AbstractBeanFactory$1.getObject(AbstractBeanFactory.java:306)
	at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:230)
	at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:302)
	at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:197)
	at org.springframework.beans.factory.support.DefaultListableBeanFactory.findAutowireCandidates(DefaultListableBeanFactory.java:1199)
	at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1123)
	at org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:1021)
	at org.springframework.beans.factory.support.ConstructorResolver.resolveAutowiredArgument(ConstructorResolver.java:814)
	at org.springframework.beans.factory.support.ConstructorResolver.createArgumentArray(ConstructorResolver.java:741)
	... 22 more
Caused by: org.springframework.beans.factory.NoSuchBeanDefinitionException: No qualifying bean of type [com.google.cloud.bigquery.BigQuery] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {}
	at org.springframework.beans.factory.support.DefaultListableBeanFactory.raiseNoSuchBeanDefinitionException(DefaultListableBeanFactory.java:1380)
	at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1126)
	at org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:1021)
	at org.springframework.beans.factory.support.ConstructorResolver.resolveAutowiredArgument(ConstructorResolver.java:814)
	at org.springframework.beans.factory.support.ConstructorResolver.createArgumentArray(ConstructorResolver.java:741)
	... 36 more

However, running it without --modules=housekeeping or running it with --modules=replication works just fine.

BigQuery error during data extraction doesn't fail replication

If an error occurs while extracting the data from BigQuery, the Circus Train replication doesn't fail. Instead, it logs the error and carries on, replicating an empty table. It should fail and tidy up after itself.

Here is a sample of the log during a run showing an error and CT continuing:

2018-03-07 13:10:31,277 INFO com.hotels.bdp.circustrain.bigquery.extraction.BigQueryDataExtractionService:65 - Creating bucket circus-train-bigquery-tmp-b8de7be4-486b-44d5-8151-c789ed362fa4
2018-03-07 13:10:32,226 INFO com.hotels.bdp.circustrain.bigquery.extraction.BigQueryDataExtractionService:76 - Extracting hotel_ads_data_from_google.ei_ha_click_share_2016 to temporary location gs://circus-train-bigquery-tmp-b8de7be4-486b-44d5-8151-c789ed362fa4/3c66c20e-28e5-4207-95a6-3f87fed6b9b8.csv
2018-03-07 13:10:33,015 ERROR com.hotels.bdp.circustrain.core.event.CompositeTableReplicationListener:43 - Listener 'com.hotels.bdp.circustrain.bigquery.listener.BigQueryReplicationListener@57e6fd09' threw exception on tableReplicationStart.
com.hotels.bdp.circustrain.api.CircusTrainException: Error extracting BigQuery table data to Google storage: Using table foo:bar is not allowed for this operation because of its type. Try using a different table that is of type TABLE., reason=invalid, location=null
at com.hotels.bdp.circustrain.bigquery.extraction.BigQueryDataExtractionService.extractDataFromBigQuery(BigQueryDataExtractionService.java:89)
at com.hotels.bdp.circustrain.bigquery.extraction.BigQueryDataExtractionService.extract(BigQueryDataExtractionService.java:42)
at com.hotels.bdp.circustrain.bigquery.extraction.BigQueryDataExtractionManager.extract(BigQueryDataExtractionManager.java:50)
at com.hotels.bdp.circustrain.bigquery.listener.BigQueryReplicationListener.tableReplicationStart(BigQueryReplicationListener.java:44)
at com.hotels.bdp.circustrain.core.event.CompositeTableReplicationListener.tableReplicationStart(CompositeTableReplicationListener.java:41)
at com.hotels.bdp.circustrain.core.Locomotive.run(Locomotive.java:112)
at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:791)
at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:781)
at org.springframework.boot.SpringApplication.afterRefresh(SpringApplication.java:771)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:307)
at com.hotels.bdp.circustrain.CircusTrain.main(CircusTrain.java:87)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
2018-03-07 13:10:33,018 DEBUG com.hotels.bdp.circustrain.core.metastore.ThriftMetaStoreClientFactory:43 - Connecting to '' metastore at 'thrift://bla:9083'
2018-03-07 13:10:33,019 INFO hive.metastore:433 - Trying to connect to metastore with URI thrift://bla:9083
2018-03-07 13:10:33,111 INFO hive.metastore:478 - Opened a connection to metastore, current connections: 1
2018-03-07 13:10:33,206 INFO hive.metastore:530 - Connected to metastore.

You can reproduce this issue by attempting to replicate a VIEW instead of a table.
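
The requested behaviour (fail and tidy up) can be sketched in plain Java. This is only an illustration of the pattern, not the plugin's actual code: `Extractor`, `Cleaner` and `extractOrFail` are hypothetical names introduced here, and the real fix would have to live wherever the listener exception is currently caught and logged.

```java
public class FailFastExtraction {

    // Hypothetical stand-in for the BigQuery extraction step.
    interface Extractor {
        void extract() throws Exception;
    }

    // Hypothetical stand-in for removing temporary buckets and files.
    interface Cleaner {
        void cleanup();
    }

    // Runs the extraction; on any error, cleans up and rethrows so the
    // caller aborts the replication instead of continuing with no data.
    static void extractOrFail(Extractor extractor, Cleaner cleaner) {
        try {
            extractor.extract();
        } catch (Exception e) {
            cleaner.cleanup();
            throw new IllegalStateException("Extraction failed, aborting replication", e);
        }
    }

    public static void main(String[] args) {
        try {
            extractOrFail(
                () -> { throw new Exception("table is a VIEW"); },
                () -> System.out.println("temporary storage removed"));
        } catch (IllegalStateException e) {
            System.out.println("replication aborted: " + e.getCause().getMessage());
        }
    }
}
```

The original cause is kept as the exception's cause so that the BigQuery error text still reaches the log.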

Integer values from GCP exceed the MAX value of an Integer in Hive

We have several integer values in GCP (costs, IDs) that exceed the maximum value of an Integer in Hive. Circus Train keeps the Integer type in the Hive schemas of the migrated tables, ignoring that Hive stores an Integer in 4 bytes, so its maximum value is 2,147,483,647.
Values above that maximum arrive in Hive as NULL. It would be better to use bigint in these cases.
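
A minimal sketch of the suggested mapping follows. BigQuery's INTEGER type is 64 bits wide, so mapping it to Hive's 8-byte bigint avoids the overflow. The class and method names are hypothetical, and the fallback to string for unlisted types is an assumption made only to keep the example short. It is not the plugin's real conversion code.

```java
import java.util.Locale;

public class BigQueryToHiveTypes {

    // Hypothetical helper: maps a BigQuery column type name to a Hive type name.
    static String toHiveType(String bigQueryType) {
        switch (bigQueryType.toUpperCase(Locale.ROOT)) {
            case "INTEGER":
            case "INT64":
                return "bigint";   // 8 bytes, matches BigQuery's 64-bit integers
            case "FLOAT":
            case "FLOAT64":
                return "double";
            case "BOOLEAN":
            case "BOOL":
                return "boolean";
            default:
                return "string";   // assumption: simple fallback for this sketch
        }
    }

    public static void main(String[] args) {
        long cost = 3_000_000_000L;
        // true: the value does not fit in a 4-byte Hive int
        System.out.println(cost > Integer.MAX_VALUE);
        System.out.println(toHiveType("INTEGER"));
    }
}
```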

Ignore CSV Headers

CSV headers are skewing Hive table contents. Set the replica Hive table to ignore the CSV header on replication.

Missing information when BigQuery extraction job fails

If extracting data from BigQuery throws an error the following appears in the log:

Could not extract BigQuery table data to Google storage

This omits the error details that the BigQuery API makes available. They should be logged too, to aid debugging.

Wrongly parsed column values by Circus Train

During our GCP-to-AWS migration we found that some column values are not parsed correctly by Circus Train. For example, one of the tables we would like to migrate from GCP to AWS contains a value like this:

columnB = """{""""abcd"""":""""efgh1234""""","""""abcd"""":""""dcba1234""""}"""

Circus Train splits this value at the comma, so the output CSV ends up separated like this:

columnA: ""
columnB: "{""abcd"":""efgh1234""
columnC: ""abcd"":""dcba1234""}"
columnD: ""

Landing table properties are:

ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
'field.delim'=',',
'serialization.format'=',',
'skip.header.line.count'='1')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'

I've tried to use OpenCSVSerde with ',' as separatorChar and '"' as quoteChar on the landing table on AWS, but it did not solve the issue entirely.
In some cases it would be useful to have a separator character other than a comma in the output CSV.
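
To illustrate why a plain delimiter split breaks such values, here is a small sketch comparing a naive split on ',' with a quote-aware split that treats a doubled quote inside a quoted field as one literal quote. The input line is a simplified stand-in for the value above, and `splitCsvLine` is a hypothetical helper. It does not reproduce what LazySimpleSerDe or OpenCSVSerde do internally.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedCsvSplit {

    // Splits one CSV line, honouring double-quoted fields and "" escapes.
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote inside a quoted field
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // commas are kept while quoted
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        String line = "1,\"{\"\"abcd\"\":\"\"efgh1234\"\",\"\"abcd\"\":\"\"dcba1234\"\"}\",x";
        System.out.println(line.split(",").length);   // naive split: 4 pieces
        System.out.println(splitCsvLine(line).size()); // quote-aware split: 3 fields
        System.out.println(splitCsvLine(line).get(1));
    }
}
```

With a delimiter-only SerDe configuration such as 'field.delim'=',' the quoted JSON-like value is cut at its embedded comma, which matches the extra columns described above.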

Circus Train fails when running with the BigQuery jar on non-BigQuery replications

It would be great to be able to run both BigQuery and non-BigQuery replications using Circus Train, without having to change the classpath of Circus Train.

Possible solution: detect whether the configuration file specifies a BigQuery table URI for metastore-uris and, if so, "activate" the Circus Train BigQuery plugin; otherwise, run normal Circus Train.
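
The detection step could look roughly like the sketch below. It assumes that BigQuery sources are identified by a "bigquery" URI scheme (for example bigquery://my-project), which this issue does not confirm. `isBigQueryUri` is a hypothetical helper, and how the result would switch the plugin's Spring beans on or off is left out.

```java
import java.net.URI;

public class BigQueryUriCheck {

    // Returns true only when the configured metastore URI uses the
    // (assumed) "bigquery" scheme; anything else means a normal replication.
    static boolean isBigQueryUri(String metastoreUri) {
        if (metastoreUri == null) {
            return false;
        }
        try {
            String scheme = URI.create(metastoreUri.trim()).getScheme();
            return "bigquery".equalsIgnoreCase(scheme);
        } catch (IllegalArgumentException e) {
            return false; // not a parseable URI, so not a BigQuery source
        }
    }

    public static void main(String[] args) {
        System.out.println(isBigQueryUri("bigquery://my-project")); // true
        System.out.println(isBigQueryUri("thrift://bla:9083"));     // false
    }
}
```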

How to replicate the issue: run a replication for a non-BigQuery table and check that the BigQuery beans cannot be created:

org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'bigQueryMetastore' defined in URL [jar:file:<path>/circus-train-bigquery-4.0.0.jar!/com/hotels/bdp/circustrain/bigquery/util/BigQueryMetastore.class]: Unsatisfied dependency expressed through constructor argument with index 0 of type [com.google.cloud.bigquery.BigQuery]: No qualifying bean of type [com.google.cloud.bigquery.BigQuery] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {}; nested exception is org.springframework.beans.factory.NoSuchBeanDefinitionException: No qualifying bean of type [com.google.cloud.bigquery.BigQuery] found for dependency: expected at least 1 bean which qualifies as autowire candidate for this dependency. Dependency annotations: {}
