Spark SQL tables often contain many small files (much smaller than the HDFS block size). In this case, Spark launches more Tasks to process these small files. When the SQL logic contains a Shuffle operation, this greatly increases the number of hash buckets and seriously affects performance.
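The effect described in this statement can be illustrated with rough arithmetic. The sketch below is a simplified model, not Spark's actual split planner: it assumes every file produces at least one map task, and that a hash-based shuffle writes one bucket per (map task, reduce partition) pair. The function names, file sizes, and the reducer count of 200 are hypothetical values chosen for illustration.

```python
import math

def map_tasks(file_sizes_mb, block_mb=128):
    """Count map tasks, assuming each file yields ceil(size / block) splits, at least one."""
    return sum(max(1, math.ceil(size / block_mb)) for size in file_sizes_mb)

def hash_buckets(num_map_tasks, num_reduce_partitions):
    """Simplified hash shuffle: every map task writes one bucket per reduce partition."""
    return num_map_tasks * num_reduce_partitions

small = [1] * 1024   # 1024 MB stored as 1024 files of 1 MB each
large = [128] * 8    # the same 1024 MB stored as 8 block-sized files
reducers = 200       # hypothetical number of shuffle partitions

small_buckets = hash_buckets(map_tasks(small), reducers)
large_buckets = hash_buckets(map_tasks(large), reducers)
print(small_buckets, large_buckets)
```

Under these assumptions the same volume of data produces far more buckets when it is stored as small files, which is why merging small files before a shuffle is commonly recommended.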
If a Loader job fails, the data imported while the job was running is not deleted automatically and must be deleted manually.
In Kafka, the Producer can be configured with the synchronization parameter (producer.type) to ensure that the data
In Flume, what is the main function of the source module?
Suppose an application needs to frequently access the user table in an Oracle database. To improve performance, Redis is introduced to cache user information.
For this scenario, which of the following Redis data structures is the best choice?