Migrating Blobstore between Projects
11 Oct 2018
What is Blobstore? What is a Blob?
Like horse-drawn carriages, video rental stores, and scurvy, Blobstore is a leftover from an earlier time. It is a storage option on Google Cloud Platform (GCP) that stores objects called blobs and associates each blob with a key. It is used by Google App Engine services and lets applications accept file uploads and serve files over HTTP.
Blobstore has since been superseded by Google Cloud Storage (GCS), but it can still be used with GCS as the underlying storage, keeping the same upload behaviour with minimal changes to the app.
Unlike many other GCP services, Blobstore cannot easily be migrated from one project to another. In this blog post, we look at how to do that migration.
As per the documentation,
“The Blobstore API allows your application to serve data objects, called blobs, that are much larger than the size allowed for objects in the Datastore service. Blobs are useful for serving large files, such as video or image files, and for allowing users to upload large data files. Blobs are created by uploading a file through an HTTP request. Typically, your applications will do this by presenting a form with a file upload field to the user. When the form is submitted, the Blobstore creates a blob from the file’s contents and returns an opaque reference to the blob, called a blob key, which you can later use to serve the blob.”
Creating blobs and their blob keys
The process of creating blobs in an App Engine project is as follows:
- Create an upload URL using the Blobstore API.
- Create an upload form that posts to that URL. When the form is submitted, Blobstore handles the POST and creates the blob.
The info for each blob shows its blob key and the storage bucket. Each stored object is associated with a Blobstore record, and the App Engine service uses that record to access the uploaded file.
Scenario: You have an application that uses Blobstore for storage with Google App Engine on Google Cloud Platform, and you need to migrate it to a different GCP project.
Search online for how to migrate storage to another project's bucket, and you will find the easiest way to do it: use the cp command and voilà! Migrated. Except you are not quite there, sorry about that!
Why copying buckets would not work
Even though the blobs are stored in Cloud Storage, they are managed through Blobstore. Moving or deleting a blob in storage also removes it from Blobstore.
Copying a bucket, on the other hand, does not carry the Blobstore references with it. The new project therefore ends up with an empty Blobstore.
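The following toy model illustrates the problem. It is plain Java and not the App Engine API. It assumes that a project can be reduced to a bucket (object name → contents) plus a separate Blobstore index (blob key → object name). The class `ToyProject` and its methods are hypothetical and exist only for illustration. Copying the bucket brings the objects across but leaves the index behind, so serving by the old blob key fails in the new project.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

/** Hypothetical toy model of one GCP project; this is NOT the App Engine API. */
class ToyProject {
    // Object name -> contents: stands in for the project's Cloud Storage bucket.
    final Map<String, byte[]> bucket = new HashMap<>();
    // Blob key -> object name: stands in for the Blobstore's own records.
    final Map<String, String> blobIndex = new HashMap<>();

    /** Simulates an upload: stores the object and mints a random, opaque blob key. */
    String upload(String objectName, byte[] data) {
        bucket.put(objectName, data);
        String blobKey = UUID.randomUUID().toString();
        blobIndex.put(blobKey, objectName);
        return blobKey;
    }

    /** Resolves a blob key the way serving does: index first, then the bucket. */
    Optional<byte[]> serve(String blobKey) {
        String objectName = blobIndex.get(blobKey);
        if (objectName == null) {
            return Optional.empty(); // no Blobstore record, so nothing to serve
        }
        return Optional.ofNullable(bucket.get(objectName));
    }

    /** Simulates a bucket-to-bucket copy: the objects move, the index does not. */
    void copyBucketFrom(ToyProject source) {
        bucket.putAll(source.bucket);
    }
}
```

After `newProject.copyBucketFrom(oldProject)`, the object exists in `newProject.bucket`, yet `newProject.serve(oldKey)` is empty. This mirrors the empty Blobstore described above.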
There are different ways to solve this, and which one fits depends on your application.
Option 1: Re-upload the objects that are to be migrated. Note that when a new blob is created, its blob key is generated automatically. Blobs are tightly coupled to their blob keys and a key cannot be changed afterwards, so every stored reference to an old key has to be updated to the new one.
Re-uploading is not ideal when there are many objects to migrate. As explained above, each upload involves creating a new upload URL and sending a POST request to it. With larger datasets the time taken becomes noticeable. A large burst of requests can also look like a POST flood, which may be treated as an attack and blocked.
Option 2: If refactoring the code is feasible and you can afford the time, use the Google App Engine Blobstore to Google Cloud Storage Migration Tool to migrate blobs from Blobstore to GCS.
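If you do choose to re-upload, one way to address both concerns above is to pace the requests and record the new keys. The sketch below makes some assumptions. `Uploader` is a hypothetical interface that you would implement by creating an upload URL and POSTing the file to it. The fixed delay is a deliberately crude throttle, and the right rate for your project is something you would need to determine yourself. The method returns an old-key → new-key map, because re-uploading mints fresh blob keys and the application's references must be rewritten.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical: wraps "create an upload URL, then POST the file to it" for one blob. */
interface Uploader {
    /** Re-uploads the blob behind oldBlobKey and returns the key the target project generated. */
    String reupload(String oldBlobKey);
}

/** Re-uploads blobs one by one with a fixed pause between requests. */
final class PacedReuploader {
    private final Uploader uploader;
    private final long delayMillis;

    PacedReuploader(Uploader uploader, long delayMillis) {
        this.uploader = uploader;
        this.delayMillis = delayMillis;
    }

    /** Returns old key -> new key, in input order, so stored references can be rewritten. */
    Map<String, String> migrate(List<String> oldKeys) {
        Map<String, String> oldToNew = new LinkedHashMap<>();
        for (int i = 0; i < oldKeys.size(); i++) {
            if (i > 0) {
                pause(); // spread the POSTs out instead of sending them in one burst
            }
            String oldKey = oldKeys.get(i);
            oldToNew.put(oldKey, uploader.reupload(oldKey));
        }
        return oldToNew;
    }

    private void pause() {
        try {
            Thread.sleep(delayMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            throw new IllegalStateException("Interrupted while pacing uploads", e);
        }
    }
}
```

A fixed delay keeps the sketch simple. A production job might instead use batching or exponential backoff on errors.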
Once all references to the blobs have been changed to serve the objects directly from GCS, an easy-peasy copy command does the trick:
gsutil cp -r gs://<BUCKET_NAME> gs://<NEW_BUCKET_NAME>
Option 3: Read the file directly from Google Cloud Storage (GCS) if the blob key does not exist in Blobstore.
We found that when we copied buckets between projects, the objects were present in the new bucket but the blobs were missing from Blobstore. This solution therefore generates the blob key from the object's GCS name.
- Given the object ID (blob key) of every object to be migrated, get the GCS address of each object. This has to run in the original project, where the Blobstore records still exist. With the Java App Engine API, for example, getGsObjectName() returns the blob's GCS name:
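// objId: the blob key string stored by the application
BlobKey blobKey = new BlobKey(objId);
// loadBlobInfo returns null if this project's Blobstore has no record for the key
BlobInfo info = new BlobInfoFactory().loadBlobInfo(blobKey);
String gcsName = info.getGsObjectName();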
- Store this mapping, preferably in the database: the blob key of each object to be migrated together with its GCS address.
- To get or serve the blob in the new project, create a blob key from the object's GCS path with createGsBlobKey():
BlobstoreService bService = BlobstoreServiceFactory.getBlobstoreService();
// bucketName: the bucket in the new project; gcsName: the name stored in the previous step
blobKey = bService.createGsBlobKey("/gs/" + bucketName + "/" + gcsName);
- Serve the blob using that blob key, for example with bService.serve(blobKey, response).
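The steps above amount to a fallback lookup. Use the blob key as-is if Blobstore still knows it. Otherwise, look up the stored GCS name and build a `/gs/<bucket>/<object>` key from it. The sketch below is framework-free so that the logic can be tested in isolation. Keys are plain strings here, and the two functional parameters are hypothetical stand-ins. In the real app, `existsInBlobstore` might wrap `new BlobInfoFactory().loadBlobInfo(key) != null`, and `createGsBlobKey` might wrap `bService.createGsBlobKey(path).getKeyString()`. The in-memory map stands in for the database mapping table.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

/** Sketch of "use the Blobstore key if it exists, else rebuild it from the GCS name". */
final class BlobKeyResolver {
    private final String bucketName;                         // bucket in the new project
    private final Map<String, String> gcsNameByOldKey;       // stand-in for the DB mapping table
    private final Predicate<String> existsInBlobstore;       // hypothetical stand-in
    private final Function<String, String> createGsBlobKey;  // hypothetical stand-in

    BlobKeyResolver(String bucketName, Map<String, String> gcsNameByOldKey,
                    Predicate<String> existsInBlobstore,
                    Function<String, String> createGsBlobKey) {
        this.bucketName = bucketName;
        this.gcsNameByOldKey = gcsNameByOldKey;
        this.existsInBlobstore = existsInBlobstore;
        this.createGsBlobKey = createGsBlobKey;
    }

    /** Builds the "/gs/<bucket>/<object>" path format used with createGsBlobKey. */
    static String gsPath(String bucketName, String gcsName) {
        return "/gs/" + bucketName + "/" + gcsName;
    }

    /** Returns a key that can be served, or empty if the blob is unknown everywhere. */
    Optional<String> resolve(String blobKey) {
        if (existsInBlobstore.test(blobKey)) {
            return Optional.of(blobKey); // still registered in Blobstore: serve as before
        }
        String gcsName = gcsNameByOldKey.get(blobKey);
        if (gcsName == null) {
            return Optional.empty(); // neither in Blobstore nor in the migration mapping
        }
        return Optional.of(createGsBlobKey.apply(gsPath(bucketName, gcsName)));
    }
}
```

Keeping the Blobstore check first means blobs uploaded in the new project keep working unchanged. Only migrated keys pay for the extra mapping lookup.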
The benefit of this approach is that it needs only a minor change to the existing code, and nothing has to be re-uploaded through POST requests. The trade-off is that a new field, the object's GCS name, has to be added to the database table.
- Don’t use superseded solutions unless you have to.
- If you have to, and you find yourself migrating projects, there are three options. Option 1 is to re-upload the objects, which gives them new blob keys. Option 2 is to migrate completely from Blobstore to GCS with the migration tool, then copy the buckets to the new project. Option 3 is to store each object's GCS name and create the blob key from it when serving.