Displaying the correct thumbnail image and title for Single Page Applications (SPAs)

2023/06/09   HIRANO Satoshi

When sharing a link to a web page on social media platforms such as Twitter or Facebook, the title and thumbnail image of the page are displayed. Unfortunately, if the page is a part of a Single Page Application (SPA), the server provides a fixed index.html instead of the actual page, resulting in a fixed title and image being displayed. Some search engines also cannot index the correct title and content. This article explains how to overcome these issues and display the correct title and image for the page.

dev webpack SPA SEO

The problem


While developing Requestland, we encountered an issue. When a user shared a link to a page of Requestland on social media platforms such as Twitter or Facebook, the same default title and thumbnail image were displayed instead of the page's specific title and thumbnail image.


If Twitter readers couldn't see the page's thumbnail image, they were less likely to click the link.


The root cause of this issue was that Requestland was a Single Page Application (SPA). Although the issue is common, I could not find a good solution on the web or from ChatGPT.


In an SPA, when any link is accessed or reloaded, the web server serves the default index.html file that contains <script> tags pointing to bundle files with JavaScript code and CSS. These files are necessary for running the SPA. Since the index.html file has a title and thumbnail image, the user always saw the fixed ones.



For instance, loading a page on a note app at https://my-service.com/app/note/995 returns the following index.html. <title> is the app's title, not the page's title.


<html>
  <head>
    <base href="/app">
    <title>My SPA notes app</title>
    <meta name="description" content="This is a notes app.">
    <meta property="og:image" content="https://file-server.image1.png">
    <!-- injected script tags for bundle files -->
    <script src="https://file-server.bundle.a94.js"></script>
    <link href="https://file-server.bundle.vc0.css" rel="stylesheet">
  </head>
  <body>
    ...
  </body>
</html>


Our goal is to display the title and thumbnail image of the actual page, even if it is a part of an SPA.


With this, some search engines may be able to retrieve the correct title, content, and thumbnail image for each page listed in the sitemap.xml file without Server Side Rendering (SSR).


The problem we need to solve is how to provide the correct title and thumbnail image for each URL when the server always returns the same fixed index.html file.



If a single web server both serves the bundle files and generates index.html for each URL, it can easily inject the <script> tags for the bundle files. However, this can cause performance issues because the same server has to handle dynamic requests and serve static files.




Fig.1 Typical components in production.



Fig.1 shows a typical arrangement of components in production. It is designed for better performance, but it still has the problem.


There is a file server that serves the bundle files and index.html, and an API server that serves /api/* requests for the SPA. The load balancer routes each HTTP request to either the API server or the file server according to its route (URL). It can be a simple Nginx proxy. The file server returns the index.html file for all URLs except /api/*.



For example, loading or reloading https://my-service.com/app/note/995 should return the title and thumbnail image of note 995 as well as the <script> tags in the HTTP response as follows:


<html>
  <head>
    <base href="/app">
    <title>Note 995 title</title>
    <meta name="description" content="Note 995 description">
    <meta property="og:image" content="https://file-server.note995-image.png">
    <!-- injected script tags for bundle files -->
    <script src="https://file-server.bundle.a94.js"></script>
    <link href="https://file-server.bundle.vc0.css" rel="stylesheet">
  </head>
  <body>
    (content here)
  </body>
</html>



A desirable component structure for the production environment is shown in Fig.2. The API server is responsible for generating the HTTP response by injecting the title and thumbnail URL of the page identified by the given route, together with the <script> tags. This HTTP response is shown as "index response" in the figure.
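The difference between Fig.1 and Fig.2 can be sketched as a routing decision. This is only an illustration in Python, not a real load balancer configuration; the function names and the backend names 'api-server' and 'file-server' are made up for this example.

```python
def route_fig1(path: str) -> str:
    """Fig.1: only /api/* goes to the API server."""
    if path == '/api' or path.startswith('/api/'):
        return 'api-server'
    return 'file-server'      # always answers page routes with the fixed index.html


def route_fig2(path: str) -> str:
    """Fig.2: page routes such as /app/note/* also go to the API server."""
    if path == '/api' or path.startswith('/api/'):
        return 'api-server'
    if path.startswith('/app/note/'):
        return 'api-server'   # the API server generates the index response
    return 'file-server'      # bundle files and other static files


if __name__ == '__main__':
    for path in ('/api/notes', '/app/note/995', '/bundle.a94.js'):
        print(path, route_fig1(path), route_fig2(path))
```

With the Fig.2 rule, only the request for the page itself reaches the API server; the bundle files referenced by the injected tags are still served by the file server.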




Fig.2 Improved components in production.




Local development environment


First, let's create a typical local development environment with Webpack.


When developing an SPA, we use Webpack dev server to serve our app locally, as shown in Fig.3. All static files, including bundle files, are served by it. The <script> tags are injected by the Webpack bundler according to the filenames of the bundle files.




Fig.3 Typical components in a local development environment.



The webpack.config.js file is a configuration file for Webpack that contains various settings and options for our application. The proxy setting in the webpack.config.js file is used to forward requests from the SPA to the API server located at localhost:8080.


  devServer: {
    historyApiFallback: true,
    proxy: [
        { context: [ '/api' ],
          target: 'http://localhost:8080',
          secure: false,
          changeOrigin: true,
        }]
  },




The "historyApiFallback: true" setting makes the dev server return index.html for all routes except /api, as an SPA requires.



Our solution for the local development environment



Fig.4 Improved components in the local development environment.



Fig.4 shows an improved structure for the problem.


Now, the webpack.config.js file has another proxy setting that transfers HTTP requests for /app to the API server.


For example, a request to https://localhost/app/note/995 will be forwarded to http://localhost:8080/app/note/995, which returns the title and thumbnail image of note 995.


    proxy: [
        { context: [ '/api', '/app' ],
          target: 'http://localhost:8080',
          changeOrigin: true,
        }]


Since the API server does not know about the bundle files, we need to let it know. We add WebpackManifestPlugin that generates a manifest file of bundle files to the webpack.config.js file.


  % yarn add --dev webpack-manifest-plugin




  const { WebpackManifestPlugin } = require('webpack-manifest-plugin');

  ...


  plugins: [
      new WebpackManifestPlugin({
          fileName: 'app-bundle-manifest.json',
          writeToFileEmit: true,
          filter: (file) => {
              return file.isChunk;
          },
          sort: (fileA, fileB) => {
              if (fileA.path.endsWith('.js') && fileB.path.endsWith('.css')) return -1;
              if (fileA.path.endsWith('.css') && fileB.path.endsWith('.js')) return 1;
              if (fileA.path > fileB.path) return -1;
              return 1;
          },
          //seed: {
          //    version: 'local-dev-server-version'
          //}
      }),


The plugin selects bundle files and generates an app-bundle-manifest.json file like this:


      {
        "app-a1d170d4.css": "/app-a1d170d4.d1dd426331d40f5930fe.bundle.css",
        "app-a1d170d4.js": "/app-a1d170d4.d1dd426331d40f5930fe.bundle.js",
        "runtime~app.js": "/runtime~app.d1dd426331d40f5930fe.bundle.js",
        "vendors-91c40cd8.css": "/vendors-91c40cd8.d1dd426331d40f5930fe.bundle.css",
        "vendors-91c40cd8.js": "/vendors-91c40cd8.d1dd426331d40f5930fe.bundle.js",
      ...


During local development, the JSON file can be accessed via an HTTP request and does not exist in the file system. When you build the SPA for production, it is emitted into the /dist directory.


In any case, the API server can fetch it via an HTTP request.



API Server


Here is a template version of the index.html file, converted from the SPA's index.ejs, that is used to generate the HTTP response. It has some {{variables}} that will be replaced by the Jinja2 template engine. You may generate index.ejs from this index.html.


<html>
  <head>
    <base href="/app">
    <title>{{title}}</title>
    <meta name="description" content="{{desc}}">
    <meta property="og:image" content="{{image_url}}">
    {{bundles}}
  </head>
  <body>
    ...
  </body>
</html>



Here is an API server in Python that uses the Falcon web framework and Jinja2. Please note that there may be errors in the code. You will need to install the imported libraries and handle errors for requests.get().


import falcon
import jinja2
import requests
from markupsafe import Markup

is_local = True
file_server = 'https://localhost' if is_local else 'https://file-server.com'
# Autoescaping keeps markup in note titles and descriptions from breaking the page.
jinja_env = jinja2.Environment(loader=jinja2.FileSystemLoader('.'),
                               autoescape=jinja2.select_autoescape(['html']))

# get_note_from_db() is your own DB access function (not shown here).


class APIServer():
    ''' Generates the HTTP response for reloading. '''

    def on_get_api(self, req, resp):
        resp.text = 'API result'
        resp.content_type = 'text/html; charset=UTF-8'

    def on_get_note(self, req, resp, note_id):
        note = get_note_from_db(note_id)
        resp.text = jinja_env.get_template("index.html").render({
            'title':       note.title,
            'desc':        note.content,
            'image_url':   note.image_url,
            'bundles':     Markup(self.get_script_tags())
        })
        resp.content_type = 'text/html; charset=UTF-8'

    def get_script_tags(self) -> str:  # need caching
        # Skip certificate verification only for the local dev server.
        response = requests.get(file_server + '/app-bundle-manifest.json', timeout=5, verify=not is_local)
        js = css = ''
        for filename in response.json().values():
            if filename.endswith('.js'):
                js += '    <script defer src="' + filename + '"></script>\n'
            if filename.endswith('.css'):
                css += '    <link href="' + filename + '" rel="stylesheet">\n'
        return js + css


api_server = APIServer()
app = falcon.App()
app.add_route('/api', api_server, suffix='api')
app.add_route('/app/note/{note_id}', api_server, suffix='note')


def sink(req, resp, **kwargs):
    # Fallback for all other /app routes: default title and thumbnail image.
    resp.text = jinja_env.get_template("index.html").render({
        'title':       'My SPA notes app',
        'desc':        'This is a notes app.',
        'image_url':   'https://file-server.image1.png',
        'bundles':     Markup(api_server.get_script_tags())
    })
    resp.content_type = 'text/html; charset=UTF-8'


app.add_sink(sink, prefix='/app')

if __name__ == '__main__':
    from wsgiref.simple_server import make_server
    with make_server('', 8080, app) as httpd:
        httpd.serve_forever()



APIServer.on_get_api() handles API requests at the /api route. It simply returns a fixed text.


APIServer.on_get_note() handles the load and reload of the /app/note/995 route. It loads a note record from the DB and fills the HTTP response with its title and thumbnail image. <script> tags for the bundle files are injected using app-bundle-manifest.json obtained from the file server. You may add on_get handlers for more routes.


APIServer.get_script_tags() returns <script> tags for the bundle files using app-bundle-manifest.json obtained from the file server.
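The tag generation can be tried without a running file server by separating it from the HTTP request. The following is a minimal sketch under that assumption; build_bundle_tags() and sample_manifest are names introduced only for this example, and the manifest entries are taken from the sample manifest shown earlier.

```python
def build_bundle_tags(manifest: dict) -> str:
    """Return <script> and <link> tags for the bundle files listed in a manifest."""
    js = css = ''
    for path in manifest.values():
        if path.endswith('.js'):
            js += '    <script defer src="' + path + '"></script>\n'
        elif path.endswith('.css'):
            css += '    <link href="' + path + '" rel="stylesheet">\n'
    # Scripts first, then stylesheets, as in get_script_tags().
    return js + css


# Two entries copied from the sample app-bundle-manifest.json.
sample_manifest = {
    'app-a1d170d4.css': '/app-a1d170d4.d1dd426331d40f5930fe.bundle.css',
    'app-a1d170d4.js': '/app-a1d170d4.d1dd426331d40f5930fe.bundle.js',
}

if __name__ == '__main__':
    print(build_bundle_tags(sample_manifest))
```

Keeping the function free of network access also makes it easy to cover with a unit test.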


sink() handles all other routes and fills the HTTP response with the default title and thumbnail image.


To test this, reload the /app/note/995 page and check its response in the Network tab of Chrome dev tools.
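The same check can also be automated. The sketch below uses only the Python standard library to pull the <title> text and the og:image URL out of an HTML response; PreviewParser, extract_preview(), and the sample HTML are hypothetical names and data for this example, and fetching the response itself is left out.

```python
from html.parser import HTMLParser


class PreviewParser(HTMLParser):
    """Collects the <title> text and the og:image URL from an HTML document."""

    def __init__(self):
        super().__init__()
        self.title = ''
        self.og_image = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == 'title':
            self._in_title = True
        elif tag == 'meta':
            attr = dict(attrs)
            if attr.get('property') == 'og:image':
                self.og_image = attr.get('content')

    def handle_endtag(self, tag):
        if tag == 'title':
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_preview(html_text: str):
    """Return (title, og_image_url) found in html_text."""
    parser = PreviewParser()
    parser.feed(html_text)
    parser.close()
    return parser.title.strip(), parser.og_image


sample_response = '''<html>
  <head>
    <title>Note 995 title</title>
    <meta property="og:image" content="https://file-server.note995-image.png">
  </head>
  <body></body>
</html>'''

if __name__ == '__main__':
    print(extract_preview(sample_response))
```

If the returned title is still the app's default title for a note URL, the request is probably not reaching the API server.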



The production environment


Once we have completed the local version, the production version is easy. The above API server works if some constants are adjusted for the environment. Here is an example of the setting for the load balancer shown in Fig.2 that transfers traffic for /app/note/* to the API server. It has not been verified, but it should work on Google Cloud Platform with slight modification.


 pathMatchers:
    - name: Note API server
      routeRules:
        - description: Note API server
          matchRules:
            - pathTemplateMatch: '/app/note/{rest=**}'
          service: note-server
          priority: 1
          routeAction:
            urlRewrite:
              pathTemplateRewrite: '/app/note/{rest}'






Further improvement


The generated index response can include the content of the note. While the content is not visible to users on the SPA, it can be visible to search engines. Although it has not been tested, it is expected that some search engines may index the title and the content. This could improve the search engine optimization (SEO) of your page and make it easier for users to find your content through search engines.


Here are the improved parts of index.html and the API server:


    <body>
      <div>
        {{content}}
      </div>
    </body>


    def on_get_note(self, req, resp, note_id):
        note = get_note_from_db(note_id)
        # resp.text replaces the deprecated resp.body in Falcon 3 and later.
        resp.text = jinja_env.get_template("index.html").render({
            'title':       note.title,
            'desc':        note.content,
            'image_url':   note.image_url,
            'bundles':     Markup(self.get_script_tags()),
            'content':     note.content
        })
        resp.content_type = 'text/html; charset=UTF-8'
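Because note content usually comes from users, it should be HTML-escaped before it is placed in the response. Jinja2 escapes variables only when autoescaping is enabled, so it is worth checking. The following is a minimal sketch of the escaping step with the standard library; render_note_body() is a hypothetical helper, not part of the API server above.

```python
import html


def render_note_body(content: str) -> str:
    """Wrap escaped note content in the <div> used by the improved template."""
    return '<div>\n' + html.escape(content) + '\n</div>'


if __name__ == '__main__':
    # Markup inside the note is shown as text instead of being interpreted.
    print(render_note_body('<script>alert(1)</script> & more'))
```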



You may need to cache app-bundle-manifest.json for production use.


The deployment of the bundle files to the file server and the deployment of the API server may occur separately. When a new version of the SPA is deployed, the bundle files change. The API server does not know about the new bundle files and keeps using a stale app-bundle-manifest.json whose bundle filenames are no longer valid.


So, when designing caching of app-bundle-manifest.json, it has to handle the time gap between deployments. This can be challenging but is possible.
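One simple approach is a cache with a short time-to-live (TTL), so that a stale manifest is used for a bounded time at most. The sketch below assumes that the old bundle files stay on the file server for at least the TTL after a deployment. ManifestCache and its parameters are names introduced for this example; the fetch function is passed in, so it could wrap the requests.get() call of the API server.

```python
import time
from typing import Callable, Optional


class ManifestCache:
    """Caches the manifest and fetches it again after ttl_seconds have passed."""

    def __init__(self, fetch: Callable[[], dict], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._manifest: Optional[dict] = None
        self._fetched_at = 0.0

    def get(self) -> dict:
        now = self._clock()
        if self._manifest is None or now - self._fetched_at >= self._ttl:
            try:
                self._manifest = self._fetch()
                self._fetched_at = now
            except Exception:
                # Keep serving the previous manifest if the file server
                # is unreachable; fail only when nothing is cached yet.
                if self._manifest is None:
                    raise
        return self._manifest
```

A shorter TTL narrows the window in which stale filenames are served, at the cost of more requests to the file server.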


Conclusion


By using a combination of proxy settings, an API server, and a template engine, developers can inject the correct title and thumbnail image for each page within an SPA. That will provide users with a better experience.


While there may be some challenges in implementing this solution, such as caching app-bundle-manifest.json and handling the time gap between deployments, the end result is well worth the effort.




